CN112632614A - Preference perception track anonymization method and system - Google Patents

Preference perception track anonymization method and system

Info

Publication number
CN112632614A
Authority
CN
China
Prior art keywords
user
track
sequence
privacy
location
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011599257.5A
Other languages
Chinese (zh)
Inventor
朱亮
蔡增玉
陈燕
张建伟
余丽萍
刘啸威
张卓
冯媛
王景超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University of Light Industry
Priority to CN202011599257.5A
Publication of CN112632614A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 - Protecting data
    • G06F21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 - Protecting access to data via a platform to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 - Protecting personal data, e.g. for financial or medical purposes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 - Geographical information databases
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 - Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 - Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2111 - Location-sensitive, e.g. geographical location, GPS

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Remote Sensing (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a preference-aware track anonymization method and system, which address the problem of making the privacy protection strength customizable and quantifiable in location data privacy protection. The method comprises the following steps: first, the movement track data of a user are acquired, and the original points contained in the movement track data are generalized with a location space anonymization method to obtain the user's location sequence; second, the user's familiarity with each semantic type and the popularity of each location within the semantic type are acquired; then, a familiarity threshold and a popularity threshold are set to obtain the privacy classification of each location, and different location anonymization methods are adopted to obtain the user's anonymous track sequence; finally, the user's track privacy degree is obtained by calculating the information entropy of the anonymous track sequence. The invention provides customizable privacy protection for the user: by analyzing the user's interest and preference characteristics, the user's sensitive information is hidden in a personalized manner, the usability of the data is improved, and the privacy protection strength is made quantifiable.

Description

Preference perception track anonymization method and system
Technical Field
The invention relates to the technical field of network communication, in particular to a preference perception track anonymization method and a preference perception track anonymization system.
Background
Location-based social networks (LBSNs), such as Foursquare, Facebook Places, Twitter, and Jiepang, combine online social networks with physical locations by using users' check-in information, so that location-based service resources can be shared and spread in the virtual world. In recent years, LBSNs have developed at an unprecedented pace thanks to the wide adoption of smart mobile devices embedded with large numbers of sensors. However, privacy disclosure has become an important issue that must be considered. When a user publishes real location data to an LBSN server, an untrusted third party may steal the user's location data and use it for illegal activities. From the user's perspective, publishing only incomplete GPS track data would seem sufficient to protect privacy. Nevertheless, an attacker can still infer sensitive personal information about the victim (such as home address, workplace, or living habits) from the spatiotemporal relationships between geographic locations by applying data analysis techniques. Worse, an attacker can mine the victim's movement behavior pattern from the GPS track data and predict the location the victim will visit at the next moment, seriously threatening the user's personal safety. Therefore, once users find that a location-based social network poses a privacy threat, they will stop using the services it provides, and the credibility of the service is reduced.
Track privacy protection is a new form of privacy protection in location-based services (LBSs). Unlike location privacy protection, track protection aims to prevent the leakage of a user's sensitive location information, which can reveal the user's personalized interests or preferences. Traditional track privacy protection mainly includes fake (dummy) data, spatial cloaking, and suppression techniques. Fake-data-based track privacy protection adds erroneous location data to the original GPS track data so that an attacker cannot obtain the user's real location information from the uploaded track data; cloaking-based track privacy protection generalizes sensitive location data in the original GPS track data to reduce the probability that an attacker obtains the real location information; suppression-based track privacy protection forbids the release of certain sensitive location data in the GPS track data, thereby protecting the user's personal privacy.
Therefore, the prior art has at least the following disadvantages:
first, in practice, considering location anonymity alone cannot effectively achieve track privacy protection, because an attacker can still infer the user's sensitive information using techniques such as association attacks and data analysis; second, existing track anonymization methods do not consider the user's preferences and background knowledge, which causes the loss of useful data and prevents the user from enjoying a personalized service experience; third, they cannot adaptively adopt different track privacy protection methods according to different degrees of privacy risk, which reduces service accuracy.
Disclosure of Invention
In view of the defects in the background art, the invention provides a preference-aware track anonymization method and system, which solve the technical problem that, in existing location data privacy protection, service accuracy is low because different privacy protection methods cannot be customized according to the required privacy protection strength.
The technical scheme of the invention is realized as follows:
a preference-aware track anonymization method comprises the following steps:
s1, obtaining semantic information and movement track data of the position accessed by the user;
s2, sequentially carrying out stay region generalization and position region generalization on original points contained in the movement track data of the user by using a position space anonymization method to obtain a position sequence of the user;
s3, acquiring familiarity of the user with semantic types and popularity of each position in the semantic types by analyzing the position sequence of the user;
s4, setting a user familiarity threshold value and a position popularity threshold value, obtaining privacy classification of the position of the user according to the familiarity of the user to semantic types and the relationship between the popularity of each position in the semantic types and the user familiarity threshold value and the position popularity threshold value, and obtaining an anonymous track sequence of the user by adopting different position anonymity methods according to the privacy classification of the position of the user;
and S5, acquiring the track privacy degree of the user by calculating the information entropy of the anonymous track sequence of the user.
The method for sequentially carrying out dwell region generalization and position region generalization on the original points contained in the movement track data of the user by using the position space anonymity method comprises the following steps:
s21, the trusted third party extracts the stop points from the moving track data of the user, the stop points reflect the moving behavior of the user, and one stop point can be expressed as:
Figure BDA0002870620710000021
Figure BDA0002870620710000022
wherein the content of the first and second substances,
Figure BDA0002870620710000023
denotes the i-th dwell-point anonymous region, px(lon) represents origin pxLongitudinal coordinate of (a), px(lat) indicates an origin point pxLatitude coordinate of (S)i(lon) denotes the stopping point SiLongitude coordinate of (1), Si(lat) denotes the dwell point SiM represents the starting point of the user moving track in the anonymous area of the stop point, n represents the end point of the user moving track in the anonymous area of the stop point, x represents the serial number of the original point in the moving track, and i represents the serial number of the stop point in the moving track;
s22, reconstructing a generalized dwell point sequence Tra _ S by connecting the extracted dwell points according to the sequence of the original points in the movement track data of the user: tra _ S ═ S1→S2→…→SnWherein S isnAn nth dwell point representing a user;
s23, the trusted third party extracts positions from the generalized dwell point sequence, and the positions reflect the personalized behaviors and preferences of the user, wherein one position can be expressed as:
Figure BDA0002870620710000031
Figure BDA0002870620710000032
wherein the content of the first and second substances,
Figure BDA0002870620710000033
denotes the j-th location-anonymous region, Lj(lon) represents the position LjLongitude coordinate of (1), Lj(lat) represents the position LjThe latitude coordinate of (a) is determined,
Figure BDA0002870620710000034
indicating anonymous location
Figure BDA0002870620710000035
Set of medium dwell points, j ═ 1,2, …, n;
s24, reconstructing a generalized position by connecting the extracted positions according to the sequence of the stop points in the stop point sequenceSequence Tra _ L: tra _ L ═ L1→L2→…→LnWherein L isnRepresenting the nth position of the user.
The method for acquiring the familiarity of the user with the semantic types and the popularity of each position in the semantic types by analyzing the position sequence of the user comprises the following steps:
s31, calculating the geographic similarity between the two positions of the user by using a Gaussian formula:
Figure BDA0002870620710000036
wherein, Simgeo(Li',Lj') Indicates the position Li'And position Lj'Geographical similarity between them, D (L)i',Lj') Indicates the position Li'And position Lj'The euclidean distance between i '═ 1,2, …, n, j' ═ 1,2, …, n;
s32, let His (u)k)={L1,L2,…,LnDenotes the sequence of positions of users, user ukVisited location Li'Then, user u is calculatedkAccessing location Lj'Probability of (c):
Figure BDA0002870620710000037
wherein, Pgeo(Lj'|Li',uk) Representing user ukVisited location Li'Followed by location Lj'A represents a weight value, 0. ltoreq. a.ltoreq.1, LkRepresenting user ukThe visited historical location of;
s33, constructing a position transition probability matrix according to the formula in the step S32
Figure BDA0002870620710000038
Wherein the content of the first and second substances,
Figure BDA0002870620710000039
Figure BDA00028706207100000310
indicating user slave position Li'Transferred to the position Lj'The probability of (d);
s34, according to the user slave position Li'Transferred to the position Lj'Probability of (2)
Figure BDA00028706207100000311
Calculating user familiarity with semantic types
Figure BDA00028706207100000312
And popularity of each location in semantic types
Figure BDA00028706207100000313
Figure BDA00028706207100000314
Figure BDA0002870620710000041
Wherein the content of the first and second substances,
Figure BDA0002870620710000042
indicating the popularity of the n-1 th round of calculation,
Figure BDA0002870620710000043
indicating the familiarity of the n-1 th round of calculation, C indicating the semantic type of the location, and u indicating the user.
The method for obtaining the anonymous track sequence of the user comprises the following steps:
let λ represent the user familiarity threshold, τ represent the location popularity threshold;
when the user's familiarity with the semantic type is less than λ and the popularity of each location in the semantic type is greater than or equal to τ, the user's location falls into the unfamiliar-but-popular class, and the trusted third party does not need to apply privacy protection to the user's location;
when the user's familiarity with the semantic type is less than λ and the popularity of each location in the semantic type is less than τ, the user's location falls into the unfamiliar-and-unpopular class, and the trusted third party needs to adopt the fake data method to protect the user's sensitive location anonymous space;
when the user's familiarity with the semantic type is greater than or equal to λ and the popularity of each location in the semantic type is greater than or equal to τ, the user's location falls into the familiar-and-popular class, and the trusted third party needs to adopt the space hiding method to protect the user's sensitive location anonymous space;
when the user's familiarity with the semantic type is greater than or equal to λ and the popularity of each location in the semantic type is less than τ, the user's location falls into the familiar-but-unpopular class, and the trusted third party needs to adopt the suppression technique to forbid publishing the user's location anonymous space to the location social network server, so as to protect the user's personal privacy;
according to the four privacy classifications, a trusted third party selects different position anonymization methods in a self-adaptive mode, and finally an anonymization track sequence of the user is generated.
The method for acquiring the track privacy degree of the user by calculating the information entropy of the anonymous track sequence of the user comprises the following steps:
the information entropy H_(t,t+1) of the user's anonymous track sequence within the (t, t+1) time interval is calculated as
H_(t,t+1) = -Σ_{i=0..k} p_i log2 p_i,
wherein p_i (i = 1, …, k) is the probability that the user visits candidate location i at time t+1, and p_0 is the probability that the user remains, at time t+1, at the location it occupied at time t;
when the user visits all candidate locations at time t+1 with the same probability, the information entropy of the user's anonymous track sequence within the (t, t+1) time interval attains its maximum MaxH_(t,t+1);
the ratio of the information entropy H_(t,t+1) to the maximum information entropy MaxH_(t,t+1) is taken as the user's track privacy degree:
track privacy degree = H_(t,t+1) / MaxH_(t,t+1).
Thus, the greater the value of the track privacy degree, the greater the strength of the track privacy protection.
A track anonymization system adopted by a preference perception track anonymization method comprises a position space generating module, a semantic description module, a behavior pattern extracting module, a privacy risk rating module and a track anonymization module; the privacy risk rating module is connected with the track anonymity module; the position space generation module is connected with the semantic description module, the semantic description module is connected with the behavior pattern extraction module, and the behavior pattern extraction module is connected with the track anonymization module; the track anonymization module converts the original track sequence into an anonymity track sequence, and the privacy risk rating module adjusts the original track sequence according to the anonymity track sequence.
The position space generation module is used for clustering original points in the historical data set into positions so as to construct a position space set of the user;
the semantic description module is used for converting the geographical position information of the user into semantic position information;
the behavior pattern extraction module is used for mining the moving behavior habits and the motion patterns of the user;
the privacy risk rating module is used for dividing different privacy risk ratings according to behavior preference and familiarity of the user;
and the track anonymization module is used for adaptively adopting a corresponding position anonymization method according to different privacy risk grades, so as to construct an anonymized track sequence.
Compared with the prior art, the invention has the following beneficial effects: according to the method, a sensitive position attack model is built, a position space anonymization method is utilized, original points are generalized into position areas, semantic description is carried out on the position areas, the familiarity of a user on semantic types and the popularity of positions in the semantic types are calculated, different privacy risk ratings are divided, and different position anonymization methods are adopted in a self-adaptive mode according to different privacy risk degrees corresponding to the positions in a user track; the method provides customizable privacy protection for the user, personally hides the sensitive information of the user by analyzing the interest and preference characteristics of the user, improves the usability of data, and simultaneously divides four privacy risk ratings according to the familiarity of the user and the popularity of the position, realizes the quantifiability of the privacy protection strength, and provides a beneficial solution idea for the anonymization of personalized tracks in the future position data release.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of the present invention of a sensitive location attack;
FIG. 3 is a schematic diagram of the trace generalization process of the present invention;
fig. 4 is a schematic diagram of the system architecture of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
Embodiment 1, as shown in fig. 1, a preference-aware trajectory anonymization method includes the following specific steps:
s1, obtaining semantic information and movement track data of the position accessed by the user;
assuming that the attacker can obtain some a priori knowledge of the victim, including: semantic information of the location visited by the victim and a sequence of movement trajectories arranged in time. From the semantic description of each location, the sequence of semantic trajectories of the victim can be represented as: c1→C2→...→Cn. Therefore, according to the semantic track sequence, an attacker can analyze the individual interest or preference of the victim by using a frequent subsequence mining algorithm. For example: if the attacker finds that the frequent subsequence in the semantic track sequence of victim a is "school → stadium → restaurant", when victim a has visited the "school → stadium" position sequence, the attacker can reason out with a high probability that the position that victim a will visit at the next moment is "restaurant".
Fig. 2 is a schematic diagram of a sensitive location attack. As can be seen from fig. 2, user a, user B, and user C each have a respective movement pattern. Wherein, the moving mode of the user A is the same as that of the user B, namely: school → gymnasium → restaurant. Suppose that for user a, "restaurant" is his sensitive location information and does not want others to know it. However, if the attacker knows the movement pattern of the user B and knows that the user a and the user B have similarities, when the user a visits "school → gymnasium", the attacker will reason out with a high probability that the user a will visit "restaurant", thereby revealing the personal privacy of the user a.
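As a concrete illustration of this attack, the following sketch infers the victim's most likely next semantic location from an observed prefix of the semantic trajectory. It is a toy stand-in for a real frequent-subsequence mining algorithm, and the history data are assumed for illustration only.

from collections import Counter

def predict_next(semantic_trajectory, observed_prefix):
    # Count which semantic location follows each occurrence of the observed
    # prefix in the victim's historical semantic trajectory.
    k = len(observed_prefix)
    followers = Counter(
        semantic_trajectory[i + k]
        for i in range(len(semantic_trajectory) - k)
        if semantic_trajectory[i:i + k] == observed_prefix
    )
    return followers.most_common(1)[0] if followers else None

# Toy history (assumed data): the pattern school -> stadium -> restaurant recurs.
history = ["school", "stadium", "restaurant", "home",
           "school", "stadium", "restaurant", "home",
           "school", "library", "home"]
print(predict_next(history, ["school", "stadium"]))   # -> ('restaurant', 2)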
S2, sequentially carrying out stay region generalization and position region generalization on original points contained in the movement track data of the user by using a position space anonymization method to obtain a position sequence of the user; and (3) generalizing the original points into a position area by using a position space anonymity method, and performing semantic description on the position area.
Definition 1: each location anonymous space is formed from a doublet <k_0, l_0>, wherein k_0 indicates the number of locations contained in the anonymous space and l_0 indicates the privacy protection strength.
In the location anonymous space, the trusted third party realizes privacy protection of different strengths by adjusting the size of the anonymous space. Moreover, the trusted third party can use the anonymous space to measure the data utility of its privacy protection, namely: the amount of information lost during the track anonymization phase.
Hiding the original point of the user through two generalization processes, comprising: a dwell region generalization and a location region generalization. As shown in fig. 3, which is a schematic diagram of track generalization processing, as can be seen from fig. 3, the trusted third party reconstructs the original point to the stop point, then reconstructs the stop point to the position, and finally connects the positions in time sequence to generate a generalized track sequence.
S21, regarding the stay-region generalization, the trusted third party extracts stay points from the user's movement track data and reflects the user's movement behavior through the stay points. The i-th stay point S_i is computed over the stay-point anonymous region, i.e. the set of original points p_m, …, p_n that it generalizes: its longitude coordinate S_i(lon) and latitude coordinate S_i(lat) are obtained from the longitude coordinates p_x(lon) and latitude coordinates p_x(lat) of those original points, wherein m denotes the starting point of the user's movement track within the stay-point anonymous region, n denotes the end point of the user's movement track within the stay-point anonymous region, x denotes the serial number of an original point in the movement track, and i denotes the serial number of a stay point in the movement track.
S22, a generalized stay-point sequence Tra_S is reconstructed by connecting the extracted stay points according to the order of the original points in the user's movement track data: Tra_S = S_1 → S_2 → … → S_n, wherein S_n denotes the n-th stay point of the user.
S23, for the location-region generalization, the trusted third party extracts locations from the generalized stay-point sequence and reflects the user's personalized behaviors and preferences through the locations. The j-th location L_j is computed over the j-th location anonymous region, i.e. the set of stay points contained in it; L_j(lon) denotes the longitude coordinate and L_j(lat) the latitude coordinate of the location L_j, j = 1, 2, …, n.
S24, a generalized location sequence Tra_L is reconstructed by connecting the extracted locations according to the order of the stay points in the stay-point sequence: Tra_L = L_1 → L_2 → … → L_n, wherein L_n denotes the n-th location of the user.
According to the two generalization processes, even if an attacker acquires the position sequence of the user, the attacker cannot deduce the original point information of the user. However, through the position sequence, the attacker can use the background knowledge information to mine the frequent movement patterns of the user, so as to deduce the daily activity law or behavior preference of the victim. Therefore, in the track privacy protection, the leakage problem of the frequent movement pattern of the user needs to be considered.
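A minimal sketch of the two generalization passes is shown below. The description only states that a stay point is derived from the original points p_m, …, p_n of its anonymous region and that a location is derived from a set of stay points; taking each generalized point as the centroid of its group, and grouping greedily by a distance radius, are assumptions made here for illustration.

from math import hypot

def _centroid(points):
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return (sum(lons) / len(lons), sum(lats) / len(lats))

def generalize(points, radius):
    # Greedy pass: group consecutive (lon, lat) points that stay within `radius`
    # of the running centroid, and replace each group by its centroid.
    groups, current = [], [points[0]]
    for p in points[1:]:
        c = _centroid(current)
        if hypot(p[0] - c[0], p[1] - c[1]) <= radius:
            current.append(p)
        else:
            groups.append(_centroid(current))
            current = [p]
    groups.append(_centroid(current))
    return groups

# Original points -> stay-point sequence Tra_S -> location sequence Tra_L.
raw = [(0.0, 0.0), (0.001, 0.001), (0.10, 0.10), (0.101, 0.099), (0.5, 0.5)]
tra_s = generalize(raw, radius=0.01)   # stay-region generalization
tra_l = generalize(tra_s, radius=0.2)  # location-region generalization
print(tra_s)   # three stay points
print(tra_l)   # two generalized locations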
S3, acquiring familiarity of the user with semantic types and popularity of each position in the semantic types by analyzing the position sequence of the user; the specific method comprises the following steps:
s31, the distance between the two locations in the geographic space can reflect the browsing behavior of the user. In general, two Li'And position Lj'The greater the distance between, the user has visited Li'Then followed by accessing Lj'The smaller the probability of (c). Thus, the correlation between the two positions decreases with increasing distance. Calculating the geographical similarity between two locations of the user using the gaussian formula:
Figure BDA0002870620710000081
wherein, Simgeo(Li',Lj') Indicates the position Li'And position Lj'Geographical similarity between them, D (L)i',Lj') Indicates the position Li'And position Lj'The euclidean distance between i '1, 2, …, and n, j' 1,2, …, n.
The movement patterns and preferences of the user can be mined by analyzing the historical location sequence of the user. Therefore, on the premise of acquiring the user historical position sequence, the user ukAfter the current position is visited, the position to be visited at the next time can be predicted.
S32, let His (u)k)={L1,L2,…,LnDenotes the sequence of positions of users, user ukVisited location Li'Then, user u is calculatedkAccessing location Lj'Probability of (c):
Figure BDA0002870620710000082
wherein, Pgeo(Lj'|Li',uk) Representing user ukVisited location Li'Followed by location Lj'A represents a weight value, 0. ltoreq. a.ltoreq.1, LkRepresenting user ukThe visited historical location.
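The sketch below gives one possible reading of steps S31 and S32: a Gaussian (radial-basis) similarity of the Euclidean distance, and a blend, via the weight a, of the similarity to the current location and the average similarity to the user's visited historical locations. The bandwidth sigma and the exact form of the blend are assumptions, since the original formulas are preserved only as images.

from math import exp, hypot

def sim_geo(l1, l2, sigma=1.0):
    # Gaussian similarity of two (lon, lat) locations; sigma is an assumed bandwidth.
    d = hypot(l1[0] - l2[0], l1[1] - l2[1])
    return exp(-d * d / (2 * sigma * sigma))

def p_geo(l_next, l_cur, history, a=0.5, sigma=1.0):
    # Assumed blend: weight a on the similarity to the current location,
    # weight (1 - a) on the average similarity to the visited historical locations.
    hist_term = sum(sim_geo(l_next, lk, sigma) for lk in history) / len(history)
    return a * sim_geo(l_next, l_cur, sigma) + (1 - a) * hist_term

history = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]           # His(u_k)
candidates = [(0.1, 0.1), (2.0, 2.0)]
scores = {c: p_geo(c, (0.0, 0.0), history) for c in candidates}
print(max(scores, key=scores.get))                        # the nearby candidate wins

Normalizing such scores over all candidate locations yields the row entries of the transition probability matrix constructed in step S33.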
In the semantic space, the constructed location anonymous spaces need to be described semantically in order to mine the user's interest or preference information, and each location anonymous space is labeled with semantic information. First, the weight w_i of each semantic type i within a stay-point anonymous region is calculated from: N, the total number of points of interest in that stay-point anonymous region; N_i, the number of points of interest of type i in that region; the number of stay-point anonymous regions; and the number of stay-point anonymous regions that contain a point of interest of type i.
Thus, the feature vector of each stay-point anonymous region can be represented as f_S = <w_1, w_2, …, w_n>. In a location anonymous region consisting of all its stay points, the weight W_i of each semantic type i within the location anonymous region is then calculated using the number of non-zero weights that the location anonymous region has for each semantic type; after normalizing these weights, the feature vector of each location anonymous region can be represented as f_L = <W_1, W_2, …, W_k>.
The semantic description method provided by the invention selects the W_i with the largest weight value in the feature vector f_L of a location anonymous region as its semantic information. Thus, corresponding to a location sequence in the location anonymity space, its semantic sequence can be expressed as: Tra_C = C_1 → C_2 → … → C_n.
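The quantities named above (N, N_i, the number of stay-point anonymous regions, and the number of regions containing a point of interest of type i) suggest a TF-IDF style weighting. The sketch below labels each region with its dominant semantic type under that reading; the log-based combination is an assumption, since the original weight formulas are preserved only as images.

from math import log
from collections import Counter

def semantic_labels(regions):
    # regions: one list of POI semantic types per anonymous region.
    # Returns the dominant semantic type of each region under TF-IDF-style weights.
    n_regions = len(regions)
    df = Counter(t for region in regions for t in set(region))  # regions containing type t
    labels = []
    for region in regions:
        tf = Counter(region)                                    # N_i per type; N = len(region)
        weights = {t: (tf[t] / len(region)) * log(n_regions / df[t]) for t in tf}
        labels.append(max(weights, key=weights.get))
    return labels

regions = [
    ["restaurant", "restaurant", "atm"],
    ["school", "library", "atm"],
    ["restaurant", "bar", "atm"],
]
print(semantic_labels(regions))   # e.g. ['restaurant', 'school', 'bar']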
S33, according to the formula in step S32, a location transition probability matrix M_geo is constructed whose (i', j') entry P_geo(L_j'|L_i') denotes the probability that the user transfers from location L_i' to location L_j'. Meanwhile, according to the semantic information calculated above, each location anonymous region carries a certain semantic description.
S34, the hub value represents the user's familiarity, so the user's familiarity with a semantic type can be calculated through the sum of the values of the authority nodes; according to the probability P_geo(L_j'|L_i') that the user transfers from location L_i' to location L_j', the user's familiarity Fam^(n)(u, C) with the semantic type is calculated. Symmetrically, the authority value represents the popularity of a location, so the popularity of a location within a semantic type can be calculated through the sum of the values of the hub nodes; according to the probability P_geo(L_j'|L_i'), the popularity Pop^(n)(L, C) of each location within the semantic type is calculated. Here Pop^(n-1) denotes the popularity calculated in the (n-1)-th round, Fam^(n-1) denotes the familiarity calculated in the (n-1)-th round, C denotes the semantic type of the location, and u denotes the user.
By iteration, Fam^(n)(u, C) and Pop^(n)(L, C) are recomputed round by round, where n denotes the number of iterations. After initialization, the iterative process terminates once the convergence condition is satisfied.
Through the process, the preference model of the user to the location anonymous space is constructed.
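The hub/authority description above corresponds to a HITS-style mutual reinforcement. The sketch below is one such formulation, assumed for illustration: the popularity of a location accumulates the familiarity of the users who visit it, the familiarity of a user accumulates the popularity of the locations of that semantic type the user visits, both are re-normalized each round, and the iteration stops once the change between rounds falls below a tolerance. The visit weights, the normalization, and the tolerance eps are assumptions.

def familiarity_popularity(visits, users, locations, n_iter=50, eps=1e-6):
    # visits: dict (user, location) -> visit weight, restricted to one semantic type C.
    # Returns familiarity Fam(u, C) per user and popularity Pop(L, C) per location.
    fam = {u: 1.0 for u in users}
    pop = {l: 1.0 for l in locations}
    for _ in range(n_iter):
        new_pop = {l: sum(w * fam[u] for (u, lo), w in visits.items() if lo == l)
                   for l in locations}
        new_fam = {u: sum(w * new_pop[lo] for (us, lo), w in visits.items() if us == u)
                   for u in users}
        # normalize each round so the mutually reinforcing scores stay bounded
        zp = sum(new_pop.values()) or 1.0
        zf = sum(new_fam.values()) or 1.0
        new_pop = {l: v / zp for l, v in new_pop.items()}
        new_fam = {u: v / zf for u, v in new_fam.items()}
        delta = max(abs(new_fam[u] - fam[u]) for u in users)
        fam, pop = new_fam, new_pop
        if delta < eps:        # convergence condition
            break
    return fam, pop

visits = {("u1", "L1"): 3, ("u1", "L2"): 1, ("u2", "L2"): 2}
fam, pop = familiarity_popularity(visits, ["u1", "u2"], ["L1", "L2"])
print(fam, pop)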
S4, setting a user familiarity threshold value and a position popularity threshold value, obtaining privacy classification of the position of the user according to the familiarity of the user to semantic types and the relationship between the popularity of each position in the semantic types and the user familiarity threshold value and the position popularity threshold value, and obtaining an anonymous track sequence of the user by adopting different position anonymity methods according to the privacy classification of the position of the user;
the method for obtaining the anonymous track sequence of the user comprises the following steps:
let λ represent the user familiarity threshold, τ represent the location popularity threshold;
(1) Not familiar but popular (NFP)
When the familiarity of the user to the semantic type is less than lambda and the popularity of each position in the semantic type is greater than or equal to tau, the privacy classification of the position of the user is classified into an unfamiliar and popular class, the class refers to the fact that the user is not an expert of the semantic type to which the position anonymous space belongs, and if an attacker acquires the position anonymous space, preference information of the user cannot be deduced. Moreover, because the popularity of the location is high, the location anonymous area is referred to by a plurality of users, and therefore, the trusted third party does not need to protect the privacy of the location of the user.
(2) Not familiar and not popular (NFNP)
When the familiarity of the user to the semantic type is less than lambda and the popularity of each position in the semantic type is less than tau, the privacy classification of the position of the user belongs to a non-familiar and non-popular class, the class refers to that the user is not an expert of the semantic type to which the position anonymous space belongs, but the position popularity is low, and an attacker can deduce the user identity information of accessing the position anonymous space through background knowledge information, so that a trusted third party needs to adopt a false data method to protect the sensitive position anonymous space of the user.
(3) Familiar and Popular (FP)
When the familiarity of a user with a semantic type is greater than or equal to lambda and the popularity of each position in the semantic type is greater than or equal to tau, the privacy classification of the position of the user belongs to a familiar and popular class, the class refers to that the user is an expert of the semantic type to which the position anonymous space belongs, and an attacker can deduce the preference information of the user according to the position anonymous space accessed by the user, so that a trusted third party needs to adopt a space hiding method to protect the sensitive position anonymous space of the user;
(4) Familiar but not popular (FNP)
When the familiarity of a user with a semantic type is greater than or equal to lambda and the popularity of each location in the semantic type is less than tau, the privacy classification of the location of the user belongs to a familiar and non-popular class, in the class, since the location anonymous space has high user familiarity to the user and low location popularity in the semantic type thereof, an attacker can not only deduce the preference information of the user from the location anonymous space accessed by the user, but also can identify the identity information of the user, and therefore, a trusted third party needs to adopt a suppression technology to prohibit the location anonymous space of the user from being published to a location social network server so as to protect the personal privacy of the user.
According to the four privacy classifications, a trusted third party selects different position anonymization methods in a self-adaptive mode, and finally an anonymization track sequence of the user is generated.
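Given the thresholds λ and τ, the four-way classification and the adaptive choice of anonymization technique can be written directly, as in the sketch below. The dummy-data, cloaking, and suppression routines are represented only by action labels, since the description names them at the level of techniques; the example familiarity and popularity values are assumed.

def classify(fam, pop, lam, tau):
    # Map (familiarity, popularity) to one of the four privacy classes.
    if fam < lam:
        return "NFP" if pop >= tau else "NFNP"
    return "FP" if pop >= tau else "FNP"

ACTIONS = {
    "NFP":  "publish as-is (no protection needed)",
    "NFNP": "protect with fake (dummy) data",
    "FP":   "protect with spatial cloaking",
    "FNP":  "suppress (do not publish)",
}

def plan_anonymization(track, fam_of_type, pop_of_location, lam, tau):
    # track: list of (location, semantic_type) pairs in the generalized sequence.
    plan = []
    for loc, ctype in track:
        cls = classify(fam_of_type[ctype], pop_of_location[loc], lam, tau)
        plan.append((loc, cls, ACTIONS[cls]))
    return plan

track = [("cafe_A", "restaurant"), ("clinic_B", "hospital")]
fam = {"restaurant": 0.8, "hospital": 0.9}     # assumed familiarity per semantic type
pop = {"cafe_A": 0.7, "clinic_B": 0.1}         # assumed popularity per location
print(plan_anonymization(track, fam, pop, lam=0.5, tau=0.4))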
And S5, acquiring the track privacy degree of the user by calculating the information entropy of the anonymous track sequence of the user.
With reference to the definition of information entropy, for a probability distribution p_1, p_2, …, p_n the information entropy can be calculated as:
H = -Σ_i p_i log2 p_i (13)
Assume that the location visited by the user at time t+1 is a sensitive location, and that the proposed privacy protection algorithm selects k-1 candidate locations for this sensitive location at time t+1. We define the probabilities that the user visits one of these k locations at time t+1 as p_1, p_2, …, p_k, and the probability that the user remains at the time-t location as p_0. Thus, the information entropy H_(t,t+1) of the user's anonymous track sequence within the (t, t+1) time interval is calculated as:
H_(t,t+1) = -Σ_{i=0..k} p_i log2 p_i
According to the properties of entropy, when the user visits all candidate locations at time t+1 with the same probability, the information entropy of the user's anonymous track sequence within the (t, t+1) time interval attains its maximum MaxH_(t,t+1).
The ratio of the information entropy H_(t,t+1) to the maximum information entropy MaxH_(t,t+1) is taken as the user's track privacy degree:
track privacy degree = H_(t,t+1) / MaxH_(t,t+1)
Thus, the greater the value of the track privacy degree, the greater the strength of the track privacy protection.
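The track privacy degree of step S5 is a normalized Shannon entropy. The sketch below computes H_(t,t+1) over the probabilities of staying (p_0) and of visiting each candidate location, takes the maximum entropy as log2 of the number of outcomes (the uniform case), and returns their ratio; the example probability values are assumed.

from math import log2

def privacy_degree(probs):
    # probs: [p_0, p_1, ..., p_k], the probability of staying plus the probabilities
    # of the candidate locations at time t+1 (must sum to 1).
    h = -sum(p * log2(p) for p in probs if p > 0)   # H_(t,t+1)
    max_h = log2(len(probs))                        # entropy of the uniform distribution
    return h, max_h, h / max_h

# Staying with probability 0.4; four candidate locations share the remaining 0.6.
print(privacy_degree([0.4, 0.15, 0.15, 0.15, 0.15]))

A ratio close to 1 means the attacker cannot single out the true next location, matching the statement that a larger track privacy degree indicates stronger protection.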
The preference perception track anonymity method can self-adaptively customize the personalized privacy protection method for the user according to the preference of the user to the position, so that the personal privacy of the user is prevented from being leaked, and the usability of track data is improved.
Embodiment 2, as shown in fig. 4, a trajectory anonymization system adopted by a preference-aware trajectory anonymization method includes a location space generation module, a semantic description module, a behavior pattern extraction module, a privacy risk rating module, and a trajectory anonymization module; the privacy risk rating module is connected with the track anonymity module; the position space generation module is connected with the semantic description module, the semantic description module is connected with the behavior pattern extraction module, and the behavior pattern extraction module is connected with the track anonymization module.
The position space generation module is used for clustering original points in the historical data set into positions so as to construct a position space set of the user.
And the semantic description module is used for converting the geographical position information of the user into semantic position information.
And the behavior pattern extraction module is used for mining the mobile behavior habits and the motion patterns of the user.
And the privacy risk rating module is used for dividing different privacy risk ratings according to the behavior preference and familiarity of the user.
And the track anonymization module is used for adaptively adopting a corresponding position anonymization method according to different privacy risk grades so as to construct an anonymized track sequence.
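The five modules of embodiment 2 map naturally onto a class-per-module design. The skeleton below wires them in the order described (location space generation, semantic description, behavior pattern extraction, privacy risk rating, track anonymization); all method names, signatures, and the toy clustering and labeling logic are assumptions used only to make the data flow between modules concrete.

class LocationSpaceGenerator:
    # Clusters original points of the historical data set into locations.
    def build(self, raw_points, precision=1):
        # toy clustering: round coordinates so that nearby points collapse together
        return sorted({(round(lon, precision), round(lat, precision))
                       for lon, lat in raw_points})

class SemanticDescriptor:
    # Converts geographic location information into semantic location information.
    def __init__(self, poi_map):
        self.poi_map = poi_map                      # location -> semantic type
    def describe(self, locations):
        return [(loc, self.poi_map.get(loc, "unknown")) for loc in locations]

class BehaviorPatternExtractor:
    # Mines the user's movement habits; here simply visit counts per semantic type.
    def extract(self, semantic_locations):
        counts = {}
        for _, ctype in semantic_locations:
            counts[ctype] = counts.get(ctype, 0) + 1
        return counts

class PrivacyRiskRater:
    # Assigns one of the four privacy risk classes from familiarity and popularity.
    def rate(self, fam, pop, lam=0.5, tau=0.5):
        if fam < lam:
            return "NFP" if pop >= tau else "NFNP"
        return "FP" if pop >= tau else "FNP"

class TrackAnonymizer:
    # Applies the anonymization action that corresponds to the risk class.
    ACTION = {"NFP": "publish", "NFNP": "dummy", "FP": "cloak", "FNP": "suppress"}
    def anonymize(self, rated_track):
        return [(loc, self.ACTION[cls]) for loc, cls in rated_track]

# Minimal end-to-end wiring of the first two modules (assumed data):
gen = LocationSpaceGenerator()
sem = SemanticDescriptor({(0.0, 0.0): "restaurant"})
locations = gen.build([(0.01, 0.02), (0.02, 0.01), (1.44, 1.39)])
print(sem.describe(locations))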
In the preference-aware track anonymization system provided in embodiment 2, the division into the above functional modules when performing track privacy protection is only used for illustration; in practical applications, the functions may be distributed to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the track anonymization system provided in embodiment 2 and the track anonymization method belong to the same concept, and the specific implementation process is described in embodiment 1.
In summary, in the embodiment of the present invention, a sensitive location attack model is constructed, a location space anonymization method is used to generalize an origin point into a location area, semantic description is performed on the location area, the familiarity of a user with a semantic type and the popularity of a location in the semantic type are calculated, different privacy risk ratings are divided, and different location anonymization methods are adaptively adopted according to different privacy risk degrees corresponding to locations in a user trajectory. The scheme provided by the embodiment of the invention provides customizable privacy protection for the user, and through analyzing the interest and preference characteristics of the user, the sensitive information of the user is hidden in a personalized manner, so that the usability of data is improved, and meanwhile, four privacy risk grades are divided according to the familiarity and the position popularity of the user, so that the quantification of the privacy protection strength is realized, and a beneficial solution thought is provided for the anonymization of a personalized track in the future position data release.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. A preference-aware track anonymization method is characterized by comprising the following steps:
s1, obtaining semantic information and movement track data of the position accessed by the user;
s2, sequentially carrying out stay region generalization and position region generalization on original points contained in the movement track data of the user by using a position space anonymization method to obtain a position sequence of the user;
s3, acquiring familiarity of the user with semantic types and popularity of each position in the semantic types by analyzing the position sequence of the user;
s4, setting a user familiarity threshold value and a position popularity threshold value, obtaining privacy classification of the position of the user according to the familiarity of the user to semantic types and the relationship between the popularity of each position in the semantic types and the user familiarity threshold value and the position popularity threshold value, and obtaining an anonymous track sequence of the user by adopting different position anonymity methods according to the privacy classification of the position of the user;
and S5, acquiring the track privacy degree of the user by calculating the information entropy of the anonymous track sequence of the user.
2. The preference-aware track anonymization method according to claim 1, wherein the method for successively performing dwell region generalization and location region generalization on the original point included in the movement track data of the user by using the location space anonymization method comprises:
S21, the trusted third party extracts stay points from the user's movement track data; the stay points reflect the user's movement behavior. The i-th stay point S_i is computed over the i-th stay-point anonymous region, i.e. the set of original points p_m, …, p_n that it generalizes: its longitude coordinate S_i(lon) and latitude coordinate S_i(lat) are obtained from the longitude coordinates p_x(lon) and latitude coordinates p_x(lat) of those original points, wherein m denotes the starting point of the user's movement track within the stay-point anonymous region, n denotes the end point of the user's movement track within the stay-point anonymous region, x denotes the serial number of an original point in the movement track, and i denotes the serial number of a stay point in the movement track;
S22, a generalized stay-point sequence Tra_S is reconstructed by connecting the extracted stay points according to the order of the original points in the user's movement track data: Tra_S = S_1 → S_2 → … → S_n, wherein S_n denotes the n-th stay point of the user;
S23, the trusted third party extracts locations from the generalized stay-point sequence; the locations reflect the user's personalized behaviors and preferences. The j-th location L_j is computed over the j-th location anonymous region, i.e. the set of stay points contained in it; L_j(lon) denotes the longitude coordinate and L_j(lat) the latitude coordinate of the location L_j, j = 1, 2, …, n;
S24, a generalized location sequence Tra_L is reconstructed by connecting the extracted locations according to the order of the stay points in the stay-point sequence: Tra_L = L_1 → L_2 → … → L_n, wherein L_n denotes the n-th location of the user.
3. The preference-aware track anonymization method of claim 2, wherein the method for obtaining the familiarity of the user with the semantic type and the popularity of each location in the semantic type by analyzing the location sequence of the user is as follows:
S31, the geographic similarity between two locations of the user is calculated with a Gaussian function of their distance: the similarity Sim_geo(L_i', L_j') between location L_i' and location L_j' decreases as the Euclidean distance D(L_i', L_j') between them increases, i' = 1, 2, …, n, j' = 1, 2, …, n;
S32, let His(u_k) = {L_1, L_2, …, L_n} denote the location sequence of user u_k; after user u_k has visited location L_i', the probability P_geo(L_j'|L_i', u_k) that user u_k next visits location L_j' is calculated by combining, with a weight a (0 ≤ a ≤ 1), the geographic similarity between L_i' and L_j' and the contribution of the historical locations L_k that user u_k has visited;
S33, according to the formula in step S32, a location transition probability matrix is constructed whose (i', j') entry P_geo(L_j'|L_i') denotes the probability that the user transfers from location L_i' to location L_j';
S34, according to the probability P_geo(L_j'|L_i') that the user transfers from location L_i' to location L_j', the user's familiarity Fam^(n)(u, C) with the semantic type and the popularity Pop^(n)(L, C) of each location within the semantic type are calculated iteratively, wherein Pop^(n-1) denotes the popularity calculated in the (n-1)-th round, Fam^(n-1) denotes the familiarity calculated in the (n-1)-th round, C denotes the semantic type of the location, and u denotes the user.
4. The preference-aware track anonymization method of claim 1, wherein the anonymous track sequence of the user is obtained by:
let λ represent the user familiarity threshold, τ represent the location popularity threshold;
when the familiarity of the user to the semantic type is less than lambda and the popularity of each position in the semantic type is greater than or equal to tau, the privacy classification of the position of the user is classified into an unfamiliar and popular class, and a trusted third party does not need to carry out privacy protection on the position of the user;
when the familiarity of a user to the semantic type is less than lambda and the popularity of each position in the semantic type is less than tau, the privacy classification of the position of the user belongs to the unfamiliar and non-popular classes, and a trusted third party needs to adopt a fake data method to protect the sensitive position anonymous space of the user;
when the familiarity of a user to a semantic type is greater than or equal to lambda and the popularity of each position in the semantic type is greater than or equal to tau, the privacy classification of the position of the user belongs to the familiar and popular classes, and a trusted third party needs to adopt a space hiding method to protect the sensitive position anonymous space of the user;
when the familiarity of the user with the semantic type is greater than or equal to lambda and the popularity of each position in the semantic type is less than tau, the privacy classification of the position of the user belongs to the familiar and unpopular classes, and a trusted third party needs to adopt suppression technology to forbid the anonymous space of the position of the user from being published to a position social network server so as to protect the personal privacy of the user;
according to the four privacy classifications, a trusted third party selects different position anonymization methods in a self-adaptive mode, and finally an anonymization track sequence of the user is generated.
5. The preference-aware track anonymization method according to claim 1, wherein the method for obtaining the track privacy degree of the user by calculating the information entropy of the anonymous track sequence of the user comprises:
the information entropy H_(t,t+1) of the user's anonymous track sequence within the (t, t+1) time interval is calculated as H_(t,t+1) = -Σ_{i=0..k} p_i log2 p_i, wherein p_i (i = 1, …, k) is the probability that the user visits candidate location i at time t+1, and p_0 is the probability that the user remains, at time t+1, at the location it occupied at time t;
when the user visits all candidate locations at time t+1 with the same probability, the information entropy of the user's anonymous track sequence within the (t, t+1) time interval attains its maximum MaxH_(t,t+1);
the ratio of the information entropy H_(t,t+1) to the maximum information entropy MaxH_(t,t+1) is taken as the user's track privacy degree: track privacy degree = H_(t,t+1) / MaxH_(t,t+1);
thus, the greater the value of the track privacy degree, the greater the strength of the track privacy protection.
6. The track anonymization system used in the preference-aware track anonymization method according to any one of claims 1 to 5, comprising a location space generating module, a semantic description module, a behavior pattern extracting module, a privacy risk rating module and a track anonymization module; the privacy risk rating module is connected with the track anonymity module; the position space generation module is connected with the semantic description module, the semantic description module is connected with the behavior pattern extraction module, and the behavior pattern extraction module is connected with the track anonymization module; the track anonymization module converts the original track sequence into an anonymity track sequence, and the privacy risk rating module adjusts the original track sequence according to the anonymity track sequence.
7. The preference-aware track anonymity system of claim 6, wherein the location space generating module is configured to cluster raw points in the historical dataset into locations, thereby constructing a set of location spaces for the user;
the semantic description module is used for converting the geographical position information of the user into semantic position information;
the behavior pattern extraction module is used for mining the moving behavior habits and the motion patterns of the user;
the privacy risk rating module is used for dividing different privacy risk ratings according to behavior preference and familiarity of the user;
and the track anonymization module is used for adaptively adopting a corresponding position anonymization method according to different privacy risk grades so as to construct an anonymized track sequence.
CN202011599257.5A 2020-12-30 2020-12-30 Preference perception track anonymization method and system Pending CN112632614A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011599257.5A CN112632614A (en) 2020-12-30 2020-12-30 Preference perception track anonymization method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011599257.5A CN112632614A (en) 2020-12-30 2020-12-30 Preference perception track anonymization method and system

Publications (1)

Publication Number Publication Date
CN112632614A true CN112632614A (en) 2021-04-09

Family

ID=75287614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011599257.5A Pending CN112632614A (en) 2020-12-30 2020-12-30 Preference perception track anonymization method and system

Country Status (1)

Country Link
CN (1) CN112632614A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112035880A (en) * 2020-09-10 2020-12-04 辽宁工业大学 Track privacy protection service recommendation method based on preference perception
CN113946867A (en) * 2021-10-21 2022-01-18 福建工程学院 Position privacy protection method based on space influence
CN114760146A (en) * 2022-05-05 2022-07-15 郑州轻工业大学 Customizable location privacy protection method and system based on user portrait

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016145595A1 (en) * 2015-03-16 2016-09-22 Nokia Technologies Oy Method and apparatus for discovering social ties based on cloaked trajectories
CN112035880A (en) * 2020-09-10 2020-12-04 辽宁工业大学 Track privacy protection service recommendation method based on preference perception
CN112118531A (en) * 2020-09-12 2020-12-22 上海大学 Privacy protection method of crowd sensing application based on position

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016145595A1 (en) * 2015-03-16 2016-09-22 Nokia Technologies Oy Method and apparatus for discovering social ties based on cloaked trajectories
CN112035880A (en) * 2020-09-10 2020-12-04 辽宁工业大学 Track privacy protection service recommendation method based on preference perception
CN112118531A (en) * 2020-09-12 2020-12-22 上海大学 Privacy protection method of crowd sensing application based on position

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIANG ZHU ET AL.: "PTPP: Preference-Aware Trajectory Privacy-Preserving over Location-Based Social Networks", 《HTTPS://WWW.ACADEMIA.EDU》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112035880A (en) * 2020-09-10 2020-12-04 辽宁工业大学 Track privacy protection service recommendation method based on preference perception
CN112035880B (en) * 2020-09-10 2024-02-09 辽宁工业大学 Track privacy protection service recommendation method based on preference perception
CN113946867A (en) * 2021-10-21 2022-01-18 福建工程学院 Position privacy protection method based on space influence
CN113946867B (en) * 2021-10-21 2024-05-31 福建工程学院 Position privacy protection method based on space influence
CN114760146A (en) * 2022-05-05 2022-07-15 郑州轻工业大学 Customizable location privacy protection method and system based on user portrait
CN114760146B (en) * 2022-05-05 2024-03-29 郑州轻工业大学 Customizable position privacy protection method and system based on user portrait

Similar Documents

Publication Publication Date Title
Jiang et al. A utility-aware general framework with quantifiable privacy preservation for destination prediction in LBSs
Primault et al. The long road to computational location privacy: A survey
Shaham et al. Privacy preservation in location-based services: A novel metric and attack model
Albouq et al. A double obfuscation approach for protecting the privacy of IoT location based applications
Sun et al. Location privacy preservation for mobile users in location-based services
CN112632614A (en) Preference perception track anonymization method and system
Primault et al. Time distortion anonymization for the publication of mobility data with high utility
Jin et al. A survey and experimental study on privacy-preserving trajectory data publishing
Beg et al. A privacy-preserving protocol for continuous and dynamic data collection in IoT enabled mobile app recommendation system (MARS)
Yin et al. GANs Based Density Distribution Privacy‐Preservation on Mobility Data
Wu et al. A novel dummy-based mechanism to protect privacy on trajectories
CN113254999A (en) User community mining method and system based on differential privacy
CN110602631A (en) Processing method and processing device for location data for resisting conjecture attack in LBS
CN105578412A (en) Position anonymization method based on position service and system
Khazbak et al. Deanonymizing mobility traces with co-location information
CN110502919B (en) Track data de-anonymization method based on deep learning
Zhang et al. RPAR: location privacy preserving via repartitioning anonymous region in mobile social network
Li et al. Quantifying location privacy risks under heterogeneous correlations
Wang et al. Privacy preserving for continuous query in location based services
Zhao et al. A Privacy‐Preserving Trajectory Publication Method Based on Secure Start‐Points and End‐Points
Qiu et al. Behavioral-semantic privacy protection for continual social mobility in mobile-internet services
Xing et al. An optimized algorithm for protecting privacy based on coordinates mean value for cognitive radio networks
Gupta et al. Mobility-Aware prefetching and replacement scheme for location-based services: MOPAR
Zhao et al. EPLA: efficient personal location anonymity
Domingues et al. Social Mix-zones: Anonymizing Personal Information on Contact Tracing Data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210409