CN111797433B - LBS service privacy protection method based on differential privacy - Google Patents


Info

Publication number
CN111797433B
CN111797433B (application CN202010690224.5A)
Authority
CN
China
Prior art keywords
user
privacy
query
cluster
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010690224.5A
Other languages
Chinese (zh)
Other versions
CN111797433A (en)
Inventor
史伟 (Shi Wei)
张青云 (Zhang Qingyun)
张兴 (Zhang Xing)
Current Assignee
Liaoning University of Technology
Original Assignee
Liaoning University of Technology
Priority date
Filing date
Publication date
Application filed by Liaoning University of Technology filed Critical Liaoning University of Technology
Priority to CN202010690224.5A
Publication of CN111797433A
Application granted
Publication of CN111797433B
Legal status: Active
Anticipated expiration

Classifications

    • G06F 21/6254: Protecting personal data by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G06F 21/6227: Protecting access to data where protection concerns the structure of data, e.g. records, types, queries
    • G06F 16/29: Geographical information databases
    • G06F 16/906: Clustering; Classification
    • Y02D 30/70: Reducing energy consumption in wireless communication networks


Abstract

The application discloses an LBS service privacy protection method based on differential privacy. Using the cluster containing the user, obtained from the DP-(k₁,l)-means algorithm, together with background knowledge of the user's area, a query k-anonymity set satisfying differential privacy is constructed and sent to the LSP in place of the user's real query request, protecting the user's query privacy and defending against an attacker's inference attacks and spatio-temporal correlation attacks. In the DP-k₂-anonymity algorithm, the temporal dimension is incorporated by selecting query requests issued within the same time period t to build the query k-anonymity set, and the set is processed with an exponential mechanism, ensuring the query requests are temporally plausible while keeping the user's privacy from being revealed.

Description

LBS service privacy protection method based on differential privacy
Technical Field
The application relates to a privacy protection method, in particular to an LBS service privacy protection method based on differential privacy.
Background
With the popularization of mobile networks and advances in positioning technology, Location-Based Services (LBS) have been adopted by a large number of users. These service systems greatly facilitate daily life, but they often collect the request data users send without the users' awareness, and the analysis and processing of such data can leak users' private information. An LBS request contains a great deal of private information, such as the user's identity, location, and points of interest, which may be revealed when a query request is issued. How to protect users' private information while leaving their query results unaffected has therefore become a key research direction.
In existing LBS privacy protection schemes, background knowledge refers to the historical query probabilities of points of interest (Points Of Interest, POIs) on a map. If an attacker possesses some background knowledge, the accuracy of an attack can increase greatly, so avoiding background knowledge attacks is extremely important when designing an LBS privacy protection scheme. Differential privacy is a privacy protection mechanism that is unaffected by an attacker's background knowledge and insensitive to changes in any specific record; combining it with existing privacy protection schemes can better address the traditional methods' inability to resist background knowledge attacks.
The location service request a user sends to the LBS service platform (Location-Based Services Platform, LSP) includes sensitive data such as the user's identity, current location, the time the query request was sent, and the queried point of interest. The location information, and anything deduced from it, belongs to location privacy; information related to the requested content belongs to query privacy. Both fall within the scope of LBS privacy protection.
To better protect the private information in LBS requests, Gruteser et al. first applied k-anonymity to location privacy protection, yielding the location k-anonymity model: a moving object's position at a given moment satisfies location k-anonymity when it cannot be distinguished from the positions of k-1 other users. Niu et al., building on a randomization method and considering the road network environment, select candidate regions through circular-region segmentation and grid expansion, then generate pseudo-locations according to location semantics. These algorithms keep improving, but they share two shortcomings: they do not consider the background knowledge an attacker may possess, and they ignore the location semantics of the position information sent to the LSP. As a result, the generated location anonymity sets contain many false position points that can be eliminated outright, the schemes cannot resist an attacker's inference attacks, and they increase communication overhead and degrade service quality.
For query privacy, the more authentic and diverse the generated false queries are, the higher the protection level of the user's query privacy. The Dummy-Q model proposed by Pingley et al. constructs dummy queries from conditions such as the query context, the user's motion model, and query semantics, so that an attacker cannot distinguish a user's real query from the dummy ones. These query privacy protection algorithms, however, do not incorporate the user's real query request: if the real request is not among the POIs with higher query probability, the returned results are useless to the user.
To address these problems, the application proposes the DPLQ privacy protection scheme, which combines a differential privacy mechanism with the k-means, l-diversity, and k-anonymity algorithms and effectively protects a user's location privacy and query privacy regardless of the background knowledge an attacker possesses.
Disclosure of Invention
The application designs and develops an LBS service privacy protection method based on differential privacy, which effectively protects the position privacy and query privacy of a user under the condition of not being influenced by background knowledge owned by an attacker.
An LBS service privacy protection method based on differential privacy comprises the following steps:
step one, constructing a Voronoi diagram according to map information, so that each Voronoi polygon only contains one POI;
step two, calculating the number of users contained in each Voronoi polygon by combining the position data set X, and arranging the Voronoi polygons in a descending order according to the number of the users;
step three, selecting the k₁ Voronoi polygons containing the most users and taking their centroids O_j as the initial cluster centers of the k-means algorithm;
step four, calculating the Euclidean distance d_ij between each user position in the position data set X and each initial cluster center O_j;
step five, assigning each user to the cluster whose center is at the minimum Euclidean distance;
step six, after all users have been assigned, recomputing the centroids of the k₁ clusters;
step seven, if the distance between the old and new centroids is smaller than a set threshold δ, keeping the original cluster centers and ending the loop; if it is larger than δ, using the updated centroids as cluster centers and jumping back to step four;
step eight, returning the cluster center O_j of the cluster containing the querying user together with the l-1 centroids nearest to it, forming a position data set of l positions; adding Laplace noise to these l cluster centers, constructing a position anonymity set of l false positions from the noisy points, and sending it to the LSP in place of the user's real position;
step nine, determining the number of users in cluster C_j that issued requests during time period t;
step ten, calculating the position similarity S_l between each position point in cluster C_j and the user's position, and arranging the values in ascending order;
step eleven, ordering the corresponding query requests according to S_l to obtain the sequence (q_1, q_2, …, q_n);
step twelve, putting the user's real query request q_x into the query k₂-anonymity set QA;
step thirteen, judging whether q_i is already present in QA; if not, adding q_i to QA; if so, comparing the next query request q_{i+1} in the order obtained in step eleven, until QA contains k₂ elements;
step fourteen, if all q_i of cluster C_j have been added to QA and |QA| is still smaller than k₂, sorting the historical query request probabilities Pr(q_i) for the same time period t in descending order to obtain (q_1, q_2, …, q_m);
step fifteen, jumping back to step thirteen and continuing to judge whether q_i is present in QA, stopping the loop when |QA| = k₂;
step sixteen, applying an exponential mechanism satisfying ε-differential privacy to QA, strictly controlling the output probability of each candidate in QA.
Taking δ = 5 meters; i indexes the query requests issued by different users and is an integer; j indexes the clusters and cluster centers and is an integer; n is the number of query requests issued in the user's cluster C_j during time period t and is an integer; m is the number of query requests issued historically during time period t and is an integer.
As a further preference, neighboring users in the same cluster may each use the anonymized set instead of their real location information.
The beneficial effects of the application are as follows:
aiming at protecting the position privacy of a user, the application provides a DP- (k) 1 L) -means algorithm. The algorithm combines a differential privacy mechanism with a k-means algorithm, performs Voronoi graph pre-division on a road network, constructs k cluster clusters according to the divided Voronoi polygons and user position points contained in each polygon, selects l cluster centers by using the thought of l-diversity to perform noise adding processing, and sends the position points with noise to an LBS server instead of the actual position of the user, thereby avoiding the problem that a malicious attacker intercepts the privacy information of the user and the incomplete credibility of the LBS server in the process of sending the LBS service by the user, and achieving the purpose of protecting the position privacy of the user.
For query privacy protection, the application proposes the DP-k₂-anonymity algorithm. A query k-anonymity set is constructed from the query information of neighboring users in the cluster during the same time period and the historical query probabilities of the POIs in the region, and noise is added to the set with an exponential mechanism, thereby protecting the user's query privacy.
Unlike existing privacy protection schemes that protect only a user's location privacy, the application adopts a personalized scheme that protects a user's LBS privacy information while accounting for background knowledge attacks, letting users control their own degree of privacy protection and meeting different users' different privacy requirements.
Drawings
Fig. 1 is a schematic diagram of a centralized architecture according to the present application.
FIG. 2 compares algorithms on the degree to which the false locations deviate from the user's real location.
FIG. 3 compares algorithms on the average road-network distance from the false locations to the real location.
FIG. 4 compares algorithms on the relevance between the false queries and the real query.
FIG. 5 compares the algorithms' average running time with k₂ fixed.
FIG. 6 compares the algorithms' average running time with l fixed.
Detailed Description
The present application is described in further detail below with reference to the drawings to enable those skilled in the art to practice the application by referring to the description.
A POI is the semantic attribute that distinguishes a location from its bare geographic coordinates. For example, when user A, inside a shopping mall, sends the LSP a request for the nearest movie theater, the mall is the POI of the user's location, the mall's geographic position is the user's real position, the movie theater is the POI of the user's query request, and the theater's position is the geographic position of the query. In short, POIs carry the semantic information of landmarks such as infrastructure at geographic locations and can serve as the keywords of a user's query request.
Background knowledge comprises information such as the POI at a given position point, the service requests issued during a given time period, and their query probabilities; any user terminal with computing and storage capability can obtain it. The background knowledge in this application is obtained through the Overpass API of OpenStreetMap.
An LBS query requires the user to obtain service based on the current position. The user first sends the query request Q = {ID, Loc, POI, t, QPOI, Qloc} to a trusted third-party server (Trusted Third Party, TTP), which applies privacy protection to the location and query information. In the user's request Q, the ID is an identifier that uniquely determines the individual; Loc, POI, QPOI, and Qloc can be regarded as the user's quasi-identifiers, from which, joined with external tables, a minimal attribute set of the user can be obtained.
Loc is the user's position when the query request is sent, POI is the point of interest at the current position, QPOI is the point of interest the user wants to query, Qloc is the position of that point of interest, and t is the time the request is issued.
Existing LBS privacy protection architectures fall broadly into three categories: centralized, distributed, and hybrid. The application adopts the most widely used centralized architecture, composed of the user side, the TTP, and the LSP; its schematic is shown in FIG. 1.
When a user sends a service request to the LSP, the data first passes through the TTP for privacy protection. A pseudonym module generates a pseudonym for the user, regenerated for every query to prevent inference attacks; a location generalization module generalizes the user's position data, sending the k-means cluster center to the LSP in place of the real position; a query anonymization module applies k-anonymity to the user's query information, sending k queries to the LSP at once. The LSP processes the anonymous request and returns a result data set to the TTP, whose query refinement module uses the user's real data to return the results that meet the user's needs.
For location privacy, the map is divided with a Voronoi diagram, and k-means and l-diversity serve as the basic ideas of protection. The (k,l)-means privacy protection model is defined as follows:
(k,l)-means privacy protection model. If Loc_1 and Loc_2 satisfy the following conditions:
(1) |Loc_1| = k, where Loc_1 is the user location generalization set;
(2) Loc_1 ∈ DESC(O_1, O_2, …, O_k), where DESC(O_1, O_2, …, O_k) denotes the cluster centers arranged in descending order;
(3) |Loc_2| = l, where Loc_2 is the set of locations sent to the LSP;
then the location data in Loc_2 satisfy the (k,l)-means privacy protection model.
Condition 1 states that the user location generalization set Loc_1 contains k records, i.e., the users in the area during the same time period are clustered into k clusters; condition 2 states that Loc_1 is formed from the cluster centers O_i in descending order; condition 3 states that l position points are selected from Loc_1 to form the set Loc_2, which is sent to the LSP as the user's false positions.
For query privacy, the application proposes a query k-anonymity privacy protection model based on the idea of location k-anonymity. The model uses a local k-anonymity algorithm to generalize the user's real query request into an anonymous query set; different users' attribute values are generalized into separate, independent generalized data sets, which prevents over-generalization of user data from degrading the service.
Query k-anonymity privacy protection model. The user's real query information is generalized into an anonymous query data set so that it cannot be distinguished from the other k-1 records, and the generated anonymity set is processed with an exponential mechanism so that the probability of successfully identifying the user's real query request is no higher than 1/k.
The DP-(k₁,l)-means algorithm combines differential privacy with the k-means algorithm and protects the user's location privacy based on known conditions such as the map information, the user's real position x, and the position data set X.
First the map is pre-divided with a Voronoi diagram, and k₁ clusters are built from the Voronoi polygons. Following the idea of l-diversity, l cluster centers are selected and Laplace noise is added; a position anonymity set of l false positions is built from the noisy points and sent to the LSP in place of the user's real position. Neighboring users in the same cluster can all use this anonymity set instead of their real location information, which gives the algorithm reciprocity, saves service response time and computation cost, and effectively defends against an attacker's homogeneity and background knowledge attacks.
DP-(k₁,l)-means algorithm
Input: map information M, user real position x, position data set X
Output: l false positions
where O_j is an initial cluster center of the k-means algorithm, x_i is a sample point in the position data set X, d_ij is the Euclidean distance from x_i to O_j, C_j is the cluster corresponding to cluster center O_j, O_j' is the updated centroid of cluster C_j, and Lap(λ) is Laplace noise satisfying ε-differential privacy.
The data-processing procedure of the DP-(k₁,l)-means algorithm is described in detail below.
Step one, constructing a Voronoi diagram according to map information, so that each Voronoi polygon only contains one POI;
step two, calculating the number of users contained in each Voronoi polygon by combining the position data set X, and arranging the Voronoi polygons in a descending order according to the number of the users;
step three, selecting the k₁ Voronoi polygons containing the most users and taking their centroids O_j as the initial cluster centers of the k-means algorithm;
step four, calculating the Euclidean distance d_ij between each user position in the position data set X and each initial cluster center O_j;
step five, assigning each user to the cluster whose center is at the minimum Euclidean distance;
step six, after all users have been assigned, recomputing the centroids of the k₁ clusters;
step seven, if the distance between the old and new centroids is smaller than the set threshold δ, keeping the original cluster centers and ending the loop; if it is larger than δ, using the updated centroids as cluster centers and jumping back to step four;
step eight, returning the cluster center O_j of the cluster containing the querying user together with the l-1 centroids nearest to it, forming a position data set of l false positions.
The DP-k₂-anonymity algorithm combines differential privacy with the k-anonymity algorithm; the user customizes the k₂ value according to his own privacy requirements. The larger k₂ is, the better the privacy protection, but the lower the accuracy of the service.
The threshold δ may be set according to experimental requirements, and as a preferred aspect, δ=5 meters is selected.
i indexes the query requests issued by different users and is an integer; j indexes the clusters and cluster centers and is an integer; n is the number of query requests issued in the user's cluster C_j during time period t and is an integer; m is the number of query requests issued historically during time period t and is an integer.
Using the cluster containing the user, obtained from the DP-(k₁,l)-means algorithm, together with background knowledge of the user's area, a query k-anonymity set satisfying differential privacy is constructed and sent to the LSP in place of the user's real query request, protecting the user's query privacy and defending against an attacker's inference attacks and spatio-temporal correlation attacks.
DP-k₂-anonymity algorithm
Input: k₂ value, the querying user's cluster C_j, Voronoi diagram
Output: query k₂-anonymity set QA
where n is the number of users in cluster C_j that issued requests during time period t; ASCE(S_1, S_2, …, S_n) denotes the ascending ordering of the position similarities S of the n position points; q_i denotes the query request issued by the position point whose similarity during period t is S_i; q_x denotes the user's real query request; Pr(q_i) denotes the historical probability of q_i during period t, obtainable from background knowledge; and DESC(Pr(q_1), Pr(q_2), …, Pr(q_n)) denotes the Pr(q_i) arranged in descending order.
The position similarity S_l used in the DP-k₂-anonymity algorithm is computed from the Euclidean distance between users, where U_i and U_j are two different users in a cluster and d(U_i, U_j) is their Euclidean distance; the larger S_l is, the less similar the two users are. During the same time period, any user in the cluster can replace an issued query request with the query k₂-anonymity set.
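As a small illustration, the ascending similarity ordering can be sketched as follows. The similarity formula itself is not legible in this copy, so the sketch assumes S_l is simply the Euclidean distance d(U_i, U_j), consistent with the text's statement that a larger S_l means less similar; the function name is illustrative.

```python
import math

def ascending_similarity(user_pos, neighbors):
    """Order the other users in the cluster by position similarity S_l,
    taken here (assumption) to be the Euclidean distance to the user.
    Returns (S_l, position) pairs in ascending order, most similar first."""
    return sorted((math.dist(user_pos, p), p) for p in neighbors)
```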
The data-processing procedure of the DP-k₂-anonymity algorithm is described in detail below.
step one, determining from known background knowledge the number of users in cluster C_j that issued requests during time period t;
step two, calculating the position similarity S_l between each position point in cluster C_j and the user's position, and arranging the values in ascending order;
step three, ordering the corresponding query requests according to S_l to obtain the sequence (q_1, q_2, …, q_n);
step four, putting the user's real query request q_x into the query k₂-anonymity set QA;
step five, judging whether q_i is already present in QA; if not, adding q_i to QA; if so, comparing the next query request q_{i+1} in the order obtained in step three, until QA contains k₂ elements;
step six, if all q_i of cluster C_j have been added to QA and |QA| is still smaller than k₂, sorting the historical query request probabilities Pr(q_i) for the same time period t in descending order to obtain (q_1, q_2, …, q_m);
step seven, jumping back to step five and continuing to judge whether q_i is present in QA, stopping the loop when |QA| = k₂;
step eight, applying an exponential mechanism satisfying ε-differential privacy to QA, strictly controlling the output probability of each candidate in QA.
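Steps one to eight can be sketched in Python. The sketch is illustrative, not the patent's implementation: the neighbor requests are assumed to be already ordered by ascending position similarity, the historical requests by descending Pr(q_i), and the exponential mechanism is shown in its standard form with a caller-supplied score function (the patent does not specify the scoring; historical query probability would be a natural choice). Function names are assumptions.

```python
import math
import random

def build_qa(real_query, neighbor_queries, hist_queries, k2):
    """Steps four to seven: start QA with the user's real request q_x,
    add distinct neighbor requests in similarity order, then top up
    from the descending historical-probability list until |QA| = k2."""
    qa = [real_query]
    for q in list(neighbor_queries) + list(hist_queries):
        if len(qa) == k2:
            break
        if q not in qa:  # step five: skip requests already in QA
            qa.append(q)
    return qa

def exponential_mechanism(candidates, score, epsilon, sensitivity=1.0):
    """Step eight: sample one candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity)), the standard
    epsilon-differentially-private exponential mechanism."""
    m = max(score(c) for c in candidates)  # shift scores for numerical stability
    w = [math.exp(epsilon * (score(c) - m) / (2 * sensitivity)) for c in candidates]
    r = random.random() * sum(w)
    acc = 0.0
    for c, wi in zip(candidates, w):
        acc += wi
        if r < acc:
            return c
    return candidates[-1]
```

Because every candidate keeps a strictly positive weight, the LSP (or an eavesdropper) cannot tell which element of QA is the real request, which is exactly the output-probability control the step describes.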
Security analysis
(1) Resisting homogeneity attacks. The basic idea of a homogeneity attack is to find multiple records in one data source that simultaneously correspond to the same sensitive attribute.
The DP-(k₁,l)-means algorithm combines the k-anonymity and l-diversity algorithms to generate a location anonymity set, which all users in the cluster can use. Even if an attacker obtains the location anonymity set the algorithm constructs, the real information of the users in the cluster cannot be recovered, because the data in the set are false positions surrounding the cluster's users and contain none of their real information. The scheme is therefore effective against homogeneity attacks.
(2) Resisting background knowledge attacks. The basic idea of a background knowledge attack is to locate, across multiple data sources, the records corresponding to a target; with background knowledge about the target, other sensitive attribute information may then be discovered.
Differential privacy rests on a rigorous mathematical foundation: the result of processing a data set is not dominated by any specific record, and deleting any one record from the data set does not materially change the computed result. Suppose the complete data set is D and the attacker already possesses all data except the target's information, denoted D'; D and D' are neighboring data sets differing in at most one record. The sensitivity ΔF of a query algorithm F is expressed as
ΔF = max_{D,D'} ‖F(D) − F(D')‖
and for a counting query, ΔF ≤ 1.
Colloquially, the sensitivity ΔF is the worst-case impact that adding or deleting a single record has on the query result over the whole data set. The two formulas show that even after acquiring maximal background knowledge, an attacker cannot obtain the target's information, so differential privacy resists background knowledge attacks well. The application's algorithms incorporate a differential privacy mechanism and can effectively resist an attacker's background knowledge attacks while maintaining service quality.
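As a concrete illustration of the two formulas, a counting query has sensitivity ΔF = 1, and the Laplace mechanism with scale ΔF/ε makes its answer ε-differentially private. The sketch below is illustrative; the function name is not from the patent.

```python
import random

def private_count(dataset, predicate, epsilon):
    """Counting query under epsilon-differential privacy: adding or
    removing one record changes the count by at most 1, so the
    sensitivity is Delta F = 1 and the Laplace scale is Delta F / epsilon."""
    true_count = sum(1 for x in dataset if predicate(x))
    b = 1.0 / epsilon  # Delta F = 1 for a counting query
    # difference of two exponential draws is Laplace(b)-distributed
    noise = random.expovariate(1 / b) - random.expovariate(1 / b)
    return true_count + noise
```

Because the noise scale depends only on ΔF and ε, not on the data, an attacker who knows every record except one still cannot tell from the noisy count whether that record is present.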
(3) Resisting inference attacks; the basic idea of an inference attack is that an attacker may infer a user's likely location information and query requests from information such as life experience, common sense and background knowledge.
For the user's location information, the location anonymity set constructed by the algorithm uses false locations close to the user's position, so an attacker cannot infer other private information of the user from the location data. For the user's query privacy, query requests sent by neighboring users in the same cluster within the same time period are selected to construct a query k₂-anonymity set, which guarantees the authenticity of the query requests and prevents an attacker from deducing the user's location or other information from the query content. Therefore, the proposed algorithm effectively avoids inference attacks.
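A hedged sketch of the k₂-anonymity-set construction just described: the set is seeded with the user's real query and padded with distinct real queries from same-cluster neighbors, so every element is an authentic request. The query strings and the assumption that neighbor queries arrive pre-sorted by location similarity are illustrative:

```python
def build_query_anonymity_set(user_query, neighbor_queries, k2):
    """Fill the query k2-anonymity set QA with the user's real query plus
    distinct real queries issued by same-cluster neighbors in the same
    time period (neighbor_queries assumed sorted by location similarity)."""
    qa = [user_query]
    for q in neighbor_queries:
        if len(qa) == k2:
            break
        if q not in qa:          # skip duplicates, as in the claimed steps
            qa.append(q)
    return qa

qa = build_query_anonymity_set("restaurant",
                               ["hospital", "restaurant", "bank", "hotel"], 3)
print(qa)  # → ['restaurant', 'hospital', 'bank']
```

In the full scheme, when the neighbors' live queries run out before |QA| = k₂, historical queries from the same time period are used to finish filling the set.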
(4) Resisting spatio-temporal correlation attacks; a spatio-temporal correlation attack [22] mainly targets query privacy. In the DP-k₂-anonymity algorithm proposed in this application, query requests sent within the same time period t are selected to construct the query k₂-anonymity set, combining the temporal characteristics, and the data set is processed with an exponential mechanism. This ensures that the query requests are temporally plausible while protecting the user's privacy from disclosure.
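The exponential-mechanism step can be sketched as follows. This is the standard ε-differentially-private exponential mechanism, not the patent's exact scoring; the candidate queries and the toy score function are assumptions for illustration:

```python
import math
import random

def exponential_mechanism(candidates, score, epsilon, sensitivity=1.0):
    """Select one candidate with probability proportional to
    exp(epsilon * score(c) / (2 * sensitivity)); this standard construction
    satisfies epsilon-differential privacy, so each candidate's output
    probability is strictly controlled."""
    weights = [math.exp(epsilon * score(c) / (2.0 * sensitivity))
               for c in candidates]
    r = random.random() * sum(weights)
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]  # guard against floating-point rounding

qa = ["restaurant", "hospital", "bank"]
choice = exponential_mechanism(qa, lambda q: len(q), 1.0)  # toy score
```

Larger ε concentrates probability on high-scoring candidates; smaller ε flattens the distribution and gives stronger privacy, which matches the running-time/privacy trade-offs reported in the experiments below.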
Experimental verification
The algorithms in the experiments are implemented in Java; the running environment is a 1.70 GHz Intel(R) Core(TM) i5 processor, 4 GB of memory and a 64-bit Windows 8 operating system. The data set used in the experiments is derived from road network information of Oldenburg, Germany, and user information generated from the Foursquare website. The experimental data set includes 7035 roads, 6105 vertices, 4-9 POI points, and 2.5 million query requests from different users.
In the experiments, the DPLQ method is compared with three algorithms: MobiMix, H-Star and T-SR. MobiMix is a mix-zone-based road network framework for protecting user location privacy; H-Star is a cloaking algorithm extending X-Star based on Hilbert rules; T-SR is a location privacy protection algorithm based on POI queries. The three are classical privacy-protection algorithms based on different techniques and are therefore representative.
The experiments use Pseudo-Variance (PV), Average Path Distance (APD) and the Association Degree (AD) between false queries and real queries as evaluation criteria. These three criteria reflect the plausibility and effectiveness of the false information generated by the algorithms and make it convenient to compare the experimental performance of different algorithms. The definitions of PV and APD are given in formulas (1) and (2).
where P_uj is the POI query frequency corresponding to the user's real location, P_ij is the POI query frequency corresponding to the i-th false location in the location anonymity set, k₁ denotes the k₁-means coefficient in Algorithm 1, and l denotes the l-diversity coefficient in Algorithm 1.
The association between two POI categories is defined as shown in equation (3).
where Fnum() denotes the access frequency from POI_i to POI_j at time t, and N denotes the total number of POIs within the same grid area. The value of AD depends on the ratio of the access frequency from POI_i to POI_j to the total access frequency from POI_i to all other POI points.
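Based on the description above (AD as the ratio of the POI_i→POI_j access frequency to POI_i's total access frequency to all other POIs), a minimal sketch with made-up frequencies; the dict-based representation of Fnum is an assumption:

```python
def association_degree(fnum, i, j):
    """AD between POI i and POI j at time t: the access frequency from i
    to j divided by i's total access frequency to all other POIs.
    fnum maps (source POI, destination POI) -> access frequency."""
    total = sum(f for (src, dst), f in fnum.items() if src == i and dst != i)
    if total == 0:
        return 0.0
    return fnum.get((i, j), 0) / total

freqs = {("a", "b"): 3, ("a", "c"): 1}   # hypothetical access counts
print(association_degree(freqs, "a", "b"))  # → 0.75
```

An AD of 0 between a false query and the real query, as reported for DPLQ below, means the generated false POI is never reached from the real POI in the recorded access patterns.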
Pseudo-variance comparison of the algorithms. PV represents the degree of deviation of the constructed false locations from the user's real location: the smaller the PV, the higher the uncertainty of the generated location data set and the more realistic the false locations. FIG. 2 shows the differences in PV among the four algorithms for different numbers of elements l in the location data set, with k₁ fixed, privacy protection budget ε = 1.0, and a 3 km × 3 km grid area around the user.
As can be seen from fig. 2, the DPLQ algorithm is always superior to the other three algorithms regardless of the value of l, so the false locations generated by DPLQ are more realistic and can better resist inference attacks. When l = 10, the PV values of the DPLQ, T-SR and H-Star algorithms are much smaller than that of MobiMix, because these algorithms consider the plausibility of false locations relative to the user's real location and the diversity of POI semantics when building the location data set. As l increases, the differences among the four algorithms shrink: the privacy protection budget is fixed and the grid area in which the location data set is built is also fixed, so a larger l forces all four algorithms to choose more highly similar false locations, making the PV differences between algorithms smaller and smaller.
Average path distance comparison of the algorithms. APD represents the average road network distance from the false locations to the real location. The larger and more scattered the area over which false locations are distributed, the harder it is for a malicious attacker to recover the user's real location from the location data set. FIG. 3 shows the effect of varying l on the APD of the different algorithms when the user's grid area is 3 km × 3 km.
As can be seen from fig. 3, the APD of the DPLQ algorithm is larger than that of the other three algorithms, meaning that the false locations generated by DPLQ are more scattered. As l increases, the APD differences among the four algorithms shrink: the user's grid area is unchanged in the experiments, so as l keeps growing all four algorithms select increasingly similar false locations.
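A hedged sketch of the APD metric just compared; Euclidean distance stands in for the road-network distance used in the paper, and the coordinates are illustrative:

```python
import math

def average_path_distance(true_loc, false_locs):
    """APD: mean distance from each false location to the user's true
    location (Euclidean here as a stand-in for road-network distance)."""
    return sum(math.dist(true_loc, f) for f in false_locs) / len(false_locs)

apd = average_path_distance((0.0, 0.0), [(3.0, 4.0), (0.0, 5.0)])
print(apd)  # → 5.0
```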
Association comparison between false queries and real queries. Association here refers to the correlation between the false POI queries generated in the query k₂-anonymity set and the user's real POI queries. We compare the association between false and real queries for the DPLQ, T-SR, H-Star and MobiMix algorithms.
As can be seen from fig. 4, the association degree between false and real queries is 0 for both the DPLQ and T-SR algorithms: both consider the temporal correlation between the query request and the user's location and generate false queries that have no association with the user's real queries, so they can resist a malicious attacker's spatio-temporal correlation attack. The experimental results of DPLQ and T-SR are clearly superior to those of the other two algorithms.
Influence of experimental parameters on running time. The relevant parameters in the experiments are: the number of clusters k₁, the number of elements l in the location data set, the number of elements k₂ in the anonymity set, and the privacy protection budget ε. Since the number of clusters k₁ has no direct influence on the experimental results, k₁ = 50 is used in the experiments. l, k₂ and ε form the privacy protection triple <l, k₂, ε>. FIG. 5 compares the effect of changes in ε and l on the average running time of the algorithm when k₂ = 10 is held fixed.
As shown in fig. 5, with ε fixed, a larger l requires more false locations to be generated and therefore a longer running time; with l fixed (reading the graph from bottom to top), the degree of privacy protection rises, the amount of added noise increases, and the running time of the algorithm grows.
FIG. 6 compares the effect of changes in ε and k₂ on the average running time of the algorithm when l = 8 is held fixed; the experiment assumes n = 10.
As shown in FIG. 6, with k₂ fixed, the relationship between ε and running time follows the same reasoning as above; with ε fixed, the running time grows as k₂ increases. When k₂ = 12 and k₂ = 14, the running time is significantly higher than when k₂ = 10: since the experiment assumes n = 10, when k₂ < n the algorithm directly screens, from the cluster, the k₂ − 1 query requests sent at the same time by users with small location similarity, without considering the influence of historical query results on constructing the anonymity set, so the running time is clearly less than when k₂ > n.
Fig. 7 shows the difference in PV values of the DPLQ algorithm when setting different privacy preserving parameters.
As can be seen from fig. 7, when ε = 0.5 and l = 12, the PV value of the algorithm is smallest, meaning the deviation of the constructed false locations from the user's real location is smallest, so the LSP can provide better location services without revealing the user's location privacy information.
This application proposes DPLQ, an LBS service privacy protection scheme based on differential privacy. The scheme comprises two algorithms, the DP-(k₁, l)-means algorithm and the DP-k₂-anonymity algorithm, which together effectively protect location privacy and query privacy in LBS service requests. The scheme takes into account the influence of background knowledge and spatio-temporal correlation on privacy protection and defines two privacy protection models; considering that different users have different privacy requirements, the privacy protection strength can be defined by the user. As a result, it is difficult for a malicious attacker to obtain a user's private information from the constructed location data set and query k-anonymity set: the constructed false locations are more scattered and the false queries are more authentic, resisting homogeneity attacks, background knowledge attacks, inference attacks and temporal correlation attacks. Experiments show that the algorithm has clear advantages in pseudo-variance, average path distance, and the association degree between false and real queries, has good scalability, and can effectively protect users' LBS privacy information. In future work, we will perform privacy measurement and hierarchical protection on users' locations and the query requests they send, so as to protect users' LBS requests more precisely without losing service quality.
Although embodiments of the present application are disclosed above, the application is not limited to the details and embodiments shown and described; it is well suited to various fields of use that will be readily apparent to those skilled in the art. Accordingly, the application is not limited to the specific details and illustrations shown and described herein, without departing from the general concepts defined by the claims and their equivalents.

Claims (2)

1. An LBS service privacy protection method based on differential privacy, characterized by comprising the following steps:
step one, constructing a Voronoi diagram according to map information, so that each Voronoi polygon only contains one POI;
step two, calculating the number of users contained in each Voronoi polygon by combining the position data set X, and arranging the Voronoi polygons in a descending order according to the number of the users;
step three, selecting the first k₁ Voronoi polygons with the largest numbers of users, and taking their centroids O_j as the initial cluster centers of the k-means algorithm;
step four, calculating the Euclidean distance d_ij between the position of each user in the location data set X and each initial cluster center O_j;
step five, assigning each user to the cluster with the minimum Euclidean distance;
step six, after all users are assigned, recalculating the centroids of the k₁ clusters;
step seven, if the distance between the centroids before and after the update is smaller than a set threshold δ, keeping the original cluster center and ending the loop; if the distance is larger than the threshold δ, using the updated centroid as the cluster center and jumping to step four;
step eight, returning the cluster center O_j of the cluster containing the querying user together with the l − 1 centroids nearest to it, forming a location data set with l false locations;
adding Laplace noise to the l cluster centers, constructing a location anonymity set containing l false locations from the noisy location points, and sending the location anonymity set, instead of the user's true location, to the location service provider LSP;
step nine, determining the number of users in cluster C_j that send requests during time period t;
step ten, calculating the location similarity S_l between each location point in cluster C_j and the user's location, and arranging the values in ascending order;
step eleven, obtaining the corresponding query-request ordering (q_1, q_2, …, q_n) according to the order of S_l;
step twelve, putting the user's real query request q_x into the query k₂-anonymity set QA;
step thirteen, judging whether q_i is already present in QA; if not, adding q_i to QA; if so, comparing the next query request q_{i+1} according to the query-request order, until QA contains k₂ elements;
step fourteen, if all q_i of cluster C_j have been added to QA and |QA| is still less than k₂, sorting the historical query-request probabilities Pr(q_i) for the same time period t in descending order to obtain (q_1, q_2, …, q_m);
step fifteen, jumping to step thirteen and continuing to judge whether q_i is present in QA, stopping the loop when |QA| = k₂;
step sixteen, applying an exponential mechanism satisfying ε-differential privacy to QA to protect privacy, strictly controlling the output probability of each candidate in QA;
wherein δ = 5 meters; i indexes the query requests sent by different users and is an integer; j indexes the clusters and cluster centers and is an integer; n denotes the number of query requests sent during time period t in the cluster C_j where the user is located and is an integer; m denotes the number of historical query requests in time period t and is an integer.
2. The LBS service privacy protection method of claim 1, wherein neighboring users in the same cluster can each use the anonymity set to replace their real location information.
CN202010690224.5A 2020-07-17 2020-07-17 LBS service privacy protection method based on differential privacy Active CN111797433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010690224.5A CN111797433B (en) 2020-07-17 2020-07-17 LBS service privacy protection method based on differential privacy


Publications (2)

Publication Number Publication Date
CN111797433A CN111797433A (en) 2020-10-20
CN111797433B true CN111797433B (en) 2023-08-29

Family

ID=72808687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010690224.5A Active CN111797433B (en) 2020-07-17 2020-07-17 LBS service privacy protection method based on differential privacy

Country Status (1)

Country Link
CN (1) CN111797433B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112035880B (en) * 2020-09-10 2024-02-09 辽宁工业大学 Track privacy protection service recommendation method based on preference perception
CN112767693A (en) * 2020-12-31 2021-05-07 北京明朝万达科技股份有限公司 Vehicle driving data processing method and device
CN113407870B (en) * 2021-06-17 2023-07-04 安徽师范大学 Road network LBS interest point query privacy protection method based on semantic and space-time correlation

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104394509A (en) * 2014-11-21 2015-03-04 西安交通大学 High-efficiency difference disturbance location privacy protection system and method
CN109379718A (en) * 2018-12-10 2019-02-22 南京理工大学 Complete anonymous method for secret protection based on continuous-query location-based service
CN109413067A (en) * 2018-10-29 2019-03-01 福建师范大学 A kind of inquiry method for protecting track privacy
CN110062324A (en) * 2019-03-28 2019-07-26 南京航空航天大学 A kind of personalized location method for secret protection based on k- anonymity
CN110300029A (en) * 2019-07-06 2019-10-01 桂林电子科技大学 A kind of location privacy protection method of anti-side right attack and position semantic attacks
CN110855375A (en) * 2019-12-02 2020-02-28 河海大学常州校区 Source node privacy protection method based on position push in underwater acoustic sensor network
CN111339091A (en) * 2020-02-23 2020-06-26 兰州理工大学 Position big data differential privacy division and release method based on non-uniform quadtree

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9043927B2 (en) * 2012-09-27 2015-05-26 Neo Mechanic Limited Method and apparatus for authenticating location-based services without compromising location privacy


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Qingyun, "Research on Privacy Protection of Service Requests Based on LBS Systems," China Master's Theses Full-text Database (Information Science and Technology), No. 02, pp. 1-58 *

Also Published As

Publication number Publication date
CN111797433A (en) 2020-10-20

Similar Documents

Publication Publication Date Title
CN111797433B (en) LBS service privacy protection method based on differential privacy
Dong et al. Novel privacy-preserving algorithm based on frequent path for trajectory data publishing
Zhou et al. Privacy-preserving online task allocation in edge-computing-enabled massive crowdsensing
Xue et al. Location diversity: Enhanced privacy protection in location based services
Kalnis et al. Preventing location-based identity inference in anonymous spatial queries
Ni et al. An anonymous entropy-based location privacy protection scheme in mobile social networks
Dewri et al. Query m-invariance: Preventing query disclosures in continuous location-based services
US20090030778A1 (en) System, method and apparatus for secure multiparty location based services
CN110300029B (en) Position privacy protection method for preventing edge-weight attack and position semantic attack
CN112035880B (en) Track privacy protection service recommendation method based on preference perception
Ma et al. A voronoi-based location privacy-preserving method for continuous query in LBS
Tan et al. Protecting privacy of location-based services in road networks
Wang et al. Achieving effective $ k $-anonymity for query privacy in location-based services
Zhang et al. DPLQ: Location‐based service privacy protection scheme based on differential privacy
Shin et al. A profile anonymization model for location-based services
Li et al. A personalized trajectory privacy protection method
Zhao et al. A Privacy‐Preserving Trajectory Publication Method Based on Secure Start‐Points and End‐Points
Gutiérrez-Soto et al. Location‐Query‐Privacy and Safety Cloaking Schemes for Continuous Location‐Based Services
Hashem et al. Crowd-enabled processing of trustworthy, privacy-enhanced and personalised location based services with quality guarantee
Lee et al. Navigational path privacy protection: navigational path privacy protection
Zhang et al. LPPS‐AGC: Location Privacy Protection Strategy Based on Alt‐Geohash Coding in Location‐Based Services
Kuang et al. T-SR: A location privacy protection algorithm based on POI query
Wang et al. RoPriv: Road network-aware privacy-preserving framework in spatial crowdsourcing
Xu et al. Location-semantic aware privacy protection algorithms for location-based services
Ma et al. Trajectory Privacy Protection Method based on Shadow vehicles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant