CN114564747B - Trajectory differential privacy protection method and system based on semantics and prediction - Google Patents

Trajectory differential privacy protection method and system based on semantics and prediction

Info

Publication number
CN114564747B
CN114564747B (application CN202210190712.9A)
Authority
CN
China
Prior art keywords
privacy
sensitivity
track
user
location
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210190712.9A
Other languages
Chinese (zh)
Other versions
CN114564747A (en)
Inventor
章静
李雁姿
林力伟
石思彤
丁倩
邱浩宇
张廖如星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian University of Technology
Original Assignee
Fujian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian University of Technology
Priority to CN202210190712.9A
Publication of CN114564747A
Application granted
Publication of CN114564747B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 - Protecting data
    • G06F21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 - Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 - Protecting personal data, e.g. for financial or medical purposes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a trajectory differential privacy protection method and system based on semantics and prediction. The method comprises the following steps. Semantic sensitivity preprocessing: the semantic sensitivity of each semantically sensitive location is radiated to nearby nodes according to the distance and the access degree, so that every location point obtains a semantic sensitivity; the user's check-in count at a node is combined with the node's semantic sensitivity to form its location sensitivity, from which the privacy level of each location point is determined; a prefix tree is then constructed from the trajectory set and the location sensitivity and privacy level of each location point. Privacy budget allocation over the prefix tree: the privacy budget of each trajectory subsequence is allocated according to its average sensitivity, and the privacy budget of each location point according to its privacy level. The allocation of the privacy budget is then adjusted through a Markov chain. Finally, noise is added according to the privacy budget to change the privacy level of each location, thereby protecting the privacy of the user's trajectory. The method and system help improve the effect of trajectory privacy protection.

Description

Trajectory differential privacy protection method and system based on semantics and prediction
Technical Field
The invention belongs to the field of track privacy protection, and particularly relates to a track differential privacy protection method and system based on semantics and prediction.
Background
In recent years, with the rapid development of the mobile Internet and the continuous upgrading of communication devices, location-based services (LBS) have become increasingly popular in daily life. Location-based services now cover many aspects of the economy and social life, such as navigation, point-of-interest query and recommendation, food delivery, check-in, and social networking, and the development and application of 5G technology has extended them to even wider fields. However, while location-based services bring convenience, they also expose users to the risk of privacy disclosure: leaked location information can reveal further personal information, so location privacy has become one of the most important forms of personal privacy.
Differential privacy is an important technique for trajectory privacy protection; it achieves privacy protection by adding noise to the real data set. Differential privacy is realized mainly through privacy mechanisms. The first general mechanism is the Laplace mechanism, which targets numeric queries; for non-numeric queries, differential privacy is achieved through the exponential mechanism, the second general mechanism. In data publication, differential privacy trades the degree of privacy protection against publication accuracy by adjusting the privacy parameter ε: in general, the larger the value of ε, the lower the degree of privacy protection and the higher the accuracy of the published data set. When the differential privacy technique is used to protect a user's trajectory, adding an appropriate amount of noise to the trajectory data avoids privacy disclosure while preserving the usability of the data.
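As a concrete illustration of this ε trade-off, the short Python sketch below (not part of the patent; the query value and sensitivity are made-up numbers) perturbs a numeric query result with the Laplace mechanism, where the noise scale is sensitivity/ε: a smaller ε yields larger noise and stronger protection.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return the true value perturbed with Laplace noise of scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# A count query (sensitivity 1) answered under three privacy budgets:
# small epsilon -> strong privacy, noisy answer; large epsilon -> weak privacy, accurate answer.
for eps in (0.1, 0.5, 2.0):
    print(eps, laplace_mechanism(true_value=100.0, sensitivity=1.0, epsilon=eps))
```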
Existing privacy budget allocation schemes for differential privacy fall into two categories. One adjusts the privacy budget through Markov-chain prediction, which makes the allocation more reasonable and reduces the amount of added noise. The other merges similar trajectories: the area covered by the trajectories is divided into grids, and trajectory points falling into the same grid cell are represented by the cell center, which greatly increases the count value of the location points.
Existing research on location differential privacy protection mainly has the following shortcomings. (1) Existing trajectory privacy protection mechanisms do not consider a sensitivity map, so the sensitivity of user locations is not accurate enough. (2) Existing location privacy protection mechanisms do not consider the influence of semantic locations on trajectories. Semantic locations can aggravate the risk of privacy disclosure; for example, a user's preferences and economic level can be inferred from how frequently the user visits certain types of semantic location points. (3) In the publication of differentially private trajectory data sets, the way the privacy budget is allocated is one of the key factors that determine the final amount of added noise. An unreasonable allocation wastes budget and leads to excessive overall noise. Current allocation schemes still rely on uniform allocation or simple differentiated allocation, which still wastes budget to some degree, and there is little research on designing a more reasonable allocation scheme that exploits the characteristics of the trajectory data set.
Disclosure of Invention
The invention aims to provide a trajectory differential privacy protection method and system based on semantics and prediction.
In order to achieve the above purpose, the invention adopts the following technical scheme: a track differential privacy protection method based on semantics and prediction comprises the following steps:
Step S1, semantic sensitivity preprocessing: the semantic sensitivity of each semantically sensitive location is radiated to nearby nodes according to the distance and the access degree, so that the semantic sensitivity of each location point is obtained; the check-in count α_l of the user at node l and the semantic sensitivity Sem_l are combined as the location sensitivity of node l; the privacy level of each location point is determined by comparing its location sensitivity with preset thresholds; a prefix tree is then constructed from the trajectory set and the location sensitivity and privacy level of each location point;
Step S2, privacy budget is allocated according to the prefix tree: distributing the privacy budget of the track subsequence according to the average sensitivity of the track subsequence, and distributing the privacy budget of the position point according to the privacy level of the position point;
step S3, adjusting the distribution of privacy budget: predicting attack probability at the next moment through a Markov chain, and adjusting sensitivity through calculating the probability, so as to adjust the distributed privacy budget;
and S4, adding noise according to the privacy budget to change the privacy level of the position, so as to protect the privacy of the user track.
Further, in the step S1, considering the overall connectivity between the location points, the sensitivity of the semantically sensitive location is respectively radiated to the nearby nodes according to the distance and the access degree, specifically:
Firstly, the set of semantic location nodes with a privacy level near an arbitrary location a, namely the connection set neighborSet, is acquired. Then the map is converted into an undirected graph; according to the distance and the access degree, the equivalent distance between a semantic location g_i and the arbitrary location a is g_i.EDis = ED × (c − 1), where ED is the Euclidean distance between g_i and a, c is the number of nodes on the shortest path between the two location nodes, and c − 1 is the number of line segments contained in that shortest path. neighborSet = {g_i | g_i.EDis < b}, where b is a threshold set by the user;
Finally, the semantic sensitivity radiated by each semantic location g_i in the connection set neighborSet of the arbitrary location a is obtained, as shown in formula (1),
where Sem_a represents the semantic sensitivity assigned to node a.
Further, to facilitate computation, the map is gridded; the semantic sensitivity of each region in the map is then calculated by the above process, and a semantic sensitivity map Map_sen is generated.
Further, in the step S2, the privacy budget is allocated according to the sensitivity and privacy level of the location points; a location with a high check-in frequency has high sensitivity and a high privacy level, and is therefore allocated a smaller privacy budget, so that more noise is added to protect the location information. The step S2 mainly includes privacy budget allocation for the trajectory subsequences and privacy budget allocation for each child node on a trajectory subsequence, specifically: firstly, the average sensitivity of each trajectory subsequence is calculated, from which the access frequency of the subsequence is obtained; then a privacy budget is allocated to each trajectory subsequence according to its access frequency, where the higher the access frequency, the higher the sensitivity, and the allocated privacy budget is inversely proportional to the access frequency; secondly, a privacy budget is allocated to each node according to the ratio of its privacy level to the total privacy level of the trajectory subsequence; finally, since some location points appear in multiple trajectory subsequences, their repeatedly allocated privacy budgets are merged.
Further, in the step S3, the attack probability of a location point is predicted using the Markov property, which is applied to the trajectory in the sense that the location point at the next moment depends only on the location point at the previous moment; this yields the set of possible locations at the next moment and their attack probabilities, which are then used to adjust the privacy budget.
Further, the two most important components of a Markov chain are the initial state probability distribution and the state transition probability matrix. Assume that the set of possible locations generated by the user at time t−1, together with their probability values, constitutes the initial state probability distribution P^(t−1). Assuming that a user's trajectory contains n possible locations l_1, l_2, ..., l_n, the probability of a state transition from location l_i to location l_j is denoted P(l_i→l_j), and the n×n matrix P = [P(l_i→l_j)] is the state transition probability matrix;
the possible locations at time t and their probability values are then computed as P^(t) = P^(t−1)·P, which gives the attack probability of each possible location at time t;
it is assumed that an attacker starts the attack from the initial location point of the trajectory and continues along the direction of the trajectory; the Markov chain is used to calculate the attack probability of the nodes on the prefix tree, and the sensitivity is adjusted according to the calculated probability, thereby adjusting the assigned privacy budget.
Further, in the step S4, the Laplace mechanism is used to add noise corresponding to the sensitivity of each location, thereby changing the privacy level of the location; once the location privacy levels change, it is difficult for an attacker to infer the user's real preference for a location.
Further, after the location privacy levels are changed, the interest score IG_{u,l} of user u at location l is calculated by formula (2), where S′_{u,l} and w′_score represent the location sensitivity and the location scoring weight after adding noise, respectively:
IG_{u,l} = S′_{u,l} × w′_score (2)
IG_{u,l} is normalized to obtain the normalized location score IGN_{u,l}, and the user-location scoring matrix Matrix_IGN, whose element in row u and column l is IGN_{u,l}, is constructed.
After the scoring matrix Matrix_IGN is obtained, the similarity sim(u,v) between users is calculated with the Pearson correlation coefficient, and a user similarity matrix Matrix_sim is constructed, where sim(u,v) represents the similarity between user u and user v:
sim(u,v) = Σ_{l∈l(u,v)} (IGN_{u,l} − IGN‾_u)(IGN_{v,l} − IGN‾_v) / [ √(Σ_{l∈l(u,v)} (IGN_{u,l} − IGN‾_u)²) × √(Σ_{l∈l(u,v)} (IGN_{v,l} − IGN‾_v)²) ]
where l(u,v) represents the set of common check-in locations of user u and user v, and IGN‾_u represents the average location score of user u. Finally, according to the user similarity matrix Matrix_sim, the n users most similar to the target user are taken as its similar users; the locations in the similar users' location sets that the target user has not visited are sorted in descending order of score, and the top n locations are recommended to the target user.
Further, in the step S4, the feasibility and effectiveness of the trajectory differential privacy protection method are verified by a location recommendation algorithm in LBS.
The invention also provides a trace differential privacy protection system based on semantics and prediction, which comprises a memory, a processor and computer program instructions which are stored on the memory and can be run by the processor, wherein the method steps can be realized when the processor runs the computer program instructions.
Compared with the prior art, the invention has the following beneficial effects: when performing differential privacy protection, the location privacy level is classified using a location sensitivity that combines the location check-in frequency with the semantic sensitivity, the trajectories are converted into a prefix tree, and the corresponding privacy budget is allocated on the basis of the prefix tree; the attack probability of each location is then predicted with a Markov chain, the location sensitivity is adjusted accordingly, and the allocation of the privacy budget is further adjusted, which improves the utilization of the privacy budget and makes the privacy protection of the trajectory data more reasonable. After the trajectory data are protected, the availability and effectiveness of the method are verified with a common location recommendation algorithm, showing that the method can preserve the usability of the data while guaranteeing the privacy of the trajectory data.
Drawings
FIG. 1 is a system implementation architecture diagram of an embodiment of the present invention;
FIG. 2 is a schematic diagram of semantic sensitivity of a location according to an embodiment of the present invention;
FIG. 3 is a flow chart of an implementation of a sensitivity map building algorithm in an embodiment of the invention;
FIG. 4 is a flow chart of an implementation of a privacy budget allocation algorithm in an embodiment of the present invention;
FIG. 5 is a flow chart of an implementation of a privacy budget adjustment algorithm in an embodiment of the present invention;
FIG. 6 is a flow chart of an implementation of a location recommendation algorithm in an embodiment of the invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
As shown in fig. 1, the present embodiment provides a track differential privacy protection method based on semantics and prediction, which includes the following steps:
Step S1, semantic sensitivity preprocessing: the semantic sensitivity of each semantically sensitive location is radiated to nearby nodes according to the distance and the access degree, so that the semantic sensitivity of each location point is obtained; the check-in count α_l of the user at node l and the semantic sensitivity Sem_l are combined as the location sensitivity of node l; the privacy level of each location point is determined by comparing its location sensitivity with preset thresholds; a prefix tree is then constructed from the trajectory set and the location sensitivity and privacy level of each location point.
In preprocessing the semantic sensitivity, the influence of semantic locations is considered: semantic sensitivity is also allocated to semantic locations close to a sensitive location, because even locations that are not directly connected to the sensitive location still carry a risk of exposing it. The invention therefore considers the overall connectivity between location points and radiates the semantic sensitivity of each semantically sensitive location to nearby nodes according to the distance and the access degree. The details are as follows:
As shown in fig. 2, first, the set of semantic location nodes with a privacy level near an arbitrary location a, namely the connection set neighborSet, is acquired. Then the map is converted into an undirected graph; according to the distance and the degree of ingress and egress, the equivalent distance between a semantic location g_i and the arbitrary location a is g_i.EDis = ED × (c − 1), where ED is the Euclidean distance between g_i and a, c is the number of nodes on the shortest path between the two location nodes, and c − 1 is the number of line segments contained in that shortest path (for example, a shortest path containing 3 nodes contains two segments). neighborSet = {g_i | g_i.EDis < b}, where b is a threshold set by the user.
Finally, the semantic sensitivity radiated by each semantic location g_i in the connection set neighborSet of the arbitrary location a is obtained, as shown in formula (1),
where Sem_a represents the semantic sensitivity assigned to node a.
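The following Python sketch illustrates the equivalent-distance and neighborSet computation described above on an undirected location graph. It assumes networkx for shortest paths, and, because formula (1) is not reproduced in the extracted text, it uses an inverse-equivalent-distance aggregation only as a hypothetical stand-in for the radiation formula; all function names are illustrative.

```python
import networkx as nx  # assumed dependency for shortest paths on the undirected location graph

def equivalent_distance(graph: nx.Graph, g_i, a, euclid) -> float:
    """g_i.EDis = ED * (c - 1): Euclidean distance times the number of segments
    on the shortest path between the two location nodes."""
    path = nx.shortest_path(graph, g_i, a)   # c nodes on the path -> c - 1 segments
    return euclid(g_i, a) * (len(path) - 1)

def radiated_semantic_sensitivity(graph, a, semantic_nodes, euclid, b):
    """Radiate sensitivity from the semantic nodes in neighborSet = {g_i | g_i.EDis < b} to a.

    The inverse-distance aggregation below is only an assumed stand-in for formula (1),
    which is not reproduced in the extracted text."""
    sem_a = 0.0
    for g_i, sem_gi in semantic_nodes.items():      # semantic_nodes: node -> its own sensitivity
        if g_i == a or not nx.has_path(graph, g_i, a):
            continue
        edis = equivalent_distance(graph, g_i, a, euclid)
        if edis < b:                                # g_i belongs to neighborSet of a
            sem_a += sem_gi / (1.0 + edis)          # assumed decay with equivalent distance
    return sem_a
```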
To facilitate the calculation, the invention grids the map; the semantic sensitivity of each region in the map is then calculated with the above process, and a semantic sensitivity map Map_sen is generated. Map_sen is stored on the mobile terminal, so the user can obtain the sensitivity of a location in the offline stage.
As shown in fig. 3, the implementation flow of the Sensitivity Map Building (SMB) algorithm in this embodiment is as follows:
Input: user check-in location data set T
And (3) outputting: sensitivity map sen(li,Si,pli), prefix tree TT
First, the semantic sensitivity of each location point in the trajectory is calculated according to the influence range of the semantic nodes, and then combined with the user's check-in count at the sensitive location to form the sensitivity of the location. The privacy level of a location is then determined from its sensitivity: the privacy level is set to 3 when the location sensitivity is less than 10, to 2 when the sensitivity is between 10 and 50, and to 1 when the sensitivity is 50 or more. The result is location information containing both the sensitivity and the privacy level.
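A minimal sketch of this SMB step is given below. The privacy-level thresholds (10 and 50) are taken from this embodiment; the way the check-in count α_l and the semantic sensitivity Sem_l are combined is not spelled out in the extracted text, so a simple product is assumed, and the prefix-tree structure is an illustrative implementation rather than the patent's exact data structure.

```python
def location_sensitivity(check_ins: int, semantic_sensitivity: float) -> float:
    """Combine the check-in count alpha_l with the semantic sensitivity Sem_l
    (the exact combination is not given in the text; a product is assumed)."""
    return check_ins * semantic_sensitivity

def privacy_level(sensitivity: float) -> int:
    """Thresholds from this embodiment: <10 -> level 3, 10..50 -> level 2, >=50 -> level 1."""
    if sensitivity < 10:
        return 3
    if sensitivity < 50:
        return 2
    return 1

class TrieNode:
    """Illustrative prefix-tree node: one node per location, counting passing trajectories."""
    def __init__(self, loc=None):
        self.loc = loc
        self.count = 0
        self.children = {}

def build_prefix_tree(trajectories):
    """Insert every trajectory (a list of location ids) into the prefix tree TT."""
    root = TrieNode()
    for traj in trajectories:
        node = root
        for loc in traj:
            node = node.children.setdefault(loc, TrieNode(loc))
            node.count += 1
    return root
```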
Step S2, privacy budget is allocated according to the prefix tree: the privacy budget of the track sub-sequence is allocated according to the average sensitivity of the track sub-sequence, and the privacy budget of the location point is allocated according to the privacy level of the location point.
The invention allocates privacy budgets according to the sensitivity and privacy level of the location points; for locations with high check-in frequencies, the sensitivity is high, the privacy level is high, and the corresponding allocated privacy budget is small, thereby adding more noise to protect the location information. Specifically, the step S2 mainly includes the allocation of the privacy budget of the track sub-sequence and the allocation of the privacy budget of each sub-node on the track sub-sequence, specifically:
Firstly, the average sensitivity of each trajectory subsequence is calculated, from which the access frequency of the subsequence is obtained; the trajectory subsequences are then allocated a privacy budget according to the access frequency, where the higher the access frequency, the higher the sensitivity, and the allocated privacy budget is inversely proportional to the access frequency. Secondly, a privacy budget is allocated to each node according to the ratio of its privacy level to the total privacy level of the trajectory subsequence. Finally, since some location points appear in multiple trajectory subsequences, the repeatedly allocated privacy budgets are merged.
As shown in fig. 4, the implementation flow of the privacy budget allocation algorithm (Privacy Budget Allocation, PBA) in this embodiment is as follows:
Input: privacy budget epsilon and prefix tree TT
And (3) outputting: track set TB after privacy budget allocation
The privacy budget allocation scheme of the invention mainly comprises two steps: privacy budget allocation for the track sub-sequence and privacy budget allocation for each child node on the track sub-sequence. Firstly, calculating the average sensitivity of each track subsequence, so as to calculate the access frequency of the subsequence; the track sub-sequences are then allocated a privacy budget according to the access frequency, which should be inversely proportional to the access frequency, since the higher the access frequency the higher the sensitivity. Secondly, distributing privacy budget for each node according to the proportion of the privacy level of each node on each track subsequence to the total privacy level of the track subsequence; finally, since part of the position points appear in multiple track sub-sequences, the repeatedly allocated privacy budgets are merged.
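The sketch below illustrates this two-step PBA allocation under stated assumptions: subsequence budgets are made proportional to the inverse access frequency, node budgets follow the privacy-level ratio within each subsequence, and the budgets of a location appearing in several subsequences are merged by summation. The exact proportionality constants and the merge rule are not given in the text and are assumptions here; the example data are hypothetical.

```python
def allocate_privacy_budget(subsequences, node_privacy_level, total_epsilon):
    """Two-step PBA sketch.

    subsequences: list of (path, access_frequency) pairs extracted from the prefix tree;
    node_privacy_level: location id -> privacy level (1 = most sensitive, 3 = least)."""
    # Step 1: subsequence budgets, inversely proportional to access frequency.
    inverse = [1.0 / freq for _, freq in subsequences]
    total_inverse = sum(inverse)
    seq_budgets = [total_epsilon * w / total_inverse for w in inverse]

    # Step 2: within each subsequence, split the budget by the privacy-level ratio,
    # so a level-1 (most sensitive) node gets the smallest share and hence the most noise.
    node_budgets = {}
    for (path, _), eps_seq in zip(subsequences, seq_budgets):
        level_total = sum(node_privacy_level[loc] for loc in path)
        for loc in path:
            share = eps_seq * node_privacy_level[loc] / level_total
            # Step 3: merge the repeated allocations of a location that appears
            # in several subsequences (summation assumed).
            node_budgets[loc] = node_budgets.get(loc, 0.0) + share
    return node_budgets

# Hypothetical example: two subsequences sharing location "B".
budgets = allocate_privacy_budget(
    subsequences=[(["A", "B"], 5.0), (["C", "B", "D"], 2.0)],
    node_privacy_level={"A": 3, "B": 1, "C": 2, "D": 3},
    total_epsilon=1.0,
)
print(budgets)
```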
Step S3, adjusting the distribution of privacy budget: the probability of attack at the next moment is predicted by a Markov chain and the sensitivity is adjusted by calculating the probability, thereby adjusting the assigned privacy budget.
In the step S3, the attack probability of the location point is predicted by using the property of the markov chain, and the property of the markov chain corresponds to the track, that is, the location point at the next moment depends only on the location point at the previous moment, so as to obtain the possible location set at the next moment and the attack probability of the location, and further adjust the privacy budget.
The two most important components of a Markov chain are the initial state probability distribution and the state transition probability matrix. Assume that the set of possible locations generated by the user at time t−1, together with their probability values, constitutes the initial state probability distribution P^(t−1). Assuming that a user's trajectory contains n possible locations l_1, l_2, ..., l_n, the probability of a state transition from location l_i to location l_j is P(l_i→l_j), and the n×n matrix P = [P(l_i→l_j)] is the state transition probability matrix.
The possible locations at time t and their probability values are then computed as P^(t) = P^(t−1)·P, which gives the attack probability of each possible location at time t.
It is assumed that an attacker starts the attack from the initial location point of the trajectory and continues along the direction of the trajectory; the Markov chain is used to calculate the attack probability of the nodes on the prefix tree, and the sensitivity is adjusted according to the calculated probability, thereby adjusting the assigned privacy budget.
As shown in fig. 5, the implementation flow of the Privacy Budget Adjustment (PBAD) algorithm in this embodiment is as follows:
input: track set TB with allocated privacy budget
And (3) outputting: track set TC after privacy budget adjustment
The two most important components of the markov chain are the initial state probability distribution and the state transition matrix. The possible position set and probability value generated by the user at the time t-1 are initial state probability distribution, then a probability transition matrix is calculated according to the historical data, the attack probability of the nodes on the prefix tree is calculated by utilizing the property of the Markov process, and the sensitivity is adjusted by calculating the probability, so that the allocated privacy budget is adjusted.
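The following sketch shows the Markov-chain prediction step, P(t) = P(t−1)·P, and one possible way to shrink the budget of locations with a high predicted attack probability. The transition matrix values and the (1 − p) scaling rule are illustrative assumptions; the patent only states that the sensitivity, and hence the budget, is adjusted according to the computed probability.

```python
import numpy as np

def next_step_attack_probability(p_prev: np.ndarray, transition: np.ndarray) -> np.ndarray:
    """P(t) = P(t-1) . P: probability of each possible location being attacked at time t."""
    return p_prev @ transition

def adjust_budget(node_budgets, attack_prob, node_index):
    """Shrink the budget of locations with a high predicted attack probability
    (the (1 - p) scaling is an assumption, not the patent's exact rule)."""
    return {loc: eps * (1.0 - attack_prob[node_index[loc]])
            for loc, eps in node_budgets.items()}

# Hypothetical 3-location example: transition matrix estimated from historical trajectories;
# the attacker is assumed to start at the first location of the trajectory.
P = np.array([[0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6],
              [0.5, 0.3, 0.2]])
p_prev = np.array([1.0, 0.0, 0.0])
p_next = next_step_attack_probability(p_prev, P)
print(adjust_budget({"A": 0.2, "B": 0.3, "C": 0.5}, p_next, {"A": 0, "B": 1, "C": 2}))
```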
And S4, adding noise according to the privacy budget to change the privacy level of the position, so as to protect the privacy of the user track.
In the step S4, the Laplace mechanism is used to add noise corresponding to the sensitivity of each location, thereby changing the privacy level of the location; once the location privacy levels change, it is difficult for an attacker to infer the user's real preference for a location.
After the location privacy levels are changed, the interest score IG_{u,l} of user u at location l is calculated by formula (2), where S′_{u,l} and w′_score represent the location sensitivity and the location scoring weight after adding noise, respectively:
IG_{u,l} = S′_{u,l} × w′_score (2)
Because using the location sensitivity directly as the location score would make the score differences between locations too large and affect the accuracy of the result, IG_{u,l} is normalized to obtain the normalized location score IGN_{u,l}, and the user-location scoring matrix Matrix_IGN, whose element in row u and column l is IGN_{u,l}, is constructed.
After the scoring matrix Matrix_IGN is obtained, the similarity sim(u,v) between users is calculated with the Pearson correlation coefficient, and a user similarity matrix Matrix_sim is constructed, where sim(u,v) represents the similarity between user u and user v:
sim(u,v) = Σ_{l∈l(u,v)} (IGN_{u,l} − IGN‾_u)(IGN_{v,l} − IGN‾_v) / [ √(Σ_{l∈l(u,v)} (IGN_{u,l} − IGN‾_u)²) × √(Σ_{l∈l(u,v)} (IGN_{v,l} − IGN‾_v)²) ]
where l(u,v) represents the set of common check-in locations of user u and user v, and IGN‾_u represents the average location score of user u. Finally, according to the user similarity matrix Matrix_sim, the n users most similar to the target user are taken as its similar users; the locations in the similar users' location sets that the target user has not visited are sorted in descending order of score, and the top n locations are recommended to the target user.
As shown in fig. 6, the implementation flow of the Location Recommendation (LR) algorithm in this embodiment is as follows:
input: track set TC after privacy budget adjustment
And (3) outputting: location recommendation set LR
After the privacy budget allocated to each location of the user is obtained, the invention uses the Laplace mechanism to add the corresponding noise to the sensitivity of the location, changing its privacy level. The interest score of user u at location l is then calculated; because using the location sensitivity directly as the location score would make the score differences between locations too large and affect the accuracy of the result, the interest score is normalized. The similarity between users is then calculated with the Pearson correlation coefficient, and a user similarity matrix is constructed. Finally, according to the user similarity matrix, the n users most similar to the target user are taken as its similar users; the locations in the similar users' location sets that the target user has not visited are sorted in descending order of score, and the top n locations are recommended to the target user.
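A compact sketch of this LR step is shown below. It computes the Pearson similarity over the common check-in locations using each user's average location score, then recommends unvisited locations of the most similar users in descending order of score; how scores from several neighbours are combined is not specified in the text, so taking the maximum is an assumption, and all names are illustrative.

```python
import numpy as np

def pearson_similarity(scores_u: dict, scores_v: dict) -> float:
    """sim(u, v): Pearson correlation over the common check-in locations l(u, v),
    centred on each user's average location score."""
    common = set(scores_u) & set(scores_v)
    if not common:
        return 0.0
    mean_u = float(np.mean(list(scores_u.values())))
    mean_v = float(np.mean(list(scores_v.values())))
    du = np.array([scores_u[l] - mean_u for l in common])
    dv = np.array([scores_v[l] - mean_v for l in common])
    denom = np.sqrt((du ** 2).sum()) * np.sqrt((dv ** 2).sum())
    return float((du * dv).sum() / denom) if denom > 0 else 0.0

def recommend(target, user_scores, n_neighbors=5, n_rec=5):
    """Pick the n users most similar to the target, then return the target's unvisited
    locations in descending order of score (max over neighbours is an assumed merge rule)."""
    neighbors = sorted(((pearson_similarity(user_scores[target], user_scores[v]), v)
                        for v in user_scores if v != target), reverse=True)[:n_neighbors]
    visited = set(user_scores[target])
    candidates = {}
    for _, v in neighbors:
        for loc, score in user_scores[v].items():
            if loc not in visited:
                candidates[loc] = max(candidates.get(loc, 0.0), score)
    return sorted(candidates, key=candidates.get, reverse=True)[:n_rec]
```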
After the trajectory data are protected, their availability and effectiveness are verified with a location recommendation algorithm commonly used in LBS, showing that the invention can preserve the usability of the data while guaranteeing the privacy of the trajectory data.
The embodiment also provides a trace differential privacy protection system based on semantics and prediction, which comprises a memory, a processor and computer program instructions stored on the memory and capable of being executed by the processor, wherein the processor can realize the steps of the method when executing the computer program instructions.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the invention in any way, and any person skilled in the art may make modifications or alterations to the disclosed technical content to the equivalent embodiments. However, any simple modification, equivalent variation and variation of the above embodiments according to the technical substance of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (8)

1. The track differential privacy protection method based on semantics and prediction is characterized by comprising the following steps of:
Step S1, semantic sensitivity preprocessing: the semantic sensitivity of each semantically sensitive location is radiated to nearby nodes according to the distance and the access degree, so that the semantic sensitivity of each location point is obtained; the check-in count α_l of the user at node l and the semantic sensitivity Sem_l are combined as the location sensitivity of node l; the privacy level of each location point is determined by comparing its location sensitivity with preset thresholds; a prefix tree is then constructed from the trajectory set and the location sensitivity and privacy level of each location point;
Step S2, privacy budget is allocated according to the prefix tree: distributing the privacy budget of the track subsequence according to the average sensitivity of the track subsequence, and distributing the privacy budget of the position point according to the privacy level of the position point;
Step S3, adjusting the distribution of privacy budget: predicting attack probability at the next moment through a Markov chain, and adjusting sensitivity through calculating the probability, so as to adjust the distributed privacy budget;
S4, adding noise according to privacy budget to change the privacy level of the position, so as to protect the privacy of the user track;
in the step S2, a privacy budget is allocated according to the sensitivity and the privacy level of the location point; for a position with high sign-in frequency, the sensitivity is high, the privacy level is high, and the corresponding allocated privacy budget is less, so that more noise is added to protect the position information; the step S2 mainly includes privacy budget allocation of the track sub-sequence and privacy budget allocation of each sub-node on the track sub-sequence, specifically: firstly, respectively calculating the average sensitivity of each track subsequence, thereby calculating the access frequency of the subsequence; then, privacy budget is allocated to the track subsequence according to the access frequency, the higher the access frequency is, the higher the sensitivity is, and the allocated privacy budget is inversely proportional to the access frequency; secondly, distributing privacy budget for each node according to the proportion of the privacy level of each node on each track subsequence to the total privacy level of the track subsequence; finally, as partial position points appear in a plurality of track subsequences, merging the repeatedly allocated privacy budgets;
The two most important components of the Markov chain are the initial state probability distribution and the state transition probability matrix; assume that the set of possible locations generated by the user at time t−1, together with their probability values, constitutes the initial state probability distribution P^(t−1); assuming that there are n possible locations l_1, l_2, ..., l_n on the user's trajectory, the probability of a state transition from location l_i to location l_j is denoted P(l_i→l_j), and the n×n matrix P = [P(l_i→l_j)] is the state transition probability matrix;
the possible locations at time t and their probability values are then computed as P^(t) = P^(t−1)·P, which gives the attack probability of each possible location at time t;
it is assumed that an attacker starts the attack from the initial location point of the trajectory and continues along the direction of the trajectory; the Markov chain is used to calculate the attack probability of the nodes on the prefix tree, and the sensitivity is adjusted according to the calculated probability, thereby adjusting the assigned privacy budget.
2. The method for protecting trace differential privacy based on semantics and prediction according to claim 1, wherein in the step S1, sensitivity of the semantically sensitive location is respectively radiated to the nearby nodes according to the distance and the ingress and egress degree, taking into account the overall connectivity between the location points, specifically:
Firstly, the semantic location connection set neighborSet with a privacy level near an arbitrary location a is acquired; then the map is converted into an undirected graph; according to the distance and the access degree, the equivalent distance between a semantic location g_i and the arbitrary location a is g_i.EDis = ED × (c − 1), where ED is the Euclidean distance between g_i and a, c is the number of nodes on the shortest path between the two location nodes, and c − 1 is the number of line segments contained in that shortest path; neighborSet = {g_i | g_i.EDis < b}, where b is a threshold set by the user;
finally, the semantic sensitivity radiated by each semantic location g_i in the connection set neighborSet of the arbitrary location a is obtained, as shown in formula (1),
where Sem_a represents the semantic sensitivity assigned to node a.
3. The semantic and prediction based trajectory differential privacy protection method of claim 2, wherein, to facilitate computation, the map is gridded; the semantic sensitivity of each region in the map is then calculated by step S1, and a semantic sensitivity map Map_sen is generated.
4. The method according to claim 1, wherein in the step S3, the attack probability of a location point is predicted using the Markov property, which is applied to the location points in the trajectory in the sense that the location point at the next moment depends only on the location point at the previous moment; this yields the set of possible locations at the next moment and their attack probabilities, which are then used to adjust the privacy budget.
5. The method according to claim 1, wherein in step S4, the corresponding noise is added to the sensitivity of the location using the Laplace mechanism to change the privacy level of the location; as the location privacy level changes, it is difficult for an attacker to find out the real preference of the user for the location.
6. The semantic and prediction based trajectory differential privacy protection method of claim 5, wherein after the location privacy levels are changed, the interest score IG_{u,l} of user u at location l is calculated by formula (2), where S′_{u,l} and w′_score represent the location sensitivity and the location scoring weight after adding noise, respectively:
IG_{u,l} = S′_{u,l} × w′_score (2)
IG_{u,l} is normalized to obtain the normalized location score IGN_{u,l}, and the user-location scoring matrix Matrix_IGN, whose element in row u and column l is IGN_{u,l}, is constructed;
after the scoring matrix Matrix_IGN is obtained, the similarity sim(u,v) between users is calculated with the Pearson correlation coefficient, and a user similarity matrix Matrix_sim is constructed, where sim(u,v) represents the similarity between user u and user v:
sim(u,v) = Σ_{l∈l(u,v)} (IGN_{u,l} − IGN‾_u)(IGN_{v,l} − IGN‾_v) / [ √(Σ_{l∈l(u,v)} (IGN_{u,l} − IGN‾_u)²) × √(Σ_{l∈l(u,v)} (IGN_{v,l} − IGN‾_v)²) ]
where l(u,v) represents the set of common check-in locations of user u and user v, and IGN‾_u represents the average location score of user u; finally, according to the user similarity matrix Matrix_sim, the n users most similar to the target user are taken as its similar users; the locations in the similar users' location sets that the target user has not visited are sorted in descending order of score, and the top n locations are recommended to the target user.
7. The method according to claim 1, wherein in step S4, the feasibility and effectiveness of the method for protecting track privacy are verified by using a location recommendation algorithm in LBS.
8. A semantic and predictive trajectory differential privacy preserving system comprising a memory, a processor and computer program instructions stored on the memory and executable by the processor, the processor when executing the computer program instructions being capable of implementing the method steps of any one of claims 1 to 7.
CN202210190712.9A 2022-02-28 2022-02-28 Trajectory differential privacy protection method and system based on semantics and prediction Active CN114564747B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210190712.9A CN114564747B (en) 2022-02-28 2022-02-28 Trajectory differential privacy protection method and system based on semantics and prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210190712.9A CN114564747B (en) 2022-02-28 2022-02-28 Trajectory differential privacy protection method and system based on semantics and prediction

Publications (2)

Publication Number Publication Date
CN114564747A CN114564747A (en) 2022-05-31
CN114564747B true CN114564747B (en) 2024-04-23

Family

ID=81715656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210190712.9A Active CN114564747B (en) 2022-02-28 2022-02-28 Trajectory differential privacy protection method and system based on semantics and prediction

Country Status (1)

Country Link
CN (1) CN114564747B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116595254B (en) * 2023-05-18 2023-12-12 杭州绿城信息技术有限公司 Data privacy and service recommendation method in smart city
CN117910046A (en) * 2024-03-18 2024-04-19 青岛他坦科技服务有限公司 Electric power big data release method based on differential privacy protection

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257385A (en) * 2018-11-16 2019-01-22 重庆邮电大学 A kind of location privacy protection strategy based on difference privacy
CN110598447A (en) * 2019-09-17 2019-12-20 西北大学 T-close privacy protection method meeting epsilon-difference privacy
CN110750806A (en) * 2019-07-16 2020-02-04 黑龙江省科学院自动化研究所 TP-MFSA (TP-Multi-function document analysis) inhibition release-based high-dimensional position track data privacy protection release system and method
WO2020230061A1 (en) * 2019-05-14 2020-11-19 Telefonaktiebolaget Lm Ericsson (Publ) Utility optimized differential privacy system
CN112069532A (en) * 2020-07-22 2020-12-11 安徽工业大学 Track privacy protection method and device based on differential privacy
CN112560084A (en) * 2020-12-11 2021-03-26 南京航空航天大学 Differential privacy track protection method based on R tree
CN113254999A (en) * 2021-06-04 2021-08-13 郑州轻工业大学 User community mining method and system based on differential privacy
CN113361694A (en) * 2021-06-30 2021-09-07 哈尔滨工业大学 Layered federated learning method and system applying differential privacy protection
CN113934945A (en) * 2021-10-18 2022-01-14 暨南大学 Semantic-sensitive position track privacy protection method, system and medium for track publishing data set

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10546043B1 (en) * 2018-07-16 2020-01-28 Here Global B.V. Triangulation for K-anonymity in location trajectory data
US11763093B2 (en) * 2020-04-30 2023-09-19 Arizona Board Of Regents On Behalf Of Arizona State University Systems and methods for a privacy preserving text representation learning framework

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257385A (en) * 2018-11-16 2019-01-22 重庆邮电大学 A kind of location privacy protection strategy based on difference privacy
WO2020230061A1 (en) * 2019-05-14 2020-11-19 Telefonaktiebolaget Lm Ericsson (Publ) Utility optimized differential privacy system
CN110750806A (en) * 2019-07-16 2020-02-04 黑龙江省科学院自动化研究所 TP-MFSA (TP-Multi-function document analysis) inhibition release-based high-dimensional position track data privacy protection release system and method
CN110598447A (en) * 2019-09-17 2019-12-20 西北大学 T-close privacy protection method meeting epsilon-difference privacy
CN112069532A (en) * 2020-07-22 2020-12-11 安徽工业大学 Track privacy protection method and device based on differential privacy
CN112560084A (en) * 2020-12-11 2021-03-26 南京航空航天大学 Differential privacy track protection method based on R tree
CN113254999A (en) * 2021-06-04 2021-08-13 郑州轻工业大学 User community mining method and system based on differential privacy
CN113361694A (en) * 2021-06-30 2021-09-07 哈尔滨工业大学 Layered federated learning method and system applying differential privacy protection
CN113934945A (en) * 2021-10-18 2022-01-14 暨南大学 Semantic-sensitive position track privacy protection method, system and medium for track publishing data set

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Novel trajectory privacy-preserving method based on prefix tree using differential privacy; Xiaodong Zhao et al.; Knowledge-Based Systems; 2020-06-21; vol. 198; 1-15 *
PrivateCheckIn: a trajectory privacy-preserving method in mobile social networks (PrivateCheckIn:一种移动社交网络中的轨迹隐私保护方法); 霍峥 et al.; Chinese Journal of Computers (计算机学报); 2013-04-15; vol. 36(04); 716-726 *
Successive Trajectory Privacy Protection with Semantics Prediction Differential Privacy; Jing Zhang et al.; Topic Computational Complex Networks; 2022-08-23; vol. 24(9); 1-19 *
Research on location trajectory publication techniques based on the differential privacy model (基于差分隐私模型的位置轨迹发布技术研究); 冯登国 et al.; Journal of Electronics & Information Technology (电子与信息学报); 2020-01-15; vol. 42(01); 74-88 *
A survey of semantics-based privacy protection for location-based services (基于语义的位置服务隐私保护综述); 马明杰 et al.; Chinese Journal of Network and Information Security (网络与信息安全学报); 2016-12-15; vol. 2(12); 1-11 *

Also Published As

Publication number Publication date
CN114564747A (en) 2022-05-31

Similar Documents

Publication Publication Date Title
CN114564747B (en) Trajectory differential privacy protection method and system based on semantics and prediction
KR20220112766A (en) Federated Mixed Models
CN112148987A (en) Message pushing method based on target object activity and related equipment
CN114065287B (en) Track differential privacy protection method and system for resisting predictive attack
CN108834077B (en) Tracking area division method and device based on user movement characteristics and electronic equipment
JP2011171876A (en) Portable terminal for estimating address/whereabouts as user moves, server, program, and method
US11176217B2 (en) Taxonomy-based system for discovering and annotating geofences from geo-referenced data
Fujiwara et al. Fast and exact top-k algorithm for pagerank
US10552438B2 (en) Triggering method for instant search
CN112418525B (en) Method and device for predicting social topic group behaviors and computer storage medium
CN111339091A (en) Position big data differential privacy division and release method based on non-uniform quadtree
CN111460234A (en) Graph query method and device, electronic equipment and computer readable storage medium
CN113254999A (en) User community mining method and system based on differential privacy
Wang et al. Location-aware influence maximization over dynamic social streams
Gupta et al. SPMC-CRP: a cache replacement policy for location dependent data in mobile environment
Ning et al. Differential privacy protection on weighted graph in wireless networks
WO2019042275A1 (en) Method for determining movement mode within target regional range, and electronic device
Gupta Some issues for location dependent information system query in mobile environment
Donohoo et al. Exploiting spatiotemporal and device contexts for energy-efficient mobile embedded systems
CN112819157B (en) Neural network training method and device, intelligent driving control method and device
CN108696418B (en) Privacy protection method and device in social network
CN112699402A (en) Wearable device activity prediction method based on federal personalized random forest
CN116957678A (en) Data processing method and related device
CN110598122B (en) Social group mining method, device, equipment and storage medium
CN111667028B (en) Reliable negative sample determination method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant