CN107247909A - A differential privacy method for protecting multiple locations in location-based services - Google Patents

A differential privacy method for protecting multiple locations in location-based services

Info

Publication number
CN107247909A
CN107247909A (application CN201710433690.3A, granted as CN107247909B)
Authority
CN
China
Prior art keywords
privacy
user
consumption
point
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710433690.3A
Other languages
Chinese (zh)
Other versions
CN107247909B (en)
Inventor
朱马克
华景煜
仲盛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201710433690.3A priority Critical patent/CN107247909B/en
Publication of CN107247909A publication Critical patent/CN107247909A/en
Application granted granted Critical
Publication of CN107247909B publication Critical patent/CN107247909B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2111Location-sensitive, e.g. geographical location, GPS


Abstract

The present invention discloses a differential privacy method for protecting multiple locations in location-based services. It improves on the original geo-indistinguishability algorithm and proposes a Predict-and-Test Mechanism. By spending a small amount of privacy budget to build an approximation of the true location, the method reduces the total privacy consumption. It can greatly reduce privacy consumption while doing little damage to data availability. To evaluate the proposed mechanism, we ran experiments on two popular datasets. The results show that our mechanism does substantially reduce privacy consumption while preserving data availability.

Description

A differential privacy method for protecting multiple locations in location-based services
Technical field
The present invention relates to a differential privacy method for protecting multiple locations in location-based services, and belongs to the technical field of location-information security.
Background technology
In recent years, with the continuing spread of GPS-equipped smartphones, location-based services have come to play an increasingly important role in people's lives. Almost all smartphone applications use the user's location data, explicitly or implicitly, for a variety of reasons. For example, Facebook uses a user's location data to find friends near the user, and news applications use it to push local news.
Unfortunately, although location-based services bring great convenience to our lives, they also cause serious privacy concerns. Users are generally reluctant to expose their real-time location data to third parties, including the service provider. Once users upload their location data, they lose control over what third parties do with it. A malicious third party may use location data to track a user and thereby infer the user's home address, places of interest, and even sensitive information such as health status or religious beliefs. We therefore urgently need privacy-preserving location-based services that hide the user's location while still guaranteeing high-quality service.
Existing solutions to this problem fall into two broad classes. The first class is cryptographic methods, which encrypt the location data before it is uploaded. They can fully protect the user's privacy and provide provable protection. However, encryption severely damages data availability, making it hard for the service provider to deliver valuable location-based services. Moreover, cryptographic methods are usually time-consuming, which is unacceptable on a handheld device. The second class of methods perturbs the original data so that a third party cannot obtain the user's exact location. Compared with cryptographic methods, perturbation is more lightweight and damages data availability far less, so third parties are also more willing to accept it. However, because perturbation still reveals inexact data about the user, its security remains open to question.
Recently, Andres et al. proposed geo-indistinguishability, the first formal privacy model based on location perturbation. The model is derived from differential privacy and provides provable privacy guarantees. Specifically, a perturbation mechanism satisfies geo-indistinguishability if any two locations whose distance is below a given threshold produce the same output location with similar probabilities, so that a third party cannot identify the user's true location among a set of nearby locations. Andres et al. designed a corresponding algorithm that satisfies geo-indistinguishability by adding planar Laplace noise. However, that mechanism was designed primarily to protect a single location. If it is applied directly to protect multiple locations, the total privacy consumption grows with the number of locations, which means the user can issue only a very limited number of location-service queries; otherwise the privacy budget is exhausted and the user's privacy is destroyed. This limitation is unacceptable, because in real life many location-based service providers use a user's location data many times within a day or even an hour. For example, a driver using online navigation may issue a location query every few seconds to obtain real-time road information. We therefore urgently need to improve the existing geo-indistinguishability perturbation mechanism so that privacy consumption is reduced as far as possible while the original data availability is preserved.
Summary of the invention
Objective: As location-based services become more and more popular, the privacy problems they bring become increasingly prominent, and users are generally reluctant to expose their locations to third parties (including the service provider). To solve this problem, geo-indistinguishability, a variant of differential privacy, was proposed. This privacy model adds noise to the original location so that nearby locations produce the same output location with similar probabilities. However, the method was originally designed to protect a single location. If it is used directly to protect multiple locations, the total privacy consumption grows sharply with the number of locations, so the user can issue only a very limited number of location queries. The present invention provides a differential privacy method for protecting multiple locations in location-based services: it improves the original geo-indistinguishability algorithm and proposes the Predict and Test Mechanism (PTM). By spending a small amount of privacy budget to build an approximation of the true location, the method reduces the total privacy consumption, and it can do so greatly while doing little damage to data availability. To evaluate the proposed mechanism, we ran experiments on two popular datasets. The results show that our mechanism does substantially reduce privacy consumption while preserving data availability.
Technical scheme: A differential privacy method for protecting multiple locations in location-based services comprises the following steps:
Step 1: Use a threshold R (typically set to 1-2 times the radius of the variation circle) to test the accuracy of the prediction point and decide the published point. The input is the sequence of true trajectory points X1, X2, ... that the user wants to protect. Sample a value l from the gamma distribution gamma(2, 1/ε), whose density is f(x) = ε²·x·e^(-εx) for x ≥ 0 (draw a sample x and set l = x); ε is the privacy parameter specified by the user.
Step 2: Sample a value θ from the uniform distribution U(0, 2π), whose density is f(x) = 1/(2π) for 0 ≤ x ≤ 2π (draw a sample x and set θ = x).
Step 3: Let Y1 = X1 + (l·cosθ, l·sinθ).
Step 4: Output Y1.
Step 5: Let i = 2, s = 0, f = 1.
Step 6: Sample l from the gamma distribution gamma(2, 1/ε).
Step 7: Sample θ from the uniform distribution U(0, 2π).
Step 8: Let Yi = Xi + (l·cosθ, l·sinθ).
Step 9: Construct the prediction point P and draw a random number ran in (0, 1). If ran < s/(s+f): publish Yi = P, set i = i+1, and return to step 6; otherwise go to step 10.
Step 10: If dis(Yi, P) < R: publish Yi = P and set s = s+1; otherwise publish Yi and set f = f+1.
Step 11: Set i = i+1 and return to step 6.
The prediction point P is constructed as follows:
When the user is static, Pi = Yi-1;
When the user moves slowly (generally meaning that two consecutive true positions are at most R apart), Pi = Yi-1;
When the user moves fast (generally meaning that two consecutive true positions are more than R apart), Pi = 2Yi-1 - Yi-2.
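The three rules can be sketched as a small helper. This is a minimal sketch under our own assumptions: points are (x, y) tuples, and the slow/fast distinction is approximated from the last two published points rather than the true positions (which the predictor must not read); the function name `predict` is ours, not the patent's.

```python
import math

def predict(published, R):
    """Build the prediction point P from already-published points:
    reuse the last point for static/slow movement, linearly
    extrapolate for fast movement."""
    if len(published) < 2:
        return published[-1]               # no history to extrapolate from
    (x1, y1), (x2, y2) = published[-2], published[-1]
    if math.dist((x1, y1), (x2, y2)) <= R:
        return (x2, y2)                    # static or slow: P_i = Y_{i-1}
    return (2 * x2 - x1, 2 * y2 - y1)      # fast: P_i = 2*Y_{i-1} - Y_{i-2}
```

Linear extrapolation for fast users simply assumes the user keeps moving with the same displacement per query, which matches the constant-speed scenario (e.g., driving) the patent targets.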
Beneficial effects: Compared with the prior art, the differential privacy method for protecting multiple locations in location-based services provided by the present invention has the following advantages:
(1) When a user wants to issue a new round of location queries, the present invention first predicts the user's current location from the history of past location queries (that is, the user's previously published points; since points are processed one at a time, the previously published points can be regarded as the query history). The predicted location is then compared with the location obtained by the original geo-indistinguishability perturbation mechanism. If the distance between the two locations is smaller than a predefined threshold, the prediction is considered successful and the predicted value is used for the location query. Otherwise, the prediction is considered failed, and the location obtained by the original geo-indistinguishability perturbation mechanism is used. We prove that the mechanism still satisfies geo-indistinguishability, and that it can significantly reduce privacy consumption.
(2) Simple and efficient prediction methods are designed for the three main scenarios (static users, slow users, and fast users). We also add a skip strategy for further improvement: this step senses the availability of the user's published points and tries to trade a certain amount of availability for a further reduction in privacy consumption. With the above methods, privacy consumption is greatly reduced, and in some scenarios it even drops to a constant level.
(3) Experiments were carried out on real datasets to verify the performance of the present invention. The experiments are based on two popular datasets, Geolife and T-drive, and were run under the three scenarios of static users, slow users, and fast users, evaluating the corresponding privacy consumption and availability. The experimental results show that the present invention saves 98%, 81%, and 55% of the privacy budget under the three scenarios, respectively, while the availability of the user data is not much affected.
Brief description of the drawings
Fig. 1(a) is a schematic diagram of the location of Y' when the test succeeds;
Fig. 1(b) is a schematic diagram of the location of Y' when the test fails;
Fig. 2(a) is the curve of εc versus d when the test succeeds;
Fig. 2(b) is the curve of εc versus R when the test succeeds;
Fig. 3(a) is the curve of the test-success probability versus d;
Fig. 3(b) is the curve of the test-success probability versus R;
Fig. 4(a) is the cumulative distribution function of the privacy consumption for static users at R=100;
Fig. 4(b) is the cumulative distribution function of the error for static users at R=100;
Fig. 4(c) is the cumulative distribution function of the privacy consumption for static users at R=200;
Fig. 4(d) is the cumulative distribution function of the error for static users at R=200;
Fig. 4(e) is the cumulative distribution function of the privacy consumption for static users at R=300;
Fig. 4(f) is the cumulative distribution function of the error for static users at R=300;
Fig. 5(a) is the cumulative distribution function of the privacy consumption for slow users at R=100;
Fig. 5(b) is the cumulative distribution function of the error for slow users at R=100;
Fig. 5(c) is the cumulative distribution function of the privacy consumption for slow users at R=200;
Fig. 5(d) is the cumulative distribution function of the error for slow users at R=200;
Fig. 5(e) is the cumulative distribution function of the privacy consumption for slow users at R=300;
Fig. 5(f) is the cumulative distribution function of the error for slow users at R=300;
Fig. 6(a) is the cumulative distribution function of the privacy consumption for fast users at R=100;
Fig. 6(b) is the cumulative distribution function of the error for fast users at R=100;
Fig. 6(c) is the cumulative distribution function of the privacy consumption for fast users at R=200;
Fig. 6(d) is the cumulative distribution function of the error for fast users at R=200;
Fig. 6(e) is the cumulative distribution function of the privacy consumption for fast users at R=300;
Fig. 6(f) is the cumulative distribution function of the error for fast users at R=300.
Embodiment
The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments are intended only to illustrate the present invention and not to limit its scope. After reading the present invention, modifications of various equivalent forms made by those skilled in the art fall within the scope defined by the claims appended to this application.
Differential privacy and location-information protection
This part introduces the concept of differential privacy and its extension to location-based services (i.e., geo-indistinguishability in the prior art). It then analyzes the expected privacy consumption and expected error of the perturbation mechanism designed by Andres et al. These will serve as the baselines in our experiments.
Differential privacy
Because it provides provable privacy guarantees, differential privacy has become a very popular privacy model. Its key idea is that the output of a statistical database should not change much when at most one data record is changed. In other words, the absence of an individual's data has only a very limited influence on the whole system, thereby protecting individual privacy. The standard definition of differential privacy is as follows:
Definition (ε-differential privacy): A release mechanism A satisfies ε-differential privacy if and only if for any adjacent datasets D and D' (that is, datasets differing in at most one record) and any output Z ∈ range(A), we have:
Pr[A(D) = Z] ≤ e^ε · Pr[A(D') = Z]
The ε in the formula measures the privacy level of the release mechanism: the smaller the value, the stronger the privacy guarantee of the mechanism.
To satisfy differential privacy, we need to add noise to the original output of the database. The magnitude of the noise is determined by the sensitivity, which is defined as follows:
Definition (sensitivity): Given a query function f: D → R^d, its sensitivity over any adjacent datasets D and D' is defined as Δf = max over D, D' of ||f(D) − f(D')||₁.
The Laplace mechanism is a commonly used algorithm for achieving differential privacy. It does so by adding noise drawn from a particular Laplace distribution to each output, namely:
Definition (Laplace mechanism): For any query function f: D → R^d, if a mechanism A outputs f(D) + Lap(Δf/ε), then A satisfies ε-differential privacy.
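As a concrete illustration of the Laplace mechanism, noise of scale Δf/ε can be drawn by inverse-transform sampling from a uniform variable. A minimal sketch under our own naming (`laplace_mechanism` is not from the patent):

```python
import math
import random

def laplace_mechanism(query_value, sensitivity, eps):
    """Release query_value + Lap(sensitivity/eps) noise, which gives
    eps-differential privacy for a query with the given L1 sensitivity."""
    b = sensitivity / eps                    # Laplace scale parameter
    u = random.random() - 0.5                # uniform on [-0.5, 0.5)
    # Inverse CDF of the Laplace(0, b) distribution
    noise = -b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return query_value + noise
```

For example, a counting query has sensitivity 1, so releasing `laplace_mechanism(count, 1.0, eps)` satisfies ε-differential privacy.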
However, applying standard differential privacy directly to location protection is very difficult, because locations have no notion of adjacency, and the distance between different locations is arbitrary and continuous. To apply differential privacy to location protection, we need the generalized definition of differential privacy, as follows:
Definition (generalized ε-differential privacy): A release mechanism A satisfies generalized ε-differential privacy if and only if for any two datasets D and D' differing in at most k records and any output Z ∈ range(A), we have: Pr[A(D) = Z] ≤ e^(kε) · Pr[A(D') = Z].
This privacy definition shows that the larger the change in the database, the larger the change we allow in the probability distribution of the output. This is the basis for applying differential privacy to location protection.
Location privacy protection
Considering the characteristics of two-dimensional space and the needs of practical applications, we need to make a few changes when applying differential privacy to location privacy.
First, location release has no notion of a query function f. To simplify the problem, we can assume that there is a query function, and that it is the identity function; that is, its output equals its input. This makes subsequent definitions, such as sensitivity, convenient.
Second, location release has no notion of adjacent locations, which is also why we use generalized differential privacy. We need a metric to describe the difference between different locations, analogous to two databases differing in some number of records. Obviously, Euclidean distance is a good choice for describing this difference.
Finally, we cannot protect the privacy of every point on the planet. We need to preset a threshold 2r and only protect the privacy of positions whose mutual distance is at most 2r. In other words, we draw a circle centered at the true position with radius r, and we only need to guarantee that any two points within the circle produce the same output with similar probabilities. We call this circle the variation circle C.
Based on the above, we give the following definition:
Definition (geo-indistinguishability): A release mechanism A satisfies generalized ε-geo-indistinguishability if and only if for any two positions X1 and X2 in a variation circle C of radius r and any output position Y, we have: Pr[A(X1) = Y] ≤ e^(ε·dis(X1,X2)) · Pr[A(X2) = Y].
Obviously, for any two points X1 and X2 in C, we have dis(X1, X2) ≤ 2r. Here, dis(X1, X2) denotes the Euclidean distance between the two points.
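To make the definition concrete, note that the planar-Laplace noise used below has output density p(Y|X) = ε²/(2π) · e^(−ε·dis(Y,X)). This closed form is our own derivation from the sampling procedure (the radius density ε²·l·e^(−εl) spread uniformly over a circle of circumference 2πl), so the density ratio for two inputs is e^(ε·(dis(Y,X2)−dis(Y,X1))), which the triangle inequality bounds by e^(ε·dis(X1,X2)). A quick numerical check:

```python
import math

def planar_laplace_pdf(y, x, eps):
    # Density of the planar-Laplace output at y for true point x
    # (our assumed closed form, derived from the sampling procedure)
    return eps ** 2 / (2.0 * math.pi) * math.exp(-eps * math.dist(y, x))

eps = 0.02
x1, x2 = (0.0, 0.0), (30.0, 40.0)          # dis(x1, x2) = 50
y = (120.0, -35.0)                          # an arbitrary output point
ratio = planar_laplace_pdf(y, x1, eps) / planar_laplace_pdf(y, x2, eps)
bound = math.exp(eps * math.dist(x1, x2))   # e^(eps * dis(x1, x2))
```

Since the bound holds for every output point Y, the mechanism treats any two inputs at distance d almost interchangeably whenever ε·d is small.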
To satisfy geo-indistinguishability, we need to add a noise vector to the true position. Andres et al. describe a simple and general mechanism, as follows:
Algorithm 1 (continuous mechanism for geo-indistinguishability):
Input: true point X; privacy parameter ε;
Output: published point Y;
1: Sample l from the gamma distribution gamma(2, 1/ε);
2: Sample θ from the uniform distribution U(0, 2π);
3: Y = X + (l·cosθ, l·sinθ);
4: Output the published point Y.
The correctness of the algorithm has been proved; see the paper by Andres et al. For convenience, we call this algorithm the GICM algorithm.
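The sampling steps of Algorithm 1 translate directly into code. A minimal sketch, assuming points are (x, y) tuples measured in the same distance units as 1/ε:

```python
import math
import random

def gicm(x, eps):
    """GICM perturbation of a true point x = (x1, x2): radius
    l ~ gamma(shape=2, scale=1/eps), angle theta ~ U(0, 2*pi)."""
    l = random.gammavariate(2.0, 1.0 / eps)
    theta = random.uniform(0.0, 2.0 * math.pi)
    return (x[0] + l * math.cos(theta), x[1] + l * math.sin(theta))
```

Each call consumes privacy budget ε, which is exactly the accumulation problem over multiple queries that the Predict and Test Mechanism below addresses.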
Expected privacy consumption and expected error
Much previous work uses ε as the baseline for comparison in experiments. However, because of the particularity of location privacy protection, we find that using ε directly to measure the security level of a release mechanism is problematic. Our analysis is as follows: the privacy consumption, which we denote εc, is a better measure of privacy strength. In the Laplace mechanism, we can compute it as εc(Z) = |ln(Pr[A(D) = Z] / Pr[A(D') = Z])|.
The εc in the formula shows how much new knowledge we can obtain by observing the output Z. In the Laplace mechanism, this value equals ε. Likewise, for the GICM algorithm we can define the privacy consumption εc in the same way.
The above definition shows how much new knowledge we obtain after observing the output point Y. However, unlike in the Laplace mechanism, this value is not equal to the ε in GICM. Note that ε applies to all possible outputs Y, whereas εc is defined for one specific output Y. Take the GICM algorithm as an example. Suppose the input point is X, the output point is Y, the radius of the variation circle C is r, and the distance between X and Y is l. If Y falls outside the variation circle C, i.e., l ≥ r, we can compute εc explicitly.
That case seems unproblematic. But when l < r, the situation changes. We can still compute εc,
and the resulting value is smaller than the privacy parameter ε. This shows that when the output point Y falls inside the variation circle, it does not leak as much information as we expect. Therefore, ε alone cannot accurately describe how much information a mechanism leaks. Considering that the output point Y has its own probability distribution, we choose the expectation E(εc) to measure the amount of information the mechanism leaks. Applying this to the GICM algorithm, we obtain:
Theorem: For the GICM algorithm, the expected privacy consumption E(εc) decreases from ε to ε/2 as the radius r of the variation circle grows, where ε is the privacy parameter.
Proof: We need to consider two cases: Y falls inside the variation circle, and Y falls outside it. Taking X as the origin of coordinates, we integrate εc against the distribution of Y over the two regions.
As r increases, the expected value E(εc) decreases from ε to ε/2. This agrees with our intuition: when r is close to 0, the vast majority of output points Y fall outside the variation circle, so E(εc) is close to ε; when r becomes large enough, most output points Y can be approximately regarded as falling at the center of the variation circle, so E(εc) is close to ε/2.
Besides the expected privacy consumption, the expected distance between the true point and the published point, i.e., the expected error E(error), must also be considered, since it reflects the availability of the published point. Computing E(error) is not difficult:
Theorem: For the GICM algorithm, the expected error is E(error) = 2/ε.
Proof: Again taking X as the origin of coordinates, the error is exactly the sampled radius l, which follows the gamma distribution gamma(2, 1/ε), so E(error) = ∫₀^∞ l · ε²l·e^(−εl) dl = 2/ε.
This explains why the privacy parameter ε cannot be set too small; otherwise the availability of the published point would be seriously affected.
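The theorem is easy to check numerically: the error of GICM equals the sampled radius l ~ gamma(2, 1/ε), so its Monte Carlo mean should approach 2/ε. A quick sanity check (the sample size and seed are our choices):

```python
import random

random.seed(1)
eps = 0.02
n = 50000
# The distance between the true and published point is exactly the
# sampled radius l, so averaging l estimates E(error) = 2/eps = 100.
mean_error = sum(random.gammavariate(2.0, 1.0 / eps) for _ in range(n)) / n
```

With ε = 0.02 the expected error is 100 distance units, which illustrates the availability cost of choosing a very small ε.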
Predict and Test Mechanism
Although the GICM algorithm fits the single-point case perfectly, applying it directly to multiple points, such as trajectory data, raises a privacy problem. By the composition property of differential privacy, if every point on the trajectory satisfies ε-differential privacy, the total privacy consumption accumulates: the privacy consumption of the whole trajectory is nε, where n is the number of points on the trajectory. Therefore, if we protect our privacy directly with the GICM algorithm, we cannot issue many location-service queries before the privacy budget is quickly exhausted.
In view of the above discussion, reducing privacy consumption is very meaningful. Since we consider an online model, we can only process points one at a time. The problem then becomes: how do we reduce privacy consumption when processing one point? The answer is: use historical data. Here, historical data refers to data that has already been published and that an attacker could already use to infer the user's true position (note that using it leaks no extra privacy, because we use no information about the true point). From this historical data we can compute a prediction point. Although the prediction point is unlikely to fall exactly on the true point, as long as our prediction is good enough, the distance between the prediction point and the true point will not be too large. An intuitive approach is to release the prediction point directly as the published point. The advantage is obvious: since we use no information about the true point at all, the privacy consumption is 0. However, because each prediction is based on the previous data, which in turn is based on the data before it, prediction errors accumulate and eventually make the prediction unusable. Therefore, we want to use a small amount of information about the true position to correct the accumulated prediction error without incurring much extra privacy consumption. Most importantly, the method we design must still satisfy geo-indistinguishability, i.e., differential privacy.
Based on the above, we design the Predict and Test Mechanism (denoted PTM). Our algorithm still takes the true point X and the privacy parameter ε as input, and outputs a position Y. Algorithm 2 shows the three main steps. First, it uses Algorithm 1 to generate a noisy point Y' from X, which we call the candidate published point. Then it uses the historical knowledge available to an attacker to generate a prediction point P; how prediction points are generated is discussed in the next section. In the final step, the algorithm uses a threshold R to test the accuracy of the prediction point and decide the published point. Specifically, if the distance between Y' and P is smaller than R, the prediction is considered successful, and P is set as the final published point; we call this a test success, as shown in Fig. 1(a). Otherwise, the prediction is considered failed, and Y' is used as the final published point; we call this a test failure, as shown in Fig. 1(b).
Algorithm 2 (Predict and Test Mechanism):
Input: true point X; threshold R; privacy parameter ε;
Output: published point Y;
1: Sample l from the gamma distribution gamma(2, 1/ε);
2: Sample θ from the uniform distribution U(0, 2π);
3: Y' = X + (l·cosθ, l·sinθ);
4: Construct the prediction point P;
5: If dis(Y', P) < R, publish P; otherwise publish Y'.
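One round of Algorithm 2 can be sketched as follows; the helper returns both the published point and whether the test succeeded, and the prediction point P is supplied by the caller (how it is built is described in a later section). The function name `ptm_step` is ours:

```python
import math
import random

def ptm_step(x, p, R, eps):
    """Predict and Test Mechanism for one true point x: perturb x with
    planar-Laplace noise, publish the prediction p if the noisy point
    lands within R of it, otherwise publish the noisy point itself."""
    l = random.gammavariate(2.0, 1.0 / eps)
    theta = random.uniform(0.0, 2.0 * math.pi)
    y_prime = (x[0] + l * math.cos(theta), x[1] + l * math.sin(theta))
    if math.dist(y_prime, p) < R:
        return p, True        # test success: release the prediction
    return y_prime, False     # test failure: fall back to GICM output
```

Note that the test compares P with the noisy point Y', never with the true point X directly, which is what allows the mechanism to remain geo-indistinguishable.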
Theorem: The Predict and Test Mechanism satisfies geo-indistinguishability.
Proof: We use Fig. 1 for illustration. In the figure, P is taken as the origin of coordinates, so the coordinates of X are (−d, 0).
When the test fails, the algorithm falls back to the GICM algorithm, which clearly satisfies geo-indistinguishability. When the test succeeds, the calculation is somewhat more involved. If P is published, then Y' must lie in the test circle, whose center is P and whose radius is R. We can therefore compute Pr(A(X) = P) as the integral of the noise density over the test circle.
We need to find the maximum and minimum of Pr(A(X) = P). It can be seen that this probability decreases as d increases; that is, the farther X is from P, the smaller Pr(A(X) = P) is. We can find the point in the variation circle that is closest to P and denote its distance to P by dmin; we define dmax in the same way. Then, for any two points X1 and X2 in the variation circle C, we can bound the ratio of Pr(A(X1) = P) to Pr(A(X2) = P).
By the triangle inequality we have dmax − dmin ≤ 2r, which bounds the ratio of the two probabilities as required by the definition of geo-indistinguishability.
This completes the proof.
Note that we cannot compute E(εc) and E(error) of Algorithm 2 in general, because the prediction method is unspecified. For the analysis, however, we can compute the privacy consumption and error of the algorithm for a given prediction point P.
Theorem: For a given prediction point P, the privacy consumption εc and the error of the algorithm have closed-form expressions, one for the test-success case and one for the test-failure case.
Proof: The computation of εc in the test-success case follows from the previous proof; εc in the test-failure case follows from the section on the expected privacy consumption and error. The computation of the error is trivial and therefore omitted.
Because the formula for the privacy consumption is rather complex, we cannot see directly how our algorithm reduces privacy consumption. We therefore ran a series of experiments to observe how the privacy consumption εc changes with the other variables. We set ε = 0.02, d = 100 (the distance between the true point and the prediction point), R = 300, and r = 100. Fig. 2 shows how εc changes with d and R when the test succeeds. The straight line in Fig. 2 marks the E(εc) of the GICM algorithm, which we use as the baseline. Fig. 3 shows how the probability of a test success changes with d and R.
From Fig. 2 (a) it may be seen that εcIncrease with d increase, and be finally reached ε.When d is less than 370, εcIt is small In GICM E (εc).When d is less than 150, εcOnly about GICM E (εc) 1/10th.Fig. 3 (a) shows that d is smaller, test Successful possibility is bigger.Therefore, if our foreseeable comparisons are accurate, εcIt can become very small.
Fig. 2 (b) shows εcReduced with d increase, and be finally reached 0.And the probability that Fig. 3 (b) displays are successfully tested Increase with R increase and be finally reached 1.Therefore a privacy that can be not only reduced than larger R when being successfully tested disappears Consumption, can also increase the probability being successfully tested.But R can not be too big, and otherwise error can become to be difficult to receive.
Prediction method and further improvement
As introduced above, our method relies on historical data for prediction. In this section we present our prediction method, together with an improvement that further reduces the privacy consumption. These two parts ensure that our algorithm works well in practice.
Prediction method
We first make one point clear: no single method fits all scenarios. We therefore mainly consider three scenarios and design a prediction method for each. The three scenarios are: static users, slow users, and fast users. For convenience, we write X_1, X_2, ..., X_n for the true points to be protected, Y_1, Y_2, ..., Y_n for the published points, and P_1, P_2, ..., P_n for the corresponding prediction points. Suppose now that we have published Y_{i-1} and want to protect the point X_i; the first thing we must do is predict the point P_i.
Let us first consider the extreme case: the user is static. This genuinely happens in real life; for example, a person sits in a coffee shop, a restaurant, or a cinema while querying nearby places. An ideal strategy is to protect the first point with GICM and then reuse the first published point for all subsequent queries. Only the first point then generates privacy consumption, and the total privacy consumption drops to a constant. Inspired by this, we set P_i = Y_{i-1}. Although this looks very simple, it works well in practice, for a simple reason: the previous published point Y_{i-1} is not far from the previous true point X_{i-1} (otherwise that published point would have no utility), and X_i = X_{i-1}, so the distance between Y_{i-1} and X_i is also small. Combined with the skip mechanism introduced in the next section, this brings the privacy consumption of this scenario down to a constant.
The second scenario is a user who moves, but very slowly. This scenario is also very common, for example a user walking while using a phone. We find that the prediction method of the static scenario also applies here, because a slow user can be approximated as static. Another explanation: Y_{i-1} is close to X_{i-1}, and X_{i-1} is in turn close to X_i, so Y_{i-1} is also close enough to X_i.
The fast user is a bit more complicated. The previous prediction method no longer works well, because X_{i-1} and X_i are usually far apart; the relation between consecutive true points becomes very weak, so accurate prediction is difficult. In some scenarios we can still predict fairly accurately, using P_i = 2Y_{i-1} − Y_{i-2}. This method is based on the idea that the user's heading and speed do not change much, so we may approximately assume that the user travels the same distance in the same direction each time.
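The three per-scenario predictors can be sketched in a few lines; the function name, the scene labels, and the tuple representation of points are our illustrative choices, not part of the patent text:

```python
def predict(published, scene):
    # published = [Y_1, ..., Y_{i-1}] as (x, y) tuples.
    y1 = published[-1]
    if scene in ("static", "slow"):
        return y1                                      # P_i = Y_{i-1}
    # fast: assume roughly constant heading and step length
    y2 = published[-2]
    return (2 * y1[0] - y2[0], 2 * y1[1] - y2[1])      # P_i = 2Y_{i-1} - Y_{i-2}

assert predict([(3.0, 4.0)], "static") == (3.0, 4.0)
assert predict([(0.0, 0.0), (10.0, 5.0)], "fast") == (20.0, 10.0)
```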
One question remains: how do we know which scenario we are currently in? Fortunately, this question is not hard to answer. We can determine the scenario using the published points alone. If the published points are very dense, we know we are in the slow or even the static scenario; likewise, sparse published points indicate the fast scenario. Note that this procedure causes no extra privacy consumption, because the attacker can also see the published points.
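A sketch of such scene detection from the published points alone might look as follows; the gap thresholds (in meters) are illustrative assumptions, not values given in the text:

```python
import math

def detect_scene(published, static_gap=5.0, fast_gap=150.0):
    # Classify the motion using only published points, so no extra
    # privacy is consumed (the attacker sees these points too).
    if len(published) < 2:
        return "static"
    gaps = [math.dist(a, b) for a, b in zip(published, published[1:])]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap < static_gap:
        return "static"
    return "slow" if mean_gap < fast_gap else "fast"

assert detect_scene([(0, 0), (1, 0), (2, 0)]) == "static"
assert detect_scene([(0, 0), (60, 0)]) == "slow"
assert detect_scene([(0, 0), (300, 0)]) == "fast"
```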
Further improvement
Our goal is to reduce the privacy consumption as much as possible without significantly increasing the error. In our experiments, however, we found a very interesting phenomenon: the privacy consumption and the error sometimes decrease together. This indicates that there is still room to reduce the privacy consumption; privacy and error remain a trade-off. Since the current error is smaller than what we need, we can reduce the privacy consumption further by sacrificing some utility. In other words, our mechanism can be improved further.
The first question is: how do we detect this phenomenon? This step matters, because the phenomenon does not always occur. A direct method is to compute the errors of past published points and decide based on their average. Doing so, however, incurs extra privacy consumption, because it uses the true points, which the attacker does not know. A more suitable method is to use the test success rate, defined here as the number of successful tests divided by the total number of tests. This variable reflects whether the error is small, because the smaller the error, the higher the test success rate. This step causes no extra privacy consumption, because the attacker can also derive the test success rate.
The second question is: how do we reduce the privacy consumption by sacrificing some utility? A direct method is to adjust the parameters dynamically, for example by lowering the privacy parameter ε or enlarging the test radius R. We found these methods unsatisfactory, because the privacy they save is not significant. We therefore adopt a simple and efficient method: skip the test step entirely. When we find the error to be small enough, we directly publish the prediction point. The privacy consumption then drops to 0, because the true point is not used at all. This does not increase the error too much, because in the vast majority of cases our prediction is not far off.
Based on the discussion above, our skip mechanism is as follows. We use the variable s to count the successful tests and f to count the failed tests. For the first point, we run the original GICM algorithm and set s = 0, f = 1. For each subsequent point, we run PTM: after predicting a point, we draw a random number ran in (0, 1). If ran < s/(s+f), we directly skip the test step and publish the prediction point, leaving s and f unchanged; otherwise, we perform the test and update s and f according to the result. In short, the skip probability is exactly the test success rate.
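One publication round of this skip mechanism can be sketched as follows; `perturb` stands in for the GICM noise step and `rng` for the uniform draw, both passed in (our choice, for determinism) rather than fixed:

```python
import math

def publish_round(true_pt, pred, perturb, s, f, R, rng):
    # Skip the test with probability s/(s+f), the past test success rate.
    if rng() < s / (s + f):
        return pred, s, f              # prediction published, no true point used
    y = perturb(true_pt)               # noisy candidate Y_i
    if math.dist(y, pred) < R:         # test: does Y_i land in the test circle?
        return pred, s + 1, f
    return y, s, f + 1

# With s=0, f=1 the test always runs; an accurate prediction succeeds:
out, s, f = publish_round((0, 0), (10, 0), lambda p: p, 0, 1, 100, lambda: 0.99)
assert (out, s, f) == ((10, 0), 1, 1)
```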
Experiments
To evaluate the practical performance of our algorithm, we ran experiments on two well-known data sets. We consider three scenarios: static users, slow users, and fast users. We compare the privacy consumption and error with those of the GICM algorithm.
Basic setup
We first introduce the two data sets used:
(1) Geolife. This data set was collected by Microsoft Research Asia. It contains 182 users and 17621 trajectories, with a total length of 1292951 km and a total duration of 50176 hours (from April 2007 to August 2012). Most locations are in Beijing, China. We mainly use this data set for the slow scenario, because the vast majority of the movement in it is slow.
(2) T-Drive. This data set contains one week of trajectories of 10357 taxis, with a total length of 9000000 km and a total of 15000000 location points. Because taxis move quickly, we use this data set for fast users.
Three important parameters must be set at the start: the radius r of the perturbation circle, the radius R of the test circle, and the privacy parameter ε. These parameters are related, so we cannot set them arbitrarily. For example, if r = 1000, then ε = 0.1 is meaningless, because the error of most published points is below 4/ε = 40, and 1000 is clearly too large. In our experiments we set the parameters as follows: r = 100, ε = 0.02, and R = 100, 150, 200, or even 300 (for comparison); the unit is meters. Other parameter values are possible, but we found that the performance of the mechanism barely changes, so we use the above settings for illustration.
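The claim that most published points err by less than 4/ε can be checked by simulating the noise radius, which follows a gamma(2, 1/ε) distribution (in theory, Pr(l < 4/ε) = 1 − 5e^(−4) ≈ 0.91); this simulation is our illustration, not part of the patent:

```python
import random

random.seed(0)
eps = 0.02
# Noise radius l ~ gamma(shape=2, scale=1/eps), as in the sampling steps of claim 1
radii = [random.gammavariate(2, 1 / eps) for _ in range(100_000)]
frac = sum(r < 4 / eps for r in radii) / len(radii)
assert 0.89 < frac < 0.93   # "most" points err by less than 4/eps
```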
We use E(ε_c) and E(error) as the baselines for comparison. With the above parameter settings, E(ε_c) = 0.0173 and E(error) = 100. Although the E(ε_c) and E(error) of PTM cannot be computed, we can use the empirical averages instead and compare them with the baselines to demonstrate the performance of the mechanism.
Static users
In this scenario we use simulated data instead of real data, for a simple reason: the user is static, so there is no difference between simulated and real data. We generate 100 trajectories, each consisting of 1000 identical points. We run PTM 100 times on these trajectories, record all privacy consumptions and errors, and draw their cumulative distribution functions. We then compute the average privacy consumption and error and compare them with E(ε_c) and E(error) of the GICM algorithm. Fig. 4 shows our results.
The three figures on the left of Fig. 4 (a, c, e) show the cumulative distribution functions of the privacy consumption under different test radii R. Fig. 4(a) shows that when R = 100, about 35% of the privacy consumptions do not exceed ε/10. The computed average privacy consumption is 0.0096, which means we save 44.51% of the privacy consumption. This result is not striking, but in Fig. 4(c), when R = 200, about 85% of the privacy consumptions do not exceed ε/10. We can see that the cumulative distribution function of the privacy consumption is quite close to the line y = 0.9. The computed average privacy consumption is now 0.0017; that is, we save more than 90% of the privacy consumption. When R reaches 300, Fig. 4(e) shows that most points incur no privacy consumption at all, thanks to our skip mechanism. The average privacy consumption is 0.00023, so we save more than 98% of the privacy consumption. This is a huge improvement and comes close to the ideal case: constant order.
The three figures on the right of Fig. 4 (b, d, f) show the cumulative distribution functions of the error under different R. The three curves are quite close to each other, which shows that our PTM mechanism does not harm the utility of the published points (and sometimes even improves it, as in Fig. 4(f)).
Slow users
In the slow scenario the distance between consecutive points is small, and we use the Geolife data set. Although the users in the Geolife data set use various means of transport, such as walking, bicycle, car, and subway, the vast majority of them walk, so this data set is suitable for the slow scenario. Of course, we could also delete the non-walking trajectories to make the data set meet our requirement. We randomly select 1000 trajectories and then apply the PTM mechanism 1000 times. As in the static scenario, we collect all privacy consumptions and errors, draw their cumulative distribution functions, and then compare the averages with E(ε_c) and E(error) of the GICM algorithm.
Fig. 5 shows the results. The three figures on the left (a, c, e) depict the cumulative distribution functions of the privacy consumption as R varies from 100 to 300. They are quite close to those in Fig. 4, which means our mechanism also works well in the slow scenario. In fact, we calculate that with R = 100, 200, and 300, our PTM saves 40.46%, 81.5%, and 93% of the privacy consumption, respectively.
The three figures on the right (b, d, f) show that the error grows as R increases. When R = 100, the three curves are about the same. As R increases, the cumulative distribution function of the error becomes lower and lower. This is easy to explain: when R is too large, the error accumulates more easily, so our predictions become less and less accurate. In real life, however, this degree of utility loss is acceptable. Our calculations show that the errors under the different R increase by 5%, 22%, and 46%, respectively. Compared with the lost utility, the reduction in privacy consumption is clearly more significant.
Fast users
Finally, the fast users. In this scenario we use the T-Drive data set. The vast majority of its data comes from taxis, so the users move at a relatively high speed. We sample points from the raw data at a fixed frequency to ensure that the distance between two consecutive points exceeds 100 meters in most cases. We again draw the cumulative distribution functions and compute the averages. Note that the prediction method used in this scenario differs from the one used in the first two scenarios.
The results are shown in Fig. 6. We can see that our PTM can still save about half of the privacy consumption; specifically, we save 28%, 55%, and 69% of it. The reason we cannot save as much as in the first two scenarios is simple: the users move fast, so the trajectories lack good regularity and we cannot predict very accurately. We also note that when R goes from 200 to 300, the saved privacy consumption barely increases. This suggests that the accuracy of our prediction method and the test radius R jointly constrain the overall performance of PTM.
Fig. 6 also shows that the utility is affected. The utility loss in Fig. 6(b) and Fig. 6(d) is still acceptable, but the loss in Fig. 6(f) is hard to accept. In fact, setting R to 300 does not save much more privacy anyway; setting R to 200 is a good choice. We believe that when the prediction accuracy is not high enough, an oversized R is generally useless.

Claims (2)

1. A differential privacy method for protecting multiple locations in location-based services, characterized by comprising the following steps:
Step 1: to test the accuracy of the prediction point with the threshold R and determine the published point, input the true trajectory points X_1, X_2, ... that the user wants to protect; draw l from the gamma distribution gamma(2, 1/ε), where ε is the privacy parameter;
Step 2: draw θ from the uniform distribution U(0, 2π);
Step 3: let Y_1 = X_1 + (l cos θ, l sin θ);
Step 4: output Y_1;
Step 5: set i = 2, s = 0, f = 1;
Step 6: draw l from the gamma distribution gamma(2, 1/ε);
Step 7: draw θ from the uniform distribution U(0, 2π);
Step 8: let Y_i = X_i + (l cos θ, l sin θ);
Step 9: construct the prediction point P and draw a random number ran in (0, 1); if ran < s/(s+f): publish Y_i = P, set i = i + 1, and return to step 6; otherwise: go to step 10;
Step 10: if dis(Y_i, P) < R: publish Y_i = P and set s = s + 1; otherwise: publish Y_i and set f = f + 1;
Step 11: set i = i + 1 and return to step 6.
2. The differential privacy method for protecting multiple locations in location-based services of claim 1, characterized in that the prediction point P is constructed as follows:
when the user is static, P_i = Y_{i-1};
when the user moves slowly, P_i = Y_{i-1};
when the user moves at high speed, P_i = 2Y_{i-1} − Y_{i-2}.
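A minimal executable sketch of claims 1 and 2 follows; the function and variable names are ours (the patent specifies only the steps), and steps 6-11 are folded into one loop:

```python
import math
import random

def planar_laplace(point, eps):
    # Steps 1-3 / 6-8: radius l ~ gamma(2, 1/eps), angle theta ~ U(0, 2*pi)
    l = random.gammavariate(2, 1 / eps)
    theta = random.uniform(0, 2 * math.pi)
    return (point[0] + l * math.cos(theta), point[1] + l * math.sin(theta))

def ptm(true_points, predict, eps=0.02, R=200):
    published = [planar_laplace(true_points[0], eps)]   # steps 1-4: GICM for X_1
    s, f = 0, 1                                         # step 5
    for x in true_points[1:]:
        y = planar_laplace(x, eps)                      # steps 6-8
        p = predict(published)                          # step 9: build P
        if random.random() < s / (s + f):
            published.append(p)                         # skip the test
        elif math.dist(y, p) < R:                       # step 10: test circle
            published.append(p)
            s += 1
        else:
            published.append(y)
            f += 1
    return published

random.seed(1)
# Static user (claim 2): the predictor simply reuses the last published point.
trace = ptm([(0.0, 0.0)] * 50, predict=lambda pub: pub[-1])
assert len(trace) == 50
```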
CN201710433690.3A 2017-06-09 2017-06-09 Differential privacy method for protecting multiple positions in position information service Active CN107247909B (en)


Publications (2)

Publication Number Publication Date
CN107247909A true CN107247909A (en) 2017-10-13
CN107247909B CN107247909B (en) 2020-05-05



Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100077484A1 (en) * 2008-09-23 2010-03-25 Yahoo! Inc. Location tracking permissions and privacy
CN104135362A (en) * 2014-07-21 2014-11-05 南京大学 Availability computing method of data published based on differential privacy
CN105095447A (en) * 2015-07-24 2015-11-25 武汉大学 Distributed w-event differential privacy infinite streaming data distribution method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HOA NGO et al.: "Location Privacy via Differential Private Perturbation of Cloaking Area", 2015 IEEE 28th Computer Security Foundations Symposium *
TONG Wei et al.: "Privacy protection against big data analysis: state of the art and progress" (in Chinese), Chinese Journal of Network and Information Security *
YANG Songtao et al.: "A location privacy protection method with random anonymity" (in Chinese), Journal of Harbin Engineering University *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110022531A (en) * 2019-03-01 2019-07-16 华南理工大学 A kind of localization difference privacy municipal refuse data report and privacy calculation method
CN110022531B (en) * 2019-03-01 2021-01-19 华南理工大学 Localized differential privacy urban garbage data report and privacy calculation method
CN110516476A (en) * 2019-08-31 2019-11-29 贵州大学 Geographical indistinguishable location privacy protection method based on frequent location classification
CN110633402A (en) * 2019-09-20 2019-12-31 东北大学 Three-dimensional space-time information propagation prediction method with differential privacy mechanism
CN110633402B (en) * 2019-09-20 2021-05-04 东北大学 Three-dimensional space-time information propagation prediction method with differential privacy mechanism
CN112487471A (en) * 2020-10-27 2021-03-12 重庆邮电大学 Differential privacy publishing method and system of associated metadata
CN112487471B (en) * 2020-10-27 2022-01-28 重庆邮电大学 Differential privacy publishing method and system of associated metadata
CN114065287A (en) * 2021-11-18 2022-02-18 南京航空航天大学 Track difference privacy protection method and system for resisting prediction attack
CN114065287B (en) * 2021-11-18 2024-05-07 南京航空航天大学 Track differential privacy protection method and system for resisting predictive attack



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant