CN107247909A - A differential privacy method for protecting multiple locations in location-based services - Google Patents

A differential privacy method for protecting multiple locations in location-based services

Info

Publication number
CN107247909A
CN107247909A (application CN201710433690.3A, granted as CN107247909B)
Authority
CN
China
Prior art keywords
privacy
user
consumption
point
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710433690.3A
Other languages
Chinese (zh)
Other versions
CN107247909B (en)
Inventor
朱马克
华景煜
仲盛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201710433690.3A priority Critical patent/CN107247909B/en
Publication of CN107247909A publication Critical patent/CN107247909A/en
Application granted granted Critical
Publication of CN107247909B publication Critical patent/CN107247909B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2111Location-sensitive, e.g. geographical location, GPS


Abstract

The present invention discloses a differential privacy method for protecting multiple locations in location-based services. It improves on the original geo-indistinguishability algorithm and proposes a Predict-and-Test Mechanism. By spending a small amount of privacy budget to build an approximation of the true location, the method reduces the total privacy consumption. It can greatly reduce privacy consumption while doing little damage to data availability. To evaluate the proposed mechanism, we ran experiments on two popular datasets. The results show that our mechanism does substantially reduce privacy consumption while preserving data availability.

Description

A differential privacy method for protecting multiple locations in location-based services
Technical field
The present invention relates to a differential privacy method for protecting multiple locations in location-based services, and belongs to the technical field of location-information security.
Background technology
In recent years, with the continuing spread of GPS-equipped smartphones, location-based services have come to play an increasingly important role in people's lives. Almost all smartphone applications use the user's location data, explicitly or implicitly, for a variety of reasons. For example, Facebook uses a user's location data to find friends near the user, and news applications use it to push local news.
Unfortunately, although location-based services bring great convenience to our lives, they also cause serious privacy concerns. Users are generally reluctant to expose their real-time location data to third parties, including the service provider. Once users upload their location data, they lose control over what third parties do with it. A malicious third party may use location data to track a user and thereby infer the user's home address, places of interest, and even sensitive information such as health status or religious beliefs. We therefore urgently need privacy-preserving location-based services that hide the user's location while still guaranteeing high-quality service.
Existing solutions to this problem fall into two broad classes. The first class is cryptographic methods, which encrypt the location data before it is uploaded. They can fully protect the user's privacy and provide provable protection. However, encryption severely damages data availability, making it hard for the service provider to deliver valuable location-based services. Moreover, cryptographic methods are usually time-consuming, which is unacceptable on a handheld device. The second class of methods perturbs the original data so that a third party cannot obtain the user's exact location. Compared with cryptographic methods, perturbation is more lightweight and damages data availability far less, so third parties are also more willing to accept it. However, because perturbation still reveals inexact data about the user, its security remains open to question.
Recently, Andres et al. proposed geo-indistinguishability, the first formal privacy model based on location perturbation. The model is derived from differential privacy and provides provable privacy guarantees. Specifically, a perturbation mechanism satisfies geo-indistinguishability if any two locations whose distance is below a given threshold produce the same output location with similar probabilities, so that a third party cannot identify the user's true location among a set of nearby locations. Andres et al. designed a corresponding algorithm that satisfies geo-indistinguishability by adding planar Laplace noise. However, that mechanism was designed primarily to protect a single location. If it is applied directly to protect multiple locations, the total privacy consumption grows with the number of locations, which means the user can issue only a very limited number of location-service queries; otherwise the privacy budget is exhausted and the user's privacy is destroyed. This limitation is unacceptable, because in real life many location-based service providers use a user's location data many times within a day or even an hour. For example, a driver using online navigation may issue a location query every few seconds to obtain real-time road information. We therefore urgently need to improve the existing geo-indistinguishability perturbation mechanism so that privacy consumption is reduced as far as possible while the original data availability is preserved.
Summary of the invention
Objective: As location-based services become more and more popular, the privacy problems they bring become increasingly prominent, and users are generally reluctant to expose their locations to third parties (including the service provider). To solve this problem, geo-indistinguishability, a variant of differential privacy, was proposed. This privacy model adds noise to the original location so that nearby locations produce the same output location with similar probabilities. However, the method was originally designed to protect a single location. If it is used directly to protect multiple locations, the total privacy consumption grows sharply with the number of locations, so the user can issue only a very limited number of location queries. The present invention provides a differential privacy method for protecting multiple locations in location-based services: it improves the original geo-indistinguishability algorithm and proposes the Predict and Test Mechanism (PTM). By spending a small amount of privacy budget to build an approximation of the true location, the method reduces the total privacy consumption, and it can do so greatly while doing little damage to data availability. To evaluate the proposed mechanism, we ran experiments on two popular datasets. The results show that our mechanism does substantially reduce privacy consumption while preserving data availability.
Technical scheme: A differential privacy method for protecting multiple locations in location-based services comprises the following steps:
Step 1: Use a threshold R (typically set to 1-2 times the radius of the variation circle) to test the accuracy of the prediction point and decide the published point. The input is the sequence of true trajectory points X1, X2, ... that the user wants to protect. Sample a value l from the gamma distribution gamma(2, 1/ε), whose density is f(x) = ε²·x·e^(-εx) for x ≥ 0 (draw a sample x and set l = x); ε is the privacy parameter specified by the user.
Step 2: Sample a value θ from the uniform distribution U(0, 2π), whose density is f(x) = 1/(2π) for 0 ≤ x ≤ 2π (draw a sample x and set θ = x).
Step 3: Let Y1 = X1 + (l·cosθ, l·sinθ).
Step 4: Output Y1.
Step 5: Let i = 2, s = 0, f = 1.
Step 6: Sample l from the gamma distribution gamma(2, 1/ε).
Step 7: Sample θ from the uniform distribution U(0, 2π).
Step 8: Let Yi = Xi + (l·cosθ, l·sinθ).
Step 9: Construct the prediction point P and draw a random number ran in (0, 1). If ran < s/(s+f): publish Yi = P, set i = i+1, and return to step 6; otherwise go to step 10.
Step 10: If dis(Yi, P) < R: publish Yi = P and set s = s+1; otherwise publish Yi and set f = f+1.
Step 11: Set i = i+1 and return to step 6.
The prediction point P is constructed as follows:
When the user is static, Pi = Yi-1;
When the user moves slowly (generally meaning that two consecutive true positions are at most R apart), Pi = Yi-1;
When the user moves fast (generally meaning that two consecutive true positions are more than R apart), Pi = 2Yi-1 - Yi-2.
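The three rules can be sketched as a small helper. This is a minimal sketch under our own assumptions: points are (x, y) tuples, and the slow/fast distinction is approximated from the last two published points rather than the true positions (which the predictor must not read); the function name `predict` is ours, not the patent's.

```python
import math

def predict(published, R):
    """Build the prediction point P from already-published points:
    reuse the last point for static/slow movement, linearly
    extrapolate for fast movement."""
    if len(published) < 2:
        return published[-1]               # no history to extrapolate from
    (x1, y1), (x2, y2) = published[-2], published[-1]
    if math.dist((x1, y1), (x2, y2)) <= R:
        return (x2, y2)                    # static or slow: P_i = Y_{i-1}
    return (2 * x2 - x1, 2 * y2 - y1)      # fast: P_i = 2*Y_{i-1} - Y_{i-2}
```

Linear extrapolation for fast users simply assumes the user keeps moving with the same displacement per query, which matches the constant-speed scenario (e.g., driving) the patent targets.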
Beneficial effects: Compared with the prior art, the differential privacy method for protecting multiple locations in location-based services provided by the present invention has the following advantages:
(1) When a user wants to issue a new round of location queries, the present invention first predicts the user's current location from the history of past location queries (that is, the user's previously published points; since points are processed one at a time, the previously published points can be regarded as the query history). The predicted location is then compared with the location obtained by the original geo-indistinguishability perturbation mechanism. If the distance between the two locations is smaller than a predefined threshold, the prediction is considered successful and the predicted value is used for the location query. Otherwise, the prediction is considered failed, and the location obtained by the original geo-indistinguishability perturbation mechanism is used. We prove that the mechanism still satisfies geo-indistinguishability, and that it can significantly reduce privacy consumption.
(2) Simple and efficient prediction methods are designed for the three main scenarios (static users, slow users, and fast users). We also add a skip strategy for further improvement: this step senses the availability of the user's published points and tries to trade a certain amount of availability for a further reduction in privacy consumption. With the above methods, privacy consumption is greatly reduced, and in some scenarios it even drops to a constant level.
(3) Experiments were carried out on real datasets to verify the performance of the present invention. The experiments are based on two popular datasets, Geolife and T-drive, and were run under the three scenarios of static users, slow users, and fast users, evaluating the corresponding privacy consumption and availability. The experimental results show that the present invention saves 98%, 81%, and 55% of the privacy budget under the three scenarios, respectively, while the availability of the user data is not much affected.
Brief description of the drawings
Fig. 1(a) is a schematic diagram of the location of Y' when the test succeeds;
Fig. 1(b) is a schematic diagram of the location of Y' when the test fails;
Fig. 2(a) is the curve of εc versus d when the test succeeds;
Fig. 2(b) is the curve of εc versus R when the test succeeds;
Fig. 3(a) is the curve of the test-success probability versus d;
Fig. 3(b) is the curve of the test-success probability versus R;
Fig. 4(a) is the cumulative distribution function of the privacy consumption for static users at R=100;
Fig. 4(b) is the cumulative distribution function of the error for static users at R=100;
Fig. 4(c) is the cumulative distribution function of the privacy consumption for static users at R=200;
Fig. 4(d) is the cumulative distribution function of the error for static users at R=200;
Fig. 4(e) is the cumulative distribution function of the privacy consumption for static users at R=300;
Fig. 4(f) is the cumulative distribution function of the error for static users at R=300;
Fig. 5(a) is the cumulative distribution function of the privacy consumption for slow users at R=100;
Fig. 5(b) is the cumulative distribution function of the error for slow users at R=100;
Fig. 5(c) is the cumulative distribution function of the privacy consumption for slow users at R=200;
Fig. 5(d) is the cumulative distribution function of the error for slow users at R=200;
Fig. 5(e) is the cumulative distribution function of the privacy consumption for slow users at R=300;
Fig. 5(f) is the cumulative distribution function of the error for slow users at R=300;
Fig. 6(a) is the cumulative distribution function of the privacy consumption for fast users at R=100;
Fig. 6(b) is the cumulative distribution function of the error for fast users at R=100;
Fig. 6(c) is the cumulative distribution function of the privacy consumption for fast users at R=200;
Fig. 6(d) is the cumulative distribution function of the error for fast users at R=200;
Fig. 6(e) is the cumulative distribution function of the privacy consumption for fast users at R=300;
Fig. 6(f) is the cumulative distribution function of the error for fast users at R=300.
Embodiment
The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments are intended only to illustrate the present invention and not to limit its scope. After reading the present invention, modifications of various equivalent forms made by those skilled in the art fall within the scope defined by the claims appended to this application.
Differential privacy and location-information protection
This part introduces the concept of differential privacy and its extension to location-based services (i.e., geo-indistinguishability in the prior art). It then analyzes the expected privacy consumption and expected error of the perturbation mechanism designed by Andres et al. These will serve as the baselines in our experiments.
Differential privacy
Because it provides provable privacy guarantees, differential privacy has become a very popular privacy model. Its key idea is that the output of a statistical database should not change much when at most one data record is changed. In other words, the absence of an individual's data has only a very limited influence on the whole system, thereby protecting individual privacy. The standard definition of differential privacy is as follows:
Definition (ε-differential privacy): A release mechanism A satisfies ε-differential privacy if and only if for any adjacent datasets D and D' (that is, datasets differing in at most one record) and any output Z ∈ range(A), we have:
Pr[A(D) = Z] ≤ e^ε · Pr[A(D') = Z]
The ε in the formula measures the privacy level of the release mechanism: the smaller the value, the stronger the privacy guarantee of the mechanism.
To satisfy differential privacy, we need to add noise to the original output of the database. The magnitude of the noise is determined by the sensitivity, which is defined as follows:
Definition (sensitivity): Given a query function f: D → R^d, its sensitivity over any adjacent datasets D and D' is defined as Δf = max over D, D' of ||f(D) − f(D')||₁.
The Laplace mechanism is a commonly used algorithm for achieving differential privacy. It does so by adding noise drawn from a particular Laplace distribution to each output, namely:
Definition (Laplace mechanism): For any query function f: D → R^d, if a mechanism A outputs f(D) + Lap(Δf/ε), then A satisfies ε-differential privacy.
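As a concrete illustration of the Laplace mechanism, noise of scale Δf/ε can be drawn by inverse-transform sampling from a uniform variable. A minimal sketch under our own naming (`laplace_mechanism` is not from the patent):

```python
import math
import random

def laplace_mechanism(query_value, sensitivity, eps):
    """Release query_value + Lap(sensitivity/eps) noise, which gives
    eps-differential privacy for a query with the given L1 sensitivity."""
    b = sensitivity / eps                    # Laplace scale parameter
    u = random.random() - 0.5                # uniform on [-0.5, 0.5)
    # Inverse CDF of the Laplace(0, b) distribution
    noise = -b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return query_value + noise
```

For example, a counting query has sensitivity 1, so releasing `laplace_mechanism(count, 1.0, eps)` satisfies ε-differential privacy.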
However, applying standard differential privacy directly to location protection is very difficult, because locations have no notion of adjacency, and the distance between different locations is arbitrary and continuous. To apply differential privacy to location protection, we need the generalized definition of differential privacy, as follows:
Definition (generalized ε-differential privacy): A release mechanism A satisfies generalized ε-differential privacy if and only if for any two datasets D and D' differing in at most k records and any output Z ∈ range(A), we have: Pr[A(D) = Z] ≤ e^(kε) · Pr[A(D') = Z].
This privacy definition shows that the larger the change in the database, the larger the change we allow in the probability distribution of the output. This is the basis for applying differential privacy to location protection.
Location privacy protection
Considering the characteristics of two-dimensional space and the needs of practical applications, we need to make a few changes when applying differential privacy to location privacy.
First, location release has no notion of a query function f. To simplify the problem, we can assume that there is a query function, and that it is the identity function; that is, its output equals its input. This makes subsequent definitions, such as sensitivity, convenient.
Second, location release has no notion of adjacent locations, which is also why we use generalized differential privacy. We need a metric to describe the difference between different locations, analogous to two databases differing in some number of records. Obviously, Euclidean distance is a good choice for describing this difference.
Finally, we cannot protect the privacy of every point on the planet. We need to preset a threshold 2r and only protect the privacy of positions whose mutual distance is at most 2r. In other words, we draw a circle centered at the true position with radius r, and we only need to guarantee that any two points within the circle produce the same output with similar probabilities. We call this circle the variation circle C.
Based on the above, we give the following definition:
Definition (geo-indistinguishability): A release mechanism A satisfies generalized ε-geo-indistinguishability if and only if for any two positions X1 and X2 in a variation circle C of radius r and any output position Y, we have: Pr[A(X1) = Y] ≤ e^(ε·dis(X1,X2)) · Pr[A(X2) = Y].
Obviously, for any two points X1 and X2 in C, we have dis(X1, X2) ≤ 2r. Here, dis(X1, X2) denotes the Euclidean distance between the two points.
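To make the definition concrete, note that the planar-Laplace noise used below has output density p(Y|X) = ε²/(2π) · e^(−ε·dis(Y,X)). This closed form is our own derivation from the sampling procedure (the radius density ε²·l·e^(−εl) spread uniformly over a circle of circumference 2πl), so the density ratio for two inputs is e^(ε·(dis(Y,X2)−dis(Y,X1))), which the triangle inequality bounds by e^(ε·dis(X1,X2)). A quick numerical check:

```python
import math

def planar_laplace_pdf(y, x, eps):
    # Density of the planar-Laplace output at y for true point x
    # (our assumed closed form, derived from the sampling procedure)
    return eps ** 2 / (2.0 * math.pi) * math.exp(-eps * math.dist(y, x))

eps = 0.02
x1, x2 = (0.0, 0.0), (30.0, 40.0)          # dis(x1, x2) = 50
y = (120.0, -35.0)                          # an arbitrary output point
ratio = planar_laplace_pdf(y, x1, eps) / planar_laplace_pdf(y, x2, eps)
bound = math.exp(eps * math.dist(x1, x2))   # e^(eps * dis(x1, x2))
```

Since the bound holds for every output point Y, the mechanism treats any two inputs at distance d almost interchangeably whenever ε·d is small.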
To satisfy geo-indistinguishability, we need to add a noise vector to the true position. Andres et al. describe a simple and general mechanism, as follows:
Algorithm 1 (continuous mechanism for geo-indistinguishability):
Input: true point X; privacy parameter ε;
Output: published point Y;
1: Sample l from the gamma distribution gamma(2, 1/ε);
2: Sample θ from the uniform distribution U(0, 2π);
3: Y = X + (l·cosθ, l·sinθ);
4: Output the published point Y.
The correctness of the algorithm has been proved; see the paper by Andres et al. For convenience, we call this algorithm the GICM algorithm.
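The sampling steps of Algorithm 1 translate directly into code. A minimal sketch, assuming points are (x, y) tuples measured in the same distance units as 1/ε:

```python
import math
import random

def gicm(x, eps):
    """GICM perturbation of a true point x = (x1, x2): radius
    l ~ gamma(shape=2, scale=1/eps), angle theta ~ U(0, 2*pi)."""
    l = random.gammavariate(2.0, 1.0 / eps)
    theta = random.uniform(0.0, 2.0 * math.pi)
    return (x[0] + l * math.cos(theta), x[1] + l * math.sin(theta))
```

Each call consumes privacy budget ε, which is exactly the accumulation problem over multiple queries that the Predict and Test Mechanism below addresses.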
Expected privacy consumption and expected error
Much previous work uses ε as the baseline for comparison in experiments. However, because of the particularity of location privacy protection, we find that using ε directly to measure the security level of a release mechanism is problematic. Our analysis is as follows: the privacy consumption, which we denote εc, is a better measure of privacy strength. In the Laplace mechanism, we can compute it as εc(Z) = |ln(Pr[A(D) = Z] / Pr[A(D') = Z])|.
The εc in the formula shows how much new knowledge we can obtain by observing the output Z. In the Laplace mechanism, this value equals ε. Likewise, for the GICM algorithm we can define the privacy consumption εc in the same way.
The above definition shows how much new knowledge we obtain after observing the output point Y. However, unlike in the Laplace mechanism, this value is not equal to the ε in GICM. Note that ε applies to all possible outputs Y, whereas εc is defined for one specific output Y. Take the GICM algorithm as an example. Suppose the input point is X, the output point is Y, the radius of the variation circle C is r, and the distance between X and Y is l. If Y falls outside the variation circle C, i.e., l ≥ r, we can compute εc explicitly.
That case seems unproblematic. But when l < r, the situation changes. We can still compute εc,
and the resulting value is smaller than the privacy parameter ε. This shows that when the output point Y falls inside the variation circle, it does not leak as much information as we expect. Therefore, ε alone cannot accurately describe how much information a mechanism leaks. Considering that the output point Y has its own probability distribution, we choose the expectation E(εc) to measure the amount of information the mechanism leaks. Applying this to the GICM algorithm, we obtain:
Theorem: For the GICM algorithm, the expected privacy consumption E(εc) decreases from ε to ε/2 as the radius r of the variation circle grows, where ε is the privacy parameter.
Proof: We need to consider two cases: Y falls inside the variation circle, and Y falls outside it. Taking X as the origin of coordinates, we integrate εc against the distribution of Y over the two regions.
As r increases, the expected value E(εc) decreases from ε to ε/2. This agrees with our intuition: when r is close to 0, the vast majority of output points Y fall outside the variation circle, so E(εc) is close to ε; when r becomes large enough, most output points Y can be approximately regarded as falling at the center of the variation circle, so E(εc) is close to ε/2.
Besides the expected privacy consumption, the expected distance between the true point and the published point, i.e., the expected error E(error), must also be considered, since it reflects the availability of the published point. Computing E(error) is not difficult:
Theorem: For the GICM algorithm, the expected error is E(error) = 2/ε.
Proof: Again taking X as the origin of coordinates, the error is exactly the sampled radius l, which follows the gamma distribution gamma(2, 1/ε), so E(error) = ∫₀^∞ l · ε²l·e^(−εl) dl = 2/ε.
This explains why the privacy parameter ε cannot be set too small; otherwise the availability of the published point would be seriously affected.
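The theorem is easy to check numerically: the error of GICM equals the sampled radius l ~ gamma(2, 1/ε), so its Monte Carlo mean should approach 2/ε. A quick sanity check (the sample size and seed are our choices):

```python
import random

random.seed(1)
eps = 0.02
n = 50000
# The distance between the true and published point is exactly the
# sampled radius l, so averaging l estimates E(error) = 2/eps = 100.
mean_error = sum(random.gammavariate(2.0, 1.0 / eps) for _ in range(n)) / n
```

With ε = 0.02 the expected error is 100 distance units, which illustrates the availability cost of choosing a very small ε.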
Predict and Test Mechanism
Although the GICM algorithm fits the single-point case perfectly, applying it directly to multiple points, such as trajectory data, raises a privacy problem. By the composition property of differential privacy, if every point on the trajectory satisfies ε-differential privacy, the total privacy consumption accumulates: the privacy consumption of the whole trajectory is nε, where n is the number of points on the trajectory. Therefore, if we protect our privacy directly with the GICM algorithm, we cannot issue many location-service queries before the privacy budget is quickly exhausted.
In view of the above discussion, reducing privacy consumption is very meaningful. Since we consider an online model, we can only process points one at a time. The problem then becomes: how do we reduce privacy consumption when processing one point? The answer is: use historical data. Here, historical data refers to data that has already been published and that an attacker could already use to infer the user's true position (note that using it leaks no extra privacy, because we use no information about the true point). From this historical data we can compute a prediction point. Although the prediction point is unlikely to fall exactly on the true point, as long as our prediction is good enough, the distance between the prediction point and the true point will not be too large. An intuitive approach is to release the prediction point directly as the published point. The advantage is obvious: since we use no information about the true point at all, the privacy consumption is 0. However, because each prediction is based on the previous data, which in turn is based on the data before it, prediction errors accumulate and eventually make the prediction unusable. Therefore, we want to use a small amount of information about the true position to correct the accumulated prediction error without incurring much extra privacy consumption. Most importantly, the method we design must still satisfy geo-indistinguishability, i.e., differential privacy.
Based on the above, we design the Predict and Test Mechanism (denoted PTM). Our algorithm still takes the true point X and the privacy parameter ε as input, and outputs a position Y. Algorithm 2 shows the three main steps. First, it uses Algorithm 1 to generate a noisy point Y' from X, which we call the candidate published point. Then it uses the historical knowledge available to an attacker to generate a prediction point P; how prediction points are generated is discussed in the next section. In the final step, the algorithm uses a threshold R to test the accuracy of the prediction point and decide the published point. Specifically, if the distance between Y' and P is smaller than R, the prediction is considered successful, and P is set as the final published point; we call this a test success, as shown in Fig. 1(a). Otherwise, the prediction is considered failed, and Y' is used as the final published point; we call this a test failure, as shown in Fig. 1(b).
Algorithm 2 (Predict and Test Mechanism):
Input: true point X; threshold R; privacy parameter ε;
Output: published point Y;
1: Sample l from the gamma distribution gamma(2, 1/ε);
2: Sample θ from the uniform distribution U(0, 2π);
3: Y' = X + (l·cosθ, l·sinθ);
4: Construct the prediction point P;
5: If dis(Y', P) < R, publish P; otherwise publish Y'.
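One round of Algorithm 2 can be sketched as follows; the helper returns both the published point and whether the test succeeded, and the prediction point P is supplied by the caller (how it is built is described in a later section). The function name `ptm_step` is ours:

```python
import math
import random

def ptm_step(x, p, R, eps):
    """Predict and Test Mechanism for one true point x: perturb x with
    planar-Laplace noise, publish the prediction p if the noisy point
    lands within R of it, otherwise publish the noisy point itself."""
    l = random.gammavariate(2.0, 1.0 / eps)
    theta = random.uniform(0.0, 2.0 * math.pi)
    y_prime = (x[0] + l * math.cos(theta), x[1] + l * math.sin(theta))
    if math.dist(y_prime, p) < R:
        return p, True        # test success: release the prediction
    return y_prime, False     # test failure: fall back to GICM output
```

Note that the test compares P with the noisy point Y', never with the true point X directly, which is what allows the mechanism to remain geo-indistinguishable.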
Theorem: The Predict and Test Mechanism satisfies geo-indistinguishability.
Proof: We use Fig. 1 for illustration. In the figure, P is taken as the origin of coordinates, so the coordinates of X are (−d, 0).
When the test fails, the algorithm falls back to the GICM algorithm, which clearly satisfies geo-indistinguishability. When the test succeeds, the calculation is somewhat more involved. If P is published, then Y' must lie in the test circle, whose center is P and whose radius is R. We can therefore compute Pr(A(X) = P) as the integral of the noise density over the test circle.
We need to find the maximum and minimum of Pr(A(X) = P). It can be seen that this probability decreases as d increases; that is, the farther X is from P, the smaller Pr(A(X) = P) is. We can find the point in the variation circle that is closest to P and denote its distance to P by dmin; we define dmax in the same way. Then, for any two points X1 and X2 in the variation circle C, we can bound the ratio of Pr(A(X1) = P) to Pr(A(X2) = P).
By the triangle inequality we have dmax − dmin ≤ 2r, which bounds the ratio of the two probabilities as required by the definition of geo-indistinguishability.
This completes the proof.
Note that we cannot compute E(εc) and E(error) of Algorithm 2 in general, because the prediction method is unspecified. For the analysis, however, we can compute the privacy consumption and error of the algorithm for a given prediction point P.
Theorem: For a given prediction point P, the privacy consumption εc and the error of the algorithm have closed-form expressions, one for the test-success case and one for the test-failure case.
Proof: The computation of εc in the test-success case follows from the previous proof; εc in the test-failure case follows from the section on the expected privacy consumption and error. The computation of the error is trivial and therefore omitted.
Because the formula for the privacy consumption is rather complex, we cannot see directly how our algorithm reduces privacy consumption. We therefore ran a series of experiments to observe how the privacy consumption εc changes with the other variables. We set ε = 0.02, d = 100 (the distance between the true point and the prediction point), R = 300, and r = 100. Fig. 2 shows how εc changes with d and R when the test succeeds. The straight line in Fig. 2 marks the E(εc) of the GICM algorithm, which we use as the baseline. Fig. 3 shows how the probability of a test success changes with d and R.
From Fig. 2 (a) it may be seen that εcIncrease with d increase, and be finally reached ε.When d is less than 370, εcIt is small In GICM E (εc).When d is less than 150, εcOnly about GICM E (εc) 1/10th.Fig. 3 (a) shows that d is smaller, test Successful possibility is bigger.Therefore, if our foreseeable comparisons are accurate, εcIt can become very small.
Fig. 2 (b) shows εcReduced with d increase, and be finally reached 0.And the probability that Fig. 3 (b) displays are successfully tested Increase with R increase and be finally reached 1.Therefore a privacy that can be not only reduced than larger R when being successfully tested disappears Consumption, can also increase the probability being successfully tested.But R can not be too big, and otherwise error can become to be difficult to receive.
Prediction method and further improvement
As introduced above, our method relies on historical data for prediction. In this section we present our prediction method, together with an improvement that further reduces the privacy consumption. These two parts ensure that our algorithm works well in practice.
Prediction method
We first make one point clear: no single method fits all scenarios. We therefore mainly consider three scenarios and design a prediction method for each. The three scenarios are: static users, slow users, and fast users. For convenience, we write X_1, X_2, ..., X_n for the true points to be protected, Y_1, Y_2, ..., Y_n for the published points, and P_1, P_2, ..., P_n for the corresponding prediction points. Suppose now that we have published Y_{i-1} and want to protect the point X_i; the first thing we must do is predict the point P_i.
Let us first consider the extreme case: the user is static. This genuinely happens in real life; for example, a person sits in a coffee shop, a restaurant, or a cinema while querying nearby places. An ideal strategy is to protect the first point with GICM and then reuse the first published point for all subsequent queries. Only the first point then generates privacy consumption, and the total privacy consumption drops to a constant. Inspired by this, we set P_i = Y_{i-1}. Although this looks very simple, it works well in practice, for a simple reason: the previous published point Y_{i-1} is not far from the previous true point X_{i-1} (otherwise that published point would have no utility), and X_i = X_{i-1}, so the distance between Y_{i-1} and X_i is also small. Combined with the skip mechanism introduced in the next section, this brings the privacy consumption of this scenario down to a constant.
The second scenario is a user who moves, but very slowly. This scenario is also very common, for example a user walking while using a phone. We find that the prediction method of the static scenario also applies here, because a slow user can be approximated as static. Another explanation: Y_{i-1} is close to X_{i-1}, and X_{i-1} is in turn close to X_i, so Y_{i-1} is also close enough to X_i.
The fast user is a bit more complicated. The previous prediction method no longer works well, because X_{i-1} and X_i are usually far apart; the relation between consecutive true points becomes very weak, so accurate prediction is difficult. In some scenarios we can still predict fairly accurately, using P_i = 2Y_{i-1} − Y_{i-2}. This method is based on the idea that the user's heading and speed do not change much, so we may approximately assume that the user travels the same distance in the same direction each time.
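The three per-scenario predictors can be sketched in a few lines; the function name, the scene labels, and the tuple representation of points are our illustrative choices, not part of the patent text:

```python
def predict(published, scene):
    # published = [Y_1, ..., Y_{i-1}] as (x, y) tuples.
    y1 = published[-1]
    if scene in ("static", "slow"):
        return y1                                      # P_i = Y_{i-1}
    # fast: assume roughly constant heading and step length
    y2 = published[-2]
    return (2 * y1[0] - y2[0], 2 * y1[1] - y2[1])      # P_i = 2Y_{i-1} - Y_{i-2}

assert predict([(3.0, 4.0)], "static") == (3.0, 4.0)
assert predict([(0.0, 0.0), (10.0, 5.0)], "fast") == (20.0, 10.0)
```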
One question remains: how do we know which scenario we are currently in? Fortunately, this question is not hard to answer. We can determine the scenario using the published points alone. If the published points are very dense, we know we are in the slow or even the static scenario; likewise, sparse published points indicate the fast scenario. Note that this procedure causes no extra privacy consumption, because the attacker can also see the published points.
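A sketch of such scene detection from the published points alone might look as follows; the gap thresholds (in meters) are illustrative assumptions, not values given in the text:

```python
import math

def detect_scene(published, static_gap=5.0, fast_gap=150.0):
    # Classify the motion using only published points, so no extra
    # privacy is consumed (the attacker sees these points too).
    if len(published) < 2:
        return "static"
    gaps = [math.dist(a, b) for a, b in zip(published, published[1:])]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap < static_gap:
        return "static"
    return "slow" if mean_gap < fast_gap else "fast"

assert detect_scene([(0, 0), (1, 0), (2, 0)]) == "static"
assert detect_scene([(0, 0), (60, 0)]) == "slow"
assert detect_scene([(0, 0), (300, 0)]) == "fast"
```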
Further improvement
Our goal is to reduce the privacy consumption as much as possible without significantly increasing the error. In our experiments, however, we found a very interesting phenomenon: the privacy consumption and the error sometimes decrease together. This indicates that there is still room to reduce the privacy consumption; privacy and error remain a trade-off. Since the current error is smaller than what we need, we can reduce the privacy consumption further by sacrificing some utility. In other words, our mechanism can be improved further.
The first question is: how do we detect this phenomenon? This step matters, because the phenomenon does not always occur. A direct method is to compute the errors of past published points and decide based on their average. Doing so, however, incurs extra privacy consumption, because it uses the true points, which the attacker does not know. A more suitable method is to use the test success rate, defined here as the number of successful tests divided by the total number of tests. This variable reflects whether the error is small, because the smaller the error, the higher the test success rate. This step causes no extra privacy consumption, because the attacker can also derive the test success rate.
The second question is: how do we reduce the privacy consumption by sacrificing some utility? A direct method is to adjust the parameters dynamically, for example by lowering the privacy parameter ε or enlarging the test radius R. We found these methods unsatisfactory, because the privacy they save is not significant. We therefore adopt a simple and efficient method: skip the test step entirely. When we find the error to be small enough, we directly publish the prediction point. The privacy consumption then drops to 0, because the true point is not used at all. This does not increase the error too much, because in the vast majority of cases our prediction is not far off.
Based on the discussion above, our skip mechanism is as follows. We use the variable s to count the successful tests and f to count the failed tests. For the first point, we run the original GICM algorithm and set s = 0, f = 1. For each subsequent point, we run PTM: after predicting a point, we draw a random number ran in (0, 1). If ran < s/(s+f), we directly skip the test step and publish the prediction point, leaving s and f unchanged; otherwise, we perform the test and update s and f according to the result. In short, the skip probability is exactly the test success rate.
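One publication round of this skip mechanism can be sketched as follows; `perturb` stands in for the GICM noise step and `rng` for the uniform draw, both passed in (our choice, for determinism) rather than fixed:

```python
import math

def publish_round(true_pt, pred, perturb, s, f, R, rng):
    # Skip the test with probability s/(s+f), the past test success rate.
    if rng() < s / (s + f):
        return pred, s, f              # prediction published, no true point used
    y = perturb(true_pt)               # noisy candidate Y_i
    if math.dist(y, pred) < R:         # test: does Y_i land in the test circle?
        return pred, s + 1, f
    return y, s, f + 1

# With s=0, f=1 the test always runs; an accurate prediction succeeds:
out, s, f = publish_round((0, 0), (10, 0), lambda p: p, 0, 1, 100, lambda: 0.99)
assert (out, s, f) == ((10, 0), 1, 1)
```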
Experiments
To evaluate the practical performance of our algorithm, we ran experiments on two well-known data sets. We consider three scenarios: static users, slow users, and fast users. We compare the privacy consumption and error with those of the GICM algorithm.
Basic setup
We first introduce the two data sets used:
(1) Geolife. This data set was collected by Microsoft Research Asia. It contains 182 users and 17621 trajectories, with a total length of 1292951 km and a total duration of 50176 hours (from April 2007 to August 2012). Most locations are in Beijing, China. We mainly use this data set for the slow scenario, because the vast majority of the movement in it is slow.
(2) T-Drive. This data set contains one week of trajectories of 10357 taxis, with a total length of 9000000 km and a total of 15000000 location points. Because taxis move quickly, we use this data set for fast users.
Three important parameters must be set at the start: the radius r of the perturbation circle, the radius R of the test circle, and the privacy parameter ε. These parameters are related, so we cannot set them arbitrarily. For example, if r = 1000, then ε = 0.1 is meaningless, because the error of most published points is below 4/ε = 40, and 1000 is clearly too large. In our experiments we set the parameters as follows: r = 100, ε = 0.02, and R = 100, 150, 200, or even 300 (for comparison); the unit is meters. Other parameter values are possible, but we found that the performance of the mechanism barely changes, so we use the above settings for illustration.
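The claim that most published points err by less than 4/ε can be checked by simulating the noise radius, which follows a gamma(2, 1/ε) distribution (in theory, Pr(l < 4/ε) = 1 − 5e^(−4) ≈ 0.91); this simulation is our illustration, not part of the patent:

```python
import random

random.seed(0)
eps = 0.02
# Noise radius l ~ gamma(shape=2, scale=1/eps), as in the sampling steps of claim 1
radii = [random.gammavariate(2, 1 / eps) for _ in range(100_000)]
frac = sum(r < 4 / eps for r in radii) / len(radii)
assert 0.89 < frac < 0.93   # "most" points err by less than 4/eps
```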
We use E(ε_c) and E(error) as the baselines for comparison. With the above parameter settings, E(ε_c) = 0.0173 and E(error) = 100. Although the E(ε_c) and E(error) of PTM cannot be computed, we can use the empirical averages instead and compare them with the baselines to demonstrate the performance of the mechanism.
Static users
In this scenario we use simulated data instead of real data, for a simple reason: the user is static, so there is no difference between simulated and real data. We generate 100 trajectories, each consisting of 1000 identical points. We run PTM 100 times on these trajectories, record all privacy consumptions and errors, and draw their cumulative distribution functions. We then compute the average privacy consumption and error and compare them with E(ε_c) and E(error) of the GICM algorithm. Fig. 4 shows our results.
The three figures on the left of Fig. 4 (a, c, e) show the cumulative distribution functions of the privacy consumption under different test radii R. Fig. 4(a) shows that when R = 100, about 35% of the privacy consumptions do not exceed ε/10. The computed average privacy consumption is 0.0096, which means we save 44.51% of the privacy consumption. This result is not striking, but in Fig. 4(c), when R = 200, about 85% of the privacy consumptions do not exceed ε/10. We can see that the cumulative distribution function of the privacy consumption is quite close to the line y = 0.9. The computed average privacy consumption is now 0.0017; that is, we save more than 90% of the privacy consumption. When R reaches 300, Fig. 4(e) shows that most points incur no privacy consumption at all, thanks to our skip mechanism. The average privacy consumption is 0.00023, so we save more than 98% of the privacy consumption. This is a huge improvement and comes close to the ideal case: constant order.
The three figures on the right of Fig. 4 (b, d, f) show the cumulative distribution functions of the error under different R. The three curves are quite close to each other, which shows that our PTM mechanism does not harm the utility of the published points (and sometimes even improves it, as in Fig. 4(f)).
Slow users
In the slow scenario the distance between consecutive points is small, and we use the Geolife data set. Although the users in the Geolife data set use various means of transport, such as walking, bicycle, car, and subway, the vast majority of them walk, so this data set is suitable for the slow scenario. Of course, we could also delete the non-walking trajectories to make the data set meet our requirement. We randomly select 1000 trajectories and then apply the PTM mechanism 1000 times. As in the static scenario, we collect all privacy consumptions and errors, draw their cumulative distribution functions, and then compare the averages with E(ε_c) and E(error) of the GICM algorithm.
Fig. 5 shows the results. The three figures on the left (a, c, e) depict the cumulative distribution functions of the privacy consumption as R varies from 100 to 300. They are quite close to those in Fig. 4, which means our mechanism also works well in the slow scenario. In fact, we calculate that with R = 100, 200, and 300, our PTM saves 40.46%, 81.5%, and 93% of the privacy consumption, respectively.
The three figures on the right (b, d, f) show that the error grows as R increases. When R = 100, the three curves are about the same. As R increases, the cumulative distribution function of the error becomes lower and lower. This is easy to explain: when R is too large, the error accumulates more easily, so our predictions become less and less accurate. In real life, however, this degree of utility loss is acceptable. Our calculations show that the errors under the different R increase by 5%, 22%, and 46%, respectively. Compared with the lost utility, the reduction in privacy consumption is clearly more significant.
Fast users
Finally, the fast users. In this scenario we use the T-Drive data set. The vast majority of its data comes from taxis, so the users move at a relatively high speed. We sample points from the raw data at a fixed frequency to ensure that the distance between two consecutive points exceeds 100 meters in most cases. We again draw the cumulative distribution functions and compute the averages. Note that the prediction method used in this scenario differs from the one used in the first two scenarios.
The results are shown in Fig. 6. We can see that our PTM can still save about half of the privacy consumption; specifically, we save 28%, 55%, and 69% of it. The reason we cannot save as much as in the first two scenarios is simple: the users move fast, so the trajectories lack good regularity and we cannot predict very accurately. We also note that when R goes from 200 to 300, the saved privacy consumption barely increases. This suggests that the accuracy of our prediction method and the test radius R jointly constrain the overall performance of PTM.
Fig. 6 also shows that the utility is affected. The utility loss in Fig. 6(b) and Fig. 6(d) is still acceptable, but the loss in Fig. 6(f) is hard to accept. In fact, setting R to 300 does not save much more privacy anyway; setting R to 200 is a good choice. We believe that when the prediction accuracy is not high enough, an oversized R is generally useless.

Claims (2)

1. A differential privacy method for protecting multiple locations in location-based services, characterized by comprising the following steps:
Step 1: to test the accuracy of the prediction point with the threshold R and determine the published point, input the true trajectory points X_1, X_2, ... that the user wants to protect; draw l from the gamma distribution gamma(2, 1/ε), where ε is the privacy parameter;
Step 2: draw θ from the uniform distribution U(0, 2π);
Step 3: let Y_1 = X_1 + (l cos θ, l sin θ);
Step 4: output Y_1;
Step 5: set i = 2, s = 0, f = 1;
Step 6: draw l from the gamma distribution gamma(2, 1/ε);
Step 7: draw θ from the uniform distribution U(0, 2π);
Step 8: let Y_i = X_i + (l cos θ, l sin θ);
Step 9: construct the prediction point P and draw a random number ran in (0, 1); if ran < s/(s+f): publish Y_i = P, set i = i + 1, and return to step 6; otherwise: go to step 10;
Step 10: if dis(Y_i, P) < R: publish Y_i = P and set s = s + 1; otherwise: publish Y_i and set f = f + 1;
Step 11: set i = i + 1 and return to step 6.
2. The differential privacy method for protecting multiple locations in location-based services of claim 1, characterized in that the prediction point P is constructed as follows:
when the user is static, P_i = Y_{i-1};
when the user moves slowly, P_i = Y_{i-1};
when the user moves at high speed, P_i = 2Y_{i-1} − Y_{i-2}.
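A minimal executable sketch of claims 1 and 2 follows; the function and variable names are ours (the patent specifies only the steps), and steps 6-11 are folded into one loop:

```python
import math
import random

def planar_laplace(point, eps):
    # Steps 1-3 / 6-8: radius l ~ gamma(2, 1/eps), angle theta ~ U(0, 2*pi)
    l = random.gammavariate(2, 1 / eps)
    theta = random.uniform(0, 2 * math.pi)
    return (point[0] + l * math.cos(theta), point[1] + l * math.sin(theta))

def ptm(true_points, predict, eps=0.02, R=200):
    published = [planar_laplace(true_points[0], eps)]   # steps 1-4: GICM for X_1
    s, f = 0, 1                                         # step 5
    for x in true_points[1:]:
        y = planar_laplace(x, eps)                      # steps 6-8
        p = predict(published)                          # step 9: build P
        if random.random() < s / (s + f):
            published.append(p)                         # skip the test
        elif math.dist(y, p) < R:                       # step 10: test circle
            published.append(p)
            s += 1
        else:
            published.append(y)
            f += 1
    return published

random.seed(1)
# Static user (claim 2): the predictor simply reuses the last published point.
trace = ptm([(0.0, 0.0)] * 50, predict=lambda pub: pub[-1])
assert len(trace) == 50
```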
CN201710433690.3A 2017-06-09 2017-06-09 Differential privacy method for protecting multiple positions in position information service Active CN107247909B (en)


Publications (2)

Publication Number Publication Date
CN107247909A true CN107247909A (en) 2017-10-13
CN107247909B CN107247909B (en) 2020-05-05



Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100077484A1 (en) * 2008-09-23 2010-03-25 Yahoo! Inc. Location tracking permissions and privacy
CN104135362A (en) * 2014-07-21 2014-11-05 南京大学 Availability computing method of data published based on differential privacy
CN105095447A (en) * 2015-07-24 2015-11-25 武汉大学 Distributed w-event differential privacy infinite streaming data distribution method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HOA NGO et al.: "Location Privacy via Differential Private Perturbation of Cloaking Area", 2015 IEEE 28th Computer Security Foundations Symposium *
TONG Wei et al.: "Privacy protection against big data analysis: state of the art and progress" (in Chinese), Chinese Journal of Network and Information Security *
YANG Songtao et al.: "A location privacy protection method with random anonymity" (in Chinese), Journal of Harbin Engineering University *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110022531A (en) * 2019-03-01 2019-07-16 华南理工大学 A kind of localization difference privacy municipal refuse data report and privacy calculation method
CN110022531B (en) * 2019-03-01 2021-01-19 华南理工大学 Localized differential privacy urban garbage data report and privacy calculation method
CN110516476A (en) * 2019-08-31 2019-11-29 贵州大学 Geographical indistinguishable location privacy protection method based on frequent location classification
CN110633402A (en) * 2019-09-20 2019-12-31 东北大学 Three-dimensional space-time information propagation prediction method with differential privacy mechanism
CN110633402B (en) * 2019-09-20 2021-05-04 东北大学 Three-dimensional space-time information propagation prediction method with differential privacy mechanism
CN112487471A (en) * 2020-10-27 2021-03-12 重庆邮电大学 Differential privacy publishing method and system of associated metadata
CN112487471B (en) * 2020-10-27 2022-01-28 重庆邮电大学 Differential privacy publishing method and system of associated metadata
CN114065287A (en) * 2021-11-18 2022-02-18 南京航空航天大学 Track difference privacy protection method and system for resisting prediction attack
CN114065287B (en) * 2021-11-18 2024-05-07 南京航空航天大学 Track differential privacy protection method and system for resisting predictive attack



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant