CN107247909A - A differential privacy method for protecting multiple locations in location-based services - Google Patents
A differential privacy method for protecting multiple locations in location-based services Download PDF Info
- Publication number
- CN107247909A CN107247909A CN201710433690.3A CN201710433690A CN107247909A CN 107247909 A CN107247909 A CN 107247909A CN 201710433690 A CN201710433690 A CN 201710433690A CN 107247909 A CN107247909 A CN 107247909A
- Authority
- CN
- China
- Prior art keywords
- privacy
- user
- consumption
- point
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2111—Location-sensitive, e.g. geographical location, GPS
Abstract
The present invention discloses a differential privacy method for protecting multiple locations in location-based services. It improves on the original geo-indistinguishability algorithm and proposes a predict-and-test mechanism. The method consumes a small amount of privacy to build an approximation of the true position, thereby reducing the total privacy consumption. It can greatly reduce privacy consumption without excessively damaging data availability. To evaluate the performance of the proposed mechanism, we ran experiments on two popular datasets. The results show that our mechanism substantially reduces privacy consumption while preserving the availability of the data.
Description
Technical field
The present invention relates to a differential privacy method for protecting multiple locations in location-based services, and belongs to the technical field of location-information privacy protection.
Background technology
In recent years, with the continuous spread of GPS-equipped smartphones, location-based services have come to play an ever more important role in people's lives. Almost all smartphone applications use the user's location data, explicitly or implicitly, for a variety of reasons. For example, Facebook uses the user's location data to find friends near the user, and news applications use it to push local news.
Unfortunately, although location-based services bring great convenience to our lives, they also create serious privacy problems. Users are generally unwilling to expose their real-time location data to third parties, including the service provider, because once the data is uploaded, the user has no control over the operations the third party performs on it. A malicious third party may use location data to track the user and thereby infer the user's home address, places of interest, and even sensitive information such as health status or religious belief. We therefore urgently need privacy-preserving location-based services that hide the user's location information while still guaranteeing high-quality service.
Existing solutions to this problem fall into two broad classes. The first class is cryptographic methods, which encrypt the location data before the user uploads it. They can protect the user's privacy completely and provide provable protection. However, encryption severely damages the availability of the data, making it hard for the service provider to deliver valuable location-based services, and cryptographic methods are usually time-consuming, which is intolerable on a handheld device. The second class of methods perturbs the original data so that the third party cannot obtain the user's exact location. Compared with cryptography, perturbation is more lightweight and damages data availability far less, so third parties are also more willing to accept it. However, because these methods still reveal inexact data about the user, their security remains debatable.
Recently, Andres et al. proposed geo-indistinguishability, the first formal privacy model based on location perturbation. The model is derived from differential privacy and can provide provable privacy guarantees. Specifically, a perturbation mechanism satisfies geo-indistinguishability if, under it, any two positions whose distance is below a given threshold produce the same output position with similar probability. A third party therefore cannot identify the user's true position among a set of nearby positions. Andres et al. designed a corresponding algorithm that satisfies geo-indistinguishability by adding two-dimensional Laplacian noise. However, the mechanism was primarily designed to protect a single position. If it is applied directly to protect multiple positions, the overall privacy consumption grows with the number of positions. This means the user can issue only a very limited number of location queries; otherwise the privacy budget is exhausted and the user's privacy is destroyed. This limitation is unacceptable, because in real life many location-based services use the user's location data many times within a single day or even a single hour. For example, a driver using online navigation may issue a location query every few seconds to obtain real-time road information. We therefore urgently need to improve the existing geo-indistinguishability perturbation mechanism so that privacy consumption is reduced as far as possible while the original data availability is preserved.
Content of the invention
Goal of the invention: as location-based services become more and more popular, the privacy problems they bring become increasingly prominent, and users are generally unwilling to expose their positions to third parties (including the service provider). To solve this problem, geo-indistinguishability, a variant of differential privacy, was proposed. This privacy model adds noise to the original position so that nearby positions produce the same output position with similar probability. However, the method was originally designed to protect a single position. If it is used directly to protect multiple positions, the overall privacy consumption rises sharply with the number of positions, so the user can issue only a very limited number of location queries. The present invention provides a differential privacy method for protecting multiple locations in location-based services; it improves the original geo-indistinguishability algorithm and proposes the Predict and Test Mechanism (PTM). The method consumes a small amount of privacy to build an approximation of the true position and thereby reduces the total privacy consumption. It can greatly reduce privacy consumption without excessively damaging data availability. To evaluate the performance of the proposed mechanism, we ran experiments on two popular datasets. The results show that our mechanism substantially reduces privacy consumption while preserving data availability.
Technical scheme: a differential privacy method for protecting multiple locations in location-based services, comprising the following steps:
Step 1, use a threshold R (typically set to 1-2 times the radius of the variation circle) to test the accuracy of the predicted point and determine the published point. Input the true trace points X1, X2, … that the user wants to protect. Draw a value l from the gamma distribution Γ(2, 1/ε), whose density is f(x) = ε²·x·e^(-εx) for x ≥ 0 (shape α = 2, scale 1/ε; after sampling an x, set l = x); ε is the privacy parameter specified by the user;
Step 2, draw a value θ from the uniform distribution U(0, 2π), whose density is f(x) = 1/(b - a) = 1/(2π) on [a, b] with a = 0, b = 2π (after sampling an x, set θ = x);
Step 3, let Y1 = X1 + (l·cos θ, l·sin θ);
Step 4, output Y1;
Step 5, let i = 2, s = 0, f = 1;
Step 6, draw a value l from the gamma distribution Γ(2, 1/ε);
Step 7, draw a value θ from the uniform distribution U(0, 2π);
Step 8, let Yi = Xi + (l·cos θ, l·sin θ);
Step 9, construct the prediction point P and generate a random number ran in (0, 1); if ran < s/(s + f): publish Yi = P, set i = i + 1, and return to step 6; otherwise: go to step 10;
Step 10, if dis(Yi, P) < R: publish Yi = P and set s = s + 1; otherwise: publish Yi and set f = f + 1;
Step 11, set i = i + 1 and return to step 6.
The prediction point P is constructed as follows (a code sketch of the full procedure is given below):
When the user is stationary, Pi = Yi-1;
When the user moves slowly (generally meaning that consecutive true positions are at most R apart), Pi = Yi-1;
When the user moves at high speed (generally meaning that consecutive true positions are more than R apart), Pi = 2Yi-1 - Yi-2.
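As an illustration, the following is a minimal Python sketch of steps 1-11 together with the prediction rules above. It is our own rendering, not code from the patent; the function names (sample_noise, predict, ptm_publish) and the scene argument are illustrative choices.

```python
import math
import random

def sample_noise(eps):
    # Steps 1-2 / 6-7: radius l ~ Gamma(shape=2, scale=1/eps), angle ~ U(0, 2*pi).
    l = random.gammavariate(2, 1.0 / eps)
    theta = random.uniform(0, 2 * math.pi)
    return l * math.cos(theta), l * math.sin(theta)

def predict(published, scene):
    # Prediction rules: stationary and slow users reuse the last published point;
    # high-speed users extrapolate linearly from the last two published points.
    if scene == "fast" and len(published) >= 2:
        (x1, y1), (x2, y2) = published[-2], published[-1]
        return (2 * x2 - x1, 2 * y2 - y1)
    return published[-1]

def ptm_publish(trace, eps, R, scene="slow"):
    # trace: list of true points (x, y) to protect; returns the published points.
    dx, dy = sample_noise(eps)                          # steps 1-2
    published = [(trace[0][0] + dx, trace[0][1] + dy)]  # steps 3-4
    s, f = 0, 1                                         # step 5
    for X in trace[1:]:
        dx, dy = sample_noise(eps)                      # steps 6-7
        Y = (X[0] + dx, X[1] + dy)                      # step 8
        P = predict(published, scene)                   # step 9: build prediction
        if random.random() < s / (s + f):               # step 9: skip the test
            published.append(P)
        elif math.dist(Y, P) < R:                       # step 10: test succeeds
            published.append(P)
            s += 1
        else:                                           # step 10: test fails
            published.append(Y)
            f += 1
    return published
```

For example, `ptm_publish([(0, 0)] * 1000, eps=0.02, R=200)` simulates the stationary scene used later in the experiments; as tests keep succeeding, the skip probability s/(s + f) grows, which is exactly the behaviour the skip mechanism exploits.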
Beneficial effects: compared with the prior art, the differential privacy method for protecting multiple locations in location-based services provided by the present invention has the following advantages:
(1) When the user wants to issue a new location query, the present invention first predicts the user's current position from the history of past queries (namely the user's published points; since points are processed one at a time, the previously published points can be regarded as the query history). The predicted position is then compared with the position obtained by the original geo-indistinguishability perturbation mechanism. If the distance between the two positions is below a predefined threshold, the prediction is considered successful and the predicted value is used for the location query. Otherwise the prediction is considered failed and the position obtained by the original geo-indistinguishability perturbation mechanism is used. We prove that this mechanism still satisfies geo-indistinguishability, and it can significantly reduce privacy consumption.
(2) Simple and efficient prediction methods are designed for the three main scenes (stationary users, slow users, and high-speed users). We also add a skip strategy for further improvement. This step senses the availability of the user's published points and tries to sacrifice some availability to further reduce privacy consumption. With these methods, privacy consumption is greatly reduced; in some scenes it even falls to a constant level.
(3) Experiments were carried out on real datasets to verify the performance of the present invention. The experiments are based on two popular datasets: Geolife and T-drive. We tested under the three scenes of stationary, slow, and high-speed users, and evaluated the corresponding privacy consumption and availability. The experimental results show that the present invention saves 98%, 81%, and 55% of the privacy consumption under the three scenes respectively, while the availability of the user data is not much affected.
Brief description of the drawings
Fig. 1 (a) is a schematic diagram of the position of Y' when the test succeeds;
Fig. 1 (b) is a schematic diagram of the position of Y' when the test fails;
Fig. 2 (a) is the curve of εc as a function of d when the test succeeds;
Fig. 2 (b) is the curve of εc as a function of R when the test succeeds;
Fig. 3 (a) is the curve of the probability of a successful test as a function of d;
Fig. 3 (b) is the curve of the probability of a successful test as a function of R;
Fig. 4 (a) is the cumulative distribution function of the privacy consumption at R=100 under stationary users;
Fig. 4 (b) is the cumulative distribution function of the error at R=100 under stationary users;
Fig. 4 (c) is the cumulative distribution function of the privacy consumption at R=200 under stationary users;
Fig. 4 (d) is the cumulative distribution function of the error at R=200 under stationary users;
Fig. 4 (e) is the cumulative distribution function of the privacy consumption at R=300 under stationary users;
Fig. 4 (f) is the cumulative distribution function of the error at R=300 under stationary users;
Fig. 5 (a) is the cumulative distribution function of the privacy consumption at R=100 under slow users;
Fig. 5 (b) is the cumulative distribution function of the error at R=100 under slow users;
Fig. 5 (c) is the cumulative distribution function of the privacy consumption at R=200 under slow users;
Fig. 5 (d) is the cumulative distribution function of the error at R=200 under slow users;
Fig. 5 (e) is the cumulative distribution function of the privacy consumption at R=300 under slow users;
Fig. 5 (f) is the cumulative distribution function of the error at R=300 under slow users;
Fig. 6 (a) is the cumulative distribution function of the privacy consumption at R=100 under high-speed users;
Fig. 6 (b) is the cumulative distribution function of the error at R=100 under high-speed users;
Fig. 6 (c) is the cumulative distribution function of the privacy consumption at R=200 under high-speed users;
Fig. 6 (d) is the cumulative distribution function of the error at R=200 under high-speed users;
Fig. 6 (e) is the cumulative distribution function of the privacy consumption at R=300 under high-speed users;
Fig. 6 (f) is the cumulative distribution function of the error at R=300 under high-speed users.
Embodiment
The present invention is further elucidated below with reference to specific embodiments. It should be understood that these embodiments are intended only to illustrate the present invention and not to limit its scope. After reading the present invention, modifications of various equivalent forms made by those skilled in the art fall within the scope defined by the claims appended to this application.
Differential privacy and location-information protection
This part introduces the concept of differential privacy and its extension to location-based services (i.e., geo-indistinguishability in the prior art). After that we analyze the expected privacy consumption and the expected error of the perturbation mechanism designed by Andres et al. These will serve as the comparison baselines in our experiments.
Differential privacy
Because it can provide provable privacy guarantees, differential privacy has become a very popular privacy model. Its key idea is that the output of a statistical database should not change too much when at most one data record is changed. In other words, the influence of any individual's data on the whole system is extremely limited, which achieves the goal of protecting individual privacy. Standard differential privacy is defined as follows:
Definition (ε-differential privacy): a publishing mechanism A satisfies ε-differential privacy if and only if, for any adjacent datasets D and D' (that is, datasets differing in at most one record) and any output Z ∈ range(A), we have:
Pr[A(D) ∈ Z] ≤ e^ε · Pr[A(D') ∈ Z]
The ε in the formula measures the privacy level of the publishing mechanism; the smaller the value, the stronger the privacy guarantee the mechanism provides.
To satisfy differential privacy, we need to add noise to the original output of the database. The magnitude of the noise is determined by the sensitivity, which is defined as follows:
Definition (sensitivity): for any adjacent datasets D and D' and a given query function f: D → R^d, the sensitivity is defined as:
Δf = max over adjacent D, D' of ||f(D) - f(D')||₁
The Laplace mechanism is a commonly used algorithm for realizing differential privacy. It achieves differential privacy by adding noise drawn from a suitable Laplacian distribution to each output, i.e.:
Definition (Laplace mechanism): for any query function f: D → R^d, if a mechanism A outputs f(D) + Lap(Δf/ε), then A satisfies ε-differential privacy.
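As a concrete illustration (our own minimal example, not taken from the patent): a counting query has sensitivity 1, because adding or removing one record changes the count by at most 1, so adding Lap(1/ε) noise suffices.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, eps):
    # Release true_value + Lap(sensitivity/eps) noise; satisfies eps-differential privacy.
    return true_value + np.random.laplace(loc=0.0, scale=sensitivity / eps)

# A counting query has sensitivity 1, so Lap(1/eps) noise is enough.
noisy_count = laplace_mechanism(true_value=42, sensitivity=1.0, eps=0.1)
```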
However, applying standard differential privacy directly to location protection is very difficult, because positions have no notion of adjacency, and the distance between different positions is arbitrary and continuous. To apply differential privacy to location protection, we need a generalized definition of differential privacy, as follows:
Definition (generalized ε-differential privacy): a publishing mechanism A satisfies generalized ε-differential privacy if and only if, for any two datasets D and D' differing in at most k records and any output Z ∈ range(A), we have:
Pr[A(D) ∈ Z] ≤ e^(kε) · Pr[A(D') ∈ Z]
This privacy definition says that the larger the change in the database, the larger the change we allow in the probability distribution of the output. This is the basis for applying differential privacy to location protection.
Location privacy protection
Given the characteristics of two-dimensional space and the needs of practical applications, a few changes are required when applying differential privacy to location privacy.
First, location publishing has no query function f. To simplify the problem, we can assume there is one: an identity function, that is, a function whose output equals its input. This makes later definitions, such as sensitivity, convenient.
Second, location publishing has no notion of adjacent positions. This is also why we use generalized differential privacy. We need a metric to describe the difference between positions, analogous to two databases differing in some number of records. Clearly, Euclidean distance is a good choice for describing this difference.
Finally, we cannot possibly protect the privacy of every point on the planet. We need to preset a threshold 2r and only protect the privacy of positions at distance at most 2r from each other. In other words, we draw a circle centered at the true position with radius r, and we only need to guarantee that any two points inside the circle produce the same output with similar probability. We call this circle the variation circle C.
Based on the above, we give the following definition:
Definition (geo-indistinguishability): a publishing mechanism A satisfies generalized ε-geo-indistinguishability if and only if, for any two positions X1 and X2 in a variation circle C of radius r and any output position Y, we have:
Pr[A(X1) = Y] ≤ e^(ε·dis(X1,X2)) · Pr[A(X2) = Y]
Clearly, for any two points X1 and X2 in C we have dis(X1, X2) ≤ 2r. Here dis(X1, X2) denotes the Euclidean distance between the two points.
To satisfy geo-indistinguishability, we need to add a noise vector to the true position. Andres et al. describe a simple and general mechanism, as follows:
Algorithm 1 (continuous geo-indistinguishability mechanism):
Input: true point X; privacy parameter ε;
Output: published point Y;
1: draw a value l from the gamma distribution Γ(2, 1/ε);
2: draw a value θ from the uniform distribution U(0, 2π);
3: Y = X + (l·cos θ, l·sin θ);
4: output the published point Y.
The correctness of the algorithm has been proved; see the paper by Andres et al. For convenience, we call this algorithm the GICM algorithm.
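In code, Algorithm 1 is just two random draws and a vector addition. The sketch below is our own rendering (the name gicm is ours); the two draws together produce planar Laplacian noise whose density at distance d from X is proportional to e^(-εd).

```python
import math
import random

def gicm(x, y, eps):
    # Algorithm 1: perturb the true point (x, y) with planar Laplacian noise.
    l = random.gammavariate(2, 1.0 / eps)     # step 1: radius ~ Gamma(2, 1/eps)
    theta = random.uniform(0, 2 * math.pi)    # step 2: angle ~ U(0, 2*pi)
    return x + l * math.cos(theta), y + l * math.sin(theta)  # steps 3-4
```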
Expected privacy consumption and expected error
Many works use ε as the comparison baseline in experiments. However, owing to the particularity of location privacy protection, we find that using ε directly to measure the security level of a publishing mechanism is problematic. Our analysis is as follows: the privacy consumption, which we denote εc, is a better measure of privacy strength. For the Laplace mechanism, we can compute it for an observed output Z as:
εc = max over adjacent D, D' of ln( Pr[A(D) = Z] / Pr[A(D') = Z] )
The εc in the formula says how much new knowledge we can obtain by observing the output Z. For the Laplace mechanism, this value equals ε. Likewise, for the GICM algorithm we can define the privacy consumption εc of a specific output Y in the same way, over any two positions in the variation circle.
This definition says how much new knowledge we can obtain after observing the output point Y. However, unlike in the Laplace mechanism, this value is not equal to the ε in GICM. Note that ε bounds all possible outputs Y, while εc is computed for one specific output Y. Take the GICM algorithm as an example. Suppose the input point is X and the output point is Y; the radius of the variation circle C is r, and the distance between X and Y is l. If Y falls outside the variation circle C, i.e., l ≥ r, the computed εc matches our expectation, and there seems to be no problem. But when l < r, the situation changes: the computed εc is smaller than the privacy parameter ε. This result shows that when the output point Y falls inside the variation circle, Y does not leak as much information as we would expect. Therefore ε alone cannot accurately describe how much information a mechanism leaks. Considering that the output point Y has its own probability distribution, we choose E(εc) to measure the amount of information a mechanism leaks. Applying this to the GICM algorithm, we obtain:
Theorem: for the GICM algorithm, the expected privacy consumption E(εc) is a decreasing function of the variation-circle radius r, falling from ε to ε/2, where ε is the privacy parameter.
Proof: we need to consider two cases: Y falls inside the variation circle and Y falls outside it. Setting X as the origin of coordinates and integrating εc against the output distribution over the two regions gives the result.
As r increases, the expected value E(εc) decreases from ε to ε/2. This agrees with our intuition. When r is close to 0, the vast majority of output points Y fall outside the variation circle, so E(εc) is close to ε. When r becomes large enough, most output points Y can approximately be regarded as falling at the center of the variation circle, so E(εc) is close to ε/2.
Besides the expected privacy consumption, the expected distance between the true point and the published point also deserves attention, namely the expected error E(error). E(error) reflects the availability of the published point. Computing E(error) is not difficult:
Theorem: for the GICM algorithm, the expected error is E(error) = 2/ε.
Proof: again setting X as the origin of coordinates, the error is exactly the noise radius l, and the Γ(2, 1/ε) distribution has mean 2/ε.
This explains why the privacy parameter ε cannot be set too small; otherwise the availability of the published point is seriously affected. For example, with ε = 0.02 the expected error is 2/0.02 = 100 meters.
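The theorem is easy to check empirically, since the error is exactly the Γ(2, 1/ε) radius. A quick Monte Carlo sketch (our own check, not from the patent):

```python
import numpy as np

eps = 0.02
radii = np.random.gamma(shape=2, scale=1 / eps, size=1_000_000)
print(radii.mean())  # ~= 2 / eps = 100 meters of expected error
```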
Prediction and testing mechanism
Although the GICM algorithm fits the single-point case perfectly, applying it directly to multiple points, such as trajectory data, creates a privacy problem. By the properties of differential privacy, if every point on a trajectory satisfies ε-differential privacy, the total privacy consumption accumulates. That is, the privacy consumption of the whole trajectory is nε, where n is the number of points on the trajectory. Therefore, if we protect our privacy directly with the GICM algorithm, we cannot issue many location-service queries; otherwise the privacy budget is quickly exhausted.
In view of the above, reducing privacy consumption is very worthwhile. Since we consider an online model, we can only process points one at a time. The problem therefore becomes: how do we reduce privacy consumption when processing one point? The answer is: use historical data. Here, historical data means data that has already been published and that the attacker knows, which can be used to infer the user's true position (note that this leaks no extra privacy, because we use no information about the true point). Using this historical data, we can obtain a predicted point. Although the predicted point is unlikely to fall exactly on the true point, as long as our prediction is good enough, the distance between the predicted point and the true point will not be too large. An intuitive approach is to release the predicted point directly as the published point. The advantage of this approach is obvious: since we use no information about the true point at all, the privacy consumption is 0. However, because each prediction is based on the previous data, and that data is based on the data before it, the prediction error accumulates and eventually makes the prediction result unusable. Therefore we want to use a small amount of information about the true position to correct the accumulated prediction error, without causing too much extra privacy consumption. Most importantly, the method we design must still satisfy geo-indistinguishability, i.e., differential privacy.
Based on the above, we design the Predict and Test Mechanism (denoted PTM). Our algorithm still takes the true point X and the privacy parameter ε as input and outputs a position Y. Algorithm 2 shows the three main steps. First, it uses Algorithm 1 to produce a noisy point Y' based on X; we call this point the preparatory published point. Then it uses the historical knowledge available to the attacker to produce a predicted point P. The method for producing the predicted point is discussed in the next part. In the final step, the algorithm uses a threshold R to test the accuracy of the predicted point and determine the published point. Specifically, if the distance between Y' and P is less than R, the prediction is considered successful and P is set as the final published point; we call this a successful test, as shown in Fig. 1(a). Otherwise the test is considered failed and Y' is used as the final published point; we call this a failed test, as shown in Fig. 1(b).
Algorithm 2 (Predict and Test Mechanism):
Input: true point X; threshold R; privacy parameter ε;
Output: published point Y;
1: draw a value l from the gamma distribution Γ(2, 1/ε);
2: draw a value θ from the uniform distribution U(0, 2π);
3: Y' = X + (l·cos θ, l·sin θ);
4: construct the prediction point P;
5: if dis(Y', P) < R, publish P; otherwise publish Y'.
Theorem: the Predict and Test Mechanism satisfies geo-indistinguishability.
Proof: we use Fig. 1 as an illustration. In the figure we take P as the origin of coordinates, so the coordinates of X are (-d, 0).
When the test fails, the algorithm falls back to the GICM algorithm, which clearly satisfies geo-indistinguishability. When the test succeeds, the calculation is somewhat more complicated. If P is published, then Y' must lie inside the test circle, whose center is P and whose radius is R. Using the output density of Algorithm 1, we can therefore compute:
Pr(A(X) = P) = ∫ over {Y : dis(Y, P) < R} of (ε²/2π)·e^(-ε·dis(X,Y)) dY
We need to find the maximum and minimum of Pr(A(X) = P). It can be seen that this integral decreases as d increases; that is, the farther X is from P, the smaller Pr(A(X) = P). We can find the point in the variation circle that is closest to P and denote its distance to P by dmin; we define dmax in the same way for the farthest point. Then, for any two points X1 and X2 in the variation circle C, we have:
Pr(A(X1) = P) ≤ e^(ε·(dmax - dmin)) · Pr(A(X2) = P)
We also have dmax - dmin ≤ 2r, which holds by the triangle inequality. Finally, we obtain:
Pr(A(X1) = P) ≤ e^(2εr) · Pr(A(X2) = P)
that is, the output probabilities of any two positions in the variation circle, whose distance is at most 2r, stay within the factor that geo-indistinguishability allows. The proof is finished.
Note that we cannot compute E(εc) and E(error) for Algorithm 2 in general, because the prediction method is not yet specified. But for analysis, for a given prediction point P we can compute the algorithm's privacy consumption and error.
Theorem: for a given P, the εc of the algorithm and the error of the algorithm can both be written in closed form, with one case for a successful test and one for a failed test.
Proof: the εc for a successful test follows from the previous proof; the εc for a failed test follows from the part "Expected privacy consumption and expected error". The computation of the error is trivial and therefore omitted.
Because the formula for the privacy consumption is quite complex, we cannot directly see from it how our algorithm reduces privacy consumption. We therefore ran a series of experiments to observe how εc changes with the other variables. We set ε = 0.02, d = 100 (the distance between the true point and the predicted point), R = 300, and r = 100. Fig. 2 shows how εc changes with d and R when the test succeeds. The straight line in Fig. 2 marks E(εc) of the GICM algorithm, which we use for comparison. Fig. 3 shows how the probability of a successful test changes with d and R.
From Fig. 2(a) we can see that εc increases with d and finally reaches ε. When d is below 370, εc is smaller than GICM's E(εc); when d is below 150, εc is only about one tenth of GICM's E(εc). Fig. 3(a) shows that the smaller d is, the more likely the test is to succeed. Therefore, if our prediction is reasonably accurate, εc becomes very small.
Fig. 2(b) shows that εc decreases as R increases, finally reaching 0, and Fig. 3(b) shows that the probability of a successful test increases with R, finally reaching 1. A relatively large R therefore not only reduces the privacy consumption on a successful test but also increases the probability of success. But R cannot be too large, or the error becomes unacceptable.
Prediction methods and further improvement
As introduced in the previous part, our method needs historical data for prediction. In this part we present our prediction methods and an improvement that further reduces privacy consumption. Together these ensure that our algorithm works well in real production and daily life.
Prediction methods
We first make one point clear: no universal prediction method suits all scenes. We therefore consider three main scenes and design prediction methods for each of them: stationary users, slow users, and high-speed users. For convenience of exposition, we use X1, X2, …, Xn for the true points to be protected, Y1, Y2, …, Yn for the published points, and P1, P2, …, Pn for the corresponding predicted points. Suppose we have already published Yi-1 and now want to protect the point Xi; the first thing to do is to predict the point Pi.
Let us first consider the extreme case: a stationary user. This case really can happen in real life, for example a person sitting in a coffee shop, a restaurant, or a cinema while querying nearby places. An ideal strategy is to protect the first point with GICM and then use that first published point as the published point for all subsequent queries. Then only the first point consumes privacy, and the overall privacy consumption drops to a constant level. Inspired by this, we set Pi = Yi-1. Although it looks very simple, this method works well in practice, for a simple reason: the previous published point Yi-1 is not far from the previous true point Xi-1 (otherwise the published point would have no availability), and Xi = Xi-1, so the distance between Yi-1 and Xi will not be large either. Together with the skip mechanism introduced in the next part, we can bring the privacy consumption of this scene down to a constant level.
The second scene is a user who is moving, but slowly. This scene is also very common, for example a user walking while using a mobile phone. We find that the prediction method for the stationary scene also applies here, because a slow user can approximately be regarded as stationary. Another explanation is that Yi-1 is close to Xi-1, and Xi-1 is in turn close to Xi, so Yi-1 is also close enough to Xi.
The situation becomes a little more complicated for high-speed users. The previous prediction method no longer works well, because Xi-1 and Xi are usually far apart; the relation between successive true points becomes very weak, so accurate prediction is very difficult. Still, we can predict fairly accurately in some scenes. We use the prediction Pi = 2Yi-1 - Yi-2, based on the idea that the user's heading and speed do not change too much, so we can approximately assume the user travels the same distance in the same direction each time.
One problem remains: how do we know which scene we are currently in? Fortunately, this problem is not hard to solve. We can determine the scene using only the published points. If the published points are very dense, we know we are in the slow scene or even the stationary scene; likewise, sparse published points indicate the high-speed scene. Note that this process causes no extra privacy consumption, because the attacker can also see the published points.
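One possible rendering of this scene-detection rule (our own heuristic sketch; the patent does not fix concrete thresholds, so the 50 m and 200 m cutoffs below are illustrative assumptions):

```python
import math

def detect_scene(published, slow_threshold=50.0, fast_threshold=200.0):
    # Guess the motion scene from published points only; since the attacker
    # already sees the published points, this costs no extra privacy.
    if len(published) < 2:
        return "static"
    gaps = [math.dist(a, b) for a, b in zip(published, published[1:])]
    avg = sum(gaps) / len(gaps)
    if avg < slow_threshold:
        return "static"   # very dense published points
    if avg < fast_threshold:
        return "slow"
    return "fast"         # sparse published points: high-speed scene
```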
Further improvement
Our goal is to reduce privacy consumption as much as possible without significantly increasing the error. However, in our experiments we found a very interesting phenomenon: privacy consumption and error decrease at the same time. This shows there is still room to reduce privacy consumption; privacy and error are always a trade-off. Since the error is now smaller than what we asked for, we can further reduce privacy consumption by sacrificing some availability. In other words, our mechanism can be improved further.
The first question is: how do we detect this phenomenon? This step matters, because the phenomenon does not always occur. A direct method is to compute the average error of past published points and decide whether to take action. However, doing so causes extra privacy consumption, because we would be using information about the true points that the attacker does not know. A more suitable method is to use the test success rate, here defined as the number of successful tests divided by the total number of tests. This variable can reflect whether the error is small, because the smaller the error, the higher the test success rate. This step causes no extra privacy consumption, because the attacker can also compute the test success rate.
The other question is: how do we trade availability for lower privacy consumption? A direct method is to adjust parameters dynamically, for example reducing the privacy parameter ε or increasing the test radius R. However, we found these methods unsatisfactory, because the privacy they save is not significant. We therefore adopt an efficient and simple method: skip the test step directly. When we find the error is small enough, we directly let the published point be the predicted point. The privacy consumption then drops straight to 0, because we do not use the true point at all. This method does not increase the error too much, because in the vast majority of cases our prediction does not stray too far.
Based on the discussion above, our skip mechanism is as follows. We use the variable s for the number of successful tests and f for the number of failed tests. For the first point we use the original GICM algorithm and set s = 0, f = 1. For subsequent points we use PTM: each time after predicting a point, we generate a random number ran in (0, 1). If ran < s/(s + f), we skip the test step directly and use the predicted point as the published point, keeping s and f unchanged. Otherwise we perform the test step and update s and f according to the test result. Put simply, the skip probability is exactly the test success rate.
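A small simulation makes the long-run behaviour of this rule visible. The sketch below is our own illustration, assuming each performed test succeeds independently with a fixed probability; under that assumption, the fraction of skipped rounds approaches the test success rate.

```python
import random

def skip_rate(p_success, rounds=10_000):
    # Simulate the skip mechanism when each performed test succeeds with
    # probability p_success; returns the fraction of rounds that were skipped.
    s, f, skipped = 0, 1, 0
    for _ in range(rounds):
        if random.random() < s / (s + f):   # skip: publish the prediction directly
            skipped += 1
        elif random.random() < p_success:   # test performed and succeeds
            s += 1
        else:                               # test performed and fails
            f += 1
    return skipped / rounds

print(skip_rate(0.9))  # approaches the test success rate, here about 0.9
```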
Experiment
To understand the practical performance of our algorithm, we tested it on two well-known datasets. We consider three scenes: stationary users, slow users, and high-speed users, and we compare the corresponding privacy consumption and error with those of the GICM algorithm.
Basic setting
We first introduce the two datasets used:
(1) Geolife. This dataset was collected by Microsoft Research Asia. It contains 182 users and 17621 trajectories, with a total length of 1292951 km and a total duration of 50176 hours (from April 2007 to August 2012). Most positions are located in Beijing, China. We mainly use this dataset for the slow scene, because the vast majority of the movement in it is slow.
(2) T-Drive. This dataset contains the trajectories of 10357 taxis over one week, with a total length of about 9000000 km and about 15000000 positions. Because taxis move quickly, we use this dataset for high-speed users.
Three important parameters need to be set at the start: the radius r of the variation circle, the radius R of the test circle, and the privacy parameter ε. These parameters are related, so we cannot set them arbitrarily. For example, if r = 1000, then ε = 0.1 is meaningless, because the error of most published points is below 4/ε = 40, while 1000 is obviously too large. In our experiments we set the parameters as follows: r = 100, ε = 0.02, and R = 100, 150, 200, and even 300 (for comparison). The unit is meters. Other parameter values could be used, but we found that the performance of the mechanism does not change much, so we use the above setting for illustration.
We use E(εc) and E(error) as the comparison baselines. With the above parameter setting, E(εc) is computed to be 0.0173 and E(error) is 100 (consistent with E(error) = 2/ε = 2/0.02 = 100 meters). Although PTM's E(εc) and E(error) cannot be computed in closed form, we use their empirical averages instead and compare the mechanism's performance against the baselines.
Stationary users
In this scene we use simulated data instead of real data, for a simple reason: the user is stationary, so simulated data and real data make no difference. We generate 100 trajectories, each consisting of 1000 identical points. We run PTM 100 times on these trajectories, record all the privacy consumptions and errors, and plot their cumulative distribution functions. We then compute the average privacy consumption and error and compare them with the GICM algorithm's E(εc) and E(error). Fig. 4 shows our results.
The three figures on the left of Fig. 4 (a, c, e) show the cumulative distribution functions of the privacy consumption under different test radii R. Fig. 4(a) shows that when R = 100, about 35% of the privacy consumptions are below ε/10. The computed average privacy consumption is 0.0096, which means we save 44.51% of the privacy consumption. This result is not striking, but in Fig. 4(c), when R = 200, about 85% of the privacy consumptions are below ε/10. We can see that the cumulative distribution function of the privacy consumption comes very close to the line y = 0.9. The computed average privacy consumption is now 0.0017; that is, we save more than 90% of the privacy consumption. When R reaches 300, we can see in Fig. 4(e) that most points consume no privacy at all, thanks to our skip mechanism. The average privacy consumption is 0.00023, a saving of more than 98%. This is a huge improvement and nearly reaches the ideal case: a constant level.
The three figures on the right of Fig. 4 (b, d, f) show the cumulative distribution functions of the error under different R. We can see from the figures that the three curves are very close, which shows that our PTM mechanism does not harm the availability of the published points (sometimes it even improves availability, as in Fig. 4(f)).
Slow users
In the slow scene, the distance between consecutive points is small, and we use the Geolife dataset. Although the users in Geolife use various means of transport, such as walking, bicycle, car, and subway, the vast majority are walking, so this dataset suits the slow scene. Of course, we could also delete the non-walking trajectories to make the dataset meet our requirements. We randomly select 1000 trajectories and then apply the PTM mechanism 1000 times. As in the stationary scene, we collect all the privacy consumptions and errors, plot their cumulative distribution functions, and compare the averages with the GICM algorithm's E(εc) and E(error).
Fig. 5 shows the results. The three figures on the left (a, c, e) depict the cumulative distribution functions of the privacy consumption as R changes from 100 to 300. We can see they are very close to Fig. 4, which means our mechanism also works well in the slow scene. In fact, we compute that for R = 100, 200, and 300, our PTM saves 40.46%, 81.5%, and 93% of the privacy consumption respectively.
From the three figures on the right (b, d, f) we can see that the error increases with R. When R = 100, the three curves are about the same. As R increases, the cumulative distribution function of the error becomes lower and lower. This is easy to explain: when R is too large, the error accumulates more easily, so our prediction becomes less and less accurate. However, in real life this degree of availability loss is acceptable. Our computation shows that the error increases by 5%, 22%, and 46% respectively under the different R. Compared with the lost availability, the reduction in privacy consumption is clearly more significant.
High-speed users
Finally, high-speed users. In this scene we use the T-Drive dataset. The vast majority of the data comes from taxis, so the users move at relatively high speed. We sample points from the raw data at a fixed frequency to ensure that the distance between two consecutive points exceeds 100 meters in most cases. We again plot the cumulative distribution functions and compute the averages. Note that the prediction method used in this scene differs from that of the first two scenes.
The results are shown in Fig. 6. We can see that our PTM can still save about half of the privacy consumption; specifically, we save 28%, 55%, and 69%. The reason it cannot save as much as in the first two scenes is simple: the user moves quickly, so the trajectory has less regularity and we cannot predict it very accurately. We also notice that when R changes from 200 to 300, the saved privacy consumption hardly increases. This suggests that the precision of our prediction method and the test radius R jointly constrain the overall performance of PTM.
Fig. 6 also shows that availability is affected. The availability losses in Fig. 6(b) and Fig. 6(d) are still acceptable, but the loss in Fig. 6(f) is hard to accept. In fact, setting R to 300 does not save much more privacy anyway; setting R to 200 is a good choice. We conclude that when the prediction precision is not high enough, an overly large R is generally useless.
Claims (2)
1. A differential privacy method for protecting multiple locations in location-based services, characterized by comprising the following steps:
Step 1, use a threshold R to test the accuracy of the predicted point and determine the published point; input the true trace points X1, X2, … that the user wants to protect; draw a value l from the gamma distribution Γ(2, 1/ε), where ε is the privacy parameter;
Step 2, draw a value θ from the uniform distribution U(0, 2π);
Step 3, let Y1 = X1 + (l·cos θ, l·sin θ);
Step 4, output Y1;
Step 5, let i = 2, s = 0, f = 1;
Step 6, draw a value l from the gamma distribution Γ(2, 1/ε);
Step 7, draw a value θ from the uniform distribution U(0, 2π);
Step 8, let Yi = Xi + (l·cos θ, l·sin θ);
Step 9, construct the prediction point P and generate a random number ran in (0, 1); if ran < s/(s + f): publish Yi = P, set i = i + 1, and return to step 6; otherwise: go to step 10;
Step 10, if dis(Yi, P) < R: publish Yi = P and set s = s + 1; otherwise: publish Yi and set f = f + 1;
Step 11, set i = i + 1 and return to step 6.
2. The differential privacy method for protecting multiple locations in location-based services according to claim 1, characterized in that the prediction point P is constructed as follows:
when the user is stationary, Pi = Yi-1;
when the user moves slowly, Pi = Yi-1;
when the user moves at high speed, Pi = 2Yi-1 - Yi-2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710433690.3A CN107247909B (en) | 2017-06-09 | 2017-06-09 | Differential privacy method for protecting multiple positions in position information service |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710433690.3A CN107247909B (en) | 2017-06-09 | 2017-06-09 | Differential privacy method for protecting multiple positions in position information service |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107247909A true CN107247909A (en) | 2017-10-13 |
CN107247909B CN107247909B (en) | 2020-05-05 |
Family
ID=60019278
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710433690.3A Active CN107247909B (en) | 2017-06-09 | 2017-06-09 | Differential privacy method for protecting multiple positions in position information service |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107247909B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100077484A1 (en) * | 2008-09-23 | 2010-03-25 | Yahoo! Inc. | Location tracking permissions and privacy |
CN104135362A (en) * | 2014-07-21 | 2014-11-05 | 南京大学 | Availability computing method of data published based on differential privacy |
CN105095447A (en) * | 2015-07-24 | 2015-11-25 | 武汉大学 | Distributed w-event differential privacy infinite streaming data distribution method |
Non-Patent Citations (3)
Title |
---|
HOA NGO et al.: "Location Privacy via Differential Private Perturbation of Cloaking Area", 2015 IEEE 28th Computer Security Foundations Symposium *
TONG WEI et al.: "Privacy protection against big data analysis: state of the art and progress", Chinese Journal of Network and Information Security *
YANG SONGTAO et al.: "Location privacy protection method based on random anonymity", Journal of Harbin Engineering University *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110022531A (en) * | 2019-03-01 | 2019-07-16 | 华南理工大学 | A kind of localization difference privacy municipal refuse data report and privacy calculation method |
CN110022531B (en) * | 2019-03-01 | 2021-01-19 | 华南理工大学 | Localized differential privacy urban garbage data report and privacy calculation method |
CN110516476A (en) * | 2019-08-31 | 2019-11-29 | 贵州大学 | Geographical indistinguishable location privacy protection method based on frequent location classification |
CN110633402A (en) * | 2019-09-20 | 2019-12-31 | 东北大学 | Three-dimensional space-time information propagation prediction method with differential privacy mechanism |
CN110633402B (en) * | 2019-09-20 | 2021-05-04 | 东北大学 | Three-dimensional space-time information propagation prediction method with differential privacy mechanism |
CN112487471A (en) * | 2020-10-27 | 2021-03-12 | 重庆邮电大学 | Differential privacy publishing method and system of associated metadata |
CN112487471B (en) * | 2020-10-27 | 2022-01-28 | 重庆邮电大学 | Differential privacy publishing method and system of associated metadata |
CN114065287A (en) * | 2021-11-18 | 2022-02-18 | 南京航空航天大学 | Track difference privacy protection method and system for resisting prediction attack |
CN114065287B (en) * | 2021-11-18 | 2024-05-07 | 南京航空航天大学 | Track differential privacy protection method and system for resisting predictive attack |
Also Published As
Publication number | Publication date |
---|---|
CN107247909B (en) | 2020-05-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||