Summary of the invention
In view of this, it is necessary to provide a trajectory data privacy protection method based on spatial clustering that has low computational complexity and can be applied to large volumes of spatial data.
To achieve the above object, the present invention adopts the following technical solution:
A trajectory data privacy protection method based on spatial clustering comprises the following steps:
Step S110: building a privacy model to obtain an individual privacy risk metric;
Step S120: comparing the individual privacy risk metric with the actual distribution of urban population risk areas, and constructing privacy risk protection; and
Step S130: evaluating the effect after privacy protection.
In some embodiments, step S110, building the privacy model, comprises the following steps:
Building an activity sequence for each individual, and obtaining the individual's set of frequent activity points from the temporal distribution of the individual's activities;
Obtaining the frequent activity point set with a K-anonymity-based measure, and obtaining the privacy risk metric of each individual in the frequent activity point set;
Obtaining the values of individual privacy risk when different activity points are known, according to the different events and the number of individual activity points that an attacker can obtain.
In some embodiments, step S120, comparing the individual privacy risk metric with the actual distribution of urban population risk areas and constructing privacy risk protection, comprises the following steps:
For different numbers of events known about each individual, merging the base stations that locate the mobile terminals of the urban population at different spatial levels;
Wherein the base station locations of the mobile terminals are merged spatially in order of distance, from small to large.
In some embodiments, step S130, evaluating the effect after privacy protection, comprises the following steps:
Step S131: sorting all call records of each individual in chronological order;
Step S132: finding the individuals with consecutive call records, regarding a change of position between two consecutive time points as one movement, and recording the position at the earlier time point as the origin O and the position at the later time point as the destination D;
Step S133: calculating the OD totals before and after privacy protection respectively;
Step S134: comparing the change in the individual privacy risk metric and the change in OD before and after data protection, and thereby obtaining the protection effect on the data before and after protection.
In some embodiments, in step S133 the OD totals before and after privacy protection are each calculated with a formula in which N denotes the total number of TAZs; i and j index the TAZs from 1 to N; $OD_{ij}$ denotes the OD flow from TAZ $i$ to TAZ $j$ obtained from statistics on the raw data; $OD'_{ij}$ denotes the corresponding value obtained from statistics on the data after privacy protection processing; and a TAZ is a traffic analysis zone delineated by the city's transport planning authority.
In another aspect, the present invention further provides a privacy protection system based on a high-risk frequent activity point replacement strategy, comprising:
A model construction module, for building a privacy model to obtain an individual privacy risk metric;
A risk classification module, for comparing the individual privacy risk metric with the actual distribution of urban population risk areas and constructing privacy risk protection; and
An effect evaluation module, for evaluating the effect after privacy protection.
The technical effects of the above technical solution of the present invention are as follows:
On the one hand, the trajectory data privacy protection method and system based on spatial clustering provided by the present invention build a privacy model to obtain an individual privacy risk metric, compare that metric with the actual distribution of urban population risk areas to construct privacy risk protection, and evaluate the effect after protection. The method is simple, has low computational complexity and can be applied to large volumes of spatial data.
On the other hand, the method and system build the privacy model of individual subjects from the perspective of the frequent activity point set, filling a gap in the prior art, which lacks research on privacy protection during the release of mobile phone location data from the perspective of frequent activity point sets. They also add, to the privacy protection of mobile phone location data (CDR) release, research on the utility of the new dataset obtained after release, emphasizing the combination of the method with practical use. At the same time, the privacy model and the protection method are built so as to balance the operability of the method against the characteristics of practical application, giving the method strong transferability and portability.
Detailed description of the embodiments
To facilitate understanding of the present invention, the invention is described more fully below with reference to the accompanying drawings, which show preferred embodiments of the invention. The invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that the disclosure of the invention will be thorough and complete.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which the invention belongs. The terms used in the description of the invention are intended only to describe specific embodiments and are not intended to limit the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
As shown in Fig. 1, the trajectory data privacy protection method 100 based on spatial clustering provided by Embodiment 1 of the present invention comprises the following steps:
Step S110: building a privacy model to obtain an individual privacy risk metric;
Referring to Fig. 2, step S110, building the privacy model, comprises the following steps:
Step S111: building an activity sequence for each individual, and obtaining the individual's set of frequent activity points from the temporal distribution of the individual's activities;
Step S112: obtaining the frequent activity point set with a K-anonymity-based measure, and obtaining the privacy risk metric of each individual in the frequent activity point set;
Step S113: obtaining the values of individual privacy risk when different activity points are known, according to the different events and the number of individual activity points that an attacker can obtain.
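Steps S111 and S112 could be prototyped roughly as below. This is a minimal sketch, not the implementation of the invention: the record layout `(user_id, timestamp, station_id)` and the helper names `activity_sequences` and `frequent_points` are hypothetical, and ranking stations by how many records fall on them is an assumed stand-in for "according to the temporal distribution of activities".

```python
from collections import Counter, defaultdict


def activity_sequences(records):
    """Group (user_id, timestamp, station_id) records into a per-user
    list of station IDs ordered by timestamp."""
    per_user = defaultdict(list)
    for user, ts, station in records:
        per_user[user].append((ts, station))
    return {user: [station for _, station in sorted(events)]
            for user, events in per_user.items()}


def frequent_points(sequence, n):
    """Return up to n stations of a sequence, most visited first.

    Assumption (hypothetical): "frequent" means the largest number of
    records; ties are broken by station ID so the result is deterministic."""
    counts = Counter(sequence)
    ranked = sorted(counts, key=lambda station: (-counts[station], station))
    return ranked[:n]


records = [("u1", 3, "B"), ("u1", 1, "A"), ("u1", 2, "B"), ("u2", 1, "C")]
sequences = activity_sequences(records)
print(sequences["u1"])                      # ['A', 'B', 'B']
print(frequent_points(sequences["u1"], 1))  # ['B']
```

In a real pipeline the ranking key could be replaced by one that reflects when visits occur (for example, distinct days or time slots) without changing the rest of the sketch.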
It can be understood that k-anonymity is one of the earliest effective privacy protection techniques. Its core idea is to partition mobile objects into sets according to their quasi-identifiers, so that all mobile objects in a set share the same quasi-identifier; each object then cannot be distinguished from the other k-1 objects in its set, i.e. it is anonymized within the set. Without other auxiliary information, the probability of re-identifying the identity of any object in the set is 1/k, where k is the number of mobile objects in the anonymity set: the larger k is, the lower the re-identification probability and the lower the privacy risk. A quasi-identifier is a set of attributes that together can uniquely identify a particular mobile object, such as birthday, gender and age. Most trajectory data contain no such attributes; instead, the frequent activity places of a mobile object serve as its quasi-identifier, and the more distinctive a user's frequent activity places are, the more easily the user can be re-identified. In the extreme case k = 1, the anonymity set contains only one object, whose re-identification probability reaches 100%.
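The k-anonymity measure of steps S112 and S113 can be illustrated with the sketch below, which treats each user's top-n frequent activity points as the quasi-identifier and varies n, the number of points the attacker is assumed to know. The function name `reidentification_risk`, the profile layout and the choice to compare point sets without regard to their order are assumptions for illustration only.

```python
from collections import Counter


def reidentification_risk(profiles, n):
    """Per-user k-anonymity risk when the attacker knows each user's
    top-n frequent activity points.

    profiles: user_id -> list of frequent stations, most frequent first.
    Returns (risk, unique_share): risk maps each user to 1/k, where k is
    the number of users sharing the same top-n point set, and unique_share
    is the fraction of users with k == 1 (completely re-identified).
    """
    # Quasi-identifier (assumption): the unordered set of the top-n points.
    keys = {user: frozenset(points[:n]) for user, points in profiles.items()}
    set_size = Counter(keys.values())
    risk = {user: 1.0 / set_size[key] for user, key in keys.items()}
    unique = sum(1 for key in keys.values() if set_size[key] == 1)
    return risk, unique / len(keys)


profiles = {
    "u1": ["A", "B", "C"],
    "u2": ["A", "B", "D"],
    "u3": ["A", "E", "C"],
    "u4": ["F", "B", "C"],
}
for n in (1, 2, 3):
    _, share = reidentification_risk(profiles, n)
    print(n, share)  # prints "1 0.25", then "2 0.5", then "3 1.0"
```

The toy output shows the pattern the text describes: the more points the attacker knows, the smaller the anonymity sets and the larger the share of users who are uniquely re-identified.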
Step S120: comparing the individual privacy risk metric with the actual distribution of urban population risk areas, and constructing privacy risk protection;
Specifically, for different numbers of events known about each individual, the base stations that locate the mobile terminals of the urban population are merged at different spatial levels, the merging proceeding in order of distance, from small to large. This is implemented as follows:
(1) fit the effective coverage range of each base station with Thiessen polygons;
(2) compute the centroid of each base station's coverage area;
(3) cover the whole study area with a grid of a given cell size (the grid is initially 200 m × 200 m), then enlarge the grid cell step by step in increments of 200 m;
(4) merge the base station areas whose centroids fall in the same grid cell, obtaining the new base station ranges after merging.
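Steps (2) to (4) could be sketched as follows. This is a minimal illustration under stated assumptions: coordinates are in a projected, metre-based system; the grid is anchored at the coordinate origin rather than at the corner of the study area; and the Thiessen polygons of step (1) are taken as given input (in practice they might come from a GIS library). The names `polygon_centroid` and `merge_by_grid` are hypothetical.

```python
import math
from collections import defaultdict


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula).

    vertices: (x, y) pairs in a projected, metre-based coordinate system,
    listed in order around the polygon without repeating the first vertex."""
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    for k, (x0, y0) in enumerate(vertices):
        x1, y1 = vertices[(k + 1) % len(vertices)]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)


def merge_by_grid(coverage, cell_size):
    """Merge base stations whose coverage centroids fall in the same
    square grid cell of side cell_size (metres).

    coverage: station_id -> coverage polygon (e.g. its Thiessen polygon).
    Returns a list of merged regions, each a sorted list of station IDs."""
    cells = defaultdict(list)
    for station, polygon in coverage.items():
        x, y = polygon_centroid(polygon)
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[cell].append(station)
    return [sorted(stations) for stations in cells.values()]


def square(x, y, side=100.0):
    """Toy square coverage polygon with lower-left corner (x, y)."""
    return [(x, y), (x + side, y), (x + side, y + side), (x, y + side)]


coverage = {"A": square(0, 0), "B": square(100, 0), "C": square(300, 0)}
# Enlarge the grid from 200 m in 200 m steps, as in step (3).
counts = {size: len(merge_by_grid(coverage, size)) for size in range(200, 2801, 200)}
print(merge_by_grid(coverage, 200))  # [['A', 'B'], ['C']]
print(merge_by_grid(coverage, 400))  # [['A', 'B', 'C']]
```

As in Fig. 3, the number of stations in each merged region is not fixed: it depends only on which centroids happen to share a grid cell.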
In practice, mobile terminal location data are positioned by base station, and base station locations are fixed in geographic space; during spatial clustering of the base stations, individual locations are therefore merged in order of distance, from small to large. After merging, the original location information is blurred. The larger the merging distance, the larger the spatial extent of each unit.
It can be understood that the spatial clustering privacy protection method aggregates the service ranges of base stations and thereby enlarges the spatial units. In mobile phone data, the recorded location of a mobile object is only approximate: it is the central point of the serving base station's service range, and the density of base stations is to a large extent correlated with population density. The number of base stations aggregated here is not fixed, however. After the service range of each base station is simulated, the base stations whose service-range centroids lie in the same grid cell of a given spatial scale are aggregated into one large region, and the aggregated regions are used as the spatial units of study. The number of base stations aggregated in this way therefore varies: each aggregated region consists of the service ranges of one or more base stations, and some regions are not actually aggregated and still contain the service range of only a single base station. Fig. 3 illustrates the aggregation of base station service ranges. In the example shown, the centroids of the service ranges of base stations 1, 2 and 3 lie in the same grid cell (green rectangle), so the service ranges of these three base stations are aggregated into one large region (the area enclosed by the thick grey border), which contains the signal coverage of all three base stations. The other aggregated regions in the figure also contain several base stations, but in varying numbers, and the resulting aggregated regions are irregular in shape.
Step S130: evaluating the effect after privacy protection;
Referring to Fig. 4, step S130, evaluating the effect after privacy protection, comprises the following steps:
Step S131: sorting all call records of each individual in chronological order;
Step S132: finding the individuals with consecutive call records, regarding a change of position between two consecutive time points as one movement, and recording the position at the earlier time point as the origin O and the position at the later time point as the destination D;
Step S133: calculating the OD totals before and after privacy protection respectively;
Step S134: comparing the change in privacy risk and the change in OD before and after data protection, and thereby obtaining the protection effect on the data before and after protection.
It can be understood that the individual privacy risk metric of the raw data is calculated with K-anonymity, and after spatial clustering has blurred individual locations it can still be calculated the same way. The individual's anonymity set then becomes larger, so the risk decreases.
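The recomputation described above can be sketched by mapping every station to the merged region that contains it and recounting the anonymity sets. The function names are hypothetical, and keeping the original station ranking after merging (rather than re-ranking frequencies at region level) is a simplifying assumption.

```python
from collections import Counter


def generalize(profiles, station_to_region):
    """Replace each frequent station with its merged region, keeping the
    original ranking and dropping repeats (two stations may share a region)."""
    return {
        user: list(dict.fromkeys(station_to_region.get(s, s) for s in points))
        for user, points in profiles.items()
    }


def unique_share(profiles, n):
    """Fraction of users whose top-n point set is shared by nobody else (k == 1)."""
    keys = [frozenset(points[:n]) for points in profiles.values()]
    set_size = Counter(keys)
    return sum(1 for key in keys if set_size[key] == 1) / len(keys)


profiles = {"u1": ["A", "B"], "u2": ["A", "C"], "u3": ["D", "B"]}
merged = generalize(profiles, {"B": "R1", "C": "R1"})  # B and C share region R1
print(unique_share(profiles, 2))  # 1.0
print(unique_share(merged, 2))    # 0.3333333333333333
```

After B and C are merged, users u1 and u2 become indistinguishable (k = 2), so the share of uniquely re-identified users falls from 1.0 to 1/3.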
It can be understood that the movements of each user from one base station to another are counted, and each movement between base stations is recorded as one trip. Specifically, all call records of each mobile object are sorted in chronological order and the position changes between consecutive records are identified: whenever the positions at two consecutive time points differ, this is counted as one movement, with the position at the earlier time point taken as the origin (Origin, O) and the position at the later time point as the destination (Destination, D). Fig. 4 shows part of the trips obtained for the mobile objects.
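A minimal sketch of steps S131 and S132, applied to a few shuffled rows of Table 1 for user 0000****50. The function name and tuple layout are assumptions; timestamps are parsed with `datetime.strptime` because, compared as plain strings, "16:15" would sort before "5:45".

```python
from datetime import datetime

TIME_FORMAT = "%Y/%m/%d %H:%M"  # matches timestamps such as "2011/5/1 4:39"


def extract_trips(records):
    """Turn one user's call records into OD trips.

    records: (time_string, latitude, longitude) tuples in any order.
    Returns (origin, destination) coordinate pairs, one for every pair of
    consecutive records (in time order) whose positions differ."""
    ordered = sorted(records, key=lambda r: datetime.strptime(r[0], TIME_FORMAT))
    trips = []
    for prev, curr in zip(ordered, ordered[1:]):
        origin, destination = prev[1:], curr[1:]  # (lat, lon) tuples
        if origin != destination:
            trips.append((origin, destination))
    return trips


rows = [  # six records of user 0000****50 from Table 1, deliberately shuffled
    ("2011/5/1 16:56", 22.546667, 114.120306),
    ("2011/5/1 4:39", 22.550833, 114.125833),
    ("2011/5/1 6:03", 22.542534, 114.11719),
    ("2011/5/1 5:45", 22.542534, 114.11719),
    ("2011/5/1 16:15", 22.544722, 114.119444),
    ("2011/5/1 18:00", 22.550833, 114.122222),
]
for origin, destination in extract_trips(rows):
    print(origin, "->", destination)
```

The 5:45 and 6:03 records share a position, so they produce no trip; the six records yield four trips.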
Further, in step S133 the OD totals before and after privacy protection are each calculated with a formula in which N denotes the total number of TAZs; i and j index the TAZs from 1 to N; $OD_{ij}$ denotes the OD flow from TAZ $i$ to TAZ $j$ obtained from statistics on the raw data; $OD'_{ij}$ denotes the corresponding value obtained from statistics on the data after privacy protection processing; and a TAZ is a traffic analysis zone delineated by the city's transport planning authority.
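Read literally, step S133 needs the totals $\sum_{i=1}^{N}\sum_{j=1}^{N} OD_{ij}$ and $\sum_{i=1}^{N}\sum_{j=1}^{N} OD'_{ij}$. The sketch below computes both under that reading and adds a hypothetical indicator (the sum of $|OD_{ij} - OD'_{ij}|$ over all TAZ pairs, divided by the original total) as one possible way to compare them; it is not presented as the formula of the invention, and the function names are illustrative.

```python
def od_total(od):
    """Total flow: sum of OD_ij over all TAZ pairs (i, j) present in od."""
    return sum(od.values())


def relative_od_change(od, od_protected):
    """Hypothetical comparison: sum of |OD_ij - OD'_ij| over every TAZ pair
    that appears in either matrix, divided by the original total flow."""
    pairs = set(od) | set(od_protected)
    diff = sum(abs(od.get(p, 0.0) - od_protected.get(p, 0.0)) for p in pairs)
    return diff / od_total(od)


od = {(1, 2): 10.0, (2, 1): 5.0, (1, 1): 5.0}           # from raw data
od_protected = {(1, 2): 8.0, (2, 1): 6.0, (2, 2): 2.0}  # after protection
print(od_total(od), od_total(od_protected))  # 20.0 16.0
print(relative_od_change(od, od_protected))  # 0.5
```

Storing the OD matrix as a dictionary keyed by `(i, j)` keeps it sparse, which suits TAZ-level matrices where most pairs carry no flow.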
It can be understood that the traffic analysis zones (TAZ, Traffic Analysis Zone) delineated by a city's transport planning authority take the socio-economic background of each area into account. They are coarser in spatial granularity than base station service ranges, relatively stable, and are the research unit commonly used in the transport field. The present invention therefore bases the comparison on OD flows between TAZs and maps the OD flows computed at each spatial scale onto the TAZs. The spatial scales include the Thiessen polygons generated from the original base station data and the regions obtained after base station aggregation. At each scale, every spatial unit is cut by the TAZs, and the OD flow leaving that unit is shared among the TAZs that cut it, each share being proportional to the area of the unit that the TAZ covers.
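The area-proportional mapping onto TAZs could look like the sketch below. The overlap areas are assumed to be precomputed (for example by intersecting unit and TAZ polygons in a GIS tool), splitting the flow at both the origin and the destination end is an interpretation of the text rather than a stated rule, and the function name `to_taz_od` is hypothetical.

```python
from collections import defaultdict


def to_taz_od(unit_od, overlap):
    """Map OD flows between spatial units onto TAZs by area share.

    unit_od: (origin_unit, destination_unit) -> flow.
    overlap: unit -> {taz: area of the part of the unit inside that TAZ}.
    A unit's share for each TAZ is that overlap divided by the unit's total
    overlap area. The split is applied at both the origin and the destination
    end, so the total flow is preserved."""
    share = {}
    for unit, parts in overlap.items():
        total = sum(parts.values())
        share[unit] = {taz: area / total for taz, area in parts.items()}
    taz_od = defaultdict(float)
    for (u, v), flow in unit_od.items():
        for taz_i, w_i in share[u].items():
            for taz_j, w_j in share[v].items():
                taz_od[(taz_i, taz_j)] += flow * w_i * w_j
    return dict(taz_od)


unit_od = {("a", "b"): 100.0}
overlap = {"a": {1: 30.0, 2: 70.0}, "b": {3: 50.0, 4: 50.0}}
print(to_taz_od(unit_od, overlap))
```

Here unit "a" is 30% inside TAZ 1 and 70% inside TAZ 2, so roughly 15, 15, 35 and 35 units of the 100-unit flow land on the pairs (1, 3), (1, 4), (2, 3) and (2, 4).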
According to the Pareto principle, which is widespread in the transport field (the 80/20 rule: roughly 20% of roads carry 80% of the traffic flow), most of the OD flow in our statistics can be expected to be produced by a small number of OD pairs. We therefore extract the OD pairs carrying 80% of the total flow as the main flows and study the effect of the above privacy protection method on these main flows.
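One way to pick out the main flows is to rank OD pairs by flow and keep the largest until they jointly account for 80% of the total. This is a sketch of that reading; the function name `main_flows` and its `threshold` parameter are illustrative.

```python
def main_flows(od, threshold=0.8):
    """Largest OD pairs, in descending order of flow, taken until their
    cumulative flow reaches `threshold` of the total flow."""
    total = sum(od.values())
    selected, cumulative = [], 0.0
    for pair, flow in sorted(od.items(), key=lambda item: item[1], reverse=True):
        if cumulative >= threshold * total:
            break
        selected.append(pair)
        cumulative += flow
    return selected


od = {("A", "B"): 50, ("B", "C"): 35, ("C", "A"): 10, ("A", "C"): 5}
print(main_flows(od))  # [('A', 'B'), ('B', 'C')]
```

Two of the four pairs already carry 85% of the flow, so only they are kept; comparing this subset before and after protection isolates the effect on the dominant travel patterns.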
Referring to Fig. 5, the privacy protection system 200 based on a high-risk frequent activity point replacement strategy provided by the present invention comprises a model construction module 210, for building a privacy model to obtain an individual privacy risk metric; a risk classification module 220, for comparing the individual privacy risk metric with the actual distribution of urban population risk areas and constructing privacy risk protection; and an effect evaluation module 230, for evaluating the effect after privacy protection. Since the specific implementation of the system has been described in detail above, it is not repeated here.
Example
This example takes Shenzhen as the study area and uses the 2011 mobile phone call detail records (CDR) of the study area as the data source. Fig. 6 shows the base station locations and the Thiessen polygons generated from them; the coordinates recorded in Table 1 are the locations of the base stations serving each record. The data are processed with the method of the present invention.
Table 1 Part of the call records of the mobile phone user with anonymized ID "0000****50"
Anonymized user ID | Event time (call made or received) | Latitude | Longitude | Event type | Area code
0000****50 | 2011/5/1 4:39 | 22.550833 | 114.125833 | 0 | 0755
0000****50 | 2011/5/1 5:45 | 22.542534 | 114.11719 | 0 | 0755
0000****50 | 2011/5/1 6:03 | 22.542534 | 114.11719 | 1 | 0755
0000****50 | 2011/5/1 9:42 | 22.542534 | 114.11719 | 1 | 0755
0000****50 | 2011/5/1 10:27 | 22.542534 | 114.11719 | 0 | 0755
0000****50 | 2011/5/1 15:38 | 22.542534 | 114.11719 | 1 | 0755
0000****50 | 2011/5/1 16:06 | 22.542534 | 114.11719 | 0 | 0755
0000****50 | 2011/5/1 16:15 | 22.544722 | 114.119444 | 0 | 0755
0000****50 | 2011/5/1 16:56 | 22.546667 | 114.120306 | 1 | 0755
0000****50 | 2011/5/1 18:00 | 22.550833 | 114.122222 | 0 | 0755
0000****50 | 2011/5/1 19:46 | 22.548929 | 114.111791 | 1 | 0755
0000****50 | 2011/5/1 20:22 | 22.546667 | 114.120306 | 1 | 0755
0000****50 | 2011/5/1 20:24 | 22.546667 | 114.120306 | 0 | 0755
0000****50 | 2011/5/1 20:47 | 22.546667 | 114.120306 | 1 | 0755
0000****50 | 2011/5/1 21:15 | 22.546667 | 114.120306 | 0 | 0755
0000****50 | 2011/5/1 21:34 | 22.546667 | 114.120306 | 0 | 0755
0000****50 | 2011/5/1 22:00 | 22.546667 | 114.120306 | 1 | 0755
0000****50 | 2011/5/1 22:24 | 22.55195 | 114.12525 | 0 | 0755
0000****50 | 2011/5/1 22:52 | 22.55195 | 114.12525 | 1 | 0755
0000****50 | 2011/5/1 22:53 | 22.55195 | 114.12525 | 1 | 0755
0000****50 | 2011/5/1 23:12 | 22.550833 | 114.125833 | 0 | 0755
The privacy protection method based on spatial clustering can significantly reduce the privacy risk of the whole trajectory dataset. Fig. 7 shows the privacy risk decline curves. Panel (a) shows the frequent activity point attack model, where N is the number of frequent activity points known to the attacker. For N = 1 the change in privacy risk is not obvious; for N = 2 and N = 3, the percentage of the population that is completely re-identified drops markedly as the spatial resolution (grid size) increases, and the drop is more pronounced for N = 3. When the spatial resolution is increased to 2800 m, the completely re-identified share of the population drops from 17% to 1% for N = 2, close to the risk value for N = 1, and from 49% to 12% for N = 3; as the spatial resolution becomes coarser, the privacy risk decline curves level off. Panel (b) in Fig. 7 shows the random point attack model, for which cases with 4 and 8 random points were selected to study how far the spatial clustering method reduces privacy risk. As the figure shows, 4 and 8 events are turning points on the privacy risk curve. When the spatial resolution is increased to 2800 m, the completely re-identified share of the population drops from 75% to 40% for 4 points and from 80% to 64% for 8 points. The slow decline of the curve for 8 points indicates that the more trajectory points of a mobile object the attacker's background knowledge contains, the harder it is to reduce that object's re-identification risk.
Fig. 8 shows the data utility loss before and after data protection, where (a) is the data utility loss for 491 TAZs and (b) is the data utility loss for 1112 TAZs. The results show that protecting mobile phone call location data with the method of the present invention not only protects user privacy but also preserves the utility of the data: the changes in the overall data and the information loss in the overall response of the data remain within a controllable range.
The embodiments of the present invention described above do not limit the scope of protection of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.