CN106295395A - The uncertain method for protecting track privacy divided based on figure - Google Patents
- Publication number
- CN106295395A (application number CN201610597003.7A)
- Authority
- CN
- China
- Prior art keywords
- track
- uncertain
- trajectory
- represent
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a privacy protection method for uncertain trajectories based on graph partitioning, comprising: step (1), data preprocessing: the original trajectory data set is preprocessed so that the uncertain trajectories overlap in the time dimension and every uncertain trajectory has the same number of sampling points; step (2), correlation construction: time, direction, and distance features are extracted from the preprocessed trajectory data, and the degree of correlation between different uncertain trajectories is calculated; step (3), undirected graph construction: the trajectory data set is mapped to an undirected graph in which each node represents a trajectory and the weight of the edge between two nodes represents the degree of correlation between the two corresponding uncertain trajectories; step (4), undirected graph partitioning: a greedy algorithm partitions the undirected graph into clusters that each contain k uncertain trajectories. The invention lets users balance data information loss against privacy level according to their privacy protection requirements.
Description
Technical field
The invention relates to the field of graph theory, and in particular to a privacy protection method for uncertain trajectories based on graph partitioning.
Background
In the era of smart cities and the Internet of Things, it is easy to collect many kinds of data, such as location, time, and speed, from GPS-equipped mobile devices such as phones. The collected data can be used to provide location-based services (LBS), such as finding the nearest gas station or hospital. Because acquisition devices have limited accuracy or make acquisition errors, the recorded longitude and latitude may contain errors. While LBS brings convenience to users, it also carries security risks. Publishing uncertain trajectory data can easily leak users' private information, so privacy processing for the publication of uncertain trajectories in offline scenarios has become extremely important.
In research on trajectory publishing, k-anonymity has become the mainstream technique. Some existing work constructs trajectory k-anonymity sets by graph partitioning, and it has been proved that k-node graph partitioning and the construction of trajectory k-anonymity sets are NP-complete problems. A trajectory data set can be represented as an undirected graph in which nodes represent trajectories and edge weights represent the degree of correlation between trajectories. However, that method measured the similarity between trajectories only by the proportion of overlapping points and did not consider direction similarity. The same authors therefore went on to consider both the direction similarity of trajectories and the distance between them, proposing a personalized privacy protection method that balances data utility against privacy level. Although this method achieves privacy protection, it still has shortcomings. First, it ignores trajectory uncertainty: because GPS acquisition devices have limited accuracy or make acquisition errors, errors are unavoidable, and this uncertainty affects the calculation of both inter-trajectory distance and direction similarity. Second, it ignores the similarity of trajectories in the time dimension: the preprocessing stage already forces the trajectories into the same time interval, which eliminates temporal differences and makes the algorithm less practical.
Therefore, privacy protection for uncertain trajectories collected with low accuracy, and the balance between data information loss and privacy level, have become a main research direction for those skilled in the art.
Summary of the invention
The purpose of the invention is to overcome the deficiencies of the prior art and provide a privacy protection method for uncertain trajectories based on graph partitioning.
The technical solution adopted by the invention is:
A privacy protection method for uncertain trajectories based on graph partitioning, comprising the following steps:
Step (1), data preprocessing: the original trajectory data set is preprocessed so that the uncertain trajectories overlap in the time dimension and every uncertain trajectory has the same number of sampling points;
Step (2), correlation construction: time, direction, and distance features are extracted from the preprocessed trajectory data, and the degree of correlation between different uncertain trajectories is calculated from these three features;
Step (3), undirected graph construction: the trajectory data set is mapped to an undirected graph in which each node represents a trajectory and the weight of the edge between two nodes represents the degree of correlation between the two corresponding uncertain trajectories;
Step (4), undirected graph partitioning: a greedy algorithm partitions the undirected graph into clusters that each contain k uncertain trajectories. The edges in the edge set are sorted in ascending order of weight; the edge with the smallest weight is found and its two endpoint nodes are placed in a set $R_i$. While $R_i$ contains fewer than k nodes, the node most strongly associated with the nodes in $R_i$ is chosen from the set of nodes not yet selected and added to $R_i$, until $R_i$ contains k nodes. Any nodes left over at the end that cannot make up a set of k nodes are discarded.
Further, the number of sampling points in step (1) is 30.
Further, the weights of the time, direction, and distance features in step (2) must sum to 1; the three weights can be adjusted to the user's individual needs to trade off privacy protection level against data utility.
Further, calculating the degree of correlation between uncertain trajectories in step (2) comprises the following steps:
A. calculate the uncertainty radii of the two uncertain trajectories;
B. calculate the direction similarity of the uncertain trajectories;
C. calculate the trajectory distance of the uncertain trajectories;
D. calculate the time similarity of the uncertain trajectories;
E. calculate the degree of correlation between the two uncertain trajectories.
The uncertainty radius of an uncertain trajectory in step A is calculated as follows:
A1. Let $T_p$ and $T_q$ denote the two uncertain trajectories whose uncertainty radii are to be calculated;
A2. Calculate the velocity $v_i^p$ of trajectory $T_p$ at the i-th sampling point:
$$v_i^p = \frac{\sqrt{(x_{i+1}^p - x_i^p)^2 + (y_{i+1}^p - y_i^p)^2}}{t_{i+1}^p - t_i^p}$$
where $x_i^p$ and $y_i^p$ are the longitude and latitude of the i-th sampling point of $T_p$, $x_{i+1}^p$ and $y_{i+1}^p$ are the longitude and latitude of its (i+1)-th sampling point, and $t_i^p$ and $t_{i+1}^p$ are the sampling times of its i-th and (i+1)-th points;
A3. Calculate the velocity $v_i^q$ of trajectory $T_q$ at the i-th sampling point:
$$v_i^q = \frac{\sqrt{(x_{i+1}^q - x_i^q)^2 + (y_{i+1}^q - y_i^q)^2}}{t_{i+1}^q - t_i^q}$$
where $x_i^q$ and $y_i^q$ are the longitude and latitude of the i-th sampling point of $T_q$, $x_{i+1}^q$ and $y_{i+1}^q$ are the longitude and latitude of its (i+1)-th sampling point, and $t_i^q$ and $t_{i+1}^q$ are the sampling times of its i-th and (i+1)-th points;
A4. Calculate the uncertainty radius $r_i^p$ of trajectory $T_p$ at the i-th sampling point and the uncertainty radius $r_i^q$ of trajectory $T_q$ at the i-th sampling point [radius equations not preserved in source].
The direction similarity of uncertain trajectories in step B is calculated as follows:
B1. Calculate the cosine of the expected direction angle between the i-th sub-trajectories of $T_p$ and $T_q$:
$$\cos\theta_i = \frac{\vec{a}_i \cdot \vec{b}_i}{|\vec{a}_i|\,|\vec{b}_i|}$$
where $\cos\theta_i$ is the cosine of the expected direction angle between the i-th sub-trajectories of $T_p$ and $T_q$, $\vec{a}_i$ is the direction vector of the i-th sub-trajectory of $T_p$, and $\vec{b}_i$ is the direction vector of the i-th sub-trajectory of $T_q$;
B2. Calculate the angle variation range of $T_p$ and of $T_q$ on the i-th sub-trajectory. For $T_p$, two vectors are defined [vector definitions not preserved in source] from the following quantities: $x_i^p$ and $x_{i+1}^p$, the longitude coordinates of the i-th and (i+1)-th sampling points of the p-th trajectory $T_p$; $y_i^p$ and $y_{i+1}^p$, the latitude coordinates of those points; and $r_i^p$ and $r_{i+1}^p$, the uncertainty radii of those points.
The cosine of the angle variation range of $T_p$ on the i-th sub-trajectory then satisfies a relation given in terms of these two vectors [equation not preserved in source].
The cosine of the angle variation range of $T_q$ on the i-th sub-trajectory is calculated in the same way.
The difference in angle variation between the uncertain trajectory segments can be derived indirectly from these two cosines;
B3. Calculate the direction similarity of the trajectories, denoted $S_{dire}[\zeta,p,q]$ [equation not preserved in source], where n is the number of sampling points.
In step C, the trajectory distance is the Euclidean distance between the expected coordinates of two sampling points plus their respective uncertainty radii. It is calculated as follows:
C1. Let $d_i^{pq}$ denote the Euclidean distance between the expected coordinates of the i-th sampling points of the two trajectories. The distance between the corresponding segments of any two trajectories is then:
$$D_i^{pq} = d_i^{pq} + r_i^p + r_i^q$$
where $r_i^p$ is the uncertainty radius of trajectory $T_p$ at the i-th sampling point and $r_i^q$ is the uncertainty radius of trajectory $T_q$ at the i-th sampling point;
C2. Define the distance between any two trajectories [equation not preserved in source].
The time similarity of trajectories in step D is calculated as follows:
D1. Let $t_s^p$ and $t_e^p$ denote the start time and end time of trajectory $T_p$, and let $t_s^q$ and $t_e^q$ denote the start time and end time of trajectory $T_q$;
D2. Calculate the time similarity $S_{time}[p,q]$ of the uncertain trajectories $T_p$ and $T_q$ [equation not preserved in source].
The degree of correlation between the two uncertain trajectories in step E is given by the following formula:
$$W_{pq} = \alpha\,(1 - S_{time}[T_p,T_q]) + \beta\,(1 - S_{dire}[\zeta,T_p,T_q]) + \gamma\,D_{loc}[\zeta,T_p,T_q] \tag{6}$$
where $W_{pq}$ is the degree of correlation between the uncertain trajectories $T_p$ and $T_q$; $S_{time}[T_p,T_q]$ is their time similarity; $S_{dire}[\zeta,T_p,T_q]$ is their direction similarity; $D_{loc}[\zeta,T_p,T_q]$ is their trajectory distance; and $\alpha$, $\beta$, and $\gamma$ are the weight coefficients of the time similarity, the direction similarity, and the trajectory distance respectively.
By adopting the above technical solution, the invention measures the degree of correlation between uncertain trajectories using time-overlap similarity, direction similarity, and the distance between uncertain trajectories, and maps the trajectory data set to an undirected graph in which each node represents an uncertain trajectory and the weight of the edge between two nodes represents the degree of correlation between the two corresponding uncertain trajectories. Finally, a greedy algorithm partitions the undirected graph into clusters of k trajectories each, achieving k-anonymity. Preprocessing of the collected raw trajectory data mainly means ensuring that the trajectories overlap in the time dimension, for example that they all contain a common time interval [t1, t2], and that every trajectory has the same number of sampling points, for example 30 longitude-latitude sampling points.
Compared with the prior art, the invention clusters uncertain trajectories in data publishing scenarios more comprehensively and flexibly, achieving a trade-off between privacy level and data utility. In data acquisition and processing, it accounts for the limited accuracy of acquisition devices and possible acquisition errors, and focuses on privacy protection for uncertain trajectories. In measuring the correlation between trajectories, it considers how uncertainty affects direction similarity and inter-trajectory distance, as well as the overlap similarity between trajectories in the time dimension.
Description of drawings
The invention is described in further detail below with reference to the accompanying drawings and specific embodiments:
Fig. 1 is a flow chart of the graph-partitioning-based privacy protection method for uncertain trajectories;
Fig. 2 is a diagram of uncertain trajectory directions in the method;
Fig. 3 is a diagram of the distance between uncertain trajectories in the method;
Fig. 4 is the privacy level evaluation chart for the method;
Fig. 5 is the information loss evaluation chart for the method;
Fig. 6 shows the relationship between the uncertainty coefficient ζ and information loss for the method.
Detailed description
As shown in Figs. 1 to 6, the method of the invention comprises the following steps:
Step (1), data preprocessing: the original trajectory data set is preprocessed so that the uncertain trajectories overlap in the time dimension and every uncertain trajectory has the same number of sampling points;
Step (2), correlation construction: time, direction, and distance features are extracted from the preprocessed trajectory data, and the degree of correlation between different uncertain trajectories is calculated from these three features;
Step (3), undirected graph construction: the trajectory data set is mapped to an undirected graph in which each node represents a trajectory and the weight of the edge between two nodes represents the degree of correlation between the two corresponding uncertain trajectories;
Step (4), undirected graph partitioning: a greedy algorithm partitions the undirected graph into clusters that each contain k uncertain trajectories. The edges in the edge set are sorted in ascending order of weight; the edge with the smallest weight is found and its two endpoint nodes are placed in a set $R_i$. While $R_i$ contains fewer than k nodes, the node most strongly associated with the nodes in $R_i$ is chosen from the set of nodes not yet selected and added to $R_i$, until $R_i$ contains k nodes. Any nodes left over at the end that cannot make up a set of k nodes are discarded.
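The greedy partition of step (4) can be sketched as below. This is only an illustration under stated assumptions, not the patented implementation: the function name `greedy_k_partition` is hypothetical, a smaller edge weight is taken to mean a stronger association (consistent with seeding each cluster from the smallest edge), and "most strongly associated with the nodes in the set" is read as the smallest total weight to the current members. Rescanning for the lightest remaining edge is equivalent to walking a sorted edge list and skipping edges whose endpoints are already used.

```python
from typing import List, Sequence


def greedy_k_partition(weights: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Group node indices into clusters of exactly k nodes.

    weights[i][j] is the symmetric edge weight between trajectories i and j.
    Assumption: a smaller weight means a stronger association.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    unassigned = set(range(len(weights)))
    clusters: List[List[int]] = []
    while len(unassigned) >= k:
        # Seed the cluster with the lightest edge between two unassigned nodes.
        a, b = min(
            ((i, j) for i in unassigned for j in unassigned if i < j),
            key=lambda e: (weights[e[0]][e[1]], e),
        )
        cluster = [a, b]
        unassigned -= {a, b}
        while len(cluster) < k:
            # Assumption: pick the node with the smallest total weight to the cluster.
            nxt = min(unassigned, key=lambda v: (sum(weights[v][u] for u in cluster), v))
            cluster.append(nxt)
            unassigned.remove(nxt)
        clusters.append(cluster)
    # Fewer than k nodes remain here; they are discarded, as in step (4).
    return clusters
```

With five trajectories and k = 2, the two tightest pairs become clusters and the fifth trajectory is dropped; with k = 3 only one cluster can be formed.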
Further, the number of sampling points in step (1) is 30.
Further, the weights of the time, direction, and distance features in step (2) must sum to 1; the three weights can be adjusted to the user's individual needs to trade off privacy protection level against data utility.
Further, calculating the degree of correlation between uncertain trajectories in step (2) comprises the following steps:
A. calculate the uncertainty radii of the two uncertain trajectories;
B. calculate the direction similarity of the uncertain trajectories;
C. calculate the trajectory distance of the uncertain trajectories;
D. calculate the time similarity of the uncertain trajectories;
E. calculate the degree of correlation between the two uncertain trajectories.
Further, the uncertainty radius of an uncertain trajectory in step A is calculated as follows:
A1. Let $T_p$ and $T_q$ denote the two uncertain trajectories whose uncertainty radii are to be calculated;
A2. Calculate the velocity $v_i^p$ of trajectory $T_p$ at the i-th sampling point:
$$v_i^p = \frac{\sqrt{(x_{i+1}^p - x_i^p)^2 + (y_{i+1}^p - y_i^p)^2}}{t_{i+1}^p - t_i^p}$$
where $x_i^p$ and $y_i^p$ are the longitude and latitude of the i-th sampling point of $T_p$, $x_{i+1}^p$ and $y_{i+1}^p$ are the longitude and latitude of its (i+1)-th sampling point, and $t_i^p$ and $t_{i+1}^p$ are the sampling times of its i-th and (i+1)-th points;
A3. Calculate the velocity $v_i^q$ of trajectory $T_q$ at the i-th sampling point:
$$v_i^q = \frac{\sqrt{(x_{i+1}^q - x_i^q)^2 + (y_{i+1}^q - y_i^q)^2}}{t_{i+1}^q - t_i^q}$$
where $x_i^q$ and $y_i^q$ are the longitude and latitude of the i-th sampling point of $T_q$, $x_{i+1}^q$ and $y_{i+1}^q$ are the longitude and latitude of its (i+1)-th sampling point, and $t_i^q$ and $t_{i+1}^q$ are the sampling times of its i-th and (i+1)-th points;
A4. Calculate the uncertainty radius $r_i^p$ of trajectory $T_p$ at the i-th sampling point and the uncertainty radius $r_i^q$ of trajectory $T_q$ at the i-th sampling point [radius equations not preserved in source].
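The velocity in steps A2 and A3 can be read as the displacement between consecutive sampling points divided by the elapsed time. The sketch below illustrates that reading only; it assumes coordinates are treated as planar (longitude, latitude) pairs, and the name `segment_speeds` is hypothetical. Because the radius formula of step A4 is not available in this text, the sketch deliberately stops at the speed.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (longitude, latitude, time)


def segment_speeds(trajectory: Sequence[Sample]) -> List[float]:
    """Speed between each pair of consecutive samples (planar approximation)."""
    speeds = []
    for (x0, y0, t0), (x1, y1, t1) in zip(trajectory, trajectory[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("sampling times must be strictly increasing")
        # Euclidean displacement divided by elapsed time.
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return speeds
```

A trajectory with n sampling points yields n - 1 speeds, one per sub-trajectory.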
Further, as shown in Fig. 2, the direction similarity of uncertain trajectories in step B is calculated as follows:
B1. Calculate the cosine of the expected direction angle between the i-th sub-trajectories of $T_p$ and $T_q$:
$$\cos\theta_i = \frac{\vec{a}_i \cdot \vec{b}_i}{|\vec{a}_i|\,|\vec{b}_i|}$$
where $\cos\theta_i$ is the cosine of the expected direction angle between the i-th sub-trajectories of $T_p$ and $T_q$, $\vec{a}_i$ is the direction vector of the i-th sub-trajectory of $T_p$, and $\vec{b}_i$ is the direction vector of the i-th sub-trajectory of $T_q$. If this cosine is less than 0, the two sub-trajectory segments point in opposing directions (more than 90° apart), and the cosine is set to 0. The direction-angle cosine therefore ranges over [0,1]: the more similar the directions, the larger the cosine, and the more the directions differ, the smaller the cosine;
B2. Calculate the angle variation range of $T_p$ and of $T_q$ on the i-th sub-trajectory. For $T_p$, two vectors are defined [vector definitions not preserved in source] from the following quantities: $x_i^p$ and $x_{i+1}^p$, the longitude coordinates of the i-th and (i+1)-th sampling points of the p-th trajectory $T_p$; $y_i^p$ and $y_{i+1}^p$, the latitude coordinates of those points; and $r_i^p$ and $r_{i+1}^p$, the uncertainty radii of those points.
The cosine of the angle variation range of $T_p$ on the i-th sub-trajectory then satisfies a relation given in terms of these two vectors [equation not preserved in source].
The cosine of the angle variation range of $T_q$ on the i-th sub-trajectory is calculated in the same way.
To measure the uncertain direction similarity between $T_p$ and $T_q$ on the i-th trajectory segment, we define a measure of the difference in angle variation between the uncertain trajectory segments, which can be derived indirectly from these two cosines. The measure ranges over [0,1]: the closer it is to 1, the smaller the angle variation between the uncertain trajectory segments and the more similar they are in the direction dimension; the closer it is to 0, the less similar they are. When the measure is less than 0, the difference in angle variation between the two trajectory segments is too large, and we set it to 0.
B3. Calculate the direction similarity of the trajectories, denoted $S_{dire}[\zeta,p,q]$ [equation not preserved in source], where n is the number of sampling points.
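Step B1, including the rule that a negative cosine is set to 0, can be sketched as follows. The function name `clamped_direction_cosine` is hypothetical, and returning 0 for a stationary segment (zero-length direction vector) is an assumption, since the text does not say how that case is handled. The uncertainty-aware parts of steps B2 and B3 are not covered.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def clamped_direction_cosine(p0: Point, p1: Point, q0: Point, q1: Point) -> float:
    """Cosine of the angle between segments p0->p1 and q0->q1, clamped to [0, 1]."""
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = q1[0] - q0[0], q1[1] - q0[1]
    norm_a, norm_b = math.hypot(ax, ay), math.hypot(bx, by)
    if norm_a == 0 or norm_b == 0:
        # Assumption: a stationary segment has no direction, so similarity is 0.
        return 0.0
    cosine = (ax * bx + ay * by) / (norm_a * norm_b)
    # Negative cosines (opposing directions) are set to 0, as in step B1.
    return max(0.0, cosine)
```

Parallel segments score 1, perpendicular or opposing segments score 0, and everything in between follows the cosine curve.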
Further, as shown in Fig. 3, the trajectory distance in step C is the Euclidean distance between the expected coordinates of two sampling points plus their respective uncertainty radii. It is calculated as follows:
C1. Let $d_i^{pq}$ denote the Euclidean distance between the expected coordinates of the i-th sampling points of the two trajectories. The distance between the corresponding segments of any two trajectories is then:
$$D_i^{pq} = d_i^{pq} + r_i^p + r_i^q$$
where $r_i^p$ is the uncertainty radius of trajectory $T_p$ at the i-th sampling point and $r_i^q$ is the uncertainty radius of trajectory $T_q$ at the i-th sampling point;
C2. Define the distance between any two trajectories [equation not preserved in source].
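The per-point distance of step C1 is the Euclidean distance between expected coordinates plus the two uncertainty radii, which the first function below computes directly. The trajectory-level aggregation of step C2 is not available in this text, so `mean_trajectory_distance` is purely an assumption (a plain average over sampling points) included only to show how per-point values might be combined; both function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # expected (longitude, latitude)


def point_distance(p: Point, q: Point, r_p: float, r_q: float) -> float:
    """Step C1: Euclidean distance between expected coordinates plus both radii."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) + r_p + r_q


def mean_trajectory_distance(
    points_p: Sequence[Point],
    points_q: Sequence[Point],
    radii_p: Sequence[float],
    radii_q: Sequence[float],
) -> float:
    """Assumed aggregation (not from the source): average of per-point distances."""
    if not (len(points_p) == len(points_q) == len(radii_p) == len(radii_q)):
        raise ValueError("all inputs must have the same number of sampling points")
    if not points_p:
        raise ValueError("at least one sampling point is required")
    total = sum(
        point_distance(p, q, rp, rq)
        for p, q, rp, rq in zip(points_p, points_q, radii_p, radii_q)
    )
    return total / len(points_p)
```

Because the radii are added, two trajectories with larger positional uncertainty are treated as farther apart even when their expected coordinates coincide.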
Further, the time similarity of trajectories in step D is calculated as follows:
D1. Let $t_s^p$ and $t_e^p$ denote the start time and end time of trajectory $T_p$, and let $t_s^q$ and $t_e^q$ denote the start time and end time of trajectory $T_q$;
D2. Calculate the time similarity $S_{time}[p,q]$ of the uncertain trajectories $T_p$ and $T_q$ [equation not preserved in source].
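The exact expression for the time similarity is not available in this text. Since the document describes it elsewhere as a time-overlap similarity built from the start and end times of the two trajectories, one plausible form is the length of the shared interval divided by the total span covered by both. The sketch below implements that assumed form only; it should not be taken as the patented formula, and the name `time_overlap_similarity` is hypothetical.

```python
def time_overlap_similarity(
    start_p: float, end_p: float, start_q: float, end_q: float
) -> float:
    """Assumed overlap ratio in [0, 1]: shared duration over combined span."""
    overlap = min(end_p, end_q) - max(start_p, start_q)
    span = max(end_p, end_q) - min(start_p, start_q)
    if span <= 0 or overlap <= 0:
        # No shared time (or degenerate intervals): treat as dissimilar.
        return 0.0
    return overlap / span
```

Under this assumption, identical time intervals score 1 and disjoint intervals score 0, which matches the role the similarity plays in the correlation formula.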
Further, the degree of correlation between the two uncertain trajectories in step E is given by the following formula:
$$W_{pq} = \alpha\,(1 - S_{time}[T_p,T_q]) + \beta\,(1 - S_{dire}[\zeta,T_p,T_q]) + \gamma\,D_{loc}[\zeta,T_p,T_q] \tag{6}$$
where $W_{pq}$ is the degree of correlation between the uncertain trajectories $T_p$ and $T_q$; $S_{time}[T_p,T_q]$ is their time similarity; $S_{dire}[\zeta,T_p,T_q]$ is their direction similarity; $D_{loc}[\zeta,T_p,T_q]$ is their trajectory distance; and $\alpha$, $\beta$, and $\gamma$ are the weight coefficients of the time similarity, the direction similarity, and the trajectory distance respectively.
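Formula (6) translates directly into code. The sketch below also enforces the stated requirement that the three weights sum to 1; the function name `correlation_weight` is hypothetical, and the three inputs are assumed to have been computed beforehand by steps B to D.

```python
import math


def correlation_weight(
    s_time: float,
    s_dire: float,
    d_loc: float,
    alpha: float,
    beta: float,
    gamma: float,
) -> float:
    """Formula (6): W = alpha*(1 - S_time) + beta*(1 - S_dire) + gamma*D_loc."""
    if not math.isclose(alpha + beta + gamma, 1.0, abs_tol=1e-9):
        raise ValueError("alpha, beta and gamma must sum to 1")
    return alpha * (1.0 - s_time) + beta * (1.0 - s_dire) + gamma * d_loc
```

Both similarities enter as (1 - S), so a smaller result indicates trajectories that overlap more in time, point in more similar directions, and lie closer together. Raising one coefficient at the expense of the others shifts the clustering toward that feature.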
By adopting the above technical solution, the invention measures the degree of correlation between uncertain trajectories using time-overlap similarity, direction similarity, and the distance between uncertain trajectories, and maps the trajectory data set to an undirected graph in which each node represents an uncertain trajectory and the weight of the edge between two nodes represents the degree of correlation between the two corresponding uncertain trajectories. Finally, a greedy algorithm partitions the undirected graph into clusters of k trajectories each, achieving k-anonymity. Preprocessing of the collected raw trajectory data mainly means ensuring that the trajectories overlap in the time dimension, for example that they all contain a common time interval [t1, t2], and that every trajectory has the same number of sampling points, for example 30 longitude-latitude sampling points.
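The text requires a common time interval and an equal number of sampling points per trajectory but does not say how trajectories are brought into that form. One way to do it, shown here purely as an assumption, is linear interpolation onto evenly spaced timestamps inside the common interval. The name `resample` and the interpolation scheme are illustrative choices, not part of the source.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (longitude, latitude, time)


def resample(
    trajectory: Sequence[Sample], t_start: float, t_end: float, n: int = 30
) -> List[Sample]:
    """Linearly interpolate a time-sorted trajectory onto n evenly spaced times."""
    if n < 2 or len(trajectory) < 2:
        raise ValueError("need at least two output points and two input samples")
    if trajectory[0][2] > t_start or trajectory[-1][2] < t_end:
        raise ValueError("trajectory does not cover the requested interval")
    result = []
    j = 0  # index of the input segment currently being interpolated
    for i in range(n):
        t = t_start + (t_end - t_start) * i / (n - 1)
        # Advance to the segment whose end time is not before t.
        while j + 1 < len(trajectory) - 1 and trajectory[j + 1][2] < t:
            j += 1
        (x0, y0, t0), (x1, y1, t1) = trajectory[j], trajectory[j + 1]
        f = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
        result.append((x0 + f * (x1 - x0), y0 + f * (y1 - y0), t))
    return result
```

After this step every trajectory has the same n timestamps, so the i-th sampling points of two trajectories can be compared directly.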
Compared with the prior art, the present invention can cluster uncertain trajectories in a data publishing scenario more comprehensively and flexibly, achieving a trade-off between privacy level and data utility. For data collection and processing, it takes into account the limited accuracy of collection devices and possible collection errors, and focuses on privacy protection for uncertain trajectories. For measuring the correlation degree between trajectories, it considers how uncertainty affects both the direction similarity and the distance between trajectories, as well as the overlap similarity between trajectories in the time dimension.
The above is only an embodiment of the present invention and does not thereby limit its patent scope. Any equivalent transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in a related technical field, is likewise included within the scope of patent protection of the present invention.
Claims (9)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610597003.7A CN106295395A (en) | 2016-07-27 | 2016-07-27 | The uncertain method for protecting track privacy divided based on figure |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN106295395A (en) | 2017-01-04 |
Family
ID=57652841
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610597003.7A Withdrawn CN106295395A (en) | 2016-07-27 | 2016-07-27 | The uncertain method for protecting track privacy divided based on figure |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN106295395A (en) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108491730A (en) * | 2018-03-08 | 2018-09-04 | 湖南大学 | Correlation method for secret protection between track based on lagrangian optimization |
| CN108734022A (en) * | 2018-04-03 | 2018-11-02 | 安徽师范大学 | The secret protection track data dissemination method divided based on three-dimensional grid |
| CN109788001A (en) * | 2019-03-07 | 2019-05-21 | 武汉极意网络科技有限公司 | Suspicious Internet protocol address discovery method, user equipment, storage medium and device |
| CN111259434A (en) * | 2020-01-08 | 2020-06-09 | 广西师范大学 | A privacy protection method for personal preference location in trajectory data release |
| US11042648B2 (en) | 2019-07-17 | 2021-06-22 | Here Global B.V. | Quantification of privacy risk in location trajectories |
| CN114967972A (en) * | 2022-04-27 | 2022-08-30 | 华南理工大学 | Method, system and device for adjusting sampling rate of touch screen and storage medium |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101895866A (en) * | 2010-04-16 | 2010-11-24 | 华中师范大学 | Trajectory Privacy Metrics in Location Services |
| US20140379628A1 (en) * | 2013-06-19 | 2014-12-25 | International Business Machines Corporation | Privacy risk metrics in location based services |
| CN104394509A (en) * | 2014-11-21 | 2015-03-04 | 西安交通大学 | High-efficiency difference disturbance location privacy protection system and method |
| CN105760780A (en) * | 2016-02-29 | 2016-07-13 | 福建师范大学 | Trajectory data privacy protection method based on road network |
Non-Patent Citations (4)
| Title |
|---|
| JIANCHUAN XIAO,ETC.: ""A Privacy-Preserving Approach Based on Graph Partition for Uncertain Trajectory Publishing"", 《2016 15TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING》 * |
| 杨静等: ""基于图划分的个性化轨迹隐私保护方法"", 《通信学报》 * |
| 王爽等: ""移动对象不确定轨迹隐私保护算法研究"", 《通信学报》 * |
| 薛俊超: ""面向轨迹数据发布的隐私保护技术研究"", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108491730A (en) * | 2018-03-08 | 2018-09-04 | 湖南大学 | Correlation method for secret protection between track based on lagrangian optimization |
| CN108491730B (en) * | 2018-03-08 | 2021-11-19 | 湖南大学 | Inter-track correlation privacy protection method based on Lagrange optimization |
| CN108734022A (en) * | 2018-04-03 | 2018-11-02 | 安徽师范大学 | The secret protection track data dissemination method divided based on three-dimensional grid |
| CN109788001A (en) * | 2019-03-07 | 2019-05-21 | 武汉极意网络科技有限公司 | Suspicious Internet protocol address discovery method, user equipment, storage medium and device |
| CN109788001B (en) * | 2019-03-07 | 2021-06-25 | 武汉极意网络科技有限公司 | Suspicious internet protocol address discovery method, user equipment, storage medium and device |
| US11042648B2 (en) | 2019-07-17 | 2021-06-22 | Here Global B.V. | Quantification of privacy risk in location trajectories |
| CN111259434A (en) * | 2020-01-08 | 2020-06-09 | 广西师范大学 | A privacy protection method for personal preference location in trajectory data release |
| CN111259434B (en) * | 2020-01-08 | 2022-04-12 | 广西师范大学 | Privacy protection method for individual preference position in track data release |
| CN114967972A (en) * | 2022-04-27 | 2022-08-30 | 华南理工大学 | Method, system and device for adjusting sampling rate of touch screen and storage medium |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN106295395A (en) | The uncertain method for protecting track privacy divided based on figure | |
| CN105263113B (en) | A kind of WiFi location fingerprints map constructing method and its system based on crowdsourcing | |
| CN105528359B (en) | For storing the method and system of travel track | |
| CN105424030B (en) | Fusion navigation device and method based on wireless fingerprint and MEMS sensor | |
| CN106123897B (en) | Multi-feature-based indoor fusion localization method | |
| JP6820967B2 (en) | Indoor positioning systems and methods based on geomagnetic signals combined with computer vision | |
| CN102809376B (en) | Isoline-based assistant navigation positioning method | |
| US20200019815A1 (en) | Clustering for k-anonymity in location trajectory data | |
| CN102722541B (en) | Method and system for calculating space-time locus similarity | |
| CN105760780B (en) | Track data method for secret protection based on road network | |
| CN106441302B (en) | Indoor positioning methods in large open areas | |
| CN108995657A (en) | Operate the method and data processing system of automatic driving vehicle | |
| CN107633067A (en) | A kind of Stock discrimination method based on human behavior rule and data digging method | |
| CN107796391A (en) | A kind of strapdown inertial navigation system/visual odometry Combinated navigation method | |
| CN103379619A (en) | Method and system for positioning | |
| CN102802260A (en) | WLAN Indoor Positioning Method Based on Matrix Correlation | |
| CN105044668A (en) | Wifi fingerprint database construction method based on multi-sensor device | |
| CN105674989B (en) | A kind of indoor objects movement locus method of estimation based on mobile phone built-in sensors | |
| CN107255474A (en) | A kind of PDR course angles of fusion electronic compass and gyroscope determine method | |
| US20160023661A1 (en) | Road Geometry Generation from Sparse Data | |
| CN103369466A (en) | Map matching-assistant indoor positioning method | |
| Lin et al. | Noise filtering, trajectory compression and trajectory segmentation on GPS data | |
| CN103776449A (en) | Moving base initial alignment method for improving robustness | |
| CN114511080A (en) | Model construction method and device and abnormal track point real-time detection method | |
| Song et al. | Improved indoor position estimation algorithm based on geo-magnetism intensity |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | WW01 | Invention patent application withdrawn after publication | Application publication date: 20170104 |