CN108427965B - Hot spot area mining method based on road network clustering - Google Patents
Hot spot area mining method based on road network clustering
- Publication number
- CN108427965B (application CN201810179464.1A)
- Authority
- CN
- China
- Prior art keywords
- clustering
- points
- data
- road network
- city
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9038—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Traffic Control Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention claims a travel hotspot area mining method based on road network trajectory clustering. In this method, taxi trajectories are mapped onto the road network, and a clustering method is adopted that combines trajectories with points of interest collected from actual roads. Building on the density peak clustering algorithm, an OPAM algorithm whose initial centers are optimized by density peaks is proposed, namely DP-OPAM. The algorithm uses the local density of each data point and the shortest distance from that point to a point of higher density, and uses a decision graph to select the classes of the data points with higher density and nearest distance as the initial cluster centers. Starting from these initial cluster centers, the OPAM clustering algorithm with added reverse learning produces the clustering result. Compared with the original OPAM algorithm, the new algorithm not only determines the cluster centers automatically but also improves accuracy and shortens clustering time, enabling the analysis of users' travel hotspot areas.
Description
Technical Field
The invention relates to data mining methods, and in particular to a taxi trajectory clustering method based on a road network.
Background
Intelligent transportation is a focus of transportation development worldwide. Besides supporting transportation management, it pays increasing attention to meeting the needs of private and public travel. In recent years the construction of intelligent transportation systems has developed rapidly, and many advanced technologies are widely used in them. The widespread use of GPS devices has made trajectory extraction much more convenient. These devices collect large amounts of mobile location sequence information and vehicle status information, and the data contain rich information about traffic and user behavior. By analyzing and mining trajectory data, we can understand traffic conditions, plan trips sensibly, discover the behavioral characteristics of crowds, and help improve traffic conditions.
Taxi trajectories cover urban road network traffic comprehensively, reflecting both real-time traffic density and flow and the travel patterns and regional characteristics of the population. Analyzing massive taxi trajectory data to uncover the deeper information hidden in it, and using data mining techniques to characterize the data as a whole and to forecast how traffic conditions will develop, therefore plays a major role in supporting traffic management departments in traffic monitoring and road control.
Cluster analysis is a commonly used data mining technique. It can serve as a tool for obtaining the distribution of the data, making it easy to observe the characteristics of each cluster and to concentrate further analysis on particular clusters. It can also serve as a preprocessing step for other algorithms, such as classification and qualitative induction algorithms. Trajectory clustering of moving objects discovers the movement patterns and behavior patterns of those objects by finding similar trajectories and extracting movement features. A taxi trajectory consists of a sequence of discrete points. When measuring trajectory similarity, traditional cluster analysis mostly considers the straight-line distance between points and ignores whether that distance is actually reachable along the roads.
There are two main approaches to the cluster analysis of vehicle trajectories. One classifies and compares whole trajectories as objects. The other divides each trajectory into sub-trajectory segments according to some criterion and classifies the resulting segments. The advantage of the former is that it is simple and allows an intuitive evaluation of the similarity between trajectories; however, it cannot distinguish the local features of trajectories well, and its clustering results are often unsatisfactory. The latter alleviates the former's problems with local trajectory features and clusters trajectories of different shapes better. Its drawback is that the trajectory segmentation method strongly affects the clustering result, and different segmentation methods may produce very different results.
Summary of the Invention
The present invention aims to solve the above problems of the prior art. It proposes a hotspot area mining method based on road network clustering that significantly improves the clustering result and enables mining of users' travel areas. The technical solution of the present invention is as follows:
A hotspot area mining method based on road network clustering, comprising the following steps:
Step 1: Collect a taxi trajectory data set, perform data preprocessing including standardization and normalization, retain the valid fields, delete redundant data, and obtain the preprocessed passenger pick-up and drop-off trajectory points;
Step 2: Determine the latitude and longitude range of the city, and extract the city's points of interest, including shopping malls and schools, from an open-source website;
Step 3: Obtain the road network information of the city and map the trajectory points onto the road network;
Step 4: Select 80% of the pick-up and drop-off trajectory points preprocessed in step 1 as the training set, and use the improved OPAM algorithm, whose initial centers are optimized by density peaks, to cluster the regions that represent pick-up and drop-off hotspots. The main improvement is that the initial cluster centers are selected by density peaks, which makes the choice of initial points more accurate and convenient. The remaining 20% serve as the test set, used to test the clustering quality of the model built from the 80% training set;
Step 5: Input the points of interest with road network information collected in step 2 into the model of step 4, cluster to obtain the residents' hotspot activity areas with road network characteristics, and compare the clustering result with the collected points of interest to determine the hotspot areas of residents' travel.
Further, step 1 is specifically: first collect the taxi trajectory data set of a city for a given month, select the trajectory data of the one week in which the city's data volume is most concentrated, perform data preprocessing, retain valid fields such as the latitude/longitude and the times of the pick-up and drop-off trajectory points, and delete redundant data.
Further, step 2, which determines the latitude and longitude range of the city and extracts the city's points of interest, including shopping malls and schools, from an open-source website, is specifically:
First enter the latitude and longitude range of the target city on the open-source website openstreetmap and download the map of the whole city. In the exported OSM map data, way elements represent users' movement tracks and node elements represent the path. Select the nodes tagged residence, school, or shop as the representative points of interest.
Further, step 3, which obtains the road network information of the city and maps the trajectory points onto the road network, is specifically:
Obtain electronic map data through the TAREEG web service and extract the road network information of the city. After the urban road network data have been extracted, project the GPS movement tracks obtained above onto the road network map with the ST-Matching model, which yields, for each road segment $e$ the driver passes, the trajectory points at the $(j-i+1)$ consecutive time points $p_i, \dots, p_j$.
Further, step 4 is specifically: first select 80% of the processed pick-up and drop-off trajectory points as the training set, and use the improved reverse-learning-based partitioning-around-medoids clustering algorithm (OPAM) to cluster the regions that represent pick-up and drop-off hotspots. The improved OPAM algorithm has three phases. The first phase is initialization: construct the decision graph and select the density peak points in the upper-right region, far from most samples, as the initial cluster centers; the number of density peak points is the number of clusters $k$. The second phase constructs the initial clusters: compute the minimum distance between each point in the data set and the cluster centers, assign the remaining sample points to the nearest initial cluster center to form the initial partition, and compute the clustering sum of squared errors. The third phase applies reverse learning and substitutes the result into the partitioning-around-medoids (PAM) clustering algorithm: the $k$ clusters obtained by the standard PAM algorithm and the $k$ reverse clusters obtained by reverse learning are combined to give $k \times k$ cluster combinations, and the combination with the largest silhouette coefficient is sought.
Further, the steps of the PAM algorithm are as follows:
(1) Arbitrarily select $k$ elements from the given data set $D$ and mark them as the initial representative objects, or seeds, $o_j$;
(2) Using the Euclidean distance, compute the distance between each non-representative object $o_i$ in $D$ and the $k$ representative objects, and assign $o_i$ to the cluster of the nearest representative object;
(3) Arbitrarily select a non-representative object $o_{random}$;
(4) Compute the total cost $S$:
$S = dist(p, o_{random}) - dist(p, o_j)$
(5) If the total cost $S < 0$, the non-representative object $o_{random}$ is the better solution; $o_{random}$ replaces $o_j$ to form a new set of $k$ representative objects, and the algorithm returns to step (2) for a new round of object assignment;
(6) If the total cost $S > 0$, the representative object $o_j$ is the better solution; go to step (3) and select another non-representative object for the cost comparison, until the total cost $S$ no longer changes, which yields the $k$ clusters with the smallest total cost.
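The cost in step (4) is written for a single object $p$. One common reading, which is an assumption here and not stated in the text, sums the change in distance over every object in $D$, with each object measured against its nearest representative object before and after the trial swap. The symbols $M$ and $M'$ are introduced only for this illustration.

```latex
% Assumed reading of the total cost of swapping o_j for o_random.
% M  : current set of k representative objects
% M' : (M \setminus \{o_j\}) \cup \{o_{random}\}, the set after the trial swap
S = \sum_{p \in D} \Bigl( \min_{m \in M'} \mathrm{dist}(p, m) \;-\; \min_{m \in M} \mathrm{dist}(p, m) \Bigr)
```

For example, on the one-dimensional data set $D = \{0, 1, 2, 10\}$ with $k = 1$, replacing the representative object 10 by 1 changes the summed distance from $10 + 9 + 8 + 0 = 27$ to $1 + 0 + 1 + 9 = 11$, so $S = -16 < 0$ and the swap is accepted under step (5).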
The advantages and beneficial effects of the present invention are as follows:
The invention combines the analysis of residents' travel hotspots with the road network, clustering the regions of interest in the actual road network together with the original clusters. The features of the regions of interest contained in the new clusters, into which the original clusters are aggregated again, indicate the hotspot areas of residents' travel, which addresses the temporal and spatial shortcomings of working in Euclidean space. The method determines the initial centers by constructing a decision graph that optimizes them using density peaks, which reduces the amount of computation and raises clustering accuracy. By clustering selected points of interest together with trajectories, it also addresses data sparsity and the heavy computational load, and enables analysis of users' trajectory behavior.
Brief Description of the Drawings
Fig. 1 is a flow chart of the PAM clustering algorithm in a preferred embodiment of the present invention;
Fig. 2 is a flow chart of the OPAM clustering algorithm.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention.
The technical solution by which the present invention solves the above technical problems is:
As shown in Fig. 2, the specific steps of the hotspot area mining method, which combines the OPAM clustering algorithm with density-peak-optimized initial centers and the road network, are:
Step 1: Collect the taxi trajectory data set of a city for a given month and select the trajectory data of the one week in which the city's data volume is most concentrated. Perform data preprocessing: retain valid fields such as the latitude/longitude and the times of the pick-up and drop-off trajectory points, and delete redundant data.
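A minimal sketch of this preprocessing step follows. The field names (`lon`, `lat`, `time`, `event`), the timestamp format, and the rule for "most concentrated week" (the 7-day window with the most records) are assumptions made for illustration; the text does not specify them.

```python
from collections import Counter
from datetime import datetime, timedelta


def preprocess(raw_rows):
    """Keep only the valid fields; drop malformed, out-of-range and duplicate rows."""
    seen, cleaned = set(), []
    for row in raw_rows:
        try:
            rec = {
                "lon": float(row["lon"]),
                "lat": float(row["lat"]),
                "time": datetime.strptime(row["time"], "%Y-%m-%d %H:%M:%S"),
                "event": row["event"],  # hypothetical values: "pickup" / "dropoff"
            }
        except (KeyError, ValueError):
            continue  # missing or unparsable field
        if not (-180 <= rec["lon"] <= 180 and -90 <= rec["lat"] <= 90):
            continue  # impossible coordinate
        key = (rec["lon"], rec["lat"], rec["time"], rec["event"])
        if key in seen:
            continue  # redundant duplicate
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def densest_week(records):
    """Return the records inside the 7-day window that holds the most records."""
    if not records:
        return []
    per_day = Counter(r["time"].date() for r in records)
    best_start, best_count = None, -1
    for start in sorted(per_day):
        count = sum(per_day.get(start + timedelta(days=k), 0) for k in range(7))
        if count > best_count:
            best_start, best_count = start, count
    end = best_start + timedelta(days=7)
    return [r for r in records if best_start <= r["time"].date() < end]


def min_max_normalize(values):
    """Scale a list of numbers to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Extra columns in the raw rows are simply ignored, which is one way to read "retain the valid fields".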
Step 2: Enter the latitude and longitude range of the target city on the open-source website openstreetmap and download the map of the whole city. In the exported OSM map data, way elements represent users' movement tracks and node elements represent the path. Because the way objects in the OSM source data record users' movement tracks, a way does not only carry road information; it can also carry building information, such as the outline of a building. The way objects therefore need to be filtered to remove the way information that is not needed. In addition, the downloaded map source data contain a very large number of node sample points, whereas actual route planning only needs the intersections where ways cross. Both way and node elements carry many tags for different attributes. For ways, keep only those whose highway attribute has one of the values residential, service, living_street, unclassified, trunk, trunk_link, secondary, secondary_link, primary, tertiary, or tertiary_link. For nodes, delete the duplicate nodes that appear in different ways. Select the nodes tagged residence, school, or shop as the representative points of interest.
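The filtering described in step 2 can be sketched with the standard-library XML parser. The inline sample document and the rule that a node counts as a point of interest when any tag key or tag value equals residence, school, or shop are assumptions for illustration; the text does not say how the labels are matched.

```python
import xml.etree.ElementTree as ET

# highway values kept for way elements, as listed in step 2
ROAD_TYPES = {
    "residential", "service", "living_street", "unclassified", "trunk",
    "trunk_link", "secondary", "secondary_link", "primary", "tertiary",
    "tertiary_link",
}
POI_LABELS = {"residence", "school", "shop"}


def tags_of(elem):
    """Collect the <tag k=... v=...> children of an OSM element as a dict."""
    return {t.get("k"): t.get("v") for t in elem.findall("tag")}


def extract_roads_and_pois(osm_xml):
    root = ET.fromstring(osm_xml)
    roads = {}
    for way in root.findall("way"):
        # Drop ways that are not roads (e.g. building outlines).
        if tags_of(way).get("highway") in ROAD_TYPES:
            roads[way.get("id")] = [nd.get("ref") for nd in way.findall("nd")]
    pois = []
    for node in root.findall("node"):
        tags = tags_of(node)
        # Assumption: match the POI labels against both tag keys and tag values.
        if POI_LABELS & (set(tags) | set(tags.values())):
            pois.append((node.get("id"), float(node.get("lat")), float(node.get("lon"))))
    return roads, pois


SAMPLE = """<osm version="0.6">
  <node id="1" lat="29.50" lon="106.50"/>
  <node id="2" lat="29.51" lon="106.51"><tag k="amenity" v="school"/></node>
  <node id="3" lat="29.52" lon="106.52"><tag k="shop" v="supermarket"/></node>
  <way id="10"><nd ref="1"/><nd ref="2"/><tag k="highway" v="primary"/></way>
  <way id="11"><nd ref="2"/><nd ref="3"/><tag k="building" v="yes"/></way>
  <way id="12"><nd ref="1"/><nd ref="3"/><tag k="highway" v="footway"/></way>
</osm>"""

roads, pois = extract_roads_and_pois(SAMPLE)
```

In this sample only way 10 survives the highway filter, and nodes 2 and 3 are kept as points of interest.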
Step 3:
The drivers' GPS tracks collected above contain instantaneous position coordinates, but they do not show which road segment the driver is on at each moment. Obtain electronic map data through the TAREEG web service and extract the road network information of the city. After the urban road network data have been extracted, project the GPS movement tracks obtained above onto the road network map with the ST-Matching model, which yields, for each road segment $e$ the driver passes, the trajectory points at the $(j-i+1)$ consecutive time points $p_i, \dots, p_j$.
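The idea of attaching each GPS point to a road segment can be illustrated with a much simpler stand-in than ST-Matching: snapping every point to its geometrically nearest segment. This sketch is not ST-Matching (it ignores the temporal and topological analysis that model performs), it treats coordinates as planar, and the segment identifiers are hypothetical.

```python
def project_onto_segment(p, a, b):
    """Return (closest point on segment ab to p, squared distance to it)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    qx, qy = ax + t * dx, ay + t * dy
    return (qx, qy), (px - qx) ** 2 + (py - qy) ** 2


def snap_to_network(points, segments):
    """segments: dict segment_id -> (a, b). Returns [(segment_id, snapped_point)]."""
    matched = []
    for p in points:
        seg_id, (a, b) = min(
            segments.items(),
            key=lambda kv: project_onto_segment(p, kv[1][0], kv[1][1])[1],
        )
        matched.append((seg_id, project_onto_segment(p, a, b)[0]))
    return matched


def group_by_segment(matched):
    """Group consecutive snapped points per segment: the runs p_i..p_j on each e."""
    runs = []
    for seg_id, q in matched:
        if runs and runs[-1][0] == seg_id:
            runs[-1][1].append(q)
        else:
            runs.append((seg_id, [q]))
    return runs


segments = {"e1": ((0, 0), (10, 0)), "e2": ((10, 0), (10, 10))}
track = [(1, 1), (5, -1), (9, 0.5), (10.5, 4), (11, 8)]
runs = group_by_segment(snap_to_network(track, segments))
```

Each run holds the consecutive points matched to one segment, so its length corresponds to the count $(j-i+1)$ in the text.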
Step 4
Select 80% of the processed pick-up and drop-off trajectory points as the training set and use the improved OPAM algorithm to cluster the regions that represent pick-up and drop-off hotspots. The DP-OPAM algorithm has three phases: initialization, construction of the initial clusters, and reverse learning followed by substitution into the PAM algorithm.
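The 80/20 split could look like the sketch below. Shuffling before the split and the fixed seed are assumptions; the text only gives the proportions.

```python
import random


def split_train_test(points, train_fraction=0.8, seed=0):
    """Shuffle a copy of the points and cut it into training and test sets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


train, test = split_train_test(range(10))
```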
Let $i$ denote a sample data point and $\rho_i$ its local density. The local density $\rho_i$ of data point $i$ is calculated as: [formula missing], where $d_c$ is the cutoff distance. For large data sets the local density is essentially the relative density between data points, so the result is robust to the choice of $d_c$. Define $\delta_i$ as the minimum distance from data point $i$ to any point of higher density:
$$\delta_i = \min_{j:\,\rho_j > \rho_i} d_{ij}$$
The point with the highest local density needs special handling; its value is generally taken as:
$$\delta_i = \max_j d_{ij}$$
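A small sketch of these two quantities follows. The local density uses the cutoff kernel (the number of other points closer than $d_c$), which is a common choice in density peak clustering but is an assumption here, since the text names $d_c$ without giving the density formula.

```python
import math


def pairwise_distances(points):
    """Full Euclidean distance matrix d[i][j]."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def local_density(d, dc):
    """Assumed cutoff kernel: rho_i = number of other points with d_ij < dc."""
    n = len(d)
    return [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]


def high_density_distance(d, rho):
    """delta_i = min distance to a denser point; max distance for the densest point."""
    n = len(d)
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    return delta


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 11), (11, 10)]
d = pairwise_distances(points)
rho = local_density(d, dc=1.1)
delta = high_density_distance(d, rho)
```

In this toy data set the point (0.5, 0.5) is the densest, so it receives the largest-distance value, while (10, 10) is the densest point of the second group and therefore also has a large $\delta$.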
Phase 1: initialization
(1) Compute the distance matrix $D = \{d_{ij}\}$, $i, j = 1, \dots, n$, between the data points, and determine the cutoff distance.
(2) Compute the local density $\rho_i$ of each sample from the local density formula, and compute the high-density distance $\delta_i$ of each sample from $\delta_i = \min_{j:\,\rho_j > \rho_i} d_{ij}$.
(3) Construct the decision graph with $\rho$ on the horizontal axis and $\delta$ on the vertical axis. Select the data points whose local density $\rho$ and high-density distance $\delta$ are both high, i.e. the density peak points in the upper-right region that lie clearly away from most samples, as the initial cluster centers; the number of density peak points is the number of clusters $k$.
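Picking the upper-right points of the decision graph can be approximated by thresholding both axes. The thresholds `rho_min` and `delta_min` are hypothetical parameters introduced for this sketch; the text selects the peaks from the graph without giving numeric cutoffs.

```python
def select_centers(rho, delta, rho_min, delta_min):
    """Return indices of points whose rho and delta both reach the thresholds."""
    return [
        i
        for i, (r, dl) in enumerate(zip(rho, delta))
        if r >= rho_min and dl >= delta_min
    ]


# Illustrative (rho, delta) values for eight points; two stand out on both axes.
rho = [3, 3, 3, 3, 4, 2, 1, 1]
delta = [0.71, 0.71, 0.71, 0.71, 14.16, 12.73, 1.0, 1.0]
centers = select_centers(rho, delta, rho_min=2, delta_min=5.0)
k = len(centers)  # the number of density peaks gives the cluster count
```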
Phase 2: construction of the initial clusters
(1) Compute the minimum distance between each point in the data set and the cluster centers, assign the remaining sample points to the nearest initial cluster center to form the initial partition, and compute the clustering sum of squared errors.
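The initial partition and its sum of squared errors can be sketched as follows, assuming Euclidean distance and centers given as indices into the point list.

```python
import math


def assign_to_centers(points, center_indices):
    """Assign each point to its nearest center; return (labels, sum of squared errors)."""
    labels, sse = [], 0.0
    for p in points:
        dists = [math.dist(p, points[c]) for c in center_indices]
        nearest = min(range(len(dists)), key=dists.__getitem__)
        labels.append(nearest)
        sse += dists[nearest] ** 2
    return labels, sse


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 11), (11, 10)]
labels, sse = assign_to_centers(points, center_indices=[4, 5])
```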
Phase 3: reverse learning and substitution into the PAM algorithm
(1) Apply reverse learning to the $k$ original clusters obtained above to get the $k$ corresponding reverse clusters.
(2) Combine the $k$ clusters obtained by the standard PAM clustering algorithm with the $k$ reverse clusters obtained by reverse learning to get $k \times k$ cluster combinations.
(3) For each cluster combination, compute the intra-cluster distance $a(o)$, the inter-cluster distance $b(o)$, and the silhouette coefficient $s(o)$; compare $s_1, s_2, \dots, s_{k \times k}$ and find the cluster combination with the largest silhouette coefficient.
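The comparison in item (3) can be sketched with the usual silhouette definition $s(o) = (b(o) - a(o)) / \max(a(o), b(o))$, averaged over all points. How the reverse clusters and the $k \times k$ candidate combinations are generated is not covered here; the sketch assumes the candidate labelings are already available.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient of one labeling (needs at least two clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a(o): mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b(o): smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def best_partition(points, candidate_labelings):
    """Pick the candidate labeling with the largest mean silhouette coefficient."""
    return max(candidate_labelings, key=lambda labels: silhouette(points, labels))


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good, bad = [0, 0, 1, 1], [0, 1, 0, 1]
chosen = best_partition(points, [bad, good])
```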
The steps of the PAM algorithm are as follows:
(1) Arbitrarily select $k$ elements from the given data set $D$ and mark them as the initial representative objects, or seeds, $o_j$;
(2) Using the Euclidean distance, compute the distance between each non-representative object $o_i$ in $D$ and the $k$ representative objects, and assign $o_i$ to the cluster of the nearest representative object;
(3) Arbitrarily select a non-representative object $o_{random}$;
(4) Compute the total cost $S$:
$S = dist(p, o_{random}) - dist(p, o_j)$
(5) If the total cost $S < 0$, the non-representative object $o_{random}$ is the better solution; $o_{random}$ replaces $o_j$ to form a new set of $k$ representative objects, and the algorithm returns to step (2) for a new round of object assignment;
(6) If the total cost $S > 0$, the representative object $o_j$ is the better solution; go to step (3) and select another non-representative object for the cost comparison, until the total cost $S$ no longer changes, which yields the $k$ clusters with the smallest total cost.
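A compact sketch of the swap loop follows. It departs from the steps above in two labeled ways: it tries every non-representative object in turn instead of drawing one at random, and it computes $S$ as the change in the summed distance of all objects to their nearest representative object, which is an assumed reading of step (4).

```python
import math
import random


def total_cost(points, medoids):
    """Sum over all points of the distance to the nearest representative object."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, initial_medoids=None, seed=0, max_iter=100):
    rng = random.Random(seed)
    if initial_medoids is not None:
        medoids = list(initial_medoids)
    else:
        medoids = rng.sample(range(len(points)), k)  # step (1): arbitrary seeds
    cost = total_cost(points, medoids)
    for _ in range(max_iter):
        improved = False
        for pos in range(len(medoids)):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost - cost < 0:  # S < 0: the swap is an improvement
                    medoids, cost, improved = trial, trial_cost, True
        if not improved:
            break  # no swap lowers the cost any further
    labels = [
        min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
        for p in points
    ]
    return medoids, labels, cost


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels, cost = pam(points, k=2, initial_medoids=[0, 1])
```

Passing `initial_medoids` is where the density peak centers from phase 1 would replace the arbitrary seeds of step (1).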
Step 5
Input the points of interest with road network information collected in step 2 into the model of step 4. Measure the similarity between the $k$ clusters obtained from the clustering and the representative points of interest that were collected, determine which points of interest the residents' travel areas belong to, and thereby analyze the residents' hotspot areas. The clustering yields the residents' hotspot activity areas with road network characteristics. Compare the clustering result with the collected points of interest to determine the hotspot areas of residents' travel.
For the trajectories $tr_a$ and $tr_b$ to be compared, the Hausdorff distance is used to measure trajectory similarity:
$$H(tr_a, tr_b) = \max\{h(tr_a, tr_b),\, h(tr_b, tr_a)\}, \quad \text{where} \quad h(tr_a, tr_b) = \max_{a \in tr_a} \min_{b \in tr_b} dist(a, b)$$
That is, for each point of one trajectory, compute the minimum distance to all points of the other trajectory, and then take the maximum of each set of minima. When the distance is below the similarity threshold, the trajectory is considered spatially similar to the point of interest and is saved to the candidate set. Trajectories in the candidate set whose distance exceeds a given threshold are deleted, which yields the road network points of interest closest to the trajectories, i.e. the residents' travel hotspot areas.
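The Hausdorff comparison can be sketched directly from the definition. The function `similar_pois`, the planar Euclidean point distance, and the way points of interest are represented (a name mapped to a list of points) are assumptions for illustration.

```python
import math


def directed_hausdorff(tr_a, tr_b):
    """h(tr_a, tr_b): largest of the nearest-point distances from tr_a to tr_b."""
    return max(min(math.dist(p, q) for q in tr_b) for p in tr_a)


def hausdorff(tr_a, tr_b):
    """H(tr_a, tr_b) = max of the two directed distances."""
    return max(directed_hausdorff(tr_a, tr_b), directed_hausdorff(tr_b, tr_a))


def similar_pois(cluster_points, pois, threshold):
    """Return the names of the POIs whose Hausdorff distance is below the threshold."""
    return [
        name
        for name, pts in pois.items()
        if hausdorff(cluster_points, pts) < threshold
    ]


tr_a = [(0, 0), (1, 0), (2, 0)]
tr_b = [(0, 1), (2, 1)]
H = hausdorff(tr_a, tr_b)
candidates = similar_pois(tr_a, {"school": tr_b, "mall": [(50, 50)]}, threshold=2.0)
```

Here the directed distances are $\sqrt{2}$ from `tr_a` to `tr_b` and 1 in the other direction, so $H = \sqrt{2}$.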
The above embodiments should be understood as merely illustrating the present invention and not as limiting its scope of protection. After reading the contents of this disclosure, a person skilled in the art may make various changes or modifications to the invention, and such equivalent changes and modifications likewise fall within the scope defined by the claims of the present invention.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810179464.1A CN108427965B (en) | 2018-03-05 | 2018-03-05 | Hot spot area mining method based on road network clustering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810179464.1A CN108427965B (en) | 2018-03-05 | 2018-03-05 | Hot spot area mining method based on road network clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108427965A CN108427965A (en) | 2018-08-21 |
CN108427965B true CN108427965B (en) | 2022-08-23 |
Family
ID=63157793
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810179464.1A Active CN108427965B (en) | 2018-03-05 | 2018-03-05 | Hot spot area mining method based on road network clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108427965B (en) |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109408562B (en) * | 2018-11-07 | 2021-11-26 | 广东工业大学 | Grouping recommendation method and device based on client characteristics |
CN109739585B (en) * | 2018-12-29 | 2022-02-18 | 广西交通科学研究院有限公司 | Spark cluster parallelization calculation-based traffic congestion point discovery method |
CN110135450B (en) * | 2019-03-26 | 2020-06-23 | 中电莱斯信息系统有限公司 | Hot spot path analysis method based on density clustering |
CN110427444B (en) * | 2019-07-26 | 2022-05-24 | 北京百度网讯科技有限公司 | Navigation guide point mining method, device, equipment and storage medium |
CN110399686B (en) * | 2019-07-30 | 2024-03-19 | 中国民航大学 | Parameter-independent aircraft flight trajectory clustering method based on contour coefficients |
CN110609824B (en) * | 2019-09-09 | 2022-09-09 | 南京师范大学 | Detection method of hot spots based on dynamic spatial network model in urban road network environment |
CN110726418B (en) * | 2019-10-10 | 2021-08-03 | 北京百度网讯科技有限公司 | Method, device and equipment for determining interest point region and storage medium |
CN110849379B (en) * | 2019-10-23 | 2023-04-25 | 南通大学 | Entrance and exit traffic state symbol expression method for navigation map |
CN112783991A (en) * | 2019-11-07 | 2021-05-11 | 北京京东尚科信息技术有限公司 | User data processing method, device, medium and electronic equipment |
CN110866559A (en) * | 2019-11-14 | 2020-03-06 | 上海中信信息发展股份有限公司 | Poultry behavior analysis method and device |
CN111121803B (en) * | 2019-11-27 | 2021-12-03 | 北京中交兴路信息科技有限公司 | Method and device for acquiring common stop points of road |
CN111275962B (en) * | 2019-12-30 | 2021-09-03 | 深圳市麦谷科技有限公司 | Vehicle track data aggregation effect prediction method and device |
CN111708853B (en) * | 2020-05-25 | 2022-08-30 | 安徽师范大学 | Taxi hot spot region extraction method based on characteristic density peak clustering |
CN114120018B (en) * | 2020-08-25 | 2023-07-11 | 四川大学 | A Spatial Vitality Quantification Method Based on Crowd Clustering Trajectory Entropy |
CN112163370A (en) * | 2020-09-18 | 2021-01-01 | 国网冀北电力有限公司承德供电公司 | Power distribution network line loss calculation method based on improved ASMDE algorithm |
CN112070179B (en) * | 2020-09-22 | 2024-05-31 | 中国人民解放军国防科技大学 | Self-adaptive space-time track clustering method based on density peak value |
CN113158817B (en) * | 2021-03-29 | 2023-07-18 | 南京信息工程大学 | An Objective Weather Typing Method Based on Fast Density Peak Clustering |
CN113611115B (en) * | 2021-08-06 | 2022-06-21 | 安徽师范大学 | A Vehicle Trajectory Clustering Method Based on Sensitive Features of Road Network |
CN113869465A (en) * | 2021-12-06 | 2021-12-31 | 深圳大学 | I-nice algorithm optimization method, apparatus, device, and computer-readable storage medium |
CN114485696B (en) * | 2021-12-23 | 2024-06-28 | 高德软件有限公司 | Arrival point acquisition method, electronic device and storage medium |
CN114677048B (en) * | 2022-04-22 | 2024-01-16 | 北京阿帕科蓝科技有限公司 | A demand area mining method |
CN114999151B (en) * | 2022-05-24 | 2023-01-24 | 电子科技大学 | Density-based urban traffic flow hierarchical analysis method and device in GPS track |
CN115099864A (en) * | 2022-07-01 | 2022-09-23 | 芜湖雄狮汽车科技有限公司 | Target crowd circling method and device, electronic equipment and storage medium |
CN118747956A (en) * | 2024-06-24 | 2024-10-08 | 上海理工大学 | A short-term prediction method for road traffic OD flow considering travel frequency |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104167092A (en) * | 2014-07-30 | 2014-11-26 | 北京市交通信息中心 | Method and device for determining taxi pick-up and drop-off hot spot region center |
CN105825672A (en) * | 2016-04-11 | 2016-08-03 | 中山大学 | City guidance area extraction method based on floating car data |
CN107301254A (en) * | 2017-08-24 | 2017-10-27 | 电子科技大学 | Road network hot spot region mining method |
- 2018-03-05: CN application CN201810179464.1A, patent CN108427965B, status Active
Non-Patent Citations (4)
Title |
---|
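An improved clustering algorithm based on reverse learning in intelligent transportation; Guoqing Qiu et al.; AIP Conference Proceedings 1839; 20170508; full text * |
Mining hot paths and regions of resident travel based on taxi trajectories; Feng Qisen (冯琦森); China Master's Theses Full-text Database, Information Science and Technology series; 20170315; abstract, sections 2-5 * |
Analyzing the spatio-temporal characteristics of Beijing taxi trip volume using time-series clustering; Cheng Jing (程静) et al.; Journal of Geo-information Science (地球信息科学学报); 20161231; abstract, sections 1-5 * |
K-medoids clustering algorithm with initial centers optimized by density peaks; Xie Juanying (谢娟英) et al.; Journal of Frontiers of Computer Science and Technology (计算机科学与探索); 20161231; abstract, sections 1-5 * |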
Also Published As
Publication number | Publication date |
---|---|
CN108427965A (en) | 2018-08-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108427965B (en) | Hot spot area mining method based on road network clustering | |
Wang et al. | Automatic intersection and traffic rule detection by mining motor-vehicle GPS trajectories | |
CN103533501B (en) | Geofence generation method | |
US20160063516A1 (en) | Methods and apparatus to estimate commercial characteristics based on geospatial data | |
CN109272170B (en) | A Traffic Area Division System Based on Louvain Algorithm | |
CN106503714B (en) | Method for identifying city functional area based on point of interest data | |
CN108446470A (en) | Accessibility analysis method for medical facilities based on vehicle trajectory data and population distribution | |
Guyot et al. | The urban form of Brussels from the street perspective: The role of vegetation in the definition of the urban fabric | |
CN105788273A (en) | Urban intersection automatic identification method based on low precision space-time trajectory data | |
CN109948737A (en) | Poverty spatial classification and identification method and device based on big data and machine learning | |
CN110263717A (en) | Method for determining land use status incorporating street view imagery | |
CN111814596B (en) | Automatic city function partitioning method for fusing remote sensing image and taxi track | |
Zagorskas | GIS-based modelling and estimation of land use mix in urban environment | |
CN113806419B (en) | Urban area function recognition model and recognition method based on space-time big data | |
CN111782741A (en) | Interest point mining method and device, electronic equipment and storage medium | |
CN113393149A (en) | Method and system for optimizing urban citizen destination, computer equipment and storage medium | |
CN116308956B (en) | A method for detecting differences between dominant functions and planned uses of urban areas | |
Smith et al. | Classification of sidewalks in street view images | |
Alhasoun et al. | Streetify: Using street view imagery and deep learning for urban streets development | |
CN113342873A (en) | Refined population analysis unit division method based on city morphology and population convergence mode | |
CN116542709A (en) | Electric vehicle charging station planning analysis method based on traffic situation awareness | |
CN116340563A (en) | Urban scene geographic position positioning method with pattern matching | |
CN112559909B (en) | A business district discovery method based on GCN embedded spatial clustering model | |
CN113379269A (en) | Urban business function zoning method, device and medium for multi-factor spatial clustering | |
Santos et al. | Estimating building's anthropogenic heat: a joint local climate zone and land use classification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||