WO2020191876A1 - Hotspot path analysis method based on density clustering - Google Patents

Hotspot path analysis method based on density clustering

Info

Publication number
WO2020191876A1
Authority
WO
WIPO (PCT)
Prior art keywords
path
corep
core
density
index
Prior art date
Application number
PCT/CN2019/086517
Other languages
French (fr)
Chinese (zh)
Inventor
徐欣
刁联旺
易侃
李青山
Original Assignee
中国电子科技集团公司第二十八研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国电子科技集团公司第二十八研究所
Priority to JP2020545145A (patent JP6912672B2)
Publication of WO2020191876A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Definitions

  • the invention relates to the field of target path analysis and mining, in particular to a hot path analysis method based on density clustering.
  • the present invention proposes a method for analyzing hotspot paths based on density clustering, which includes the following steps:
  • Step 1: Represent each target path as a path point set composed of several path points, and construct a similarity distance matrix over these sets;
  • Step 2: Compare the similarity between every pair of path point sets. Based on the similarity distance matrix, the distance threshold ε, and the density threshold MinPts, mine the core path sets from the path point sets; then, according to the "directly density-reachable" relationship defined for core path sets, use density clustering to iteratively generate clusters aggregated from core path sets;
  • Step 3: Output the mode of the path point sets of each cluster as the target hotspot path.
  • Compared with the similarity distance matrix in traditional density clustering, the rows and columns of the matrix in step 1 no longer correspond to vectors of fixed dimension but to path point sets of non-fixed length.
  • Step 1 includes:
  • Step 1-1: Suppose n path point sets corresponding to n target paths have been collected; each path point set corresponds to one target path, and each element of a path point set is a path point of the corresponding target path.
  • The Jaccard distance $\mathrm{JaccardDist}(P_i, P_j)$ between the i-th path point set $P_i$ and the j-th path point set $P_j$ is defined as:
  • Step 1-2: Sort the path point sets: sort the n path point sets first by set size in descending order and then by index value in ascending order, denoted $P_1, P_2, \ldots, P_n$, satisfying $|P_1| \ge |P_2| \ge \ldots \ge |P_n|$;
  • Step 1-3: Initialize the similarity distance matrix: set the distance threshold ε, with value range $0 < \varepsilon < 1$. In general, it can be taken as the mean nearest-neighbor distance of the path point sets, namely:
  • The similarity distance matrix DistArray is initialized as empty with size $n \times n$, that is, the numbers of rows and columns are both n. Because the similarity distance matrix is symmetric about the diagonal, only the upper triangular part is retained.
  • Step 2 innovatively proposes a similarity comparison strategy based on the path point set size and the distance threshold ε (step 2-3), which greatly reduces the computational cost of pairwise similarity comparison between path point sets.
  • On the basis of set-type similarity distance calculation, the concepts of "ε-neighborhood", "core path set", "directly density-reachable", "indirectly density-reachable", and "density-connected" are further proposed for path point sets (steps 2-8, 2-9), thereby extending the traditional density clustering rules for fixed-dimension vectors to set-type data.
  • Step 2 includes:
  • Step 2-3: Check the index of the set to be compared: if the condition "$t \le n$ and $|P_t|/|P_s| \ge 1-\varepsilon$" is not satisfied, continue to step 2-4; if it is satisfied, go to step 2-6;
  • Step 2-5: Check the current set index: if $s \ge n$, continue to step 2-8; otherwise, return to step 2-2;
  • Step 2-6: Calculate the similarity distance: calculate the Jaccard distance $\mathrm{JaccardDist}(P_s, P_t)$ between the two path point sets corresponding to the current set index and the index of the set to be compared; if $\mathrm{JaccardDist}(P_s, P_t) \le \varepsilon$, update the corresponding cell of the similarity distance matrix:
  • $\mathrm{DistArray}[s,t] = \mathrm{JaccardDist}(P_s, P_t)$ (3)
  • DistArray[s,t] denotes the value in row s, column t of the similarity distance matrix DistArray
  • Step 2-8: Calculate the neighborhood size of each path point set: given any path point set P, all other path point sets whose similarity distance to P is within the distance threshold ε are defined as the ε-neighborhood of P, denoted $N_\varepsilon(P)$:
  • $N_\varepsilon(P) = \{Q \mid \mathrm{JaccardDist}(P,Q) \le \varepsilon \land Q \ne P\}$ (4)
  • where Q denotes an arbitrary path point set; the ε-neighborhood size of each path point set $P_i$ is calculated according to formula (4) and denoted $|N_\varepsilon(P_i)|$
  • Step 2-9: Construct the core path sets: set the density threshold MinPts, whose value is a natural number greater than or equal to 1 and less than n, and define any path point set whose ε-neighborhood size is not less than MinPts as a core path set.
  • Possible values are That is, any core path set CoreP satisfies:
  • Step 2-10: Density-based iterative aggregation: take each core path set as an initial cluster. Given the distance threshold ε and the density threshold MinPts, if two core path sets CoreP and CoreQ satisfy $\mathrm{CoreQ} \in N_\varepsilon(\mathrm{CoreP})$,
  • then the core path set CoreQ is said to be "directly density-reachable" from the core path set CoreP, which is expressed as:
  • If there exists a chain of core path sets of non-zero length such that the core path set CoreQ and the core path set CoreP satisfy the following conditions (a) and (b):
  • then the core path set CoreQ is said to be "indirectly density-reachable" from the core path set CoreP, expressed as:
  • In addition, if there exists a core path set CoreO such that the core path sets CoreP and CoreQ are each directly or indirectly density-reachable from CoreO, that is, the following conditions (c) and (d) are satisfied:
  • Step 2-10 includes:
  • Step 2-10-1: Check whether any unprocessed core path set remains; if so, continue to step 2-10-2; if not, continue to step 2-10-3;
  • Step 2-10-2: For any unprocessed core path set CoreP, aggregate all core path sets that are directly density-reachable from CoreP, and return to step 2-10-1;
  • Step 2-10-3: Treat all core path sets aggregated together as one cluster, output the clusters formed, and denote the number of clusters as u.
  • the core path set CoreQ is directly density-reachable from the core path set CoreP;
  • the core path set CoreP is directly or indirectly density-reachable from the core path set CoreO, while at the same time CoreQ is directly density-reachable from CoreO; therefore, the core path sets CoreP and CoreQ are density-connected via the core path set CoreO;
  • the core path set CoreO is directly or indirectly density-reachable from the core path set CoreP, while at the same time CoreQ is directly density-reachable from CoreO; therefore, the core path set CoreQ is indirectly density-reachable from the core path set CoreP;
  • Step 2-11: Calculate the path set mode $\mathrm{Mode}_k$ of cluster $C_k$ according to the following formula:
  • $\mathrm{Mode}_k = \arg\min_{P} \sum_{1 \le q \le k'} \mathrm{JaccardDist}(P, \mathrm{CoreP}_q)$
  • P denotes a path point set
  • $\mathrm{CoreP}_q$ denotes the q-th core path set in cluster $C_k$
  • the path set mode $\mathrm{Mode}_k$ is the path point set for which the sum of Jaccard distances to all core path sets in cluster $C_k$ is smallest.
  • Step 2-11 includes:
  • $\Omega_k = \bigcup_{1 \le q \le k'} \mathrm{CoreP}_q$,
  • the path point dictionary is the union of all core path sets in cluster $C_k$; then, for each path point $p_r$ in the path point dictionary, the intersection coefficient $\alpha_{rq}$ and the union coefficient $\beta_{rq}$ of $p_r$ with respect to each core path set $\mathrm{CoreP}_q$ of cluster $C_k$ are calculated as shown in the following formula:
  • Step 2-11-3: Calculate the mode of the path point sets based on the intersection coefficients and the union coefficients:
  • Step 3 includes: output Mode k as the path hot spot of the kth cluster C k .
  • the distance threshold ε is used to compare the similarity between path point sets. Since the Jaccard distance between two path point sets lies in the interval [0,1], the distance threshold ε also lies in the interval [0,1].
  • the traditional density clustering method is only suitable for fixed-dimension vector data and is not suitable for path point set data of non-fixed length.
  • the present invention innovatively proposes the "core path set" and the associated concepts of "directly density-reachable", "indirectly density-reachable", and "density-connected" specifically for path point sets, so that the traditional density clustering method, which was applicable only to fixed-dimension vectors, is extended to path point set data of non-fixed length.
  • the invention also proposes a hot path mining method based on intersection and union coefficients, which significantly improves the efficiency of hot path analysis.
  • (1) A similarity comparison method for target path point sets is proposed; (2) the selection of the density threshold MinPts has a certain flexibility and robustness; (3) the calculation cost is low, and the method lends itself to engineering implementation.
  • the invention adopts the analysis and mining method based on the set of path points, which simplifies the order of the path points, facilitates the aggregation of measurement data with the same path points, and can greatly reduce the calculation cost and improve the calculation efficiency.
  • FIG. 1 is a flowchart of the present invention.
  • the present invention represents each target path as a path point set composed of several path points, constructs a similarity distance matrix, compares the similarity between every two path point sets, uses density clustering with the similarity distance matrix, the distance threshold ε, and the density threshold MinPts to iteratively compute the clusters formed by the path point sets, and finally outputs the path set mode of each cluster as the target hot path.
  • the method of the present invention specifically includes the following steps:
  • Sorting of path point sets: sort the n path point sets first by set size in descending order and then by index value in ascending order, denoted $P_1, P_2, \ldots, P_n$, satisfying $|P_1| \ge |P_2| \ge \ldots \ge |P_n|$;
  • Check of the index of the set to be compared: check whether $t \le n$ and $|P_t|/|P_s| \ge 1-\varepsilon$ hold;
  • Check of the current set index: if $s \ge n$, continue to step (10); otherwise, return to step (4);
  • Similarity distance calculation: calculate the Jaccard distance between the two path point sets corresponding to the current set index and the index of the set to be compared. If $\mathrm{JaccardDist}(P_s, P_t) \le \varepsilon$, update the corresponding cell of the similarity distance matrix:
  • $\mathrm{DistArray}[s,t] = \mathrm{JaccardDist}(P_s, P_t)$;
  • Path point neighborhood size calculation: given any path point set P, all other path point sets whose similarity distance to P is within the distance threshold ε are defined as the ε-neighborhood of P, denoted $N_\varepsilon(P)$:
  • $N_\varepsilon(P) = \{Q \mid \mathrm{JaccardDist}(P,Q) \le \varepsilon \land Q \ne P\}$
  • Construction of the core path sets: set the density threshold MinPts, and define any path point set whose ε-neighborhood size is not less than MinPts as a core path set; that is, any core path set CoreP satisfies $|N_\varepsilon(\mathrm{CoreP})| \ge \mathrm{MinPts}$;
  • the core path set CoreQ is then said to be "directly density-reachable" from the core path set CoreP. If there exists a chain of core path sets of non-zero length such that the core path set CoreQ and the core path set CoreP satisfy:
  • then the core path set CoreQ is said to be "indirectly density-reachable" from the core path set CoreP;
  • if the core path sets CoreP and CoreQ are each directly or indirectly density-reachable from a core path set CoreO, that is,
  • then the core path sets CoreP and CoreQ are "density-connected"; then, according to the distance threshold ε and the density threshold MinPts, iterative aggregation is performed based on density clustering,
  • and the number of clusters generated after aggregating the core path sets that are directly density-reachable, indirectly density-reachable, or density-connected is denoted u;
  • C k represents the k-th cluster
  • CoreP j represents the j-th core path set
  • Mode k is output as the path hot spot of the cluster C k .
  • the method of the present invention can improve the target path analysis ability in the case of inaccurate target position measurement, is beneficial to reduce the redundancy of target position measurement, increase the flexibility of space granularity, and can better complete the target path analysis task.
  • the following uses an example to illustrate the hotspot path analysis method based on density clustering of the present invention.
  • the hot path analysis steps based on density clustering are as follows:
  • Step 1: Sort the path point sets, first by set size in descending order and then by index value in ascending order, denoted $P_1, P_2, P_3, P_4, P_5$, as shown in Table 1:
  • Step 2: Initialize the similarity distance matrix.
  • The distance threshold ε is set to 0.3.
  • The similarity distance matrix DistArray is initialized as empty with size $5 \times 5$. Because the similarity distance matrix is symmetric about the diagonal, only the upper triangular part is retained, as shown in Table 2:
  • Step 5: Check the index of the set to be compared; it satisfies "$t \le n$ and
  • Step 5: Check the index of the set to be compared; it satisfies "$t \le n$ and
  • $0.75 > 1-\varepsilon$", so proceed to step 8;
  • Step 8: Similarity distance calculation: calculate the Jaccard distance between the path point sets $P_1$ and $P_3$, and update the similarity distance matrix DistArray, as shown in Table 4:
  • Step 5 Judge the index of the set to be compared, and judge that the target index value to be compared does not satisfy "
  • 0.5 ⁇ 1- ⁇ ", and proceed to step 6;
  • Step 7: Check the current set index: the current set index satisfies $s < n$, so return to step 4;
  • $1 \ge 1-\varepsilon$", so proceed to step 8;
  • Step 8: Similarity distance calculation: calculate the Jaccard distance between the path point sets $P_2$ and $P_3$, and update the similarity distance matrix DistArray, as shown in Table 5:
  • 0.667 ⁇ 1- ⁇ ", and proceed to step 6;
  • Step 7: Check the current set index: the current set index satisfies $s < n$, so return to step 4;
  • Step 7: Check the current set index: the current set index satisfies $s < n$, so return to step 4;
  • $1 \ge 1-\varepsilon$, so continue to step 8;
  • Step 8: Similarity distance calculation: the Jaccard distance between the path point sets $P_4$ and $P_5$ is calculated to be zero, which satisfies $\mathrm{JaccardDist}(P_4, P_5) \le 0.3$. Update the similarity distance matrix DistArray, as shown in Table 6:
  • Step 10: Path point neighborhood size calculation: calculate the ε-neighborhood size of each path point set $P_i$
  • Step 11: Build the core path sets.
  • the value can be P 1 , P 2 , P 3 , P 4 , and P 5 are all core path sets;
  • the present invention provides a hotspot path analysis method based on density clustering. There are many methods and ways to implement this technical solution. The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention. All components not specified in this embodiment can be implemented using existing technology.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A hotspot path analysis method based on density clustering. The method comprises: for representing a target path as path point sets constituted by several path points, constructing a similarity distance matrix; comparing the similarities between every two path point sets, and based on the similarity distance matrix, a distance threshold ε and a density threshold MinPts, using density clustering to iteratively calculate clusters constituted by the path point sets; and finally outputting a path set mode of clusters as a target hotspot path. The method has the advantages that (1) a similarity comparison method for a target path point set is provided; (2) the selection of the density threshold MinPts has certain flexibility and robustness; and (3) the calculation cost is low, and method engineering is realized.

Description

A hotspot path analysis method based on density clustering
Technical field
The invention relates to the field of target path analysis and mining, and in particular to a hotspot path analysis method based on density clustering.
Background
As is well known, the volume of measurement data related to target paths keeps growing. Manual analysis alone can hardly summarize target path patterns in a timely and accurate manner, and can hardly support highly real-time decision aids. Most traditional target path analysis and prediction techniques work on target position measurement data; they do not analyze key path points, cannot focus on high-level path features or extract multi-granularity target path patterns, and incur high computational cost.
Summary of the invention
Purpose of the invention: In view of the problems of the prior art, the present invention proposes a hotspot path analysis method based on density clustering, which includes the following steps:
Step 1: Represent each target path as a path point set composed of several path points, and construct a similarity distance matrix over these sets;
Step 2: Compare the similarity between every pair of path point sets. Based on the similarity distance matrix, the distance threshold ε, and the density threshold MinPts, mine the core path sets from the path point sets; then, according to the "directly density-reachable" relationship defined for core path sets, use density clustering to iteratively generate clusters aggregated from core path sets;
Step 3: Output the mode of the path point sets of each cluster as the target hotspot path.
Compared with the similarity distance matrix in traditional density clustering, the rows and columns of the matrix in step 1 no longer correspond to vectors of fixed dimension but to path point sets of non-fixed length. Step 1 includes:
Step 1-1: Suppose n path point sets corresponding to n target paths have been collected; each path point set corresponds to one target path, and each element of a path point set is a path point of the corresponding target path. The Jaccard distance $\mathrm{JaccardDist}(P_i, P_j)$ between the i-th path point set $P_i$ and the j-th path point set $P_j$ is then defined as:
Figure PCTCN2019086517-appb-000001
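The formula image for the Jaccard distance is not reproduced in this text. As a minimal sketch, assuming the standard definition (one minus the size of the intersection divided by the size of the union), the distance between two path point sets could be computed as follows. The function name `jaccard_dist`, the handling of two empty sets, and the waypoint labels are assumptions made for illustration.

```python
def jaccard_dist(p: set, q: set) -> float:
    """Standard Jaccard distance: 1 - |p & q| / |p | q| (assumed definition)."""
    union = p | q
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(p & q) / len(union)


# Hypothetical waypoint labels, for illustration only.
p1 = {"A", "B", "C", "D"}
p2 = {"A", "B", "C"}
print(jaccard_dist(p1, p2))  # 0.25
```

Because the inputs are sets, the order of the path points within a path does not influence the distance.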
Step 1-2: Sort the path point sets: sort the n path point sets first by set size in descending order and then by index value in ascending order, denoted $P_1, P_2, \ldots, P_n$, satisfying $|P_1| \ge |P_2| \ge \ldots \ge |P_n|$;
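The ordering in step 1-2 can be sketched with a composite sort key. The function name `sort_path_point_sets` and the sample data are hypothetical; the original 1-based index is kept alongside each set so that ties in size are broken by index.

```python
def sort_path_point_sets(sets):
    """Return (original_index, set) pairs: size descending, then index ascending."""
    indexed = list(enumerate(sets, start=1))  # 1-based indices as in the text
    return sorted(indexed, key=lambda pair: (-len(pair[1]), pair[0]))


# Hypothetical path point sets, for illustration only.
raw = [{"A"}, {"A", "B", "C"}, {"B", "C"}, {"X", "Y", "Z"}]
ordered = sort_path_point_sets(raw)
print([idx for idx, _ in ordered])  # [2, 4, 3, 1]
```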
Step 1-3: Initialize the similarity distance matrix: set the distance threshold ε, with value range $0 < \varepsilon < 1$. In general, it can be taken as the mean nearest-neighbor distance of the path point sets, namely:
Figure PCTCN2019086517-appb-000002
The similarity distance matrix DistArray is initialized as empty with size $n \times n$, that is, the numbers of rows and columns are both n. Because the similarity distance matrix is symmetric about the diagonal, only the upper triangular part is retained.
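The formula image for the default ε is not reproduced here. One plausible reading of "the mean nearest-neighbor distance of the path point sets" is the average, over all sets, of the distance to the closest other set. The sketch below implements that reading; it is an assumption rather than the patent's exact formula, and it relies on the standard Jaccard distance and on at least two sets being present.

```python
def jaccard_dist(p: set, q: set) -> float:
    """Standard Jaccard distance (assumed definition)."""
    union = p | q
    return 0.0 if not union else 1.0 - len(p & q) / len(union)


def mean_nearest_neighbor_distance(sets) -> float:
    """Average distance from each set to its closest other set (needs n >= 2)."""
    n = len(sets)
    total = 0.0
    for i in range(n):
        total += min(jaccard_dist(sets[i], sets[j]) for j in range(n) if j != i)
    return total / n


# Hypothetical data: two identical paths and one partially overlapping path.
sets = [{"A", "B"}, {"A", "B"}, {"A", "C"}]
eps = mean_nearest_neighbor_distance(sets)
```

This estimate costs a full pairwise pass over the sets, so in practice it might be computed once on a sample before the pruned comparison of step 2 is run.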
Step 2 innovatively proposes a similarity comparison strategy based on the path point set size and the distance threshold ε (step 2-3), which greatly reduces the computational cost of pairwise similarity comparison between path point sets. On the basis of set-type similarity distance calculation, it further proposes the concepts of "ε-neighborhood", "core path set", "directly density-reachable", "indirectly density-reachable", and "density-connected" for path point sets (steps 2-8, 2-9), thereby extending the traditional density clustering rules for fixed-dimension vectors to set-type data. Step 2 includes:
Step 2-1: Set the current set index: set the current path point set index s = 1;
Step 2-2: Set the index of the set to be compared: set the index of the path point set to be compared t = s + 1;
Step 2-3: Check the index of the set to be compared: if the condition "$t \le n$ and $|P_t|/|P_s| \ge 1-\varepsilon$" is not satisfied, continue to step 2-4; if it is satisfied, go to step 2-6;
Step 2-4: Update the current set index: s = s + 1;
Step 2-5: Check the current set index: if $s \ge n$, continue to step 2-8; otherwise, return to step 2-2;
Step 2-6: Calculate the similarity distance: calculate the Jaccard distance $\mathrm{JaccardDist}(P_s, P_t)$ between the two path point sets corresponding to the current set index and the index of the set to be compared; if $\mathrm{JaccardDist}(P_s, P_t) \le \varepsilon$, update the corresponding cell of the similarity distance matrix:
$\mathrm{DistArray}[s,t] = \mathrm{JaccardDist}(P_s, P_t)$ (3)
where DistArray[s,t] denotes the value in row s, column t of the similarity distance matrix DistArray;
Step 2-7: Update the index of the set to be compared: t = t + 1, and return to step 2-3;
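Steps 2-1 to 2-7 amount to a nested loop with an early exit. Under the standard Jaccard distance the shortcut is sound: when $|P_s| \ge |P_t|$, the intersection has at most $|P_t|$ elements and the union at least $|P_s|$, so the similarity is at most $|P_t|/|P_s|$, and a ratio below $1-\varepsilon$ guarantees a distance above ε. Since the sets are sorted by size, every later $P_t$ fails the test as well. The sketch below assumes non-empty, pre-sorted sets; storing the upper triangle as a dictionary keyed by (s, t) is an illustrative choice, not something the source specifies.

```python
def jaccard_dist(p: set, q: set) -> float:
    """Standard Jaccard distance (assumed definition)."""
    union = p | q
    return 0.0 if not union else 1.0 - len(p & q) / len(union)


def build_dist_array(sorted_sets, eps):
    """sorted_sets: non-empty sets already ordered by size, largest first."""
    n = len(sorted_sets)
    dist_array = {}  # keys (s, t) with s < t, 1-based as in the text
    for s in range(1, n):              # steps 2-1, 2-4, 2-5: s = 1 .. n-1
        for t in range(s + 1, n + 1):  # steps 2-2, 2-7: t = s+1 .. n
            p_s, p_t = sorted_sets[s - 1], sorted_sets[t - 1]
            # Step 2-3: size-ratio test; later sets are no larger, so stop here.
            if len(p_t) / len(p_s) < 1 - eps:
                break
            d = jaccard_dist(p_s, p_t)  # step 2-6
            if d <= eps:
                dist_array[(s, t)] = d
    return dist_array


# Hypothetical sorted path point sets, for illustration only.
sets = [{"A", "B", "C", "D"}, {"A", "B", "C"}, {"A", "B", "C"}, {"X", "Y"}]
print(build_dist_array(sets, eps=0.3))
```

Only pairs within the threshold are stored, so the dictionary doubles as the neighbor list used in the later steps.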
Step 2-8: Calculate the neighborhood size of each path point set: given any path point set P, all other path point sets whose similarity distance to P is within the distance threshold ε are defined as the ε-neighborhood of P, denoted $N_\varepsilon(P)$:
$N_\varepsilon(P) = \{Q \mid \mathrm{JaccardDist}(P,Q) \le \varepsilon \land Q \ne P\}$ (4),
where Q denotes an arbitrary path point set. The ε-neighborhood size of each path point set $P_i$ is calculated according to formula (4) and denoted $|N_\varepsilon(P_i)|$;
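If the similarity distance matrix stores only pairs whose distance is within ε, as in step 2-6, the neighborhood sizes of formula (4) can be read off by counting how often each index appears. The sketch assumes a dictionary keyed by 1-based (s, t) pairs; that layout and the function name are hypothetical.

```python
def neighborhood_sizes(dist_array, n):
    """Count, for each index 1..n, the other sets stored within the threshold."""
    sizes = {i: 0 for i in range(1, n + 1)}
    for (s, t) in dist_array:
        # Each stored pair makes s a neighbor of t and t a neighbor of s.
        sizes[s] += 1
        sizes[t] += 1
    return sizes


# Hypothetical matrix contents: pairs (1,2), (1,3), (2,3) are within epsilon.
dist_array = {(1, 2): 0.25, (1, 3): 0.25, (2, 3): 0.0}
print(neighborhood_sizes(dist_array, n=4))  # {1: 2, 2: 2, 3: 2, 4: 0}
```

Counting by index means that two distinct paths with identical path point sets still count as neighbors of each other.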
Step 2-9: Construct the core path sets: set the density threshold MinPts, whose value is a natural number greater than or equal to 1 and less than n and which in general can be taken as
Figure PCTCN2019086517-appb-000003
and define any path point set whose ε-neighborhood size is not less than MinPts as a core path set; that is, any core path set CoreP satisfies:
$|N_\varepsilon(\mathrm{CoreP})| \ge \mathrm{MinPts}$ (5);
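Formula (5) is a simple filter on the neighborhood sizes. The sketch below assumes the sizes are given as a dictionary from 1-based index to $|N_\varepsilon(P_i)|$; the function name and the sample values are hypothetical, and MinPts is passed in directly because its default formula is not reproduced in this text.

```python
def core_path_set_indices(sizes, min_pts):
    """Indices whose epsilon-neighborhood size is at least min_pts."""
    return sorted(i for i, size in sizes.items() if size >= min_pts)


# Hypothetical neighborhood sizes, for illustration only.
sizes = {1: 2, 2: 2, 3: 2, 4: 0}
print(core_path_set_indices(sizes, min_pts=1))  # [1, 2, 3]
```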
Step 2-10: Density-based iterative aggregation: take each core path set as an initial cluster. Given the distance threshold ε and the density threshold MinPts, if two core path sets CoreP and CoreQ satisfy:
$\mathrm{CoreQ} \in N_\varepsilon(\mathrm{CoreP})$ (6),
then the core path set CoreQ is said to be "directly density-reachable" from the core path set CoreP, expressed as:
Figure PCTCN2019086517-appb-000004
If there exists a chain of core path sets of non-zero length such that the core path set CoreQ and the core path set CoreP satisfy the following conditions (a) and (b):
(a)
Figure PCTCN2019086517-appb-000005
and
(b) $n \ge 1$ (7),
then the core path set CoreQ is said to be "indirectly density-reachable" from the core path set CoreP, expressed as:
Figure PCTCN2019086517-appb-000006
In addition, if there exists a core path set CoreO such that the core path sets CoreP and CoreQ are each directly or indirectly density-reachable from CoreO, that is, the following conditions (c) and (d) are satisfied:
(c)
Figure PCTCN2019086517-appb-000007
or
Figure PCTCN2019086517-appb-000008
and
(d)
Figure PCTCN2019086517-appb-000009
or
Figure PCTCN2019086517-appb-000010
then the core path sets CoreP and CoreQ are said to be "density-connected";
Then, according to the distance threshold ε and the density threshold MinPts, iterative aggregation is performed based on density clustering; the number of clusters generated after aggregating the core path sets that are directly density-reachable, indirectly density-reachable, or density-connected is denoted u;
Step 2-11: Calculate the path set mode: for each cluster $C_k$ among the u clusters $C_1, C_2, \ldots, C_u$, where $C_k$ contains k' core path sets, $C_k = \{\mathrm{CoreP}_1, \mathrm{CoreP}_2, \ldots, \mathrm{CoreP}_{k'}\}$, and $\mathrm{CoreP}_{k'}$ denotes the k'-th core path set, calculate the path set mode $\mathrm{Mode}_k$ of cluster $C_k$, where $1 \le k \le u$ and $C_k$ denotes the k-th cluster.
Step 2-10 includes:
Given the distance threshold ε and the density threshold MinPts, start from any core path set CoreP and first aggregate all core path sets that are directly density-reachable from CoreP, continuing until all core path sets have been processed. The specific process includes:
Step 2-10-1: Check whether any unprocessed core path set remains; if so, continue to step 2-10-2; if not, continue to step 2-10-3;
Step 2-10-2: For any unprocessed core path set CoreP, aggregate all core path sets that are directly density-reachable from CoreP, and return to step 2-10-1;
Step 2-10-3: Treat all core path sets aggregated together as one cluster, output the clusters formed, and denote the number of clusters as u.
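Because the Jaccard distance is symmetric and only core path sets take part, direct density reachability between core path sets is a symmetric relation, so the aggregation in steps 2-10-1 to 2-10-3 can be sketched as finding connected components. The breadth-first traversal, the function name `cluster_core_sets`, and the dictionary layout of the distance matrix are illustrative assumptions, not the patent's literal procedure.

```python
from collections import deque


def cluster_core_sets(core_indices, dist_array):
    """Group core path sets linked by chains of direct density reachability."""
    core = set(core_indices)
    neighbors = {i: set() for i in core}
    for (s, t) in dist_array:  # stored pairs are within the distance threshold
        if s in core and t in core:
            neighbors[s].add(t)
            neighbors[t].add(s)
    clusters, seen = [], set()
    for start in sorted(core):  # step 2-10-1: pick an unprocessed core set
        if start in seen:
            continue
        seen.add(start)
        queue, cluster = deque([start]), []
        while queue:            # step 2-10-2: absorb directly reachable sets
            cur = queue.popleft()
            cluster.append(cur)
            for nxt in sorted(neighbors[cur]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        clusters.append(sorted(cluster))  # step 2-10-3: one finished cluster
    return clusters


# Hypothetical core indices and within-threshold pairs, for illustration only.
pairs = {(1, 2): 0.25, (2, 3): 0.2, (4, 5): 0.0}
print(cluster_core_sets([1, 2, 3, 4, 5], pairs))  # [[1, 2, 3], [4, 5]]
```

The number of clusters u is then the length of the returned list.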
步骤2-10-3中,同一个簇C中,两两核心路径集之间的关系必然属于以下三种情况之一:直接密度可达、间接密度可达或者密度相连,具体证明如下:In steps 2-10-3, in the same cluster C, the relationship between the two core path sets must belong to one of the following three situations: direct density reachable, indirect density reachable, or density connected. The specific proof is as follows:
设定当前簇C中两两核心路径集是满足直接密度可达、间接密度可达或者密度相连的,当新聚合一个从核心路径集CoreO直接密度可达的核心路径集CoreQ时,即
Figure PCTCN2019086517-appb-000011
且CoreO∈C,簇C中原有的任意核心路径集CoreP与新加入的核心路径集CoreQ存在以下四种情况:
Set the two-by-two core path set in the current cluster C to meet the requirements of direct density reachability, indirect density reachability, or density connection. When a core path set CoreQ that is directly density reachable from the core path set CoreO is newly aggregated, that is
Figure PCTCN2019086517-appb-000011
And CoreO∈C, the original arbitrary core path set CoreP in cluster C and the newly added core path set CoreQ have the following four situations:
1、当核心路径集CoreP就是核心路径集CoreO时,
Figure PCTCN2019086517-appb-000012
核心路径集CoreQ从核心路径集CoreP直接密度可达;
1. When the core path set CoreP is the core path set CoreO,
Figure PCTCN2019086517-appb-000012
The core path set CoreQ is directly accessible from the core path set CoreP;
2、当核心路径集CoreP从核心路径集CoreO直接密度可达或间接密度可达时,
Figure PCTCN2019086517-appb-000013
Figure PCTCN2019086517-appb-000014
或者
Figure PCTCN2019086517-appb-000015
而同时
Figure PCTCN2019086517-appb-000016
因此核心路径集CoreP与CoreQ是经核心路径集CoreO密度相连的;
2. When the core path set CoreP is directly or indirectly density reachable from the core path set CoreO,
Figure PCTCN2019086517-appb-000013
Figure PCTCN2019086517-appb-000014
or
Figure PCTCN2019086517-appb-000015
While at the same time
Figure PCTCN2019086517-appb-000016
Therefore, the core path set CoreP and CoreQ are densely connected via the core path set CoreO;
3、当核心路径集CoreO从核心路径集CoreP直接密度可达或间接密度可达时,即
Figure PCTCN2019086517-appb-000017
Figure PCTCN2019086517-appb-000018
或者
Figure PCTCN2019086517-appb-000019
而同时
Figure PCTCN2019086517-appb-000020
因此
Figure PCTCN2019086517-appb-000021
核心路径集CoreQ从核心路径集CoreP间接密度可达;
3. When the core path set CoreO is directly or indirectly density reachable from the core path set CoreP, that is
Figure PCTCN2019086517-appb-000017
Figure PCTCN2019086517-appb-000018
or
Figure PCTCN2019086517-appb-000019
While at the same time
Figure PCTCN2019086517-appb-000020
therefore
Figure PCTCN2019086517-appb-000021
Core path set CoreQ can reach indirect density from core path set CoreP;
4. When the core path set CoreO and the core path set CoreP are density-connected, that is, there exists a core path set CoreR such that (CoreR < CoreO or CoreR <_I CoreO) and (CoreR < CoreP or CoreR <_I CoreP), then CoreR <_I CoreQ. Therefore the core path set CoreP and the core path set CoreQ are also density-connected via the core path set CoreR.
It follows that the newly aggregated core path set CoreQ and the core path sets already in the cluster still satisfy one of the relations of direct density reachability, indirect density reachability, or density connection.
In step 2-11, the path set mode Mode_k of cluster C_k is computed according to the following formula:
Mode_k = argmin_P Σ_{1≤q≤k′} JaccardDist(P, CoreP_q)     (9),
where P denotes a path point set, CoreP_q denotes the q-th core path set in cluster C_k, and the path set mode Mode_k is the path point set for which the sum of the Jaccard distances to all core path sets in cluster C_k is smallest.
Step 2-11 includes:
Step 2-11-1, compute the intersection and union coefficients: given a cluster C_k containing k′ core path sets, C_k = {CoreP_1, CoreP_2, …, CoreP_k′}, first compute the path point dictionary Ω_k contained in cluster C_k:
Ω_k = ∪_{1≤q≤k′} CoreP_q,
that is, the path point dictionary is the union of all core path sets in cluster C_k. Then, for each path point p_r in the path point dictionary, compute the intersection coefficient α_rq and the union coefficient β_rq of the path point p_r with respect to each core path set CoreP_q of cluster C_k, as given by:
Figure PCTCN2019086517-appb-000027
Figure PCTCN2019086517-appb-000028
Step 2-11-2, compute the Jaccard distance between a path point set and the core path sets from the intersection and union coefficients: using these coefficients, the Jaccard distance between a path point set P = {p_r} and each core path set CoreP_q simplifies to:
Figure PCTCN2019086517-appb-000029
Step 2-11-3, compute the path point set mode from the intersection and union coefficients:
Figure PCTCN2019086517-appb-000030
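The coefficient formulas themselves are given only in the equation images, so the following Python sketch rests on an assumption rather than on the patent text: it takes α_rq = 1 when path point p_r belongs to CoreP_q and 0 otherwise, and β_rq = 1 − α_rq, so that |P ∩ CoreP_q| is the sum of α_rq over P and |P ∪ CoreP_q| is |CoreP_q| plus the sum of β_rq over P. The function names are hypothetical, and the exhaustive search over subsets of Ω_k is only practical for small dictionaries.

```python
from itertools import combinations


def coefficients(cluster):
    """Return (omega, alpha, beta) for a cluster given as a list of sets.

    Assumed definitions (not confirmed by the source): alpha[r][q] is 1 when
    path point r lies in core path set q, else 0; beta[r][q] = 1 - alpha[r][q].
    """
    omega = sorted(set().union(*cluster))  # path point dictionary
    alpha = {r: [1 if r in core else 0 for core in cluster] for r in omega}
    beta = {r: [1 - a for a in alpha[r]] for r in omega}
    return omega, alpha, beta


def dist_from_coefficients(candidate, q, cluster, alpha, beta):
    """Jaccard distance between `candidate` and core set q via the coefficients."""
    inter = sum(alpha[r][q] for r in candidate)
    union = len(cluster[q]) + sum(beta[r][q] for r in candidate)
    return 1.0 - inter / union


def path_set_mode(cluster):
    """Brute-force search for the subset of omega with the smallest distance sum."""
    omega, alpha, beta = coefficients(cluster)
    best, best_total = None, float("inf")
    for size in range(1, len(omega) + 1):
        for candidate in combinations(omega, size):
            total = sum(
                dist_from_coefficients(candidate, q, cluster, alpha, beta)
                for q in range(len(cluster))
            )
            if total < best_total - 1e-12:  # keep the first best candidate
                best, best_total = set(candidate), total
    return best, best_total
```

For a cluster made of {a, b, c, d}, {a, b, c} and {a, b, c}, this search returns {a, b, c} with a distance sum of 0.25.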
Step 3 includes: outputting Mode_k as the hotspot path of the k-th cluster C_k.
The distance threshold ε is used to compare the similarity between path point sets. Since the Jaccard distance between any two path point sets lies in the interval [0,1], the distance threshold ε also takes its value in the interval [0,1].
Since the Jaccard distance between two path point sets P_s and P_t with |P_s| ≥ |P_t| satisfies the bound
JaccardDist(P_s, P_t) ≥ 1 − |P_t|/|P_s|,
satisfying JaccardDist(P_s, P_t) ≤ ε requires
|P_t|/|P_s| ≥ 1 − ε.
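The size-ratio condition works as a cheap filter: the intersection can contain at most as many points as the smaller set and the union at least as many as the larger one, so the Jaccard distance can never fall below one minus the size ratio. A minimal Python sketch of that filter follows; the helper name `can_skip` is hypothetical and both sets are assumed to be non-empty.

```python
def jaccard_dist(p, q):
    """Jaccard distance 1 - |p ∩ q| / |p ∪ q| (both sets assumed non-empty)."""
    return 1.0 - len(p & q) / len(p | q)


def can_skip(size_s, size_t, eps):
    """True when the set sizes alone prove that the Jaccard distance exceeds eps."""
    small, large = sorted((size_s, size_t))
    return small / large < 1.0 - eps


# With eps = 0.3, a 4-point set and a 2-point set can never be within eps,
# while a 4-point set and a 3-point set still have to be compared.
print(can_skip(4, 2, 0.3))  # True
print(can_skip(4, 3, 0.3))  # False
```

Skipping a pair this way avoids computing its intersection and union at all.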
Traditional density clustering methods apply only to vector data of fixed dimension and not to path point set data of variable length. The present invention introduces the "core path set" together with the concepts of "direct density reachability", "indirect density reachability" and "density connection" defined specifically for path point sets, thereby extending traditional density clustering, previously applicable only to fixed-dimension vectors, to path point set data of variable length. The invention also proposes a hotspot path mining method based on intersection and union coefficients, which significantly improves the efficiency of hotspot path analysis.
Beneficial effects: (1) a similarity comparison method for sets of target path points is proposed; (2) the choice of the density threshold MinPts is flexible and robust; (3) the computational cost is low and the method lends itself to engineering implementation. The invention uses an analysis and mining method based on path point sets, which simplifies the handling of path point order, facilitates aggregating measurement data that share the same path points, and can greatly reduce computational cost and improve computational efficiency.
Description of the drawings
The present invention is described in further detail below with reference to the drawings and specific embodiments, from which the above and other advantages of the invention will become clearer.
Figure 1 is a flowchart of the present invention.
Detailed description
The present invention is further described below with reference to the drawings and embodiments.
The present invention characterizes each target path as a path point set composed of several path points, constructs a similarity distance matrix, and compares the similarity between each pair of path point sets. Based on the similarity distance matrix, the distance threshold ε and the density threshold MinPts, it uses density clustering to iteratively compute the clusters of path point sets, and finally outputs the path set mode of each cluster as a target hotspot path.
As shown in Figure 1, the method of the present invention specifically includes the following steps:
Assume that n path point sets corresponding to n target paths have been collected, where each path point set corresponds to one target path and each element of a path point set is one path point of the corresponding target path. The Jaccard distance between two path point sets P_i and P_j is then defined as:
JaccardDist(P_i, P_j) = 1 − |P_i ∩ P_j| / |P_i ∪ P_j|     (1)
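As a small illustration, the Jaccard distance between two path point sets can be sketched in Python as below. The function name `jaccard_dist` and the handling of two empty sets are assumptions made for this sketch; the source does not discuss empty path point sets.

```python
def jaccard_dist(p, q):
    """Jaccard distance between two path point sets: 1 - |p ∩ q| / |p ∪ q|."""
    p, q = set(p), set(q)
    union = p | q
    if not union:
        return 0.0  # assumption: two empty sets are treated as identical
    return 1.0 - len(p & q) / len(union)


# Order and repetition of path points do not matter, only membership does.
print(jaccard_dist(["a", "b", "c", "d"], ["c", "b", "a"]))  # 0.25
```

Because the distance depends only on set membership, two paths that visit the same points in a different order have distance 0.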
(1) Sort the path point sets: sort the n path point sets first by set size in descending order and then by index in ascending order, and denote them P_1, P_2, …, P_n, so that |P_1| ≥ |P_2| ≥ … ≥ |P_n|;
(2) Initialize the similarity distance matrix: set the distance threshold ε, whose value satisfies 0 < ε < 1, and initialize the similarity distance matrix DistArray as empty. The matrix size is n×n, that is, it has n rows and n columns. Because the similarity distance matrix is symmetric about the main diagonal, only the upper triangular part is kept;
(3) Set the current set index: set the current path point set index s = 1;
(4) Set the index of the set to be compared: set the index of the path point set to be compared t = s + 1;
(5) Check the index of the set to be compared: if the condition "t ≤ n and |P_t|/|P_s| ≥ 1 − ε" is not satisfied, continue to step (6); if it is satisfied, continue to step (8);
(6) Update the current set index: s = s + 1;
(7) Check the current set index: if s ≥ n, continue to step (10); otherwise, return to step (4);
(8) Compute the similarity distance: compute the Jaccard distance between the two path point sets corresponding to the current set index and the index of the set to be compared. If JaccardDist(P_s, P_t) ≤ ε, update the corresponding cell of the similarity distance matrix:
DistArray[s,t] = JaccardDist(P_s, P_t);     (2)
(9) Update the index of the set to be compared: t = t + 1, return to step (5);
(10) Compute the neighborhood sizes: given any path point set P, the ε-neighborhood of P is defined as all other path point sets whose similarity distance to P is within the distance threshold ε, denoted N_ε(P):
N_ε(P) = {Q | JaccardDist(P,Q) ≤ ε && Q ≠ P}     (3), and the neighborhood size |N_ε(P_i)| of each path point set P_i is computed;
(11) Construct the core path sets: set the density threshold MinPts, and define the path point sets whose ε-neighborhood size is not less than MinPts as core path sets, that is, any core path set CoreP satisfies:
|N_ε(CoreP)| ≥ MinPts     (4);
(12) Density-based iterative aggregation: take each core path set as an initial cluster. Given the distance threshold ε and the density threshold MinPts, if two core path sets CoreP and CoreQ satisfy:
CoreQ ∈ N_ε(CoreP)     (5),
then the core path set CoreQ is said to be "directly density-reachable" from the core path set CoreP, denoted CoreP < CoreQ. If there exists a chain of core path sets of non-zero length such that the core path set CoreQ and the core path set CoreP satisfy:
(a) CoreP < CoreP_1 < CoreP_2 < … < CoreP_n < CoreQ, and
(b) n ≥ 1     (6),
then the core path set CoreQ is said to be "indirectly density-reachable" from the core path set CoreP, denoted CoreP <_I CoreQ. In addition, if there exists a core path set CoreO such that the core path sets CoreP and CoreQ are each directly or indirectly density-reachable from CoreO, that is,
(a) CoreO <_I CoreP or CoreO < CoreP, and
(b) CoreO <_I CoreQ or CoreO < CoreQ,
then the core path sets CoreP and CoreQ are said to be "density-connected". Then, according to the distance threshold ε and the density threshold MinPts, iterative aggregation is performed by density clustering. The number of clusters generated after aggregating the core path sets that are directly density-reachable, indirectly density-reachable or density-connected is denoted u;
(13) Compute the path set mode: for each cluster C_k of the u clusters C_1, C_2, …, C_u, where C_k contains k′ core path sets, C_k = {CoreP_1, CoreP_2, …, CoreP_k′}, compute the path set mode Mode_k of cluster C_k:
Mode_k = argmin_P Σ_{1≤q≤k′} JaccardDist(P, CoreP_q)     (8),
where 1 ≤ k ≤ u, C_k denotes the k-th cluster and CoreP_j denotes the j-th core path set. Mode_k is output as the hotspot path of cluster C_k.
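Steps (1) to (12) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: indices are 0-based, the upper triangle of DistArray is stored as a dictionary, every path point set is assumed non-empty, and a cluster is taken to be a connected group of core path sets, which is how the reachability and connection relations are read here. All function names are hypothetical.

```python
def jaccard_dist(p, q):
    """Jaccard distance 1 - |p ∩ q| / |p ∪ q| (both sets assumed non-empty)."""
    return 1.0 - len(p & q) / len(p | q)


def build_dist_array(path_sets, eps):
    """Steps (1)-(9): sort by size, then fill a sparse upper triangle."""
    # Step (1): larger sets first, ties broken by original index.
    order = sorted(range(len(path_sets)), key=lambda i: (-len(path_sets[i]), i))
    sets = [path_sets[i] for i in order]
    dist = {}  # (s, t) with s < t -> distance, stored only when <= eps
    for s in range(len(sets) - 1):
        for t in range(s + 1, len(sets)):
            # Step (5): sizes are non-increasing, so once the ratio test
            # fails for t it also fails for every later t.
            if len(sets[t]) / len(sets[s]) < 1.0 - eps:
                break
            d = jaccard_dist(sets[s], sets[t])  # step (8)
            if d <= eps:
                dist[(s, t)] = d
    return sets, dist


def core_indices(n, dist, min_pts):
    """Steps (10)-(11): neighborhood of each set, then the core path sets."""
    neighbors = {i: set() for i in range(n)}
    for s, t in dist:
        neighbors[s].add(t)
        neighbors[t].add(s)
    core = [i for i in range(n) if len(neighbors[i]) >= min_pts]
    return neighbors, core


def cluster_core_sets(neighbors, core):
    """Step (12): grow each cluster from a seed through core neighbors."""
    core_set, seen, clusters = set(core), set(), []
    for seed in core:
        if seed in seen:
            continue
        seen.add(seed)
        stack, members = [seed], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in neighbors[i] & core_set:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters


if __name__ == "__main__":
    paths = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "c"},
             {"e", "f"}, {"e", "f"}]
    sets, dist = build_dist_array(paths, eps=0.3)
    neighbors, core = core_indices(len(sets), dist, min_pts=1)
    print(cluster_core_sets(neighbors, core))  # [[0, 1, 2], [3, 4]]
```

The early `break` in the inner loop is what the descending sort in step (1) buys: no further set in that row can pass the size-ratio test, so the rest of the row is skipped.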
The method of the present invention improves target path analysis when target position measurements are imprecise, helps reduce the redundancy of target position measurements, increases the flexibility of spatial granularity, and so better accomplishes the target path analysis task. The hotspot path analysis method based on density clustering is illustrated below with an example.
In this embodiment, for road traffic management in a certain city, n = 5 high-frequency target paths are collected from taxi trajectory information, corresponding to 5 path point sets, where each element of a path point set corresponds to one path point on that path. The distance threshold ε is set to 0.3 and the density threshold MinPts is set to 1. The hotspot path analysis based on density clustering then proceeds as follows:
Step 1, sort the path point sets: sort first by path point set size in descending order and then by index in ascending order, giving P_1, P_2, P_3, P_4, P_5, as shown in Table 1:
Table 1
| Path index | Path point set | Set size |
| --- | --- | --- |
| 1 | P_1 = {a,b,c,d} | 4 |
| 2 | P_2 = {a,b,c} | 3 |
| 3 | P_3 = {a,b,c} | 3 |
| 4 | P_4 = {e,f} | 2 |
| 5 | P_5 = {e,f} | 2 |
Step 2, initialize the similarity distance matrix: the distance threshold ε is 0.3, and the similarity distance matrix DistArray is initialized as empty with size 5×5. Because the similarity distance matrix is symmetric about the main diagonal, only the upper triangular part is kept, as shown in Table 2:
Table 2
| Path point set | P_1 | P_2 | P_3 | P_4 | P_5 |
| --- | --- | --- | --- | --- | --- |
| P_1 | -- | -- | -- | -- | -- |
| P_2 | -- | -- | -- | -- | -- |
| P_3 | -- | -- | -- | -- | -- |
| P_4 | -- | -- | -- | -- | -- |
| P_5 | -- | -- | -- | -- | -- |
Step 3, set the current set index: set the current path point set index s = 1;
Step 4, set the index of the set to be compared: t = s + 1 = 2;
Step 5, check the index of the set to be compared: "t ≤ n and |P_t|/|P_s| = 0.75 > 1 − ε = 0.7" is satisfied, so continue to step 8;
Step 8, compute the similarity distance: the Jaccard distance between the path point sets P_1 and P_2 is 0.25, which is less than the distance threshold ε = 0.3, so the similarity distance matrix DistArray is updated as shown in Table 3:
Table 3
| Path point set | P_1 | P_2 | P_3 | P_4 | P_5 |
| --- | --- | --- | --- | --- | --- |
| P_1 | -- | 0.25 | -- | -- | -- |
| P_2 | -- | -- | -- | -- | -- |
| P_3 | -- | -- | -- | -- | -- |
| P_4 | -- | -- | -- | -- | -- |
| P_5 | -- | -- | -- | -- | -- |
Step 9, update the index of the set to be compared: t = t + 1 = 3, return to step 5;
Step 5, check the index of the set to be compared: "t ≤ n and |P_t|/|P_s| = 0.75 > 1 − ε" is satisfied, so continue to step 8;
Step 8, compute the similarity distance: compute the Jaccard distance between the path point sets P_1 and P_3 and update the similarity distance matrix DistArray, as shown in Table 4:
Table 4
| Path point set | P_1 | P_2 | P_3 | P_4 | P_5 |
| --- | --- | --- | --- | --- | --- |
| P_1 | -- | 0.25 | 0.25 | -- | -- |
| P_2 | -- | -- | -- | -- | -- |
| P_3 | -- | -- | -- | -- | -- |
| P_4 | -- | -- | -- | -- | -- |
| P_5 | -- | -- | -- | -- | -- |
Step 9, update the index of the set to be compared: t = t + 1 = 4, return to step 5;
Step 5, check the index of the set to be compared: "|P_t|/|P_s| = 0.5 ≥ 1 − ε" is not satisfied, so continue to step 6;
Step 6, update the current set index: s = s + 1 = 2;
Step 7, check the current set index: s < n, so return to step 4;
Step 4, set the index of the set to be compared: t = s + 1 = 3;
Step 5, check the index of the set to be compared: t = 3 satisfies "t ≤ n and |P_t|/|P_s| = 1 ≥ 1 − ε", so continue to step 8;
Step 8, compute the similarity distance: compute the Jaccard distance between the path point sets P_2 and P_3 and update the similarity distance matrix DistArray, as shown in Table 5:
Table 5
| Path point set | P_1 | P_2 | P_3 | P_4 | P_5 |
| --- | --- | --- | --- | --- | --- |
| P_1 | -- | 0.25 | 0.25 | -- | -- |
| P_2 | -- | -- | 0.00 | -- | -- |
| P_3 | -- | -- | -- | -- | -- |
| P_4 | -- | -- | -- | -- | -- |
| P_5 | -- | -- | -- | -- | -- |
Step 9, update the index of the set to be compared: t = t + 1 = 4, return to step 5;
Step 5, check the index of the set to be compared: t = 4 does not satisfy "|P_t|/|P_s| = 0.667 ≥ 1 − ε", so continue to step 6;
Step 6, update the current set index: s = s + 1 = 3;
Step 7, check the current set index: s < n, so return to step 4;
Step 4, set the index of the set to be compared: t = s + 1 = 4;
Step 5, check the index of the set to be compared: t = 4 does not satisfy "|P_t|/|P_s| ≥ 1 − ε", so continue to step 6;
Step 6, update the current set index: s = s + 1 = 4;
Step 7, check the current set index: s < n, so return to step 4;
Step 4, set the index of the set to be compared: t = s + 1 = 5;
Step 5, check the index of the set to be compared: "t = 5 ≤ n and |P_t|/|P_s| = 1 ≥ 1 − ε" is satisfied, so continue to step 8;
Step 8, compute the similarity distance: the Jaccard distance between the path point sets P_4 and P_5 is zero, which satisfies JaccardDist(P_4, P_5) ≤ 0.3, so the similarity distance matrix DistArray is updated as shown in Table 6:
Table 6
| Path point set | P_1 | P_2 | P_3 | P_4 | P_5 |
| --- | --- | --- | --- | --- | --- |
| P_1 | -- | 0.25 | 0.25 | -- | -- |
| P_2 | -- | -- | 0.00 | -- | -- |
| P_3 | -- | -- | -- | -- | -- |
| P_4 | -- | -- | -- | -- | 0.00 |
| P_5 | -- | -- | -- | -- | -- |
Step 9, update the index of the set to be compared: t = t + 1 = 6, return to step 5;
Step 5, check the index of the set to be compared: t = 6 does not satisfy "t ≤ n", so continue to step 6;
Step 6, update the current set index: s = s + 1 = 5;
Step 7, check the current set index: s = n, so continue to step 10;
Step 10, compute the neighborhood sizes: compute the ε-neighborhood size |N_ε(P_i)| of each path point set P_i, as shown in Table 7:
Table 7
| i | Path point set | \|N_ε(P_i)\| |
| --- | --- | --- |
| 1 | P_1 = {a,b,c,d} | 2 |
| 2 | P_2 = {a,b,c} | 2 |
| 3 | P_3 = {a,b,c} | 2 |
| 4 | P_4 = {e,f} | 1 |
| 5 | P_5 = {e,f} | 1 |
Step 11, construct the core path sets: the path point sets whose ε-neighborhood size is not less than MinPts are taken as core path sets. MinPts is a natural number greater than or equal to 1 and less than n, and in general can take the value
Figure PCTCN2019086517-appb-000041
P_1, P_2, P_3, P_4 and P_5 are all core path sets;
Step 12, density-based iterative aggregation: there are 5 initial clusters, namely {P_1}, {P_2}, {P_3}, {P_4} and {P_5}. After iterative aggregation, u = 2 clusters are finally generated: C_1 = {P_1, P_2, P_3} and C_2 = {P_4, P_5}. In cluster C_1, P_1, P_2 and P_3 are pairwise directly density-reachable, and in cluster C_2, P_4 and P_5 are also directly density-reachable;
Step 13, compute the path set modes: for each cluster, construct the core set consisting of all of its core path sets, C_1 = {P_1, P_2, P_3} and C_2 = {P_4, P_5}, and compute the respective modes as Mode_1 = {a,b,c} and Mode_2 = {e,f}. Taking Mode_1 as an example, its intersection and union coefficients are shown in Table 8:
Table 8
Figure PCTCN2019086517-appb-000042
The corresponding minimum sum of Jaccard distances is:
Figure PCTCN2019086517-appb-000043
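The mode values of the embodiment can be checked directly against the Jaccard distance definition, without the intersection and union coefficients. The short Python sketch below compares the distance sum of the reported mode with that of one alternative candidate; the helper name `total_distance` is hypothetical.

```python
def jaccard_dist(p, q):
    """Jaccard distance 1 - |p ∩ q| / |p ∪ q| (both sets assumed non-empty)."""
    return 1.0 - len(p & q) / len(p | q)


def total_distance(candidate, cluster):
    """Sum of Jaccard distances from `candidate` to every core path set."""
    return sum(jaccard_dist(candidate, core) for core in cluster)


c1 = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "c"}]
c2 = [{"e", "f"}, {"e", "f"}]

print(total_distance({"a", "b", "c"}, c1))       # 0.25
print(total_distance({"a", "b", "c", "d"}, c1))  # 0.5
print(total_distance({"e", "f"}, c2))            # 0.0
```

The candidate {a,b,c} is at distance 0.25 from P_1 and 0 from P_2 and P_3, whereas {a,b,c,d} is at distance 0.25 from both P_2 and P_3, which is why the former is the mode of C_1.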
In urban road traffic management, the mined hotspot paths {a,b,c} and {e,f} can then be used to strengthen the management of the corresponding roads and traffic lights, so as to keep roads clear and control traffic flow. The results of the present invention help improve target path analysis when target position measurements are imprecise, reduce the redundancy of target position measurements, increase the flexibility of spatial granularity, and better accomplish the target path analysis task.
The research work of the present invention was funded by the National Natural Science Foundation of China (No. 61771177).
The present invention provides a hotspot path analysis method based on density clustering. There are many methods and ways to implement this technical solution, and the above is only a preferred embodiment of the invention. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principle of the invention, and these improvements and refinements should also be regarded as falling within the scope of protection of the invention. Any component not specified in this embodiment can be implemented with existing technology.

Claims (8)

  1. A hotspot path analysis method based on density clustering, characterized in that it comprises the following steps:
    Step 1, characterizing each target path as a path point set composed of several path points, and constructing a similarity distance matrix;
    Step 2, comparing the similarity between each pair of path point sets, mining core path sets from the path point sets based on the similarity distance matrix, the distance threshold ε and the density threshold MinPts, and then, according to the direct density reachability relation defined for core path sets, using density clustering to iteratively generate clusters aggregated from core path sets;
    Step 3, outputting the path point set mode of each cluster as a target hotspot path.
  2. The method according to claim 1, characterized in that step 1 comprises:
    Step 1-1, assuming that n path point sets corresponding to n target paths have been collected, where each path point set corresponds to one target path and each element of a path point set is one path point of the corresponding target path, defining the Jaccard distance JaccardDist(P_i, P_j) between the i-th path point set P_i and the j-th path point set P_j as:
    JaccardDist(P_i, P_j) = 1 − |P_i ∩ P_j| / |P_i ∪ P_j|
    Step 1-2, sorting the path point sets: sorting the n path point sets first by set size in descending order and then by index in ascending order, denoted P_1, P_2, …, P_n, so that |P_1| ≥ |P_2| ≥ … ≥ |P_n|;
    Step 1-3, initializing the similarity distance matrix: setting the distance threshold ε and initializing the similarity distance matrix DistArray as empty, with matrix size n×n, that is, the matrix has n rows and n columns.
  3. The method according to claim 2, characterized in that, in step 1-3, the distance threshold ε takes the mean of the nearest-neighbor distances of all path point sets, namely:
    Figure PCTCN2019086517-appb-100002
  4. The method according to claim 3, characterized in that step 2 comprises:
    Step 2-1, setting the current set index: setting the current path point set index s = 1;
    Step 2-2, setting the index of the set to be compared: setting the index of the path point set to be compared t = s + 1;
    Step 2-3, checking the index of the set to be compared: if the condition "t ≤ n and |P_t|/|P_s| ≥ 1 − ε" is not satisfied, continuing to step 2-4, and if it is satisfied, executing step 2-6;
    Step 2-4, updating the current set index: s = s + 1;
    Step 2-5, checking the current set index: if s ≥ n, continuing to step 2-8, otherwise returning to step 2-2;
    Step 2-6, computing the similarity distance: computing the Jaccard distance JaccardDist(P_s, P_t) between the two path point sets corresponding to the current set index and the index of the set to be compared, and, if JaccardDist(P_s, P_t) ≤ ε, updating the corresponding cell of the similarity distance matrix:
    DistArray[s,t] = JaccardDist(P_s, P_t)     (3),
    where DistArray[s,t] denotes the value in row s and column t of the similarity distance matrix DistArray;
    Step 2-7, updating the index of the set to be compared: t = t + 1, returning to step 2-3;
    Step 2-8, computing the neighborhood sizes: given any path point set P, defining all other path point sets whose similarity distance to P is within the distance threshold ε as the ε-neighborhood of P, denoted N_ε(P):
    N_ε(P) = {Q | JaccardDist(P,Q) ≤ ε && Q ≠ P}     (4),
    where Q denotes any path point set, and computing the ε-neighborhood size of each path point set P_i according to formula (4), denoted |N_ε(P_i)|;
    Step 2-9, constructing the core path sets: setting the density threshold MinPts and defining the path point sets whose ε-neighborhood size is not less than MinPts as core path sets, that is, any core path set CoreP satisfies:
    |N_ε(CoreP)| ≥ MinPts     (5);
    步骤2-10,基于密度的迭代聚合:分别以各核心路径集作为初始簇,给定距离门限ε与密度门限MinPts,如果两核心路径集CoreP与CoreQ满足:Step 2-10, density-based iterative aggregation: each core path set is used as the initial cluster, and the distance threshold ε and the density threshold MinPts are given. If the two core path sets CoreP and CoreQ satisfy:
    CoreQ∈N ε(CoreP)             (6), CoreQ∈N ε (CoreP) (6),
    则称核心路径集CoreQ从核心路径集CoreP直接密度可达的,表示为:It is said that the core path set CoreQ is directly accessible from the core path set CoreP, which is expressed as:
    CoreP<CoreQ;CoreP<CoreQ;
    如果存在一个长度非零的核心路径集链,使得核心路径集CoreQ与核心路径集CoreP满足如下条件(a)和(b):If there is a core path set chain with a non-zero length, the core path set CoreQ and the core path set CoreP satisfy the following conditions (a) and (b):
    (a)CoreP<CoreP 1<CoreP 2<……<CoreP n<CoreQ,且 (a) CoreP<CoreP 1 <CoreP 2 <……<CoreP n <CoreQ, and
    (b)n≥1          (7),(b) n≥1 (7),
    则称核心路径集CoreQ是从核心路径集CoreP间接密度可达的,表示为:It is said that the core path set CoreQ is indirectly density accessible from the core path set CoreP, expressed as:
    CoreP< ICoreQ; CoreP< I CoreQ;
    如果存在一核心路径集CoreO,使得核心路径集CoreP与CoreQ分别从核心路径 集CoreO直接或间接密度可达,即满足如下条件(c)和(d):If there is a core path set CoreO, the core path sets CoreP and CoreQ can be directly or indirectly accessible from the core path set CoreO, that is, the following conditions (c) and (d) are satisfied:
    (c)CoreO< I CoreP或者CoreO<CoreP,且 (c) CoreO< I CoreP or CoreO<CoreP, and
    (d)CoreO< I CoreQ或者CoreO<CoreQ       (8) (d) CoreO< I CoreQ or CoreO<CoreQ (8)
    则称核心路径集CoreP与CoreQ是密度相连的;It is said that the core path set CoreP and CoreQ are densely connected;
    继而,根据距离门限ε与密度门限MinPts,基于密度聚类进行迭代式聚合,聚合直接密度可达、间接密度可达与密度相连的核心路径集后生成的簇数目记为u;Then, according to the distance threshold ε and the density threshold MinPts, iterative aggregation is performed based on density clustering, and the number of clusters generated after aggregating the core path sets whose direct density can reach and indirect density can reach the density is recorded as u;
    步骤2-11,计算路径集众数:分别针对u个簇C 1,C 2,……,C u中的各个簇C k,C k包含k’个核心路径集:C k={CoreP 1,CoreP 2,……,CoreP k’},CoreP k’表示第k’个核心路径集,计算簇C k的路径集众数Mode k,其中1≤k≤u,C k表示第k个簇。 Step 2-11, calculate the path set mode: for each cluster C k in u clusters C 1 , C 2 , ..., C u , C k contains k'core path sets: C k = {CoreP 1 , CoreP 2, ......, CoreP k '}, CoreP k'denotes' core set of path k, calculates the number of clusters set of all paths Mode C k k, wherein 1≤k≤u, C k denotes the k-th cluster .
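    The core path set test of step 2-9 can be illustrated with a minimal Python sketch. Formula (4) is not reproduced in this section, so the sketch assumes that the ε-neighborhood of a path point set P is the collection of path point sets whose Jaccard distance to P is at most ε, and that P counts as a member of its own neighborhood (the usual density-clustering convention). The function names and the sample data are hypothetical.

```python
from typing import FrozenSet, List, Sequence

PathSet = FrozenSet[str]


def jaccard_dist(p: PathSet, q: PathSet) -> float:
    """Jaccard distance 1 - |P ∩ Q| / |P ∪ Q| between two path point sets."""
    union = p | q
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(p & q) / len(union)


def eps_neighborhood(i: int, path_sets: Sequence[PathSet], eps: float) -> List[int]:
    """Indices of path point sets within distance eps of path_sets[i].

    Assumption: the set itself is included in its own neighborhood.
    """
    return [j for j, q in enumerate(path_sets)
            if jaccard_dist(path_sets[i], q) <= eps]


def core_path_sets(path_sets: Sequence[PathSet], eps: float, min_pts: int) -> List[int]:
    """Indices of path point sets satisfying |N_eps(P)| >= MinPts, as in formula (5)."""
    return [i for i in range(len(path_sets))
            if len(eps_neighborhood(i, path_sets, eps)) >= min_pts]


if __name__ == "__main__":
    paths = [frozenset("abc"), frozenset("abcd"), frozenset("abd"),
             frozenset("xy"), frozenset("xyz"), frozenset("mn")]
    # The isolated path {m, n} has only itself in its neighborhood,
    # so it is not a core path set when MinPts = 2.
    print(core_path_sets(paths, eps=0.5, min_pts=2))
```

    With ε = 0.5 and MinPts = 2, the first five sample paths qualify as core path sets and the last one does not.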
  5. The method according to claim 4, wherein step 2-10 comprises:
    given the distance threshold ε and the density threshold MinPts, starting from any core path set CoreP, first aggregating all core path sets that are directly density-reachable from CoreP, and continuing until all core path sets have been processed, the specific process comprising:
    Step 2-10-1, determining whether any unprocessed core path set remains; if so, proceeding to step 2-10-2; if not, proceeding to step 2-10-3;
    Step 2-10-2, for any unprocessed core path set CoreP, aggregating all core path sets that are directly density-reachable from CoreP, and returning to step 2-10-1;
    Step 2-10-3, treating all core path sets that have been aggregated together as one cluster, outputting the clusters formed, and denoting the number of clusters as u.
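    One possible reading of steps 2-10-1 to 2-10-3 is a breadth-first expansion over the direct density-reachability relation of formula (6). The Python sketch below is an illustration under that assumption, not the claimed procedure itself; the function names are hypothetical, and the input is assumed to contain only path point sets that have already passed the core test of formula (5).

```python
from collections import deque
from typing import FrozenSet, List, Sequence

PathSet = FrozenSet[str]


def jaccard_dist(p: PathSet, q: PathSet) -> float:
    """Jaccard distance 1 - |P ∩ Q| / |P ∪ Q| between two path point sets."""
    union = p | q
    if not union:
        return 0.0
    return 1.0 - len(p & q) / len(union)


def cluster_core_sets(core_sets: Sequence[PathSet], eps: float) -> List[List[int]]:
    """Group core path sets into clusters by repeated direct density reachability."""
    n = len(core_sets)
    # reachable[i]: core sets lying in the eps-neighborhood of core set i (formula (6)).
    reachable = [[j for j in range(n)
                  if j != i and jaccard_dist(core_sets[i], core_sets[j]) <= eps]
                 for i in range(n)]
    processed = [False] * n
    clusters: List[List[int]] = []
    for start in range(n):            # step 2-10-1: any unprocessed core set left?
        if processed[start]:
            continue
        processed[start] = True
        cluster = [start]
        queue = deque([start])
        while queue:                  # step 2-10-2: aggregate what is reachable
            i = queue.popleft()
            for j in reachable[i]:
                if not processed[j]:
                    processed[j] = True
                    cluster.append(j)
                    queue.append(j)
        clusters.append(cluster)      # step 2-10-3: emit one finished cluster
    return clusters


if __name__ == "__main__":
    cores = [frozenset("abc"), frozenset("abcd"), frozenset("abd"),
             frozenset("xy"), frozenset("xyz")]
    clusters = cluster_core_sets(cores, eps=0.5)
    print(clusters, "u =", len(clusters))
```

    Because the Jaccard distance is symmetric, direct reachability between core path sets is symmetric as well, so each cluster produced here is a connected component of that relation and also covers the indirectly density-reachable and density-connected cases.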
  6. The method according to claim 5, wherein in step 2-11 the path set mode Mode_k of cluster C_k is calculated according to the following formula:
    Mode_k = argmin_P Σ_{1≤q≤k'} JaccardDist(P, CoreP_q)   (9),
    where P denotes a path point set, CoreP_q denotes the q-th core path set in cluster C_k, and the path set mode Mode_k is the path point set for which the sum of Jaccard distances to all core path sets in cluster C_k is minimal.
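    Formula (9) can be checked on small inputs with an exhaustive search. The Python sketch below assumes that the candidate P ranges over the non-empty subsets of the union of the cluster's core path sets; a path point outside that union can only enlarge every union term without adding to any intersection, so it cannot lower the sum. The search is exponential in the number of distinct path points and is meant only as an illustration; it does not reproduce the coefficient-based computation of step 2-11. The function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Optional, Sequence, Tuple

PathSet = FrozenSet[str]


def jaccard_dist(p: PathSet, q: PathSet) -> float:
    """Jaccard distance 1 - |P ∩ Q| / |P ∪ Q| between two path point sets."""
    union = p | q
    if not union:
        return 0.0
    return 1.0 - len(p & q) / len(union)


def path_set_mode(cluster: Sequence[PathSet]) -> Tuple[Optional[PathSet], float]:
    """Brute-force argmin over P of the summed Jaccard distance to the cluster."""
    # All distinct path points that occur in the cluster's core path sets.
    dictionary = sorted(set().union(*cluster))
    best: Optional[PathSet] = None
    best_cost = float("inf")
    for size in range(1, len(dictionary) + 1):
        for combo in combinations(dictionary, size):
            candidate = frozenset(combo)
            cost = sum(jaccard_dist(candidate, core) for core in cluster)
            if cost < best_cost:      # keep the first minimizer found
                best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    cluster = [frozenset("abc"), frozenset("abcd"), frozenset("abd")]
    mode, cost = path_set_mode(cluster)
    print(sorted(mode), cost)
```

    For the sample cluster the minimizer is {a, b, c, d}, with a summed distance of 0.25 + 0 + 0.25 = 0.5.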
  7. The method according to claim 6, wherein step 2-11 comprises:
    Step 2-11-1, calculating the intersection coefficients and union coefficients: given a cluster C_k containing k' core path sets, C_k = {CoreP_1, CoreP_2, ..., CoreP_k'}, first calculating the path point dictionary Ω_k of cluster C_k:
    Ω_k = ∪_{1≤q≤k'} CoreP_q,
    that is, the path point dictionary is the union of all core path sets in cluster C_k; then, for each path point p_r in the path point dictionary, calculating the intersection coefficient α_rq and the union coefficient β_rq of path point p_r with respect to each core path set CoreP_q of cluster C_k, as shown in the following formulas:
    Figure PCTCN2019086517-appb-100003
    Figure PCTCN2019086517-appb-100004
    Step 2-11-2, calculating the Jaccard distance between a path point set and each core path set from the intersection and union coefficients: the Jaccard distance between the path point set P = {p_r} and each core path set CoreP_q simplifies to:
    Figure PCTCN2019086517-appb-100005
    Step 2-11-3, calculating the mode of the path point sets from the intersection and union coefficients:
    Figure PCTCN2019086517-appb-100006
  8. The method according to claim 7, wherein step 3 comprises: outputting Mode_k as the path hotspot of the k-th cluster C_k.
PCT/CN2019/086517 2019-03-26 2019-05-13 Hotspot path analysis method based on density clustering WO2020191876A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2020545145A JP6912672B2 (en) 2019-03-26 2019-05-13 Hot route analysis method based on density clustering

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910231648.2A CN110135450B (en) 2019-03-26 2019-03-26 Hot spot path analysis method based on density clustering
CN201910231648.2 2019-03-26

Publications (1)

Publication Number Publication Date
WO2020191876A1 (en) 2020-10-01

Family

ID=67568587

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/086517 WO2020191876A1 (en) 2019-03-26 2019-05-13 Hotspot path analysis method based on density clustering

Country Status (3)

Country Link
JP (1) JP6912672B2 (en)
CN (1) CN110135450B (en)
WO (1) WO2020191876A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990537B (en) * 2019-12-11 2023-06-27 中山大学 Sentence similarity calculation method based on edge information and semantic information
CN113627702B (en) * 2020-05-08 2023-07-25 中国移动通信集团浙江有限公司 Service path analysis method and device and computing equipment
CN111915631A (en) * 2020-06-18 2020-11-10 湖南农业大学 Agricultural machinery working area calculation method based on path point analysis
CN111968365B (en) * 2020-07-24 2022-02-15 武汉理工大学 Non-signalized intersection vehicle behavior analysis method and system and storage medium
CN112116806B (en) * 2020-08-12 2021-08-24 深圳技术大学 Traffic flow characteristic extraction method and system
CN112382398B (en) * 2020-11-12 2022-08-30 平安科技(深圳)有限公司 Multi-scale clinical path mining method and device, computer equipment and storage medium
CN113011472B (en) * 2021-02-26 2023-09-01 广东电网有限责任公司电力调度控制中心 Multi-section electric power quotation curve similarity judging method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150127323A1 (en) * 2013-11-04 2015-05-07 Xerox Corporation Refining inference rules with temporal event clustering
CN105091889A (en) * 2014-04-23 2015-11-25 华为技术有限公司 Hotspot path determination method and hotspot path determination equipment
CN105930862A (en) * 2016-04-13 2016-09-07 江南大学 Density peak clustering algorithm based on density adaptive distance
CN106153031A (en) * 2015-04-13 2016-11-23 骑记(厦门)科技有限公司 Movement locus method for expressing and device
CN108427965A (en) * 2018-03-05 2018-08-21 重庆邮电大学 A kind of hot spot region method for digging based on road network cluster

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095281B (en) * 2014-05-13 2018-12-25 南京理工大学 A kind of web catalogue method for optimization analysis based on Web log mining
US9984310B2 (en) * 2015-01-23 2018-05-29 Highspot, Inc. Systems and methods for identifying semantically and visually related content
US10176198B1 (en) * 2016-05-09 2019-01-08 A9.Com, Inc. Techniques for identifying visually similar content
CN106909805B (en) * 2017-03-01 2019-04-02 广西大学 The method for rebuilding species phylogenetic tree is compared based on a plurality of metabolic pathway
US10909369B2 (en) * 2017-07-14 2021-02-02 Mitsubishi Electric Research Laboratories, Inc Imaging system and method for object detection and localization
CN108345864B (en) * 2018-03-06 2020-09-08 中国电子科技集团公司第二十八研究所 Random set type radar radiation source signal parameter high-frequency mode mining method based on weighted clustering
US10176405B1 (en) * 2018-06-18 2019-01-08 Inception Institute Of Artificial Intelligence Vehicle re-identification techniques using neural networks for image analysis, viewpoint-aware pattern recognition, and generation of multi- view vehicle representations

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112749743A (en) * 2021-01-04 2021-05-04 清华大学 Track space-time clustering method, system and storage device
CN112749743B (en) * 2021-01-04 2023-07-21 清华大学 Track space-time clustering method, system and storage device
CN117633563A (en) * 2024-01-24 2024-03-01 中国电子科技集团公司第十四研究所 Multi-target top-down hierarchical grouping method based on OPTICS algorithm
CN117633563B (en) * 2024-01-24 2024-05-10 中国电子科技集团公司第十四研究所 Multi-target top-down hierarchical grouping method based on OPTICS algorithm

Also Published As

Publication number Publication date
CN110135450B (en) 2020-06-23
JP2021514090A (en) 2021-06-03
CN110135450A (en) 2019-08-16
JP6912672B2 (en) 2021-08-04

Legal Events

Date Code Title Description
ENP Entry into the national phase. Ref document number: 2020545145; Country of ref document: JP; Kind code of ref document: A
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 19922103; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 19922103; Country of ref document: EP; Kind code of ref document: A1