WO2020191876A1

WO2020191876A1 - Hotspot path analysis method based on density clustering

Info

Publication number: WO2020191876A1
Application number: PCT/CN2019/086517
Authority: WO
Inventors: 徐欣; 刁联旺; 易侃; 李青山
Original assignee: 中国电子科技集团公司第二十八研究所
Priority date: 2019-03-26
Filing date: 2019-05-13
Publication date: 2020-10-01
Also published as: CN110135450B; JP2021514090A; CN110135450A; JP6912672B2

Abstract

A hotspot path analysis method based on density clustering. The method comprises: representing each target path as a path point set constituted by several path points and constructing a similarity distance matrix; comparing the similarity between every two path point sets and, based on the similarity distance matrix, a distance threshold ε and a density threshold MinPts, using density clustering to iteratively compute clusters constituted by the path point sets; and finally outputting the path set mode of each cluster as a target hotspot path. The method has the advantages that (1) a similarity comparison method for target path point sets is provided; (2) the selection of the density threshold MinPts has a degree of flexibility and robustness; and (3) the calculation cost is low, and the method is suitable for engineering implementation.

Description

A hot path analysis method based on density clustering

Technical field

The invention relates to the field of target path analysis and mining, in particular to a hot path analysis method based on density clustering.

Background technique

The amount of measurement data related to target paths keeps increasing. Manual analysis and processing can hardly summarize target path patterns in a timely and accurate manner, and therefore cannot support highly real-time auxiliary decision-making. Traditional target path analysis and prediction technologies mostly focus on target location measurement data: they do not analyze based on critical path points, cannot focus on high-level path features or extract multi-granularity target path patterns, and have high computational costs.

Summary of the invention

Purpose of the invention: In view of the problems of the prior art, the present invention proposes a method for analyzing hotspot paths based on density clustering, which includes the following steps:

Step 1. Characterize each target path as a path point set composed of several path points, and construct a similarity distance matrix;

Step 2: Compare the similarity between each pair of path point sets. Based on the similarity distance matrix, the distance threshold ε, and the density threshold MinPts, mine the core path sets from the path point sets, and then, based on the "direct density reachability" relationship between core path sets, use density clustering to iteratively generate clusters aggregated from core path sets;

Step 3: Output the mode of the path point set of each cluster as the target hot path.

Compared with the similarity distance matrix in traditional density clustering, the rows and columns of the matrix in step 1 are no longer a vector of fixed dimensions, but a set of path points of non-fixed length. Step 1 includes:

Step 1-1, collect n path point sets corresponding to n target paths, where each path point set corresponds to one target path and each element of a path point set is a path point of the corresponding target path. Define the Jaccard distance JaccardDist(P _i ,P _j ) between the i-th path point set P _i and the j-th path point set P _j as:

$$\mathrm{JaccardDist}(P_i, P_j) = 1 - \frac{|P_i \cap P_j|}{|P_i \cup P_j|} \quad (1)$$

Step 1-2, sort the path point sets: first sort the n path point sets by set size from large to small, and then by index value from small to large, denoted as P ₁ , P ₂ , ..., P _n , satisfying |P ₁ |≥|P ₂ |≥…≥|P _n |;

Step 1-3, initialize the similarity distance matrix: set the distance threshold ε, whose value range is 0<ε<1. In general, it can be taken as the mean of the nearest-neighbor distances of the path point sets, namely:

$$\varepsilon = \frac{1}{n} \sum_{i=1}^{n} \min_{j \neq i} \mathrm{JaccardDist}(P_i, P_j) \quad (2)$$

The initial similarity distance matrix DistArray is empty, and its size is n×n, that is, it has n rows and n columns. Because the similarity distance matrix is symmetric about the diagonal, only the upper triangular part is retained.
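As an illustration of step 1, the following minimal Python sketch shows one way the set-based Jaccard distance (1 - |P∩Q|/|P∪Q|), the size-then-index sort, and the mean nearest-neighbor choice of ε could be computed. The function names (`jaccard_dist`, `sort_path_sets`, `mean_nearest_neighbor_eps`) and the use of `frozenset` for path point sets are assumptions made for this example and are not part of the described method; the sketch also assumes at least two non-empty path point sets.

```python
from typing import FrozenSet, List


def jaccard_dist(p: FrozenSet[str], q: FrozenSet[str]) -> float:
    """Jaccard distance between two non-empty path point sets."""
    return 1.0 - len(p & q) / len(p | q)


def sort_path_sets(path_sets: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Step 1-2: sort by set size (descending), then by original index (ascending)."""
    indexed = list(enumerate(path_sets))
    indexed.sort(key=lambda item: (-len(item[1]), item[0]))
    return [s for _, s in indexed]


def mean_nearest_neighbor_eps(path_sets: List[FrozenSet[str]]) -> float:
    """Step 1-3 default: mean, over all sets, of the distance to the nearest other set."""
    n = len(path_sets)
    nearest = [
        min(jaccard_dist(path_sets[i], path_sets[j]) for j in range(n) if j != i)
        for i in range(n)
    ]
    return sum(nearest) / n


# Hypothetical input, given in arbitrary order.
raw = [frozenset("ef"), frozenset("abc"), frozenset("abcd")]
ordered = sort_path_sets(raw)
eps = mean_nearest_neighbor_eps(ordered)
```

Note that the mean nearest-neighbor distance can fall outside the open interval 0<ε<1 in degenerate cases (for example, when every set has an identical duplicate), so a caller would presumably still need to check the range.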

Step 2 proposes a similarity comparison strategy based on the sizes of the path point sets and the distance threshold ε (step 2-3), which greatly reduces the calculation cost of the pairwise similarity comparison of path point sets. On the basis of this set-based similarity distance, the concepts of "ε neighborhood", "core path set", "direct density reachability", "indirect density reachability", and "density connection" are further defined for path point sets (steps 2-8, 2-9), so as to extend the traditional density clustering rules for fixed-dimensional vectors to set data. Step 2 includes:

Step 2-1, set the current collection index: set the current path point collection index s=1;

Step 2-2, set the index of the set to be compared: set the index of the path point set to be compared t=s+1;

Step 2-3, judge the index of the set to be compared: if the condition "t≤n and |P _t |/|P _s |≥1-ε" is not satisfied, continue to step 2-4; if it is satisfied, execute step 2-6;

Step 2-4, update the current collection index: update the current collection index value s=s+1;

Step 2-5, judge the current collection index: judge the current collection index, if s≥n, continue to step 2-8, otherwise, return to step 2-2;

Step 2-6, calculate the similarity distance: calculate the Jaccard distance JaccardDist(P _s ,P _t ) between the two path point sets corresponding to the current set index and the index of the set to be compared; if JaccardDist(P _s ,P _t )≤ε is satisfied, update the corresponding cell value in the similarity distance matrix:

DistArray[s,t]=JaccardDist(P _s ,P _t ) (3)

DistArray[s,t] represents the value of the sth row and tth column of the similarity distance matrix DistArray;

Step 2-7, update the index of the set to be compared: t=t+1, return to step 2-3;

Step 2-8, calculate the neighborhood size of each path point set: given any path point set P, define all other path point sets whose similarity distance to P is within the distance threshold ε as the ε neighborhood of P, denoted as N _ε (P):

N _ε (P)={Q|JaccardDist(P,Q)≤ε&&Q≠P} (4),

where Q represents an arbitrary path point set. The ε neighborhood size of each path point set P _i is calculated according to formula (4) and denoted as |N _ε (P _i )|;

Step 2-9, construct the core path sets: set the density threshold MinPts, whose value is a natural number greater than or equal to 1 and less than n, and define each path point set whose ε neighborhood size is not less than MinPts as a core path set. Generally, possible values are

That is, any core path set CoreP satisfies:

|N _ε (CoreP)|≥MinPts (5);

Step 2-10, density-based iterative aggregation: each core path set is used as the initial cluster, and the distance threshold ε and the density threshold MinPts are given. If the two core path sets CoreP and CoreQ satisfy:

CoreQ∈N _ε (CoreP) (6),

then the core path set CoreQ is said to be "directly density-reachable" from the core path set CoreP, which is expressed as:

$$\mathrm{CoreP} < \mathrm{CoreQ};$$

If there is a chain of core path sets of non-zero length such that the core path set CoreQ and the core path set CoreP satisfy the following conditions (a) and (b):

(a) $\mathrm{CoreP} < \mathrm{CoreP}_1 < \mathrm{CoreP}_2 < \dots < \mathrm{CoreP}_n < \mathrm{CoreQ}$, and

(b) n≥1 (7),

then the core path set CoreQ is said to be "indirectly density-reachable" from the core path set CoreP, expressed as:

$$\mathrm{CoreP} <_I \mathrm{CoreQ};$$

In addition, if there is a core path set CoreO from which both core path sets CoreP and CoreQ are directly or indirectly density-reachable, that is, the following conditions (c) and (d) are satisfied:

(c) $\mathrm{CoreO} <_I \mathrm{CoreP}$ or $\mathrm{CoreO} < \mathrm{CoreP}$, and

(d) $\mathrm{CoreO} <_I \mathrm{CoreQ}$ or $\mathrm{CoreO} < \mathrm{CoreQ}$ (8),

then the core path sets CoreP and CoreQ are said to be "density connected";

Then, according to the distance threshold ε and the density threshold MinPts, iterative aggregation is performed based on density clustering: core path sets that are directly or indirectly density-reachable from one another are aggregated into the same cluster, and the number of clusters generated is recorded as u;

Step 2-11, calculate the path set mode: for each cluster C _k of the u clusters C ₁ , C ₂ , ..., C _u , where C _k contains k' core path sets, C _k = {CoreP ₁ , CoreP ₂ , ..., CoreP _k' } and CoreP _k' denotes the k'-th core path set, calculate the path set mode Mode _k of cluster C _k , where 1≤k≤u and C _k denotes the k-th cluster.
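Steps 2-1 to 2-9 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: indices are 0-based rather than 1-based, the upper-triangular DistArray is stored as a dictionary keyed by (s, t), the ratio test is written as a multiplication to avoid dividing by a set size, and all function names are hypothetical. Input sets are assumed non-empty and already sorted by size in descending order.

```python
from typing import Dict, FrozenSet, List, Tuple


def jaccard_dist(p: FrozenSet[str], q: FrozenSet[str]) -> float:
    """Jaccard distance between two non-empty path point sets."""
    return 1.0 - len(p & q) / len(p | q)


def build_dist_array(sets: List[FrozenSet[str]], eps: float) -> Dict[Tuple[int, int], float]:
    """Steps 2-1 to 2-7: fill only the cells whose distance is within eps."""
    dist: Dict[Tuple[int, int], float] = {}
    n = len(sets)
    for s in range(n - 1):
        for t in range(s + 1, n):
            # Step 2-3: sizes are non-increasing, so once |P_t|/|P_s| < 1 - eps
            # the test also fails for every later t and s can be advanced.
            if len(sets[t]) < (1.0 - eps) * len(sets[s]):
                break
            d = jaccard_dist(sets[s], sets[t])  # step 2-6
            if d <= eps:
                dist[(s, t)] = d
    return dist


def neighborhood_sizes(n: int, dist: Dict[Tuple[int, int], float]) -> List[int]:
    """Step 2-8: each stored cell (s, t) makes s and t neighbors of each other."""
    sizes = [0] * n
    for s, t in dist:
        sizes[s] += 1
        sizes[t] += 1
    return sizes


def core_indices(sizes: List[int], min_pts: int) -> List[int]:
    """Step 2-9: a set is a core path set if its neighborhood size >= MinPts."""
    return [i for i, size in enumerate(sizes) if size >= min_pts]
```

The early `break` is what makes the descending sort in step 1-2 useful: it lets the loop skip whole runs of small sets without evaluating any Jaccard distance for them.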

Step 2-10 includes:

Given a distance threshold ε and a density threshold MinPts, starting from any core path set CoreP, first aggregate all core path sets that are directly density-reachable from the core path set CoreP, and repeat until all core path sets have been processed. The process includes:

Step 2-10-1, judge whether there is an unprocessed core path set, if so, continue to step 2-10-2, if not, continue to step 2-10-3;

Step 2-10-2, for any unprocessed core path set CoreP, aggregate all core path sets that are directly density-reachable from the core path set CoreP, and return to step 2-10-1;

Step 2-10-3, take all the aggregated core path sets as the same cluster, output the formed clusters, and mark the number of clusters as u.
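One way to realize the aggregation of steps 2-10-1 to 2-10-3 is a breadth-first expansion over the "directly density-reachable" relation, which is the usual DBSCAN-style cluster growth. The sketch below is an assumption about how the loop could be organized, not the patent's exact procedure; `cluster_core_sets` and the `neighbors` mapping (index of a path point set to the indices in its ε neighborhood) are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Set


def cluster_core_sets(core: List[int], neighbors: Dict[int, Set[int]]) -> List[List[int]]:
    """Group core path sets that are linked by chains of direct density reachability."""
    core_set = set(core)
    unprocessed = set(core)
    clusters: List[List[int]] = []
    for seed in core:  # step 2-10-1: is there an unprocessed core path set?
        if seed not in unprocessed:
            continue
        unprocessed.discard(seed)
        cluster = [seed]
        queue = deque([seed])
        while queue:  # step 2-10-2: absorb everything directly reachable
            current = queue.popleft()
            for nb in sorted(neighbors.get(current, ())):
                if nb in core_set and nb in unprocessed:
                    unprocessed.discard(nb)
                    cluster.append(nb)
                    queue.append(nb)
        clusters.append(sorted(cluster))  # step 2-10-3: output the formed cluster
    return clusters


# Hypothetical neighborhoods for five core path sets (0-based indices).
example_neighbors = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}
example_clusters = cluster_core_sets([0, 1, 2, 3, 4], example_neighbors)
```

The number of clusters u is then simply `len(example_clusters)`.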

In step 2-10-3, within the same cluster C, the relationship between any two core path sets must be one of the following three: directly density-reachable, indirectly density-reachable, or density connected. The proof is as follows:

Assume that every pair of core path sets in the current cluster C is directly density-reachable, indirectly density-reachable, or density connected. When a core path set CoreQ that is directly density-reachable from the core path set CoreO is newly aggregated, that is,

$$\mathrm{CoreO} < \mathrm{CoreQ}$$

and CoreO∈C, then any core path set CoreP already in cluster C and the newly added core path set CoreQ fall into one of the following four situations:

1. When the core path set CoreP is the core path set CoreO,

$$\mathrm{CoreP} < \mathrm{CoreQ},$$

that is, the core path set CoreQ is directly density-reachable from the core path set CoreP;

2. When the core path set CoreP is directly or indirectly density-reachable from the core path set CoreO, that is,

$$\mathrm{CoreO} < \mathrm{CoreP} \quad \text{or} \quad \mathrm{CoreO} <_I \mathrm{CoreP},$$

while at the same time

$$\mathrm{CoreO} < \mathrm{CoreQ},$$

the core path sets CoreP and CoreQ are density connected via the core path set CoreO;

3. When the core path set CoreO is directly or indirectly density-reachable from the core path set CoreP, that is,

$$\mathrm{CoreP} < \mathrm{CoreO} \quad \text{or} \quad \mathrm{CoreP} <_I \mathrm{CoreO},$$

while at the same time

$$\mathrm{CoreO} < \mathrm{CoreQ},$$

therefore

$$\mathrm{CoreP} <_I \mathrm{CoreQ},$$

that is, the core path set CoreQ is indirectly density-reachable from the core path set CoreP;

4. When the core path set CoreO and the core path set CoreP are density connected, there is a core path set CoreR such that

$$\mathrm{CoreR} < \mathrm{CoreO} \quad \text{or} \quad \mathrm{CoreR} <_I \mathrm{CoreO},$$

and

$$\mathrm{CoreR} < \mathrm{CoreP} \quad \text{or} \quad \mathrm{CoreR} <_I \mathrm{CoreP}.$$

Then there is

$$\mathrm{CoreR} <_I \mathrm{CoreQ}.$$

Therefore, the core path set CoreP and the core path set CoreQ are also density connected via the core path set CoreR.

It can be seen that the newly aggregated core path set CoreQ and every core path set already in the cluster still satisfy the relationship of direct density reachability, indirect density reachability, or density connection.

In step 2-11, the path set mode Mode _k of cluster C _k is calculated according to the following formula:

$$\mathrm{Mode}_k = \arg\min_{P} \sum_{1 \le q \le k'} \mathrm{JaccardDist}(P, \mathrm{CoreP}_q) \quad (9),$$

where P represents a path point set, CoreP _q represents the q-th core path set in cluster C _k , and the path set mode Mode _k is the path point set with the smallest sum of Jaccard distances to all core path sets in cluster C _k .
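For small clusters, formula (9) can be evaluated directly by exhaustive search. The sketch below is a brute-force illustration only, and is not the coefficient-based computation of step 2-11: it assumes the candidate P may be restricted to non-empty subsets of the union of the cluster's core path sets (a path point outside that union can only enlarge every union without enlarging any intersection, so it never lowers the sum). The name `path_set_mode` is hypothetical, and the search is exponential in the number of distinct path points.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Tuple


def jaccard_dist(p: FrozenSet[str], q: FrozenSet[str]) -> float:
    """Jaccard distance between two non-empty path point sets."""
    return 1.0 - len(p & q) / len(p | q)


def path_set_mode(cluster: List[FrozenSet[str]]) -> Tuple[Optional[FrozenSet[str]], float]:
    """Return the candidate set minimizing the summed Jaccard distance, and that sum."""
    dictionary = sorted(set().union(*cluster))  # all path points seen in the cluster
    best: Optional[FrozenSet[str]] = None
    best_cost = float("inf")
    for size in range(1, len(dictionary) + 1):
        for combo in combinations(dictionary, size):
            candidate = frozenset(combo)
            cost = sum(jaccard_dist(candidate, core) for core in cluster)
            if cost < best_cost:  # strict: the first minimizer found is kept
                best, best_cost = candidate, cost
    return best, best_cost


# Hypothetical cluster with three core path sets.
mode, cost = path_set_mode([frozenset("abcd"), frozenset("abc"), frozenset("abc")])
```

With many distinct path points the exhaustive search becomes impractical, which is presumably the motivation for the coefficient-based computation described in step 2-11.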

Step 2-11 includes:

Step 2-11-1, calculate the intersection coefficients and union coefficients: given a cluster C _k containing k' core path sets, C _k = {CoreP ₁ ,CoreP ₂ ,...,CoreP _k' }, first calculate the path point dictionary Ω _k contained in cluster C _k :

$$\Omega_k = \bigcup_{1 \le q \le k'} \mathrm{CoreP}_q,$$

that is, the path point dictionary is the union of all core path sets in cluster C _k . Then, for each path point p _r in the path point dictionary, calculate the intersection coefficient α _rq and the union coefficient β _rq of the path point p _r with respect to the core path set CoreP _q of cluster C _k , as shown in the following formula:

Step 2-11-2, calculate the Jaccard distance between a path point set and each core path set based on the intersection coefficients and union coefficients. Based on these coefficients, the Jaccard distance between the path point set P={p _r } and each core path set CoreP _q can be simplified to:

Step 2-11-3, calculate the mode of the path point set based on the intersection coefficient and the union coefficient:

Step 3 includes: output Mode _k as the path hot spot of the kth cluster C _k .

The distance threshold ε is used to compare the similarity between path point sets. Since the Jaccard distance between two path point sets lies in the interval [0,1], the distance threshold ε also lies within the interval [0,1].

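Since the path point sets are sorted so that |P _s |≥|P _t |, we have |P _s ∩P _t |≤|P _t | and |P _s ∪P _t |≥|P _s |, so the Jaccard similarity between two path point sets meets the upper limit condition:

$$\frac{|P_s \cap P_t|}{|P_s \cup P_t|} \le \frac{|P_t|}{|P_s|}.$$

Therefore, to satisfy JaccardDist(P _s ,P _t )≤ε, it is necessary that

$$\frac{|P_t|}{|P_s|} \ge 1 - \varepsilon.$$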
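The size-ratio condition of step 2-3 is only safe as a filter if the Jaccard distance can never be smaller than 1 minus the ratio of the smaller set size to the larger one. As a sanity check (not a proof), the following sketch verifies that inequality exhaustively over all non-empty subsets of a small universe; the function name and the choice of universe are assumptions made for this illustration.

```python
from itertools import combinations
from typing import FrozenSet


def jaccard_dist(p: FrozenSet[str], q: FrozenSet[str]) -> float:
    """Jaccard distance between two non-empty path point sets."""
    return 1.0 - len(p & q) / len(p | q)


def size_ratio_bound_holds(universe: str = "abcde") -> bool:
    """Check JaccardDist(P, Q) >= 1 - |smaller| / |larger| for every pair of subsets."""
    subsets = [
        frozenset(combo)
        for size in range(1, len(universe) + 1)
        for combo in combinations(universe, size)
    ]
    for p in subsets:
        for q in subsets:
            larger, smaller = (p, q) if len(p) >= len(q) else (q, p)
            lower_bound = 1.0 - len(smaller) / len(larger)
            # A small tolerance absorbs floating-point rounding.
            if jaccard_dist(p, q) < lower_bound - 1e-12:
                return False
    return True


bound_ok = size_ratio_bound_holds()
```

Because the bound holds, a pair whose size ratio is below 1-ε cannot have a Jaccard distance within ε, so skipping it loses no neighbors.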

The traditional density clustering method is only suitable for fixed-dimensional vector data and is not suitable for path point set data of non-fixed length. The present invention proposes the "core path set" and the associated concepts of "direct density reachability", "indirect density reachability", and "density connection" specifically for path point sets, so that the traditional density clustering method, which applies only to fixed-dimensional vectors, is extended to path point set data of non-fixed length. The invention also proposes a hot path mining method based on intersection and union coefficients, which significantly improves the efficiency of hot path analysis.

Beneficial effects: (1) a similarity comparison method for target path point sets is proposed; (2) the selection of the density threshold MinPts has a degree of flexibility and robustness; (3) the calculation cost is low, and the method is suitable for engineering implementation. The invention adopts an analysis and mining method based on path point sets, which simplifies away the order of the path points, facilitates the aggregation of measurement data sharing the same path points, and can greatly reduce the calculation cost and improve the calculation efficiency.

Description of the drawings

In the following, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments, and the above-mentioned or other advantages of the present invention will become clearer.

Figure 1 is a flowchart of the present invention.

Detailed description

The present invention will be further described below in conjunction with the drawings and embodiments.

The present invention characterizes each target path as a path point set composed of several path points, constructs a similarity distance matrix, compares the similarity between every two path point sets, and, based on the similarity distance matrix, the distance threshold ε, and the density threshold MinPts, uses density clustering to iteratively compute the clusters of path point sets, finally outputting the path set mode of each cluster as a target hot path.

As shown in Figure 1, the method of the present invention specifically includes the following steps:

Assume that n path point sets corresponding to n target paths are collected, where each path point set corresponds to one target path and each element of a path point set is a path point of the corresponding target path. The Jaccard distance between two path point sets P _i and P _j is then defined as:

$$\mathrm{JaccardDist}(P_i, P_j) = 1 - \frac{|P_i \cap P_j|}{|P_i \cup P_j|} \quad (1)$$

(1) Sorting of the path point sets: first sort the n path point sets by set size from large to small, and then by index value from small to large, denoted as P ₁ , P ₂ , ..., P _n , satisfying |P ₁ |≥|P ₂ |≥…≥|P _n |;

(2) Initialization of the similarity distance matrix: set the distance threshold ε, whose value range satisfies 0<ε<1. The initial similarity distance matrix DistArray is empty, and its size is n×n, that is, it has n rows and n columns. Because the similarity distance matrix is symmetric about the diagonal, only the upper triangular part is kept;

(3) Current collection index setting: set current path point collection index s=1;

(4) Setting of the index of the set to be compared: set the index of the path point set to be compared t=s+1;

(5) Judgment of the index of the set to be compared: if the condition "t≤n and |P _t |/|P _s |≥1-ε" is not satisfied, continue to step (6); if it is satisfied, continue to step (8);

(6) Current collection index update: update the current collection index value s=s+1;

(7) Judgment of current collection index: judge the current collection index, if s≥n, continue to step (10), otherwise, return to step (4);

(8) Similarity distance calculation: calculate the Jaccard distance between the two path point sets corresponding to the current set index and the index of the set to be compared. If JaccardDist(P _s , P _t )≤ε is satisfied, update the corresponding cell value in the similarity distance matrix:

DistArray[s,t]=JaccardDist(P _s ,P _t ); (2)

(9) Update the index of the set to be compared: t=t+1, return to step (5);

(10) Neighborhood size calculation: given any path point set P, define all other path point sets whose similarity distance to P is within the distance threshold ε as the ε neighborhood of P, denoted as N _ε (P):

N _ε (P)={Q|JaccardDist(P,Q)≤ε&&Q≠P} (3),

and calculate the ε neighborhood size |N _ε (P _i )| of each path point set P _i ;

(11) Construction of core path set: Set the density threshold MinPts, and define the set of path points whose ε neighborhood size is not less than MinPts as the core path set, that is, any core path set CoreP satisfies:

|N _ε (CoreP)|≥MinPts (4);

(12) Density-based iterative aggregation: Taking each core path set as the initial cluster, given the distance threshold ε and the density threshold MinPts, if the two core path sets CoreP and CoreQ satisfy:

CoreQ∈N _ε (CoreP) (5),

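then the core path set CoreQ is said to be "directly density-reachable" from the core path set CoreP, which is expressed as

$$\mathrm{CoreP} < \mathrm{CoreQ};$$

If there is a chain of core path sets of non-zero length such that the core path set CoreQ and the core path set CoreP satisfy:

(a) $\mathrm{CoreP} < \mathrm{CoreP}_1 < \mathrm{CoreP}_2 < \dots < \mathrm{CoreP}_n < \mathrm{CoreQ}$, and

(b) n≥1 (6),

then the core path set CoreQ is said to be "indirectly density-reachable" from the core path set CoreP, expressed as

$$\mathrm{CoreP} <_I \mathrm{CoreQ};$$

In addition, if there is a core path set CoreO from which both core path sets CoreP and CoreQ are directly or indirectly density-reachable, that is,

(a) $\mathrm{CoreO} <_I \mathrm{CoreP}$ or $\mathrm{CoreO} < \mathrm{CoreP}$, and

(b) $\mathrm{CoreO} <_I \mathrm{CoreQ}$ or $\mathrm{CoreO} < \mathrm{CoreQ}$ (7),

then the core path sets CoreP and CoreQ are said to be "density connected". Then, according to the distance threshold ε and the density threshold MinPts, iterative aggregation is performed based on density clustering, and the number of clusters generated after aggregating the core path sets that are directly or indirectly density-reachable from one another is denoted as u;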

(13) Path set mode calculation: for each cluster C _k of the u clusters C ₁ , C ₂ ,..., C _u , containing k' core path sets, C _k = {CoreP ₁ ,CoreP ₂ ,……,CoreP _k' }, calculate the path set mode Mode _k of cluster C _k :

$$\mathrm{Mode}_k = \arg\min_{P} \sum_{1 \le q \le k'} \mathrm{JaccardDist}(P, \mathrm{CoreP}_q) \quad (8),$$

where 1≤k≤u, C _k represents the k-th cluster, CoreP _q represents the q-th core path set, and Mode _k is output as the path hot spot of cluster C _k .

The method of the present invention can improve the target path analysis ability in the case of inaccurate target position measurement, is beneficial to reduce the redundancy of target position measurement, increase the flexibility of space granularity, and can better complete the target path analysis task. The following uses an example to illustrate the hotspot path analysis method based on density clustering of the present invention.

In this embodiment, in the road traffic management of a certain city, n=5 high-frequency target paths are collected based on taxi trajectory information, corresponding to 5 path point sets, where each element of a path point set corresponds to one path point. The distance threshold ε is 0.3, and the density threshold MinPts is 1. The hot path analysis steps based on density clustering are as follows:

Step 1. Sort the path point sets, first by set size from large to small and then by index value from small to large, denoted as P ₁ , P ₂ , P ₃ , P ₄ , P ₅ , as shown in Table 1:

Table 1

Path index	Corresponding path point set	Set size
1	P ₁ ={a,b,c,d}	4
2	P ₂ ={a,b,c}	3
3	P ₃ ={a,b,c}	3
4	P ₄ ={e,f}	2
5	P ₅ ={e,f}	2

Step 2. Initialize the similarity distance matrix. The distance threshold ε is 0.3. The initial similarity distance matrix DistArray is empty, and its size is 5×5. Because the similarity distance matrix is symmetric about the diagonal, only the upper triangular part is retained, as shown in Table 2:

Table 2

Path point set	P ₁	P ₂	P ₃	P ₄	P ₅
P ₁	--	--	--	--	--
P ₂	--	--	--	--	--
P ₃	--	--	--	--	--
P ₄	--	--	--	--	--
P ₅	--	--	--	--	--

Step 3. Set current collection index, set current path point collection index s=1;

Step 4. Set the index of the set to be compared: t=s+1=2;

Step 5. Judge the index of the set to be compared: it satisfies "t≤n and |P _t |/|P _s |=0.75>1-ε=0.7", so continue to step 8;

Step 8. Calculate the similarity distance: the Jaccard distance between the path point sets P ₁ and P ₂ is 0.25, which is less than the distance threshold ε=0.3, so the similarity distance matrix DistArray is updated, as shown in Table 3:

Table 3

Path point set	P ₁	P ₂	P ₃	P ₄	P ₅
P ₁	--	0.25	--	--	--
P ₂	--	--	--	--	--
P ₃	--	--	--	--	--
P ₄	--	--	--	--	--
P ₅	--	--	--	--	--

Step 9. Update the index of the set to be compared: t=t+1=3, and return to step 5;

Step 5. Judge the index of the set to be compared: it satisfies "t≤n and |P _t |/|P _s |=0.75>1-ε", so proceed to step 8;

Step 8. Calculate the similarity distance: the Jaccard distance between the path point sets P ₁ and P ₃ is 0.25, which is less than the distance threshold ε=0.3, so the similarity distance matrix DistArray is updated, as shown in Table 4:

Table 4

Path point set	P ₁	P ₂	P ₃	P ₄	P ₅
P ₁	--	0.25	0.25	--	--
P ₂	--	--	--	--	--
P ₃	--	--	--	--	--
P ₄	--	--	--	--	--
P ₅	--	--	--	--	--

Step 9. Update the index of the set to be compared: t=t+1=4, and return to step 5;

Step 5. Judge the index of the set to be compared: t=4 does not satisfy "|P _t |/|P _s |=0.5≥1-ε", so proceed to step 6;

Step 6. Update the current set index: s=s+1=2;

Step 7. Judge the current set index: s<n, so return to step 4;

Step 4. Set the index of the set to be compared: t=s+1=3;

Step 5. Judge the index of the set to be compared: t=3 satisfies "t<n and |P _t |/|P _s |=1≥1-ε", so proceed to step 8;

Step 8. Calculate the similarity distance: the Jaccard distance between the path point sets P ₂ and P ₃ is 0.00, so the similarity distance matrix DistArray is updated, as shown in Table 5:

Table 5

Path point set	P ₁	P ₂	P ₃	P ₄	P ₅
P ₁	--	0.25	0.25	--	--
P ₂	--	--	0.00	--	--
P ₃	--	--	--	--	--
P ₄	--	--	--	--	--
P ₅	--	--	--	--	--

Step 5. Judge the index of the set to be compared: t=4 does not satisfy "|P _t |/|P _s |=0.667≥1-ε", so proceed to step 6;

Step 6. Update the current set index: s=s+1=3;

Step 4. Set the index of the set to be compared: t=s+1=4;

Step 5. Judge the index of the set to be compared: t=4 does not satisfy "|P _t |/|P _s |≥1-ε", so continue to step 6;

Step 6. Update the current set index: s=s+1=4;

Step 4. Set the index of the set to be compared: t=s+1=5;

Step 5. Judge the index of the set to be compared: it satisfies "t=5≤n and |P _t |/|P _s |=1≥1-ε", so continue to step 8;

Step 8. Calculate the similarity distance: the Jaccard distance between the path point sets P ₄ and P ₅ is zero, which satisfies JaccardDist(P ₄ , P ₅ )≤0.3, so the similarity distance matrix DistArray is updated, as shown in Table 6:

Table 6

Path point set	P ₁	P ₂	P ₃	P ₄	P ₅
P ₁	--	0.25	0.25	--	--
P ₂	--	--	0.00	--	--
P ₃	--	--	--	--	--
P ₄	--	--	--	--	0.00
P ₅	--	--	--	--	--

Step 9. Update the index of the set to be compared: t=t+1=6, and return to step 5;

Step 5. Judge the index of the set to be compared: t=6 does not satisfy "t≤n", so proceed to step 6;

Step 6. Update the current set index: s=s+1=5;

Step 7. Judge the current set index: s=n, so continue to step 10;

Step 10. Calculate the neighborhood sizes: calculate the ε neighborhood size |N _ε (P _i )| of each path point set P _i , as shown in Table 7:

Table 7

i	Path point set	|N _ε (P _i )|
1	P ₁ ={a,b,c,d}	2
2	P ₂ ={a,b,c}	2
3	P ₃ ={a,b,c}	2
4	P ₄ ={e,f}	1
5	P ₅ ={e,f}	1
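The matrix-filling trace of steps 3 to 10 can be replayed with a short script. This is a sketch under assumptions: the helper name `trace_comparisons` is hypothetical, pairs are reported with the document's 1-based indices, and the ratio test is written as a multiplication. It records which pairs actually reach the Jaccard computation, which cells are stored, and the resulting neighborhood sizes.

```python
from typing import Dict, FrozenSet, List, Tuple


def jaccard_dist(p: FrozenSet[str], q: FrozenSet[str]) -> float:
    """Jaccard distance between two non-empty path point sets."""
    return 1.0 - len(p & q) / len(p | q)


def trace_comparisons(
    sets: List[FrozenSet[str]], eps: float
) -> Tuple[List[Tuple[int, int]], Dict[Tuple[int, int], float]]:
    """Return the evaluated pairs and the stored cells, both with 1-based indices."""
    evaluated: List[Tuple[int, int]] = []
    kept: Dict[Tuple[int, int], float] = {}
    n = len(sets)
    for s in range(n - 1):
        for t in range(s + 1, n):
            if len(sets[t]) < (1.0 - eps) * len(sets[s]):
                break  # ratio test of step 5 fails: advance s (step 6)
            evaluated.append((s + 1, t + 1))
            d = jaccard_dist(sets[s], sets[t])  # step 8
            if d <= eps:
                kept[(s + 1, t + 1)] = d
    return evaluated, kept


# Path point sets of Table 1, already sorted.
sets = [frozenset("abcd"), frozenset("abc"), frozenset("abc"), frozenset("ef"), frozenset("ef")]
evaluated, kept = trace_comparisons(sets, 0.3)
# Neighborhood size of set i = number of stored cells that involve index i.
sizes = {i: sum(i in pair for pair in kept) for i in range(1, len(sets) + 1)}
```

On this data only 4 of the 10 possible pairs reach the Jaccard computation, and the stored cells and neighborhood sizes agree with Table 6 and Table 7.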

Step 11. Construct the core path sets. Each path point set whose ε neighborhood size is not less than MinPts is taken as a core path set, where MinPts is a natural number greater than or equal to 1 and less than n. Generally, the value can be

Since MinPts=1 and every neighborhood size in Table 7 is at least 1, P ₁ , P ₂ , P ₃ , P ₄ , and P ₅ are all core path sets;

Step 12. Density-based iterative aggregation. There are 5 initial clusters, namely {P ₁ }, {P ₂ }, {P ₃ }, {P ₄ } and {P ₅ }. After iterative aggregation, u=2 clusters are finally generated: C ₁ ={P ₁ ,P ₂ ,P ₃ } and C ₂ ={P ₄ ,P ₅ }. In cluster C ₁ , P ₁ , P ₂ , and P ₃ are directly density-reachable from one another; in cluster C ₂ , P ₄ and P ₅ are also directly density-reachable from each other;

Step 13. Path set mode calculation. For each cluster, take the set consisting of all its core path sets, C ₁ ={P ₁ ,P ₂ ,P ₃ } and C ₂ ={P ₄ ,P ₅ }, and calculate the modes respectively as Mode ₁ ={a,b,c} and Mode ₂ ={e,f}. Taking Mode ₁ as an example, the intersection coefficients are shown in Table 8:

Table 8

The corresponding minimum sum of Jaccard distances is:

$$\sum_{1 \le q \le 3} \mathrm{JaccardDist}(\mathrm{Mode}_1, P_q) = 0.25 + 0 + 0 = 0.25$$
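As a cross-check of the modes stated in step 13, the candidates for each cluster can be ranked by their summed Jaccard distance. The sketch below enumerates every non-empty subset of a cluster's path points, which is feasible only because the example is tiny; `ranked_candidates` is a hypothetical helper and this brute-force ranking is not the coefficient-based procedure of the method.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple


def jaccard_dist(p: FrozenSet[str], q: FrozenSet[str]) -> float:
    """Jaccard distance between two non-empty path point sets."""
    return 1.0 - len(p & q) / len(p | q)


def ranked_candidates(cluster: List[FrozenSet[str]]) -> List[Tuple[float, List[str]]]:
    """Return (summed distance, candidate path points) pairs, cheapest first."""
    dictionary = sorted(set().union(*cluster))
    rows: List[Tuple[float, List[str]]] = []
    for size in range(1, len(dictionary) + 1):
        for combo in combinations(dictionary, size):
            cost = sum(jaccard_dist(frozenset(combo), core) for core in cluster)
            rows.append((cost, list(combo)))
    rows.sort()
    return rows


# Clusters C1 and C2 of step 12.
c1 = [frozenset("abcd"), frozenset("abc"), frozenset("abc")]
c2 = [frozenset("ef"), frozenset("ef")]
best1 = ranked_candidates(c1)[0]
best2 = ranked_candidates(c2)[0]
```

For C ₁ the cheapest candidate is {a,b,c} with a summed distance of 0.25, and for C ₂ it is {e,f} with a summed distance of 0, matching Mode ₁ and Mode ₂.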

In urban road traffic management, the corresponding roads and traffic lights can be strengthened for the mined hot routes {a,b,c} and {e,f} to keep roads clear and control traffic flow. The results of the present invention help to improve target path analysis capability when target position measurements are inaccurate, reduce the redundancy of target position measurements, increase the flexibility of spatial granularity, and better complete target path analysis tasks.

The research work of the present invention was funded by the National Natural Science Foundation of China (No. 61771177).

The present invention provides a hotspot path analysis method based on density clustering. There are many methods and ways to implement this technical solution, and the above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention. All components not specified in this embodiment can be implemented using existing technology.

Claims

A hotspot path analysis method based on density clustering is characterized in that it comprises the following steps:

Step 1. Characterize each target path as a path point set composed of several path points, and construct a similarity distance matrix;

Step 2. Compare the similarity between each pair of path point sets. Based on the similarity distance matrix, the distance threshold ε, and the density threshold MinPts, mine the core path sets from the path point sets, and then, based on the direct density reachability relationship between core path sets, use density clustering to iteratively generate clusters aggregated from core path sets;

Step 3: Output the mode of the path point set of each cluster as the target hot path.
The method according to claim 1, wherein step 1 comprises:

Step 1-1, collect n path point sets corresponding to n target paths, where each path point set corresponds to one target path and each element of a path point set is a path point of the corresponding target path, and define the Jaccard distance JaccardDist(P _i ,P _j ) between the i-th path point set P _i and the j-th path point set P _j as:

$$\mathrm{JaccardDist}(P_i, P_j) = 1 - \frac{|P_i \cap P_j|}{|P_i \cup P_j|} \quad (1);$$

Step 1-2, sort the path point sets: first sort the n path point sets by set size from large to small, and then by index value from small to large, denoted as P ₁ , P ₂ , ..., P _n , satisfying |P ₁ |≥|P ₂ |≥…≥|P _n |;

Step 1-3, initialize the similarity distance matrix: set the distance threshold ε and initialize the similarity distance matrix DistArray as empty, with matrix size n×n, that is, the matrix has n rows and n columns.
The method according to claim 2, characterized in that, in step 1-3, the distance threshold ε is the mean value of the nearest neighbor distances of all path point sets, namely:

$$\varepsilon = \frac{1}{n} \sum_{i=1}^{n} \min_{j \neq i} \mathrm{JaccardDist}(P_i, P_j) \quad (2).$$
The method according to claim 3, wherein step 2 comprises:

Step 2-1, set the current collection index: set the current path point collection index s=1;

Step 2-2, set the index of the set to be compared: set the index of the path point set to be compared t=s+1;

Step 2-3, judge the index of the set to be compared: if the condition "t≤n and |P _t |/|P _s |≥1-ε" is not satisfied, continue to step 2-4; if it is satisfied, execute step 2-6;

Step 2-4, update the current collection index: update the current collection index value s=s+1;

Step 2-5, judge the current collection index: judge the current collection index, if s≥n, continue to step 2-8, otherwise, return to step 2-2;

Step 2-6, calculate the similarity distance: calculate the Jaccard distance JaccardDist(P _s ,P _t ) between the two path point sets corresponding to the current set index and the index of the set to be compared; if JaccardDist(P _s ,P _t )≤ε is satisfied, update the corresponding cell value in the similarity distance matrix:

DistArray[s,t]=JaccardDist(P s ,P t ) (3),

DistArray[s,t] represents the value of the sth row and tth column of the similarity distance matrix DistArray;

Step 2-7, update the index of the set to be compared: t=t+1, return to step 2-3;

Step 2-8, calculate the neighborhood size of each path point set: given any path point set P, define all other path point sets whose similarity distance to P is within the distance threshold ε as the ε neighborhood of P, denoted as N _ε (P):

N _ε (P)={Q|JaccardDist(P,Q)≤ε&&Q≠P} (4),

where Q represents an arbitrary path point set. The ε neighborhood size of each path point set P _i is calculated according to formula (4) and denoted as |N _ε (P _i )|;

Step 2-9, construct the core path sets: set the density threshold MinPts, and define any path point set whose ε-neighborhood size is not less than MinPts as a core path set, that is, any core path set CoreP satisfies:

|N_ε(CoreP)|≥MinPts (5);
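Formulas (4) and (5) can be illustrated with a short Python sketch. It assumes the similarity distance matrix is available as a dictionary that holds only the pairs (s, t), s<t, whose Jaccard distance is within ε, with 1-based indices. That storage layout and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def eps_neighborhoods(
    dist_array: Dict[Tuple[int, int], float], n: int
) -> Dict[int, Set[int]]:
    """Map each 1-based index i to the indices of the sets in N_eps(P_i).

    Assumes every stored pair already satisfies JaccardDist <= eps, so each
    stored pair makes both sets neighbors of each other.
    """
    neighbors: Dict[int, Set[int]] = {i: set() for i in range(1, n + 1)}
    for s, t in dist_array:
        neighbors[s].add(t)
        neighbors[t].add(s)
    return neighbors


def core_indices(neighbors: Dict[int, Set[int]], min_pts: int) -> List[int]:
    """Indices whose eps-neighborhood size is not less than MinPts (formula (5))."""
    return sorted(i for i, nb in neighbors.items() if len(nb) >= min_pts)


# Hypothetical sparse distance matrix for n = 4 path point sets.
dist_array = {(1, 2): 0.25, (2, 3): 0.5}
neighbors = eps_neighborhoods(dist_array, n=4)
cores_minpts_1 = core_indices(neighbors, min_pts=1)
cores_minpts_2 = core_indices(neighbors, min_pts=2)
```

In this example the neighborhood sizes are 1, 2, 1 and 0, so sets 1 to 3 are core path sets for MinPts=1, and only set 2 remains a core path set for MinPts=2.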

Step 2-10, density-based iterative aggregation: each core path set is taken as an initial cluster, and the distance threshold ε and the density threshold MinPts are given. If two core path sets CoreP and CoreQ satisfy:

CoreQ∈N_ε(CoreP) (6),

the core path set CoreQ is said to be directly density-reachable from the core path set CoreP, expressed as:

CoreP<CoreQ;

If there exists a chain of core path sets of non-zero length such that the core path sets CoreP and CoreQ satisfy the following conditions (a) and (b):

(a) CoreP<CoreP_1<CoreP_2<……<CoreP_n<CoreQ, and

(b) n≥1 (7),

the core path set CoreQ is said to be indirectly density-reachable from the core path set CoreP, expressed as:

CoreP<_I CoreQ;

If there exists a core path set CoreO from which both core path sets CoreP and CoreQ are directly or indirectly density-reachable, that is, the following conditions (c) and (d) are satisfied:

(c) CoreO<_I CoreP or CoreO<CoreP, and

(d) CoreO<_I CoreQ or CoreO<CoreQ (8),

the core path sets CoreP and CoreQ are said to be density-connected;

Then, according to the distance threshold ε and the density threshold MinPts, iterative aggregation is performed based on density clustering: core path sets that are directly or indirectly density-reachable from one another are aggregated, and the number of clusters generated is recorded as u;

Step 2-11, calculate the path set mode: for each cluster C_k among the u clusters C_1, C_2, ..., C_u, where C_k contains k' core path sets, C_k = {CoreP_1, CoreP_2, ..., CoreP_{k'}}, and CoreP_{k'} denotes the k'-th core path set, calculate the path set mode Mode_k of the cluster C_k, where 1≤k≤u and C_k denotes the k-th cluster.
The method according to claim 4, wherein step 2-10 comprises:

Given a distance threshold ε and a density threshold MinPts, starting from any core path set CoreP, first aggregate all core path sets that are directly density-reachable from the core path set CoreP, and repeat until all core path sets have been processed. The process includes:

Step 2-10-1, judge whether there is an unprocessed core path set; if so, continue to step 2-10-2; if not, continue to step 2-10-3;

Step 2-10-2, for any unprocessed core path set CoreP, aggregate all core path sets that are directly density-reachable from the core path set CoreP, and return to step 2-10-1;

Step 2-10-3, treat the core path sets that have been aggregated together as the same cluster, output the resulting clusters, and record the number of clusters as u.
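One way to sketch steps 2-10-1 to 2-10-3 in Python is a breadth-first traversal that keeps following direct density-reachability between core path sets until no unprocessed core path set remains. The sketch assumes the ε-neighborhoods are given as sets of 1-based indices. The function name, the queue-based traversal order and the sample data are assumptions of this illustration, not details taken from the claims.

```python
from collections import deque
from typing import Dict, List, Sequence, Set


def cluster_core_sets(
    cores: Sequence[int], neighbors: Dict[int, Set[int]]
) -> List[List[int]]:
    """Group core path sets linked by chains of direct density-reachability."""
    core_set = set(cores)
    unprocessed = set(cores)
    clusters: List[List[int]] = []
    while unprocessed:                      # step 2-10-1: any unprocessed core set?
        seed = min(unprocessed)             # deterministic choice of "any" core set
        unprocessed.remove(seed)
        cluster = [seed]
        queue = deque([seed])
        while queue:                        # step 2-10-2, repeated along the chain
            p = queue.popleft()
            # Only core path sets in the eps-neighborhood of p are aggregated.
            for q in sorted(neighbors[p] & core_set):
                if q in unprocessed:
                    unprocessed.remove(q)
                    cluster.append(q)
                    queue.append(q)
        clusters.append(sorted(cluster))
    return clusters                         # step 2-10-3: u = len(clusters)


# Hypothetical neighborhoods for six path point sets; set 6 has no neighbors.
neighbors = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
cores = [1, 2, 3, 4, 5]                     # core path sets for MinPts = 1
clusters = cluster_core_sets(cores, neighbors)
u = len(clusters)
```

Here core path set 3 is reached from core path set 1 only through core path set 2, so the two end up in the same cluster by indirect density-reachability, and the sketch produces u = 2 clusters.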
The method according to claim 5, wherein in step 2-11, the path set mode Mode_k of the cluster C_k is calculated according to the following formula:

Mode_k = argmin_P ∑_{1≤q≤k'} JaccardDist(P,CoreP_q) (9),

where P denotes a path point set, CoreP_q denotes the q-th core path set in the cluster C_k, and the path set mode Mode_k is the path point set that minimizes the sum of Jaccard distances to all core path sets in the cluster C_k.
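Formula (9) can be illustrated with a simplified Python sketch. Note the swapped-in simplification: formula (9) does not restrict the candidate set P, whereas this sketch searches only among the core path sets of the cluster itself, which makes the result a medoid of the cluster. The function names and sample data are illustrative assumptions.

```python
from typing import FrozenSet, Sequence


def jaccard_dist(p: FrozenSet[str], q: FrozenSet[str]) -> float:
    """Standard Jaccard distance: 1 - |P ∩ Q| / |P ∪ Q|."""
    return 1.0 - len(p & q) / len(p | q)


def path_set_mode(cluster: Sequence[FrozenSet[str]]) -> FrozenSet[str]:
    """Cluster member with the smallest summed Jaccard distance to all members.

    Simplifying assumption: candidates for P are limited to the cluster's own
    core path sets instead of arbitrary path point sets.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one core path set")
    return min(cluster, key=lambda p: sum(jaccard_dist(p, q) for q in cluster))


# Hypothetical cluster of three core path sets.
cluster = [frozenset("ABCD"), frozenset("ABC"), frozenset("ABE")]
mode = path_set_mode(cluster)
```

For this cluster the summed distances are 0.85 for {A,B,C,D}, 0.75 for {A,B,C} and 1.1 for {A,B,E}, so the sketch selects {A,B,C}.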
The method according to claim 6, wherein step 2-11 comprises:

Step 2-11-1, calculate the intersection coefficient and the union coefficient: given a cluster C_k containing k' core path sets, C_k = {CoreP_1, CoreP_2, ..., CoreP_{k'}}, first calculate the path point dictionary Ω_k contained in the cluster C_k:

Ω_k = ∪_{1≤q≤k'} CoreP_q,

that is, the path point dictionary is the union of all core path sets in the cluster C_k; then, for each path point p_r in the path point dictionary, calculate the intersection coefficient α_rq and the union coefficient β_rq of the path point p_r with respect to the core path set CoreP_q of the cluster C_k, as shown in the following formula:

Step 2-11-2, calculate the Jaccard distance between the path point set and the core path set based on the intersection coefficient and the union coefficient: the Jaccard distance between the path point set P = {p_r} and each core path set CoreP_q is simplified as:

Step 2-11-3, calculate the mode of the path point set based on the intersection coefficient and the union coefficient:
The method according to claim 7, characterized in that step 3 comprises: outputting Mode_k as the hotspot path of the k-th cluster C_k.