CN107301254B

CN107301254B - Road network hot spot area mining method

Info

Publication number: CN107301254B
Application number: CN201710735328.1A
Authority: CN
Inventors: 田玲; 罗光春; 殷光强; 陈爱国
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2017-08-24
Filing date: 2017-08-24
Publication date: 2020-07-10
Anticipated expiration: 2037-08-24
Also published as: CN107301254A

Abstract

The invention discloses a road network hot spot region mining method, belonging to the technical field of data mining, which addresses the shortcomings of prior-art track clustering based on track space-time similarity measurement and clustering calculation. The method comprises the following steps: step 1, carrying out track segmentation on all track segments, and calculating the space-time similarity and space-time distance between two segmented sub-track segments; step 2, performing clustering calculation on all track segment data in the grid space according to the space-time similarity and space-time distance of the sub-tracks and a dynamic neighbor-based DBSCAN algorithm; step 3, selecting a significant cluster set from the clusters obtained by clustering, and extracting staying spots from the significant cluster set; and step 4, obtaining a high-heat area of the staying spots according to the number of track segments carried by the staying spots, and obtaining a hot spot area in the road network from the area where the high-heat staying spots are located. The invention is used for spatial position positioning.

Description

Road network hot spot area mining method

Technical Field

A road network hot spot region mining method is used for positioning spatial positions and belongs to the technical field of data mining.

Background

In recent years, with the rapid development and popularization of spatial positioning technologies, the location of almost any moving object can easily be tracked. This has produced huge trajectory databases, and these massive trajectory data contain a large amount of deep information reflecting the motion behavior of moving objects. Space-time trajectory data is one kind of space-time data that mainly records how the spatial position of a moving object changes over time. Vehicle space-time trajectory data is a special case because it is confined to a road network, so many common data mining methods cannot be applied to it directly and need to be adapted.

Because research on hot spot areas in a road network has important practical value, research on hot spot path areas must be performed on track data that is valid within the road network. Cluster analysis of trajectory data is a common method for finding popular paths in a road network. Trajectory clustering mainly comprises two parts: track space-time similarity measurement and clustering calculation. The most common approach to measuring track space-time similarity divides the tracks based on a grid space: the method first divides the grid space and cuts the track data, and then adds the spatial similarity and the temporal similarity of the divided sub-tracks to obtain the space-time similarity of the tracks. This method can calculate the space-time similarity between tracks accurately, but it computes the spatial similarity and the temporal similarity separately for every pair of tracks, so the response time of the algorithm is long when the volume of track data is large. For the clustering calculation, because track clusters are usually strip-shaped rather than spherical, the most typical density clustering algorithm, DBSCAN, is often adopted, since it can find clusters of arbitrary shape. However, DBSCAN requires two parameter values, the neighborhood radius and the neighborhood density threshold, to be entered manually; the quality of these two values directly affects the clustering result, and the DBSCAN algorithm provides no method for determining them.

Disclosure of Invention

The invention aims to solve the following problems of the prior art, in which tracks are clustered by track space-time similarity measurement and clustering calculation: the space-time similarity measurement has a long response time when the volume of track data is large; Euclidean coordinates cannot accurately express the distance between two tracks in a road network; and when the density clustering algorithm DBSCAN is used for the clustering calculation, the neighborhood radius and the neighborhood density threshold must be entered manually, and inaccurate values directly affect the clustering result. To this end, the invention provides a road network hot spot area mining method.

The technical scheme adopted by the invention is as follows:

a road network hot spot region mining method is characterized by comprising the following steps:

step 1, carrying out track segmentation on all track segments, and calculating the space-time similarity and space-time distance between two segmented sub-track segments;

step 2, performing clustering calculation on all track segment data in the grid space according to the space-time similarity and space-time distance of the sub-tracks and a dynamic neighbor-based DBSCAN algorithm;

step 3, selecting a significant cluster set from the cluster calculated by clustering, and extracting staying spots from the significant cluster set;

step 4, obtaining a high-heat-degree area of the stay spots according to the number of the track sections carried by the stay spots, and obtaining a hot spot area in the road network in the area where the high-heat-degree stay spots are located.

Further, the specific steps of step 1 are as follows:

step 1.1, dividing a dynamic grid space for a space area where all track sections are located;

step 1.2, carrying out track segmentation on a track sequence in a grid space according to a breakpoint;

step 1.3, calculating the space-time similarity and space-time distance between the two sub-track sections after the track segmentation.

Further, the specific steps of step 1.1 are as follows:

step 1.11, solving the minimum circumscribed rectangle of the space region where all the track segments are located according to the minimum convex hull principle;

step 1.12, solving the length of each track segment and the number of sampling points contained in the track segment, and calculating the average distance of the vehicle on the track segment moving in the time of two adjacent sampling points;

step 1.13, taking the average distance as the size of a grid cell, and performing dynamic grid space division on the minimum circumscribed rectangle.
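Steps 1.11 to 1.13 can be illustrated with a minimal sketch. It rests on several assumptions that are not in the text: sampling points are planar (x, y) coordinates rather than longitude and latitude, the minimum circumscribed rectangle is simplified to an axis-aligned bounding rectangle, and the average distance is pooled over all track segments. The function names are hypothetical.

```python
import math

def bounding_rectangle(trajectories):
    """Axis-aligned bounding rectangle (min_x, min_y, max_x, max_y) of all sampling points."""
    xs = [x for tr in trajectories for x, _ in tr]
    ys = [y for tr in trajectories for _, y in tr]
    return min(xs), min(ys), max(xs), max(ys)

def average_step(trajectories):
    """Mean distance travelled between two adjacent sampling points (pooled over all tracks)."""
    total, steps = 0.0, 0
    for tr in trajectories:
        for (x0, y0), (x1, y1) in zip(tr, tr[1:]):
            total += math.hypot(x1 - x0, y1 - y0)
            steps += 1
    return total / steps if steps else 0.0

def to_cell(point, origin, cell_size):
    """Map a point to the (column, row) index of the grid cell that contains it."""
    return (int((point[0] - origin[0]) // cell_size),
            int((point[1] - origin[1]) // cell_size))

# Illustrative data only: two short track segments in planar coordinates.
trajectories = [
    [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)],
    [(1.0, 1.0), (1.0, 6.0)],
]
min_x, min_y, max_x, max_y = bounding_rectangle(trajectories)
cell_size = average_step(trajectories)  # the grid cell size adapts to the data
cells = [[to_cell(p, (min_x, min_y), cell_size) for p in tr] for tr in trajectories]
```

Because the cell size is derived from the data, densely sampled tracks yield a finer grid than sparsely sampled ones, which is presumably what "dynamic" grid division refers to.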

Further, the specific steps of step 1.2 are as follows:

step 1.21, sequentially reading data of each sampling point on each track segment;

step 1.22: comparing the longitude and latitude of the positions of every two adjacent sampling points on the track segment;

step 1.23: if the longitude and the latitude between two adjacent sampling points are unchanged, the middle position of the two sampling points is a breakpoint;

step 1.24: carrying out track segmentation on the original track segment according to the calculated positions of the breakpoints.
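The breakpoint rule of steps 1.21 to 1.24 can be sketched as follows. This is only an illustration under stated assumptions: each sampling point is a (timestamp, longitude, latitude) tuple, "unchanged" is taken to mean exact equality, and sub-tracks consisting of a single sampling point are discarded. The function name split_at_breakpoints is hypothetical.

```python
def split_at_breakpoints(track):
    """Split a track wherever two adjacent sampling points share the same position."""
    if not track:
        return []
    segments = []
    current = [track[0]]
    for prev, cur in zip(track, track[1:]):
        if prev[1:] == cur[1:]:
            # Position unchanged: a breakpoint lies between prev and cur.
            # Close the current sub-track and start a new one at cur.
            if len(current) > 1:
                segments.append(current)
            current = [cur]
        else:
            current.append(cur)
    if len(current) > 1:
        segments.append(current)
    return segments

# Illustrative data only.
track = [
    (0, 104.00, 30.00),
    (10, 104.01, 30.00),
    (20, 104.02, 30.00),
    (30, 104.02, 30.00),  # vehicle stationary between t=20 and t=30
    (40, 104.03, 30.01),
    (50, 104.04, 30.02),
]
sub_tracks = split_at_breakpoints(track)
```

With real GPS data a small tolerance would likely replace the exact comparison, since repeated fixes of a stationary vehicle rarely coincide exactly.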

Further, the specific steps of step 1.3 are as follows:

step 1.31, calculating the spatial similarity between the two sub-track segments, if the spatial similarity is not zero, calculating the time similarity between the two sub-track segments, otherwise, turning to step 1.33, wherein the formula for calculating the spatial similarity and the time similarity is as follows:

in the formula, L_c(TR_i,TR_j) represents the spatial or temporal cumulative length of the sub-track segments within the two tracks, L(TR_i) represents the total length of sub-track TR_i, L(TR_j) represents the total length of sub-track TR_j, L(TR_i)+L(TR_j)-L_c(TR_i,TR_j) represents the total length in space or time, i.e. the span, of the two sub-track segments, and Sim(TR_i,TR_j) represents the spatial or temporal similarity between the two sub-track segments;

step 1.32, if the time similarity is not zero, calculating the space-time similarity between the two sub-track segments, otherwise, turning to step 1.33, and calculating the space-time similarity according to the formula:

STSim(TR_i,TR_j) = SSim(TR_i,TR_j) × TSim(TR_i,TR_j);

in the formula, SSim(TR_i,TR_j) represents the spatial similarity between the two sub-track segments, TSim(TR_i,TR_j) represents the temporal similarity between the two sub-track segments, and STSim(TR_i,TR_j) represents the calculated space-time similarity of the two sub-track segments;

step 1.33, calculating the space-time distance between the two sub-track segments, wherein the calculation formula is as follows:

STDist(TR_i,TR_j) = 1 - STSim(TR_i,TR_j);

in the formula, STSim(TR_i,TR_j) represents the space-time similarity between the two sub-track segments, and STDist(TR_i,TR_j) represents the space-time distance between the two sub-track segments.
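The flow of steps 1.31 to 1.33 can be sketched as follows. The formula for Sim(TR_i,TR_j) is not reproduced in this text, so the sketch assumes, from the definitions of L_c and the span alone, that the similarity is the ratio of the cumulative length L_c to the span. It further assumes that STSim is zero whenever a step jumps directly to step 1.33. Both function names are hypothetical.

```python
def overlap_similarity(common_length, length_i, length_j):
    """Assumed form of Sim(TR_i, TR_j): cumulative length L_c divided by the span."""
    span = length_i + length_j - common_length
    return common_length / span if span > 0 else 0.0

def spatiotemporal_distance(spatial, temporal):
    """Steps 1.31 to 1.33; each argument is a tuple (L_c, L(TR_i), L(TR_j))."""
    ssim = overlap_similarity(*spatial)
    if ssim == 0.0:
        return 1.0               # step 1.31 skips the temporal computation entirely
    tsim = overlap_similarity(*temporal)
    stsim = ssim * tsim          # STSim = SSim x TSim (zero when TSim is zero)
    return 1.0 - stsim           # STDist = 1 - STSim

# Illustrative values: lengths in grid cells (spatial) and seconds (temporal).
d_overlapping = spatiotemporal_distance((4, 6, 6), (30, 60, 90))
d_disjoint = spatiotemporal_distance((0, 6, 6), (30, 60, 90))
```

The early return is where the efficiency gain would come from: pairs of sub-tracks with no spatial overlap never incur the temporal comparison.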

Further, the specific steps of step 2 are as follows:

step 2.1, calculating the neighbor scale change of the sampling points on each track segment according to the space-time similarity, the space-time distance, the neighbor scale evolution algorithm and the DBSCAN algorithm of the sub-tracks;

step 2.2, calculating the distance between each sampling point on the track segment and every other sampling point, denoting the largest of these distances as max and the smallest as min, and, if max is greater than 2 × min, assigning the sampling point to the oscillation object set, otherwise assigning it to the stable object set;

step 2.3, initializing the cluster number Cluster_id of the clusters in the stable object set and the oscillation object set to 1, and setting the default cluster number of the nodes in the stable object set to 0;

step 2.4, randomly selecting a core object v with a cluster number of 0 in the stable object set, and searching breadth-first for the object set Reach that is density-reachable from v;

step 2.5, searching for the core object set Core within the object set Reach, and finding the minimum cluster number Min_Cluster in the core object set Core;

step 2.6, if Min_Cluster is 0, marking the cluster numbers of the object set Reach and the core object v as Cluster_id; otherwise, searching for the object set Connect that is density-connected with the object set Reach and the core object v, and marking the cluster numbers of the object set Reach, the density-connected object set Connect and the core object v as Min_Cluster, thereby obtaining a cluster;

step 2.7, judging whether any core object v remains in the stable object set; if so, returning to step 2.3, otherwise all clusters have been obtained and step 2.8 is performed;

step 2.8, distinguishing boundary points from noise points in the oscillation object set Oscillation, and assigning the boundary points to the respective clusters.
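The split into a stable object set and an oscillation object set in step 2.2 can be illustrated with a minimal sketch. It assumes that the pairwise space-time distances are already available as a square matrix and that there are at least two objects. The function name partition_objects is hypothetical, and the sketch covers only the max > 2 × min test, not the neighbor scale evolution of step 2.1.

```python
def partition_objects(dist):
    """Split object indices into (stable, oscillation) using the max > 2 * min rule.

    dist is a square matrix (list of lists) of pairwise distances.
    """
    stable, oscillation = [], []
    for i, row in enumerate(dist):
        others = [d for j, d in enumerate(row) if j != i]
        if max(others) > 2 * min(others):
            oscillation.append(i)   # distances to other objects vary widely
        else:
            stable.append(i)        # distances to other objects are comparable
    return stable, oscillation

# Illustrative symmetric distance matrix for four objects.
dist = [
    [0.0, 0.4, 0.6, 0.5],
    [0.4, 0.0, 0.5, 0.45],
    [0.6, 0.5, 0.0, 0.1],
    [0.5, 0.45, 0.1, 0.0],
]
stable, oscillation = partition_objects(dist)
```

Clusters are then grown from core objects in the stable set only, and the oscillation set is resolved into boundary points and noise points afterwards, as step 2.8 describes.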

Further, the specific steps of step 3 are as follows:

step 3.1, counting the number n of clusters obtained by clustering and the number m of all track segments;

step 3.2, letting p = m/n;

step 3.3, if the number of track segments contained in a cluster obtained by clustering is greater than p, marking the cluster as a significant cluster, otherwise marking it as a non-significant cluster;

step 3.4: selecting a significant cluster C from the clustering results, and taking the starting points of the track segments contained in the significant cluster C as a point set K;

step 3.5, randomly selecting a breakpoint b from the point set K, and combining the other breakpoints with the breakpoint b in sequence to form an expandable point set Q; if an added breakpoint causes the radius of the minimum circumscribed circle of the point set Q to exceed a pre-specified threshold β, deleting that breakpoint from the point set Q;

step 3.6, after traversing all points of the point set K, if the number of breakpoints contained in the point set Q is greater than a threshold α, marking the point set Q as a staying spot;

step 3.7: repeating steps 3.5 to 3.6 until all candidate staying spots in the significant cluster C have been generated;

step 3.8: repeating steps 3.4 to 3.7 until all significant clusters have been traversed.
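Steps 3.1 to 3.7 can be sketched as follows, under several assumptions that are not in the text: m counts only track segments that belong to some cluster, the seed breakpoint is taken in input order rather than at random (for reproducibility), and the radius of the minimum circumscribed circle is replaced by a simpler upper bound, the largest distance from the centroid. All function names are hypothetical.

```python
import math

def significant_clusters(clusters):
    """Keep clusters holding more than p = m / n track segments (steps 3.1 to 3.3)."""
    n = len(clusters)
    m = sum(len(segments) for segments in clusters.values())
    p = m / n
    return {cid: segs for cid, segs in clusters.items() if len(segs) > p}

def enclosing_radius(points):
    """Upper bound on the minimum circumscribed circle radius: max distance from centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return max(math.hypot(x - cx, y - cy) for x, y in points)

def stay_spots(points, beta, alpha):
    """Greedily grow point sets Q of bounded radius (steps 3.5 to 3.7)."""
    remaining = list(points)
    spots = []
    while remaining:
        q = [remaining.pop(0)]          # seed breakpoint b (first remaining point)
        rest = []
        for pt in remaining:
            if enclosing_radius(q + [pt]) > beta:
                rest.append(pt)         # adding pt would make Q too large
            else:
                q.append(pt)
        remaining = rest
        if len(q) > alpha:              # enough breakpoints: Q is a staying spot
            spots.append(q)
    return spots

# Illustrative data only.
clusters = {1: ["a", "b", "c", "d", "e"], 2: ["f"], 3: ["g", "h", "i"]}
kept = significant_clusters(clusters)
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (50, 50)]
spots = stay_spots(points, beta=1.5, alpha=1)
```

An exact minimum enclosing circle algorithm could be substituted for enclosing_radius without changing the rest of the sketch; the centroid bound is merely conservative, so it never accepts a point set whose true radius exceeds β.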

Further, the specific steps of step 4 are as follows:

step 4.1, calculating the stay heat degree information corresponding to each stay spot, wherein the calculation method comprises the following steps:

wherein h_spot is the heat of the staying spot, n_subtra is the number of track segments contained in the staying spot, n_tra is the number of tracks contained in the staying spot, and β is a coefficient;

step 4.2, obtaining a high-heat-degree area of the staying spots from the staying heat-degree information;

step 4.3, obtaining a hot spot area in the road network according to the area where the high-heat staying spots are located.

In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:

1. the road hot spot area mining method provided by the invention combines the track space-time similarity measurement under the grid space with the optimized DBSCAN track clustering method, and better overcomes two defects: that traditional Euclidean coordinates cannot accurately express the distance between two tracks in a road network, and that traditional DBSCAN clustering requires related parameters to be entered manually in advance;

2. the method for representing the track sequence by adopting the grid space coordinates overcomes the problem that the space-time similarity between the tracks cannot be accurately calculated due to the deviation of the track sampling points caused by the network environment, sampling equipment and the like;

3. the method for mining the road hot spot area based on the vehicle track has the best effect on the track data with high sampling frequency, can save the storage space overhead of the track data, and can improve the execution efficiency of the whole system;

4. the method for obtaining the track space-time similarity by multiplying the track time similarity and the spatial similarity can greatly improve the calculation efficiency of calculating the track space-time similarity and has quicker response time.

Drawings

FIG. 1 is a sub-flow diagram of the computation of trajectory spatiotemporal similarity and spatiotemporal distance in the present invention;

FIG. 2 is a flow chart of a DBSCAN algorithm based on dynamic neighbor optimization in the present invention;

FIG. 3 is a sub-flowchart of the hot spot area mining based on clustering results in the present invention;

FIG. 4 shows the distribution of the significant clusters in step 5 of the present invention, wherein the road segments covered by the black areas are the significant cluster aggregation areas;

FIG. 5 is a diagram illustrating the distribution of hot spots in step 7, wherein the black spots are high heat spots, and the gray spots are normal heat spots.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The invention provides a road hot spot area mining method. Applying dynamic neighbor optimized DBSCAN clustering to vehicle tracks and calculating the heat of the staying spots improves the accuracy and effectiveness of road hot spot area mining. The dynamic neighbor-based DBSCAN clustering algorithm overcomes the defect that the clustering result is greatly influenced by manually entered parameter values, and calculating the heat information of the staying spots describes the distribution of the hot spot areas more accurately.

A road network hot spot area mining method comprises the following steps:

step 1, carrying out track segmentation on all track segments, and calculating the space-time similarity and space-time distance between two segmented sub-track segments; the method comprises the following specific steps:

step 1.1, dividing a dynamic grid space for a space area where all track sections are located; the method comprises the following specific steps:

Step 1.2, carrying out track segmentation on a track sequence in a grid space according to a breakpoint; the method comprises the following specific steps:

Step 1.3, calculating the space-time similarity and space-time distance between the two sub-track sections after the track segmentation.

Step 1.31, calculating the spatial similarity between the two sub-track segments, if the spatial similarity is not zero, calculating the time similarity between the two sub-track segments, otherwise, turning to step 1.33, wherein the formula for calculating the spatial similarity and the time similarity is as follows:

in the formula, L_c(TR_i,TR_j) represents the spatial or temporal cumulative length of the sub-track segments within the two tracks, L(TR_i) represents the total length of sub-track TR_i, L(TR_j) represents the total length of sub-track TR_j, L(TR_i)+L(TR_j)-L_c(TR_i,TR_j) represents the total length in space or time, i.e. the span, of the two sub-track segments, and Sim(TR_i,TR_j) represents the spatial or temporal similarity between the two sub-track segments;

STSim(TR_i,TR_j) = SSim(TR_i,TR_j) × TSim(TR_i,TR_j);

STDist(TR_i,TR_j) = 1 - STSim(TR_i,TR_j);

Step 2, performing clustering calculation on all track segment data in the grid space according to the space-time similarity and space-time distance of the sub-tracks and a dynamic neighbor-based DBSCAN algorithm; the method comprises the following specific steps:

step 2.4, randomly selecting a core object v with a cluster number of 0 in the stable object set, and searching breadth-first for the density-reachable object set Reach, wherein objects are density-reachable if a reachable path exists between them and unreachable otherwise;
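The breadth-first search for the density-reachable set in step 2.4 can be sketched as follows. The sketch assumes that each object's neighbor list and the set of core objects have already been determined. The text derives them from the dynamic neighbor scale, which is not reproduced here, so both inputs and the function name density_reachable are hypothetical.

```python
from collections import deque

def density_reachable(start, neighbors, core):
    """Breadth-first search for every object density-reachable from core object start."""
    reach = set()
    visited = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v not in core:
            continue              # non-core objects are reachable but do not extend a path
        for u in neighbors[v]:
            if u not in visited:
                visited.add(u)
                reach.add(u)
                queue.append(u)
    return reach

# Illustrative neighbor lists; "a" and "b" are assumed to be core objects.
neighbors = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
    "f": [],
}
reach = density_reachable("a", neighbors, core={"a", "b"})
```

Here "e" is not reached because the only path to it runs through "d", which is not a core object; this is the standard DBSCAN notion of a reachable path passing through core objects only.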

Step 3, selecting a significant cluster set from the cluster calculated by clustering, and extracting staying spots from the significant cluster set; the method comprises the following specific steps:

Step 4, obtaining a high-heat-degree area of the stay spots according to the number of the track sections carried by the stay spots, and obtaining a hot spot area in the road network in the area where the high-heat-degree stay spots are located; the method comprises the following specific steps:

wherein h_spot is the heat of the staying spot, n_subtra is the number of track segments contained in the staying spot, n_tra is the number of tracks contained in the staying spot, and β is a coefficient set during testing.

Any two track segments count as different track segments whether or not they are identical; however, two identical track segments count as the same track. That is, the number of track segments is greater than the number of tracks (if some track segments are identical) or equal to it (if no two track segments are identical).
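The distinction between n_subtra and n_tra can be illustrated with a small sketch. It assumes that each track segment is represented as a tuple of grid cells and that "identical" means equal cell sequences; the function name is hypothetical.

```python
def count_segments_and_tracks(segments):
    """Return (n_subtra, n_tra): all track segments versus distinct tracks."""
    n_subtra = len(segments)       # every segment counts, identical or not
    n_tra = len(set(segments))     # identical segments collapse into one track
    return n_subtra, n_tra

# Illustrative data only: segments as tuples of (column, row) grid cells.
segments = [
    ((0, 0), (0, 1), (1, 1)),
    ((0, 0), (0, 1), (1, 1)),   # identical to the first segment
    ((2, 2), (2, 3)),
]
n_subtra, n_tra = count_segments_and_tracks(segments)
```

In this example n_subtra is 3 and n_tra is 2, consistent with the statement that the number of track segments is never smaller than the number of tracks.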

Compared with the prior art, the road hot spot area mining method provided by the invention combines the track space-time similarity measurement under the grid space with the optimized DBSCAN track clustering method, and better overcomes two defects: that traditional Euclidean coordinates cannot accurately express the distance between two tracks in a road network, and that traditional DBSCAN clustering requires related parameters to be entered manually in advance. Meanwhile, representing the track sequence by grid space coordinates overcomes the problem that the space-time similarity between tracks cannot be accurately calculated when track sampling points deviate because of the network environment, sampling equipment and the like.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A road network hot spot region mining method is characterized by comprising the following steps:

step 4, obtaining a high-heat-degree area of the stay spots according to the number of the track sections carried by the stay spots, and obtaining a hot spot area in the road network in the area where the high-heat-degree stay spots are located;

the specific steps of the step 1 are as follows:

step 1.3, calculating the space-time similarity and space-time distance between two sub-track sections after track segmentation;

the specific steps of step 1.1 are as follows:

step 1.13, taking the average distance as the size of a grid cell, and performing dynamic grid space division on the minimum circumscribed rectangle;

the specific steps of step 1.2 are as follows:

step 1.24: carrying out track segmentation on the original track segment according to the calculated positions of all the breakpoints;

the specific steps of step 1.3 are as follows:

in the formula, L_c(TR_i,TR_j) represents the spatial or temporal cumulative length of the sub-track segments within the two tracks, L(TR_i) represents the total length of sub-track TR_i, L(TR_j) represents the total length of sub-track TR_j, L(TR_i)+L(TR_j)-L_c(TR_i,TR_j) represents the total length in space or time, i.e. the span, of the two sub-track segments, and Sim(TR_i,TR_j) represents the spatial or temporal similarity between the two sub-track segments;

step 1.32, if the time similarity is not zero, calculating the space-time similarity between the two sub-track segments, otherwise, turning to step 1.33, wherein the formula for calculating the space-time similarity is as follows:

STSim(TR_i,TR_j) = SSim(TR_i,TR_j) × TSim(TR_i,TR_j);

STDist(TR_i,TR_j) = 1 - STSim(TR_i,TR_j);

in the formula, STSim(TR_i,TR_j) represents the space-time similarity between the two sub-track segments, and STDist(TR_i,TR_j) represents the space-time distance between the two sub-track segments;

the specific steps of the step 2 are as follows:

step 2.8, distinguishing boundary points from noise points in the oscillation object set Oscillation, and assigning the boundary points to the respective clusters;

the specific steps of the step 3 are as follows:

step 3.8: repeating steps 3.4 to 3.7 until all significant clusters have been traversed;

the specific steps of the step 4 are as follows: