CN113516309A - OD flow direction clustering method based on multi-path graph cutting rule and ant colony optimization - Google Patents

Publication number
CN113516309A
Authority
CN
China
Prior art keywords
flow direction
node
clustering
connected components
pheromone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110782636.6A
Other languages
Chinese (zh)
Other versions
CN113516309B (en)
Inventor
邬群勇
张晗
朱秋圳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202110782636.6A priority Critical patent/CN113516309B/en
Publication of CN113516309A publication Critical patent/CN113516309A/en
Application granted granted Critical
Publication of CN113516309B publication Critical patent/CN113516309B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Software Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Complex Calculations (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an OD flow direction clustering method based on a multi-path graph cutting rule and ant colony optimization. The method uses the POIs at flow direction endpoints to construct a topic distribution model, calculates the flow direction spatiotemporal semantic similarity, constructs an undirected-graph complex network and an initial pheromone matrix, extracts all connected components of the network, and identifies the connected components to be clustered. These components are then clustered in a multi-process parallel mode based on the multi-path graph cutting rule and ant colony optimization, with one process clustering one connected component, and the clustering results of the processes are summarized to obtain the final clustering result. The invention organically combines the idea of an undirected-graph complex network with a clustering algorithm, simplifies the complex network by means of a Gaussian kernel function, and realizes automatic noise identification by means of the connected components of the graph. The invention improves the heuristic function based on the multi-path graph cutting rule and, drawing on complex network theory, screens the initial nodes of the ant colony by betweenness centrality, thereby effectively improving the clustering effect.

Description

OD flow direction clustering method based on multi-path graph cutting rule and ant colony optimization
Technical Field
The invention relates to the field of urban traffic data mining analysis, in particular to an OD flow direction clustering method based on a multi-path graph cutting rule and ant colony optimization.
Background
Traffic flow is an important component of the urban system. It contains rich latent information and reflects, to a certain extent, the spatial distribution patterns of a city, the associations between its regions, and the travel characteristics of its residents. Mining and analyzing the characteristics of traffic flow direction data is therefore of great significance for exploring the latent patterns of a city and for providing suggestions for urban management.
Clustering is one method for mining traffic flow direction data. It is an unsupervised learning technique that groups flow directions with the same characteristics into the same cluster, so as to discover the common and hidden characteristics of the data. Depending on the clustering object, OD flow direction clustering is divided into clustering of trajectory points and clustering of whole OD flow directions. Clustering of trajectory points mainly includes a spatial point clustering algorithm based on the number of shared neighbors and a spatial data point clustering algorithm based on a traffic grid. Clustering of whole OD flow directions mainly includes clustering methods based on scan statistics, arbitrary-shape flow clustering methods based on flow density domain decomposition, and flow clustering based on the minimum spanning tree and optimal segmentation. Common OD flow direction clustering methods usually ignore the attributes of the flow direction as a whole, leave room for improvement in the overall clustering effect, and lack a measure of flow direction semantic information.
Spectral clustering is based on graph theory: it converts the clustering problem into the problem of partitioning an undirected graph, and realizes clustering by optimizing a graph cutting criterion. At present, mainstream algorithms are based on iterative two-way partitioning; compared with the two-way cutting criterion, the multi-path graph cutting criterion fits real situations better and is finer-grained and more objective. The ant colony algorithm is a heuristic algorithm that simulates ant behavior: multiple independent ants cooperate through pheromone accumulation and thereby exhibit swarm intelligence. It features heuristic search, distributed computation and positive information feedback, can achieve global optimization of complex problems, and is of great significance for solving NP problems.
Disclosure of Invention
In view of this, the present invention aims to provide an OD flow direction clustering method based on a multi-path graph cutting criterion and ant colony optimization. The method improves the heuristic function based on the multi-path graph cutting criterion and, drawing on complex network theory, screens the initial nodes of the ant colony by betweenness centrality, thereby effectively improving the clustering effect.
The invention is realized by the following scheme: an OD flow direction clustering method based on a multi-path graph cutting criterion and ant colony optimization, comprising the following steps:
Step S1: removing duplicate values, erroneous values and meaningless values from the OD flow direction data and the POI (point of interest) data, constructing an OD flow direction database using MongoDB, and establishing a spatial index using 2dsphere indexes;
Step S2: establishing buffers around the flow direction endpoints, selecting the POI points within each buffer, calling the Python gensim library to construct a topic distribution model based on the POI data, calculating the semantic similarity on this basis, and calculating the spatial and temporal similarity of the OD flow directions, to finally obtain the flow direction spatiotemporal semantic similarity;
Step S3: constructing an initial undirected-graph complex network of OD flow directions based on the spatiotemporal semantic similarity, extracting all connected components, and using them to identify noise and the connected components to be clustered;
Step S4: designing an improved heuristic function based on the multi-path graph cutting criterion and, in combination with the positive feedback of the ant colony pheromone, clustering the connected components to be clustered in a multi-process parallel mode, with one process clustering one connected component;
Step S5: summarizing the clustering results of the processes in step S4 to obtain the final clustering result.
Further, step S2 specifically includes the following steps:
Step S21: establishing a circular buffer with a radius of 250 meters around each flow direction endpoint, and searching for all POI points within the buffer;
Step S22: for each flow direction, gathering the type field values of its POI points into one document, which is the POI semantic document of that flow direction; gathering the semantic documents of all flow directions to build a corpus; calling the Python gensim library to train an LDA topic distribution model on the corpus, obtaining the corpus-topic distribution; and inputting each flow direction's document into the model for prediction to obtain the topic-flow direction distribution, namely the topic probability distribution of the flow direction;
Step S23: calculating the JS divergence from the topic probability distributions and using it as the semantic similarity measure, the semantic similarity of flow direction i and flow direction j being calculated according to the following formulas:
Figure BDA0003157401760000031
Figure BDA0003157401760000032
where $P_i$ and $P_j$ are the topic probability distributions of flow directions i and j, respectively; $P_i(x)$ and $P_j(x)$ are the probability values of topic x in the topic distributions of flow directions i and j, respectively;
Step S24: calculating the spatial similarity $sim_{dis}$ and the temporal similarity $sim_t$ of the flow directions;
Step S25: mapping the semantic, temporal and spatial similarities of the flow directions with a Gaussian kernel function to obtain the OD flow direction spatiotemporal semantic similarity, calculated as follows:
Figure BDA0003157401760000041
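By way of illustration only, the JS divergence used in step S23 may be sketched in Python as follows. The sketch uses the standard definition of the Jensen-Shannon divergence (the average of two Kullback-Leibler divergences to the mid-point distribution, with natural logarithms). The function names `kl_divergence` and `js_divergence` and the sample distributions are assumptions introduced here and are not taken from the original formula images.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) of two discrete distributions."""
    # Terms with p(x) == 0 contribute nothing, by the usual convention.
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence of two topic probability distributions."""
    # Mid-point distribution M = (P + Q) / 2.
    m = [(px + qx) / 2 for px, qx in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topic distributions of two flow directions over three topics.
p_i = [0.7, 0.2, 0.1]
p_j = [0.1, 0.2, 0.7]
semantic_divergence = js_divergence(p_i, p_j)
```

A value of 0 indicates identical topic distributions; with natural logarithms the value never exceeds ln 2.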
Further, step S3 specifically includes the following steps:
Step S31: taking the flow directions as network nodes and the spatiotemporal semantic similarity between each pair of OD flow directions as the edge weight, building an undirected-graph complex network with NetworkX, and extracting all connected components;
Step S32: dividing the connected components by their number of nodes: components with fewer nodes than a threshold are classified as noise, and components with more nodes than the threshold are classified as connected components to be clustered.
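By way of illustration only, steps S31 and S32 may be sketched as follows. The method itself builds the network with NetworkX; to stay self-contained, the sketch instead extracts connected components with a plain breadth-first search over an adjacency dictionary. The function names, the sample graph and the threshold value of 2 are assumptions of this sketch, as is the choice to treat a component whose size equals the threshold as a component to be clustered.

```python
from collections import deque


def connected_components(adjacency):
    """Return the connected components of an undirected graph.

    `adjacency` maps each node to a dict {neighbor: edge weight}.
    """
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


def split_noise(components, threshold=2):
    """Separate noise components from the components to be clustered."""
    noise = [c for c in components if len(c) < threshold]
    # Assumption: a component with exactly `threshold` nodes is kept.
    to_cluster = [c for c in components if len(c) >= threshold]
    return noise, to_cluster


# Hypothetical network: flow direction id -> {neighbor id: similarity}.
graph = {
    0: {1: 0.9, 2: 0.8},
    1: {0: 0.9},
    2: {0: 0.8},
    3: {},
    4: {5: 0.7},
    5: {4: 0.7},
}
components = connected_components(graph)
noise, to_cluster = split_noise(components)
```

In this sample, the isolated flow direction 3 is identified as noise, while the other two components are passed on to the clustering step.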
Further, step S4 specifically includes the following steps:
Step S41: clustering in a multi-process parallel mode, with one process clustering one connected component and each process executing steps S42 to S48;
Step S42: establishing an initial pheromone matrix of dimension N × N, where N is the number of nodes of the connected component, with every initial pheromone value set to 1;
Step S43: calculating the betweenness centrality of each node in the connected component, sorting the nodes by centrality value, selecting the first K nodes as the initial positions of the ants, and starting the search; the centrality is calculated as follows:
Figure BDA0003157401760000042
where the symbol
Figure BDA0003157401760000043
denotes the number of shortest paths between s and t that pass through node i, and $g_{st}$ denotes the number of shortest paths connecting s and t;
Step S44: obtaining the nodes adjacent to the ant's current node and checking whether each of them is in the tabu list; a node that is not in the tabu list is added to the candidate node list;
Step S45: traversing the candidate node list, calculating the heuristic function $n_{ij}(t)$ improved based on the multi-path graph cutting criterion, and combining it with the pheromone $pher_{ij}(t)$ to calculate the selection probability of each candidate node, denoted by the symbol
Figure BDA0003157401760000051
At time t, the probability that ant k at node i selects node j as the next node, denoted by the symbol
Figure BDA0003157401760000052
is calculated as follows:
Figure BDA0003157401760000053
Figure BDA0003157401760000054
Figure BDA0003157401760000055
where $pher_{ij}(t)$ is the pheromone factor; $n_{ij}(t)$ is the heuristic function; $sim_{ij}$ is the flow direction spatiotemporal semantic similarity; MNCut(k,ij)(t) is the multi-path graph cutting factor when node j is selected at time t; cut(A(k), V-A(k)) is the sum of the weights of the edges between the nodes in class A(k) and the nodes in other classes; and assoc(A(k)) is the sum of the weights of the edges among all nodes within class A(k);
Step S46: selecting the next node by roulette-wheel selection according to the selection probability of each candidate node, and adding the selected node to the tabu list;
Step S47: updating the pheromone according to the nodes selected by the ants, the pheromone update equation being as follows:
Figure BDA0003157401760000056
where ε is the pheromone evaporation rate; the symbol
Figure BDA0003157401760000057
denotes the pheromone increment deposited by the k-th ant on edge ij at time t; and $C_k(t)$ is the spatiotemporal semantic similarity of the edge traversed by the k-th ant at time t;
Step S48: when no ant has any selectable node left, the current iteration ends and the next iteration begins; the iterations stop when the result converges or the maximum number of iterations is reached.
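By way of illustration only, the node selection of steps S44 to S46 may be sketched as follows. Because the original probability formulas are available only as images, the weight used here, pheromone raised to `alpha` times heuristic raised to `beta`, is the conventional ant colony form and is an assumption of this sketch; the function names, the exponents and the sample values are likewise hypothetical.

```python
import random


def candidate_nodes(neighbors, tabu):
    """Step S44: keep the adjacent nodes that are not yet in the tabu list."""
    return [node for node in neighbors if node not in tabu]


def selection_weights(candidates, pheromone, heuristic, alpha=1.0, beta=1.0):
    """Step S45 (assumed form): weight = pheromone^alpha * heuristic^beta."""
    return [pheromone[n] ** alpha * heuristic[n] ** beta for n in candidates]


def roulette_select(candidates, weights, rng):
    """Step S46: pick a candidate with probability proportional to its weight."""
    total = sum(weights)
    if total <= 0:
        return rng.choice(candidates)
    pick = rng.random() * total
    cumulative = 0.0
    for node, weight in zip(candidates, weights):
        cumulative += weight
        if pick < cumulative:
            return node
    return candidates[-1]  # guard against floating-point round-off


# Hypothetical state of one ant standing on node "a".
rng = random.Random(0)
neighbors = ["b", "c", "d"]
tabu = {"a", "c"}
pheromone = {"b": 1.0, "c": 1.0, "d": 1.0}
heuristic = {"b": 0.9, "c": 0.4, "d": 0.7}

candidates = candidate_nodes(neighbors, tabu)
weights = selection_weights(candidates, pheromone, heuristic)
chosen = roulette_select(candidates, weights, rng)
tabu.add(chosen)  # the selected node can no longer be visited by this ant
```

Dividing each weight by the sum of all candidate weights gives the selection probability; the roulette wheel samples from that distribution without computing the normalized values explicitly.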
Further, step S5 specifically includes the following steps:
Step S51: gathering the clustering results of the processes in step S4 and merging them into one result list;
Step S52: traversing the result list and assigning category numbers uniformly, so that no category number is repeated.
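By way of illustration only, steps S51 and S52 may be sketched as follows. Each process is assumed to return a dictionary that maps a flow direction identifier to a category number that is unique only within that process; the function name and the sample data are hypothetical.

```python
def merge_cluster_results(per_process_results):
    """Merge per-process labels into one result with unique category numbers."""
    merged = {}
    next_label = 0
    for result in per_process_results:
        # Map each process-local label to a fresh global label.
        local_to_global = {}
        for flow_id, local_label in result.items():
            if local_label not in local_to_global:
                local_to_global[local_label] = next_label
                next_label += 1
            merged[flow_id] = local_to_global[local_label]
    return merged


# Hypothetical output of two processes; both start numbering at 0.
results = [
    {"f1": 0, "f2": 0, "f3": 1},
    {"f4": 0, "f5": 1},
]
final_labels = merge_cluster_results(results)
```

Renumbering per process keeps category 0 of the first connected component distinct from category 0 of the second.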
Compared with the prior art, the invention has the following beneficial effects:
1. The method effectively extracts flow direction semantic information, calculates the spatiotemporal semantic similarity by combining it with the temporal and spatial similarities, and thus measures flow direction similarity more comprehensively.
2. The invention organically combines the idea of an undirected graph complex network with a clustering algorithm, simplifies the complex network by adopting a Gaussian kernel function, and realizes automatic noise identification by utilizing graph connected components.
3. The invention improves the heuristic function based on the multi-path graph cutting rule, screens the ant colony initial nodes by using the betweenness centrality based on the complex network thought, and effectively improves the clustering effect.
Drawings
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a part of connected components of the established complex network according to the embodiment of the present invention.
Fig. 3 is a diagram of the original flow directions according to an embodiment of the present invention.
Fig. 4 shows a partial clustering result according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
As shown in Fig. 1, the present embodiment provides an OD flow direction clustering method based on a multi-path graph cutting criterion and ant colony optimization, including the following steps:
Step S1: removing duplicate values, erroneous values and meaningless values from the OD flow direction data and the POI (point of interest) data, constructing an OD flow direction database using MongoDB, and establishing a spatial index using 2dsphere indexes;
Step S2: establishing buffers around the flow direction endpoints, selecting the POI points within each buffer, calling the Python gensim library to construct a topic distribution model based on the POI data, calculating the semantic similarity on this basis, and calculating the spatial and temporal similarity of the OD flow directions, to finally obtain the flow direction spatiotemporal semantic similarity;
Step S3: constructing an initial undirected-graph complex network of OD flow directions based on the spatiotemporal semantic similarity, extracting all connected components, and using them to identify noise and the connected components to be clustered;
Step S4: designing an improved heuristic function based on the multi-path graph cutting criterion and, in combination with the positive feedback of the ant colony pheromone, clustering the connected components to be clustered in a multi-process parallel mode, with one process clustering one connected component;
Step S5: summarizing the clustering results of the processes in step S4 to obtain the final clustering result.
In this embodiment, step S2 specifically includes the following steps:
Step S21: establishing a circular buffer with a radius of 250 meters around each flow direction endpoint, and searching for all POI points within the buffer;
Step S22: for each flow direction, gathering the type field values of its POI points into one document, which is the POI semantic document of that flow direction; gathering the semantic documents of all flow directions to build a corpus; calling the Python gensim library to train an LDA topic distribution model on the corpus, obtaining the corpus-topic distribution; and inputting each flow direction's document into the model for prediction to obtain the topic-flow direction distribution, namely the topic probability distribution of the flow direction;
Step S23: calculating the JS divergence from the topic probability distributions and using it as the semantic similarity measure, the semantic similarity of flow direction i and flow direction j being calculated according to the following formulas:
Figure BDA0003157401760000081
Figure BDA0003157401760000082
where $P_i$ and $P_j$ are the topic probability distributions of flow directions i and j, respectively; $P_i(x)$ and $P_j(x)$ are the probability values of topic x in the topic distributions of flow directions i and j, respectively;
Step S24: calculating the spatial similarity $sim_{dis}$ and the temporal similarity $sim_t$ of the flow directions;
Step S25: mapping the semantic, temporal and spatial similarities of the flow directions with a Gaussian kernel function to obtain the OD flow direction spatiotemporal semantic similarity, calculated as follows:
Figure BDA0003157401760000091
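By way of illustration only, a Gaussian kernel mapping of the kind named in step S25 may be sketched as follows. The original formula is available only as an image, so the concrete form used here is an assumption: three dissimilarity values (semantic, spatial, temporal) are combined as a squared Euclidean norm and mapped by exp(-d²/(2σ²)). The function name, the bandwidth `sigma` and the sample inputs are hypothetical.

```python
import math


def gaussian_kernel_similarity(d_sem, d_dis, d_t, sigma=1.0):
    """Map three dissimilarity values to a single weight in (0, 1].

    Assumed form: exp(-(d_sem^2 + d_dis^2 + d_t^2) / (2 * sigma^2)).
    """
    squared_distance = d_sem ** 2 + d_dis ** 2 + d_t ** 2
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Hypothetical dissimilarities of two pairs of flow directions.
close_pair = gaussian_kernel_similarity(0.1, 0.2, 0.1)
distant_pair = gaussian_kernel_similarity(0.6, 0.9, 0.8)
```

Under this assumed form, identical flow directions receive weight 1 and the weight decays quickly as any of the three dissimilarities grows, which is what allows weak edges to be distinguished from strong ones.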
In this embodiment, step S3 specifically includes the following steps:
Step S31: taking the flow directions as network nodes and the spatiotemporal semantic similarity between each pair of OD flow directions as the edge weight, building an undirected-graph complex network with NetworkX, and extracting all connected components;
Step S32: dividing the connected components by their number of nodes: components with fewer nodes than a threshold are classified as noise, and components with more nodes than the threshold are classified as connected components to be clustered. In the present embodiment, the threshold is 2.
In this embodiment, step S4 specifically includes the following steps:
Step S41: clustering in a multi-process parallel mode, with one process clustering one connected component and each process executing steps S42 to S48;
Step S42: establishing an initial pheromone matrix of dimension N × N, where N is the number of nodes of the connected component, with every initial pheromone value set to 1;
Step S43: calculating the betweenness centrality of each node in the connected component, sorting the nodes by centrality value, selecting the first K nodes as the initial positions of the ants, and starting the search; the centrality is calculated as follows:
Figure BDA0003157401760000092
where the symbol
Figure BDA0003157401760000101
denotes the number of shortest paths between s and t that pass through node i, and $g_{st}$ denotes the number of shortest paths connecting s and t;
Step S44: obtaining the nodes adjacent to the ant's current node and checking whether each of them is in the tabu list; a node that is not in the tabu list is added to the candidate node list;
Step S45: traversing the candidate node list, calculating the heuristic function $n_{ij}(t)$ improved based on the multi-path graph cutting criterion, and combining it with the pheromone $pher_{ij}(t)$ to calculate the selection probability of each candidate node, denoted by the symbol
Figure BDA0003157401760000102
At time t, the probability that ant k at node i selects node j as the next node, denoted by the symbol
Figure BDA0003157401760000103
is calculated as follows:
Figure BDA0003157401760000104
Figure BDA0003157401760000105
Figure BDA0003157401760000106
where $pher_{ij}(t)$ is the pheromone factor; $n_{ij}(t)$ is the heuristic function; $sim_{ij}$ is the flow direction spatiotemporal semantic similarity; MNCut(k,ij)(t) is the multi-path graph cutting factor when node j is selected at time t; cut(A(k), V-A(k)) is the sum of the weights of the edges between the nodes in class A(k) and the nodes in other classes; and assoc(A(k)) is the sum of the weights of the edges among all nodes within class A(k);
Step S46: selecting the next node by roulette-wheel selection according to the selection probability of each candidate node, and adding the selected node to the tabu list;
Step S47: updating the pheromone according to the nodes selected by the ants, the pheromone update equation being as follows:
Figure BDA0003157401760000111
where ε is the pheromone evaporation rate; the symbol
Figure BDA0003157401760000112
denotes the pheromone increment deposited by the k-th ant on edge ij at time t; and $C_k(t)$ is the spatiotemporal semantic similarity of the edge traversed by the k-th ant at time t;
Step S48: when no ant has any selectable node left, the current iteration ends and the next iteration begins; the iterations stop when the result converges or the maximum number of iterations is reached.
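By way of illustration only, the choice of the ants' initial positions in step S43 may be sketched as follows. The sketch computes betweenness centrality with Brandes' algorithm on an unweighted graph, counting each unordered pair (s, t) once; treating the network as unweighted (shortest paths measured in hops) is an assumption of this sketch, as are the function names, the sample graph and the value K = 2. With NetworkX, the function `networkx.betweenness_centrality` serves the same purpose.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness centrality of an undirected, unweighted graph."""
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        predecessors = {v: [] for v in adjacency}
        path_count = {v: 0 for v in adjacency}
        distance = {v: -1 for v in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies in reverse breadth-first order.
        dependency = {v: 0.0 for v in adjacency}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was visited from both ends, so halve the totals.
    return {v: c / 2 for v, c in centrality.items()}


def initial_ant_nodes(adjacency, k):
    """Return the k nodes with the highest betweenness centrality."""
    centrality = betweenness_centrality(adjacency)
    return sorted(centrality, key=centrality.get, reverse=True)[:k]


# Hypothetical connected component shaped as the path a-b-c-d-e.
component = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
centrality = betweenness_centrality(component)
start_nodes = initial_ant_nodes(component, k=2)
```

Starting the ants at high-betweenness nodes places them on the bridges between densely connected groups, which is where the choice of cut matters most.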
In this embodiment, step S5 specifically includes the following steps:
Step S51: gathering the clustering results of the processes in step S4 and merging them into one result list;
Step S52: traversing the result list and assigning category numbers uniformly, so that no category number is repeated.
Preferably, in this embodiment, a topic distribution model is constructed from the POIs at flow direction endpoints, the flow direction spatiotemporal semantic similarity is calculated, an undirected-graph complex network and an initial pheromone matrix are constructed, all connected components of the network are extracted, the connected components to be clustered are identified, and these components are clustered in parallel, in a multi-thread/multi-process mode, by an ant colony algorithm based on the multi-path graph cutting criterion, so that OD flow direction clustering is realized.
Preferably, in this embodiment, taking taxi OD flow direction data and POI data of Xiamen as an example, the relevant parameters of the OD flow direction clustering method based on the multi-path graph cutting criterion and ant colony optimization are shown in Table 1:
Figure BDA0003157401760000113
Figure BDA0003157401760000121
The method specifically comprises the following steps:
Step S1: removing duplicate values, erroneous values and meaningless values from the OD flow direction data and the POI (point of interest) data, constructing an OD flow direction database using MongoDB, and establishing a spatial index using 2dsphere indexes;
Step S2: establishing buffers around the flow direction endpoints, selecting the POI points within each buffer, calling the Python gensim library to construct a topic distribution model based on the POI data, calculating the semantic similarity on this basis, and calculating the spatial and temporal similarity of the OD flow directions, to finally obtain the flow direction spatiotemporal semantic similarity;
Step S3: constructing an initial undirected-graph complex network of OD flow directions based on the spatiotemporal semantic similarity, extracting all connected components, and using them to identify noise and the connected components to be clustered;
Step S4: designing an improved heuristic function based on the multi-path graph cutting criterion and, in combination with the positive feedback of the ant colony pheromone, clustering the connected components to be clustered in a multi-process parallel mode, with one process clustering one connected component;
Step S5: summarizing the clustering results of the processes in step S4 to obtain the final clustering result.
Step S2 includes the following steps:
Step S21: establishing a circular buffer with a radius of 250 meters around each flow direction endpoint, and searching for all POI points within the buffer.
Step S22: according to the flow direction number, gathering the type field values of the POI points of each flow direction into one document, which is the POI semantic document of that flow direction; gathering the semantic documents of all flow directions to build a corpus; training an LDA topic distribution model on the corpus to obtain the corpus-topic distribution of the Xiamen OD flow directions; and then inputting each flow direction's document into the model for prediction to obtain the topic-flow direction distribution, namely the topic probability distribution of the Xiamen OD flow directions. In this embodiment the model is trained according to the characteristics of the data, yielding ten relatively representative travel topics, such as commuting to work, tourism, shopping, leisure and entertainment, and medical visits, as shown in the following table:
Figure BDA0003157401760000131
Step S23: calculating the JS divergence from the topic probability distributions and using it as the semantic similarity measure, the semantic similarity of flow direction i and flow direction j being calculated according to the following formulas:
Figure BDA0003157401760000132
Figure BDA0003157401760000133
where $P_i$ and $P_j$ are the topic probability distributions of flow directions i and j, respectively; $P_i(x)$ and $P_j(x)$ are the probability values of topic x in the topic distributions of flow directions i and j, respectively;
Step S24: the spatial similarity $sim_{dis}$ and the temporal similarity $sim_t$ of the flow directions are calculated with reference to methods in the prior art;
Step S25: mapping the semantic, temporal and spatial similarities of the flow directions with a Gaussian kernel function to obtain the OD flow direction spatiotemporal semantic similarity, calculated as follows:
Figure BDA0003157401760000141
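By way of illustration only, the buffer search of step S21 and the document building of step S22 may be sketched as follows. The method itself queries a MongoDB 2dsphere index; to stay self-contained, the sketch instead filters an in-memory list by great-circle (haversine) distance. The function names, the mean Earth radius constant, the POI records and all coordinates are assumptions made up for this sketch.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pois_in_buffer(endpoint, pois, radius_m=250.0):
    """Step S21: return the POIs within `radius_m` meters of the endpoint."""
    lon, lat = endpoint
    return [
        poi for poi in pois
        if haversine_m(lon, lat, poi["lon"], poi["lat"]) <= radius_m
    ]


# Hypothetical flow direction endpoint and POI records.
endpoint = (118.0894, 24.4798)
pois = [
    {"type": "shopping mall", "lon": 118.0894, "lat": 24.4808},  # about 111 m away
    {"type": "restaurant", "lon": 118.0914, "lat": 24.4798},     # about 202 m away
    {"type": "park", "lon": 118.0894, "lat": 24.4898},           # about 1.1 km away
]

# Step S22: the POI type values inside the buffer form the semantic document.
semantic_document = [poi["type"] for poi in pois_in_buffer(endpoint, pois)]
```

The resulting list of type values is the kind of tokenized document from which a topic model corpus can be built.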
As shown in Fig. 2, the specific steps of step S3 are:
Step S31: taking the flow directions as network nodes and the spatiotemporal semantic similarity between each pair of OD flow directions as the edge weight, building an undirected-graph complex network with NetworkX, and computing all connected components.
Step S32: dividing the connected components by their number of nodes: components with fewer nodes than the threshold of 2 are classified as noise, and components with more nodes than the threshold are classified as connected components to be clustered.
As shown in Figs. 3 and 4, step S4 specifically includes the following steps:
Step S41: clustering in a multi-process parallel mode, with one process clustering one connected component and each process executing steps S42 to S48;
Step S42: establishing an initial pheromone matrix of dimension N × N, where N is the number of nodes of the connected component, with every initial pheromone value set to 1;
Step S43: calculating the betweenness centrality of each node in the connected component, sorting the nodes by centrality value, selecting the first K nodes as the initial positions of the ants, and starting the search; the centrality is calculated as follows:
Figure BDA0003157401760000151
where the symbol
Figure BDA0003157401760000152
denotes the number of shortest paths between s and t that pass through node i, and $g_{st}$ denotes the number of shortest paths connecting s and t;
step S44, acquiring the adjacent node of the ant, judging whether the adjacent node is in the taboo list, if not, adding the node into the alternative node list;
step S45, traversing the alternative node list, calculating the heuristic function n_ij(t) improved based on the multi-path graph cutting criterion, and combining it with the pheromone pher_ij(t) to calculate the probability of each node being selected
Figure BDA0003157401760000153
The probability that ant k at node i selects node j as the next node at time t,
Figure BDA0003157401760000154
is calculated as follows:
Figure BDA0003157401760000155
Figure BDA0003157401760000156
Figure BDA0003157401760000157
wherein pher_ij(t) is the pheromone factor; n_ij(t) is the heuristic function; sim_ij is the flow direction space-time semantic similarity; MNCut_(k,ij)(t) is the multi-path graph cutting factor for selecting node j at time t; Cut(A^(k), V-A^(k)) is the sum of the weights of the edges between the nodes in class A^(k) and the nodes in other classes; assoc(A^(k)) is the sum of the weights of the edges among all nodes within class A^(k);
step S46, selecting the next node by adopting a roulette mode according to the selection probability of each alternative node, and adding the selected node into a taboo table;
step S47, updating the pheromone according to the nodes selected by the ants, wherein the pheromone update equation is as follows:
Figure BDA0003157401760000161
wherein ε is the evaporation rate of the pheromone;
Figure BDA0003157401760000162
is the pheromone increment deposited by the kth ant on edge ij at time t; C_k(t) is the space-time semantic similarity of the edge traversed by the kth ant at time t;
step S48, when no ant has a selectable node left, the current iteration ends and the next iteration begins; iteration stops when the result converges or the maximum number of iterations is reached. Part of the trip clustering results is shown in fig. 4.
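The search loop of steps S44 to S47 can be sketched in plain Python as below. This is an illustration under stated assumptions, not the patented formulas, which appear only as figures that are not reproduced here: the selection score is taken to be the usual ant colony product of pheromone and heuristic with hypothetical exponents `alpha` and `beta`, the heuristic is passed in as a callable (the usage example substitutes the raw similarity and omits the MNCut factor), and the update uses the standard evaporate-then-deposit form with the edge similarity as the deposit. Nodes are assumed to be integer indices 0..N-1.

```python
import random


def selection_probabilities(current, candidates, pher, heuristic, alpha=1.0, beta=1.0):
    """Normalize pher^alpha * heuristic^beta over the candidate nodes."""
    scores = [pher[current][j] ** alpha * heuristic(current, j) ** beta
              for j in candidates]
    total = sum(scores)
    if total == 0:  # all scores vanish: fall back to a uniform choice
        return [1.0 / len(candidates)] * len(candidates)
    return [s / total for s in scores]


def roulette_select(candidates, probabilities, rng=random):
    """Roulette-wheel selection: pick a node in proportion to its probability."""
    r = rng.random()
    cumulative = 0.0
    for node, p in zip(candidates, probabilities):
        cumulative += p
        if r < cumulative:
            return node
    return candidates[-1]  # guard against floating-point round-off


def ant_walk(start, neighbors, pher, heuristic, rng=random):
    """Move one ant until no neighbor outside its tabu list remains."""
    tabu = [start]
    current = start
    while True:
        candidates = [j for j in neighbors[current] if j not in tabu]
        if not candidates:
            return tabu
        probs = selection_probabilities(current, candidates, pher, heuristic)
        current = roulette_select(candidates, probs, rng)
        tabu.append(current)


def update_pheromone(pher, walks, sim, evaporation=0.1):
    """Evaporate all pheromone, then deposit edge similarity along each walk."""
    n = len(pher)
    for i in range(n):
        for j in range(n):
            pher[i][j] *= 1.0 - evaporation
    for walk in walks:
        for i, j in zip(walk, walk[1:]):
            pher[i][j] += sim[i][j]
            pher[j][i] += sim[j][i]  # keep the matrix symmetric (undirected graph)


# Hypothetical three-node component with pairwise similarities.
sim = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.8], [0.1, 0.8, 0.0]]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
pher = [[1.0] * 3 for _ in range(3)]  # initial pheromone value 1, as in step S42
walk = ant_walk(0, neighbors, pher, lambda i, j: sim[i][j], random.Random(0))
update_pheromone(pher, [walk], sim)
```

Passing the heuristic as a callable keeps the sketch independent of the exact multi-path graph cutting factor, which can be swapped in once its formula is available.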
Step S5 specifically includes:
step S51, summarizing the results obtained by the clustering of each process in step S4 and combining them into a result list.
step S52, traversing the result list and outputting the category numbers in a unified way so that no category number is repeated.
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.
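The merging described in steps S51 and S52 can be illustrated as follows. This is a minimal sketch under assumptions: each process is taken to return a dictionary mapping a flow identifier to a category number that is only unique within that process, and the function name `merge_cluster_results` and the sample data are hypothetical.

```python
def merge_cluster_results(per_process_results):
    """Merge per-component label dicts and give every cluster a unique number."""
    merged = {}
    next_label = 0
    for result in per_process_results:
        # Map this process's local category numbers to fresh global numbers.
        local_to_global = {}
        for flow_id, local_label in result.items():
            if local_label not in local_to_global:
                local_to_global[local_label] = next_label
                next_label += 1
            merged[flow_id] = local_to_global[local_label]
    return merged


# Two processes both used category numbers 0 and 1 for different clusters.
results = [{"f1": 0, "f2": 0, "f3": 1}, {"f7": 0, "f8": 1}]
merged = merge_cluster_results(results)
```

After merging, the clusters from the second process are renumbered so that they no longer collide with those from the first.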

Claims (5)

1. An OD flow direction clustering method based on a multi-path graph cutting criterion and ant colony optimization, characterized in that the method comprises the following steps:
step S1, removing repeated values, error values and meaningless values in the OD flow direction data and the POI data, constructing an OD flow direction library by using MongoDB, and establishing a spatial index by using 2dsphere indexes;
step S2, establishing a flow direction end point buffer area, selecting POI points in the buffer area, calling the Python gensim tool library to construct a topic distribution model based on the POI data, calculating semantic similarity on this basis, and calculating OD flow direction space and time similarity to finally obtain the flow direction space-time semantic similarity;
step S3, constructing an initial OD flow direction undirected complex network based on the space-time semantic similarity, extracting all connected components, and using the connected components to identify noise and the connected components to be clustered;
step S4, designing an improved heuristic function based on the multi-path graph cutting criterion and, in combination with the positive feedback of the ant colony pheromone, clustering the connected components to be clustered in a multi-process parallel mode in which each process clusters one connected component;
step S5, summarizing the clustering results of the processes in step S4 to obtain the final clustering result.
2. The OD flow direction clustering method based on the multi-path graph cutting criterion and ant colony optimization according to claim 1, characterized in that: the step S2 specifically includes the following steps:
step S21: establishing a circular buffer area with a radius of 250 meters around the flow direction end point, and searching for all POI points in the buffer area;
step S22: merging the type field values of the POI points corresponding to each flow direction into one document, which is the POI semantic document of that flow direction; collecting the semantic documents of all flow directions to establish a corpus; calling the Python gensim tool library to train an LDA topic distribution model on the corpus, obtaining the corpus-topic distribution; and inputting the document of each flow direction into the model for prediction to obtain the topic-flow direction distribution, namely the flow direction topic probability distribution;
step S23: calculating the JS divergence according to the topic probability distributions, taking the JS divergence as the semantic similarity measure, and calculating the semantic similarity of flow direction i and flow direction j according to the following formula:
Figure FDA0003157401750000021
Figure FDA0003157401750000022
wherein P_i and P_j are the topic probability distributions of flow directions i and j, respectively; P_i(x) and P_j(x) are the probability values of topic x in flow directions i and j, respectively;
step S24, calculating the spatial similarity sim_dis and the time similarity sim_t of the flow directions;
Step S25, mapping the flow direction semantics, time and space similarity by using a Gaussian kernel function to obtain the OD flow direction space-time semantics similarity, wherein the calculation formula is as follows:
Figure FDA0003157401750000023
3. The OD flow direction clustering method based on the multi-path graph cutting criterion and ant colony optimization according to claim 1, characterized in that: the step S3 specifically includes the following steps:
step S31, taking each flow direction as a network node and the OD flow direction space-time semantic similarity between each pair of flow directions as the weight of the edge between them, using NetworkX to establish an undirected complex network, and extracting all connected components;
step S32, dividing the connected components according to their number of nodes: components with fewer nodes than a threshold are classified as noise, and components with more nodes than the threshold are classified as connected components to be clustered.
4. The OD flow direction clustering method based on the multi-path graph cutting criterion and ant colony optimization according to claim 1, characterized in that: the step S4 specifically includes the following steps:
step S41, adopting a multi-process parallel mode in which each process clusters one connected component, each process executing steps S42 to S48;
step S42, establishing an initial pheromone matrix of dimension N x N, where N is the number of nodes of the connected component, wherein the initial pheromone value is 1;
step S43, calculating the betweenness centrality of each node in the connected components, sorting according to the centrality value, selecting the first K points as the initial positions of ants, and starting to search; the centrality calculation formula is as follows:
Figure FDA0003157401750000031
wherein
Figure FDA0003157401750000032
represents the number of shortest paths between s and t that pass through node i; g_st represents the number of shortest paths connecting s and t;
step S44, acquiring the adjacent node of the ant, judging whether the adjacent node is in the taboo list, if not, adding the node into the alternative node list;
step S45, traversing the alternative node list, calculating the heuristic function n_ij(t) improved based on the multi-path graph cutting criterion, and combining it with the pheromone pher_ij(t) to calculate the probability of each node being selected
Figure FDA0003157401750000033
The probability that ant k at node i selects node j as the next node at time t,
Figure FDA0003157401750000034
is calculated as follows:
Figure FDA0003157401750000041
Figure FDA0003157401750000042
Figure FDA0003157401750000043
wherein pher_ij(t) is the pheromone factor; n_ij(t) is the heuristic function; sim_ij is the flow direction space-time semantic similarity; MNCut_(k,ij)(t) is the multi-path graph cutting factor for selecting node j at time t; Cut(A^(k), V-A^(k)) is the sum of the weights of the edges between the nodes in class A^(k) and the nodes in other classes; assoc(A^(k)) is the sum of the weights of the edges among all nodes within class A^(k);
step S46, selecting the next node by adopting a roulette mode according to the selection probability of each alternative node, and adding the selected node into a taboo table;
step S47, updating the pheromone according to the nodes selected by the ants, wherein the pheromone update equation is as follows:
Figure FDA0003157401750000044
wherein ε is the evaporation rate of the pheromone;
Figure FDA0003157401750000045
is the pheromone increment deposited by the kth ant on edge ij at time t; C_k(t) is the space-time semantic similarity of the edge traversed by the kth ant at time t;
step S48, when no ant has a selectable node left, the current iteration ends and the next iteration begins; iteration stops when the result converges or the maximum number of iterations is reached.
5. The OD flow direction clustering method based on the multi-path graph cutting criterion and ant colony optimization according to claim 1, characterized in that: the step S5 specifically includes the following steps:
step S51, summarizing the results obtained by the clustering of each process in step S4 and combining them into a result list;
step S52, traversing the result list and outputting the category numbers in a unified way so that no category number is repeated.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110782636.6A CN113516309B (en) 2021-07-12 2021-07-12 OD flow direction clustering method based on multipath graph cutting criterion and ant colony optimization

Publications (2)

Publication Number Publication Date
CN113516309A true CN113516309A (en) 2021-10-19
CN113516309B CN113516309B (en) 2023-08-11

Family

ID=78066961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110782636.6A Active CN113516309B (en) 2021-07-12 2021-07-12 OD flow direction clustering method based on multipath graph cutting criterion and ant colony optimization

Country Status (1)

Country Link
CN (1) CN113516309B (en)

Also Published As

Publication number Publication date
CN113516309B (en) 2023-08-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant