CN113516309A - OD flow direction clustering method based on multi-path graph cutting rule and ant colony optimization - Google Patents
- Publication number
- CN113516309A (application number CN202110782636.6A)
- Authority
- CN
- China
- Prior art keywords
- flow direction
- node
- clustering
- connected components
- pheromone
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Business, Economics & Management (AREA)
- General Engineering & Computer Science (AREA)
- Marketing (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Development Economics (AREA)
- Software Systems (AREA)
- Entrepreneurship & Innovation (AREA)
- Evolutionary Biology (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Game Theory and Decision Science (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Educational Administration (AREA)
- Primary Health Care (AREA)
- Complex Calculations (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to an OD flow direction clustering method based on a multi-path graph cutting criterion and ant colony optimization. POIs at the flow direction end points are used to construct a topic distribution model, and the spatiotemporal semantic similarity between flow directions is calculated. An undirected-graph complex network and an initial pheromone matrix are constructed, all connected components of the network are extracted, and the connected components to be clustered are identified. These components are then clustered in parallel by multiple processes, one connected component per process, using the multi-path graph cutting criterion and ant colony optimization, and the clustering results of all processes are merged to obtain the final clustering result. The invention combines the idea of an undirected-graph complex network with a clustering algorithm, simplifies the complex network with a Gaussian kernel function, and identifies noise automatically by means of the graph's connected components. It also improves the heuristic function based on the multi-path graph cutting criterion and, drawing on complex network ideas, screens the initial nodes of the ant colony by betweenness centrality, which improves the clustering effect.
Description
Technical Field
The invention relates to the field of urban traffic data mining and analysis, and in particular to an OD flow direction clustering method based on a multi-path graph cutting criterion and ant colony optimization.
Background
Traffic flow is an important component of the urban system. It contains rich latent information and reflects, to a certain extent, a city's spatial distribution patterns, regional association characteristics, and residents' travel characteristics. Mining and analyzing the characteristics of traffic flow direction data is therefore of great significance for exploring latent urban patterns and for informing city management.
Clustering is one method for mining traffic flow direction data. It is an unsupervised learning approach that groups flow directions with similar characteristics into the same cluster in order to discover common and hidden characteristics of the data. Depending on the clustering object, OD flow direction clustering divides into clustering of trajectory points and clustering of OD flow directions as a whole. Clustering of trajectory points mainly includes spatial point clustering based on the number of shared neighbors and spatial data point clustering based on a traffic grid. Clustering of OD flow directions as a whole mainly includes clustering methods based on scan statistics, arbitrary-shape flow clustering based on flow density domain decomposition, and flow clustering based on minimum spanning trees and optimal segmentation. Common OD flow direction clustering methods usually ignore the attributes of the flow direction as a whole, their overall clustering effect leaves room for improvement, and they lack a measure of flow direction semantic information.
The spectral clustering algorithm is based on graph theory: it converts the clustering problem into the partitioning of an undirected graph and performs clustering by optimizing a graph cutting criterion. Current mainstream algorithms are based on iterative two-way partitioning of the graph; compared with the two-way cutting criterion, the multi-path graph cutting criterion is closer to the actual situation and is more detailed and objective. The ant colony algorithm is a heuristic algorithm that simulates ant behavior. Multiple independent ants cooperate through pheromone accumulation and thereby exhibit swarm intelligence. The algorithm features heuristic search, distributed computation, and positive information feedback, can achieve global optimization of complex problems, and is of great significance for solving NP problems.
Disclosure of Invention
In view of this, the present invention aims to provide an OD flow direction clustering method based on a multi-path graph cutting criterion and ant colony optimization. The method improves the heuristic function based on the multi-path graph cutting criterion and, drawing on complex network ideas, screens the initial nodes of the ant colony by betweenness centrality, thereby improving the clustering effect.
The invention is realized by the following scheme: an OD flow direction clustering method based on a multi-path graph cutting criterion and ant colony optimization, comprising the following steps:
Step S1: removing duplicate values, erroneous values, and meaningless values from the OD flow direction data and the POI (point of interest) data, constructing an OD flow direction library with MongoDB, and establishing a spatial index with a 2dsphere index;
Step S2: establishing a buffer around each flow direction end point, selecting the POI points inside the buffer, calling the Python gensim library to construct a topic distribution model from the POI data, calculating semantic similarity on that basis, and calculating the spatial and temporal similarity of the OD flow directions, to finally obtain the spatiotemporal semantic similarity of the flow directions;
Step S3: constructing an initial undirected-graph complex network of OD flow directions based on the spatiotemporal semantic similarity, extracting all connected components, and using the connected components to identify noise and the connected components to be clustered;
Step S4: designing an improved heuristic function based on the multi-path graph cutting criterion and, in combination with the positive feedback of ant colony pheromones, clustering the connected components to be clustered in parallel with multiple processes, one process per connected component;
Step S5: merging the clustering results of the processes in step S4 to obtain the final clustering result.
Further, step S2 specifically includes the following steps:
Step S21: establishing a circular buffer with a radius of 250 meters around the flow direction end point, and searching for all POI points inside the buffer;
Step S22: collecting the type field values of the POI points corresponding to each flow direction into one document, which is the POI semantic document of that flow direction; gathering the semantic documents of all flow directions into a corpus; calling the Python gensim library to train an LDA topic distribution model on the corpus, which yields the corpus-topic distribution; and inputting the flow direction corpus into the model for prediction to obtain the topic-flow direction distribution, that is, the topic probability distribution of each flow direction;
Step S23: calculating the JS divergence from the topic probability distributions and taking it as the semantic similarity measure; the semantic similarity of flow direction i and flow direction j is calculated according to the following formula:
$$JS(P_i \| P_j) = \frac{1}{2}\sum_{x} P_i(x)\log\frac{2P_i(x)}{P_i(x)+P_j(x)} + \frac{1}{2}\sum_{x} P_j(x)\log\frac{2P_j(x)}{P_i(x)+P_j(x)}$$
where $P_i$ and $P_j$ are the topic probability distributions of flow directions i and j, respectively, and $P_i(x)$ and $P_j(x)$ are the probabilities of topic x in flow directions i and j, respectively;
Step S24: calculating the spatial similarity $sim_{dis}$ and the temporal similarity $sim_{t}$ of the flow directions;
Step S25: mapping the semantic, temporal, and spatial similarities of the flow directions with a Gaussian kernel function to obtain the spatiotemporal semantic similarity of the OD flow directions, wherein the calculation formula is as follows:
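As an illustration of step S23 only, the JS divergence between two topic probability distributions can be sketched in plain Python. The function name `js_divergence`, the natural logarithm, and the sample distributions are assumptions made for this sketch; the source does not state the logarithm base.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence of two discrete distributions (natural log)."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    total = 0.0
    for pi, qi in zip(p, q):
        m = 0.5 * (pi + qi)  # mixture distribution value for this topic
        # Zero-probability terms contribute nothing (0 * log 0 is taken as 0).
        if pi > 0:
            total += 0.5 * pi * math.log(pi / m)
        if qi > 0:
            total += 0.5 * qi * math.log(qi / m)
    return total

# Hypothetical topic distributions of two flow directions over three topics.
p_i = [0.7, 0.2, 0.1]
p_j = [0.1, 0.2, 0.7]
print(js_divergence(p_i, p_i))
print(js_divergence(p_i, p_j))
```

The value is 0 for identical distributions and is bounded above by log 2 with the natural logarithm, so smaller values indicate flow directions that are semantically closer.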
Further, step S3 specifically includes the following steps:
Step S31: taking each flow direction as a network node and the spatiotemporal semantic similarity between each pair of OD flow directions as the edge weight, building an undirected-graph complex network with NetworkX, and extracting all connected components;
Step S32: dividing the connected components by their number of nodes: components with fewer nodes than a threshold are classified as noise, and components with more nodes than the threshold are classified as connected components to be clustered.
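Steps S31 and S32 can be sketched without external dependencies as follows. The source builds the network with NetworkX; this sketch swaps in a plain breadth-first search so that it is self-contained. The function names, the sample graph, and the handling of a component whose size equals the threshold (treated here as "to be clustered") are assumptions, since the text only covers the "less than" and "larger than" cases.

```python
from collections import deque

def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return components

def split_noise(components, threshold=2):
    """Assumption: fewer than `threshold` nodes is noise, anything else is clustered."""
    noise = [c for c in components if len(c) < threshold]
    to_cluster = [c for c in components if len(c) >= threshold]
    return noise, to_cluster

# Hypothetical flow direction ids; only f1, f2, f3 are linked by edges.
nodes = ["f1", "f2", "f3", "f4", "f5"]
edges = [("f1", "f2"), ("f2", "f3")]
noise, to_cluster = split_noise(connected_components(nodes, edges))
print(noise)
print(to_cluster)
```

Because isolated flow directions form single-node components, they fall out as noise without any separate outlier test.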
Further, step S4 specifically includes the following steps:
Step S41: clustering in parallel with multiple processes, one connected component per process, with each process executing steps S42 to S48;
Step S42: establishing an initial pheromone matrix of dimension N × N according to the number of nodes N of the connected component, with every initial pheromone value equal to 1;
Step S43: calculating the betweenness centrality of each node in the connected component, sorting the nodes by centrality value, selecting the first K nodes as the initial positions of the ants, and starting the search; the betweenness centrality is calculated as follows:
$$BC_i = \sum_{s \neq i \neq t} \frac{n_{st}^{i}}{g_{st}}$$
where $n_{st}^{i}$ is the number of shortest paths between s and t that pass through node i, and $g_{st}$ is the number of shortest paths connecting s and t;
Step S44: obtaining the neighboring nodes of the ant and checking whether each neighbor is in the tabu list; if it is not, adding it to the candidate node list;
Step S45: traversing the candidate node list, calculating the heuristic function $\eta_{ij}(t)$ improved on the basis of the multi-path graph cutting criterion, and combining it with the pheromone $pher_{ij}(t)$ to calculate the probability of each node being selected; the probability that ant k at node i selects node j as the next node at time t is calculated as follows:
where $pher_{ij}(t)$ is the pheromone factor; $\eta_{ij}(t)$ is the heuristic function; $sim_{ij}$ is the spatiotemporal semantic similarity of the flow directions; $MNCut_{(k,ij)}(t)$ is the multi-path graph cutting factor for selecting node j at time t; $cut(A^{(k)}, V-A^{(k)})$ is the sum of the weights of the edges between nodes in class $A^{(k)}$ and nodes in other classes; and $assoc(A^{(k)})$ is the sum of the weights of the edges among all nodes within class $A^{(k)}$;
Step S46: selecting the next node by roulette wheel selection according to the selection probability of each candidate node, and adding the selected node to the tabu list;
Step S47: updating the pheromone according to the nodes selected by the ants, wherein the pheromone update equation is as follows:
where $\varepsilon$ is the pheromone evaporation rate; the increment term is the increase in pheromone concentration contributed by the k-th ant on edge ij at time t; and $C_k(t)$ is the spatiotemporal semantic similarity of the edge traversed by the k-th ant at time t;
Step S48: when no ant has a candidate node left, the current iteration ends and the next iteration begins; iteration stops when the result converges or the maximum number of iterations is reached.
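The node selection of steps S44 to S46 can be sketched as below. The selection probability formula itself is not reproduced in this text, so the `score` callback is a hypothetical stand-in for the combined pheromone and heuristic value of an edge; the function names and sample data are likewise assumptions.

```python
import random

def roulette_select(candidates, weights, rng):
    """Pick one candidate with probability proportional to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    threshold = rng.random() * total
    cumulative = 0.0
    for node, weight in zip(candidates, weights):
        cumulative += weight
        if threshold < cumulative:
            return node
    return candidates[-1]  # guards against floating-point round-off

def next_node(current, neighbors, tabu, score, rng):
    """Filter neighbors by the tabu list, spin the wheel, record the choice."""
    candidates = [n for n in neighbors[current] if n not in tabu]
    if not candidates:
        return None  # the ant has no candidate node left
    weights = [score(current, n) for n in candidates]
    chosen = roulette_select(candidates, weights, rng)
    tabu.add(chosen)
    return chosen

# Hypothetical data: node "b" is already visited, "c" has zero weight.
neighbors = {"a": ["b", "c", "d"]}
tabu = {"a", "b"}
score = lambda i, j: {"c": 0.0, "d": 1.0}[j]
rng = random.Random(0)
chosen = next_node("a", neighbors, tabu, score, rng)
print(chosen, sorted(tabu))
```

Passing the random generator in explicitly keeps runs reproducible, which is convenient when several processes each cluster their own connected component.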
Further, step S5 specifically includes the following steps:
Step S51: collecting the clustering results of the processes in step S4 and merging them into one result list;
Step S52: traversing the result list and assigning cluster numbers uniformly, so that no cluster number is repeated.
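Steps S51 and S52 amount to merging per-process results and renumbering the clusters. A minimal sketch follows, assuming each process returns a dictionary that maps a flow direction id to a cluster number that is only unique within that process; the function name and data layout are hypothetical.

```python
def merge_cluster_results(per_process_results):
    """Merge per-component label dicts and assign globally unique cluster numbers."""
    merged = {}
    next_id = 0
    for result in per_process_results:
        local_to_global = {}  # local label of this process -> global label
        for flow_id, local_label in result.items():
            if local_label not in local_to_global:
                local_to_global[local_label] = next_id
                next_id += 1
            merged[flow_id] = local_to_global[local_label]
    return merged

# Hypothetical outputs of two processes; both reuse the local labels 0 and 1.
results = [{"f1": 0, "f2": 0, "f3": 1}, {"f7": 0, "f8": 1}]
merged = merge_cluster_results(results)
print(merged)
```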
Compared with the prior art, the invention has the following beneficial effects:
1. The method effectively extracts flow direction semantic information and combines it with temporal and spatial similarity to calculate the spatiotemporal semantic similarity, which measures flow direction similarity more comprehensively.
2. The invention combines the idea of an undirected-graph complex network with a clustering algorithm, simplifies the complex network with a Gaussian kernel function, and identifies noise automatically by means of the graph's connected components.
3. The invention improves the heuristic function based on the multi-path graph cutting criterion and, drawing on complex network ideas, screens the initial nodes of the ant colony by betweenness centrality, which improves the clustering effect.
Drawings
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a part of connected components of the established complex network according to the embodiment of the present invention.
Fig. 3 is an original flow diagram of an embodiment of the present invention.
Fig. 4 shows a partial clustering result according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments of the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
As shown in Fig. 1, the present embodiment provides an OD flow direction clustering method based on a multi-path graph cutting criterion and ant colony optimization, including the following steps:
Step S1: removing duplicate values, erroneous values, and meaningless values from the OD flow direction data and the POI (point of interest) data, constructing an OD flow direction library with MongoDB, and establishing a spatial index with a 2dsphere index;
Step S2: establishing a buffer around each flow direction end point, selecting the POI points inside the buffer, calling the Python gensim library to construct a topic distribution model from the POI data, calculating semantic similarity on that basis, and calculating the spatial and temporal similarity of the OD flow directions, to finally obtain the spatiotemporal semantic similarity of the flow directions;
Step S3: constructing an initial undirected-graph complex network of OD flow directions based on the spatiotemporal semantic similarity, extracting all connected components, and using the connected components to identify noise and the connected components to be clustered;
Step S4: designing an improved heuristic function based on the multi-path graph cutting criterion and, in combination with the positive feedback of ant colony pheromones, clustering the connected components to be clustered in parallel with multiple processes, one process per connected component;
Step S5: merging the clustering results of the processes in step S4 to obtain the final clustering result.
In this embodiment, step S2 specifically includes the following steps:
Step S21: establishing a circular buffer with a radius of 250 meters around the flow direction end point, and searching for all POI points inside the buffer;
Step S22: collecting the type field values of the POI points corresponding to each flow direction into one document, which is the POI semantic document of that flow direction; gathering the semantic documents of all flow directions into a corpus; calling the Python gensim library to train an LDA topic distribution model on the corpus, which yields the corpus-topic distribution; and inputting the flow direction corpus into the model for prediction to obtain the topic-flow direction distribution, that is, the topic probability distribution of each flow direction;
Step S23: calculating the JS divergence from the topic probability distributions and taking it as the semantic similarity measure; the semantic similarity of flow direction i and flow direction j is calculated according to the following formula:
$$JS(P_i \| P_j) = \frac{1}{2}\sum_{x} P_i(x)\log\frac{2P_i(x)}{P_i(x)+P_j(x)} + \frac{1}{2}\sum_{x} P_j(x)\log\frac{2P_j(x)}{P_i(x)+P_j(x)}$$
where $P_i$ and $P_j$ are the topic probability distributions of flow directions i and j, respectively, and $P_i(x)$ and $P_j(x)$ are the probabilities of topic x in flow directions i and j, respectively;
Step S24: calculating the spatial similarity $sim_{dis}$ and the temporal similarity $sim_{t}$ of the flow directions;
Step S25: mapping the semantic, temporal, and spatial similarities of the flow directions with a Gaussian kernel function to obtain the spatiotemporal semantic similarity of the OD flow directions, wherein the calculation formula is as follows:
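To illustrate the buffer search of step S21, the sketch below filters POI points by great-circle distance from a flow direction end point. The method itself relies on the MongoDB 2dsphere index for this lookup; a brute-force haversine filter is swapped in here so the example runs without a database. The coordinates, field names (`name`, `lon`, `lat`), and function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def pois_in_buffer(endpoint, pois, radius_m=250.0):
    """Return the POIs that lie within `radius_m` meters of the end point."""
    lon, lat = endpoint
    return [p for p in pois if haversine_m(lon, lat, p["lon"], p["lat"]) <= radius_m]

# Hypothetical end point and POIs; 0.001 degrees of latitude is roughly 111 m.
endpoint = (118.0894, 24.4798)
pois = [
    {"name": "mall", "lon": 118.0894, "lat": 24.4808},
    {"name": "park", "lon": 118.0894, "lat": 24.4898},
]
inside = pois_in_buffer(endpoint, pois)
print([p["name"] for p in inside])
```

The brute-force filter scans every POI for every end point, which is why a spatial index such as the one named in step S1 matters for city-scale data.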
In this embodiment, step S3 specifically includes the following steps:
Step S31: taking each flow direction as a network node and the spatiotemporal semantic similarity between each pair of OD flow directions as the edge weight, building an undirected-graph complex network with NetworkX, and extracting all connected components;
Step S32: dividing the connected components by their number of nodes: components with fewer nodes than a threshold are classified as noise, and components with more nodes than the threshold are classified as connected components to be clustered. In the present embodiment, the threshold is 2.
In this embodiment, step S4 specifically includes the following steps:
Step S41: clustering in parallel with multiple processes, one connected component per process, with each process executing steps S42 to S48;
Step S42: establishing an initial pheromone matrix of dimension N × N according to the number of nodes N of the connected component, with every initial pheromone value equal to 1;
Step S43: calculating the betweenness centrality of each node in the connected component, sorting the nodes by centrality value, selecting the first K nodes as the initial positions of the ants, and starting the search; the betweenness centrality is calculated as follows:
$$BC_i = \sum_{s \neq i \neq t} \frac{n_{st}^{i}}{g_{st}}$$
where $n_{st}^{i}$ is the number of shortest paths between s and t that pass through node i, and $g_{st}$ is the number of shortest paths connecting s and t;
Step S44: obtaining the neighboring nodes of the ant and checking whether each neighbor is in the tabu list; if it is not, adding it to the candidate node list;
Step S45: traversing the candidate node list, calculating the heuristic function $\eta_{ij}(t)$ improved on the basis of the multi-path graph cutting criterion, and combining it with the pheromone $pher_{ij}(t)$ to calculate the probability of each node being selected; the probability that ant k at node i selects node j as the next node at time t is calculated as follows:
where $pher_{ij}(t)$ is the pheromone factor; $\eta_{ij}(t)$ is the heuristic function; $sim_{ij}$ is the spatiotemporal semantic similarity of the flow directions; $MNCut_{(k,ij)}(t)$ is the multi-path graph cutting factor for selecting node j at time t; $cut(A^{(k)}, V-A^{(k)})$ is the sum of the weights of the edges between nodes in class $A^{(k)}$ and nodes in other classes; and $assoc(A^{(k)})$ is the sum of the weights of the edges among all nodes within class $A^{(k)}$;
Step S46: selecting the next node by roulette wheel selection according to the selection probability of each candidate node, and adding the selected node to the tabu list;
Step S47: updating the pheromone according to the nodes selected by the ants, wherein the pheromone update equation is as follows:
where $\varepsilon$ is the pheromone evaporation rate; the increment term is the increase in pheromone concentration contributed by the k-th ant on edge ij at time t; and $C_k(t)$ is the spatiotemporal semantic similarity of the edge traversed by the k-th ant at time t;
Step S48: when no ant has a candidate node left, the current iteration ends and the next iteration begins; iteration stops when the result converges or the maximum number of iterations is reached.
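The pheromone bookkeeping of steps S42 and S47 might look like the following sketch. The update equation is not reproduced in this text, so the form used here is an assumption: the common ant colony rule of evaporating every entry by the rate ε and then adding, for each ant, a deposit equal to the similarity of each edge it traversed. The function name and sample matrices are hypothetical.

```python
def update_pheromone(pheromone, ant_edges, similarity, evaporation=0.1):
    """Evaporate all entries, then deposit on the edges each ant traversed."""
    n = len(pheromone)
    updated = [[(1.0 - evaporation) * pheromone[i][j] for j in range(n)]
               for i in range(n)]
    for edges in ant_edges:  # one list of (i, j) edges per ant
        for i, j in edges:
            deposit = similarity[i][j]  # assumed deposit: similarity of the edge
            updated[i][j] += deposit
            updated[j][i] += deposit    # undirected graph: keep the matrix symmetric
    return updated

n = 3
pheromone = [[1.0] * n for _ in range(n)]  # N x N matrix initialised to 1 (step S42)
similarity = [[0.0, 0.8, 0.2],
              [0.8, 0.0, 0.5],
              [0.2, 0.5, 0.0]]
# Hypothetical tours: ant 0 used edge (0, 1); ant 1 used edges (0, 1) and (1, 2).
new = update_pheromone(pheromone, [[(0, 1)], [(0, 1), (1, 2)]], similarity)
print(new)
```

Edges used by several ants and carrying high similarity accumulate pheromone fastest, which is the positive feedback that step S4 relies on.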
In this embodiment, step S5 specifically includes the following steps:
Step S51: collecting the clustering results of the processes in step S4 and merging them into one result list;
Step S52: traversing the result list and assigning cluster numbers uniformly, so that no cluster number is repeated.
Preferably, in this embodiment, a topic distribution model is constructed from the POIs at the flow direction end points, the spatiotemporal semantic similarity of the flow directions is calculated, an undirected-graph complex network and an initial pheromone matrix are constructed, all connected components of the network are extracted, the connected components to be clustered are identified, and these components are clustered in parallel by a multi-thread/multi-process ant colony algorithm based on the multi-path graph cutting criterion, thereby realizing OD flow direction clustering.
Preferably, this embodiment takes taxi OD flow direction data and POI data of the city of Xiamen as an example; the relevant parameters of the OD flow direction clustering method based on the multi-path graph cutting criterion and ant colony optimization are shown in Table 1:
The method specifically comprises the following steps:
Step S1: removing duplicate values, erroneous values, and meaningless values from the OD flow direction data and the POI (point of interest) data, constructing an OD flow direction library with MongoDB, and establishing a spatial index with a 2dsphere index;
Step S2: establishing a buffer around each flow direction end point, selecting the POI points inside the buffer, calling the Python gensim library to construct a topic distribution model from the POI data, calculating semantic similarity on that basis, and calculating the spatial and temporal similarity of the OD flow directions, to finally obtain the spatiotemporal semantic similarity of the flow directions;
Step S3: constructing an initial undirected-graph complex network of OD flow directions based on the spatiotemporal semantic similarity, extracting all connected components, and using the connected components to identify noise and the connected components to be clustered;
Step S4: designing an improved heuristic function based on the multi-path graph cutting criterion and, in combination with the positive feedback of ant colony pheromones, clustering the connected components to be clustered in parallel with multiple processes, one process per connected component;
Step S5: merging the clustering results of the processes in step S4 to obtain the final clustering result.
Step S2 includes the following steps:
Step S21: establishing a circular buffer with a radius of 250 meters around the flow direction end point, and searching for all POI points inside the buffer.
Step S22: according to the flow direction number, collecting the type field values of the POI points corresponding to each flow direction into one document, which is the POI semantic document of that flow direction; gathering the semantic documents of all flow directions into a corpus; training an LDA topic distribution model on the corpus to obtain the corpus-topic distribution of the Xiamen OD flow directions; and then inputting the flow direction corpus into the model for prediction to obtain the topic-flow direction distribution, that is, the topic probability distribution of the Xiamen OD flow directions. In this embodiment the model is trained according to the characteristics of the data, yielding ten relatively representative trip topics, such as commuting, tourism, shopping, leisure and entertainment, and medical visits, as shown in the following table:
Step S23: calculating the JS divergence from the topic probability distributions and taking it as the semantic similarity measure; the semantic similarity of flow direction i and flow direction j is calculated according to the following formula:
$$JS(P_i \| P_j) = \frac{1}{2}\sum_{x} P_i(x)\log\frac{2P_i(x)}{P_i(x)+P_j(x)} + \frac{1}{2}\sum_{x} P_j(x)\log\frac{2P_j(x)}{P_i(x)+P_j(x)}$$
where $P_i$ and $P_j$ are the topic probability distributions of flow directions i and j, respectively, and $P_i(x)$ and $P_j(x)$ are the probabilities of topic x in flow directions i and j, respectively;
Step S24: the spatial similarity $sim_{dis}$ and the temporal similarity $sim_{t}$ of the flow directions are calculated with reference to methods in the prior art;
Step S25: mapping the semantic, temporal, and spatial similarities of the flow directions with a Gaussian kernel function to obtain the spatiotemporal semantic similarity of the OD flow directions, wherein the calculation formula is as follows:
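As an illustration of the document construction in step S22 only, the sketch below groups POI type values by flow direction number and converts each document into (token id, count) pairs. The function names and sample records are hypothetical. The pair format mirrors what gensim's `Dictionary.doc2bow` produces, so in practice the gensim dictionary and `LdaModel` would presumably take over from this point; that pairing is an assumption about the workflow, not something the source spells out.

```python
from collections import Counter, defaultdict

def build_poi_documents(flow_pois):
    """Group (flow_id, poi_type) records into one token list per flow direction."""
    documents = defaultdict(list)
    for flow_id, poi_type in flow_pois:
        documents[flow_id].append(poi_type)
    return dict(documents)

def to_bag_of_words(documents):
    """Map each document to sorted (token_id, count) pairs plus the vocabulary."""
    vocabulary = {}
    bows = {}
    for flow_id, tokens in documents.items():
        for token in tokens:
            vocabulary.setdefault(token, len(vocabulary))
        counts = Counter(tokens)
        bows[flow_id] = sorted((vocabulary[t], c) for t, c in counts.items())
    return vocabulary, bows

# Hypothetical POI records found in the end point buffers of two flow directions.
records = [
    (1, "office"), (1, "office"), (1, "restaurant"),
    (2, "hospital"), (2, "restaurant"),
]
documents = build_poi_documents(records)
vocabulary, bows = to_bag_of_words(documents)
print(documents)
print(bows)
```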
As shown in Fig. 2, the specific steps of step S3 are:
Step S31: taking each flow direction as a network node and the spatiotemporal semantic similarity between each pair of OD flow directions as the edge weight, building an undirected-graph complex network with NetworkX, and computing all connected components.
Step S32: dividing the connected components by their number of nodes. Components with fewer nodes than the threshold of 2 are classified as noise; components with more nodes than the threshold are classified as connected components to be clustered.
As shown in fig. 3 and 4, step S4 specifically includes the following steps:
and S41, clustering one connected component by one process in a multi-process parallel mode, and executing the steps S42-S48 by each process.
Step S42, establishing an initial pheromone matrix with dimension N x N according to the number of nodes of the connected components, wherein the initial pheromone value is 1;
step S43, calculating the betweenness centrality of each node in the connected components, sorting according to the centrality value, selecting the first K points as the initial positions of ants, and starting to search; the centrality calculation formula is as follows:
wherein ,representing the number of paths which pass through the node i and are the shortest paths; gstRepresenting the shortest path connecting s and tThe number of diameters;
step S44, acquiring the adjacent node of the ant, judging whether the adjacent node is in the taboo list, if not, adding the node into the alternative node list;
step S45, traversing the alternative node list, and calculating the heuristic function n improved based on the multi-path graph cutting criterionij(t) and binding pheromoneij(t) calculating the probability of the node being selectedAt time t, the ant k at node i selects the probability of node j as the next nodeThe calculation formula of (a) is as follows:
wherein the pherij(t) is a pheromone factor; n isij(t) is a heuristic function; simijFlow direction spatiotemporal semantic similarity; MNCut(k,ij)(t) selecting a multipath graph cutting factor of the j node for the time t; cut (A)(k),V-A(k)) Is A(k)The sum of the weights of the adjacent edges of the nodes in the class and other nodes in the class; assoc (A)(k)) Is A(k)The sum of the weights of adjacent edges among all nodes is similar;
Step S46: select the next node by roulette-wheel selection according to the selection probability of each alternative node, and add the selected node to the taboo list;
Step S47: update the pheromone according to the nodes selected by the ants. In the pheromone update equation, $\varepsilon$ is the pheromone volatilization rate; the increment term is the increase in pheromone concentration contributed by the k-th ant on edge ij at time t; and $C_k(t)$ is the space-time semantic similarity of the edge traversed by the k-th ant at time t;
Step S48: when no ant has a selectable node left, the current iteration ends and the next iteration starts; the iterations stop when the result converges or the maximum number of iterations is reached. Partial trip clustering results are shown in FIG. 4.
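The walk of a single ant through steps S44 to S47 can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented formulas: the selection score is a placeholder (pheromone multiplied by edge similarity) that leaves out the multi-path graph cutting factor, and the pheromone update is assumed to be "evaporate every entry, then deposit the edge similarity on traversed edges". The names `init_pheromone`, `roulette_pick`, `ant_walk`, `update_pheromone`, and the nested-dictionary layout of `similarity` are hypothetical, and the start node is assumed to have been chosen already in step S43.

```python
import random


def init_pheromone(nodes):
    """Build an N x N pheromone table with every entry set to 1 (step S42)."""
    nodes = list(nodes)
    return {i: {j: 1.0 for j in nodes} for i in nodes}


def roulette_pick(candidates, scores, rng):
    """Pick one candidate with probability proportional to its score (step S46)."""
    threshold = rng.random() * sum(scores)
    running = 0.0
    for node, score in zip(candidates, scores):
        running += score
        if running >= threshold:
            return node
    return candidates[-1]  # guard against floating-point round-off


def ant_walk(similarity, pheromone, start, rng):
    """Walk until no neighbor outside the taboo list remains (steps S44 to S46)."""
    taboo = [start]
    current = start
    while True:
        candidates = [n for n in similarity[current] if n not in taboo]
        if not candidates:
            return taboo
        # Placeholder score (assumption): pheromone times edge similarity.
        scores = [pheromone[current][n] * similarity[current][n] for n in candidates]
        current = roulette_pick(candidates, scores, rng)
        taboo.append(current)


def update_pheromone(similarity, pheromone, path, evaporation=0.1):
    """Evaporate all entries, then deposit similarity on traversed edges (step S47)."""
    for i in pheromone:
        for j in pheromone[i]:
            pheromone[i][j] *= 1.0 - evaporation
    for a, b in zip(path, path[1:]):
        pheromone[a][b] += similarity[a][b]
        pheromone[b][a] += similarity[a][b]  # undirected graph: keep the table symmetric
```

Passing an explicit `random.Random` instance keeps each process reproducible, which is convenient when one process handles one connected component.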
Step S5 specifically includes:
Step S51: summarize the results obtained by the clustering processes in step S4 and combine them into one result list.
Step S52: traverse the result list and renumber the categories uniformly on output, so that no category number is repeated.
The above description is only a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of the present invention.
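The merging described in steps S51 and S52 can be sketched as follows. The input shape is an assumption: each process is taken to return a dictionary that maps a flow direction identifier to a category number local to that process. The function name `merge_cluster_labels` is hypothetical.

```python
def merge_cluster_labels(per_process_results):
    """Combine per-process label dictionaries and renumber categories globally."""
    merged = {}
    next_id = 0
    for labels in per_process_results:
        local_to_global = {}  # reset per process, so local numbers never collide
        for flow_id, local_label in labels.items():
            if local_label not in local_to_global:
                local_to_global[local_label] = next_id
                next_id += 1
            merged[flow_id] = local_to_global[local_label]
    return merged
```

Because the local-to-global mapping is rebuilt for every process, category 0 from one connected component and category 0 from another receive different global numbers.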
Claims (5)
1. An OD flow direction clustering method based on a multi-path graph cutting criterion and ant colony optimization, characterized in that the method comprises the following steps:
Step S1: remove duplicate values, erroneous values, and meaningless values from the OD flow direction data and the POI data, construct an OD flow direction library using MongoDB, and establish a spatial index using a 2dsphere index;
Step S2: establish a buffer area around each flow direction end point, select the POI points within the buffer area, call the Python gensim library to construct a topic distribution model based on the POI data, calculate the semantic similarity on that basis, and calculate the spatial and temporal similarity of the OD flow directions to finally obtain the flow direction space-time semantic similarity;
Step S3: construct an initial undirected-graph complex network of OD flow directions based on the space-time semantic similarity, extract all connected components, and use them to identify noise and the connected components to be clustered;
Step S4: design an improved heuristic function based on the multi-path graph cutting criterion and, in combination with the positive feedback of the ant colony pheromone, cluster the connected components to be clustered in a multi-process parallel mode, with one process clustering one connected component;
Step S5: summarize the clustering results of the processes in step S4 to obtain the final clustering result.
2. The OD flow direction clustering method based on the multi-path graph cutting criterion and ant colony optimization according to claim 1, characterized in that step S2 specifically includes the following steps:
Step S21: establish a circular buffer area with a radius of 250 meters around the flow direction end point and retrieve all POI points within the buffer area;
Step S22: aggregate the type field values of the POI points corresponding to each flow direction into one document, which is the POI semantic document of that flow direction; aggregate the semantic documents of all flow directions to build a corpus; use the corpus to train an LDA topic distribution model by calling the Python gensim library, obtaining the corpus-topic distribution; then input the flow direction documents into the model for prediction to obtain the topic-flow direction distribution, that is, the flow direction topic probability distribution;
Step S23: calculate the JS divergence from the topic probability distributions and take it as the semantic similarity measure; the semantic similarity of flow direction i and flow direction j is calculated as follows:
$$JS(P_i \,\|\, P_j) = \frac{1}{2}\sum_{x} P_i(x)\log\frac{2P_i(x)}{P_i(x)+P_j(x)} + \frac{1}{2}\sum_{x} P_j(x)\log\frac{2P_j(x)}{P_i(x)+P_j(x)}$$
where $P_i$ and $P_j$ are the topic probability distributions of flow directions i and j, respectively, and $P_i(x)$ and $P_j(x)$ are the probability values of topic x in flow directions i and j, respectively;
Step S24: calculate the spatial similarity $sim_{dis}$ and the temporal similarity $sim_t$ of the flow directions;
Step S25: map the semantic, temporal, and spatial similarities of the flow directions using a Gaussian kernel function to obtain the OD flow direction space-time semantic similarity.
3. The OD flow direction clustering method based on the multi-path graph cutting criterion and ant colony optimization according to claim 1, characterized in that step S3 specifically includes the following steps:
Step S31: take each flow direction as a network node and the OD flow direction space-time semantic similarity between each pair of flow directions as the edge weight, build an undirected-graph complex network using NetworkX, and extract all connected components;
Step S32: classify the components according to the number of nodes in each connected component; components whose node count is below a threshold are classified as noise components, and components whose node count is above the threshold are classified as connected components to be clustered.
4. The OD flow direction clustering method based on the multi-path graph cutting criterion and ant colony optimization according to claim 1, characterized in that step S4 specifically includes the following steps:
Step S41: cluster the connected components in a multi-process parallel mode, with one process clustering one connected component; each process executes steps S42 to S48;
Step S42: establish an initial pheromone matrix of dimension N × N, where N is the number of nodes in the connected component, with every initial pheromone value set to 1;
Step S43: calculate the betweenness centrality of each node in the connected component, sort the nodes by centrality value, select the first K nodes as the initial positions of the ants, and start the search; the centrality is calculated as follows:
$$BC_i = \sum_{s \neq i \neq t} \frac{n_{st}^{i}}{g_{st}}$$
where $n_{st}^{i}$ is the number of shortest paths between s and t that pass through node i, and $g_{st}$ is the number of shortest paths connecting s and t;
Step S44: obtain the nodes adjacent to the ant's current node and check whether each adjacent node is in the taboo list; if it is not, add it to the alternative node list;
Step S45: traverse the alternative node list, calculate the heuristic function $\eta_{ij}(t)$ improved according to the multi-path graph cutting criterion, and combine it with the pheromone factor $pher_{ij}(t)$ to calculate the probability of each node being selected, that is, the probability that ant k at node i selects node j as the next node at time t; in this calculation, $pher_{ij}(t)$ is the pheromone factor; $\eta_{ij}(t)$ is the heuristic function; $sim_{ij}$ is the flow direction space-time semantic similarity; $MNCut_{(k,ij)}(t)$ is the multi-path graph cutting factor for selecting node j at time t; $cut(A^{(k)}, V - A^{(k)})$ is the sum of the weights of the edges between nodes in class $A^{(k)}$ and nodes in other classes; and $assoc(A^{(k)})$ is the sum of the weights of the edges among all nodes in class $A^{(k)}$;
Step S46: select the next node by roulette-wheel selection according to the selection probability of each alternative node, and add the selected node to the taboo list;
Step S47: update the pheromone according to the nodes selected by the ants; in the pheromone update equation, $\varepsilon$ is the pheromone volatilization rate; the increment term is the increase in pheromone concentration contributed by the k-th ant on edge ij at time t; and $C_k(t)$ is the space-time semantic similarity of the edge traversed by the k-th ant at time t;
Step S48: when no ant has a selectable node left, the current iteration ends and the next iteration starts; the iterations stop when the result converges or the maximum number of iterations is reached.
5. The OD flow direction clustering method based on the multi-path graph cutting criterion and ant colony optimization according to claim 1, characterized in that step S5 specifically includes the following steps:
Step S51: summarize the results obtained by the clustering processes in step S4 and combine them into one result list;
Step S52: traverse the result list and renumber the categories uniformly on output, so that no category number is repeated.
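The JS divergence used as the semantic similarity measure in step S23 of claim 2 can be sketched as follows. This is the standard Jensen-Shannon divergence between two discrete distributions; the base-2 logarithm (which bounds the value between 0 and 1) is an assumption, as are the function names `kl_divergence` and `js_divergence`. The inputs are assumed to be topic probability lists of equal length, such as those produced by an LDA model.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p(x) = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two topic probability distributions."""
    # The mixture m is positive wherever p or q is positive, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Two flow directions with identical topic distributions give a divergence of 0, and two with entirely disjoint topics give the maximum value, so a smaller divergence indicates closer semantics.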
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110782636.6A CN113516309B (en) | 2021-07-12 | 2021-07-12 | OD flow direction clustering method based on multipath graph cutting criterion and ant colony optimization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113516309A (en) | 2021-10-19 |
CN113516309B (en) | 2023-08-11 |
Family ID: 78066961
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110782636.6A, Active, CN113516309B (en) | 2021-07-12 | 2021-07-12 | OD flow direction clustering method based on multipath graph cutting criterion and ant colony optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113516309B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105118052A (en) * | 2015-08-03 | 2015-12-02 | 福州大学 | Novel omnidirectional M type cardiogram motion curve extraction method |
WO2016095692A1 (en) * | 2014-12-15 | 2016-06-23 | 江南大学 | Method for improving ant colony optimization sensor-network cluster head |
CN108320512A (en) * | 2018-03-30 | 2018-07-24 | 江苏智通交通科技有限公司 | Macroscopical road safety analytic unit choosing method based on Laplace's spectrum analysis |
CN109993721A (en) * | 2019-04-04 | 2019-07-09 | 电子科技大学成都学院 | A kind of image enhancing method based on clustering algorithm and ant group algorithm |
Non-Patent Citations (5)
Title |
---|
GUO X G: "An OD flow clustering method based on vector constraints:A case study for Beijing taxi origin-destination data", ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, no. 02 * |
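张晗: "OD flow direction spatiotemporal semantic clustering algorithm based on LDA and optimized ant colony", Journal of Geo-information Science, vol. 24, no. 05 * |
王祖超; 袁晓如: "Research on visual analysis of trajectory data", Journal of Computer-Aided Design & Computer Graphics, no. 01 * |
王立群; 杨淑莹; 安博: "Multi-character clustering recognition based on ant colony algorithm", Journal of Tianjin University of Technology, no. 05 * |
邹小林: "Improved discriminant cut and its application in image segmentation", Journal of Computer Applications, no. 08 * |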
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116342608A (en) * | 2023-05-30 | 2023-06-27 | 首都医科大学宣武医院 | Medical image-based stent adherence measurement method, device, equipment and medium |
CN116342608B (en) * | 2023-05-30 | 2023-08-15 | 首都医科大学宣武医院 | Medical image-based stent adherence measurement method, device, equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN113516309B (en) | 2023-08-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||