CN117454208A - Deep learning-based shared bicycle travel network community mining method - Google Patents

Info

Publication number
CN117454208A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311327161.7A
Other languages
Chinese (zh)
Inventor
昌锡铭
孙会君
杨欣
刘天宇
吴建军
闫学东
尹浩东
屈云超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN202311327161.7A priority Critical patent/CN117454208A/en
Publication of CN117454208A publication Critical patent/CN117454208A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a deep learning-based method for mining communities in a shared bicycle travel network. The method comprises the following steps: analyzing and counting road network data, shared bicycle data and user travel data in a designated area to construct a traffic travel network; quantitatively describing the travel characteristics of users in the shared bicycle travel network, taking the user travel quantification indexes as network measurement indexes of a graph neural network model, and constructing the graph neural network model; and carrying out community cluster analysis by using a ClusterNet algorithm based on the graph neural network model to mine the community structure. In this method, a dynamic travel network is constructed for different time periods, and the spatio-temporal distribution of travel demand in different stages is analyzed. Indexes built with spatial statistics and complex network methods quantify the travel characteristics, so that changes in user travel patterns across periods can be clearly identified and the community structure in the travel network can be mined dynamically.

Description

Deep learning-based shared bicycle travel network community mining method
Technical Field
The invention relates to the technical field of computer application, in particular to a deep learning-based method for mining a shared bicycle travel network community.
Background
A city is a large and complex system, and urban planning and transportation are closely tied to daily life. Quantifying the spatio-temporal patterns of residents' travel flows can therefore effectively reflect the dynamics of a city's constituent elements. Urban economic development has steadily expanded the scale of traffic, and travel demand keeps growing. Frequent traffic problems such as congestion and unbalanced distribution of travel demand are increasingly prominent and have long drawn attention from all sectors of society. Understanding changes in users' travel demand, travel behavior and travel community structure can help governments and operators provide better services to passengers. Users make different choices among travel modes, and the emerging shared bicycle offers a new mode for short-distance travel and strengthens connections with other modes such as buses and subways.
Research on methods for characterizing human travel and mobility has achieved some breakthroughs and innovations, but problems remain. First, changes in users' travel demand, travel behavior and travel community structure have not been studied sufficiently. Second, emerging deep learning graph neural networks can build more powerful representations of node attributes and community structure; because they learn from both the network topology and the node features, they have strong learning ability for community discovery in user travel networks.
Disclosure of Invention
The embodiment of the invention provides a deep learning-based community mining method for a shared bicycle travel network, which is used for effectively identifying communities and community structure changes existing in the shared bicycle travel network.
In order to achieve the above purpose, the present invention adopts the following technical scheme.
A deep learning-based method for mining a shared bicycle travel network community comprises the following steps:
analyzing and counting road network data information, shared bicycle data information and travel data of users in a designated area to construct a transportation travel network;
quantitatively describing travel characteristics of users in the shared bicycle travel network, taking the travel quantitative indexes of the users as network measurement indexes of the graph neural network model, and constructing the graph neural network model;
and carrying out community cluster analysis by using a ClusterNet algorithm based on the graph neural network model, and mining a community structure.
Preferably, the analyzing and counting the road network data information, the shared bicycle data information and the travel data of the user in the designated area to construct a traffic travel network includes:
analyzing shared bicycle data and road network data of a designated area, and constructing a traffic travel network based on travel data of users, wherein the nodes in the traffic travel network are the origins and destinations of user trips, and a trip from an origin to a destination is regarded as an edge between nodes;
a directed weighted traffic travel network $G = (V, E, W)$ is constructed based on shared bicycle data from the early, middle and late stages of a specified time period, wherein $V = \{v_1, \dots, v_N\}$ is the set of nodes; $E = \{e_{ij} \mid i, j = 1, 2, \dots, N,\ i \neq j\}$ is the set of edges, $e_{ij} = 1$ indicating that there is an edge between node i and node j and $e_{ij} = 0$ indicating that there is no edge between them; and $W = \{w_{ij} \mid i, j = 1, 2, \dots, N,\ i \neq j\}$ is the set of weights, $w_{ij}$ being the weight of edge $e_{ij}$, i.e. the travel volume on the edge between node i and node j.
Preferably, the quantitatively describing the travel characteristics of the user in the shared bicycle travel network, taking the user travel quantitative index as a network measurement index of the graph neural network model, and constructing the graph neural network model includes:
quantitatively describing travel characteristics of users in the shared bicycle travel network by using user travel quantification indexes, taking the user travel quantification indexes as network measurement indexes of a graph neural network model, taking the users in the shared bicycle travel network as nodes in the graph neural network, and constructing the graph neural network model based on the shared bicycle travel data by using all network measurement indexes, wherein the network measurement indexes comprise: the degree, intensity, clustering coefficient, PageRank value, net flow ratio and Moran index of the nodes;
in the graph neural network model, the degree $d_i$ of node i is defined as the number of nodes connected to node i, as shown in equation 1, where $e_{ij} = 1$ if j is a neighbor node of i and $e_{ij} = 0$ otherwise;
the intensity $s_i$ of node i is defined to describe the intensity of passenger flow between nodes, as shown in equation 2, where $w_{ij}$ is the OD (origin to destination) passenger flow between node i and node j;
the clustering coefficient C of the network is defined as the average of the clustering coefficients of all nodes, as shown in equations 3 and 4;
where C is the clustering coefficient of the network, $C_i$ is the clustering coefficient of node i, and $e_i$ is the number of edges between the neighbor nodes of node i;
the PageRank value is used to represent the influence score of a node, and is calculated as shown in equation 5;
where $c_i$ is the PageRank value of the i-th node, $c_i \in [0, 1]$; p is the damping coefficient; the formula also uses the out-degree of the j-th node, the out-degree of a node in a complex network being the number of edges pointing from that node to other nodes; and $a_{ji}$ is an element of the adjacency matrix of the directed network; the higher the PageRank value of a node, the more important the node;
the inflow and outflow of passengers in different areas during different peak periods are analyzed with the net flow ratio (NFR), calculated as shown in equation 6:
where $NFR_i$ ranges between -1 and 1, and $O_i$ and $D_i$ are the inflow and outflow of trips in region i;
the Moran index is used to represent the spatial distribution and evolution of shared bicycles, and is calculated as shown in equation 7:
where n is the number of spatial regions, $w_{ij}$ is the weight between site i and site j, $y_i$ and $y_j$ are the attribute values of site i and site j, and $\bar{y}$ is the average of all observations.
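Equations 1 to 7 are referenced above but are not reproduced in this text. A reconstruction consistent with the symbol definitions given, using the standard forms of these indices, might read as follows. Several details are assumptions rather than the patent's own notation: the out-degree symbol $k_j^{\mathrm{out}}$, the $(1-p)/N$ term in equation 5, and the symbols $F_i^{\mathrm{in}}$ and $F_i^{\mathrm{out}}$ for the inflow and outflow of region i (the text assigns these roles to $O_i$ and $D_i$). The sign of equation 6 is chosen so that a positive value means inflow dominates, as the description states.

```latex
\begin{align}
d_i &= \sum_{j=1}^{N} e_{ij} \tag{1}\\
s_i &= \sum_{j=1}^{N} w_{ij} \tag{2}\\
C_i &= \frac{2 e_i}{d_i (d_i - 1)} \tag{3}\\
C &= \frac{1}{N} \sum_{i=1}^{N} C_i \tag{4}\\
c_i &= p \sum_{j=1}^{N} \frac{a_{ji}}{k_j^{\mathrm{out}}}\, c_j + \frac{1 - p}{N} \tag{5}\\
\mathrm{NFR}_i &= \frac{F_i^{\mathrm{in}} - F_i^{\mathrm{out}}}{F_i^{\mathrm{in}} + F_i^{\mathrm{out}}} \tag{6}\\
I &= \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot
     \frac{\sum_{i}\sum_{j} w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{\sum_{i} (y_i - \bar{y})^2} \tag{7}
\end{align}
```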
Preferably, the carrying out community cluster analysis by using a ClusterNet algorithm based on the graph neural network model, and mining the community structure comprises:
performing community cluster analysis by using the ClusterNet algorithm based on the graph neural network model: embedding the input data through a graph convolutional network (GCN), then feeding the output of the convolutional network into a K-means clustering function for iterative clustering, and finally calculating a loss function, namely the optimization target, from the output assignment matrix and the modularity, and optimizing the parameters through error backpropagation, wherein the output of the ClusterNet algorithm is a community partition label for each shared bicycle station;
constructing a graph neural network model based on shared bicycle travel data, extracting the travel connection between each pair of stations, taking the travel volume between each pair of origin and destination stations as an adjacency matrix, taking the adjacency matrix as the edge feature of the graph convolutional network, and defining the adjacency matrix $A$ to describe the spatial connection between stations; taking the longitude and latitude of each station, its daily departure passenger flow, its daily arrival passenger flow, its hourly departure passenger flow and its hourly arrival passenger flow as the features of the station to define a feature matrix $\gamma_t$, the feature matrix $\gamma_t$ being the input data of the ClusterNet algorithm;
the graph convolutional network model constructs a filter in the Fourier domain, the filter acting on the nodes of the graph and capturing spatial features between shared bicycle sites based on their first-order neighborhoods, and a deep GCN model is constructed by stacking multiple convolution layers, the modeling process being represented by equation 8:
wherein $\tilde{A} = A + I_N$ is the adjacency matrix with self-connections added, $I_N$ is the identity matrix, and $\tilde{D}$ is the degree matrix of the shared bicycle site network, wherein $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$; $H^{(l)}$ is the output of the l-th layer, $\theta^{(l)}$ is the trainable parameter of the l-th layer, and $\sigma(\cdot)$ denotes the activation function of the nonlinear model, for which a ReLU activation function is adopted; given the feature matrix $\gamma_t$ and the adjacency matrix $A$, a two-layer GCN model is represented by equation 9, where $\theta^{(1)}$ is the trainable weight matrix from the input layer to the hidden layer and $\theta^{(2)}$ is the trainable weight matrix from the hidden layer to the output layer.
Dividing the nodes into communities from the feature matrix by using the K-means algorithm: assuming that there are N nodes, each node representing one input, the nodes of the graph are divided into k different communities according to the inputs based on the ClusterGCN model, and a partition r is sought that maximizes the modularity of the k communities, wherein the modularity is defined as the loss function, calculated as shown in equation 10;
where i and j are any two nodes in the graph; $A_{i,j} = 1$ when the two nodes are directly connected, otherwise $A_{i,j} = 0$; $d_i$ is the degree of node i; and $\delta(r_i, r_j)$ indicates whether nodes i and j are in the same community, with $\delta(r_i, r_j) = 1$ if they are and $\delta(r_i, r_j) = 0$ otherwise;
calculating gradients through a backpropagation algorithm, updating the network parameters through an optimization algorithm, repeating the steps of forward propagation, loss calculation, backpropagation and parameter updating to iteratively train the neural network, evaluating the model on a validation set during training, learning the travel characteristics of shared bicycle stations and of trips between stations, and mining the community structure of the shared bicycle travel network.
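Equations 8 to 10 are likewise referenced but not reproduced in this text. Under the assumption that they follow the standard graph convolutional propagation rule and the standard modularity definition, which the symbol definitions above match closely, they might read as follows. Here $\hat{A}$ is shorthand introduced for this sketch, and m is taken to be the number of edges in the graph.

```latex
\begin{align}
H^{(l+1)} &= \sigma\!\left( \tilde{D}^{-\frac{1}{2}} \tilde{A}\, \tilde{D}^{-\frac{1}{2}} H^{(l)} \theta^{(l)} \right),
  \qquad \tilde{A} = A + I_N, \quad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij} \tag{8}\\
f(\gamma_t, A) &= \sigma\!\left( \hat{A}\, \mathrm{ReLU}\!\left( \hat{A}\, \gamma_t\, \theta^{(1)} \right) \theta^{(2)} \right),
  \qquad \hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A}\, \tilde{D}^{-\frac{1}{2}} \tag{9}\\
Q &= \frac{1}{2m} \sum_{i,j} \left[ A_{i,j} - \frac{d_i d_j}{2m} \right] \delta(r_i, r_j) \tag{10}
\end{align}
```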
The technical scheme provided by the embodiments of the invention offers a deep learning method, based on a graph neural network, for analyzing user travel characteristics and detecting community structure. A dynamic travel network is constructed for different time periods, and the spatio-temporal distribution of travel demand in different stages is analyzed. Indexes built with spatial statistics and complex network methods quantify the travel characteristics, so that changes in user travel patterns across periods can be clearly identified. An end-to-end deep learning model, ClusterGCN, is then proposed to dynamically mine the community structure in the travel network.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a process flow diagram of a deep learning-based shared bicycle travel network community mining method provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of the shared bicycle study area in New York;
FIG. 3 shows the daily travel demand distribution of shared bicycles in the designated area;
FIG. 4 is a spatial distribution diagram of shared bicycle OD trips in the designated area;
FIG. 5 is a Fruchterman-Reingold layout of the shared bicycle travel network;
FIG. 6 shows the community structures mined from the shared bicycle travel network in different periods.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
To facilitate understanding of the embodiments of the invention, several specific embodiments are further explained below with reference to the drawings; these embodiments in no way limit the embodiments of the invention.
Based on shared bicycle travel data, the invention provides a network-based deep learning method for detecting user travel characteristics and community structure. A dynamic travel network is constructed for different time periods, and the spatio-temporal distribution of travel demand in different stages is analyzed. Indexes built with spatial statistics and complex network methods quantify travel, so that changes in user travel patterns across periods can be clearly identified. An end-to-end deep learning model, ClusterGCN, is then proposed to dynamically mine the community structure in the travel network and to examine changes in user travel characteristics.
The processing flow of the deep learning-based shared bicycle travel network community mining method provided by the embodiment of the invention is shown in FIG. 1 and comprises the following steps:
Step S1: Data preparation and data analysis. Road network data and shared bicycle data in the designated area are prepared, analyzed and counted.
Step S2: Construct the traffic travel network.
The shared bicycle data in the designated area are analyzed, and a traffic travel network is constructed based on user travel data, in which the nodes are the origins and destinations of user trips and a trip from an origin to a destination is regarded as an edge between nodes. A directed weighted network $G = (V, E, W)$ is constructed based on shared bicycle data from the early, middle and late stages of a specified time period, where $V = \{v_1, \dots, v_N\}$ is the set of nodes and $E = \{e_{ij} \mid i, j = 1, 2, \dots, N,\ i \neq j\}$ is the set of edges: $e_{ij} = 1$ indicates that there is an edge between node i and node j, and otherwise $e_{ij} = 0$. $W = \{w_{ij} \mid i, j = 1, 2, \dots, N,\ i \neq j\}$ is the set of weights, where $w_{ij}$ is the weight of edge $e_{ij}$, i.e. the travel volume on the edge between node i and node j.
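The construction of $G = (V, E, W)$ in step S2 can be sketched as follows. This is a minimal illustration, assuming each trip is available as an (origin station, destination station) pair; the function name `build_travel_network` and the sample records are hypothetical and do not come from the patent.

```python
from collections import Counter

def build_travel_network(trips):
    """Build a directed weighted network G = (V, E, W) from (origin, destination) trip records."""
    # V: every station that appears as an origin or a destination
    nodes = sorted({s for o, d in trips for s in (o, d)})
    # W: travel volume w_ij per ordered station pair; i == j is excluded, as in the text
    weights = Counter((o, d) for o, d in trips if o != d)
    # E: e_ij = 1 exactly for the ordered pairs that carry at least one trip
    edges = set(weights)
    return nodes, edges, dict(weights)

# Hypothetical trip records: (origin station, destination station)
trips = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A"), ("A", "A")]
nodes, edges, weights = build_travel_network(trips)
print(nodes, weights)
```

Running the same function separately on the trips of each stage gives one network per stage, which is one way to obtain the dynamic travel network the description refers to.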
Step S3: Construct user travel quantification indexes with spatial statistics and complex network methods, and take them as the network measurement indexes of the graph neural network model.
The travel characteristics of users in the shared bicycle travel network are quantitatively described with the user travel quantification indexes, which serve as the network measurement indexes of the graph neural network model; the users in the shared bicycle travel network are taken as nodes in the graph neural network, and the graph neural network model based on shared bicycle travel data is constructed with all network measurement indexes. The network measurement indexes include the degree, intensity, clustering coefficient, PageRank value, net flow ratio and Moran index of the nodes. A graph convolutional neural network is a machine learning model for processing graph-structured data. It implements information passing and feature aggregation on graph data by updating the representation of each node from the information of its neighboring nodes and edges. The graph neural network structure typically consists of an input layer, graph convolution layers, a pooling layer and an output layer. The input layer receives a representation of the graph data, typically structural information such as node feature vectors and edge attributes. The graph convolution layer is the core layer of the graph neural network and performs information passing and feature aggregation on the graph data. The pooling layer reduces the dimension and complexity of the graph data, thereby reducing computation and memory consumption. The output layer converts the final representation of the network into the desired output form. Through multiple layers of iteration, the graph convolutional neural network gradually aggregates more local and global information.
In the graph neural network model, the degree $d_i$ of node i is defined as the number of nodes connected to it, which measures the importance of node i in the network. The degree is defined in equation 1: if j is a neighbor node of i, $e_{ij} = 1$; otherwise $e_{ij} = 0$.
The intensity $s_i$ of node i is defined to describe the intensity of passenger flow between nodes, as in equation 2, where $w_{ij}$ is the OD (origin to destination) passenger flow between node i and node j.
The clustering coefficient represents the degree of aggregation of the network; a network with a larger clustering coefficient is likely to contain more groups of densely connected nodes. C is the clustering coefficient of the network and $C_i$ is the clustering coefficient of node i, calculated as in equations 3 and 4, where $d_i$ is the degree of node i and $e_i$ is the number of edges between the neighbor nodes of node i. The clustering coefficient C of the network is the average of the clustering coefficients of all nodes.
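The three indices above can be computed directly from the OD matrix. The sketch below is one possible reading, not the patent's implementation: it assumes links are treated as undirected when counting neighbors, that the intensity of a node sums its outgoing flow, and that $C_i = 2e_i / (d_i(d_i - 1))$ with $C_i = 0$ for nodes with fewer than two neighbors. The function name and sample matrix are hypothetical.

```python
import numpy as np

def degree_strength_clustering(W):
    """W: (N, N) array of OD travel volumes w_ij with a zero diagonal."""
    W = np.asarray(W, dtype=float)
    # e_ij = 1 if i and j are linked in either direction (assumption: undirected neighbourhood)
    E = ((W + W.T) > 0).astype(float)
    d = E.sum(axis=1)                    # degree d_i: number of connected nodes
    s = W.sum(axis=1)                    # intensity s_i: here the outgoing flow of node i
    # e_i: edges among the neighbours of i, i.e. triangles through i (half the closed 3-walks)
    e = np.diag(E @ E @ E) / 2.0
    possible = d * (d - 1) / 2.0         # largest possible number of edges among neighbours
    C_i = np.divide(e, possible, out=np.zeros_like(e), where=possible > 0)
    return d, s, C_i, C_i.mean()         # C is the mean of all node coefficients

# Hypothetical OD volumes: stations 0, 1, 2 form a triangle, station 3 hangs off station 0
W = np.array([[0, 3, 1, 2],
              [0, 0, 4, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]])
d, s, C_i, C = degree_strength_clustering(W)
print(d, s, C_i, C)
```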
The importance of nodes in the network can be measured with the PageRank index, and the PageRank algorithm is adopted to calculate the influence score of each node. PageRank was proposed as an algorithm for calculating the importance of Internet web pages; the calculation is shown in equation 5, where $c_i$ is the influence score of the i-th node, $c_i \in [0, 1]$; p is the damping coefficient, set empirically to p = 0.85; the out-degree of the j-th node also enters the formula; and $a_{ji}$ is an element of the adjacency matrix of the directed network. The higher the PageRank value of a node, the more important the node.
In a complex network, each node has an in-degree and an out-degree, which describe how the node is connected within the network. The in-degree is the number of edges pointing to the node from other nodes; it indicates how many other nodes connect to the node and can be understood as the number of connections the node receives. The out-degree is the number of edges pointing from the node to other nodes; it indicates how many other nodes can be reached directly from the node and can be understood as the number of connections the node sends.
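A PageRank score of this kind can be obtained by power iteration. The sketch below assumes the common form in which each node passes its score to its out-neighbors in equal shares and a $(1 - p)/N$ term is added to every node; the handling of nodes without outgoing edges (uniform spreading) is also an assumption, as the text does not address it.

```python
import numpy as np

def pagerank(A, p=0.85, tol=1e-10, max_iter=1000):
    """A[j, i] = 1 if there is an edge from node j to node i (a_ji in the text)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    k_out = A.sum(axis=1)                      # out-degree of each node
    # Row-normalise; a node with no outgoing edge is assumed to spread its score uniformly
    P = np.where(k_out[:, None] > 0, A / np.maximum(k_out, 1.0)[:, None], 1.0 / n)
    c = np.full(n, 1.0 / n)                    # start from a uniform score
    for _ in range(max_iter):
        c_new = p * (P.T @ c) + (1.0 - p) / n  # damped update with p = 0.85 by default
        if np.abs(c_new - c).sum() < tol:
            return c_new
        c = c_new
    return c

# Hypothetical directed network: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
c = pagerank(A)
print(c)
```

In this example node 2 receives links from both other nodes and ends up with the highest score, while node 1, which is reached only through half of node 0's outgoing links, scores lowest.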
To extract differences in how users use shared bicycles at different stages, the inflow and outflow of passengers in different areas during different peak periods are analyzed with the net flow ratio (NFR), calculated as shown in equation 6, where $NFR_i$ ranges between -1 and 1, and $O_i$ and $D_i$ are the inflow and outflow of trips in region i. An $NFR_i$ less than 0 indicates that more vehicles are picked up than returned in region i, i.e. outflow dominates, and more vehicles need to be allocated to the region to meet users' travel demand. Conversely, an $NFR_i$ greater than 0 indicates that more vehicles are returned than picked up, and inflow dominates.
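As a worked illustration, the net flow ratio can be computed from an OD matrix as the difference between inflow and outflow divided by their sum. This form is an assumption chosen to match the sign convention described above (negative when pick-ups dominate, positive when returns dominate); the variable names `inflow` and `outflow` are used instead of $O_i$ and $D_i$ to keep the roles explicit.

```python
import numpy as np

def net_flow_ratio(W):
    """W[i, j]: number of trips from region i to region j."""
    W = np.asarray(W, dtype=float)
    outflow = W.sum(axis=1)    # trips starting in region i (vehicles picked up)
    inflow = W.sum(axis=0)     # trips ending in region i (vehicles returned)
    total = inflow + outflow
    # Regions without any trips get NFR = 0 instead of a division by zero
    return np.divide(inflow - outflow, total, out=np.zeros_like(total), where=total > 0)

# Hypothetical peak-period OD counts between three regions
W = np.array([[0, 8, 2],
              [1, 0, 1],
              [1, 2, 0]])
nfr = net_flow_ratio(W)
print(nfr)
```

Region 0 sends out 10 trips and receives 2, so its ratio is negative and it would need vehicles allocated to it; region 1 is the mirror image; region 2 is balanced.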
Spatial autocorrelation analysis is a statistical method for studying geospatial data that can reveal regional structure in spatial variables. The spatial distribution and evolution of shared bicycles are studied through the global Moran index and the local spatial autocorrelation Moran index (Moran's I), calculated as shown in equation 7, where n is the number of spatial regions, $w_{ij}$ is the weight between site i and site j, $y_i$ and $y_j$ are the attribute values of site i and site j, and $\bar{y}$ is the average of all observations. Moran's I is a coefficient that generally lies between -1 and 1. A value greater than 0 indicates positive spatial correlation, and the larger the value, the more significant the spatial correlation.
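The global index can be illustrated with a short computation. The sketch assumes the standard global Moran's I formula and a hypothetical spatial weight matrix in which neighboring sites on a line have weight 1; it does not cover the local index.

```python
import numpy as np

def morans_i(y, w):
    """Global Moran's I for attribute values y and spatial weights w (zero diagonal)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    z = y - y.mean()                         # deviations from the mean of all observations
    num = (w * np.outer(z, z)).sum()         # weighted cross-products of deviations
    return len(y) / w.sum() * num / (z ** 2).sum()

# Hypothetical layout: four sites on a line, each adjacent to its immediate neighbours
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
i_trend = morans_i([1, 2, 3, 4], w)          # similar values sit next to each other
i_alternating = morans_i([1, -1, 1, -1], w)  # neighbours always differ
print(i_trend, i_alternating)
```

The smoothly increasing values give a positive index, and the alternating pattern gives the most negative value for this layout.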
Step S4: and carrying out community cluster analysis by using a ClusterNet algorithm based on the graph neural network model, and mining a community structure.
The invention provides an end-to-end model, ClusterGCN, in which the data are embedded through a GCN (Graph Convolutional Network), the output of the convolutional network is fed into a K-means clustering function for iterative clustering, and the loss function, namely the optimization target, is finally calculated from the output assignment matrix and the modularity, with parameters optimized through error backpropagation. The inputs of the ClusterNet algorithm based on shared bicycle travel data are the longitude and latitude of each station, its daily departure passenger flow, its daily arrival passenger flow, its hourly departure passenger flow, its hourly arrival passenger flow, and the travel volume between stations. The output of the ClusterNet algorithm is a community partition label for each shared bicycle station.
A graph neural network model is constructed based on shared bicycle travel data. The travel connection between each pair of stations is extracted, the travel volume (OD volume) between each pair of origin and destination stations is taken as the adjacency matrix, which serves as the edge feature of the graph convolutional network, and the adjacency matrix $A$ is defined to describe the spatial connection between sites. The longitude and latitude of each station, its daily departure passenger flow, its daily arrival passenger flow, its hourly departure passenger flow and its hourly arrival passenger flow are taken as the features of the station, defining a feature matrix $\gamma_t$. The GCN model constructs a filter in the Fourier domain that acts on the nodes of the graph, captures spatial features between shared bicycle sites based on their first-order neighborhoods, and builds a deep GCN model by stacking multiple convolution layers. The modeling process is represented by equation 8.
Here $\tilde{A} = A + I_N$ is the adjacency matrix with self-connections added, $I_N$ is the identity matrix, and $\tilde{D}$ is the degree matrix of the shared bicycle site network, where $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. $H^{(l)}$ is the output of the l-th layer, $\theta^{(l)}$ is the trainable parameter of the l-th layer, and $\sigma(\cdot)$ denotes the activation function of the nonlinear model; the invention adopts a ReLU activation function. Given the feature matrix $\gamma_t$ and the adjacency matrix $A$, a two-layer GCN model can be represented by equation 9, where $\theta^{(1)}$ is the trainable weight matrix from the input layer to the hidden layer and $\theta^{(2)}$ is the trainable weight matrix from the hidden layer to the output layer.
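The forward pass of a two-layer GCN of this kind can be sketched in a few lines of NumPy. This is an illustrative sketch under several assumptions: the propagation rule is the standard symmetric normalization with self-connections, the OD matrix has been symmetrized beforehand, the output layer is left linear, and the weights are random rather than trained. All names and sizes are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, the normalised adjacency with self-connections."""
    A_tilde = A + np.eye(A.shape[0])           # add self-connections
    d = A_tilde.sum(axis=1)                    # diagonal of the degree matrix
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_two_layer(X, A, theta1, theta2):
    """Two graph convolutions: ReLU on the hidden layer, linear output embedding."""
    A_hat = normalize_adjacency(A)
    H1 = np.maximum(A_hat @ X @ theta1, 0.0)   # input layer -> hidden layer, ReLU
    return A_hat @ H1 @ theta2                 # hidden layer -> output embedding

rng = np.random.default_rng(0)
A = np.array([[0, 5, 0, 1],
              [5, 0, 2, 0],
              [0, 2, 0, 3],
              [1, 0, 3, 0]], dtype=float)      # symmetrised OD volumes (hypothetical)
X = rng.normal(size=(4, 3))                    # 3 station features per node
theta1 = rng.normal(size=(3, 8))               # untrained weights, for shape only
theta2 = rng.normal(size=(8, 2))
Z = gcn_two_layer(X, A, theta1, theta2)        # one 2-dimensional embedding per station
print(Z.shape)
```

In practice the station features would usually be standardized first, since coordinates and passenger counts live on very different scales.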
Through feature extraction by the two-layer graph neural network, the structural and attribute information of the original graph is effectively reduced in dimension, and the resulting feature matrix is used as input to the clustering module to divide communities. In this module, communities are partitioned with the classical K-means algorithm.
The K-means algorithm is an unsupervised clustering method. It is much faster than common community discovery algorithms, clusters well, and can be applied to community discovery. Its basic idea is to gather the nodes in space around k cluster centers, assigning each node to its nearest center, and to update each cluster center iteratively until the best clustering is obtained. The algorithm first fixes the number of cluster centers k, then clusters the nodes in the graph according to k, and finally outputs the clustering result: nodes with high similarity are grouped into one class and nodes with low similarity are placed in different classes, where similarity is measured by the Euclidean distance between node vectors.
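The iteration described above can be written compactly. The sketch below is a plain K-means loop on node embeddings using Euclidean distance; the random initialization from data points, the fixed iteration cap and the sample embeddings are assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Cluster the rows of X into k groups; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # start from k distinct nodes
    for _ in range(n_iter):
        # Euclidean distance from every node to every cluster center
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)                          # nearest center wins
        # Move each center to the mean of its members; keep it if the cluster is empty
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):                 # converged
            break
        centers = new_centers
    return labels, centers

# Hypothetical 2-dimensional node embeddings forming two well-separated groups
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centers = kmeans(X, k=2)
print(labels)
```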
Assume the graph has N nodes, each representing one input, with m edges among them. Based on the ClusterGCN model, the nodes are divided into k communities according to these inputs. The purpose of the community partitioning task is to partition the nodes of the graph into k different communities that are densely connected internally and have few connecting edges between them. The goal of model training is therefore to find a partition r that maximizes the modularity, defined in equation 10:
$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{i,j} - \frac{d_i d_j}{2m}\right]\delta(r_i, r_j) \tag{10}$$
The quality of a network community division is commonly measured by modularity. Q is the modularity: the larger it is, the more reasonable the corresponding community division; the smaller it is, the more ambiguous the division. Q ranges from -0.5 to 1, and studies show that the clustering effect is good when Q lies between 0.3 and 0.7. Here i and j are any two nodes in the graph; $A_{i,j} = 1$ when the two nodes are directly connected and $A_{i,j} = 0$ otherwise; $d_i$ is the degree of node i; and $\delta(r_i, r_j)$ indicates whether nodes i and j are in the same community, with $\delta(r_i, r_j) = 1$ if they are and $\delta(r_i, r_j) = 0$ otherwise.
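To make the modularity measure concrete, the sketch below evaluates it for a hard partition of a small undirected graph, using the usual form in which each within-community pair contributes its adjacency entry minus the expected value d_i d_j / (2m). The function name and the toy graph (two triangles joined by one edge) are illustrative assumptions.

```python
def modularity(A, communities):
    """Modularity Q of a partition; A is a symmetric 0/1 adjacency matrix."""
    n = len(A)
    degree = [sum(row) for row in A]
    two_m = sum(degree)  # every undirected edge is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:  # delta(r_i, r_j) = 1
                q += A[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Two triangles (nodes 0-2 and 3-5) connected by the single edge 2-3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

q_split = modularity(A, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(A, [0, 0, 0, 0, 0, 0])  # everything in one community
```

For this toy graph the two-triangle partition gives Q = 5/14, which falls inside the 0.3 to 0.7 band mentioned above, while placing all nodes in one community gives Q = 0.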
The training process of the shared bicycle travel community division model ClusterGCN comprises the following key steps. First, based on the shared bicycle travel data, the travel characteristics of the stations and the travel volume characteristics between stations are extracted, and a shared bicycle travel network is constructed. The graph network data are then input into the model, and a community division result is obtained through forward propagation. During forward propagation, the ClusterGCN network performs graph convolution, pooling and Kmeans clustering in sequence, and propagation between the layers of the graph convolutional network follows equations 8 and 9. The modularity of equation 10 is then used as the loss function to measure the quality of the network community partition, so that training drives the modularity toward its maximum. Gradients are calculated by the backpropagation algorithm, and the network parameters are updated with an optimization algorithm. The steps of forward propagation, loss calculation, backpropagation and parameter updating are repeated to train the neural network iteratively. During training, the model is evaluated on a validation set to assess its performance on unseen data. Through these steps, the ClusterGCN model learns the travel characteristics of shared bicycle stations and of the flows between stations, and can therefore be applied to community structure division tasks.
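The description does not spell out how the Kmeans step is made differentiable so that gradients can flow back into the graph convolution layers. One common approach, assumed here purely for illustration, replaces hard cluster labels with soft assignments (a softmax over negative squared distances to the cluster centers) and evaluates a relaxed modularity on them; the negative of that value can then serve as the loss. The function names and the `beta` sharpness parameter are hypothetical.

```python
import math


def soft_assign(embeddings, centers, beta=1.0):
    """Soft cluster membership: softmax over -beta * squared distance."""
    R = []
    for x in embeddings:
        logits = [-beta * sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
        top = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        R.append([e / total for e in exps])
    return R


def soft_modularity_loss(A, R):
    """Negative relaxed modularity for soft assignments R (N rows, k columns)."""
    n, k = len(A), len(R[0])
    degree = [sum(row) for row in A]
    two_m = sum(degree)
    q = 0.0
    for i in range(n):
        for j in range(n):
            b_ij = A[i][j] - degree[i] * degree[j] / two_m
            # Soft version of delta(r_i, r_j): overlap of the two membership rows
            q += b_ij * sum(R[i][c] * R[j][c] for c in range(k))
    return -q / two_m


# Toy graph: two triangles joined by one edge, with 1-D node embeddings
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
embeddings = [[0.0], [0.0], [0.0], [10.0], [10.0], [10.0]]
R = soft_assign(embeddings, centers=[[0.0], [10.0]], beta=5.0)
loss = soft_modularity_loss(A, R)
```

When the soft assignments are nearly one-hot, the loss approaches the negative of the hard modularity; in a full training loop an automatic differentiation framework would compute the gradients of this loss with respect to the GCN weights.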
Urban traffic community division based on travel data enables intelligent traffic management and resource optimization. First, analyzing large-scale travel data makes it possible to identify different travel patterns and groups, such as commuters, students and business travelers, which provides an accurate information basis for urban traffic planning and decision-making. Second, community division gives a better understanding of how travel behaviors and demands differ between groups, so that customized traffic services can be provided, for example by optimizing public transportation for commuters or designing safer campus transportation schemes for students; such personalized transportation solutions improve residents' travel experience and satisfaction. Third, community division based on travel data can guide the optimal allocation of resources and promote sustainable travel modes such as public transportation, walking and bicycle sharing, which helps reduce traffic congestion and supports the sustainable development of cities.
Examples
Fig. 1 is a flow chart of the shared bicycle travel network community mining method according to this embodiment. Referring to Fig. 1, the method includes:
Step S1: Data preparation and data analysis.
The present invention requires the following data: station data for the designated area in the United States and shared bicycle trip records for that area. To verify both how well the indices constructed with spatial statistics and complex network methods quantify travel and how effective the community structure mining algorithm based on the graph neural network model ClusterGCN is, shared bicycle data from the designated area before, during and after the designated time period, covering 2019 to 2022, are selected as the data source, and user travel characteristics in each period are mined separately. The New York CitiBike dataset includes order number, user pick-up and return times, pick-up and return stations, user ID, user gender, rental time, and so on. Fig. 2 depicts the New York shared bicycle study area; Fig. 3 depicts the time distribution of daily travel demand for shared bicycles in the designated area in 2019, 2020 and 2021.
Step S2: and analyzing the shared bicycle data in the designated area, and constructing a shared bicycle travel network based on the shared bicycle data. The points in the travel network are shared bicycle stations, and the OD (optical density) quantity among the shared bicycle stations is the network side weight. Fig. 4 reflects the OD travel distribution of the sharing bicycle at different time periods for a given time period. The sharing list mainly travels in a short distance, so that the sharing list is concentrated in Manhattan urban areas for more travel. The designated time period has relatively little effect on the amount of travel of the shared bicycle. As shown in fig. 4 (a) and 4 (b), the daily trip amount of the shared bicycle is 58869 before the designated time period, 784 stations are provided, and the daily trip amount of the shared bicycle is 22758 in the period of 2020, which is serious in the designated time period, and is reduced by 61.34%. In the recovery stage of the appointed time period, the travel amount of the shared bicycle is increased. In particular, the number of newly built stations is large, and the service range of the shared bicycle is further expanded. Of these, the requirement for sunrise in 2022, 4, is 77320, with 1547 sites.
Step S3: index quantification trip is constructed through space statistics and complex network method
Table 1 shows how the travel patterns of the shared bicycle network changed on rest days and working days before, during and after the specified time period. On working days, under the influence of the specified time period, the node degree of the shared bicycle network decreases, indicating poorer connectivity between areas. The degree distribution of the shared bicycle network also decreases in the recovery phase, which means that nodes in the network are less connected in the later stage of the specified time period. The shared bicycle system gained many new stations after the specified time period: it had 784 stations in 2019, 887 in 2020 and 1396 in 2021. In terms of the network strength index, node strength in the shared bicycle network is also somewhat reduced under the influence of the specified time period. A larger clustering coefficient indicates that the network has more groups of densely connected nodes; the clustering coefficient of the shared bicycle network is small, so connections between stations are not very tight. Before the specified time period, inflow in the shared bicycle network was greater than outflow, whereas afterwards outflow was greater than inflow. In terms of the spatial distribution of network features, the Moran index of the shared bicycle network was 0.6996 before the specified time period; a larger Moran's I value indicates more pronounced spatial correlation, and the Moran index decreases during the severe phase of the specified time period. During recovery, the spatial correlation of the shared bicycle network decreases further. The degree and strength of nodes in the shared bicycle network are lower on rest days than on working days.
Table 1. Average metric index values of the shared bicycle network
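As an illustration of the spatial correlation index discussed above, the following sketch computes the global Moran's I in its standard form: the spatially weighted cross-product of deviations from the mean, normalized by the total weight and the variance term. The weight matrix, attribute values and function name are toy assumptions and are unrelated to the values reported in Table 1.

```python
def morans_i(values, W):
    """Global Moran's I for attribute `values` and spatial weight matrix W."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in W)
    # Weighted cross-products of deviations between every pair of regions
    cross = sum(W[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_term = sum(d * d for d in dev)
    return (n / weight_sum) * (cross / variance_term)


# Four regions along a line; neighboring regions have weight 1
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1, 2, 3, 4], W)         # similar values sit next to each other
alternating = morans_i([1, -1, 1, -1], W)  # neighbors always differ
```

Smoothly varying values give a positive index (1/3 here), while a perfectly alternating pattern gives -1, matching the reading that larger values indicate stronger positive spatial correlation.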
The Fruchterman-Reingold layout is used to display the travel volume and graph structure of the shared bicycle system, see Fig. 5. The graph structure has pronounced multi-core, multi-cluster features. The dot size shows the degree of the node, which is the sum of its in-degree and out-degree. The gray level of the line color indicates the OD volume between stations: the brighter the color, the greater the OD volume. The results show that the characteristics of the graph structure differ across the stages of the designated time period. Before the designated time period, the clusters contain more stations and the degree of the shared bicycle stations is larger. During the designated time period, the number of stations in the clusters drops significantly, as does the closeness of the connections between nodes. In the later stage of the designated time period, the degree distribution of the shared bicycle stations does not increase noticeably, and links among stations are not as close as before the designated time period, which is related to the larger number of newly built stations.
Step S4: clusterGCN mining community structure based on graph neural network model
Community cluster analysis is performed with the ClusterGCN algorithm on the shared bicycle data of the designated area from 2019 to 2022. New York's CitiBike system is concentrated mainly in Manhattan. Table 2 shows the modularity index of the community division; the modularity values lie between 0.3 and 0.7, which indicates a good clustering effect. The modularity of the shared bicycle system is larger in January, which indicates that travel is more tightly concentrated within communities, and smaller in April. This is because shared bicycle travel is strongly affected by weather: in winter, when temperatures are low, trips cluster within communities. As the weather warms, users' travel patterns become more diverse and travel between communities gradually increases.
The community structure obtained from the shared bicycle data is shown in Fig. 6, where communities are represented by different symbols. As Fig. 6 shows, the community division changes across time periods: two communities in Manhattan and one community in Queens are later merged into one large community. This change in regional structure reflects changes in users' travel characteristics at different stages and in the intensity of interaction with other areas. During the specified time period, trips within each administrative area became more integrated and riding connections became more local, forming smaller cluster communities that are highly consistent with the geographic administrative areas.
Table 2. ClusterNet community discovery modularity index
In summary, the embodiment of the invention constructs networks from shared bicycle travel data in different periods and analyzes the dynamic spatio-temporal changes in travel demand across those periods. Indices built with spatial statistics and complex network methods quantify travel and reveal the intrinsic laws behind changes in users' travel patterns.
The invention provides an end-to-end deep learning model, ClusterGCN, which uses a graph convolutional neural network and the Kmeans algorithm to perform community division and iterative optimization on the dynamic user travel network. By jointly learning the network topology and the node features, it effectively identifies the communities in the shared bicycle travel network and the changes in community structure.
The method uses a graph neural network to extract multi-source features and trains the network through iterative optimization; it is easy to understand and compute and is widely applicable.
The method provided by the invention helps to understand users' travel demands, travel behaviors and changes in travel community structure, and to provide better service for passengers' last-kilometer travel.
Those of ordinary skill in the art will appreciate that: the drawing is a schematic diagram of one embodiment and the modules or flows in the drawing are not necessarily required to practice the invention.
From the above description of embodiments, it will be apparent to those skilled in the art that the present invention may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present invention.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus or system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, with reference to the description of method embodiments in part. The apparatus and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (4)

1. A deep learning-based shared bicycle travel network community mining method, characterized by comprising:
analyzing and counting road network data information, shared bicycle data information and user travel data in a designated area to construct a transportation travel network;
quantitatively describing the travel characteristics of users in the shared bicycle travel network, taking the user travel quantification indices as network metric indices of a graph neural network model, and constructing the graph neural network model;
and performing community cluster analysis with a ClusterNet algorithm based on the graph neural network model to mine the community structure.
2. The method of claim 1, wherein the analyzing and counting the road network data information, the shared bicycle data information and the travel data of the user in the designated area to construct the transportation travel network comprises:
analyzing shared bicycle data and road network data information of a designated area, and constructing a traffic travel network based on travel data of a user, wherein points in the traffic travel network are the initial and final points of travel of the user, and a journey from the initial point to the final point is considered as an edge between nodes;
a directed transportation travel network $G = (V, E, W)$ is constructed based on shared bicycle data before, during and after a specified time period, wherein $V = \{v_1, \ldots, v_N\}$ represents the set of nodes; $E = \{e_{ij} \mid i, j = 1, 2, \ldots, N, i \neq j\}$ represents the set of edges, where $e_{ij} = 1$ indicates that there is an edge between node $i$ and node $j$ and $e_{ij} = 0$ indicates that there is no connecting edge between them; and $W = \{w_{ij} \mid i, j = 1, 2, \ldots, N, i \neq j\}$ is the set of weights, where $w_{ij}$ represents the weight of edge $e_{ij}$, i.e. the travel volume on the edge between node $i$ and node $j$.
3. The method of claim 2, wherein quantitatively describing the travel characteristics of the users in the shared bicycle travel network, and constructing the graphic neural network model by using the user travel quantitative index as the network metric index of the graphic neural network model comprises:
quantitatively describing travel characteristics of users in the shared bicycle travel network by using user travel quantification indices, taking the user travel quantification indices as network metric indices of a graph neural network model, taking the users in the shared bicycle travel network as nodes in the graph neural network, and constructing the graph neural network model based on the shared bicycle travel data by using all network metric indices, wherein the network metric indices comprise: the degree, strength, clustering coefficient, PageRank value, net flow ratio and Moran index of the nodes;
in the graph neural network model, the degree $d_i$ of node $i$ is defined as the number of nodes connected to it, as shown in equation 1, where $e_{ij} = 1$ if $j$ is a neighbor node of $i$ and $e_{ij} = 0$ otherwise:
$$d_i = \sum_{j=1}^{N} e_{ij} \tag{1}$$
defining the strength $s_i$ of node $i$ to describe the intensity of passenger flow between nodes, as shown in equation 2, where $w_{ij}$ is the OD passenger flow from origin to destination between node $i$ and node $j$:
$$s_i = \sum_{j=1}^{N} w_{ij} \tag{2}$$
defining the clustering coefficient $C$ of the network as the average of the clustering coefficients of all nodes, as shown in equation 3 and equation 4:
$$C = \frac{1}{N}\sum_{i=1}^{N} C_i \tag{3}$$
$$C_i = \frac{2 e_i}{d_i (d_i - 1)} \tag{4}$$
where $C$ is the clustering coefficient of the network, $C_i$ is the clustering coefficient of node $i$, and $e_i$ is the number of edges connecting the neighbor nodes of node $i$;
the PageRank value is used to represent the influence score of a node, and the PageRank value of a node is calculated as shown in equation 5:
$$c_i = p \sum_{j} a_{ji} \frac{c_j}{d_j^{\mathrm{out}}} + \frac{1 - p}{N} \tag{5}$$
where $c_i$ is the PageRank value of the $i$-th node, $c_i \in [0, 1]$; $p$ is the damping coefficient; $d_j^{\mathrm{out}}$ denotes the out-degree of the $j$-th node, i.e. the number of edges in the complex network pointing from that node to other nodes; and $a_{ji}$ is an element of the adjacency matrix of the directed network; the higher the PageRank value of a node, the more important the node;
the inflow and outflow passenger flows in different areas with different peak time periods are analyzed by adopting a net flow ratio NFR, and the calculation method of the NFR is shown in a formula 6:
where $NFR_i$ ranges from -1 to 1, and $O_i$ and $D_i$ are the inflow and outflow of trips in region $i$;
the Moran index is used to represent the spatial distribution pattern and evolution of shared bicycle travel, and is calculated as follows:
$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{7}$$
where $n$ represents the number of spatial regions, $w_{ij}$ is the weight between station $i$ and station $j$, $y_i$ and $y_j$ are the attribute values of station $i$ and station $j$, and $\bar{y}$ is the average of all observations.
4. The method of claim 3, wherein the performing community cluster analysis with a ClusterNet algorithm based on the graph neural network model to mine the community structure comprises:
performing community cluster analysis with the ClusterNet algorithm based on the graph neural network model, performing graph embedding on the input data through a graph convolutional network GCN, then feeding the output of the convolutional network into a Kmeans clustering function for iterative clustering, and finally calculating a loss function, namely the optimization target, from the output assignment matrix and the modularity, and optimizing the parameters through error backpropagation, wherein the output of the ClusterNet algorithm is a community partition label for each shared bicycle station;
building a graph neural network model based on shared bicycle travel data, extracting the travel connection relation between each pair of stations, taking the travel volume between each pair of origin and destination stations as the adjacency matrix, taking the adjacency matrix as the edge feature of the graph convolutional network, and thereby defining the adjacency matrix $A$ to describe the spatial connection relation between stations; defining a feature matrix $\gamma_t$ by taking the longitude and latitude of each station, its daily departure passenger flow, its daily arrival passenger flow, its hourly departure passenger flow and its hourly arrival passenger flow as station features; and using the feature matrix $\gamma_t$ as input data for the ClusterNet algorithm;
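the graph convolutional network model constructs a filter in the Fourier domain, the filter acting on the nodes of the graph, captures spatial features between shared bicycle stations based on each node's first-order neighborhood, and constructs a deep GCN model by stacking multiple convolution layers, the modeling process being represented by equation 8:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} \theta^{(l)}\right) \tag{8}$$
wherein $\tilde{A} = A + I_N$ is the adjacency matrix with self-connections, $I_N$ is the identity matrix, $\tilde{D}$ is the degree matrix of the shared bicycle station network with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, $H^{(l)}$ is the output of the $l$-th layer, $\theta^{(l)}$ is the trainable parameter of the $l$-th layer, and $\sigma(\cdot)$ represents the nonlinear activation function, for which the ReLU activation function is adopted; given a feature matrix $\gamma_t$ and adjacency matrix $A$, a two-layer GCN model is represented by equation 9:
$$f(\gamma_t, A) = \sigma\left(\hat{A}\,\mathrm{ReLU}\left(\hat{A}\,\gamma_t\,\theta^{(1)}\right)\theta^{(2)}\right), \qquad \hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} \tag{9}$$
where $\theta^{(1)}$ is the trainable weight matrix from the input layer to the hidden layer and $\theta^{(2)}$ is the trainable weight matrix from the hidden layer to the output layer;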
dividing the feature matrix into communities by using the Kmeans algorithm: assuming that there are N nodes, each representing one input, the nodes of the graph are divided into k different communities according to the inputs based on the ClusterGCN model, and a partition r is sought that maximizes the modularity of the k communities, wherein the modularity is defined as the loss function, calculated as shown in equation 10:
$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{i,j} - \frac{d_i d_j}{2m}\right]\delta(r_i, r_j) \tag{10}$$
where $m$ is the number of edges, $i$ and $j$ are any two nodes in the graph, $A_{i,j} = 1$ when the two nodes are directly connected and $A_{i,j} = 0$ otherwise, $d_i$ is the degree of node $i$, and $\delta(r_i, r_j)$ indicates whether nodes $i$ and $j$ are in the same community, with $\delta(r_i, r_j) = 1$ if they are and $\delta(r_i, r_j) = 0$ otherwise;
and calculating gradients through a backpropagation algorithm, updating the network parameters through an optimization algorithm, repeating the steps of forward propagation, loss calculation, backpropagation and parameter updating to train the neural network iteratively, evaluating the model on a validation set during training, learning the travel characteristics of shared bicycle stations and of the flows between stations, and mining the community structure of the shared bicycle travel network.
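For reference, the PageRank index named in claim 3 can be computed by power iteration. The sketch below is an illustration under stated assumptions rather than the claimed procedure: it uses the usual damped update in which each node passes its score equally along its outgoing edges, spreads the score of nodes without outgoing edges uniformly over all nodes, and uses a damping coefficient of 0.85; the function name and the toy network are hypothetical.

```python
def pagerank(A, p=0.85, max_iters=200, tol=1e-12):
    """PageRank scores for a directed 0/1 adjacency matrix A (A[j][i] = 1 for j -> i)."""
    n = len(A)
    out_degree = [sum(row) for row in A]
    scores = [1.0 / n] * n
    for _ in range(max_iters):
        # Assumption: score held by nodes with no outgoing edge is shared by all nodes
        dangling = sum(scores[j] for j in range(n) if out_degree[j] == 0)
        updated = []
        for i in range(n):
            received = sum(
                A[j][i] * scores[j] / out_degree[j]
                for j in range(n)
                if out_degree[j] > 0
            )
            updated.append(p * (received + dangling / n) + (1.0 - p) / n)
        change = sum(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if change < tol:
            break
    return scores


# Toy directed network: stations 1 and 2 both send trips to station 0,
# and station 0 sends trips to station 1
A = [
    [0, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
]
scores = pagerank(A)
```

In this toy network station 0 receives flow from both other stations and therefore obtains the highest score, while station 2, which receives no flow, keeps only the baseline share.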

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311327161.7A CN117454208A (en) 2023-10-13 2023-10-13 Deep learning-based shared bicycle travel network community mining method

Publications (1)

Publication Number Publication Date
CN117454208A true CN117454208A (en) 2024-01-26

Family

ID=89593904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311327161.7A Pending CN117454208A (en) 2023-10-13 2023-10-13 Deep learning-based shared bicycle travel network community mining method

Country Status (1)

Country Link
CN (1) CN117454208A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118037517A (en) * 2024-04-12 2024-05-14 深圳大学 Travel network dynamic community division method and device, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination