CN117454208A

CN117454208A - Deep learning-based shared bicycle travel network community mining method

Info

Publication number: CN117454208A
Application number: CN202311327161.7A
Authority: CN
Inventors: 昌锡铭; 孙会君; 杨欣; 刘天宇; 吴建军; 闫学东; 尹浩东; 屈云超
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University
Priority date: 2023-10-13
Filing date: 2023-10-13
Publication date: 2024-01-26

Abstract

The invention provides a deep learning-based method for mining communities in a shared bicycle travel network. The method comprises the following steps: analyzing and counting road network data, shared bicycle data and user travel data in a designated area to construct a traffic travel network; quantitatively describing the travel characteristics of users in the shared bicycle travel network, taking the user travel quantification indexes as network measurement indexes of a graph neural network model, and constructing the graph neural network model; and carrying out community cluster analysis with the ClusterNet algorithm based on the graph neural network model to mine the community structure. According to the method, a dynamic travel network is constructed for different time periods, and the spatio-temporal distribution of travel demand in different stages is analyzed. Indexes built with spatial statistics and complex network methods quantify the travel characteristics, so that changes in users' travel patterns across periods can be clearly understood and the community structure in the travel network can be dynamically mined.

Description

Deep learning-based shared bicycle travel network community mining method

Technical Field

The invention relates to the technical field of computer application, in particular to a deep learning-based method for mining a shared bicycle travel network community.

Background

A city is a large and complex system, and urban planning and transportation are closely tied to daily life. Quantifying the spatio-temporal patterns of residents' travel flows can therefore effectively reflect the dynamics of a city's constituent elements. Urban economic development keeps expanding the scale of traffic, and people's travel demand keeps increasing. Traffic problems such as congestion and unbalanced distribution of travel demand are increasingly prominent and have long drawn attention from all sectors of society. Understanding changes in users' travel demand, travel behavior and travel community structure can help governments and operators provide better services for passengers. Users make different choices among travel modes, and the emerging shared bicycle offers a new mode for short-distance travel while strengthening connections with other modes such as buses and subways.

At present, research on methods for characterizing human travel and mobility has achieved some breakthroughs and innovations. However, some problems remain. First, changes in users' travel demand, travel behavior and travel community structure have not been studied sufficiently. Second, emerging deep learning graph neural networks can create more powerful representations of node attributes and community structure and, because they learn both the network topology and the node features, they have strong learning ability for community discovery in user travel networks.

Disclosure of Invention

The embodiment of the invention provides a deep learning-based community mining method for a shared bicycle travel network, which is used for effectively identifying communities and community structure changes existing in the shared bicycle travel network.

In order to achieve the above purpose, the present invention adopts the following technical scheme.

A deep learning-based method for mining a shared bicycle travel network community comprises the following steps:

analyzing and counting road network data information, shared bicycle data information and travel data of users in a designated area to construct a transportation travel network;

quantitatively describing travel characteristics of users in the shared bicycle travel network, taking the travel quantitative indexes of the users as network measurement indexes of the graph neural network model, and constructing the graph neural network model;

and carrying out community cluster analysis by using a ClusterNet algorithm based on the graph neural network model, and mining a community structure.

Preferably, the analyzing and counting the road network data information, the shared bicycle data information and the travel data of the user in the designated area to construct a traffic travel network includes:

analyzing shared bicycle data and road network data information of a designated area, and constructing a traffic travel network based on travel data of a user, wherein points in the traffic travel network are the initial and final points of travel of the user, and a journey from the initial point to the final point is considered as an edge between nodes;

a directed traffic travel network G = (V, E, W) is constructed based on shared bicycle data of the periods before, during and after a specified time period, wherein V = {v_1, …, v_N} represents the set of nodes; E = {e_ij | i, j = 1, 2, …, N, i ≠ j} is the set of edges, where e_ij = 1 indicates that there is an edge between node i and node j and e_ij = 0 indicates that there is no edge between node i and node j; and W = {w_ij | i, j = 1, 2, …, N, i ≠ j} is the set of weights, where w_ij represents the weight of edge e_ij, i.e. the travel volume on the edge between node i and node j.

Preferably, the quantitatively describing the travel characteristics of the user in the shared bicycle travel network, taking the user travel quantitative index as a network measurement index of the graph neural network model, and constructing the graph neural network model includes:

quantitatively describing travel characteristics of users in a shared bicycle travel network by using user travel quantification indexes, taking the user travel quantification indexes as network measurement indexes of a graph neural network model, taking the users in the shared bicycle travel network as nodes in the graph neural network, and constructing the graph neural network model based on the shared bicycle travel data by using all network measurement indexes, wherein the network measurement indexes comprise: the degree, intensity, clustering coefficient, PageRank value, net flow ratio and Moran index of the nodes;

in the graph neural network model, the degree d_i of node i is defined as the number of nodes connected to node i, as shown in formula 1, wherein e_ij = 1 if j is a neighbor node of i; otherwise, e_ij = 0;

the intensity s_i of node i is defined to describe the intensity of passenger flow between nodes, as shown in formula 2, wherein w_ij is the OD (origin to destination) passenger flow between node i and node j;

the clustering coefficient C of the network is defined as the average of the clustering coefficients of all nodes, as shown in formula 3 and formula 4;

wherein C is the clustering coefficient of the network, C_i is the clustering coefficient of node i, and e_i is the number of edges between the neighbor nodes of node i;

the PageRank value is used for representing the influence score of the node, and the calculation formula of the PageRank value of the node is shown as formula (5);

wherein c_i is the PageRank value of the i-th node, c_i ∈ [0, 1]; p is the damping coefficient; the formula also uses the out-degree of the j-th node, where the out-degree of a node in a complex network is the number of connections leaving that node, i.e. the number of edges pointing from the node to other nodes; a_ji is an element of the adjacency matrix of the directed network; the higher the PageRank value of a node, the more important the node is;

the inflow and outflow passenger flows in different areas with different peak time periods are analyzed by adopting a net flow ratio NFR, and the calculation method of the NFR is shown in a formula 6:

wherein NFR_i ranges between -1 and 1, and O_i and D_i are respectively the travel inflow and outflow of region i;

the Moran index is used to represent the spatial distribution pattern and evolution pattern of shared bicycles, and the calculation formula of the Moran index is as follows:

wherein n represents the number of spatial regions, w_ij is the weight between site i and site j, y_i and y_j represent the attribute values of site i and site j, and the average of all observations is also used.

Preferably, carrying out community cluster analysis by using the ClusterNet algorithm based on the graph neural network model and mining the community structure includes:

performing community cluster analysis by using the ClusterNet algorithm based on the graph neural network model: the input data is embedded by a graph convolutional network (GCN), the output of the convolutional network is then fed into a Kmeans clustering function for iterative clustering, and finally a loss function, namely the optimization target, is calculated from the output assignment matrix and the modularity, and the parameters are optimized through error backpropagation, wherein the output of the ClusterNet algorithm is a community partition label for each shared bicycle station;

constructing a graph neural network model based on shared bicycle travel data: the travel connection relation between each pair of stations is extracted, the travel volume between each pair of origin and destination stations is taken as the adjacency matrix, the adjacency matrix is taken as the edge feature of the graph convolutional network, and the adjacency matrix is defined to describe the spatial connection relation between stations; the longitude and latitude of each station, its daily departure passenger flow, its daily arrival passenger flow, its hourly departure passenger flow and its hourly arrival passenger flow are taken as the station features to define a feature matrix γ_t, and the feature matrix γ_t is used as input data for the ClusterNet algorithm;

the graph convolutional network model constructs a filter in the Fourier domain, the filter acts on the nodes of the graph and captures spatial features between shared bicycle stations based on their first-order neighborhoods, and a deep GCN model is constructed by stacking multiple convolution layers, the modeling process being represented by equation 8:

wherein the equation uses the adjacency matrix, the identity matrix I_N and the degree matrix of the shared bicycle station network; H^(l) is the output of layer l, θ^(l) is the training parameter of layer l, and σ(·) represents the activation function of the nonlinear model, for which a ReLU activation function is adopted; given the feature matrix γ_t and the adjacency matrix, a two-layer GCN model is represented by equation 9, where θ^(1) is a trainable weight matrix from the input layer to the hidden layer and θ^(2) is a trainable weight matrix from the hidden layer to the output layer.

Dividing the feature matrix into communities by using a Kmeans algorithm, assuming that N nodes exist, each node represents one input, dividing the nodes of the graph into k different communities according to the input based on a ClusterGCN model, and finding a dividing mode r to maximize the modularity of the k communities, wherein the modularity is defined as a loss function, and the calculation formula of the loss function is shown as a formula 10;

where i and j are any two nodes in the graph; A_{i,j} = 1 when the two nodes are directly connected, otherwise A_{i,j} = 0; d_i is the degree of node i; and δ(r_i, r_j) indicates whether nodes i and j are in the same community: if so, δ(r_i, r_j) = 1, otherwise δ(r_i, r_j) = 0;

gradients are calculated through a back propagation algorithm, network parameters are updated through an optimization algorithm, the steps of forward propagation, loss calculation, back propagation and parameter updating are executed repeatedly to iteratively train the neural network, the model is evaluated on a validation set during training, the travel characteristics of shared bicycle stations and between stations are learned, and the community structure of the shared bicycle travel network is mined.

According to the technical scheme provided by the embodiment of the invention, a deep learning method for user travel characteristic analysis and community structure detection based on a graph neural network is provided. A dynamic travel network is constructed for different time periods, and the spatio-temporal distribution of travel demand in different stages is analyzed. Indexes built with spatial statistics and complex network methods quantify the travel characteristics, so that changes in users' travel patterns across periods can be clearly understood. An end-to-end deep learning model, ClusterGCN, is then provided to dynamically mine the community structure in the travel network.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a process flow diagram of a deep learning-based shared bicycle travel network community mining method provided by an embodiment of the invention;

Fig. 2 is a schematic diagram of the New York shared bicycle study area;

Fig. 3 shows the daily travel demand distribution characteristics of shared bicycles in the designated area;

Fig. 4 is a spatial distribution diagram of shared bicycle OD travel in the designated area;

Fig. 5 is a Fruchterman-Reingold layout diagram of the shared bicycle travel network;

Fig. 6 shows the community structure mining results for the shared bicycle travel network in different periods.

Detailed Description

Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.

As used herein, the singular forms "a", "an", "said" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

To facilitate understanding of the embodiments of the invention, several specific embodiments are further explained below with reference to the drawings; these embodiments in no way limit the embodiments of the invention.

The invention provides a network-based deep learning method for detecting users' travel characteristics and community structures from shared bicycle travel data. A dynamic travel network is constructed for different time periods, and the spatio-temporal distribution of travel demand in different stages is analyzed. Indexes built with spatial statistics and complex network methods quantify travel, so that changes in users' travel patterns across periods can be clearly understood. An end-to-end deep learning model, ClusterGCN, is then provided to dynamically mine the community structure in the travel network and to examine how users' travel characteristics change.

The processing flow of the deep learning-based shared bicycle travel network community mining method provided by the embodiment of the invention is shown in fig. 1, and comprises the following processing flows:

Step S1: data preparation and data analysis. Road network data and shared bicycle data in the designated area are prepared, analyzed and counted.

Step S2: constructing a traffic travel network.

The shared bicycle data in the designated area is analyzed, and a traffic travel network is constructed based on users' travel data, wherein the nodes of the traffic travel network are the origins and destinations of users' trips, and a trip from an origin to a destination is regarded as an edge between nodes. A directed weighted network G = (V, E, W) is constructed based on shared bicycle data of the periods before, during and after a specified time period, wherein V = {v_1, …, v_N} represents the set of nodes and E = {e_ij | i, j = 1, 2, …, N, i ≠ j} is the set of edges, where e_ij = 1 indicates that there is an edge between node i and node j, and otherwise e_ij = 0. W = {w_ij | i, j = 1, 2, …, N, i ≠ j} is the set of weights, where w_ij represents the weight of edge e_ij, i.e. the travel volume on the edge between node i and node j.
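
As a non-limiting illustration, the construction of G = (V, E, W) from trip records can be sketched in Python as follows. The trip list, the station names and the function name `build_travel_network` are hypothetical and serve only the example; trips that start and end at the same station are dropped to respect i ≠ j.

```python
from collections import Counter

# Hypothetical trip records: (origin_station, destination_station).
trips = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("A", "C"), ("C", "B"), ("C", "C"),  # the last trip is a round trip
]

def build_travel_network(trips):
    """Return (V, E, W) for the directed weighted network G = (V, E, W)."""
    # w_ij counts trips from i to j; pairs with i == j are skipped (i != j).
    weights = Counter((o, d) for o, d in trips if o != d)
    nodes = sorted({station for pair in trips for station in pair})
    # e_ij = 1 exactly for the ordered pairs that carry at least one trip.
    edges = set(weights)
    return nodes, edges, dict(weights)

V, E, W = build_travel_network(trips)
print(V)               # ['A', 'B', 'C']
print(W[("A", "B")])   # 2
```

Keeping the edges as ordered pairs preserves direction, so the flow from A to B and the flow from B to A remain separate weights.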

Step S3: constructing user travel quantification indexes by using spatial statistics and complex network methods, and taking the user travel quantification indexes as network measurement indexes of the graph neural network model.

The travel characteristics of users in the shared bicycle travel network are quantitatively described with the user travel quantification indexes; the user travel quantification indexes are taken as network measurement indexes of the graph neural network model, the users in the shared bicycle travel network are taken as nodes in the graph neural network, and the graph neural network model based on shared bicycle travel data is constructed from all the network measurement indexes. The network measurement indexes include: the degree, intensity, clustering coefficient, PageRank value, net flow ratio and Moran index of the nodes. A graph convolutional neural network is a machine learning model for processing graph-structured data. It implements information transfer and feature aggregation on graph data by updating each node's representation from the information of its neighbors and edges. The graph neural network structure is typically composed of an input layer, graph convolution layers, a pooling layer and an output layer. The input layer receives a representation of the graph data, typically structural information such as node feature vectors and edge attributes. The graph convolution layer is the core layer of the graph neural network and performs information transfer and feature aggregation on the graph data. The pooling layer reduces the dimension and complexity of the graph data, thereby reducing computation and memory consumption. The output layer converts the final representation of the neural network into the desired output form. Through multiple layers of iteration, the graph convolutional neural network gradually aggregates more local and global information.

In the graph neural network model, the degree d_i of node i is defined as the number of nodes connected to node i, which measures the importance of node i in the network. The degree of a node is defined as in equation 1: if j is a neighbor node of i, e_ij = 1; otherwise, e_ij = 0.

The intensity s_i of node i is defined to describe the intensity of passenger flow between nodes, as in equation 2, where w_ij is the OD (origin to destination) passenger flow between node i and node j.
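
Equations 1 and 2 are not reproduced in this text. The following sketch assumes the forms d_i = Σ_j e_ij and s_i = Σ_j w_ij, which follow from the definitions above. The flow matrix is hypothetical, and counting only the edges that leave node i is an assumption of the sketch rather than a statement of the claimed method.

```python
# Hypothetical OD flow matrix: w[i][j] is the number of trips from station i to j.
w = [
    [0, 5, 2],
    [3, 0, 0],
    [0, 4, 0],
]

def degree(w, i):
    """d_i = sum_j e_ij, with e_ij = 1 when w_ij > 0 and i != j."""
    return sum(1 for j, flow in enumerate(w[i]) if j != i and flow > 0)

def intensity(w, i):
    """s_i = sum_j w_ij, the total OD flow on the edges leaving node i."""
    return sum(flow for j, flow in enumerate(w[i]) if j != i)

degrees = [degree(w, i) for i in range(len(w))]
intensities = [intensity(w, i) for i in range(len(w))]
print(degrees)       # [2, 1, 1]
print(intensities)   # [7, 3, 4]
```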

The clustering coefficient represents the degree of aggregation of the network; a network with a larger clustering coefficient may contain more groups of densely connected nodes. C is the clustering coefficient of the network, and C_i is the clustering coefficient of node i. They are calculated as in equations 3 and 4, where d_i is the degree of node i and e_i is the number of edges between the neighbor nodes of node i. The clustering coefficient C of the network is the average of the clustering coefficients of all nodes.
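
Equations 3 and 4 are not reproduced in this text. A minimal sketch, assuming the common undirected form C_i = 2e_i / (d_i(d_i - 1)) and C = (1/N) Σ_i C_i, is given below. The neighbor table is hypothetical, and treating nodes with fewer than two neighbors as having C_i = 0 is a convention chosen for the example.

```python
from itertools import combinations

# Hypothetical undirected view of the network: node -> set of neighbours.
neighbors = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1},
    3: {0},
}

def local_clustering(neighbors, i):
    """C_i = 2 * e_i / (d_i * (d_i - 1)); taken as 0 when d_i < 2."""
    d_i = len(neighbors[i])
    if d_i < 2:
        return 0.0
    # e_i: number of edges that actually exist among the neighbours of i.
    e_i = sum(1 for u, v in combinations(neighbors[i], 2) if v in neighbors[u])
    return 2.0 * e_i / (d_i * (d_i - 1))

def network_clustering(neighbors):
    """C is the mean of C_i over all nodes."""
    return sum(local_clustering(neighbors, i) for i in neighbors) / len(neighbors)

C = network_clustering(neighbors)
```

In this toy graph node 0 has three neighbors with one edge among them, so C_0 = 1/3, while nodes 1 and 2 each sit in a closed triangle and have C_i = 1.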

The importance of nodes in the network can be measured by the PageRank index, and the PageRank algorithm is adopted to calculate the influence score of each node. PageRank was originally proposed as an algorithm for calculating the importance of Internet web pages, and the calculation method is shown in equation 5, where c_i is the influence score of the i-th node, c_i ∈ [0, 1]; p is the damping coefficient, set empirically to p = 0.85; the equation also uses the out-degree of the j-th node; and a_ji is an element of the adjacency matrix of the directed network. The higher the PageRank value of a node, the more important the node.

In a complex network, each node has an in-degree and an out-degree that describe how the node is connected in the network. The in-degree is the number of connections pointing to a node, i.e. the number of edges pointing from other nodes to the node; it indicates how many other nodes connect to the node and can be understood as the number of connections the node receives. The out-degree is the number of connections leaving a node, i.e. the number of edges pointing from the node to other nodes; it indicates how many other nodes the node can reach and can be understood as the number of connections the node sends.
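
Equation 5 is not reproduced in this text. The sketch below assumes the standard PageRank recurrence c_i = p Σ_j a_ji c_j / k_j + (1 - p)/N, where k_j denotes the out-degree of node j (the symbol k_j is chosen here for illustration only). The adjacency matrix is hypothetical, and spreading the score of nodes without outgoing edges uniformly is an assumption that the description does not address.

```python
def pagerank(a, p=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for c_i = p * sum_j a[j][i] * c_j / out_j + (1 - p) / N.

    a[j][i] = 1 if there is an edge j -> i. Nodes without outgoing edges
    spread their score uniformly (an assumption of this sketch).
    """
    n = len(a)
    out_degree = [sum(row) for row in a]
    c = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(c[j] for j in range(n) if out_degree[j] == 0)
        new_c = []
        for i in range(n):
            inflow = sum(a[j][i] * c[j] / out_degree[j]
                         for j in range(n) if out_degree[j] > 0)
            new_c.append(p * (inflow + dangling / n) + (1.0 - p) / n)
        if sum(abs(x - y) for x, y in zip(new_c, c)) < tol:
            return new_c
        c = new_c
    return c

# Hypothetical 4-station network: stations 1, 2 and 3 send trips to station 0,
# and station 0 sends trips to station 1.
a = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
scores = pagerank(a)
```

Station 0 receives edges from every other station and therefore obtains the highest score, while stations 2 and 3, which receive no trips, keep only the baseline (1 - p)/N.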

To extract the differences in users' usage of shared bicycles at different stages, the inflow and outflow passenger flows of different areas in different peak periods are analyzed with the net flow ratio (NFR). The NFR is calculated as shown in equation 6, where NFR_i ranges between -1 and 1, and O_i and D_i are respectively the travel inflow and outflow of region i. An NFR_i less than 0 indicates that more vehicles are picked up than returned in region i, i.e. outflow dominates, and more vehicles need to be allocated to meet users' travel demand. Conversely, an NFR_i greater than 0 indicates that more vehicles are returned than picked up, and inflow dominates.
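
Equation 6 is not reproduced in this text. The following sketch assumes NFR_i = (O_i - D_i) / (O_i + D_i), the form consistent with the stated range of -1 to 1 and with inflow dominating when NFR_i is greater than 0. The counts are hypothetical, and returning 0 for a region without any trips is a convention chosen to avoid division by zero.

```python
def net_flow_ratio(inflow, outflow):
    """NFR_i = (O_i - D_i) / (O_i + D_i), with O_i the inflow and D_i the outflow.

    Returns 0.0 for a region with no trips at all (an assumption of this sketch).
    """
    total = inflow + outflow
    if total == 0:
        return 0.0
    return (inflow - outflow) / total

# Hypothetical morning-peak counts for two regions.
residential = net_flow_ratio(inflow=20, outflow=80)   # -0.6: pick-ups dominate
business = net_flow_ratio(inflow=90, outflow=30)      # 0.5: returns dominate
```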

Spatial autocorrelation analysis is a statistical method for studying geospatial data that can reveal regional structure information of spatial variables. The spatial distribution pattern and evolution pattern of shared bicycles are studied through the global Moran index and the local spatial autocorrelation Moran index (Moran's I), with the calculation formula as follows, where n represents the number of spatial regions, w_ij is the weight between site i and site j, y_i and y_j represent the attribute values of site i and site j, and the average of all observations is also used. Moran's I is a coefficient that generally lies between -1 and 1. A value greater than 0 indicates positive spatial correlation of the data, and the larger the value, the more significant the spatial correlation.
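
The formula itself is not reproduced in this text. The sketch below assumes the standard global Moran's I, I = (n / Σ_ij w_ij) · Σ_ij w_ij (y_i - m)(y_j - m) / Σ_i (y_i - m)^2, where m is the average of all observations (the symbol m is chosen here for illustration). The weight matrix and the attribute values are hypothetical.

```python
def morans_i(y, w):
    """Global Moran's I for values y and a spatial weight matrix w (w[i][i] = 0)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    s0 = sum(sum(row) for row in w)  # sum of all spatial weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

# Hypothetical chain of four stations; neighbours along the chain get weight 1.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([10, 9, 2, 1], w)      # similar values sit next to each other
alternating = morans_i([10, 1, 10, 1], w)   # high and low values alternate
```

With these inputs the clustered pattern gives a positive index (about 0.39) and the alternating pattern gives an index of -1, matching the interpretation of positive and negative spatial correlation.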

Step S4: and carrying out community cluster analysis by using a ClusterNet algorithm based on the graph neural network model, and mining a community structure.

The invention provides an end-to-end model, ClusterGCN: the data is embedded by a GCN (Graph Convolutional Network), the output of the convolutional network is then fed into a Kmeans clustering function for iterative clustering, and finally the loss function, namely the optimization target, is calculated from the output assignment matrix and the modularity, and the parameters are optimized through error backpropagation. The inputs of the ClusterNet algorithm based on shared bicycle travel data are the longitude and latitude of each station, its daily departure passenger flow, its daily arrival passenger flow, its hourly departure passenger flow, its hourly arrival passenger flow, and the travel volume between stations. The output of the ClusterNet algorithm is a community partition label for each shared bicycle station.

A graph neural network model is constructed based on shared bicycle travel data: the travel connection relation between each pair of stations is extracted, the travel volume (OD volume) between each pair of origin and destination stations is taken as the adjacency matrix, the adjacency matrix is taken as the edge feature of the graph convolutional network, and the adjacency matrix is defined to describe the spatial connection relation between stations. The longitude and latitude of each station, its daily departure passenger flow, its daily arrival passenger flow, its hourly departure passenger flow and its hourly arrival passenger flow are taken as the station features, defining a feature matrix γ_t. The GCN model constructs a filter in the Fourier domain that acts on the nodes of the graph, captures spatial features between shared bicycle stations based on their first-order neighborhoods, and builds a deep GCN model by stacking multiple convolution layers. The modeling process is represented by equation 8.

Here the equation uses the adjacency matrix, the identity matrix I_N and the degree matrix of the shared bicycle station network; H^(l) is the output of layer l, θ^(l) is the training parameter of layer l, and σ(·) represents the activation function of the nonlinear model, for which the invention adopts a ReLU activation function. Given the feature matrix γ_t and the adjacency matrix, a two-layer GCN model can be represented by equation 9, where θ^(1) is a trainable weight matrix from the input layer to the hidden layer and θ^(2) is a trainable weight matrix from the hidden layer to the output layer.
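
Equations 8 and 9 are not reproduced in this text. The sketch below assumes the widely used GCN propagation rule H^(l+1) = σ(D^(-1/2) (A + I_N) D^(-1/2) H^(l) θ^(l)), with D the degree matrix of A + I_N, and a two-layer model whose output layer is left without an activation. The toy adjacency matrix, features and weights are hypothetical and untrained, and the helper names are illustrative only.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I_N) D^-1/2, with D the degree matrix of A + I_N."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    return [[inv_sqrt[i] * a_tilde[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]

def relu(m):
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in m]

def two_layer_gcn(adj, features, theta1, theta2):
    """Embeddings A_hat @ ReLU(A_hat @ X @ theta1) @ theta2."""
    a_hat = normalized_adjacency(adj)
    hidden = relu(matmul(matmul(a_hat, features), theta1))
    return matmul(matmul(a_hat, hidden), theta2)

# Hypothetical toy input: 3 stations, 2 features each, fixed (untrained) weights.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
theta1 = [[0.5, -0.5], [0.25, 0.75]]
theta2 = [[1.0], [-1.0]]
embeddings = two_layer_gcn(adj, features, theta1, theta2)
```

Each layer mixes a station's features with those of its first-order neighbors, so stacking two layers lets information travel across two hops of the station network.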

Through feature extraction by the two-layer graph neural network, the structural information and attribute information of the original graph are effectively reduced in dimension, and the resulting feature matrix is then used as the input of the clustering module to divide communities. In this module, communities are partitioned using the classical Kmeans algorithm.

The Kmeans algorithm is an unsupervised clustering method; it requires much less computation than common community discovery algorithms, clusters well, and can be applied to community discovery. Its basic idea is to gather the nodes in space around k cluster centers, group each node with its nearest cluster center, and iteratively update each cluster center until the best clustering is obtained. The Kmeans algorithm first fixes the number of cluster centers k, then clusters the nodes of the graph according to k, and finally obtains the clustering result: nodes with high similarity are grouped into one class and nodes with low similarity are not, where similarity is computed from the Euclidean distance between node vectors.
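
The iteration described above can be sketched as follows. This is a plain Kmeans (Lloyd's algorithm) illustration on hypothetical two-dimensional station embeddings, not the clustering module of the invention; using the first k points as initial centers is an assumption made so that the example is deterministic.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centers).

    The first k points serve as initial centers so the sketch is deterministic
    (practical implementations usually use random or k-means++ initialisation).
    """
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each node to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers

# Hypothetical 2-D station embeddings forming two visible groups.
embeddings = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2)]
labels, centers = kmeans(embeddings, k=2)
print(labels)   # [0, 1, 0, 1, 0]
```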

Assume that there are N nodes, each representing one input, with m edges between them; based on the ClusterGCN model, the nodes are divided into k communities according to the inputs. The purpose of the community partitioning task is to divide the nodes of the graph into k different communities that are densely connected internally and have few edges between communities. The goal of model training is therefore to find a partition r that maximizes the modularity, defined as equation 10.

The quality of a network community partition is often measured by modularity. Q is the modularity: the larger the modularity, the more reasonable the corresponding community partition; the smaller the modularity, the more ambiguous the corresponding network community partition. The Q value ranges from -0.5 to 1, and research shows that clustering is good when Q is between 0.3 and 0.7. Here i and j are any two nodes in the graph; A_{i,j} = 1 when the two nodes are directly connected, otherwise A_{i,j} = 0; d_i is the degree of node i; and δ(r_i, r_j) indicates whether nodes i and j are in the same community: if so, δ(r_i, r_j) = 1, otherwise δ(r_i, r_j) = 0.
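
Equation 10 is not reproduced in this text. The sketch below assumes the standard modularity Q = (1/2m) Σ_ij [A_{i,j} - d_i d_j / (2m)] δ(r_i, r_j) for an undirected, unweighted graph, which matches the symbols defined above. The six-node graph and the two labelings are hypothetical.

```python
def modularity(adj, communities):
    """Q for an undirected 0/1 adjacency matrix and one community label per node."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # every undirected edge is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:  # delta(r_i, r_j) = 1
                q += adj[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m

# Hypothetical graph: two triangles joined by a single edge (2 - 3).
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
good = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle
poor = modularity(adj, [0, 1, 0, 1, 0, 1])   # labels ignore the structure
```

The partition that follows the two triangles gives Q = 5/14 (about 0.36), inside the 0.3 to 0.7 range mentioned above, whereas the arbitrary labeling gives a negative Q.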

The training process of the shared bicycle travel community division model ClusterGCN comprises the following key steps. First, based on the shared bicycle travel data, the travel characteristics of stations and the travel volume characteristics between stations are extracted, and the shared bicycle travel network is constructed. The graph network data is then input into the model, and a community division result is obtained through forward propagation. During forward propagation, the ClusterGCN network performs graph convolution, pooling and Kmeans clustering operations in sequence, and propagation between layers of the graph convolutional network follows equation 8 and equation 9. The modularity is then defined as the loss function to measure the quality of the network community partition, as in equation 10. Gradients are calculated by a back propagation algorithm, and network parameters are updated using an optimization algorithm. The steps of forward propagation, loss calculation, back propagation and parameter updating are performed repeatedly to iteratively train the neural network. During training, the model is evaluated on a validation set to assess its performance on unseen data. Through these steps, the ClusterGCN model can learn the travel characteristics of shared bicycle stations and between stations, and can therefore be applied to community structure division tasks.
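
Because the hard indicator δ(r_i, r_j) is not differentiable, a modularity-based loss is usually computed from a soft assignment matrix. The sketch below shows one common relaxation, assumed here for illustration and not taken from the description: each node is assigned to the cluster centers by a softmax over negative distances, and δ(r_i, r_j) is replaced by the inner product of the two assignment rows. The graph, embeddings, centers and the temperature `beta` are hypothetical, and gradient computation is left to an automatic differentiation framework.

```python
import math

def soft_assignments(embeddings, centers, beta=5.0):
    """Row-wise softmax of -beta * distance to each center (soft Kmeans step)."""
    rows = []
    for x in embeddings:
        logits = [-beta * math.dist(x, c) for c in centers]
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]  # shift for numerical stability
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

def soft_modularity(adj, r):
    """Q = (1/2m) * sum_ij (A_ij - d_i d_j / 2m) * sum_k r_ik r_jk."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)
    q = 0.0
    for i in range(n):
        for j in range(n):
            same = sum(a * b for a, b in zip(r[i], r[j]))  # soft delta(r_i, r_j)
            q += (adj[i][j] - degrees[i] * degrees[j] / two_m) * same
    return q / two_m

# Hypothetical graph with two pairs of linked stations and matching 1-D embeddings.
adj = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
embeddings = [(0.0,), (0.1,), (1.0,), (0.9,)]
r = soft_assignments(embeddings, centers=[(0.0,), (1.0,)])
loss = -soft_modularity(adj, r)  # minimising the loss maximises modularity
```

As the assignments become sharper, the soft modularity approaches the hard modularity of the same partition, which is 0.5 for this toy graph.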

Urban traffic community division based on travel data enables intelligent traffic management and resource optimization. By analyzing large-scale travel data, different travel patterns and groups, such as commuters, students and business travelers, can be identified, providing an accurate information basis for urban traffic planning and decision-making. Community division also gives a better understanding of the differences in travel behavior and demand among groups, so that customized traffic services can be provided for each group. For example, optimizing public transportation for commuters and designing safer campus transportation schemes for students are personalized transportation solutions that will improve residents' travel experience and satisfaction. Finally, community division based on travel data can guide the optimal allocation of resources and promote sustainable travel modes such as public transportation, walking and bicycle sharing, which helps reduce traffic congestion and promotes the sustainable development of cities.

Examples

Fig. 1 is a flow chart of the shared bicycle travel network community mining method according to this embodiment. Referring to Fig. 1, the method includes:

Step S1: Data preparation and data analysis.

The present invention requires the following data: station data for the designated area in the United States and shared bicycle trip record data for the same area. To verify both the ability of the indexes constructed with spatial statistics and complex network methods to quantify travel and the effectiveness of the community structure mining algorithm based on the graph neural network model ClusterGCN, shared bicycle data for the designated area before, during and after the designated time period, covering 2019 to 2022, are selected as the data source, and the travel characteristics of users in each period are mined separately. The New York CitiBike dataset includes order numbers, user pick-up and return times, pick-up and return stations, user IDs, user gender, rental times, etc. Fig. 2 depicts the New York shared bicycle study area; Fig. 3 depicts the time distribution of daily travel demand for shared bicycles in the designated area in 2019, 2020 and 2021.
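A time distribution of daily demand such as the one plotted in Fig. 3 can be obtained by counting pick-ups per hour of day. The sketch below is only illustrative: the field name `started_at`, the timestamp format and the three sample records are assumptions, not the actual schema or contents of the dataset.

```python
from collections import Counter
from datetime import datetime

def hourly_demand(trips, time_key="started_at", fmt="%Y-%m-%d %H:%M:%S"):
    """Return a 24-element list with the number of pick-ups in each hour."""
    counts = Counter()
    for trip in trips:
        # Parse the pick-up timestamp and keep only the hour of day.
        hour = datetime.strptime(trip[time_key], fmt).hour
        counts[hour] += 1
    return [counts.get(h, 0) for h in range(24)]

# Hypothetical trip records for illustration.
trips = [
    {"started_at": "2019-04-01 08:05:00"},
    {"started_at": "2019-04-01 08:40:00"},
    {"started_at": "2019-04-01 18:15:00"},
]
profile = hourly_demand(trips)
```

Running the same function on the records of each period gives comparable hourly profiles for the stages before, during and after the designated time period.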

Step S2: Analyze the shared bicycle data in the designated area and construct a shared bicycle travel network from it. The nodes of the travel network are shared bicycle stations, and the OD (origin-destination) volume between stations is the edge weight. Fig. 4 shows the OD travel distribution of shared bicycles in the different stages of the designated time period. Shared bicycles are mainly used for short-distance trips, so most trips are concentrated in the Manhattan urban area. The designated time period has relatively little effect on the amount of travel of the shared bicycle. As shown in Fig. 4(a) and Fig. 4(b), before the designated time period the daily trip volume of shared bicycles was 58869, with 784 stations; during the severe stage of the designated time period in 2020, the daily trip volume was 22758, a reduction of 61.34%. In the recovery stage of the designated time period, the trip volume of shared bicycles increased. In particular, many new stations were built, which further expanded the service range of shared bicycles. In April 2022, the daily travel demand was 77320 trips, with 1547 stations.
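The network construction in this step can be sketched as follows: each trip contributes one unit to the weight of the directed edge from its pick-up station to its return station. The function names and the sample trips are hypothetical; trips that start and end at the same station are dropped here on the assumption that the network only contains edges with i ≠ j. The second helper reproduces the reported reduction in daily trip volume.

```python
from collections import defaultdict

def build_od_network(trips):
    """Return (nodes, weights); weights[(o, d)] is the trip count from o to d."""
    weights = defaultdict(int)
    nodes = set()
    for origin, dest in trips:
        nodes.update((origin, dest))
        if origin != dest:  # assumption: no self-loops in the travel network
            weights[(origin, dest)] += 1
    return nodes, dict(weights)

def relative_drop(before, after):
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100.0

# Hypothetical (pick-up station, return station) pairs.
trips = [("A", "B"), ("A", "B"), ("B", "A"), ("B", "C"), ("C", "C")]
nodes, weights = build_od_network(trips)

# Daily trip volumes quoted in the text: 58869 before, 22758 during.
drop = relative_drop(58869, 22758)
```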

Step S3: Construct indexes to quantify travel using spatial statistics and complex network methods.

Table 1 shows how the travel patterns of the shared bicycle network change on rest days and working days before, during and after the specified time period. On working days, the node degree of the shared bicycle network decreases under the influence of the specified time period, indicating poorer connectivity between areas. The degree of the shared bicycle network is also lower in the recovery stage, which means that the connectivity of nodes in the network decreases in the later stage of the specified time period. Many stations were added to the shared bicycle system after the specified time period: there were 784 stations in 2019, 887 in 2020 and 1396 in 2021. In terms of the network strength index, the node strength of the shared bicycle network also decreases somewhat under the influence of the specified time period. A larger clustering coefficient indicates that the network may contain more groups of densely connected nodes; the clustering coefficient of the shared bicycle network is small, so the connections between stations are not very tight. Before the specified time period, inflow in the shared bicycle network exceeds outflow, whereas after it outflow exceeds inflow. In terms of the spatial distribution of network characteristics, the Moran index (Moran's I) of the shared bicycle network is 0.6996 before the specified time period; a larger Moran's I indicates more pronounced spatial correlation, and the index decreases during the severe stage of the specified time period. In the recovery stage, the spatial correlation of the shared bicycle network decreases. Compared with working days, the degree and strength of nodes in the shared bicycle network are lower on rest days.

Table 1: Average metric index values of the shared bicycle network
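For reference, global Moran's I can be computed from a vector of station values and a spatial weight matrix using the standard definition I = (N / Σw_ij) · Σ w_ij (x_i − x̄)(x_j − x̄) / Σ (x_i − x̄)². The sketch below is a minimal illustration; the text does not say which spatial weights were used, so the neighbor-based weight matrix and the sample values are assumptions.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` under spatial weight matrix `weights`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of all spatial weights.
    w_sum = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between pairs of locations.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    # Total squared deviation.
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

# Hypothetical example: four stations on a line, adjacent stations are neighbors.
values = [1.0, 2.0, 3.0, 4.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
moran = morans_i(values, weights)
```

A positive value, as in this toy example, means that neighboring stations tend to have similar values.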

The Fruchterman-Reingold layout is used to display the travel volume and graph structure of the shared bicycle system, see Fig. 5. The graph structure has pronounced multi-core, multi-cluster features. The dot size shows the degree of the node, which is the sum of its in-degree and out-degree. The gray level of the line color indicates the OD volume between stations: the brighter the color, the greater the OD volume. The results show that the graph structure differs across the stages of the designated time period. Before the designated time period, the clusters contain more stations and the degree of the shared bicycle stations is larger. During the designated time period, the number of stations in the clusters drops significantly, as does the closeness of the connections between nodes. In the later stage of the designated time period, the degree of the shared bicycle stations does not increase noticeably, and the links among stations are not as close as before the designated time period, which is related to the larger number of newly built stations.
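The node degree used for the dot size (in-degree plus out-degree) can be derived directly from the weighted OD edges. The sketch below also computes node strength as the sum of incoming and outgoing OD volume; that definition of strength, the function name and the sample edge weights are assumptions made for illustration.

```python
from collections import defaultdict

def degree_and_strength(weights):
    """weights maps (origin, destination) -> OD volume on that directed edge."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    in_str, out_str = defaultdict(float), defaultdict(float)
    for (origin, dest), volume in weights.items():
        out_deg[origin] += 1      # one more outgoing edge
        in_deg[dest] += 1         # one more incoming edge
        out_str[origin] += volume
        in_str[dest] += volume
    nodes = set(in_deg) | set(out_deg)
    return {
        node: {
            "degree": in_deg[node] + out_deg[node],
            "strength": in_str[node] + out_str[node],
        }
        for node in nodes
    }

# Hypothetical OD volumes between three stations.
stats = degree_and_strength({("A", "B"): 5, ("B", "A"): 2, ("B", "C"): 1})
```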

Step S4: Mine the community structure with the graph neural network model ClusterGCN.

Community cluster analysis was performed with the ClusterNet algorithm on the shared bicycle data of the designated area from 2019 to 2022. CitiBike in New York is mainly concentrated in the Manhattan area. Table 2 shows the modularity index of the community division; the modularity values lie between 0.3 and 0.7, indicating a good clustering effect. The modularity of the shared bicycle system is larger in January, which indicates that the travel network is more tightly connected within communities, and smaller in April. This is because shared bicycle travel is strongly affected by weather: in winter, when the temperature is lower, trips are concentrated within communities. As the weather gradually warms, users' travel patterns become more diverse and travel between communities gradually increases.
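To make the modularity index concrete, the sketch below evaluates the standard modularity Q = (1/2m) Σ_ij [A_ij − d_i d_j / (2m)] δ(r_i, r_j) for a given partition of an undirected, unweighted graph. The toy graph (two triangles joined by one edge) and the function name are assumptions for illustration; the values in Table 2 come from the real travel network, not from this example.

```python
def modularity(adj, labels):
    """Modularity of the partition `labels` on a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    two_m = sum(deg)  # twice the number of edges
    total = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:  # delta(r_i, r_j) = 1
                total += adj[i][j] - deg[i] * deg[j] / two_m
    return total / two_m

# Toy graph: triangle {0, 1, 2}, triangle {3, 4, 5}, plus the bridge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])  # everything in one community
```

Splitting the graph into its two triangles gives Q = 5/14 (about 0.357), while placing all nodes in a single community gives Q = 0, which is why a higher modularity is read as a better community division.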

The community structure obtained from the shared bicycle data is shown in Fig. 6, where communities are represented by different symbols. As can be seen from Fig. 6, the community division varies to some extent across time periods: two communities in Manhattan and one community in Queens in Fig. 6 are later merged into one large community. The change in regional structure reflects the change in users' travel characteristics at different stages and the change in the intensity of interaction with other areas. During the specified time period, trips within each administrative area become more integrated and the areas connected by riding become more local, thereby forming smaller cluster communities that are highly consistent with the geographic administrative areas.

Table 2: ClusterNet community discovery modularity index

In summary, the embodiment of the invention constructs networks from shared bicycle travel data for different periods and analyzes the dynamic changes in the spatio-temporal distribution of travel demand across those periods. Indexes constructed with spatial statistics and complex network methods quantify travel and reveal the underlying patterns of change in users' travel modes.

The invention provides an end-to-end deep learning model, ClusterGCN, which uses a graph convolutional neural network and the K-means algorithm to perform community division and iterative optimization on the dynamic user travel network, jointly learning the network topology and node features, and thereby effectively identifies the communities in the shared bicycle travel network and the changes in community structure.
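One way to combine K-means with gradient-based training is to use soft assignments, so that the clustering step stays differentiable. The sketch below shows a single soft K-means update on node embeddings; the squared Euclidean distance, the temperature parameter `beta`, the function name and the sample points are all assumptions, and the text does not state that the model uses exactly this variant.

```python
import math

def soft_kmeans_step(points, centers, beta=1.0):
    """One soft K-means update: soft assignments, then weighted center update."""
    resp = []
    for p in points:
        # Higher logit for closer centers (negative squared distance).
        logits = [-beta * sum((a - b) ** 2 for a, b in zip(p, c))
                  for c in centers]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        resp.append([e / total for e in exps])
    dim = len(points[0])
    new_centers = []
    for k, old in enumerate(centers):
        mass = sum(r[k] for r in resp)
        if mass == 0.0:  # keep an empty cluster's center unchanged
            new_centers.append(list(old))
            continue
        new_centers.append([
            sum(r[k] * p[d] for r, p in zip(resp, points)) / mass
            for d in range(dim)
        ])
    return resp, new_centers

# Hypothetical 2-D embeddings forming two well-separated groups.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
resp, centers = soft_kmeans_step(points, [[0.0, 0.0], [10.0, 10.0]])
labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
```

Taking the most probable cluster for each node, as in `labels`, turns the soft assignment into a community division.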

The method adopts the graph neural network to extract the multisource characteristics, trains the neural network through iterative optimization, is easy to understand and calculate, and has strong applicability.

The method provided by the invention helps in understanding users' travel demand, travel behavior and changes in travel community structure, and in providing better service for passengers' last-kilometer travel.

Those of ordinary skill in the art will appreciate that: the drawing is a schematic diagram of one embodiment and the modules or flows in the drawing are not necessarily required to practice the invention.

From the above description of embodiments, it will be apparent to those skilled in the art that the present invention may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present invention.

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus or system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, with reference to the description of method embodiments in part. The apparatus and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims

1. A deep learning-based shared bicycle travel network community mining method, characterized by comprising the following steps:

2. The method of claim 1, wherein the analyzing and counting the road network data information, the shared bicycle data information and the travel data of the user in the designated area to construct the transportation travel network comprises:

a directed transportation travel network $G = (V, E, W)$ is constructed based on shared bicycle data before, during and after a specified time period, wherein $V = \{v_1, \ldots, v_N\}$ is the set of nodes; $E = \{e_{ij} \mid i, j = 1, 2, \ldots, N,\ i \neq j\}$ is the set of edges, where $e_{ij} = 1$ indicates that there is an edge between node $i$ and node $j$ and $e_{ij} = 0$ indicates that there is no edge between node $i$ and node $j$; $W = \{w_{ij} \mid i, j = 1, 2, \ldots, N,\ i \neq j\}$ is the set of weights, where $w_{ij}$ represents the weight of edge $e_{ij}$, i.e. the travel volume on the edge between node $i$ and node $j$.

3. The method of claim 2, wherein quantitatively describing the travel characteristics of the users in the shared bicycle travel network, taking the user travel quantitative indexes as the network metric indexes of the graph neural network model, and constructing the graph neural network model comprises:

$C$ is the clustering coefficient of the network, $C_i$ is the clustering coefficient of node $i$, and $e_i$ is the number of edges between the neighbor nodes of node $i$;

$c_i$ is the PageRank value of the $i$-th node, $c_i \in [0, 1]$; P is the damping coefficient; the out-degree of the $j$-th node in a complex network is the number of connections leaving that node, i.e. the number of edges pointing from the node to other nodes; $a_{ji}$ is an element of the adjacency matrix of the directed network; the higher the PageRank value of a node, the more important the node is;

4. The method of claim 3, wherein performing community cluster analysis based on the graph neural network model by using the ClusterNet algorithm to mine the community structure comprises:

building a graph neural network model based on the shared bicycle travel data: extracting the travel connection relation between each pair of stations, taking the travel volume between each pair of origin and destination stations as the adjacency matrix, and using the adjacency matrix as the edge feature of the graph convolution network, thereby defining an adjacency matrix that describes the spatial connection relation between stations; taking as station features the longitude and latitude of each station, its daily departure passenger flow, its daily arrival passenger flow, its hourly departure passenger flow and its hourly arrival passenger flow, defining a feature matrix $\Gamma_t$; and using the feature matrix $\Gamma_t$ as input data for the ClusterNet algorithm;

where $i$ and $j$ are any two nodes in the graph; $A_{i,j} = 1$ when the two nodes are directly connected, otherwise $A_{i,j} = 0$; $d_i$ is the degree of node $i$; $\delta(r_i, r_j)$ indicates whether nodes $i$ and $j$ are in the same community: $\delta(r_i, r_j) = 1$ if they are, otherwise $\delta(r_i, r_j) = 0$;