CN110570035B

CN110570035B - People flow prediction system for simultaneously modeling space-time dependency and daily flow dependency

Info

Publication number: CN110570035B
Application number: CN201910821133.8A
Authority: CN
Inventors: 臧天梓; 朱燕民
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2019-09-02
Filing date: 2019-09-02
Publication date: 2023-04-07
Anticipated expiration: 2039-09-02
Also published as: CN110570035A

Abstract

A human flow prediction system that simultaneously models spatiotemporal dependencies and daily flow correlations comprises an ST encoder module, an FR encoder module and a decoder module. The system uses a convolutional long short-term memory network (ConvLSTM) to capture both the dependencies between spatially adjacent regions and the temporal dependencies, and stacks convolutional neural networks to capture spatial dependencies of different ranges in geographic space hierarchically. At the same time, based on the daily traffic pattern of each region, it builds a complete graph reflecting the traffic correlation between every pair of regions across the city, and uses a graph embedding method to generate a fixed-dimension vector representation for each region. Finally, one layer of long short-term memory network and two layers of deconvolutional neural networks generate the human flow prediction.

Description

People flow prediction system for simultaneously modeling space-time dependency and daily flow dependency

Technical Field

The invention relates to a technology in the field of applied artificial intelligence, and in particular to a people flow prediction system that simultaneously models spatiotemporal dependency and daily flow dependency.

Background

Pedestrian flow prediction is a complex spatiotemporal sequence prediction problem influenced by many factors, and existing research falls roughly into three categories. The first category uses a recurrent neural network (RNN) to capture complex temporal correlations, but treats regions as independent of each other and therefore ignores the correlations between regions. The second category considers spatial correlations and applies a convolutional neural network (CNN) to pedestrian flow prediction; however, a CNN cannot capture how the flow varies over time. The third category attempts to model spatiotemporal correlations jointly and proposes structures such as the convolutional long short-term memory network (ConvLSTM) to capture both at once.

Disclosure of Invention

The aim is to improve prediction accuracy for city-wide, multi-step pedestrian flow prediction, i.e. predicting the number of vehicles/people entering or leaving each area of a city over several future time periods. The technical problems to be solved are: 1) how to design a reasonable structure that models temporal and spatial dependencies at the same time; 2) now that urban rapid transit (such as subways and expressways) is well developed, the pedestrian flow of one area is influenced by the flow of areas over a larger spatial range, so how to design an effective structure that models spatial dependencies over ranges of different sizes; 3) the daily flow pattern of an area is relatively fixed, and areas with strongly related patterns exist across the city, so how to effectively model correlations based on daily flow changes. To address these problems, the invention provides a people flow prediction system that simultaneously models spatiotemporal dependency and daily flow dependency. It is based on the observation that the daily flow pattern of each area in geographic space is relatively fixed, and that even two areas far apart can have very similar flow patterns; this similarity reflects a strong correlation between the two areas.

The invention is realized by the following technical scheme:

Since the traffic in a region is affected not only by the traffic in its surrounding regions but also by its own traffic in preceding time periods, ConvLSTM is used to capture spatial and temporal correlations simultaneously.

Because of urban rapid transit, the pedestrian volume of one area is influenced by the flow of areas over a larger spatial range; however, limited by the size of its convolution kernel, a single ConvLSTM layer can only capture interactions between spatially close areas. Stacking CNN layers therefore captures spatial dependencies of different ranges in geographic space hierarchically.

To capture both the daily flow pattern of each area and the similarity of these patterns between areas, the invention first builds, from the daily flow patterns, a complete graph reflecting the flow correlation between every pair of areas across the city, and then generates a fixed-dimension vector representation for each area with a graph embedding method, so that areas with similar flow patterns lie closer to each other in the resulting vector space.

The people flow prediction system of the invention, which simultaneously models spatiotemporal dependency and daily flow dependency, comprises a spatio-temporal (ST) encoder module, a region feature (FR) encoder module and a decoder module, wherein: the ST encoder module hierarchically models dependencies over spatial ranges of different sizes through two layers of convolutional neural networks and models temporal dependencies through one layer of convolutional long short-term memory network, and outputs the resulting fixed-dimension vector representation, which captures spatiotemporal dependencies, to the decoder module; the FR encoder module computes the flow change similarity between every pair of regions from the historical flow data sequence and generates a complete graph, which is output to the decoder module; the decoder module generates the pedestrian volume prediction through one layer of long short-term memory network and two layers of deconvolutional neural networks.

The historical flow data sequence refers to the flow of each region in each of the 24 hours of every day.

From the complete graph, a representation vector is generated for each region by a graph embedding method, so that regions with similar flow changes have closer representation vectors.

Each node in the complete graph represents a region, and each edge represents the flow change correlation of two corresponding regions.

Based on the system, the invention also provides a prediction method: in the encoding stage, the historical flow data sequence is taken as input and the ST encoder module and the FR encoder module generate fixed-dimension representation vectors; in the decoding stage, the ConvLSTM and deconvolution network of the decoder module generate, from these vectors, the predicted flow sequence for future time periods.

Modeling spatiotemporal dependency means the following: to model spatiotemporal correlations simultaneously, the ST encoder module uses a convolutional long short-term memory network (ConvLSTM), which captures spatial features through its internal convolution operations while preserving the temporal correlations that an ordinary recurrent neural network can capture.

In order to hierarchically capture spatial dependencies of different size ranges while minimizing computational complexity and training time, two layers of Convolutional Neural Networks (CNN) are added at the bottom of ConvLSTM to capture dependencies over a larger spatial range.

Modeling daily flow correlation means the following: to model correlations based on daily flow changes, the FR encoder module first extracts the 24-hour daily flow pattern of each region from historical daily flow data, computes the correlation between regions from the extracted patterns, builds a complete graph with regions as nodes and the daily flow correlations between regions as edges, and uses a graph embedding method to generate a unique fixed-dimension latent representation vector for each region, such that regions with stronger daily flow correlation are closer in the vector space.

Technical effects

Compared with the prior art, the invention fully considers the influence of temporal and spatial correlations on the pedestrian volume of a region, including spatial correlations over ranges of different sizes across the whole city, and also captures the correlation of flow changes between regions based on their daily flow patterns, thereby achieving higher accuracy on city-wide multi-step pedestrian volume prediction.

Drawings

FIG. 1 is a flow chart of a city pedestrian volume prediction system;

FIG. 2 is a schematic diagram of the system architecture.

Detailed Description

As shown in FIG. 2, for the problem of city-wide multi-step pedestrian volume prediction, this embodiment proposes a prediction system based on an encoder-decoder framework with two encoder modules, comprising:

(1) an ST encoder module: it comprises two layers of convolutional neural networks that hierarchically capture spatial dependencies of different ranges and one layer of convolutional long short-term memory network that captures temporal and spatial dependencies simultaneously, and it encodes the input human flow tensor sequence into a fixed-dimension representation vector;

(2) an FR encoder module: it captures the correlation of daily flow changes between regions. It first computes the similarity of flow changes between regions to obtain their daily flow correlations, builds a complete graph with regions as nodes and these correlations as edges, and then uses a graph embedding method to generate a unique representation vector for each region;

(3) a decoder module: from the fixed-dimension representation vectors generated by the two encoder modules, it generates a tensor sequence representing the flow predictions for multiple future time periods through one layer of convolutional long short-term memory network and two layers of deconvolutional neural networks.

Before the data are applied to the proposed prediction system, the trajectory data are preprocessed, and the specific steps include:

1) Divide the city into an M × N grid map based on geographic coordinates, where each grid cell represents one region;

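2) Each raw record has the form $(p^{start}, p^{end}, t^{start}, t^{end})$, denoting a trajectory that starts at location $p^{start}$ at time $t^{start}$ and ends at location $p^{end}$ at time $t^{end}$. The trajectory data are converted into the inflow and outflow of each region in each time period, specifically:

$$x^{in}_{t,ij} = \left|\left\{(p^{start}, p^{end}, t^{start}, t^{end}) : p^{end} \in r_{ij},\ t^{end} \in [s_t, e_t]\right\}\right|$$

$$x^{out}_{t,ij} = \left|\left\{(p^{start}, p^{end}, t^{start}, t^{end}) : p^{start} \in r_{ij},\ t^{start} \in [s_t, e_t]\right\}\right|$$

where $x^{in}_{t,ij}$ and $x^{out}_{t,ij}$ are, respectively, the inflow and outflow of region $r_{ij}$ in the t-th time period $[s_t, e_t]$. The system combines the inflows and outflows of all regions of the city to form the city-wide traffic tensor of the t-th time period, $\Delta_t \in \mathbb{R}^{2 \times M \times N}$.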
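The counting rule above can be sketched in Python with NumPy. This is a minimal illustration, not the patent's implementation: the rectangular bounding box used to map coordinates to grid cells, the half-open period boundaries, the skipping of points outside the box, and the helper names `region_of` and `flow_tensors` are all assumptions made for the example.

```python
import numpy as np

def region_of(p, bbox, M, N):
    """Map a (lat, lon) point to grid cell (i, j); None if outside the box.

    bbox = (lat_min, lat_max, lon_min, lon_max) is an assumed rectangular city extent.
    """
    lat, lon = p
    lat_min, lat_max, lon_min, lon_max = bbox
    if not (lat_min <= lat < lat_max and lon_min <= lon < lon_max):
        return None
    i = int((lat - lat_min) / (lat_max - lat_min) * M)
    j = int((lon - lon_min) / (lon_max - lon_min) * N)
    return i, j

def flow_tensors(trajs, bbox, M, N, t0, period, num_periods):
    """Count inflow (channel 0) and outflow (channel 1) per region and period.

    Each trajectory is (p_start, p_end, t_start, t_end); period k is assumed
    to cover [t0 + k * period, t0 + (k + 1) * period).
    """
    delta = np.zeros((num_periods, 2, M, N))
    for p_start, p_end, t_start, t_end in trajs:
        # Outflow: the trajectory leaves the region where it starts.
        k = int((t_start - t0) // period)
        cell = region_of(p_start, bbox, M, N)
        if cell is not None and 0 <= k < num_periods:
            delta[k, 1, cell[0], cell[1]] += 1
        # Inflow: the trajectory enters the region where it ends.
        k = int((t_end - t0) // period)
        cell = region_of(p_end, bbox, M, N)
        if cell is not None and 0 <= k < num_periods:
            delta[k, 0, cell[0], cell[1]] += 1
    return delta

# Toy example: unit-square city, 2 x 2 grid, 30-minute periods (times in minutes).
trajs = [((0.1, 0.1), (0.9, 0.9), 5, 40),
         ((0.6, 0.2), (0.1, 0.1), 50, 70)]
delta = flow_tensors(trajs, (0.0, 1.0, 0.0, 1.0), M=2, N=2, t0=0, period=30, num_periods=4)
print(delta.shape)  # (4, 2, 2, 2)
```

Each trajectory contributes one outflow count and one inflow count, so the toy example yields four nonzero entries in total.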

The flow tensor sequence of the A time periods before the current time T, $\{\Delta_t \mid t = T-A+1, \ldots, T-1, T\}$, is taken as input, and the flow tensor sequence of the following B time periods, $\{\Delta_t \mid t = T+1, T+2, \ldots, T+B\}$, is predicted.
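As a minimal sketch (the function name `make_samples` is hypothetical, and NumPy is assumed), input/target pairs for fixed A and B can be cut from a chronologically ordered series of flow tensors with a sliding window over the time point T:

```python
import numpy as np

def make_samples(series, A, B):
    """Split a (num_periods, 2, M, N) flow series into (input, target) windows.

    For each time point T, the input is Delta_{T-A+1..T} and the target is
    Delta_{T+1..T+B}; only T values with a complete window are kept.
    """
    inputs, targets = [], []
    for T in range(A - 1, len(series) - B):
        inputs.append(series[T - A + 1:T + 1])
        targets.append(series[T + 1:T + B + 1])
    return np.stack(inputs), np.stack(targets)

# Ten periods on a 3 x 3 grid; each tensor is filled with its period index.
series = np.arange(10, dtype=float)[:, None, None, None] * np.ones((1, 2, 3, 3))
X, Y = make_samples(series, A=4, B=3)
print(X.shape, Y.shape)  # (4, 4, 2, 3, 3) (4, 3, 2, 3, 3)
print(X[0, :, 0, 0, 0], Y[0, :, 0, 0, 0])  # [0. 1. 2. 3.] [4. 5. 6.]
```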

Because urban mass transit systems (such as subways and expressways) let people travel long distances in a short time, spatial dependencies may exist between regions that are far apart in geographic space, whereas the convolution operation in ConvLSTM, limited by the kernel size, can only capture dependencies between regions that are very close in space. This embodiment therefore adds two CNN layers beneath the ConvLSTM layer, which capture wider-range spatial dependencies hierarchically while keeping the required computation and training time as low as possible.

At a specific time point T, the input to the ST encoder module is the flow tensor sequence of the preceding A time periods, $\{\Delta_{T-A+1}, \ldots, \Delta_{T-1}, \Delta_T\}$. The two CNN layers in the ST encoder module extract spatial dependencies of different ranges hierarchically and generate the corresponding feature representation tensors $X_t$, specifically:

$$X^{(1)}_t = f(W_1 * \Delta_t + b_1), \qquad X_t = f(W_2 * X^{(1)}_t + b_2)$$

where $t = T-A+1, \ldots, T-1, T$; $*$ denotes the convolution operation; $f(\cdot)$ denotes a nonlinear activation function; and W and b are parameters to be learned, shared by all regions at a specific time point T.
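The two-layer CNN step can be sketched with NumPy as follows. The 3 × 3 kernels and the 8 and 16 feature maps follow the experimental settings reported later in this description; taking f as ReLU, using zero "same" padding, and the names `conv2d` and `st_cnn_features` are assumptions of this sketch, not details confirmed by the patent.

```python
import numpy as np

def conv2d(x, w, b, stride=1):
    """2-D convolution (cross-correlation, as in deep-learning libraries).

    x: (C_in, H, W); w: (C_out, C_in, k, k); b: (C_out,).
    Zero padding of k // 2 keeps the size at stride 1; stride > 1 subsamples.
    """
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, wd))
    for di in range(k):
        for dj in range(k):
            # Add the contribution of kernel tap (di, dj) at every location.
            out += np.einsum("oc,chw->ohw", w[:, :, di, dj], xp[:, di:di + h, dj:dj + wd])
    return out[:, ::stride, ::stride] + b[:, None, None]

def relu(z):
    return np.maximum(z, 0.0)

def st_cnn_features(delta_t, params, stride=1):
    """X_t = f(W2 * f(W1 * Delta_t + b1) + b2), with f = ReLU (assumed)."""
    (w1, b1), (w2, b2) = params
    x1 = relu(conv2d(delta_t, w1, b1, stride))
    return relu(conv2d(x1, w2, b2, stride))

rng = np.random.default_rng(0)
params = [(rng.normal(0, 0.1, (8, 2, 3, 3)), np.zeros(8)),    # 2 -> 8 feature maps
          (rng.normal(0, 0.1, (16, 8, 3, 3)), np.zeros(16))]  # 8 -> 16 feature maps
delta_t = rng.poisson(5.0, size=(2, 16, 8)).astype(float)     # inflow/outflow on a 16 x 8 grid
x_t = st_cnn_features(delta_t, params)
print(x_t.shape)  # (16, 16, 8)
```

Because the same kernels slide over the whole grid, W and b are shared by all regions, and each extra layer widens the receptive field by two cells at stride 1.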

The ST encoder module further includes a ConvLSTM layer. The feature representation tensors generated by the CNN layers enter the ConvLSTM layer in sequence to further capture spatiotemporal dependencies. Because each ConvLSTM unit contains convolution operations, its state update is influenced not only by the current input and its previous state but also by the states of its neighboring regions.

The state update works as follows: at a specific step t, ConvLSTM takes the corresponding feature representation tensor $X_t$ as input and, through convolution operations, decides whether to activate its input gate $i_t$, forget gate $f_t$ and output gate $o_t$, and how to update its memory state $C_t$ and hidden state $H_t$. The initial states $C_0$ and $H_0$ are initialized to 0, corresponding to the system being "completely unaware" of the future. The ConvLSTM unit is updated as follows:

$$(C_t, H_t) = \mathrm{ConvLSTM}(C_{t-1}, H_{t-1}, X_t)$$

where $t = 1, 2, \ldots, A$; $C_t$ and $H_t$ are the memory state and hidden state at step t, respectively; $X_t$ is the external input; and $\mathrm{ConvLSTM}(\cdot)$ denotes the ConvLSTM update formula, specifically:

$$i_t = \sigma(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i)$$

$$f_t = \sigma(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f)$$

$$o_t = \sigma(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o)$$

$$C_t = f_t \circ C_{t-1} + i_t \circ \tanh(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c)$$

$$H_t = o_t \circ \tanh(C_t)$$

where $*$ denotes the convolution operation, $\circ$ denotes the Hadamard product, $\sigma(\cdot)$ denotes the sigmoid activation function, and W and b are parameters to be learned.
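A single ConvLSTM step can be sketched in NumPy as below. The gates follow the formulas of this section; the memory-state and hidden-state updates use the standard ConvLSTM form without peephole connections, which is assumed to match the patent's formula. The helper names, the random initialization and the 8 hidden channels (chosen small for speed; the experiments use 64) are illustrative only.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1 2-D convolution with zero padding; x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, wd))
    for di in range(k):
        for dj in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, di, dj], xp[:, di:di + h, dj:dj + wd])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x_t, h_prev, c_prev, p):
    """One update (C_t, H_t) = ConvLSTM(C_{t-1}, H_{t-1}, X_t), without peepholes."""
    def pre(g):
        # W_xg * X_t + W_hg * H_{t-1} + b_g for gate (or candidate) g
        return (conv2d_same(x_t, p["W_x" + g]) + conv2d_same(h_prev, p["W_h" + g])
                + p["b_" + g][:, None, None])
    i_t, f_t, o_t = sigmoid(pre("i")), sigmoid(pre("f")), sigmoid(pre("o"))
    c_t = f_t * c_prev + i_t * np.tanh(pre("c"))   # element-wise (Hadamard) products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

def init_params(c_x, c_h, k=3, seed=0):
    rng = np.random.default_rng(seed)
    p = {}
    for g in "ifoc":
        p["W_x" + g] = rng.normal(0, 0.1, (c_h, c_x, k, k))
        p["W_h" + g] = rng.normal(0, 0.1, (c_h, c_h, k, k))
        p["b_" + g] = np.zeros(c_h)
    return p

# Encode A = 4 feature tensors (16 channels on a 6 x 5 grid), starting from zero states.
A, c_x, c_h, M, N = 4, 16, 8, 6, 5
p = init_params(c_x, c_h)
xs = np.random.default_rng(1).normal(size=(A, c_x, M, N))
h, c = np.zeros((c_h, M, N)), np.zeros((c_h, M, N))
for x_t in xs:
    h, c = convlstm_step(x_t, h, c, p)
print(h.shape)  # (8, 6, 5)
```

The final `c` and `h` play the roles of $C^e$ and $H^e$, the encoder states handed to the decoder.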

The FR encoder module, which models daily flow correlation, first extracts the daily flow pattern of each region. To reduce the influence of outliers and noise, the flow pattern of a region is preferably obtained by averaging its historical flow data hour by hour. The flow change similarity between regions is then computed from the extracted patterns, giving the daily flow correlation between regions. Finally, a unique representation vector, the region representation vector, is generated for each region to preserve this correlation information.

The daily flow pattern is extracted as follows: for region $r_{ij}$, all of its historical flow data are used to extract its flow pattern. Because weekday and weekend flows differ greatly, and mixing them would distort the pattern, the system averages the weekday data and the weekend data of each region separately. Each region thus has 96 values: 48 represent the average inflow and outflow for each of the 24 hours of a weekday, and the other 48 are the corresponding weekend values. The system represents these 96 values as a one-dimensional vector $v_{ij} \in \mathbb{R}^{96}$.
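A minimal NumPy sketch of this extraction follows. The input layout (days × hours × inflow/outflow × grid), the order of the 96 values inside $v_{ij}$, the weekday mask, and the name `daily_patterns` are assumptions; the patent only fixes the split into 48 weekday and 48 weekend hourly averages.

```python
import numpy as np

def daily_patterns(hourly, is_weekday):
    """Average hourly flows into a 96-value pattern per region.

    hourly: (D, 24, 2, M, N) hourly inflow/outflow for D historical days.
    is_weekday: (D,) boolean mask.
    Returns v with shape (M, N, 96), ordered (assumed) as
    [weekday inflow x24, weekday outflow x24, weekend inflow x24, weekend outflow x24].
    """
    weekday = hourly[is_weekday].mean(axis=0)    # (24, 2, M, N)
    weekend = hourly[~is_weekday].mean(axis=0)   # (24, 2, M, N)
    parts = [weekday[:, 0], weekday[:, 1], weekend[:, 0], weekend[:, 1]]
    v = np.concatenate(parts, axis=0)            # (96, M, N)
    return np.moveaxis(v, 0, -1)                 # (M, N, 96)

rng = np.random.default_rng(0)
D, M, N = 14, 3, 4
hourly = rng.poisson(10.0, size=(D, 24, 2, M, N)).astype(float)
is_weekday = np.arange(D) % 7 < 5   # assumes day 0 is a Monday
v = daily_patterns(hourly, is_weekday)
print(v.shape)  # (3, 4, 96)
```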

Because the population base of each region differs, two regions with very similar flow patterns can still have very different vector values. To eliminate this effect, the system normalizes the vector of each region (min-max normalization):

$$(v_{ij})_k \leftarrow \frac{(v_{ij})_k - I_{min}}{I_{max} - I_{min}}, \qquad I_{max} = \max(v_{ij}),\quad I_{min} = \min(v_{ij}),\quad k = 0, 1, 2, \ldots, K-1$$

where $(v_{ij})_k$ denotes the k-th value of the vector and K denotes the length of the vector, here 96.
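The normalization is a per-region min-max scaling; a short NumPy sketch is below. The `eps` guard for constant vectors (where $I_{max} = I_{min}$) and the name `minmax_normalize` are additions of this sketch, not part of the patent.

```python
import numpy as np

def minmax_normalize(v, eps=1e-12):
    """Scale each region's pattern (last axis) to [0, 1].

    eps guards against constant vectors (I_max == I_min), an added
    assumption; such vectors map to all zeros.
    """
    i_min = v.min(axis=-1, keepdims=True)
    i_max = v.max(axis=-1, keepdims=True)
    return (v - i_min) / np.maximum(i_max - i_min, eps)

# Two regions with the same shape of daily change but a 10x different population base.
v = np.array([[1.0, 2.0, 3.0, 5.0],
              [10.0, 20.0, 30.0, 50.0]])
print(minmax_normalize(v))  # both rows -> [0.0, 0.25, 0.5, 1.0]
```

After scaling, the two regions become identical, which is exactly the effect the normalization is meant to have on regions that differ only in population base.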

The daily flow correlation between regions is computed from the daily flow patterns and expressed as the reciprocal of the Euclidean distance between the corresponding vectors of the two regions:

$$c(v_{ij}, v_{pq}) = \frac{1}{\sqrt{\sum_{k=0}^{K-1} \left((v_{ij})_k - (v_{pq})_k\right)^2} + \alpha}$$

where $i, p = 0, 1, 2, \ldots, M-1$ and $j, q = 0, 1, 2, \ldots, N-1$; K is the length of the vector; α is set to 0.01 to avoid a zero denominator; and M and N denote the number of rows and columns of the grid map, respectively.
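The full edge-weight matrix of the complete graph can be computed at once with NumPy broadcasting, as sketched below. Numbering region (i, j) as node i·N + j, dropping self-loops, and placing α outside the square root are assumptions of this sketch; the memory cost grows as (M·N)² · K, which is acceptable for city grids of modest size.

```python
import numpy as np

def daily_flow_correlation(v, alpha=0.01):
    """Edge weights c = 1 / (Euclidean distance + alpha) between all region pairs.

    v: (M, N, K) normalized patterns. Region (i, j) becomes node i * N + j.
    Returns an (M*N, M*N) symmetric weight matrix with a zero diagonal
    (no self-loops, an assumption of this sketch).
    """
    flat = v.reshape(-1, v.shape[-1])                 # (M*N, K)
    diff = flat[:, None, :] - flat[None, :, :]        # (M*N, M*N, K)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    c = 1.0 / (dist + alpha)
    np.fill_diagonal(c, 0.0)
    return c

v = np.array([[[0.0, 0.0], [3.0, 4.0]]])   # M = 1, N = 2, K = 2
c = daily_flow_correlation(v)
print(c[0, 1])  # equals 1 / (5 + 0.01)
```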

In this way, the daily flow correlation $c(v_{ij}, v_{pq})$ between any two regions is obtained. The result is treated as a complete graph G(V, E), in which each node v ∈ V represents a region and the weight of each edge represents the correlation between the two corresponding regions based on their daily traffic patterns.

Region representation vectors: a unique vector representation is generated for each region by the graph embedding method LINE; that is, given the graph G(V, E), LINE generates for each node v (i.e., each region) a vector representation that preserves the structure of the original graph.

The vector representations generated in this way preserve the daily traffic correlations between regions: vectors that are closer in the vector space indicate a stronger correlation between the corresponding regions.

The vectors of all regions together form a tensor F.

F is concatenated with the final memory state and the final hidden state of the ConvLSTM in the ST encoder module, and the results serve as the initial states of the ConvLSTM in the decoder module:

$$C^d_0 = C^e \oplus F, \qquad H^d_0 = H^e \oplus F$$

where $\oplus$ denotes the concatenation operation, and $C^e$ and $H^e$ are the final states of the ConvLSTM in the ST encoder module.
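The following NumPy sketch shows how per-region vectors could be arranged into F and concatenated with the encoder states. The LINE embedding itself is not reproduced: random vectors stand in for its output. Node ordering (region (i, j) as row i·N + j), concatenation along the channel axis, and an encoder state that keeps the M × N resolution (stride 1, as in the BikeNYC setting) are assumptions; the 64 state units and 80-dimensional region vectors follow the reported BikeNYC settings, while the 16 × 8 grid is hypothetical.

```python
import numpy as np

def region_tensor(embeddings, M, N):
    """Arrange per-region vectors (row-major, node i * N + j) into F of shape (d, M, N)."""
    d = embeddings.shape[1]
    return embeddings.reshape(M, N, d).transpose(2, 0, 1)

def decoder_initial_state(c_enc, h_enc, F):
    """C0_dec = C_enc (+) F and H0_dec = H_enc (+) F, concatenated along channels (assumed)."""
    return np.concatenate([c_enc, F], axis=0), np.concatenate([h_enc, F], axis=0)

rng = np.random.default_rng(0)
M, N, hidden, d = 16, 8, 64, 80
embeddings = rng.normal(size=(M * N, d))   # stand-in for LINE output
F = region_tensor(embeddings, M, N)
c_enc = rng.normal(size=(hidden, M, N))
h_enc = rng.normal(size=(hidden, M, N))
c0, h0 = decoder_initial_state(c_enc, h_enc, F)
print(c0.shape)  # (144, 16, 8)
```

Under these assumptions the decoder ConvLSTM works with 64 + d channels, so the region vectors are available at every location from the first decoding step.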

The decoder module generates the final prediction tensor sequence. Through the concatenation operation, it combines the vectors from the ST encoder module, which capture dynamic spatiotemporal dependencies, with the vectors from the FR encoder module, which preserve the relatively fixed daily flow correlations between regions. It then decodes along the time dimension with one ConvLSTM layer, and two deconvolutional layers produce a prediction tensor sequence whose tensors have the same size as those of the input sequence.

Decoding in the time dimension means the following: starting from the initial ConvLSTM states $C^d_0$ and $H^d_0$, the ConvLSTM in the decoder module receives no other external input, so at a specific step t, whether it activates its gates depends only on the states of the previous step, $C^d_{t-1}$ and $H^d_{t-1}$. The corresponding update formula is:

$$(C^d_t, H^d_t) = \mathrm{ConvLSTM}(C^d_{t-1}, H^d_{t-1})$$
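A NumPy sketch of the decoder unrolling is shown below. It applies the ConvLSTM update with the input terms removed, so every gate depends only on the previous hidden state; the helper names, the random weights and the 12-channel toy state are assumptions for illustration.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1 2-D convolution with zero padding; x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, wd))
    for di in range(k):
        for dj in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, di, dj], xp[:, di:di + h, dj:dj + wd])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decoder_rollout(h0, c0, p, B):
    """Unroll the decoder ConvLSTM for B steps with no external input.

    Without X_t, every gate is computed from H_{t-1} only.
    Returns the stacked hidden states H_1 .. H_B, shape (B, C, M, N).
    """
    h, c = h0, c0
    hidden_states = []
    for _ in range(B):
        pre = {g: conv2d_same(h, p["W_h" + g]) + p["b_" + g][:, None, None] for g in "ifoc"}
        i_t, f_t, o_t = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
        c = f_t * c + i_t * np.tanh(pre["c"])
        h = o_t * np.tanh(c)
        hidden_states.append(h)
    return np.stack(hidden_states)

rng = np.random.default_rng(0)
C, M, N, B = 12, 5, 4, 3
p = {}
for g in "ifoc":
    p["W_h" + g] = rng.normal(0, 0.1, (C, C, 3, 3))
    p["b_" + g] = np.zeros(C)
hs = decoder_rollout(rng.normal(size=(C, M, N)), rng.normal(size=(C, M, N)), p, B)
print(hs.shape)  # (3, 12, 5, 4)
```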

At a specific time point T, the two deconvolutional layers take the ConvLSTM output of step t and generate the prediction tensor:

$$\hat{\Delta}_{T+t} = f\left(W_4 \circledast f(W_3 \circledast H^d_t + b_3) + b_4\right)$$

where $t = 1, 2, \ldots, B$; $\circledast$ denotes the deconvolution operation; $f(\cdot)$ is a nonlinear activation function; and W and b are parameters to be learned.

The prediction result tensor sequence $\{\hat{\Delta}_{T+1}, \hat{\Delta}_{T+2}, \ldots, \hat{\Delta}_{T+B}\}$ represents the human flow in the B time periods after time T.
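The deconvolution (transposed convolution) step can be sketched in NumPy by scattering each input pixel's weighted kernel into the output. The weight layout and output-size formula mirror those documented for PyTorch's `ConvTranspose2d`; using stride 2 with `output_padding=1` to restore the grid size after stride-2 encoding, and ReLU as the activation, are assumptions of this sketch rather than details stated in the patent.

```python
import numpy as np

def conv_transpose2d(x, w, b, stride=1, padding=1, output_padding=0):
    """2-D transposed convolution ("deconvolution").

    x: (C_in, H, W); w: (C_in, C_out, k, k). Output height is
    (H - 1) * stride - 2 * padding + k + output_padding (needs output_padding <= padding).
    """
    c_in, h, wd = x.shape
    _, c_out, k, _ = w.shape
    full_h, full_w = (h - 1) * stride + k, (wd - 1) * stride + k
    full = np.zeros((c_out, full_h, full_w))
    for y in range(h):
        for xx in range(wd):
            # Each input pixel scatters a weighted copy of the kernel into the output.
            full[:, y * stride:y * stride + k, xx * stride:xx * stride + k] += np.einsum(
                "c,cokl->okl", x[:, y, xx], w)
    out = full[:, padding:full_h - padding + output_padding,
               padding:full_w - padding + output_padding]
    return out + b[:, None, None]

def relu(z):
    return np.maximum(z, 0.0)

def decode_to_flow(h_t, params, stride=1):
    """Delta_hat = f(W4 (*) f(W3 (*) H_t + b3) + b4), with f = ReLU (assumed)."""
    (w3, b3), (w4, b4) = params
    op = stride - 1   # with k = 3 and padding = 1, restores H * stride (stride <= 2)
    z = relu(conv_transpose2d(h_t, w3, b3, stride, 1, op))
    return relu(conv_transpose2d(z, w4, b4, stride, 1, op))

rng = np.random.default_rng(0)
h_t = rng.normal(size=(16, 4, 2))   # toy decoder hidden state on a downsampled grid
params = [(rng.normal(0, 0.1, (16, 8, 3, 3)), np.zeros(8)),
          (rng.normal(0, 0.1, (8, 2, 3, 3)), np.zeros(2))]
pred = decode_to_flow(h_t, params, stride=2)
print(pred.shape)  # (2, 16, 8): inflow/outflow on the full grid
```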

All parameters of the model, including those of the ST encoder module, the FR encoder module and the decoder module, are trained as follows: since the goal of this embodiment is to predict the traffic of every region of the city in the next B time periods as accurately as possible, the system is trained by minimizing the root mean square error between the predicted values $\hat{\Delta}_{T+t}$ and the true values $\Delta_{T+t}$, with the loss function

$$\mathcal{L}(\theta) = \sqrt{\frac{1}{B} \sum_{t=1}^{B} \left\lVert \hat{\Delta}_{T+t} - \Delta_{T+t} \right\rVert_2^2}$$

where θ denotes all parameters of the system to be learned and B is the number of future steps to be predicted.
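A NumPy sketch of this loss is below, reading $\lVert\cdot\rVert_2^2$ as the sum of squared entries of one predicted step (an assumption). A per-entry variant, which additionally averages over all grid cells and channels, is included only for comparison; the function names are hypothetical.

```python
import numpy as np

def rmse_loss(pred, true):
    """sqrt((1/B) * sum_t ||pred_t - true_t||^2) over B predicted steps.

    pred, true: (B, 2, M, N). The squared norm sums over all entries of a step.
    """
    B = pred.shape[0]
    sq_norms = ((pred - true) ** 2).reshape(B, -1).sum(axis=1)
    return float(np.sqrt(sq_norms.mean()))

def per_entry_rmse(pred, true):
    """Variant that averages the squared error over every entry."""
    return float(np.sqrt(np.mean((pred - true) ** 2)))

pred = np.zeros((2, 2, 1, 1))
true = np.ones((2, 2, 1, 1))
print(rmse_loss(pred, true))       # sqrt(2), about 1.414
print(per_entry_rmse(pred, true))  # 1.0
```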

The experiments in this embodiment used two real trajectory datasets collected in New York City, referred to as BikeNYC and TaxiNYC. BikeNYC contains 2.9 million bicycle trajectory records from July 1, 2013 to June 30, 2016, and TaxiNYC contains 1 billion taxi trajectory records from January 1, 2009 to December 31, 2015.

The parameters are set as follows. For the convolutional, deconvolutional and convolutional long short-term memory networks, all convolution kernels are 3 × 3; the two convolutional layers have 8 and 16 feature maps, respectively; and the convolution stride is 2 on TaxiNYC and 1 on BikeNYC. The states of the long short-term memory network have 64 units, and the length of the representation vector generated by the FR encoder module for each region is 64 on TaxiNYC and 80 on BikeNYC. The model is trained with mini-batches of size 16, and the parameters are updated with the Adam optimizer, with learning rates of 0.001 on TaxiNYC and 0.0005 on BikeNYC.
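For reproduction, the reported settings can be collected in a small configuration object. This is a sketch: the field names and the `ExperimentConfig` class are hypothetical, and only the values come from the paragraph above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters reported for the two datasets (field names are hypothetical)."""
    kernel_size: int = 3               # CNN, deconvolution and ConvLSTM kernels are 3 x 3
    cnn_channels: tuple = (8, 16)      # feature maps of the two CNN layers
    conv_stride: int = 1
    convlstm_units: int = 64           # size of the ConvLSTM states
    region_embedding_dim: int = 64     # length of each FR region vector
    batch_size: int = 16
    optimizer: str = "adam"
    learning_rate: float = 0.001

CONFIGS = {
    "TaxiNYC": ExperimentConfig(conv_stride=2, region_embedding_dim=64, learning_rate=0.001),
    "BikeNYC": ExperimentConfig(conv_stride=1, region_embedding_dim=80, learning_rate=0.0005),
}

for name, cfg in CONFIGS.items():
    print(name, cfg.conv_stride, cfg.region_embedding_dim, cfg.learning_rate)
```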

All experiments in this embodiment were run on a server equipped with an NVIDIA Tesla K80 accelerator. The RMSE was 25.604 on the TaxiNYC dataset and 7.668 on the BikeNYC dataset; compared with the prior art, this reduces the RMSE by 18.20% and 7.31%, respectively.

The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims; all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims

1. A human flow prediction system that simultaneously models spatiotemporal dependencies and daily flow correlations, comprising an ST encoder module, an FR encoder module and a decoder module, wherein: the ST encoder module hierarchically models dependencies over spatial ranges of different sizes through two layers of convolutional neural networks and models temporal dependencies through one layer of convolutional long short-term memory network, and outputs the resulting fixed-dimension vector representation, which captures spatiotemporal dependencies, to the decoder module; the FR encoder module computes the flow change similarity between every pair of regions based on the historical flow data sequence and generates a complete graph, which is output to the decoder module; the decoder module generates the pedestrian flow prediction through one layer of long short-term memory network and two layers of deconvolutional neural networks;

a representation vector is generated for each region of the complete graph by a graph embedding method, so that regions with similar flow variation have closer representation vectors;

each node in the complete graph represents one area, and each edge represents the flow change correlation of two corresponding areas;

the ST encoder module comprises two layers of convolutional neural networks that hierarchically capture spatial dependencies of different ranges and one layer of convolutional long short-term memory network that captures temporal and spatial dependencies simultaneously, and encodes the input human flow tensor sequence into a fixed-dimension representation vector, specifically as follows:

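2) each raw record has the form $(p^{start}, p^{end}, t^{start}, t^{end})$, denoting a trajectory that starts at location $p^{start}$ at time $t^{start}$ and ends at location $p^{end}$ at time $t^{end}$; the trajectory data are converted into the inflow and outflow of each region in each time period, specifically:

$$x^{in}_{t,ij} = \left|\left\{(p^{start}, p^{end}, t^{start}, t^{end}) : p^{end} \in r_{ij},\ t^{end} \in [s_t, e_t]\right\}\right|$$

$$x^{out}_{t,ij} = \left|\left\{(p^{start}, p^{end}, t^{start}, t^{end}) : p^{start} \in r_{ij},\ t^{start} \in [s_t, e_t]\right\}\right|$$

wherein $x^{in}_{t,ij}$ and $x^{out}_{t,ij}$ are, respectively, the inflow and outflow of region $r_{ij}$ in the t-th time period $[s_t, e_t]$; the flow tensor sequence of the A time periods before the current time T, $\{\Delta_t \mid t = T-A+1, \ldots, T-1, T\}$, is taken as input, and the flow tensor sequence of the following B time periods, $\{\Delta_t \mid t = T+1, T+2, \ldots, T+B\}$, is predicted;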

at time point T, the input of the ST encoder module is the sequence $\{\Delta_{T-A+1}, \ldots, \Delta_{T-1}, \Delta_T\}$ of the flow tensors of the preceding A time periods; the two CNN layers in the ST encoder module extract spatial dependencies of different ranges hierarchically and generate the corresponding feature representation tensors $X_t$, specifically:

$$X^{(1)}_t = f(W_1 * \Delta_t + b_1), \qquad X_t = f(W_2 * X^{(1)}_t + b_2)$$

wherein $t = T-A+1, \ldots, T-1, T$; $*$ denotes the convolution operation; $f(\cdot)$ denotes a nonlinear activation function; and W and b are parameters to be learned, shared by all regions at a specific time point T;

the ST encoder module further includes a ConvLSTM layer; the feature representation tensors generated by the CNN layers enter the ConvLSTM layer in sequence to further capture spatiotemporal dependencies, and because each ConvLSTM unit contains convolution operations, its state update is influenced not only by the current input and its previous state but also by the states of its neighboring regions;

in the convolutional long short-term memory network, at a specific step t, ConvLSTM takes the corresponding feature representation tensor $X_t$ as input and, through convolution operations, decides whether to activate its input gate $i_t$, forget gate $f_t$ and output gate $o_t$, and how to update its memory state $C_t$ and hidden state $H_t$, starting from the initial states $C_0$ and $H_0$:

$$(C_t, H_t) = \mathrm{ConvLSTM}(C_{t-1}, H_{t-1}, X_t)$$

wherein $t = 1, 2, \ldots, A$; $C_t$ and $H_t$ are the memory state and hidden state at step t, respectively; $X_t$ is the external input; and $\mathrm{ConvLSTM}(\cdot)$ denotes the ConvLSTM update formula, specifically:

$$i_t = \sigma(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i), \quad f_t = \sigma(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f), \quad o_t = \sigma(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o)$$

$$C_t = f_t \circ C_{t-1} + i_t \circ \tanh(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c), \qquad H_t = o_t \circ \tanh(C_t)$$

wherein $*$ denotes the convolution operation, $\circ$ denotes the Hadamard product, $\sigma(\cdot)$ denotes the sigmoid activation function, and W and b are parameters to be learned;

the FR encoder module captures the correlation of daily flow changes between regions: it first computes the similarity of flow changes between regions to obtain their daily flow correlations, builds a complete graph with regions as nodes and these correlations as edges, and then uses a graph embedding method to generate a unique representation vector for each region;

the decoder module combines the vectors of all regions into a tensor F and concatenates it with the final memory state and the final hidden state of the ConvLSTM in the ST encoder module, and the results serve as the initial states of the ConvLSTM in the decoder module:

$$C^d_0 = C^e \oplus F, \qquad H^d_0 = H^e \oplus F$$

wherein $\oplus$ denotes the concatenation operation, and $C^e$ and $H^e$ are the final states of the ConvLSTM in the ST encoder module; based on the fixed-dimension representation vectors generated by the two encoder modules, after decoding in the time dimension through one layer of convolutional long short-term memory network, two layers of deconvolutional neural networks generate a prediction result tensor sequence whose tensors have the same size as those of the input sequence;

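decoding in the time dimension means: starting from the initial ConvLSTM states $C^d_0$ and $H^d_0$, at a specific step t the gates of the decoder ConvLSTM depend only on the states of the previous step, $C^d_{t-1}$ and $H^d_{t-1}$, and the corresponding update formula is:

$$(C^d_t, H^d_t) = \mathrm{ConvLSTM}(C^d_{t-1}, H^d_{t-1})$$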

the two-layer deconvolutional neural network generates the prediction result tensor:

$$\hat{\Delta}_{T+t} = f\left(W_4 \circledast f(W_3 \circledast H^d_t + b_3) + b_4\right)$$

wherein $t = 1, 2, \ldots, B$; $\circledast$ denotes the deconvolution operation; and W and b denote parameters to be learned; the prediction result tensor sequence $\{\hat{\Delta}_{T+1}, \hat{\Delta}_{T+2}, \ldots, \hat{\Delta}_{T+B}\}$ represents the human flow in the B time periods after time T.

2. A prediction method based on the system of claim 1, characterized in that, in the encoding stage, a historical flow data sequence is taken as input and the ST encoder module and the FR encoder module generate fixed-dimension representation vectors, and in the decoding stage, the ConvLSTM and deconvolution network of the decoder module generate, from the generated vectors, a predicted flow sequence for future time periods.

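3. The method of claim 2, wherein, before the data are applied to the prediction system, the trajectory data are preprocessed as follows:

$$x^{in}_{t,ij} = \left|\left\{(p^{start}, p^{end}, t^{start}, t^{end}) : p^{end} \in r_{ij},\ t^{end} \in [s_t, e_t]\right\}\right|$$

$$x^{out}_{t,ij} = \left|\left\{(p^{start}, p^{end}, t^{start}, t^{end}) : p^{start} \in r_{ij},\ t^{start} \in [s_t, e_t]\right\}\right|$$

wherein $x^{in}_{t,ij}$ and $x^{out}_{t,ij}$ are, respectively, the inflow and outflow of region $r_{ij}$ in the t-th time period $[s_t, e_t]$; the system combines the inflows and outflows of all regions of the city to form the city-wide traffic tensor $\Delta_t$ of the t-th time period.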

4. The method according to claim 2, characterized in that, at a specific time point T, the input of the ST encoder module is the sequence $\{\Delta_{T-A+1}, \ldots, \Delta_{T-1}, \Delta_T\}$ of the flow tensors of the preceding A time periods, and the two CNN layers in the ST encoder module extract spatial dependencies of different ranges hierarchically and generate the corresponding feature representation tensors $X_t$, specifically:

$$X^{(1)}_t = f(W_1 * \Delta_t + b_1), \qquad X_t = f(W_2 * X^{(1)}_t + b_2)$$

wherein $t = T-A+1, \ldots, T-1, T$; $*$ denotes the convolution operation; $f(\cdot)$ denotes a nonlinear activation function; and W and b are parameters to be learned, shared by all regions at a specific time point T.

5. The method of claim 2, wherein the daily flow pattern of a region is extracted by averaging its historical flow data hour by hour; the flow change similarity between regions is then computed from the extracted patterns, giving the daily flow correlation between regions; finally, a unique representation vector, the region representation vector, is generated for each region to preserve this correlation information;

the daily flow pattern is extracted as follows: for region $r_{ij}$, to reduce the distortion caused by the large difference between weekday and weekend flow changes, the system averages the weekday data and the weekend data of each region separately, so that each region has 96 values, of which 48 represent the average inflow and outflow for each of the 24 hours of a weekday and the other 48 correspond to the weekend; the system represents these 96 values as a one-dimensional vector $v_{ij}$.

6. The method of claim 5, wherein the vector of each region is normalized by:

$$(v_{ij})_k \leftarrow \frac{(v_{ij})_k - I_{min}}{I_{max} - I_{min}}, \qquad I_{max} = \max(v_{ij}),\quad I_{min} = \min(v_{ij}),\quad k = 0, 1, 2, \ldots, K-1$$

wherein $(v_{ij})_k$ denotes the k-th value of the vector and K denotes the length of the vector;

$$c(v_{ij}, v_{pq}) = \frac{1}{\sqrt{\sum_{k=0}^{K-1} \left((v_{ij})_k - (v_{pq})_k\right)^2} + \alpha}$$

wherein $i, p = 0, 1, 2, \ldots, M-1$ and $j, q = 0, 1, 2, \ldots, N-1$; K is the length of the vector; α is set to 0.01 to avoid a zero denominator; and M and N denote the number of rows and columns of the grid map, respectively;

the region representation vectors: the graph embedding method LINE generates a unique vector representation for each region, i.e., given the graph G(V, E), LINE generates, for each node v corresponding to a region, a vector representation that preserves the original graph structure;

the vector representations generated in this way preserve the daily traffic correlations between regions, and vectors that are closer in the vector space indicate a stronger correlation between the corresponding regions;

the vectors of all regions together form a tensor F.

7. The method as claimed in claim 4, wherein the parameters to be learned are trained by minimizing the root mean square error between the predicted values $\hat{\Delta}_{T+t}$ and the true values $\Delta_{T+t}$ to update the parameters of the system, with the loss function

$$\mathcal{L}(\theta) = \sqrt{\frac{1}{B} \sum_{t=1}^{B} \left\lVert \hat{\Delta}_{T+t} - \Delta_{T+t} \right\rVert_2^2}$$

wherein θ denotes all parameters of the system to be learned and B is the number of future steps to be predicted.