CN110610232A - Long-term and short-term traffic flow prediction model construction method based on deep learning - Google Patents

Long-term and short-term traffic flow prediction model construction method based on deep learning

Info

Publication number
CN110610232A
CN110610232A
Authority
CN
China
Prior art keywords
term
model
lstnet
layer
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910855977.4A
Other languages
Chinese (zh)
Inventor
沈强儒
施佺
曹阳
曹慧
葛文璇
汤天培
刘志杰
邱礼平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University filed Critical Nantong University
Priority to CN201910855977.4A priority Critical patent/CN110610232A/en
Publication of CN110610232A publication Critical patent/CN110610232A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/048 Activation functions
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a long-term and short-term traffic flow prediction model construction method based on deep learning, namely a long- and short-term time series network (LSTNet) that uses a convolutional neural network and a recurrent neural network to extract short-term local dependency patterns among variables and to discover long-term patterns of time series trends. By combining the strengths of convolutional and recurrent neural networks with an autoregressive component, the invention significantly improves on the state-of-the-art results of time series prediction on multiple benchmark datasets. Through in-depth analysis and empirical study, the efficiency of the LSTNet model architecture is demonstrated: it successfully captures both short-term and long-term repetitive patterns in data, and combines linear and nonlinear models for robust prediction.

Description

Long-term and short-term traffic flow prediction model construction method based on deep learning
Technical Field
The invention belongs to the field of traffic models, and particularly relates to a long-term and short-term traffic flow prediction model construction method based on deep learning.
Background
Multivariate time series prediction is an important machine learning problem in many fields, including prediction of solar power plant energy output, power consumption, and traffic congestion. The temporal data that occurs in these real-world applications typically involves a mixture of long-term and short-term patterns, for which traditional methods such as autoregressive models and Gaussian processes may fail.
Deep neural networks have received increasing attention in time series analysis. Previous work has focused on time series classification, i.e. the task of automatically assigning class labels to time series entries. For example, RNN architectures have been studied to extract informative patterns from healthcare sequence data and classify the data according to diagnostic categories. RNN architectures have also been applied to mobile data for classifying actions or activities from input sequences. CNN models have likewise been used for action and activity recognition, extracting shift-invariant local patterns from input sequences as features for classification models.
Deep neural networks have also been investigated for time series prediction, i.e., the task of using time series observed in the past to predict the unknown time series at a future horizon; the larger the horizon, the harder the problem. This line of work ranges from early efforts with RNN models and hybrid models combining ARIMA and multilayer perceptrons (MLP) to the recent application of vanilla RNNs and dynamic Boltzmann machines to time series prediction.
One of the most prominent univariate time series models is the autoregressive integrated moving average (ARIMA) model. The popularity of the ARIMA model is due to its statistical properties and to the well-known Box-Jenkins methodology for model selection. ARIMA models not only subsume various exponential smoothing techniques but are also flexible enough to encompass other types of time series models, including autoregressive (AR), moving average (MA), and autoregressive moving average (ARMA) models. However, ARIMA models, including their variants for modeling long-term temporal dependencies, are rarely used for high-dimensional multivariate time series prediction because of their high computational cost.
On the other hand, the vector autoregressive (VAR) model is the most widely used model for multivariate time series. The VAR model naturally extends the AR model to the multivariate setting, but it ignores the dependencies between output variables. Significant advances have been made in recent years on various VAR models, including the elliptical VAR model for heavy-tailed time series and the structured VAR model for better interpreting the dependencies between high-dimensional variables. However, the model capacity of VAR grows linearly with the temporal window size and quadratically with the number of variables, which means the inherently large model tends to overfit when dealing with long-term temporal patterns. To alleviate this problem, the original high-dimensional signal is reduced to a low-dimensional hidden representation, to which VAR is then applied for prediction with a variety of regularization choices.
The time series prediction problem can also be treated as a standard regression problem with time-varying parameters. It is therefore not surprising that various regression models with different loss functions and regularization terms have been applied to the time series prediction task. For example, linear support vector regression (SVR) learns a maximum-margin hyperplane based on the regression loss, where the hyper-parameter ε controls the threshold of the prediction error. Ridge regression is another example, which can be recovered from the SVR model by setting ε to zero. Finally, the LASSO model is applied to encourage sparsity in the model parameters so that interesting patterns between different input signals can be revealed. These linear methods are practically efficient for multivariate time series prediction thanks to high-quality off-the-shelf solvers in the machine learning community. However, like VAR, these linear models may fail to capture the complex nonlinear relationships of multivariate signals, resulting in degraded performance.
The Gaussian process (GP) is a nonparametric method for modeling distributions over a continuous domain of functions. This contrasts with models defined by parameterized function classes such as VAR and SVR. GP can be applied to the multivariate time series prediction task described above, and can be used as a prior over the function space in Bayesian inference. For example, a fully Bayesian approach has been proposed with a GP prior on a nonlinear state-space model, which can capture complex dynamic phenomena. However, the power of the Gaussian process comes at the cost of high computational complexity: a straightforward implementation of the Gaussian process for multivariate time series prediction has cubic complexity in the number of observations, due to the matrix inversion of the kernel matrix.
Disclosure of Invention
The purpose of the invention is as follows: in order to overcome the defects of the prior art, the invention provides a long-term and short-term traffic flow prediction model construction method based on deep learning.
The technical scheme is as follows: a long-term and short-term traffic flow prediction model construction method based on deep learning comprises the following steps:
1. problem formulation:
the multivariate time series prediction task is formulated as follows:
a series of fully observed time series signals is given: $Y = \{y_1, y_2, \ldots, y_T\}$, where $y_t \in \mathbb{R}^n$ and $n$ is the variable dimension; to predict $y_{T+h}$, where $h$ is the desired horizon ahead of the current timestamp, it is assumed that $\{y_1, y_2, \ldots, y_T\}$ is available; likewise, to predict the value of the next timestamp $y_{T+h+1}$, it is assumed that $\{y_1, y_2, \ldots, y_T, y_{T+1}\}$ is available; the input matrix at timestamp $T$ is formulated as $X_T = \{y_1, y_2, \ldots, y_T\} \in \mathbb{R}^{n \times T}$; the horizon of the prediction task is selected according to the requirements of the environment setting;
2. convolution component:
the LSTNet framework is a deep learning framework designed specifically for the multivariate time series prediction task with a mixture of long-term and short-term patterns; the first layer of the LSTNet architecture is a convolutional network without pooling, which extracts short-term patterns in the time dimension as well as local dependencies between variables; the convolutional layer consists of multiple filters of width $\omega$ and height $n$, the height being set equal to the number of variables; the $k$-th filter sweeps the input matrix $X$ and generates: $h_k = \mathrm{RELU}(W_k * X + b_k)$, where $*$ denotes the convolution operation, the output $h_k$ is a vector, and the RELU function is $\mathrm{RELU}(x) = \max(0, x)$; each vector $h_k$ is made of length $T$ by zero-padding on the left of the input matrix $X$; the size of the output matrix of the convolutional layer is $d_c \times T$, where $d_c$ denotes the number of filters;
3. Recurrent component:
the output of the convolutional layer is simultaneously fed into the Recurrent component and the Recurrent-skip component; the Recurrent component is a recurrent layer with gated recurrent units (GRU) that uses the RELU function as the hidden update activation function; the hidden state of the recurrent unit at time $t$ is computed as:

$r_t = \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r)$

$u_t = \sigma(x_t W_{xu} + h_{t-1} W_{hu} + b_u)$

$c_t = \mathrm{RELU}(x_t W_{xc} + r_t \odot (h_{t-1} W_{hc}) + b_c)$

$h_t = (1 - u_t) \odot h_{t-1} + u_t \odot c_t$

where $\odot$ is the element-wise product, $\sigma$ is the sigmoid function, and $x_t$ is the input of this layer at time $t$; the output of this layer is the hidden state at each timestamp;
4. Recurrent-skip component:
a recurrent structure with temporal skip connections is developed to extend the temporal span of the information flow and thereby ease the optimization process; specifically, skip connections are added between the current hidden cell and the hidden cells in the same phase of adjacent periods; the update process can be expressed as:

$r_t = \sigma(x_t W_{xr} + h_{t-p} W_{hr} + b_r)$

$u_t = \sigma(x_t W_{xu} + h_{t-p} W_{hu} + b_u)$

$c_t = \mathrm{RELU}(x_t W_{xc} + r_t \odot (h_{t-p} W_{hc}) + b_c)$

$h_t = (1 - u_t) \odot h_{t-p} + u_t \odot c_t$
the input of this layer is the output of the convolutional layer; $p$ is the number of hidden cells skipped through; the value of $p$ can be determined easily for datasets with a clear periodic pattern and has to be tuned otherwise; a well-tuned $p$ can significantly improve model performance, and LSTNet can moreover be easily extended to contain variants of the skip length $p$;
a dense layer is used to combine the outputs of the Recurrent and Recurrent-skip components; the inputs of the dense layer include the hidden state of the Recurrent component at timestamp $t$, denoted $h^R_t$, and the $p$ hidden states of the Recurrent-skip component from timestamp $t-p+1$ to $t$, denoted $h^S_{t-p+1}, \ldots, h^S_t$; the output of the dense layer is computed as:

$h^D_t = W^R h^R_t + \sum_{i=0}^{p-1} W^S_i h^S_{t-i} + b$

where $h^D_t$ denotes the prediction of the (upper) neural network part at timestamp $t$;
5. temporal attention layer:
an attention mechanism is added that learns a weighted combination of the hidden representations at each window position of the input matrix; specifically, the attention weights $\alpha_t \in \mathbb{R}^q$ at the current timestamp $t$ are computed as:

$\alpha_t = \mathrm{AttnScore}(H^R_t, h^R_{t-1})$

where $H^R_t = [h^R_{t-q}, \ldots, h^R_{t-1}]$ is a matrix stacking the hidden representations of the RNN column-wise, and AttnScore is some similarity function such as the dot product, cosine, or a function parameterized by a simple multilayer perceptron;
the final output of the temporal attention layer is the concatenation of the weighted context vector $c_t = H_t \alpha_t$ and the last window hidden representation $h^R_{t-1}$, followed by a linear projection:

$h^D_t = W[c_t; h^R_{t-1}] + b$
6. autoregressive component:
the final prediction of LSTNet is decomposed into a linear part, which mainly focuses on the local scaling issue, plus a nonlinear part containing the recurring patterns; in the LSTNet architecture, the classical autoregressive (AR) model is adopted as the linear component; all dimensions share the same set of linear parameters, and the AR model is:

$h^L_{t,i} = \sum_{k=0}^{q^{ar}-1} W^{ar}_k\, y_{t-k,i} + b^{ar}$

where $h^L_t$ is the forecast of the AR component and $q^{ar}$ is the size of the input window over the input matrix; then, by integrating the outputs of the neural network part and the AR component, the final prediction of LSTNet is obtained:

$\hat{Y}_t = h^D_t + h^L_t$

where $\hat{Y}_t$ denotes the final prediction of the model at timestamp $t$;
7. establishing an objective function:
the squared error is the default loss function for many prediction tasks, and the corresponding optimization objective is expressed as:

$\min_\Theta \sum_{t \in \Omega_{Train}} \| Y_t - \hat{Y}_{t-h} \|_F^2$

where $\Theta$ denotes the parameter set of the model, $\Omega_{Train}$ is the set of timestamps used for training, and $\|\cdot\|_F$ is the Frobenius norm;
the objective function of linear support vector regression is:

$\min_\Theta \frac{1}{2}\|\Theta\|_F^2 + C \sum_{t \in \Omega_{Train}} \sum_{i=0}^{n-1} \xi_{t,i}$

subject to $|\hat{Y}_{t-h,i} - Y_{t,i}| \le \xi_{t,i} + \epsilon$ and $\xi_{t,i} \ge 0$;

where $C$ and $\epsilon$ are hyper-parameters; motivated by the remarkable performance of the linear support vector machine model, its objective function is incorporated into the LSTNet model as a substitute for the squared loss; setting $\epsilon = 0$, the above objective function reduces to the following absolute loss (L1) function:

$\min_\Theta \sum_{t \in \Omega_{Train}} \sum_{i=0}^{n-1} |Y_{t,i} - \hat{Y}_{t-h,i}|$

the advantage of the absolute loss function is that it is more robust to anomalies in real time series data; in the experimental part, a validation set is used to decide which objective function to use, the squared loss Eq. 7 or the absolute loss Eq. 9;
As an optimization, the optimization strategy is as follows:
suppose the input time series is $Y_t = \{y_1, y_2, \ldots, y_t\}$; define a tunable window size $q$ and reformulate the input at timestamp $t$ as $X_t = \{y_{t-q+1}, y_{t-q+2}, \ldots, y_t\}$; the problem then becomes a regression task over the feature-value pairs $\{X_t, Y_{t+h}\}$, which can be solved by stochastic gradient descent (SGD) or its variants.
Has the advantages that: the invention proposes a new deep learning framework, the long- and short-term time series network (LSTNet), which uses a convolutional neural network (CNN) and a recurrent neural network (RNN) to extract short-term local dependency patterns between variables and to discover long-term patterns of time series trends. In addition, a traditional autoregressive model is used to address the scale-insensitivity problem of the neural network model. In evaluations on real data with complex mixtures of repetitive patterns, LSTNet achieves significant performance improvements over several state-of-the-art baseline methods. All data and experiment code are available online.
Drawings
FIG. 1 is a schematic diagram of the long and short term time series network (LSTNet) of the present invention;
FIG. 2 is a schematic representation of the autocorrelation of sampled variables forming four data sets in accordance with the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below so that those skilled in the art can better understand the advantages and features of the present invention, and thus the scope of the present invention will be more clearly defined. The embodiments described herein are only a few embodiments of the present invention, rather than all embodiments, and all other embodiments that can be derived by one of ordinary skill in the art without inventive faculty based on the embodiments described herein are intended to fall within the scope of the present invention.
Examples
In the present invention, we propose a deep learning framework for multivariate time series prediction, the long- and short-term time series network (LSTNet), as shown in fig. 1. It exploits the advantage of convolutional layers at discovering local dependency patterns between multidimensional input variables, and of recurrent layers at capturing complex long-term dependencies. A novel recurrent structure, Recurrent-skip, is designed to capture very long-term dependency patterns and to ease optimization by exploiting the periodic property of the input time series signals. Finally, LSTNet incorporates a traditional autoregressive linear model in parallel with the nonlinear neural network part, which makes the nonlinear deep learning model more robust for time series that violate scale changes. In experiments on real-world seasonal time series datasets, our model is consistently superior to traditional linear models and to the GRU recurrent neural network.
The invention is constructed as follows:
in the present invention, the time series prediction problem is first formulated, and then the details of the proposed LSTNet architecture are discussed in the following section, as shown in fig. 1. Finally, an objective function and an optimization strategy are proposed.
1 problem formulation
In this section, the multivariate time series prediction task is formulated. Given a series of fully observed time series signals $Y = \{y_1, y_2, \ldots, y_T\}$, where $y_t \in \mathbb{R}^n$ and $n$ is the variable dimension, our goal is to predict a series of future signals in a rolling forecasting fashion. That is, to predict $y_{T+h}$, where $h$ is the desired horizon ahead of the current timestamp, we assume $\{y_1, y_2, \ldots, y_T\}$ is available. Likewise, to predict the value of the next timestamp $y_{T+h+1}$, we assume $\{y_1, y_2, \ldots, y_T, y_{T+1}\}$ is available. Thus, we formulate the input matrix at timestamp $T$ as $X_T = \{y_1, y_2, \ldots, y_T\} \in \mathbb{R}^{n \times T}$.
In most cases, the scope of the predictive task is selected according to the requirements of the environmental settings, for example, for traffic usage, the range of interest is from a few hours to a day; for stock market data, even second/minute advance predictions are meaningful for generating returns.
Fig. 1 outlines the proposed LSTNet architecture. LSTNet is a deep learning framework designed specifically for the multivariate time series prediction task, mixing long-term and short-term patterns. In the following sections, the building blocks of LSTNet are described in detail.
2 convolution component
The first layer of LSTNet is a convolutional network without pooling, which aims to extract short-term patterns in the time dimension as well as local dependencies between variables. The convolutional layer consists of multiple filters of width $\omega$ and height $n$ (the height is set equal to the number of variables). The $k$-th filter sweeps the input matrix $X$ and generates: $h_k = \mathrm{RELU}(W_k * X + b_k)$, where $*$ denotes the convolution operation, the output $h_k$ is a vector, and the RELU function is $\mathrm{RELU}(x) = \max(0, x)$. Each vector $h_k$ is made of length $T$ by zero-padding on the left of the input matrix $X$. The size of the output matrix of the convolutional layer is $d_c \times T$, where $d_c$ denotes the number of filters.
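By way of non-limiting illustration, the convolutional component described above may be sketched as follows (a minimal sketch assuming a PyTorch implementation; the names ConvComponent, n_vars, window, d_c and omega are illustrative choices of this sketch, not part of the patent text):

```python
import torch
import torch.nn as nn

class ConvComponent(nn.Module):
    # Convolutional layer without pooling: d_c filters of height n_vars and width omega.
    def __init__(self, n_vars: int, d_c: int = 100, omega: int = 6):
        super().__init__()
        self.conv = nn.Conv2d(1, d_c, kernel_size=(n_vars, omega))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window, n_vars) -> (batch, 1, n_vars, window)
        x = x.permute(0, 2, 1).unsqueeze(1)
        # zero-pad on the left in time so each output vector keeps length T
        x = nn.functional.pad(x, (self.conv.kernel_size[1] - 1, 0))
        h = torch.relu(self.conv(x))   # (batch, d_c, 1, window)
        return h.squeeze(2)            # output matrix of size d_c x T per sample
```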
3 Recurrent component
The output of the convolutional layer is simultaneously fed into the Recurrent component and the Recurrent-skip component. The Recurrent component is a recurrent layer with gated recurrent units (GRU) that uses the RELU function as the hidden update activation function. The hidden state of the recurrent unit at time $t$ is computed as:

$r_t = \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r)$

$u_t = \sigma(x_t W_{xu} + h_{t-1} W_{hu} + b_u)$

$c_t = \mathrm{RELU}(x_t W_{xc} + r_t \odot (h_{t-1} W_{hc}) + b_c)$

$h_t = (1 - u_t) \odot h_{t-1} + u_t \odot c_t$

where $\odot$ is the element-wise product, $\sigma$ is the sigmoid function, and $x_t$ is the input of this layer at time $t$. The output of this layer is the hidden state at each timestamp. While the tanh function is customarily used as the hidden update activation function, RELU was empirically found to give more reliable performance, as gradients propagate backwards more easily through it.
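For illustration only, a single step of the GRU update with the RELU hidden activation given above can be sketched as follows (assuming PyTorch tensors; packaging the weight matrices and biases in the dictionaries W and b is an illustrative choice):

```python
import torch

def gru_relu_step(x_t, h_prev, W, b):
    # W holds W_xr, W_hr, W_xu, W_hu, W_xc, W_hc; b holds b_r, b_u, b_c
    r = torch.sigmoid(x_t @ W["xr"] + h_prev @ W["hr"] + b["r"])      # reset gate
    u = torch.sigmoid(x_t @ W["xu"] + h_prev @ W["hu"] + b["u"])      # update gate
    c = torch.relu(x_t @ W["xc"] + r * (h_prev @ W["hc"]) + b["c"])   # candidate state
    return (1 - u) * h_prev + u * c                                   # new hidden state h_t
```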
4 Recurrent-skip component
Recurrent layers with GRU or LSTM units are carefully designed to remember historical information and hence to capture relatively long-term dependencies. Due to gradient vanishing, however, GRU and LSTM usually fail to capture very long-term correlations in practice. The present invention proposes a novel Recurrent-skip component that exploits the periodic patterns of real-world datasets to alleviate this problem. A recurrent structure with temporal skip connections is developed to extend the temporal span of the information flow and thereby ease the optimization process. Specifically, skip connections are added between the current hidden cell and the hidden cells in the same phase of adjacent periods. The update process can be expressed as:
$r_t = \sigma(x_t W_{xr} + h_{t-p} W_{hr} + b_r)$

$u_t = \sigma(x_t W_{xu} + h_{t-p} W_{hu} + b_u)$

$c_t = \mathrm{RELU}(x_t W_{xc} + r_t \odot (h_{t-p} W_{hc}) + b_c)$

$h_t = (1 - u_t) \odot h_{t-p} + u_t \odot c_t$
the input to this layer is the output of the convolutional layer. p is the number of skipped hidden units, and the value of p can be easily determined for a well-defined data set of periodic patterns, otherwise it will be adjusted. In experiments it was found that even in the latter case, good tuning p significantly improves the performance of the model. Furthermore, LSTNet can be easily extended to variables containing the hop length p.
A dense layer is used to combine the outputs of the Recurrent and Recurrent-skip components. The inputs of the dense layer include the hidden state of the Recurrent component at timestamp $t$, denoted $h^R_t$, and the $p$ hidden states of the Recurrent-skip component from timestamp $t-p+1$ to $t$, denoted $h^S_{t-p+1}, \ldots, h^S_t$. The output of the dense layer is computed as:

$h^D_t = W^R h^R_t + \sum_{i=0}^{p-1} W^S_i h^S_{t-i} + b$

where $h^D_t$ denotes the prediction result of the (upper) neural network part in fig. 1 at timestamp $t$.
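As a non-authoritative sketch, the Recurrent-skip component is commonly realized by regrouping the convolutional output so that cells $p$ steps apart become consecutive in time before running an ordinary GRU (assuming PyTorch; SkipGRU, d_skip and the shape conventions are assumptions of this sketch):

```python
import torch
import torch.nn as nn

class SkipGRU(nn.Module):
    def __init__(self, d_c: int = 100, d_skip: int = 50, p: int = 24):
        super().__init__()
        self.p = p
        self.gru = nn.GRU(d_c, d_skip, batch_first=True)

    def forward(self, c: torch.Tensor) -> torch.Tensor:
        # c: (batch, d_c, T), the output of the convolutional layer
        batch, d_c, T = c.shape
        t = (T // self.p) * self.p
        x = c[:, :, -t:]                                   # trim T to a multiple of p
        # regroup so that hidden cells p steps apart become consecutive
        x = x.reshape(batch, d_c, -1, self.p).permute(0, 3, 2, 1)
        x = x.reshape(batch * self.p, -1, d_c)
        _, h = self.gru(x)                                 # final hidden state per phase
        return h.squeeze(0).reshape(batch, -1)             # p hidden states, concatenated
```

The $p$ concatenated hidden states returned here play the role of $h^S_{t-p+1}, \ldots, h^S_t$ in the dense-layer combination above.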
5 Temporal attention layer
However, the Recurrent-skip layer requires a predefined hyper-parameter $p$, which is unfavorable for non-seasonal time series or for series whose period length changes dynamically over time. To alleviate this issue, another approach is considered: an attention mechanism that learns a weighted combination of the hidden representations at each window position of the input matrix. Specifically, the attention weights $\alpha_t \in \mathbb{R}^q$ at the current timestamp $t$ are computed as:

$\alpha_t = \mathrm{AttnScore}(H^R_t, h^R_{t-1})$

where $H^R_t = [h^R_{t-q}, \ldots, h^R_{t-1}]$ is a matrix stacking the hidden representations of the RNN column-wise, and AttnScore is some similarity function such as the dot product, cosine, or a function parameterized by a simple multilayer perceptron.

The final output of the temporal attention layer is the concatenation of the weighted context vector $c_t = H_t \alpha_t$ and the last window hidden representation $h^R_{t-1}$, followed by a linear projection:

$h^D_t = W[c_t; h^R_{t-1}] + b$
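A minimal sketch of this attention computation with the dot-product choice of AttnScore might read (assuming PyTorch; the shapes of H, W and b are assumptions of this sketch):

```python
import torch

def temporal_attention(H: torch.Tensor, W: torch.Tensor, b: torch.Tensor):
    # H: (batch, q, d) stacks the RNN hidden states h_{t-q} ... h_{t-1}
    h_last = H[:, -1, :]                                   # h_{t-1}: (batch, d)
    scores = torch.bmm(H, h_last.unsqueeze(2)).squeeze(2)  # dot-product AttnScore
    alpha = torch.softmax(scores, dim=1)                   # attention weights alpha_t
    context = torch.bmm(alpha.unsqueeze(1), H).squeeze(1)  # c_t = H_t @ alpha_t
    return torch.cat([context, h_last], dim=1) @ W + b     # W[c_t; h_{t-1}] + b
```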
6 autoregressive component
One major drawback of the neural network model is that the output scale is insensitive to the input scale, due to the nonlinear nature of the convolutional and recurrent components. Unfortunately, in specific real datasets the scale of the input signals constantly changes in a non-periodic manner, which greatly lowers the prediction accuracy of the neural network model. To remedy this deficiency, similar in spirit to the highway network, we decompose the final prediction of LSTNet into a linear part, which mainly focuses on the local scaling issue, plus a nonlinear part containing the recurring patterns. In the LSTNet architecture, the classical autoregressive (AR) model is adopted as the linear component. In the model of the invention, all dimensions share the same set of linear parameters. The AR model is:

$h^L_{t,i} = \sum_{k=0}^{q^{ar}-1} W^{ar}_k\, y_{t-k,i} + b^{ar}$

where $h^L_t$ is the forecast of the AR component and $q^{ar}$ is the size of the input window over the input matrix.
then, by integrating the outputs of the neural network part and the AR component, the final prediction of LSTNet is obtained:
wherein the content of the first and second substances,representing the final prediction of the model at time stamp t.
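For illustration, the AR component and the additive fusion of the linear and nonlinear parts may be sketched as follows (assuming PyTorch; ARComponent and q_ar are illustrative names of this sketch):

```python
import torch
import torch.nn as nn

class ARComponent(nn.Module):
    def __init__(self, q_ar: int = 24):
        super().__init__()
        self.q_ar = q_ar
        self.linear = nn.Linear(q_ar, 1)   # one weight set shared by all dimensions

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window, n_vars); use the last q_ar steps of every variable
        z = x[:, -self.q_ar:, :].permute(0, 2, 1)   # (batch, n_vars, q_ar)
        return self.linear(z).squeeze(2)            # h^L_t: (batch, n_vars)

# final prediction: the sum of the neural part h_D and the linear part, e.g.
# y_hat = h_D + ARComponent(q_ar=24)(x)
```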
7 objective function
The squared error is the default loss function for many prediction tasks; the corresponding optimization objective is expressed as:

$\min_\Theta \sum_{t \in \Omega_{Train}} \| Y_t - \hat{Y}_{t-h} \|_F^2$

where $\Theta$ denotes the parameter set of the model, $\Omega_{Train}$ is the set of timestamps used for training, and $\|\cdot\|_F$ is the Frobenius norm. The traditional linear regression model with the squared loss function is named Linear Ridge, which is equivalent to the vector autoregressive model with ridge regularization. However, experiments show that linear support vector regression dominates the Linear Ridge model on certain datasets. The only difference between linear support vector regression and Linear Ridge is the objective function; the objective function of linear support vector regression is:

$\min_\Theta \frac{1}{2}\|\Theta\|_F^2 + C \sum_{t \in \Omega_{Train}} \sum_{i=0}^{n-1} \xi_{t,i}$

subject to $|\hat{Y}_{t-h,i} - Y_{t,i}| \le \xi_{t,i} + \epsilon$ and $\xi_{t,i} \ge 0$

where $C$ and $\epsilon$ are hyper-parameters. Motivated by the remarkable performance of the linear support vector machine model, its objective function is incorporated into the LSTNet model as a substitute for the squared loss. For convenience, setting $\epsilon = 0$, the above objective function reduces to the following absolute loss (L1) function:

$\min_\Theta \sum_{t \in \Omega_{Train}} \sum_{i=0}^{n-1} |Y_{t,i} - \hat{Y}_{t-h,i}|$

The advantage of the absolute loss function is that it is more robust to anomalies in real time series data. In the experimental part, a validation set is used to decide which objective function to use, the squared loss Eq. 7 or the absolute loss Eq. 9.
8 optimization strategy
The optimization strategy provided by the invention is the same as for traditional time series prediction models. Suppose the input time series is $Y_t = \{y_1, y_2, \ldots, y_t\}$. Define a tunable window size $q$ and reformulate the input at timestamp $t$ as $X_t = \{y_{t-q+1}, y_{t-q+2}, \ldots, y_t\}$. The problem then becomes a regression task over the feature-value pairs $\{X_t, Y_{t+h}\}$, which can be solved by stochastic gradient descent (SGD) or its variants.
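A sketch of this windowing, producing the feature-value pairs {X_t, Y_{t+h}} used for training, might look as follows (assuming PyTorch; make_pairs is an illustrative name):

```python
import torch

def make_pairs(Y: torch.Tensor, q: int, h: int):
    # Y: (T, n) full multivariate series; q = window size, h = horizon
    X, target = [], []
    for t in range(q - 1, Y.shape[0] - h):
        X.append(Y[t - q + 1 : t + 1])   # window X_t = {y_{t-q+1}, ..., y_t}
        target.append(Y[t + h])          # label Y_{t+h}
    return torch.stack(X), torch.stack(target)
```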
Comparative experimental evaluation of the invention:
the invention has performed 9 methods of extensive experiments on the time series prediction task on 4 reference datasets. All data and experimental codes are available from the web.
a. Comparison methods

The methods compared in the evaluation of the invention are as follows:
(1) AR represents an autoregressive model, equivalent to a one-dimensional Vector Autoregressive (VAR) model.
(2) The LRidge model is a vector auto-regression (VAR) model with L2 regularization, and is widely used in multivariate time series prediction.
(3) LSVR is a Vector Autoregressive (VAR) model with a support vector regression objective function.
(4) TRMF is the autoregressive model using temporal regularized matrix factorization.
(5) GP is the Gaussian process model for time series modeling.
(6) VAR-MLP is the model combining the multilayer perceptron (MLP) with the autoregressive model, as proposed in the literature.
(7) RNN-GRU is a recurrent neural network model based on GRU units.
(8) LSTNet-Skip is our proposed LSTNet model with skipped RNN layers.
(9) LSTNet-Attn is our proposed LSTNet model with temporal attention layers.
For the single-output methods above, such as AR, LRidge, LSVR and GP, $n$ models are simply trained independently, i.e. one model for each of the $n$ output variables.
b. Evaluation metrics

The invention uses conventional evaluation metrics, defined as follows:
Root relative squared error (RSE):

$\mathrm{RSE} = \dfrac{\sqrt{\sum_{(i,t) \in \Omega_{Test}} (Y_{it} - \hat{Y}_{it})^2}}{\sqrt{\sum_{(i,t) \in \Omega_{Test}} (Y_{it} - \mathrm{mean}(Y))^2}}$

Empirical correlation coefficient (CORR):

$\mathrm{CORR} = \dfrac{1}{n} \sum_{i=1}^{n} \dfrac{\sum_t (Y_{it} - \mathrm{mean}(Y_i))(\hat{Y}_{it} - \mathrm{mean}(\hat{Y}_i))}{\sqrt{\sum_t (Y_{it} - \mathrm{mean}(Y_i))^2 \sum_t (\hat{Y}_{it} - \mathrm{mean}(\hat{Y}_i))^2}}$

where $Y$ and $\hat{Y}$ denote the true signal and the signal predicted by the system, respectively. RSE is a scaled version of the widely used root mean square error (RMSE), designed to make the evaluation readable regardless of the data scale. For RSE, lower values are better; for CORR, higher values are better.
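Under the standard definitions of these metrics (an assumption here, since the formulas are reconstructed rather than quoted from the original filing), they can be computed as:

```python
import torch

def rse(y_true: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:
    # root relative squared error: RMSE scaled by the spread of the ground truth
    num = torch.sqrt(torch.sum((y_true - y_pred) ** 2))
    den = torch.sqrt(torch.sum((y_true - y_true.mean()) ** 2))
    return num / den

def corr(y_true: torch.Tensor, y_pred: torch.Tensor) -> torch.Tensor:
    # empirical correlation coefficient, averaged over the n variables
    yt = y_true - y_true.mean(dim=0)
    yp = y_pred - y_pred.mean(dim=0)
    num = (yt * yp).sum(dim=0)
    den = torch.sqrt((yt ** 2).sum(dim=0) * (yp ** 2).sum(dim=0))
    return (num / den).mean()
```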
c. Data

The present invention uses four publicly available benchmark datasets.
Traffic: traffic hour data was collected from the California department of transportation for 48 months (2015-2016). These data describe road occupancy (between 0 and 1) measured by different sensors on the highway in the gulf of san francisco.
Solar energy: record of solar power generation in 2006 with 137 photovoltaic plants in alabama sampled every 10 minutes.
Electric power: from 2012 to 2014, electricity usage (kWh) was recorded every 15 minutes for n-321 customers. We convert the data to reflect the power consumption per hour;
exchange rate: a summary of daily rates from 1990 to 20 years in 8 foreign countries australia, uk, canada, switzerland, china, japan, new zealand and singapore.
All data sets were chronologically divided into a training set (60%), a validation set (20%) and a test set (20%). To facilitate future multivariate time series prediction studies, we published all raw datasets and web-preprocessed datasets.
Table 1: data set statistical table
In the data set statistical table of table 1, T is the time series length, D is the number of variables, and L is the sampling rate.
To examine the existence of long-term and/or short-term repetitive patterns in the time series data, the autocorrelation of randomly selected variables from the four datasets is plotted in fig. 2. Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of the delay, defined as follows:
$R(\tau) = \dfrac{E[(X_t - \mu)(X_{t+\tau} - \mu)]}{\sigma^2}$

where $X_t$ is the time series signal, $\mu$ its mean and $\sigma^2$ its variance. In practical applications, the empirical unbiased estimator is used to calculate the autocorrelation.
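A minimal sketch of this computation (in numpy; max_lag is an illustrative parameter of the sketch):

```python
import numpy as np

def autocorrelation(x: np.ndarray, max_lag: int) -> np.ndarray:
    # R(tau) = E[(X_t - mu)(X_{t+tau} - mu)] / sigma^2 for tau = 0 .. max_lag
    x = x - x.mean()
    var = np.sum(x ** 2)
    return np.array([np.sum(x[: len(x) - tau] * x[tau:]) / var
                     for tau in range(max_lag + 1)])
```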
It can be seen from graphs (a), (b) and (c) of fig. 2 that there are repetitive patterns with high autocorrelation in the traffic, solar and electricity datasets, but not in the exchange rate dataset. Furthermore, short-term daily patterns (every 24 hours) and long-term weekly patterns (every 7 days) can be observed in the graphs of the traffic and electricity datasets, which perfectly reflects the expected regularity of road traffic conditions and of electricity consumption. On the other hand, in graph (d) for the exchange rate dataset, hardly any repetitive long-term pattern can be seen, apart from some short-term local continuity. These observations are important for the later analysis of the empirical results of the different methods: methods that can properly model and successfully exploit both short-term and long-term repetitive patterns in the data should perform better when the data contains such patterns (e.g., electricity, traffic and solar energy). Conversely, if the dataset does not contain such patterns (e.g., exchange rate), the advantage of these methods may not yield better performance than that of less powerful methods.
d. Experimental details

A grid search is performed over all tunable hyper-parameters on the held-out validation set for each method and dataset. Specifically, where applicable, all methods share the same grid search range for the window size $q$, varied over $\{2^0, 2^1, \ldots, 2^9\}$. For LRidge and LSVR, the regularization coefficient $\lambda$ is chosen from $\{2^{-10}, 2^{-8}, \ldots, 2^8, 2^{10}\}$. For GP, the RBF kernel bandwidth $\sigma$ and the noise level $\alpha$ are chosen from $\{2^{-10}, 2^{-8}, \ldots, 2^8, 2^{10}\}$. For TRMF, the hidden dimension is chosen from $\{2^2, \ldots, 2^6\}$ and the regularization coefficient $\lambda$ from $\{0.1, 1, 10\}$. For LST-Skip and LST-Attn, the training strategy described above is used. The hidden dimension of the recurrent and convolutional layers is chosen from $\{50, 100, 200\}$, and that of the recurrent-skip layer from $\{20, 50, 100\}$. The skip length $p$ of the recurrent-skip layer is set to 24 for the traffic and electricity datasets and tuned from $2^1$ to $2^6$ for the solar and exchange rate datasets. The regularization coefficient of the AR component is chosen from $\{0.1, 1, 10\}$ to achieve the best performance. Dropout is performed after every layer except input and output, with the rate usually set to 0.1 or 0.2. The Adam algorithm is used to optimize the parameters of the model.
e. Main results

Setting $h \in \{3, 6, 12, 24\}$ means predicting electricity and traffic data from 3 to 24 hours ahead, solar data from 30 to 240 minutes ahead, and exchange rate data from 3 to 24 days ahead, respectively. The larger $h$ is, the harder the prediction task. The best result for each (data, metric) pair is highlighted in bold in the results table. The total count of bold results is 17 for LSTNet-Skip (one version of the proposed LSTNet) and 7 for LSTNet-Attn (the other version of LSTNet); the counts for the remaining methods range between 0 and 3.
Clearly, the two proposed models, LSTNet-Skip and LSTNet-Attn, consistently improve the state of the art on the datasets with periodic patterns, especially at large horizons. Furthermore, when the horizon is 24, LSTNet outperforms the strong neural baseline RNN-GRU by 9.2%, 11.7% and 22.2% in RSE on the solar, traffic and electricity datasets respectively, demonstrating the effectiveness of the framework design for complex repetitive patterns. More importantly, users may consider LSTNet-Attn as an alternative to LSTNet-Skip when the periodic pattern $q$ is not known in advance for the application, given that the former still delivers considerable improvements over the baselines. On the exchange rate dataset, however, the proposed LSTNet is slightly worse than AR and LRidge. The autocorrelation curves of these datasets show that there are repetitive patterns in the solar, traffic and electricity datasets but not in the exchange rates. The present results give empirical evidence of the success of the LSTNet model in modeling long-term and short-term dependency patterns when they do occur in the data; otherwise, LSTNet performs comparably to the better baselines among the representative baselines (AR and LRidge).
Comparing the results of the univariate AR with those of the multivariate baseline methods (LRidge, LSVR and RNN), we find that the multivariate methods are stronger on some datasets, such as solar energy and traffic, but weaker on others, which means the richer input information can cause overfitting in traditional multivariate methods. In contrast, LSTNet shows robust performance under different situations, partly thanks to its autoregressive component.
Ablation study
To demonstrate the effectiveness of our framework design, a careful ablation study was conducted. Specifically, one component at a time is removed from the LSTNet framework. First, the LSTNet variants without the different components are named as follows.
LSTw/oskip: LSTNet model without the Recurrent-skip component and attention component.
LSTw/oCNN: the LSTNet-Skip model without the convolutional component.
LSTw/oAR: the LSTNet-Skip model without the AR component.
For the different baselines, the hidden dimensions of the models are adjusted so that they have numbers of parameters similar to the complete LSTNet model, thereby eliminating performance gains caused merely by model complexity.
In the test results using the RSE and CORR metrics, the best result on each dataset is obtained by either LST-Skip or LST-Attn. Removing the AR component from the full model (in LSTw/oAR) caused the most significant performance drop on most of the datasets, showing the crucial role of the AR component in general. Removing the skip or CNN components (in LSTw/oskip or LSTw/oCNN) caused a large performance drop on some datasets, but not all. All components of LSTNet together lead to robust performance across all datasets. The conclusion is that the architecture design of the present invention is most robust across all experiment settings, especially at large horizons.
The reason the AR component plays such an important role is that AR is generally robust to scale-changing data. To empirically verify this intuition, one dimension (one variable) of the time series signal in the electricity consumption dataset was plotted over the duration of hours 1 to 5000; true consumption surges around hour 1000, and LSTNet-Skip successfully captures this sudden change while LSTw/oAR fails to react properly. The ablation study clearly demonstrates the efficiency of the architecture design: all components contribute to the excellent and robust performance of LSTNet.
The invention provides a novel deep learning framework (LSTNet) for the multivariate time series prediction problem. By combining the strengths of convolutional and recurrent neural networks with an autoregressive component, the proposed method significantly improves on the state-of-the-art results of time series prediction on multiple benchmark datasets. Through in-depth analysis and empirical study, the efficiency of the LSTNet model architecture is demonstrated: it successfully captures both short-term and long-term repetitive patterns in data, and combines linear and nonlinear models for robust prediction.

Claims (2)

1. A long-term and short-term traffic flow prediction model construction method based on deep learning is characterized in that: comprises the following steps:
1. problem formulation:
the multivariate time series prediction task is formulated as follows:
a series of fully observed time series signals is given: $Y = \{y_1, y_2, \ldots, y_T\}$, where $y_t \in \mathbb{R}^n$ and $n$ is the variable dimension; to predict $y_{T+h}$, where $h$ is the desired horizon ahead of the current timestamp, it is assumed that $\{y_1, y_2, \ldots, y_T\}$ is available; likewise, to predict the value of the next timestamp $y_{T+h+1}$, it is assumed that $\{y_1, y_2, \ldots, y_T, y_{T+1}\}$ is available; the input matrix at timestamp $T$ is formulated as $X_T = \{y_1, y_2, \ldots, y_T\} \in \mathbb{R}^{n \times T}$; the horizon of the prediction task is selected according to the requirements of the environment setting;
2. convolution component:
the LSTNet framework is a deep learning framework designed specifically for the multivariate time series prediction task with a mixture of long-term and short-term patterns; the first layer of the LSTNet architecture is a convolutional network without pooling, which extracts short-term patterns in the time dimension as well as local dependencies between variables; the convolutional layer consists of multiple filters of width $\omega$ and height $n$, the height being set equal to the number of variables; the $k$-th filter sweeps the input matrix $X$ and generates: $h_k = \mathrm{RELU}(W_k * X + b_k)$, where $*$ denotes the convolution operation, the output $h_k$ is a vector, and the RELU function is $\mathrm{RELU}(x) = \max(0, x)$; each vector $h_k$ is made of length $T$ by zero-padding on the left of the input matrix $X$; the size of the output matrix of the convolutional layer is $d_c \times T$, where $d_c$ denotes the number of filters;
3. Recurrent component:
the output of the convolutional layer is simultaneously fed into the Recurrent component and the Recurrent-skip component; the Recurrent component is a recurrent layer with gated recurrent units (GRU) that uses the RELU function as the hidden update activation function; the hidden state of the recurrent unit at time $t$ is computed as:

$r_t = \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r)$

$u_t = \sigma(x_t W_{xu} + h_{t-1} W_{hu} + b_u)$

$c_t = \mathrm{RELU}(x_t W_{xc} + r_t \odot (h_{t-1} W_{hc}) + b_c)$

$h_t = (1 - u_t) \odot h_{t-1} + u_t \odot c_t$

where $\odot$ is the element-wise product, $\sigma$ is the sigmoid function, and $x_t$ is the input of this layer at time $t$; the output of this layer is the hidden state at each timestamp;
4. Recurrent-skip component:
a recurrent structure with temporal skip connections is developed to extend the temporal span of the information flow and thereby ease the optimization process; specifically, skip connections are added between the current hidden cell and the hidden cells in the same phase of adjacent periods; the update process can be expressed as:

$r_t = \sigma(x_t W_{xr} + h_{t-p} W_{hr} + b_r)$

$u_t = \sigma(x_t W_{xu} + h_{t-p} W_{hu} + b_u)$

$c_t = \mathrm{RELU}(x_t W_{xc} + r_t \odot (h_{t-p} W_{hc}) + b_c)$

$h_t = (1 - u_t) \odot h_{t-p} + u_t \odot c_t$
the input of this layer is the output of the convolutional layer; $p$ is the number of hidden cells skipped through; the value of $p$ can be determined easily for datasets with a clear periodic pattern and has to be tuned otherwise; a well-tuned $p$ can significantly improve model performance, and LSTNet can moreover be easily extended to contain variants of the skip length $p$;
a dense layer is used to combine the outputs of the Recurrent and Recurrent-skip components; the inputs of the dense layer include the hidden state of the Recurrent component at timestamp $t$, denoted $h^R_t$, and the $p$ hidden states of the Recurrent-skip component from timestamp $t-p+1$ to $t$, denoted $h^S_{t-p+1}, \ldots, h^S_t$; the output of the dense layer is computed as:

$h^D_t = W^R h^R_t + \sum_{i=0}^{p-1} W^S_i h^S_{t-i} + b$

where $h^D_t$ denotes the prediction of the (upper) neural network part at timestamp $t$;
5. temporal attention layer:
an attention mechanism is added that learns a weighted combination of the hidden representations at each window position of the input matrix; specifically, the attention weights $\alpha_t \in \mathbb{R}^q$ at the current timestamp $t$ are computed as:

$\alpha_t = \mathrm{AttnScore}(H^R_t, h^R_{t-1})$

where $H^R_t = [h^R_{t-q}, \ldots, h^R_{t-1}]$ is a matrix stacking the hidden representations of the RNN column-wise, and AttnScore is some similarity function such as the dot product, cosine, or a function parameterized by a simple multilayer perceptron;
the final output of the temporal attention layer is the concatenation of the weighted context vector $c_t = H_t \alpha_t$ and the last window hidden representation $h^R_{t-1}$, followed by a linear projection:

$h^D_t = W[c_t; h^R_{t-1}] + b$
6. autoregressive component:
the final prediction of LSTNet is decomposed into a linear part, which mainly focuses on the local scaling issue, plus a nonlinear part containing the recurring patterns; in the LSTNet architecture, the classical autoregressive (AR) model is adopted as the linear component; all dimensions share the same set of linear parameters, and the AR model is:

$h^L_{t,i} = \sum_{k=0}^{q^{ar}-1} W^{ar}_k\, y_{t-k,i} + b^{ar}$

where $h^L_t$ is the forecast of the AR component and $q^{ar}$ is the size of the input window over the input matrix; then, by integrating the outputs of the neural network part and the AR component, the final prediction of LSTNet is obtained:

$\hat{Y}_t = h^D_t + h^L_t$

where $\hat{Y}_t$ denotes the final prediction of the model at timestamp $t$;
7. establishing an objective function:
the squared error is the default loss function for many prediction tasks, and the corresponding optimization objective is expressed as:

$\min_\Theta \sum_{t \in \Omega_{Train}} \| Y_t - \hat{Y}_{t-h} \|_F^2$

where $\Theta$ denotes the parameter set of the model, $\Omega_{Train}$ is the set of timestamps used for training, and $\|\cdot\|_F$ is the Frobenius norm;
the objective function of linear support vector regression is:

$\min_\Theta \frac{1}{2}\|\Theta\|_F^2 + C \sum_{t \in \Omega_{Train}} \sum_{i=0}^{n-1} \xi_{t,i}$

subject to $|\hat{Y}_{t-h,i} - Y_{t,i}| \le \xi_{t,i} + \epsilon$ and $\xi_{t,i} \ge 0$;

where $C$ and $\epsilon$ are hyper-parameters; motivated by the remarkable performance of the linear support vector machine model, its objective function is incorporated into the LSTNet model as a substitute for the squared loss; setting $\epsilon = 0$, the above objective function reduces to the following absolute loss (L1) function:

$\min_\Theta \sum_{t \in \Omega_{Train}} \sum_{i=0}^{n-1} |Y_{t,i} - \hat{Y}_{t-h,i}|$

the advantage of the absolute loss function is that it is more robust to anomalies in real time series data; in the experimental part, a validation set is used to decide which objective function to use, the squared loss Eq. 7 or the absolute loss Eq. 9.
2. The deep learning-based long-term and short-term traffic flow prediction model construction method according to claim 1, characterized in that the optimization strategy is as follows: suppose the input time series is $Y_t = \{y_1, y_2, \ldots, y_t\}$; define a tunable window size $q$ and reformulate the input at timestamp $t$ as $X_t = \{y_{t-q+1}, y_{t-q+2}, \ldots, y_t\}$; the problem then becomes a regression task over the feature-value pairs $\{X_t, Y_{t+h}\}$, which can be solved by stochastic gradient descent (SGD) or its variants.
CN201910855977.4A 2019-09-11 2019-09-11 Long-term and short-term traffic flow prediction model construction method based on deep learning Pending CN110610232A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910855977.4A CN110610232A (en) 2019-09-11 2019-09-11 Long-term and short-term traffic flow prediction model construction method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910855977.4A CN110610232A (en) 2019-09-11 2019-09-11 Long-term and short-term traffic flow prediction model construction method based on deep learning

Publications (1)

Publication Number Publication Date
CN110610232A true CN110610232A (en) 2019-12-24

Family

ID=68892533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910855977.4A Pending CN110610232A (en) 2019-09-11 2019-09-11 Long-term and short-term traffic flow prediction model construction method based on deep learning

Country Status (1)

Country Link
CN (1) CN110610232A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179596A (en) * 2020-01-06 2020-05-19 南京邮电大学 Traffic flow prediction method based on group normalization and gridding cooperation
US11681914B2 (en) 2020-05-08 2023-06-20 International Business Machines Corporation Determining multivariate time series data dependencies
CN111696345A (en) * 2020-05-08 2020-09-22 东南大学 Intelligent coupled large-scale data flow width learning rapid prediction algorithm based on network community detection and GCN
CN112465054A (en) * 2020-12-07 2021-03-09 深圳市检验检疫科学研究院 Multivariate time series data classification method based on FCN
CN112465054B (en) * 2020-12-07 2023-07-11 深圳市检验检疫科学研究院 FCN-based multivariate time series data classification method
CN112818033A (en) * 2021-01-28 2021-05-18 河北工业大学 Bag breaking intelligent detection method of bag type dust collector based on neural network
CN113053113A (en) * 2021-03-11 2021-06-29 湖南交通职业技术学院 PSO-Welsch-Ridge-based anomaly detection method and device
CN113052214A (en) * 2021-03-14 2021-06-29 北京工业大学 Heat exchange station ultra-short term heat load prediction method based on long and short term time series network
CN113052214B (en) * 2021-03-14 2024-05-28 北京工业大学 Heat exchange station ultra-short-term heat load prediction method based on long-short-term time sequence network
CN113283667A (en) * 2021-06-15 2021-08-20 广东工业大学 Marine industry development trend analysis method
CN113569479A (en) * 2021-07-27 2021-10-29 天津大学 Long-term multi-step control method, device and storage medium for rock fracture development of stone cave temple
CN113569479B (en) * 2021-07-27 2023-11-10 天津大学 Long-term multi-step control method, device and storage medium for rock mass crack development of stone cave temple
CN113553772A (en) * 2021-08-09 2021-10-26 贵州电网有限责任公司 Icing tension prediction method based on deep hybrid modeling
CN114374653A (en) * 2021-12-28 2022-04-19 同济大学 Variable bit rate service scheduling method based on flow prediction
CN114374653B (en) * 2021-12-28 2024-02-27 同济大学 Variable bit rate service scheduling method based on flow prediction
CN115146842A (en) * 2022-06-24 2022-10-04 沈阳建筑大学 Multivariate time series trend prediction method and system based on deep learning

Similar Documents

Publication Publication Date Title
CN110610232A (en) Long-term and short-term traffic flow prediction model construction method based on deep learning
Liang et al. A novel wind speed prediction strategy based on Bi-LSTM, MOOFADA and transfer learning for centralized control centers
Wang et al. Multiple convolutional neural networks for multivariate time series prediction
Zhou et al. A review on global solar radiation prediction with machine learning models in a comprehensive perspective
Lai et al. Modeling long-and short-term temporal patterns with deep neural networks
Xie et al. SNAS: stochastic neural architecture search
Ma et al. A hybrid attention-based deep learning approach for wind power prediction
Liang Bayesian neural networks for nonlinear time series forecasting
Goh et al. A multimodal approach to chaotic renewable energy prediction using meteorological and historical information
CN114004338A (en) Mixed time period mode multivariable time sequence prediction method based on neural network
CN114065996A (en) Traffic flow prediction method based on variational self-coding learning
Wang A combined model for short-term wind speed forecasting based on empirical mode decomposition, feature selection, support vector regression and cross-validated lasso
Dong et al. Accurate combination forecasting of wave energy based on multiobjective optimization and fuzzy information granulation
Sriramulu et al. Adaptive dependency learning graph neural networks
CN114970946A (en) PM2.5 pollution concentration long-term space prediction method based on deep learning model and empirical mode decomposition coupling
Surakhi et al. On the ensemble of recurrent neural network for air pollution forecasting: Issues and challenges
Al-Ja’afreh et al. An enhanced CNN-LSTM based multi-stage framework for PV and load short-term forecasting: DSO scenarios
Cui et al. A VMD-MSMA-LSTM-ARIMA model for precipitation prediction
Zhong et al. Online prediction of noisy time series: Dynamic adaptive sparse kernel recursive least squares from sparse and adaptive tracking perspective
Utku et al. Deep learning based prediction model for the next purchase
Shan et al. A deep‐learning based solar irradiance forecast using missing data
Qu et al. Short-term wind farm cluster power prediction based on dual feature extraction and quadratic decomposition aggregation
CN116434531A (en) Short-time traffic flow prediction method based on Conv1D-LSTM model
CN115564466A (en) Double-layer day-ahead electricity price prediction method based on calibration window integration and coupled market characteristics
Suhartono et al. Hybrid Machine Learning for Forecasting and Monitoring Air Pollution in Surabaya

Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20191224)