CN112016590A - Prediction method combining sequence local feature extraction and depth convolution prediction model - Google Patents


Info

Publication number
CN112016590A
CN112016590A (application CN202010727541.XA)
Authority
CN
China
Prior art keywords
sequence
prediction
convolution
model
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010727541.XA
Other languages
Chinese (zh)
Inventor
金苍宏
陈天翼
董腾然
叶惠波
李卓蓉
吴明晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University City College ZUCC
Original Assignee
Zhejiang University City College ZUCC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University City College ZUCC filed Critical Zhejiang University City College ZUCC
Priority to CN202010727541.XA priority Critical patent/CN112016590A/en
Publication of CN112016590A publication Critical patent/CN112016590A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a prediction method that combines sequence local feature extraction with a deep convolution prediction model. Based on analysis and extraction of the periodic features of a sequence, periodicity and similarity weights are computed for every point in the sequence and fused into each layer of a dilated (atrous) convolution model. The idea is to assign different weight values to sequence points that are otherwise indistinguishable, so as to differentiate the importance of different positions for the target prediction point. The invention proposes a sequence-point importance algorithm and combines it with dilated convolution; experiments on a traffic-flow time-series dataset show improvements in indices such as MAPE, MAE and RMSE compared with plain deep learning methods.

Description

Prediction method combining sequence local feature extraction and depth convolution prediction model
Technical Field
The invention belongs to the field of data analysis, and particularly relates to a prediction method combining sequence local feature extraction and a depth convolution prediction model.
Background Art
Traffic speed prediction is one of the most important tasks of intelligent transportation systems. Although traffic speed prediction has a history of decades, conventional prediction methods based on statistical models or feature regression models (e.g., Auto-Regressive Integrated Moving Average, ARIMA, and Support Vector Regression, SVR) cannot predict traffic accurately, owing to limited computational power or data volume and to the high-dimensional, non-linear nature of traffic. In recent years, models based on deep learning have enjoyed great success in fields such as image recognition and natural language processing, so prediction with deep learning models is a new trend. A Long Short-Term Memory neural network (LSTM) has been applied to traffic speed prediction, and an Evolutionary Fuzzy Neural Network (EFNN) based on a clustering method has been proposed to predict travel speed several steps ahead. A Traffic Graph Convolutional Long Short-Term Memory neural network (TGC-LSTM) builds a traffic graph convolution from the physical network topology and combines it with an LSTM to improve prediction performance.
Although various deep learning solutions exist for the traffic prediction problem, existing methods either predict poorly on periodic data or must incorporate information beyond the sequence itself, such as relationships between sequence points or correlated attributes external to the sequence. Although these methods improve the results, they require substantial additional information and therefore lack generality.
Disclosure of Invention
The invention aims to analyze massive time series data and provides a prediction method combining sequence local feature extraction and a deep convolution prediction model. The invention adopts the following technical scheme:
the prediction method combining the sequence local feature extraction and the depth convolution prediction model is characterized by comprising the following steps of:
(1) sampling of sequence slices through a sliding window
According to the size of the sliding window, the tail end of the sequence is first sampled to obtain a target slice the same size as the sliding window; the window then slides point by point from the head end to the tail end of the sequence, yielding as many slice samples as the sequence length;
(2) similarity index calculation
For each given index, respectively calculating the similarity between each slice sample and the target slice to obtain a similarity vector with the same length as the original time sequence under the index;
(3) vector fusion and channel conversion
The vectors computed from all the indices and the original sequence are stacked along the channel direction to obtain an adaptive sequence; because convolution changes the length and the number of channels of the sequence, the channel-count conversion must be completed by causal convolution before and after the fusion;
(4) neural network training and prediction
A WaveNet or TCN deep convolution model is trained on a time-series dataset; after the locality-sensitive features of the time-series data are extracted, the time series contains multiple channels, cross-channel interaction and information integration are realized through causal convolution, and the exponentially increasing dilation of the convolution kernels enlarges the model's receptive field, capturing long-range temporal dependencies better;
in the prediction process, time-series data are used as input; the trained model automatically extracts the locality-sensitive information of the sequence and fuses it into the original sequence so that it carries local prior periodic features, thereby guiding the causal inference of the deep convolution and outputting predicted values for multiple future points of the time series.
For step (1), each slice sample represents the feature information of a point through its adjacent points, so the point itself is excluded from the slice; 0-value padding is required when the sample is too short to fill the sliding window.
For the step (2), the index adopts a similarity index based on a numerical value, and the number of the similarity vectors can be arbitrarily expanded as long as a corresponding calculation rule is given. The numerical-based similarity index is one, two or all of a value variance, a value mean deviation and a dot product ratio.
The invention discloses a prediction method combining sequence local feature extraction and a deep convolution prediction model. The idea of the invention is to take the similarity between the context of each point and the context of the predicted target point as the importance of that point, and then to predict with a deep learning model. The method analyzes and extracts the periodic features of the sequence, computes periodicity and similarity weights for every point, and fuses them into each layer of a dilated convolution model, assigning different weight values to sequence points that are otherwise indistinguishable so as to differentiate the importance of different positions for the target prediction point. The model prediction method of the invention needs no external data; instead it fully exploits the periodic properties of the sequence itself and thus applies well across scenarios. The invention proposes a sequence-point importance algorithm and combines it with dilated convolution; in experiments on a traffic-flow time-series dataset, indices such as MAPE, MAE and RMSE improve compared with deep learning methods that do not fuse locality-sensitive features.
Drawings
FIG. 1 is a schematic diagram of a sequential sampling process;
FIG. 2 is a schematic diagram of a sample vector and a target vector;
FIG. 3 is a schematic view of locality sensitive feature fusion;
FIG. 4 is a schematic diagram of a locality sensitive feature fusion module;
fig. 5 is a diagram of a locality sensitive WaveNet network architecture.
Detailed Description
The present invention will be described in more detail with reference to the accompanying drawings and examples, but the following examples are only for better understanding of the present invention and are not to be construed as limiting the scope of the present invention.
Reference is made to the accompanying drawings. The prediction method combining the sequence local feature extraction and the depth convolution prediction model comprises the following steps:
1. sampling sequence slices through a sliding window
In slicing a time series, the size W of the sliding window must first be determined. For a time series S = {x_1, x_2, x_3, ..., x_n} of length n, the target slice S_t is sampled; meanwhile, as the sliding window slides, n time slices are obtained in total, and the slice of each point does not contain the point itself but only its adjacent points. If a slice is shorter than W, it is padded with 0 values.
Taking sliding window size W = 4 as an example, the sampling process is shown in fig. 1. For a time series S, W points at the tail of the sequence are first sampled to obtain the target slice S_t. The window then slides point by point from the head of the sequence, and each point's sample contains only its neighbors. Samples shorter than W are padded with 0 values: as shown for x_1 in the figure, 2 zero values must be filled, yielding slice S_1 after sampling; similarly, x_2 needs only 1 zero value, which is omitted in the figure. Samples whose length already satisfies W are taken directly without padding, as shown for x_3, yielding slice S_3. The remaining points are sampled in turn by the same rule.
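The sampling rule above can be sketched in Python. This is a minimal illustration only: the function name and array layout are my own, and the centered neighbour window that excludes the point itself is inferred from the padding counts in the fig. 1 example.

```python
import numpy as np

def sample_slices(seq, W):
    """Slice a 1-D series with a width-W sliding window.

    Each point's slice holds only its neighbours (W // 2 on each side,
    the point itself excluded), matching the fig. 1 example where x_1
    needs 2 zero fills and x_3 needs none for W = 4.  Short slices at
    the head or tail are zero-padded.  Returns (slices, target), where
    `target` is the last W points of the series (the target slice S_t).
    """
    seq = np.asarray(seq, dtype=float)
    n, h = len(seq), W // 2
    padded = np.concatenate([np.zeros(h), seq, np.zeros(h)])
    # point i sits at padded[i + h]; take h values on each side of it
    slices = np.stack([
        np.concatenate([padded[i:i + h], padded[i + h + 1:i + 2 * h + 1]])
        for i in range(n)
    ])
    return slices, seq[-W:]
```

With S = {x_1, ..., x_6} and W = 4 this reproduces the slices described above: S_1 = [0, 0, x_2, x_3] and S_3 = [x_1, x_2, x_4, x_5].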
2. Similarity index calculation
The numerical similarity between each slice sample and the target slice is computed using three indices:
(a) Value variance (VSD)

VSD(A, B) = (1/n) · Σ_{i=1..n} (A_i − B_i)^2

The smaller the value, the more similar the sequences.
(b) Value Mean Deviation (VMD)

VMD(A, B) = (1/n) · Σ_{i=1..n} |A_i − B_i|

The smaller the value, the more similar the sequences.
(c) Dot Product Ratio (DPR)

DPR(A, B) = (Σ_{i=1..n} A_i · B_i) / (sqrt(Σ_{i=1..n} A_i^2) · sqrt(Σ_{i=1..n} B_i^2))

The larger the value, the more similar the sequences; the value ranges over [−1, 1].
In the above 3 formulas, A and B denote two equal-length time series, n denotes the sequence length, and A_i and B_i denote the i-th values of the respective sequences.
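The three indices can be written out directly. The sketch below uses the standard readings of the index names; since the original formulas appear only as images, the exact forms are assumptions.

```python
import numpy as np

def vsd(a, b):
    """Value variance: mean squared difference; smaller = more similar."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.mean((a - b) ** 2))

def vmd(a, b):
    """Value mean deviation: mean absolute difference; smaller = more similar."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.mean(np.abs(a - b)))

def dpr(a, b):
    """Dot product ratio: normalized dot product (cosine similarity);
    larger = more similar, range [-1, 1]."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```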
After sampling the sequence, n sample slices S_i, i ∈ [1, n], each of length W, are obtained. These slices form a sample vector V_s of dimension n × W, which can be written as V_s = [S_1, S_2, ..., S_n]^T. Meanwhile, the target slice S_t is repeated n times to form the target vector V_t = [S_t, S_t, ..., S_t]^T, as shown in fig. 2. Vectorized computation can therefore be carried out to calculate the three similarity indices, yielding three similarity sequences of the same length n as the original time series.
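The vectorized computation described above can be sketched as follows (illustrative names only; NumPy broadcasting of the target slice against the rows of V_s plays the role of tiling it into V_t):

```python
import numpy as np

def similarity_sequences(slices, target):
    """Compute the three similarity sequences in one vectorised pass.

    `slices` is the n x W sample matrix V_s; `target` (length W) is
    broadcast against every row, which stands in for repeating it into
    the target vector V_t.  Returns (s_vsd, s_vmd, s_dpr), each of
    length n, matching the three indices defined above.
    """
    Vs = np.asarray(slices, dtype=float)          # shape (n, W)
    t = np.asarray(target, dtype=float)           # shape (W,)
    diff = Vs - t                                 # broadcast: one row per slice
    s_vsd = np.mean(diff ** 2, axis=1)
    s_vmd = np.mean(np.abs(diff), axis=1)
    norms = np.linalg.norm(Vs, axis=1) * np.linalg.norm(t)
    s_dpr = (Vs @ t) / np.where(norms == 0, 1.0, norms)  # guard all-zero slices
    return s_vsd, s_vmd, s_dpr
```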
3. Time-series expansion and feature fusion
The extended time series S′ obtained by concatenating all the sequences along the channel direction is shown in fig. 3, where S is the original time series, S_VSD is the value-variance sequence computed from V_s and V_t with index 2(a), S_VMD is the value mean deviation sequence computed with index 2(b), and S_DPR is the dot-product-ratio sequence computed with index 2(c); every sequence obtained from an index has the same length as the original sequence S. The 4 sequences are then stacked along the channel direction to obtain the extended time series S′.
In this way, three channels are expanded outward on the basis of the time series, so that it carries prior information about the similarity-based locality-sensitive periodicity.
At this point the prior information is distributed across different channels of the time series and cannot by itself guide the model to capture the local periodic features of the series, so it must be fused with the original time series; meanwhile, a single convolution changes the sequence length and the number of channels. Therefore, to fuse the locality-sensitive information with the original sequence while keeping the temporal dimension consistent, the feature channels must be compressed by causal convolution before the similarity is computed, and likewise the adaptive sequence must be channel-expanded by causal convolution to complete the deep embedding into the deep convolution model and realize end-to-end learning.
The structure of the locality-sensitive feature fusion module that accomplishes this process is shown in fig. 4. First, the time-series channels are compressed by causal convolution, reducing the channel dimension to one. Then, the different sensitivity sequences are obtained through the similarity computations of step 2, completing the sensitivity decomposition of the time series. The resulting sequences are stacked with the original sequence along the channel direction, and the locality-sensitive information distributed across the different channels is fused with the original sequence by causal convolution to obtain the fused sequence. Batch normalization accelerates convergence of the deep model and improves stability; ReLU is a nonlinear activation function defined as f(x) = max(0, x).
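The channel compression and expansion steps both rely on a causal convolution that preserves the sequence length. A didactic, loop-based sketch of such a convolution (not the module's actual implementation):

```python
import numpy as np

def causal_conv1d(x, kernels, dilation=1):
    """Length-preserving causal 1-D convolution.

    x: (C_in, T) multi-channel series; kernels: (C_out, C_in, K).
    The input is left-padded by (K - 1) * dilation so output[t] depends
    only on inputs at times <= t, and T is unchanged -- the property
    the fusion module relies on when converting channel counts.
    """
    C_in, T = x.shape
    C_out, _, K = kernels.shape
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros((C_in, pad)), x], axis=1)
    out = np.zeros((C_out, T))
    for o in range(C_out):          # each output channel
        for t in range(T):          # each time step
            for k in range(K):      # each kernel tap (last tap hits x[t])
                out[o, t] += np.sum(kernels[o, :, k] * xp[:, t + k * dilation])
    return out
```

Left-padding by (K − 1) · dilation keeps the output causal while leaving T unchanged, which is what lets the module change the channel count without disturbing the temporal dimension.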
A time-series adaptive module is embedded in each layer of WaveNet and can be trained before the dilated convolution layer.
4. Neural network training and prediction
The WaveNet can predict the result of the t-th point according to the first t-1 points of a sequence, and therefore the WaveNet can be widely applied to numerical prediction of time series data.
p(x) = Π_{t=1..T} p(x_t | x_1, ..., x_{t−1})
As shown in fig. 5, the sequence channels are first converted by a convolutional neural network with 1 × 1 convolution kernels, raising the dimension to the number of convolution-kernel channels. In each layer, the sequence-adaptive module samples the sequence, computes the similarities and expands the channels, and the dilated convolution, with its larger receptive field, acts on the sequence to generate new features. These are activated by the nonlinear functions tanh and sigmoid respectively and multiplied element-wise; a causal convolution with 1 × 1 kernels then realizes cross-channel interaction and information integration, and the outputs are linearly superposed through skip connections. Meanwhile, the causally convolved features are joined to the original features through a residual connection and, after linear superposition, enter the next layer.
Tanh in fig. 5 is a nonlinear activation function, defined as tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)).
σ is the sigmoid nonlinear activation function, defined as σ(x) = 1 / (1 + e^(−x)).
× denotes element-wise multiplication of the values of two equal-length time series; + denotes element-wise addition of the values of two equal-length time series.
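The gated combination of the two activation branches described above amounts to the following (a sketch that takes the filter and gate convolution outputs as given):

```python
import numpy as np

def gated_activation(filt, gate):
    """WaveNet-style element-wise gate: z = tanh(filter) x sigmoid(gate)."""
    filt = np.asarray(filt, dtype=float)
    gate = np.asarray(gate, dtype=float)
    return np.tanh(filt) * (1.0 / (1.0 + np.exp(-gate)))
```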
After the k-th layer is computed, ReLU activation and causal convolution are applied twice to the linearly superposed output, and the features are mapped through a fully connected layer into the numerical prediction of the next target point.
(The algorithm pseudocode referenced below appears as an image in the original publication and is not reproduced here.)
Algorithm analysis: the core of the algorithm is point-by-point sampling of the prior information carried by the time series itself; because the points adjacent to a point can describe that point's own characteristics, data near the point, excluding the point itself, is sampled at a fixed length. Line 2 of the code first compresses the channel dimension to one by causal convolution, after which sampling begins. The head and tail of the sequence may be too short during sampling, so line 3 pads the sequence with 0 values in advance to simplify the subsequent sampling. Both the sampled sequence slices and the target slice have fixed length, so they can be stored as vectors, and vectorized matrix operations accelerate the similarity computation, reducing the time cost of end-to-end model training. Lines 9 through 11 avoid explicit for loops through vectorization. Line 12 stacks the locality-sensitive sequence features computed by the different indices, and the causal convolution at line 13 fuses the prior knowledge held in the different channels, yielding the sequence with fused locality-sensitive features.
After this unsupervised adaptive processing of the sequence, a spatio-temporal neural network framework based on convolutional neural networks is provided that numerically predicts future points of the sequence from the prior knowledge the sequence itself carries, and the method can be widely applied to time-series prediction problems.
The technical effect of the invention is illustrated by comparing the evaluation of the depth model of the local sensitive feature of the fusion sequence with other various methods.
Experiments are run on the public dataset PEMS (speed time-series data collected by fixed detectors in California). The evaluation indices are MAPE (Mean Absolute Percentage Error), MAE (Mean Absolute Error) and RMSE (Root Mean Square Error).
LSTM: the Long Short-Term Memory network (LSTM) is a time-cycle neural network, and can solve the Long-Term dependence problem of the general RNN (cycle neural network);
TCN: the time convolution network (Temporal Convolutional Networks) is a network structure capable of processing time sequence data, and can infer new information at a plurality of future time points according to the sequence of the occurrence of each point in a known sequence;
WaveNet: the core of WaveNet is an expanded causal convolutional layer (scaled causal constants), which allows neural networks to handle time order correctly and handle long term dependencies without causing model complexity explosions, mitigating long-term challenges of learning in large time steps;
Local-Sensitive TCN: a temporal convolutional network fused with the sequence's prior periodic features according to the method of the invention. Besides the order in which the points of the known sequence occur, the similarity features of different points are jointly considered, so that points can carry different importance during prediction;
Local-Sensitive WaveNet: a WaveNet fused with the sequence's prior periodic features according to the method of the invention. While handling the temporal order and long-term dependence of the sequence, the similarity features of different points are jointly considered, so that points can carry different importance during prediction.
TABLE 1 time series prediction method comparison
(Table 1 appears as an image in the original publication and is not reproduced here.)
The experimental results are shown in Table 1, with the best results marked in bold and the second-best results underlined. Among the native time-series prediction models, TCN is more effective than LSTM, while WaveNet lags further behind on the shorter time-series data.
Compared with the native deep learning time-series prediction models, the neural network learns the sequence features better once the prior periodic features of the time series are added in the locality-sensitive deep convolution models, and all indices improve. The effect on WaveNet is pronounced: it markedly raises the model's prediction accuracy, and through this feature learning the Local-Sensitive WaveNet can slightly surpass traditional models such as TCN and LSTM; the Local-Sensitive TCN with the added prior periodic features improves relatively less, yet performs better than the Local-Sensitive WaveNet.

Claims (4)

1. A prediction method combining sequence local feature extraction and a depth convolution prediction model is characterized by comprising the following steps:
(1) sampling of sequence slices through a sliding window
According to the size of the sliding window, the tail end of the sequence is first sampled to obtain a target slice the same size as the sliding window; the window then slides point by point from the head end to the tail end of the sequence, yielding as many slice samples as the sequence length;
(2) similarity index calculation
For each given index, respectively calculating the similarity between each slice sample and the target slice to obtain a similarity vector with the same length as the original time sequence under the index;
(3) vector fusion and channel conversion
Superposing vectors obtained by calculating all indexes and the original sequence in the channel direction to obtain a self-adaptive sequence; the conversion of the channel number is completed by causal convolution before and after fusion;
(4) neural network training and prediction
A WaveNet or TCN deep convolution model is trained on a time-series dataset; after the locality-sensitive features of the time-series data are extracted, the time series contains multiple channels, cross-channel interaction and information integration are realized through causal convolution, and the exponentially increasing dilation of the convolution kernels enlarges the model's receptive field, capturing long-range temporal dependencies better;
in the prediction process, time-series data are used as input; the trained model automatically extracts the locality-sensitive information of the sequence and fuses it into the original sequence so that it carries local prior periodic features, thereby guiding the causal inference of the deep convolution and outputting predicted values for multiple future points of the time series.
2. The prediction method of claim 1, wherein each slice sample represents the feature information of a point through its adjacent points, so the point itself is excluded from the slice, and 0-value padding is required when the sample is too short to fill the sliding window.
3. The prediction method combining sequence local feature extraction and deep convolution prediction model as claimed in claim 1, wherein for step (2), the index uses a similarity index based on numerical value.
4. The prediction method of claim 3, wherein the numerical similarity indicator is one, two or all of a value variance, a mean deviation and a dot product ratio.
CN202010727541.XA 2020-07-24 2020-07-24 Prediction method combining sequence local feature extraction and depth convolution prediction model Pending CN112016590A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010727541.XA CN112016590A (en) 2020-07-24 2020-07-24 Prediction method combining sequence local feature extraction and depth convolution prediction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010727541.XA CN112016590A (en) 2020-07-24 2020-07-24 Prediction method combining sequence local feature extraction and depth convolution prediction model

Publications (1)

Publication Number Publication Date
CN112016590A true CN112016590A (en) 2020-12-01

Family

ID=73500022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010727541.XA Pending CN112016590A (en) 2020-07-24 2020-07-24 Prediction method combining sequence local feature extraction and depth convolution prediction model

Country Status (1)

Country Link
CN (1) CN112016590A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065578A (en) * 2021-03-10 2021-07-02 合肥市正茂科技有限公司 Image visual semantic segmentation method based on double-path region attention coding and decoding
CN113065578B (en) * 2021-03-10 2022-09-23 合肥市正茂科技有限公司 Image visual semantic segmentation method based on double-path region attention coding and decoding
CN113159280A (en) * 2021-03-23 2021-07-23 出门问问信息科技有限公司 Conversion method and device for six-axis sensing signals
CN113837858A (en) * 2021-08-19 2021-12-24 同盾科技有限公司 Method, system, electronic device and storage medium for predicting credit risk of user

Similar Documents

Publication Publication Date Title
CN110059772B (en) Remote sensing image semantic segmentation method based on multi-scale decoding network
CN110598779B (en) Abstract description generation method and device, computer equipment and storage medium
CN109214452B (en) HRRP target identification method based on attention depth bidirectional cyclic neural network
CN106897268B (en) Text semantic understanding method, device and system
CN112016590A (en) Prediction method combining sequence local feature extraction and depth convolution prediction model
CN112487807B (en) Text relation extraction method based on expansion gate convolutional neural network
CN111581510A (en) Shared content processing method and device, computer equipment and storage medium
CN112183742B (en) Neural network hybrid quantization method based on progressive quantization and Hessian information
CN111898461B (en) Time sequence behavior segment generation method
CN113068131B (en) Method, device, equipment and storage medium for predicting user movement mode and track
CN113177141A (en) Multi-label video hash retrieval method and device based on semantic embedded soft similarity
CN109933682B (en) Image hash retrieval method and system based on combination of semantics and content information
CN113852432A (en) RCS-GRU model-based spectrum prediction sensing method
CN113807318A (en) Action identification method based on double-current convolutional neural network and bidirectional GRU
CN112766603A (en) Traffic flow prediction method, system, computer device and storage medium
CN112906853A (en) Method, device, equipment and storage medium for automatic model optimization
CN114841072A (en) Differential fusion Transformer-based time sequence prediction method
Saqib et al. Intelligent dynamic gesture recognition using CNN empowered by edit distance
CN117636183A (en) Small sample remote sensing image classification method based on self-supervision pre-training
CN113408721A (en) Neural network structure searching method, apparatus, computer device and storage medium
CN117390506A (en) Ship path classification method based on grid coding and textRCNN
CN112766368A (en) Data classification method, equipment and readable storage medium
CN117197632A (en) Transformer-based electron microscope pollen image target detection method
JP6927409B2 (en) Information processing equipment, control methods, and programs
US20230401717A1 (en) Transformer for efficient image segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201201