CN113033776A - Time sequence prediction method combining global cavity convolution and local identification characterization - Google Patents

Time sequence prediction method combining global cavity convolution and local identification characterization

Info

Publication number
CN113033776A
CN113033776A (application number CN202110262391.4A)
Authority
CN
China
Prior art keywords
time sequence
time
local identification
time series
context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110262391.4A
Other languages
Chinese (zh)
Inventor
陈天翼
金苍宏
董腾然
吴明晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University City College ZUCC
Original Assignee
Zhejiang University City College ZUCC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University City College ZUCC filed Critical Zhejiang University City College ZUCC
Priority to CN202110262391.4A priority Critical patent/CN113033776A/en
Publication of CN113033776A publication Critical patent/CN113033776A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a time series prediction method combining global hole (dilated) convolution and local identification representation, which comprises the following steps: extracting features from the time series segments to generate a corresponding representative feature vector for each segment; assigning a cluster label to each segment by a clustering method based on the obtained vectors, so as to identify different types of time series; then extracting local identification representations (Shapelets) corresponding to the different time series categories through time series representation learning, performing multiple distinguishable-context fusion operations between each time series segment and the obtained representations, and stacking them in the channel direction so that the original time series evolves into a distinguishable context; and performing time series prediction training on the obtained context through temporal hole convolution with residual connections. The invention integrates temporal convolution and provides a novel way of extracting and fusing time series representations to strengthen the influence of the importance of different points in the time series on the prediction result, so that prediction accuracy is effectively improved.

Description

Time sequence prediction method combining global cavity convolution and local identification characterization
Technical Field
The invention belongs to the field of data analysis, and relates to a time series prediction method combining global cavity convolution and local identification characterization.
Background
Time series data describe how the characteristics of an object change over time. Studying time series helps people understand the historical development pattern of an object and predict its future trend, for example for stock price fluctuations, road traffic flow prediction and user behavior analysis. The essential characteristics of a sequence can be described by its mean, variance, covariance and so on, and sequences can accordingly be divided into stationary time series, non-stationary time series and the like. Deep learning methods such as recurrent neural networks and temporal convolutional networks suffer from a limited receptive field and from the loss of local features. Therefore, there is a need for a new fusion method that overcomes these difficulties by fusing local features into the hole (dilated) convolution.
Disclosure of Invention
In order to overcome the defects of the existing method, one of the purposes of the invention is to provide a time series prediction method combining global hole convolution and local identification characterization, which can improve the prediction accuracy and reduce the prediction error.
One of the purposes of the invention is realized by adopting the following technical scheme:
a time series prediction method combining global cavity convolution and local identification characterization is characterized by comprising the following steps:
Features are extracted from the time series segments to obtain representative feature vector expressions for the different segments; the extracted features include, but are not limited to, the autocorrelation coefficient, the mean change and the mean second derivative central. These features describe the repetitive patterns, sequence fluctuations, abrupt value changes, and information from both the past and the future contained in the time series, yielding feature vector representations of the different time series.
Further, the time series are divided into several different clusters by an unsupervised clustering method according to the obtained feature vectors, so as to separate the different time series categories;
further, time series representations are learned from the obtained category labels: local identification representations corresponding to the different time series categories are extracted by optimizing an objective function and are used as distinguishable similarity metrics;
further, a sliding window of fixed size is used to sample the time series into micro contexts, and the similarity between each micro context and each previously obtained metric is calculated under different criteria, including the value squared offset, the value mean offset and the dot product ratio. A higher similarity indicates that the context contains more distinguishable information. The three computed similarity vectors are spliced with the original time series in the channel direction, so that the distinguishable feature information of the original time series is distributed over several different channels;
further, the receptive field of the model is enlarged by stacking several temporal convolution layers, so that the model can capture longer-range context-distinguishing information during training. Through the micro-context importance features, the model can predict based on the weights of different points, which reinforces the importance of the context. Hole (dilated) convolution can be interpreted as a convolution over a down-sampled version of the lower-layer features, which allows information from the distant past and the future to be combined at reduced resolution. Increasing the dilation rate of the hole convolution allows it to use more information more effectively.
Further, the processed time series data are predicted according to the time series prediction model and a result is output, which comprises the following steps:
According to a univariate time series, clustering is performed on the extracted time series features to obtain the corresponding cluster labels; local identification representations of the time series are then obtained from these labels through unsupervised training; the representations are used as metrics, and similarities with the micro contexts obtained by sliding-window sampling are calculated under different criteria to obtain importance features for the different points, which are stacked in the channel direction; finally, these features are fused and learned through temporal hole convolution with residual connections, so that the result is predicted and output.
Compared with the prior art, the invention has the beneficial effects that:
the invention integrates a point location weight obtaining and fusing method based on micro-context, can systematically realize the local identification representation of point locations through local identification representation, thereby solving the problem that the specific point location environment is difficult to identify in time sequence prediction; the model uses the time convolution with the holes to enhance the global capturing capability of the time sequence, and meanwhile, the model is combined with the point importance, so that not only can the overall trend be considered, but also the local characteristics can be adapted, the prediction result is more accurate, and meanwhile, the model has better robustness and interpretability.
Drawings
FIG. 1 is a schematic diagram of a model structure of a time series prediction method combining global hole convolution and local identification characterization according to the present invention.
Detailed Description
The present invention will now be described in more detail with reference to the accompanying drawings, in which the description of the invention is given by way of illustration and not of limitation. The various embodiments may be combined with each other to form other embodiments not shown in the following description.
As shown in fig. 1, the method for predicting a time series by combining global hole convolution and local identification representation according to the present invention includes the following steps:
performing feature extraction on the time sequence segments so as to generate a corresponding representative feature vector for each time sequence segment;
based on the obtained vectors, a clustering label is given to each time sequence segment through a clustering method so as to identify different types of time sequences;
extracting local identification representations (Shapelets) corresponding to the different time series categories through time series representation learning, performing multiple distinguishable-context fusion operations between each time series segment and the obtained representations, and stacking them in the channel direction so that the original time series evolves into a distinguishable context;
applying the above processing to the time series segments in the training set and the validation set respectively, and performing time series prediction training on the obtained contexts through temporal hole convolution with residual connections;
and importing the test set into the pretrained model, predicting the processed time series segments and outputting the results.
The following describes the method for predicting a time series by combining global hole convolution and local identification representation according to the present invention in further detail with reference to the following embodiments.
Extracting the features of the time sequence segments to obtain representative feature vector expressions corresponding to different time sequence segments, wherein the extracted features are as follows:
autocorrelation coefficient:
R(l) = \frac{1}{(L-l)\,\sigma^2}\sum_{t=1}^{L-l}\left(T_t - \mu\right)\left(T_{t+l} - \mu\right)
where L represents the total length of the time series T; T_t, t ∈ [1, L], is the value of the time series at time t; σ² is the variance of the time series; μ is the mean of the time series; and l is the lag coefficient. The autocorrelation coefficient can help to find repetitive patterns contained in the time series, such as periodic signals masked by noise.
Mean change:
\mathrm{MeanChange}(T) = \frac{1}{L-1}\sum_{t=1}^{L-1}\left(T_{t+1} - T_t\right)
where L represents the total length of the time series T and T_t, t ∈ [1, L], is the value of the time series at time t. The mean change is calculated as the mean of the differences between successive values; it captures sequence fluctuations and identifies abrupt changes in the sequence values.
Center of average second derivative:
\mathrm{MSDC}(T) = \frac{1}{L-2}\sum_{t=2}^{L-1}\frac{T_{t+1} - 2T_t + T_{t-1}}{2}
where L represents the total length of the time series T and T_t, t ∈ [1, L], is the value of the time series at time t. The central approximation of the mean second derivative provides a more accurate approximation of the derivative and helps summarize the information from both the future and the past contained in the time series.
The extracted features are used as a representative feature vector of the original time series, so that the features of a certain time series can be expressed as:
V_i = \left(e_{i1}, e_{i2}, \ldots, e_{im}\right)
where V_i represents the feature vector corresponding to the i-th time series; e_{ij} represents the j-th feature of the i-th time series; and m represents the number of extracted features.
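The three features and the resulting feature vector can be sketched in a few lines of Python. This is an illustrative implementation written from the reconstructed formulas above, not code from the patent; the lag set used for the autocorrelation features is an arbitrary assumption.

```python
# Minimal sketch of the statistical features that build the vector V_i = (e_i1, ..., e_im).
import numpy as np

def autocorrelation(ts: np.ndarray, lag: int) -> float:
    """Autocorrelation coefficient of the series at the given lag l."""
    mu, var, L = ts.mean(), ts.var(), len(ts)
    if var == 0 or lag >= L:
        return 0.0
    return float(np.sum((ts[:L - lag] - mu) * (ts[lag:] - mu)) / ((L - lag) * var))

def mean_change(ts: np.ndarray) -> float:
    """Mean of the first differences; captures fluctuation and abrupt value changes."""
    return float(np.mean(np.diff(ts)))

def mean_second_derivative_central(ts: np.ndarray) -> float:
    """Mean of the central second-difference approximation of the derivative."""
    return float(np.mean((ts[2:] - 2 * ts[1:-1] + ts[:-2]) / 2.0))

def feature_vector(ts: np.ndarray, lags=(1, 2, 3)) -> np.ndarray:
    """Representative feature vector for one time series segment (lag set is an assumption)."""
    feats = [autocorrelation(ts, l) for l in lags]
    feats += [mean_change(ts), mean_second_derivative_central(ts)]
    return np.asarray(feats)

# Example: V = feature_vector(np.sin(np.linspace(0, 20, 200)))
```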
In this embodiment, the N time series are divided into K different clusters C by the K-Means clustering method according to the obtained feature vectors, so as to obtain a cluster label for each time series. The optimization objective is as follows:
\min \sum_{j=1}^{K}\sum_{V_i \in C_j}\left\lVert V_i - \mu_j\right\rVert^2
where μ_j is the mean vector of all samples in cluster C_j, and N is the total number of time series.
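The clustering step can be reproduced with scikit-learn's KMeans, which minimizes exactly the within-cluster sum of squares shown above; the choice of K and the way the feature matrix is assembled are assumptions.

```python
# Sketch of the cluster-label assignment from the feature vectors V_i.
import numpy as np
from sklearn.cluster import KMeans

def cluster_labels(feature_vectors: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """feature_vectors: (N, m) matrix of V_i; returns one cluster label per time series."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    return km.fit_predict(feature_vectors)

# Example (feature_vector from the previous sketch):
# V = np.stack([feature_vector(seg) for seg in segments])
# labels = cluster_labels(V, k=5)
```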
In this embodiment, the local identification representations corresponding to the different time series categories are extracted using the obtained category labels and serve as distinguishable similarity metrics. The regularized time series representation learning objective function is denoted f; the goal of time series representation learning is to learn several optimal local identification representations P and a linear hyperplane H that minimize the objective f:
[The objective f is given as an image in the original publication: a regularized representation-learning objective over the shapelets P and the hyperplane H.]
where λ_H is the regularization coefficient; n_f is the number of extracted time series features; η_j is the j-th feature extracted from the current time series; and ξ_j is the j-th feature extracted from the remaining time series.
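Since the exact objective is only available as an image, the following is a heavily hedged sketch of gradient-based shapelet learning in the same spirit: shapelets P and a linear hyperplane H are optimized jointly so that shapelet-to-series distances predict the cluster labels, with an L2 penalty weighted by λ_H on H. The hard-minimum sliding distance, the cross-entropy loss, the optimizer and all hyperparameters are assumptions, not the patent's formulation.

```python
import torch

def min_distances(series: torch.Tensor, shapelets: torch.Tensor) -> torch.Tensor:
    """series: (N, L); shapelets: (S, W). Returns (N, S) minimal sliding-window distances."""
    windows = series.unfold(dimension=1, size=shapelets.shape[1], step=1)   # (N, L-W+1, W)
    diffs = windows.unsqueeze(2) - shapelets.unsqueeze(0).unsqueeze(0)      # (N, L-W+1, S, W)
    return diffs.pow(2).mean(dim=-1).min(dim=1).values                      # (N, S)

def learn_shapelets(series, labels, n_shapelets=8, width=24, lam_h=1e-3, epochs=200, lr=1e-2):
    series = torch.as_tensor(series, dtype=torch.float32)
    labels = torch.as_tensor(labels, dtype=torch.long)
    n_classes = int(labels.max().item()) + 1
    P = torch.randn(n_shapelets, width, requires_grad=True)      # local identification representations
    H = torch.zeros(n_shapelets, n_classes, requires_grad=True)  # linear hyperplane
    b = torch.zeros(n_classes, requires_grad=True)
    opt = torch.optim.Adam([P, H, b], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        logits = min_distances(series, P) @ H + b
        # classification loss on the cluster labels plus the L2 penalty lam_h * ||H||^2
        loss = torch.nn.functional.cross_entropy(logits, labels) + lam_h * H.pow(2).sum()
        loss.backward()
        opt.step()
    return P.detach()
```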
in this embodiment, the micro contexts are obtained by sampling the time sequence according to a fixed sliding window, and then the distinguishable similarity metric is respectively calculated for each micro context.
First, for a given sliding window of size W, the time series T = {T_1, T_2, ..., T_{L-1}, T_L} is sampled, and a micro context can be defined as
C_i = \left(T_i, T_{i+1}, \ldots, T_{i+|W|-1}\right)
where |W| is the size of the sliding window. During the sliding sampling, the similarity to each previously obtained metric is calculated; the higher the similarity between a micro context and a local identification representation, the more identifiable information the micro context contains. Different criteria are therefore needed to obtain a characterization of micro-context importance:
value squared offset (VSD):
\mathrm{VSD}(C_i, P_i) = \frac{1}{|W|}\lVert C_i - P_i\rVert_2^2
where C_i is a micro context obtained by sampling a section of the time series; P_i ∈ P is the i-th local identification representation serving as the metric; and \lVert C_i - P_i\rVert_2 is the L2 norm of the vector C_i - P_i. The value squared offset is used to measure the mean squared deviation between the two vectors.
Mean shift of Value (VMD):
\mathrm{VMD}(C_i, P_i) = \frac{1}{|W|}\lVert C_i - P_i\rVert_1
where \lVert C_i - P_i\rVert_1 is the L1 norm of the vector C_i - P_i. The value mean offset is used to measure the mean absolute deviation between the two vectors.
Dot Product Ratio (DPR):
[The DPR formula is given as an image in the original publication; it measures the ratio of dot products between the two vectors.]
where C_{ij} is the j-th value of the micro context C_i; P_{ij} is the j-th value of the i-th local identification representation; and |W| is the size of the sliding window. The calculated vectors are then spliced by stacking:
\tilde{T} = T \oplus S_{\mathrm{VSD}} \oplus S_{\mathrm{VMD}} \oplus S_{\mathrm{DPR}}
where \oplus denotes the vector splicing (concatenation) operation in the channel direction; \tilde{T} is the time series after stacking; S_{\mathrm{VSD}} is the value squared offset vector; S_{\mathrm{VMD}} is the value mean offset vector; and S_{\mathrm{DPR}} is the dot product ratio vector. In this way, the distinguishable feature information of the original time series is distributed over several different channels.
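The fusion step can be illustrated as follows. This is an assumed implementation: the 1/|W| normalization, the reduction over shapelets (minimum for the offsets, maximum for the dot product), the form of the dot product ratio (a |W|-normalized dot product) and the alignment of each score with the last point of its window are all illustrative assumptions, not the patent's exact formulas.

```python
import numpy as np

def fuse_context(ts: np.ndarray, shapelets: np.ndarray) -> np.ndarray:
    """ts: (L,) original series; shapelets: (S, |W|) local identification representations.
    Returns a (4, L) array: the original series plus the VSD, VMD and DPR score channels."""
    L, W = len(ts), shapelets.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(ts, W)   # micro contexts, (L - W + 1, W)
    vsd, vmd, dpr = np.zeros(L), np.zeros(L), np.zeros(L)
    for i, c in enumerate(windows):
        t = i + W - 1                         # align each score with the last point of its window
        diffs = c[None, :] - shapelets        # (S, W)
        vsd[t] = np.min(np.mean(diffs ** 2, axis=1))     # value squared offset (closest shapelet)
        vmd[t] = np.min(np.mean(np.abs(diffs), axis=1))  # value mean offset (closest shapelet)
        dpr[t] = np.max(shapelets @ c) / W               # assumed |W|-normalized dot product ratio
    return np.stack([ts, vsd, vmd, dpr], axis=0)         # splice channels: original + three scores
```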
This embodiment employs temporal hole convolution with residual connections. Hole convolution can be interpreted as a convolution over a down-sampled version of the lower-layer features, which allows information from the distant past and the future to be combined at reduced resolution. Increasing the dilation rate of the hole convolution allows it to use more information more effectively. The hole convolution is defined as:
h_t^{(l)} = \left(W^{(l)} *_{d_l} h^{(l-1)}\right)(t) = \sum_{\tau=0}^{K-1} W_\tau^{(l)}\, h_{t - d_l \cdot \tau}^{(l-1)}
where h_t^{(l)} is the intermediate state of the convolutional layer l at time t; * is the convolution operation; W^{(l)} are the fixed weights of the filter at layer l; h_{t - d_l \cdot \tau}^{(l-1)} denotes the down-sampled lower-layer features; d_l is the dilation rate of the hole convolution; τ is the expansion (dilation) coefficient; and K is the maximum length of the vector.
h^{(l)} = \mathrm{ReLU}\left(W^{(l)} *_{d_l} \mathrm{ShapeletJoint}\left(h^{(l-1)}\right)\right)
where l is the layer index of the temporal convolution; ShapeletJoint is the local identification representation fusion step described previously; d_l is the dilation rate of the temporal convolution in the current layer; W^{(l)} are the weights of the l-th temporal convolution layer; and ReLU is a nonlinear activation function, which can be expressed as:
\mathrm{ReLU}(x) = \max(0, x)
in the algorithm, the receptive field of the model is increased by stacking the layers of the time convolution, so that the model can capture longer-time context distinguishing information in the training process. By the micro context importance feature, the model can have the capability of predicting based on the weights of different points, so that the importance of the context is enhanced.
The present embodiment trains the model with a quantile loss function. Choosing a quantile loss instead of a squared error lets a gradient-descent-based learning algorithm learn a specified quantile instead of the mean, which helps achieve better robustness against outliers during prediction. At the same time, the quantile loss helps explain aspects of the data other than its mean, and thus helps in understanding non-normal distributions and nonlinear relationships with the predictor variables.
For a given quantile ρ ∈ (0, 1), the true value y_t of the time series and the ρ-quantile prediction \hat{y}_t^{\rho}, the ρ-quantile loss can be defined as:
L_\rho\left(y_t, \hat{y}_t^{\rho}\right) = \rho\left(y_t - \hat{y}_t^{\rho}\right)_{+} + (1 - \rho)\left(\hat{y}_t^{\rho} - y_t\right)_{+}
where (x)_{+} is defined as:
(x)_{+} = \max(0, x)
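The ρ-quantile loss reconstructed above can be written directly; averaging over all elements (rather than summing) is an assumption.

```python
import torch

def quantile_loss(y: torch.Tensor, y_hat: torch.Tensor, rho: float = 0.5) -> torch.Tensor:
    """rho-quantile loss L_rho(y, y_hat), averaged over all elements."""
    diff = y - y_hat
    # rho * (y - y_hat)_+  +  (1 - rho) * (y_hat - y)_+
    return torch.mean(rho * torch.clamp(diff, min=0) + (1 - rho) * torch.clamp(-diff, min=0))

# Example: loss = quantile_loss(torch.randn(32, 24), torch.randn(32, 24), rho=0.9)
```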
Finally, the above processing is applied to the time series segments in the training set and the validation set respectively, and time series prediction training is performed on the obtained contexts through temporal hole convolution with residual connections. After training, the test set can be fed into the pretrained model, the processed time series segments are predicted, and the results are output.
Various other modifications and changes may be made by those skilled in the art based on the above-described technical solutions and concepts, and all such modifications and changes should fall within the scope of the claims of the present invention.
The technical effects of the present invention are described below by comparing the proposed time series model, which combines global hole convolution and local identification representation, with various other methods.
Common data set: electric (hourly Electricity time series for 370 customers), Solar Energy (Solar data sampled every 10 minutes in alabama in 2016), Traffic (hourly road occupancy for the san francisco highway, between 0 and 1), Taxi (30 minutes records the number of taxis in different blocks of new york city), Wikipedia (records daily online visits per ten thousand Wikipedia pages per day), M4 (from M4 contests and contains different time aggregation lengths, day, month, quarter, and year, corresponding to M4-D, M4-M, M4-Q, and M4-Y, respectively).
Other comparison methods:
ARIMA: the autoregressive integrated moving average model, a classical time series model consisting of three parts: an AR (autoregressive) model, an MA (moving average) model, and differencing of order I;
NTPS: the prediction of the next timestamp is the previous value of the time series, and sampling is carried out according to exponential distribution;
MLP: a feed-forward neural network with 3 hidden layers;
WaveNet: its core is the dilated causal convolution layer, which allows the network to handle temporal order and long-range dependencies correctly without an explosion of model complexity, easing the challenge of learning over a large number of time steps;
LSTM: a recurrent neural network that alleviates the long-term dependency problem of ordinary recurrent neural networks;
MQCNN: a sequence-to-sequence neural network architecture using LSTM as an encoder and decoder;
transformer: a sequence-to-sequence neural network architecture with a self-attention mechanism;
TCN: a network structure for processing time series data that can infer values at multiple future time points from the order of the points in a known sequence;
DeepAR: a deep-learning-based time series forecasting method that can conveniently incorporate additional covariates; its prediction target is the probability distribution of the series value at each time step;
NBeats: an interpretable deep neural network built from backward and forward residual links and a very deep stack of fully connected layers.
The present embodiment employs the normalized quantile loss sum as the evaluation index; it reduces bias when measuring errors and thus facilitates comparison between data sets or methods of different scales. It is defined as:
\mathrm{QL}_\rho\left(y, \hat{y}\right) = \frac{\sum_t L_\rho\left(y_t, \hat{y}_t\right)}{\sum_t \left|y_t\right|}
where y is the true value of the time series and \hat{y} is the predicted value output by the model. We report the results for ρ = 0.5 and ρ = 0.9, denoted P50QL and P90QL respectively.
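Under the reconstruction above, the evaluation metric can be computed as the summed quantile losses normalized by the summed absolute true values; the exact normalization in the original image may differ (for example by a constant factor).

```python
import numpy as np

def normalized_quantile_loss(y: np.ndarray, y_hat: np.ndarray, rho: float) -> float:
    """Sum of rho-quantile losses over all points, normalized by the sum of |y|."""
    diff = y - y_hat
    ql = rho * np.maximum(diff, 0) + (1 - rho) * np.maximum(-diff, 0)
    return float(ql.sum() / np.abs(y).sum())

# P50QL and P90QL as used in the comparison:
# p50 = normalized_quantile_loss(y_true, y_pred_p50, rho=0.5)
# p90 = normalized_quantile_loss(y_true, y_pred_p90, rho=0.9)
```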
TABLE 1 time series prediction method comparison
The experimental results are shown in Table 1, where the best-performing results are shown in bold and the second-best results are underlined. The method of the present invention outperforms the other comparison methods in most cases.

Claims (7)

1. A time series prediction method combining global cavity convolution and local identification characterization is characterized by comprising the following steps:
extracting the characteristics of the time sequence segments to obtain representative characteristic vector expressions corresponding to different time sequence segments, and marking cluster labels by a clustering method;
extracting local identification representations (Shapelets) corresponding to different time sequence categories through time sequence representation learning, and taking the local identification representations (Shapelets) as identifiable similarity measurement standards; then, performing multiple distinguishable context fusion processing on the obtained characterization and sequence, and stacking in the channel direction to evolve into distinguishable contexts;
performing the processing according to the training set and the time sequence segments in the verification set respectively, and performing time sequence prediction training on the obtained context through global hole convolution, namely time hole convolution with residual connection;
and importing the test set into a pre-training model, predicting the processed time sequence segment and outputting a result.
2. The method for predicting time series by combining convolution of global hole and local identification characterization according to claim 1, wherein the extracting of the time series segment feature includes the following indexes:
autocorrelation coefficient:
R(l) = \frac{1}{(L-l)\,\sigma^2}\sum_{t=1}^{L-l}\left(T_t - \mu\right)\left(T_{t+l} - \mu\right)
wherein L represents the total length of the time series T; T_t, t ∈ [1, L], is the value of the time series at time t; σ² is the variance of the time series; μ is the mean of the time series; l is the lag coefficient; the autocorrelation coefficient helps to find the repetitive patterns contained in the time series;
mean change:
\mathrm{MeanChange}(T) = \frac{1}{L-1}\sum_{t=1}^{L-1}\left(T_{t+1} - T_t\right)
wherein L represents the total length of the time series T; T_t, t ∈ [1, L], is the value of the time series at time t; the mean change is obtained by calculating the mean of the differences between successive values, and can capture sequence fluctuations and identify abrupt changes in the sequence values;
center of average second derivative:
\mathrm{MSDC}(T) = \frac{1}{L-2}\sum_{t=2}^{L-1}\frac{T_{t+1} - 2T_t + T_{t-1}}{2}
wherein L represents the total length of the time series T; T_t, t ∈ [1, L], is the value of the time series at time t; the central approximation of the mean second derivative provides a more accurate approximation of the derivative and helps summarize the information from both the future and the past contained in the time series.
3. The method of claim 2, wherein the global hole convolution and local identification characterization are combined to predict the time series, and the method comprises: in order to realize the identification of the clustering labels of the segments of different time sequences, the extracted features are used as a representative feature vector of the original time sequence, so that the features of a certain time sequence can be expressed as follows through the vector:
V_i = \left(e_{i1}, e_{i2}, \ldots, e_{im}\right)
wherein V_i represents the feature vector corresponding to the i-th time series, e_{ij} represents the j-th feature of the i-th time series, and m represents the number of extracted features;
dividing the N time series into K different clusters C by the K-Means clustering method according to the obtained feature vectors, so as to obtain a cluster label for each time series, wherein the optimization goal is as follows:
\min \sum_{j=1}^{K}\sum_{V_i \in C_j}\left\lVert V_i - \mu_j\right\rVert^2
wherein μ_j is the mean vector of all samples in cluster C_j, and N is the total number of time series.
4. The method of claim 1, wherein the global hole convolution and local identification characterization are combined to predict the time series, and the method comprises: extracting local identification representations corresponding to different time sequence categories through time sequence representation learning, taking the local identification representations as distinguishable similarity measurement standards, defining a regularized time sequence representation learning target function as gamma, and learning a plurality of optimal local identification representations P and a linear hyperplane H to minimize the target gamma:
[The objective γ is given as an image in the original publication: a regularized representation-learning objective over the shapelets P and the hyperplane H.]
wherein λ_H is the regularization coefficient, n_f is the number of extracted time series features, η_j is the j-th feature extracted from the current time series, and ξ_j is the j-th feature extracted from the remaining time series.
5. The method of claim 1, wherein the global hole convolution and local identification characterization are combined to predict the time series, and the method comprises: sampling according to a fixed sliding window in a time sequence to obtain micro contexts, and then performing multiple distinguishable context fusion processing on the obtained local identification representations and each micro context respectively, wherein the multiple distinguishable context fusion processing specifically comprises the following steps:
first, for a given sliding window of size W, the time series T = {T_1, T_2, ..., T_{L-1}, T_L} is sampled, and a micro context can be defined as
C_i = \left(T_i, T_{i+1}, \ldots, T_{i+|W|-1}\right)
wherein |W| is the size of the sliding window; during the sliding sampling, the similarity to each previously obtained metric is calculated; the higher the similarity between a micro context and a local identification representation, the more identifiable information the micro context contains, and therefore different criteria are needed to obtain the characteristic representation of micro-context importance:
value squared offset (VSD):
\mathrm{VSD}(C_i, P_i) = \frac{1}{|W|}\lVert C_i - P_i\rVert_2^2
wherein C_i is a micro context obtained by sampling a section of the time series; P_i ∈ P is the i-th local identification representation serving as the metric; \lVert C_i - P_i\rVert_2 is the L2 norm of the vector C_i - P_i; the value squared offset is used to measure the mean squared deviation between the two vectors;
mean shift of Value (VMD):
\mathrm{VMD}(C_i, P_i) = \frac{1}{|W|}\lVert C_i - P_i\rVert_1
wherein \lVert C_i - P_i\rVert_1 is the L1 norm of the vector C_i - P_i; the value mean offset is used to measure the mean absolute deviation between the two vectors;
dot Product Ratio (DPR):
[The DPR formula is given as an image in the original publication; it measures the ratio of dot products between the two vectors.]
wherein C_{ij} is the j-th value of the micro context C_i; P_{ij} is the j-th value of the i-th local identification representation; |W| is the size of the sliding window; the calculated vectors are then spliced by stacking:
\tilde{T} = T \oplus S_{\mathrm{VSD}} \oplus S_{\mathrm{VMD}} \oplus S_{\mathrm{DPR}}
wherein \oplus denotes the vector splicing (concatenation) operation in the channel direction; \tilde{T} is the time series after stacking; S_{\mathrm{VSD}} is the value squared offset vector; S_{\mathrm{VMD}} is the value mean offset vector; and S_{\mathrm{DPR}} is the dot product ratio vector; the distinguishable feature information of the original time series is thus distributed over several different channels.
6. The method of claim 1, wherein the global hole convolution and local identification characterization are combined to predict the time series, and the method comprises: global hole convolution can be interpreted as convolution of a downsampled version of the lower layer features, reducing the resolution of combining information from distant histories and the future; by increasing the expansion rate of the global hole convolution, more information can be more effectively utilized; the global hole convolution is defined as:
h_t^{(l)} = \left(W^{(l)} *_{d_l} h^{(l-1)}\right)(t) = \sum_{\tau=0}^{K-1} W_\tau^{(l)}\, h_{t - d_l \cdot \tau}^{(l-1)}
wherein h_t^{(l)} is the intermediate state of the convolutional layer l at time t; * is the convolution operation; W^{(l)} are the fixed weights of the filter at layer l; h_{t - d_l \cdot \tau}^{(l-1)} denotes the down-sampled lower-layer features; d_l represents the dilation rate of the global hole convolution; τ is the expansion coefficient; and K is the maximum length of the vector;
increasing the receptive field of the model by stacking the number of layers of the time convolution so as to ensure that the model can capture longer-time context distinguishing information in the training process; by the micro context importance feature, the model can have the capability of predicting based on the weights of different points, so that the importance of the context is enhanced.
7. The method of claim 6, wherein the global hole convolution and local identification characterization are combined to predict the time series, and the method comprises: according to the time sequence of the univariate, clustering is carried out through time sequence characteristic extraction to obtain a corresponding clustering label, and therefore the label obtains local identification representation of the time sequence through unsupervised training; then, taking the representations as measurement standards, calculating similarity with a micro context obtained by sampling through a sliding window according to different standards, acquiring importance characteristics of different point positions, and superposing the importance characteristics in a channel direction; finally, the features are fused and learned through time hole convolution with residual connection, and therefore prediction and result output are conducted.
CN202110262391.4A 2021-03-10 2021-03-10 Time sequence prediction method combining global cavity convolution and local identification characterization Pending CN113033776A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110262391.4A CN113033776A (en) 2021-03-10 2021-03-10 Time sequence prediction method combining global cavity convolution and local identification characterization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110262391.4A CN113033776A (en) 2021-03-10 2021-03-10 Time sequence prediction method combining global cavity convolution and local identification characterization

Publications (1)

Publication Number Publication Date
CN113033776A true CN113033776A (en) 2021-06-25

Family

ID=76469426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110262391.4A Pending CN113033776A (en) 2021-03-10 2021-03-10 Time sequence prediction method combining global cavity convolution and local identification characterization

Country Status (1)

Country Link
CN (1) CN113033776A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114239718A (en) * 2021-12-15 2022-03-25 杭州电子科技大学 High-precision long-term time sequence prediction method based on multivariate time sequence data analysis
CN114239718B (en) * 2021-12-15 2024-03-01 杭州电子科技大学 High-precision long-term time sequence prediction method based on multi-element time sequence data analysis
CN115114345A (en) * 2022-04-02 2022-09-27 腾讯科技(深圳)有限公司 Feature representation extraction method, device, equipment, storage medium and program product
CN115114345B (en) * 2022-04-02 2024-04-09 腾讯科技(深圳)有限公司 Feature representation extraction method, device, equipment, storage medium and program product
CN117407681A (en) * 2023-12-15 2024-01-16 江苏为恒智能科技有限公司 Time sequence data prediction model establishment method based on vector clustering
CN117407681B (en) * 2023-12-15 2024-03-22 江苏为恒智能科技有限公司 Time sequence data prediction model establishment method based on vector clustering

Similar Documents

Publication Publication Date Title
CN113033776A (en) Time sequence prediction method combining global cavity convolution and local identification characterization
Li et al. A new flood forecasting model based on SVM and boosting learning algorithms
CN112561165A (en) Multidimensional time series data prediction method based on combined model
CN110674858B (en) Traffic public opinion detection method based on space-time correlation and big data mining
CN106709588B (en) Prediction model construction method and device and real-time prediction method and device
CN111598325A (en) Traffic speed prediction method based on hierarchical clustering and hierarchical attention mechanism
CN112487822A (en) Cross-modal retrieval method based on deep learning
CN112508265A (en) Time and activity multi-task prediction method and system for business process management
CN112507479B (en) Oil drilling machine health state assessment method based on manifold learning and softmax
CN114503124A (en) Multi-level prediction for processing time series data
CN116227716A (en) Multi-factor energy demand prediction method and system based on Stacking
CN115017970A (en) Migration learning-based gas consumption behavior anomaly detection method and system
CN110659767A (en) Stock trend prediction method based on LSTM-CNN deep learning model
Fu et al. MCA-DTCN: A novel dual-task temporal convolutional network with multi-channel attention for first prediction time detection and remaining useful life prediction
CN117094835A (en) Multi-target group classification method for social media content
CN116579342A (en) Electric power marketing named entity identification method based on dual-feature combined extraction
CN116304941A (en) Ocean data quality control method and device based on multi-model combination
CN112650949B (en) Regional POI (point of interest) demand identification method based on multi-source feature fusion collaborative filtering
Jasim et al. Characteristics of data mining by classification educational dataset to improve student’s evaluation
CN113657533A (en) Multi-element time sequence segmentation clustering method for space-time scene construction
Rameh et al. Designing a hybrid model for stock marketing prediction based on LSTM and transfer learning
CN111767402B (en) Limited domain event detection method based on counterstudy
Watts et al. Local score dependent model explanation for time dependent covariates
CN112989105A (en) Music structure analysis method and system
Kim et al. Similarity-based historical input selection to predict irregular holiday traffics in real-time

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210625)