CN113177633B - Depth decoupling time sequence prediction method - Google Patents

Depth decoupling time sequence prediction method

Info

Publication number
CN113177633B
CN113177633B
Authority
CN
China
Prior art keywords
road traffic
term
representation
encoder
short
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110426703.0A
Other languages
Chinese (zh)
Other versions
CN113177633A (en)
Inventor
陈岭
陈纬奇
张友东
文波
杨成虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110426703.0A priority Critical patent/CN113177633B/en
Publication of CN113177633A publication Critical patent/CN113177633A/en
Application granted granted Critical
Publication of CN113177633B publication Critical patent/CN113177633B/en


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 — Combinations of networks
    • G06N 3/08 — Learning methods
    • Y02T 10/40 — Engine management systems (climate change mitigation technologies related to transportation)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a deep decoupled time series prediction method comprising the following steps: 1) preprocess the given time series data to construct a training data set; 2) capture the global variation patterns shared by multiple time series with a vector-quantized global feature encoder; 3) capture the local variation patterns specific to each individual time series with a local feature encoder, where each series has its own set of local feature encoder parameters produced by an adaptive parameter generation module; 4) feed the outputs of the global and local feature encoders to a decoder for prediction. The invention decouples the dynamics of a time series into global and local variation patterns and models them separately, which resolves the inability of existing models to fully exploit the knowledge shared within the data set or to adequately model the local variation pattern specific to a single series, thereby improving prediction accuracy; the method has broad application prospects in traffic prediction, supply chain management, financial investment, and other fields.

Description

Depth decoupling time sequence prediction method
Technical Field
The invention relates to the field of time series data prediction, and in particular to a deep decoupled time series prediction method.
Background
Time series are widely used in the transportation, electricity, medical, and financial fields. Time series prediction, i.e., predicting future observations from observations over a historical period, is an important research topic in data mining. In today's big-data era a single time series rarely exists in isolation; a data set typically contains multiple correlated time series that exhibit both global (shared by multiple series) and local (specific to a single series) variation patterns. As shown in Fig. 1, the road-usage time series of all roads share the same 24-hour period and have a morning peak and an evening peak, i.e., a global variation pattern; road 1 has mild morning and evening peaks, road 2 has a pronounced morning peak and no evening peak, road 3 has a mild morning peak and a pronounced evening peak, and road 4 has strong morning and evening peaks, i.e., local variation patterns. A good time series prediction model should capture both kinds of patterns simultaneously.
Time series prediction models based on statistical machine learning, such as AR, ARIMA, exponential smoothing, and linear state-space models, are trained and make predictions on a single time series. They cannot model the variation patterns shared across a multivariate time series data set and therefore cannot benefit from such global knowledge.
Classical deep learning models, such as RNN-, TCN-, and Transformer-based prediction models, are currently the most widely used class of methods in this field. These models train one set of shared parameters on all data in the data set and treat the information of every time series equally. However, capturing global information through simple parameter sharing alone is insufficient, because at prediction time the model only takes the historical data of a single time series as input, and global information or information from other related series cannot be explicitly introduced for it.
Some recent approaches capture the patterns shared across a multivariate time series data set by using matrix factorization to represent the original time series as linear combinations of k latent time series (k much smaller than the number of series in the data set). However, matrix factorization operates in the feature space and fails to capture complex global variation patterns.
Disclosure of Invention
In view of the foregoing, an object of the present invention is to provide a deep decoupled time series prediction method that improves the prediction accuracy of time series data, while reducing computational cost, by effectively modeling the global and local variation patterns of time series.
In order to achieve the above object, the present invention provides the following technical solutions:
a depth decoupling time sequence prediction method is applied to prediction of time sequence data in traffic fields, electric power fields, medical fields and financial fields, and comprises the following steps:
collecting a time series and preprocessing it to obtain a preprocessed time series;
the method comprises the steps of constructing a time sequence prediction model, wherein the time sequence prediction model comprises a global feature encoder, an adaptive parameter generation module, a local feature encoder and a decoder, the global feature encoder is used for encoding a time sequence into a global feature representation, the adaptive parameter generation module is used for generating local feature encoder parameters according to the time sequence, the local feature encoder is used for encoding the time sequence into a local feature representation based on the loaded local feature encoder parameters, and the decoder is used for decoding a result of splicing the global feature representation and the local feature representation and outputting predicted time sequence data;
optimizing the parameters of the time series prediction model with the time series data, and using the parameter-optimized model for time series prediction.
Preferably, the preprocessing includes outlier detection and removal, missing-value imputation, and normalization.
Preferably, the global feature encoder comprises a short-term feature extractor built from a convolutional neural network, a vector quantization module, and a Transformer encoder formed by stacking several attention modules. The short-term feature extractor extracts short-term features of the input time series to obtain a short-term representation of the series; the vector quantization module quantizes the input short-term representation to obtain an encoded vector; and the Transformer encoder models long-term dependencies over the whole time series based on the encoded vectors and outputs the global feature representation of the series.
Preferably, the adaptive parameter generation module encodes the time series by multi-view contrastive coding and outputs the local feature encoder parameters.
Preferably, the adaptive parameter generation module comprises a context recognition network and a parameter generation network. The context recognition network consists of a convolution module, a Transformer encoder, and an LSTM aggregator connected in sequence, and maps the time series to a context hidden variable; the parameter generation network consists of fully connected layers and generates the parameters of the local feature encoder from the context hidden variable.
Preferably, the parameters of the local feature encoder do not participate in training but are generated by the adaptive parameter generation module. The local feature encoder comprises a short-term feature extractor and a Transformer encoder formed by stacking several attention modules; the short-term feature extractor extracts short-term features of the input time series to obtain a short-term representation of the series, and the Transformer encoder models long-term dependencies over the whole series based on that representation and outputs the local feature representation of the series.
Preferably, the decoder comprises a convolution module and several identical attention modules, wherein the convolution module performs a convolution over the concatenation of the input global and local feature representations, and the attention modules perform attention computations over the convolution result and output the predicted time series data.
Preferably, the loss function $\mathcal{L}$ used in parameter optimization of the time series prediction model is:

$$\mathcal{L} = \mathcal{L}_{pred} + \mathcal{L}_{CMC} + \mathcal{L}_{VQ}$$

where $\mathcal{L}_{pred}$ is the prediction objective, $\mathcal{L}_{CMC}$ the multi-view contrastive coding objective, and $\mathcal{L}_{VQ}$ the vector quantization constraint objective; the three components are defined in the detailed description below.
compared with the prior art, the invention has the beneficial effects that at least the following steps are included:
according to the depth decoupling time sequence prediction method provided by the invention, the dynamic decoupling of the time sequences is a global and local change mode, and the global and local feature encoders are used for modeling respectively, so that the vector quantization global encoder is used for learning the encoding table for representing the global change mode, the global change mode is modeled by fully utilizing the shared knowledge in the data set, the self-adaptive parameter generation module is used for generating the specific local feature encoder parameters of each time sequence, and the heterogeneous local change mode is effectively modeled. The prediction accuracy of the time series is improved based on the global and local variation patterns.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a road usage time series diagram in the background art;
FIG. 2 is an overall flowchart of a depth decoupling time series prediction method provided by an embodiment;
FIG. 3 is an overall block diagram of a depth decoupling time series prediction method provided by an embodiment;
FIG. 4 is a learning process of a global representation and a short-term representation provided by an embodiment;
FIG. 5 is a schematic diagram of the convolutional Transformer decoder provided by an embodiment.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and embodiments, in order to make its objects, technical solutions, and advantages more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.
To improve prediction accuracy, this embodiment provides a deep decoupled time series prediction method that decouples the dynamics of a time series into a global variation pattern and a local variation pattern, models the two separately, and predicts based on both. The method can be applied in the traffic, electric power, medical, and financial fields; that is, the time series may be road traffic flow, user electricity consumption, stock prices, and similar data.
FIG. 2 is an overall flowchart of a depth decoupling time series prediction method provided by an embodiment; fig. 3 is an overall block diagram of a depth decoupling time series prediction method provided by an embodiment. As shown in fig. 2 and 3, the depth decoupling time series prediction method provided by the embodiment includes the following steps:
step 1, collecting time sequence data, carrying out outlier elimination processing and normalization processing on the collected time sequence, and dividing the processed data by utilizing a sliding time window to obtain a training data set.
Outlier detection and removal is performed on the given time series, and invalid values (values outside the normal range, and missing values) are filled in by linear interpolation. All values are then min-max normalized into the range [-1, 1] with the conversion formula

$$x' = 2\,\frac{x - x_{min}}{x_{max} - x_{min}} - 1$$

where $x$ is a value in the original time series, $x_{min}$ and $x_{max}$ are the minimum and maximum values in the series, and $x'$ is the normalized value.
The normalized data is then divided with a sliding window of empirically chosen size T and a fixed stride to obtain the training data set, as sketched below.
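The following Python sketch illustrates this preprocessing pipeline; it is a minimal assumed implementation, and the window size, horizon, and stride values are illustrative rather than prescribed by the patent.

```python
import numpy as np

def preprocess(series, t_in=168, t_out=24, stride=1):
    """Clean one raw series and cut it into (input, target) training pairs."""
    x = np.asarray(series, dtype=np.float64)

    # Outlier/missing handling: mark non-finite readings invalid and fill
    # them by linear interpolation over the valid neighbours.
    bad = ~np.isfinite(x)
    idx = np.arange(len(x))
    x[bad] = np.interp(idx[bad], idx[~bad], x[~bad])

    # Min-max normalization into [-1, 1].
    x_min, x_max = x.min(), x.max()
    x = 2.0 * (x - x_min) / (x_max - x_min) - 1.0

    # Sliding window: T history steps as input, tau future steps as target.
    samples = [(x[s:s + t_in], x[s + t_in:s + t_in + t_out])
               for s in range(0, len(x) - t_in - t_out + 1, stride)]
    return samples, (x_min, x_max)
```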
Step 2: split the training data set into batches of a fixed size. Given an empirically chosen batch size M, the total number of batches N is

$$N = \left\lceil \frac{N_{samples}}{M} \right\rceil$$

where $N_{samples}$ is the total number of samples in the training data set.
Step 3: sequentially select from the training data set the batch of training samples with index k, where k ∈ {0, 1, …, N−1}. Steps 4-10 are repeated for each training sample in the batch.
Step 4: input the sample time series $x_{1:T}$ to the vector-quantized global feature encoder, output the global feature representation $h^{(g)}_{1:T}$, and compute the vector quantization constraint objective $\mathcal{L}_{VQ}$, where T denotes the input step size.
The global feature encoder, shown in Fig. 3(b), models the global variation pattern of the time series. The specific steps are as follows:
first, the sample time sequence x 1:T Input to a short-term feature extractor consisting of a multi-layer 1D convolutional neural network to obtain a short-term representation z of each instant in the sequence 1:T . The sliding step of the convolution layer is set to be 1, a padding (padding) mechanism is used, so that the step of output is consistent with the input, and a convolution kernel with a small size is used, so that a short-term change mode in a sequence can be captured.
Next, the short-term representation $z_{1:T}$ is input to a vector quantization (VQ) module $q$, which maintains a coding table $e \in \mathbb{R}^{F \times d}$ representing the global variation patterns. The table contains F d-dimensional vectors that are shared among all sequences and represent the global variation patterns; the vectors in the coding table $e$ are called global representations. The VQ module maps the short-term representation at each instant to a vector in the coding table, yielding the quantized representation $\hat z_{1:T}$. Specifically, it replaces the original short-term representation $z$ by its nearest neighbour $\hat z = e_i$ in the table, computed as

$$i = \arg\min_j \|z - e_j\|_2$$

where $e_i$ denotes the i-th vector of the coding table $e$. Note that the arg min operation is not differentiable, so the gradient with respect to $\hat z$ is used directly in place of the gradient with respect to $z$: during forward propagation the nearest global representation $\hat z$ is fed to the downstream network, and during back-propagation the gradient of $\hat z$ is passed back unchanged to the upstream convolutional network.
In such a training mode, if only the prediction objective were used, the global representations $e_i$ in the coding table would never be updated, so the invention introduces a vector quantization constraint objective for learning the global representations:

$$\mathcal{L}_{VQ} = \frac{1}{T}\sum_{t=1}^{T}\Big(\|\mathrm{sg}(z_t) - \hat z_t\|_2^2 + \gamma\,\|z_t - \mathrm{sg}(\hat z_t)\|_2^2\Big)$$

where $\mathrm{sg}(\cdot)$ is the gradient cut-off (stop-gradient) operation, satisfying $\mathrm{sg}(z) \equiv z$ with $\nabla\,\mathrm{sg}(z) = 0$, and $\gamma$ is an adjustable hyper-parameter. As indicated by the dark grey arrows in Fig. 4, under the combined action of the prediction objective $\mathcal{L}_{pred}$ and $\mathcal{L}_{VQ}$, the short-term representation $z$ selects an appropriate global representation and is constrained to differ from $\hat z$ as little as possible. As indicated by the light grey arrows in Fig. 4, $\mathcal{L}_{VQ}$ drives the global representations in the coding table towards the original short-term representations; when several original short-term representations map to the same global representation, that global representation tends towards their cluster center, which also makes the global representations learned by the VQ module more representative. A sketch of this quantization step appears below.
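A minimal PyTorch sketch of the quantization step with the straight-through gradient trick described above; the codebook size and γ value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=128, d=64, gamma=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, d)   # coding table e
        self.gamma = gamma

    def forward(self, z):                            # z: (batch, T, d)
        # Nearest-neighbour search: i = argmin_j ||z - e_j||_2
        table = self.codebook.weight[None].expand(z.size(0), -1, -1)
        idx = torch.cdist(z, table).argmin(-1)       # (batch, T)
        z_q = self.codebook(idx)                     # quantized representation

        # VQ constraint: ||sg(z) - z_q||^2 moves the codes toward z;
        # gamma * ||z - sg(z_q)||^2 commits z to its chosen code.
        loss = F.mse_loss(z.detach(), z_q) + self.gamma * F.mse_loss(z, z_q.detach())

        # Straight-through estimator: forward passes z_q, backward copies
        # the gradient of z_q to z unchanged.
        z_q = z + (z_q - z).detach()
        return z_q, loss
```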
Finally, a Transformer encoder models the long-term dependencies over the whole sequence and outputs a long-term representation for each instant, i.e., the global feature representation $h^{(g)}_{1:T}$.
The Transformer encoder is a stack of attention modules; a single module comprises a multi-head self-attention layer and a feed-forward network of two fully connected layers (the first using a ReLU activation and the second a linear activation). The attention mechanism can be viewed as mapping a query and a set of key/value pairs to an output: the matching degree between the query and each key is computed, the value corresponding to each key is given a weight coefficient, and the weighted sum of the values is returned as output.
It is computed as

$$\mathrm{Attention}(Q, K, V) = \mathrm{SoftMax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $Q$ denotes the queries, $K$ the keys, and $V$ the values. The computation proceeds in three steps. First, the inner product of the query with each key is taken and divided by the factor $\sqrt{d_k}$, giving unnormalized weight coefficients; the factor $\sqrt{d_k}$ plays a regulating role and prevents the gradient of the SoftMax function from vanishing. Second, the weight coefficients are normalized by the SoftMax function. Third, the values are summed, weighted by the normalized coefficients, to obtain the final output.
The multi-head attention mechanism computes several attention mechanisms in parallel and concatenates the results to obtain the output, which lets it capture the different types of correlation present in the data:

$$\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_h)\,W^{O},\qquad \mathrm{head}_i = \mathrm{Attention}\big(QW_i^{Q},\, KW_i^{K},\, VW_i^{V}\big)$$

where Concat denotes the tensor concatenation operation and $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the mapping matrices of the i-th attention head, which project the original queries, keys, and values into the corresponding subspaces. In the multi-head self-attention layers used in the invention, the query Q, key K, and value V are identical: in the first layer they are the output $\hat z_{1:T}$ of the vector quantization module, and in each subsequent layer they are the output of the previous attention module. The encoder outputs the long-term representation $h^{(g)}_{1:T}$. The feed-forward network is a fixed substructure of the Transformer model and mainly performs a spatial mapping. A sketch of these attention computations follows.
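A PyTorch sketch of scaled dot-product and multi-head attention as formulated above; dimensions are illustrative assumptions, and in the self-attention layers the same tensor is passed as q, k, and v.

```python
import math
import torch
import torch.nn as nn

def attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V; dividing by sqrt(d_k) keeps the SoftMax
    # input scale moderate so its gradient does not vanish.
    d_k = Q.size(-1)
    w = torch.softmax(Q @ K.transpose(-2, -1) / math.sqrt(d_k), dim=-1)
    return w @ V

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=64, heads=4):
        super().__init__()
        assert d_model % heads == 0
        self.h, self.d_k = heads, d_model // heads
        self.w_q = nn.Linear(d_model, d_model)   # per-head maps W_i^Q ...
        self.w_k = nn.Linear(d_model, d_model)   # ... W_i^K ...
        self.w_v = nn.Linear(d_model, d_model)   # ... W_i^V, fused per head
        self.w_o = nn.Linear(d_model, d_model)   # output map W^O

    def forward(self, q, k, v):                  # each: (batch, T, d_model)
        B, T, _ = q.shape
        split = lambda x: x.view(B, -1, self.h, self.d_k).transpose(1, 2)
        out = attention(split(self.w_q(q)), split(self.w_k(k)), split(self.w_v(v)))
        out = out.transpose(1, 2).contiguous().view(B, T, -1)  # concat heads
        return self.w_o(out)
```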
Step 5: input the sample time series $x_{1:T}$ to the adaptive parameter generation module based on multi-view contrastive coding, generate the local feature encoder parameters $\phi$, and compute the multi-view contrastive coding objective $\mathcal{L}_{CMC}$.
As shown in Fig. 3(a), the adaptive parameter generation module based on multi-view contrastive coding comprises a context recognition network and a parameter generation network. The specific steps for generating the local feature encoder parameters $\phi$ are as follows:

First, $x_{1:T}$ is input to the context recognition network, which consists of a convolution module, a Transformer encoder, and an LSTM aggregator, and which maps the input sequence to a context hidden variable $c$. The multi-view contrastive coding (CMC) method and a corresponding KL-divergence regularizer ensure that $c$ fully preserves the information about the sequence's local variation patterns while filtering out global information (which is already modeled by the global feature encoder). By contrastive learning, CMC maximizes the mutual information between the context hidden variable $c$ output by the LSTM aggregator and both the short-term representation $v^{(sh)}$ output by the convolution module and the long-term representation $v^{(lo)}$ output by the Transformer encoder, so that $c$ effectively captures the information specific to the original sequence (its specific long/short-term variation patterns).
CMC solves two contrastive learning tasks, which require selecting, given the context hidden variable $c$, the correct short-term representation and the correct long-term representation from among interference terms; the corresponding objectives are $\mathcal{L}^{(sh)}$ and $\mathcal{L}^{(lo)}$. In addition, a regularization term $\mathcal{L}_{KL}$ ensures that $c$ filters out global information.
Taking the short-term contrastive task as an example: given the output $c$ of the context recognition network and a set $V^{(sh)} = \{v^{(sh)}_t, \tilde v_1, \dots, \tilde v_K\}$ containing the short-term representation $v^{(sh)}_t$ of the input time series at instant t (the positive sample) and K interference terms $\tilde v_k$ (negative samples), the model must identify the correct short-term representation from the set $V^{(sh)}$. The interference terms are sampled uniformly from the short-term representations, at every instant, of the other sample time series in the same batch. The short-term contrastive objective is defined as

$$\mathcal{L}^{(sh)} = -\,\mathbb{E}_{t\sim u(T)}\left[\log \frac{\exp\big(f_1(c, v^{(sh)}_t)/\epsilon\big)}{\sum_{v \in V^{(sh)}} \exp\big(f_1(c, v)/\epsilon\big)}\right]$$

where $f_1$ is an evaluation function structured as a two-layer MLP (the first layer with a ReLU activation, the second with a linear activation) that takes the concatenation of the context hidden variable $c$ and a short-term representation as input and measures the matching degree between the two representations, $\epsilon$ is the temperature parameter of the SoftMax, and $u(T)$ denotes uniform sampling over the instants 1, 2, …, T.
Similarly, given $c$ and a set $V^{(lo)}$, the long-term contrastive objective is

$$\mathcal{L}^{(lo)} = -\,\mathbb{E}_{t\sim u(T)}\left[\log \frac{\exp\big(f_2(c, v^{(lo)}_t)/\epsilon\big)}{\sum_{v \in V^{(lo)}} \exp\big(f_2(c, v)/\epsilon\big)}\right]$$
where $f_2$ is an evaluation function with the same structure as $f_1$. By the properties of contrastive learning,

$$\mathcal{I}(c;\,x) \;\ge\; \log(K+1) - \mathcal{L}^{(sh/lo)}$$

where $\mathcal{I}(c; x)$ denotes the mutual information between $c$ and $x$. It follows that minimizing $\mathcal{L}^{(sh)}$ and $\mathcal{L}^{(lo)}$ maximizes a lower bound of the mutual information, so that the context hidden variable $c$ fully preserves the local (unique) information of the sequence $x$.
To make $c$ filter out global information, the invention uses KL-divergence regularization:

$$\mathcal{L}_{KL} = D_{KL}\big(q(c \mid x)\,\big\|\,p(c)\big)$$

where $p(c) = \mathcal{N}(0, I)$ is the prior and $q(c \mid x)$ is the Gaussian posterior distribution of $c$ output by the context recognition network. This regularization term introduces the prior with the aim of constraining the amount of information in $c$ to be as small as possible. Therefore, under the combined action of the two objectives — maximizing mutual information and minimizing $\mathcal{L}_{KL}$ — the context hidden variable automatically filters out global information while preserving as much local information as possible. The final multi-view contrastive objective is

$$\mathcal{L}_{CMC} = \mathcal{L}^{(sh)} + \mathcal{L}^{(lo)} + \alpha\,\mathcal{L}_{KL}$$

where $\alpha$ is an adjustable hyper-parameter. A minimal sketch of one InfoNCE-style contrastive term appears below.
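A sketch of one contrastive term as described above, assuming a two-layer MLP scorer for $f_1$ and a temperature $\epsilon$; all shapes and default values are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Scorer(nn.Module):
    """f_1: scores how well a context c matches a representation v."""
    def __init__(self, d_c=64, d_v=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_c + d_v, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, c, v):                 # c: (N, d_c), v: (N, d_v)
        return self.mlp(torch.cat([c, v], dim=-1)).squeeze(-1)

def info_nce(scorer, c, positive, negatives, eps=0.1):
    """c: (d_c,); positive: (d_v,); negatives: (K, d_v) from other series."""
    cands = torch.cat([positive[None], negatives], dim=0)    # (K+1, d_v)
    logits = scorer(c.expand(len(cands), -1), cands) / eps   # (K+1,)
    # The positive sits at index 0, so the loss is -log softmax_0(logits).
    return -torch.log_softmax(logits, dim=0)[0]
```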
Finally, the context hidden variable $c$ is input to the parameter generation network, an MLP composed of several fully connected layers (hidden layers with ReLU activations and a linear output layer), which maps it to the parameters $\phi$ of the local feature encoder. A sketch of this hypernetwork-style generation follows.
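A sketch of the parameter generation idea, shrunk for illustration to producing the weights of a single linear layer; a real local feature encoder has many more parameters, and all names here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParamGenerator(nn.Module):
    """Maps the context hidden variable c to a flat parameter vector."""
    def __init__(self, d_c=64, in_dim=64, out_dim=64, hidden=128):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        n_params = out_dim * in_dim + out_dim        # weight + bias
        self.mlp = nn.Sequential(nn.Linear(d_c, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_params))

    def forward(self, c):                            # c: (d_c,)
        flat = self.mlp(c)
        W = flat[: self.out_dim * self.in_dim].view(self.out_dim, self.in_dim)
        b = flat[self.out_dim * self.in_dim:]
        return W, b

# The generated parameters are applied functionally: gradients flow into the
# generator's own weights, while W and b are never trained directly.
def local_layer(x, W, b):                            # x: (T, in_dim)
    return F.linear(x, W, b)
```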
Step 6: load the parameters $\phi$ generated by the adaptive parameter generation module into the local feature encoder.
Step 7: input the sample time series $x_{1:T}$ to the local feature encoder and output the local feature representation $h^{(l)}_{1:T}$.

The structure of the local feature encoder matches that of the global feature encoder except that it contains no VQ module $q$; its parameters $\phi$ do not participate in back-propagation and are produced directly by the adaptive parameter generation module. The specific process is: first, the sample time series $x_{1:T}$ is input to a short-term feature extractor consisting of a multi-layer 1D convolutional neural network, giving a short-term representation $z_{1:T}$ for each instant in the sequence; then $z_{1:T}$ is input to a Transformer encoder, which models the long-term dependencies over the whole sequence and outputs the local feature representation $h^{(l)}_{1:T}$ for each instant.
Step 8: concatenate the global and local feature representations and input the result to the convolutional Transformer decoder to obtain the prediction $\hat x_{T+1:T+\tau}$, where $\tau$ denotes the prediction step size.

The convolutional Transformer decoder is formed by stacking one convolution module and several identical attention modules; its structure is shown in Fig. 5. The convolution module has the same structure as the one in the encoder, and each attention module comprises: a masked multi-head self-attention layer over the decoder output, which introduces a backward masking mechanism so that data after a given instant cannot be seen when predicting that instant; a multi-head attention layer over the encoder output; and a feed-forward network of two fully connected layers (the first with a ReLU activation and the second with a linear activation). The last attention module outputs the predicted values $\hat x_{T+1:T+\tau}$ for the future $\tau$ steps. A sketch of the backward masking appears below.
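A sketch of the backward (causal) masking in the decoder's masked self-attention: position t may only attend to positions up to t, so future values stay hidden during prediction. The mask construction is a standard pattern, shown here as an assumed implementation.

```python
import torch

def masked_attention(Q, K, V):
    T, d_k = Q.size(-2), Q.size(-1)
    # True on/below the diagonal = allowed; the upper triangle (the future)
    # is masked out with -inf so SoftMax assigns it zero weight.
    allow = torch.tril(torch.ones(T, T, dtype=torch.bool, device=Q.device))
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5
    scores = scores.masked_fill(~allow, float("-inf"))
    return torch.softmax(scores, dim=-1) @ V
```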
Step 9: compute the prediction objective $\mathcal{L}_{pred}$, i.e., the error between the ground truth $x_{T+1:T+\tau}$ corresponding to the sample time series and the actual predicted output $\hat x_{T+1:T+\tau}$.

The invention uses the mean absolute error as the prediction objective:

$$\mathcal{L}_{pred} = \frac{1}{\tau} \sum_{t=1}^{\tau} \big| x_{T+t} - \hat x_{T+t} \big|$$
step 10, calculating a prediction objective function
Figure BDA00030298552600001111
Multi-field contrast coding objective function>
Figure BDA00030298552600001110
And vector quantization constraint objective function->
Figure BDA0003029855260000117
Sum->
Figure BDA00030298552600001112
Step 11: adjust the network parameters of the whole model according to the loss over all samples in the batch.

The batch loss is

$$\mathcal{L}_{batch} = \sum_{i=1}^{M} \mathcal{L}_i$$

where $\mathcal{L}_i$ is the loss of the i-th sample in the batch. The learnable parameters $\theta$ of the whole model are adjusted according to this loss with the update formula

$$\theta \leftarrow \theta - \eta\, \nabla_{\theta}\, \mathcal{L}_{batch}$$

where $\eta$ is the learning rate. A sketch of one such training step follows.
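A sketch of one optimization step combining the three objectives as summed above; the interface of `model` (returning its prediction plus the two auxiliary losses) is an assumption for illustration, not the patent's specification.

```python
def train_step(model, optimizer, batch_x, batch_y):
    # Assumed interface: the model returns predictions plus both
    # auxiliary losses accumulated during the forward pass.
    pred, loss_vq, loss_cmc = model(batch_x)
    loss_pred = (batch_y - pred).abs().mean()     # mean absolute error
    loss = loss_pred + loss_cmc + loss_vq         # total loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                              # theta <- theta - eta * grad
    return loss.item()
```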
Step 12: repeat steps 3-11 until all batches of the training data set have participated in model training.
Step 13: repeat steps 3-12 until the specified number of iterations is reached.
Step 14: input the sample time series to be predicted into the trained model to obtain the prediction result.
The foregoing describes preferred embodiments of the invention and their advantages in detail. It should be appreciated that the description is merely illustrative of the presently preferred embodiments; any changes, additions, substitutions, and equivalents that do not depart from the spirit of the invention are intended to be included within its scope.

Claims (8)

1. A depth decoupling time series prediction method, characterized in that it is applied to the prediction of road traffic flow in the traffic field and comprises the following steps:
collecting time sequence data, wherein the time sequence data is road traffic flow, and preprocessing the road traffic flow to obtain preprocessed road traffic flow;
constructing a time series prediction model comprising a global feature encoder, an adaptive parameter generation module, a local feature encoder, and a decoder, wherein the global feature encoder encodes the road traffic flow into a global feature representation, the adaptive parameter generation module generates local feature encoder parameters from the road traffic flow, the local feature encoder encodes the road traffic flow into a local feature representation based on the loaded local feature encoder parameters, and the decoder decodes the concatenation of the global feature representation and the local feature representation and outputs the predicted road traffic flow;
and performing parameter optimization on the time series prediction model with the road traffic flow, and using the parameter-optimized time series prediction model for the prediction of road traffic flow.
2. The depth decoupling time series prediction method of claim 1, wherein the preprocessing comprises outlier detection and removal, missing-value imputation, and normalization.
3. The depth decoupling time series prediction method of claim 1, wherein the global feature encoder comprises a short-term feature extractor constructed from a convolutional neural network, a vector quantization module, and a Transformer encoder comprising a stack of attention modules, wherein the short-term feature extractor is configured to extract short-term features of the input road traffic flow to obtain a short-term representation of the road traffic flow, the vector quantization module is configured to quantize the input short-term representation to obtain an encoded vector, and the Transformer encoder is configured to model long-term dependencies in the whole road traffic data based on the encoded vector and output the global feature representation of the road traffic flow.
4. The depth decoupling time series prediction method of claim 1, wherein the adaptive parameter generation module adopts multi-view contrastive coding to encode the road traffic flow and output the local feature encoder parameters.
5. The depth decoupling time series prediction method of claim 4, wherein the adaptive parameter generation module comprises a context recognition network and a parameter generation network, wherein the context recognition network comprises a convolution module, a Transformer encoder, and an LSTM aggregator connected in sequence and configured to map the road traffic flow to a context hidden variable, and the parameter generation network comprises a fully connected network configured to generate the parameters of the local feature encoder from the context hidden variable.
6. The depth decoupling time series prediction method of claim 1, wherein the parameters of the local feature encoder do not participate in training and are generated by the adaptive parameter generation module, the local feature encoder comprising a short-term feature extractor and a Transformer encoder comprising a stack of attention modules, wherein the short-term feature extractor is configured to extract short-term features of the input road traffic flow to obtain a short-term representation of the road traffic flow, and the Transformer encoder is configured to model long-term dependencies in the whole road traffic data based on the short-term representation and output the local feature representation of the road traffic flow.
7. The depth decoupling time series prediction method of claim 1, wherein the decoder comprises a convolution module and a plurality of identical attention modules, wherein the convolution module is configured to perform a convolution operation on the concatenation of the input global feature representation and local feature representation, and the attention modules are configured to perform attention computations based on the convolution result and output the predicted road traffic flow.
8. The depth decoupling time series prediction method of claim 1, wherein the loss function $\mathcal{L}$ employed in parameter optimization of the time series prediction model is:

$$\mathcal{L} = \mathcal{L}_{pred} + \mathcal{L}_{CMC} + \mathcal{L}_{VQ}$$

wherein $\mathcal{L}_{pred}$ is the prediction objective function, expressed as:

$$\mathcal{L}_{pred} = \frac{1}{\tau}\sum_{t=1}^{\tau}\big|x_{T+t} - \hat x_{T+t}\big|$$

$\mathcal{L}_{CMC}$ is the multi-view contrastive coding objective function, expressed as:

$$\mathcal{L}_{CMC} = \mathcal{L}^{(sh)} + \mathcal{L}^{(lo)} + \alpha\, D_{KL}\big(q(c\mid x)\,\big\|\,\mathcal{N}(0, I)\big),\qquad \mathcal{L}^{(sh/lo)} = -\,\mathbb{E}_{t\sim u(T)}\left[\log\frac{\exp\big(f_{1/2}(c, v^{(sh/lo)}_t)/\epsilon\big)}{\sum_{v\in V^{(sh/lo)}}\exp\big(f_{1/2}(c, v)/\epsilon\big)}\right]$$

and $\mathcal{L}_{VQ}$ is the vector quantization constraint objective function, expressed as:

$$\mathcal{L}_{VQ} = \frac{1}{T}\sum_{t=1}^{T}\Big(\|\mathrm{sg}(z_t) - \hat z_t\|_2^2 + \gamma\,\|z_t - \mathrm{sg}(\hat z_t)\|_2^2\Big)$$

wherein $x_{T+t}$ and $\hat x_{T+t}$ denote the true and predicted road traffic flow t steps after time T, $\tau$ denotes the prediction step size, $f_1$ and $f_2$ are the evaluation functions for contrastive learning, $c$ denotes the context hidden variable, $v^{(sh)}_t$ denotes the short-term representation generated in the adaptive parameter generation module at instant t and the remaining elements of the set are its interference terms, $\epsilon$ is the temperature parameter of SoftMax, $u(T)$ denotes uniform sampling over the instants 1, 2, …, T, $q(c\mid x)$ is the Gaussian posterior distribution of $c$ output by the context recognition network, $\mathbb{E}$ denotes the mathematical expectation, $V^{(lo)}$ denotes the long-term representation set, $V^{(sh)}$ denotes the short-term representation set, $\alpha$ is an adjustable hyper-parameter, $D_{KL}$ denotes the KL divergence, $\mathrm{sg}(\cdot)$ is the gradient cut-off operation satisfying $\mathrm{sg}(z) \equiv z$ and $\nabla\,\mathrm{sg}(z) = 0$, $\gamma$ is an adjustable hyper-parameter, $z$ denotes the short-term representation generated by the global feature encoder, and $\hat z$ denotes the vector-quantized encoding of $z$.
CN202110426703.0A 2021-04-20 2021-04-20 Depth decoupling time sequence prediction method Active CN113177633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110426703.0A CN113177633B (en) 2021-04-20 2021-04-20 Depth decoupling time sequence prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110426703.0A CN113177633B (en) 2021-04-20 2021-04-20 Depth decoupling time sequence prediction method

Publications (2)

Publication Number Publication Date
CN113177633A CN113177633A (en) 2021-07-27
CN113177633B true CN113177633B (en) 2023-04-25

Family

ID=76924167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110426703.0A Active CN113177633B (en) 2021-04-20 2021-04-20 Depth decoupling time sequence prediction method

Country Status (1)

Country Link
CN (1) CN113177633B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762356B (en) * 2021-08-17 2023-06-16 中山大学 Cluster load prediction method and system based on clustering and attention mechanism
CN114021803A (en) * 2021-10-29 2022-02-08 华能酒泉风电有限责任公司 Wind power prediction method, system and equipment based on convolution transform architecture
CN114239718B (en) * 2021-12-15 2024-03-01 杭州电子科技大学 High-precision long-term time sequence prediction method based on multi-element time sequence data analysis
CN114297379A (en) * 2021-12-16 2022-04-08 中电信数智科技有限公司 Text binary classification method based on Transformer
CN114580710B (en) * 2022-01-28 2024-04-30 西安电子科技大学 Environmental monitoring method based on transducer time sequence prediction
CN114936723B (en) * 2022-07-21 2023-04-14 中国电子科技集团公司第三十研究所 Social network user attribute prediction method and system based on data enhancement
CN115659852B (en) * 2022-12-26 2023-03-21 浙江大学 Layout generation method and device based on discrete potential representation
CN116153089B (en) * 2023-04-24 2023-06-27 云南大学 Traffic flow prediction system and method based on space-time convolution and dynamic diagram
CN116776228B (en) * 2023-08-17 2023-10-20 合肥工业大学 Power grid time sequence data decoupling self-supervision pre-training method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110718301A (en) * 2019-09-26 2020-01-21 东北大学 Alzheimer disease auxiliary diagnosis device and method based on dynamic brain function network
CN111243269A (en) * 2019-12-10 2020-06-05 福州市联创智云信息科技有限公司 Traffic flow prediction method based on depth network integrating space-time characteristics

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11144683B2 (en) * 2016-12-06 2021-10-12 General Electric Company Real-time adaptation of system high fidelity model in feature space

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110718301A (en) * 2019-09-26 2020-01-21 东北大学 Alzheimer disease auxiliary diagnosis device and method based on dynamic brain function network
CN111243269A (en) * 2019-12-10 2020-06-05 福州市联创智云信息科技有限公司 Traffic flow prediction method based on depth network integrating space-time characteristics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Zhichao; Shi Zhiyu; Zhang Jie. Modal parameter identification of time-varying systems based on improved MCPP. Low Temperature Architecture Technology, 2016, (12), full text. *

Also Published As

Publication number Publication date
CN113177633A (en) 2021-07-27

Similar Documents

Publication Publication Date Title
CN113177633B (en) Depth decoupling time sequence prediction method
Liu et al. Remaining useful life prediction using a novel feature-attention-based end-to-end approach
Yin et al. Deep forest regression for short-term load forecasting of power systems
Ding et al. Point and interval forecasting for wind speed based on linear component extraction
Ibrahim et al. Short‐Time Wind Speed Forecast Using Artificial Learning‐Based Algorithms
CN111079989B (en) DWT-PCA-LSTM-based water supply amount prediction device for water supply company
CN113128113B (en) Lean information building load prediction method based on deep learning and transfer learning
CN113159389A (en) Financial time sequence prediction method based on deep forest generation countermeasure network
CN112434891A (en) Method for predicting solar irradiance time sequence based on WCNN-ALSTM
CN115204035A (en) Generator set operation parameter prediction method and device based on multi-scale time sequence data fusion model and storage medium
CN116643949A (en) Multi-model edge cloud load prediction method and device based on VaDE clustering
CN115840893A (en) Multivariable time series prediction method and device
Lyu et al. Dynamic feature selection for solar irradiance forecasting based on deep reinforcement learning
CN116014722A (en) Sub-solar photovoltaic power generation prediction method and system based on seasonal decomposition and convolution network
Wei et al. A three-stage multi-objective heterogeneous integrated model with decomposition-reconstruction mechanism and adaptive segmentation error correction method for ship motion multi-step prediction
CN117094451B (en) Power consumption prediction method, device and terminal
Vogt et al. Wind power forecasting based on deep neural networks and transfer learning
Ziyabari et al. Multi-branch resnet-transformer for short-term spatio-temporal solar irradiance forecasting
CN116245259B (en) Photovoltaic power generation prediction method and device based on depth feature selection and electronic equipment
CN117251705A (en) Daily natural gas load prediction method
Qi et al. Using stacked auto-encoder and bi-directional LSTM for batch process quality prediction
Xu et al. A hybrid model for multi-step wind speed forecasting based on secondary decomposition, deep learning, and error correction algorithms
CN116402194A (en) Multi-time scale load prediction method based on hybrid neural network
CN115860232A (en) Steam load prediction method, system, electronic device and medium
CN115544890A (en) Short-term power load prediction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant