CN113177633A - Deep decoupling time sequence prediction method - Google Patents

Deep decoupling time sequence prediction method

Info

Publication number
CN113177633A
CN113177633A
Authority
CN
China
Prior art keywords
time series
term
representation
encoder
short
Prior art date
Legal status
Granted
Application number
CN202110426703.0A
Other languages
Chinese (zh)
Other versions
CN113177633B (en)
Inventor
陈岭
陈纬奇
张友东
文波
杨成虎
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202110426703.0A
Publication of CN113177633A
Application granted
Publication of CN113177633B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Abstract

The invention discloses a deep decoupling time series prediction method comprising the following steps: 1) preprocess the given time series data to construct a training data set; 2) capture the global variation patterns shared by multiple time series with a vector-quantized global feature encoder; 3) capture the local variation patterns specific to a single time series with a local feature encoder, where each time series has its own set of local feature encoder parameters generated by an adaptive parameter generation module; 4) input the outputs of the global and local feature encoders to a decoder for prediction. The invention decouples the dynamics of a time series into global and local variation patterns and models them separately. It addresses the problems that existing models can neither fully exploit the knowledge shared within a data set nor adequately model the local variation pattern specific to a single time series, thereby improving prediction accuracy, and has broad application prospects in fields such as traffic prediction, supply chain management and financial investment.

Description

Deep decoupling time sequence prediction method
Technical Field
The invention relates to the field of time series data prediction, in particular to a deep decoupling time series prediction method.
Background
Time series are ubiquitous in fields such as traffic, electricity, medical treatment and finance. Time series prediction (i.e., predicting future observations based on the observations over a period of historical time) is an important research topic in data mining. In today's big-data era, a single time series rarely exists in isolation: a data set usually contains multiple correlated time series, which exhibit both global (shared by multiple time series) and local (specific to a single time series) variation patterns. As shown in fig. 1, the road usage time series of all roads share the same period (24 hours) and have a morning peak and an evening peak, i.e., a global variation pattern; road 1 has slight morning and evening peaks, road 2 has a distinct morning peak and no evening peak, road 3 has a slight morning peak and a distinct evening peak, and road 4 has strong morning and evening peaks, i.e., local variation patterns. A good time series prediction model should capture both kinds of patterns simultaneously.
Time series prediction models based on statistical machine learning, such as AR, ARIMA, exponential smoothing and linear state space models, are trained and make predictions on a single time series; they cannot model the variation patterns common to a multivariate time series data set and therefore cannot benefit from this global knowledge.
Classical deep learning models, such as prediction models based on RNNs, TCNs and Transformers, are currently the most widely used class of methods in this field. Such models use all the data in a data set to train one set of shared model parameters, treating the information of all time series equally. However, capturing global information through simple parameter sharing in this way is insufficient, because at prediction time the model only takes the historical data of a single time series as input and cannot explicitly introduce global information or information from related sequences.
Some recent approaches use matrix factorization to represent the original time series as linear combinations of k latent time series (with k much smaller than the number of time series in the data set), capturing the common patterns of the multivariate time series through the latent series. However, matrix factorization operates in a linear feature space and cannot capture complex global variation patterns.
Disclosure of Invention
In view of the foregoing, an object of the present invention is to provide a deep decoupling time series prediction method, which can improve the prediction accuracy of time series data while reducing the computation consumption by effectively modeling global and local variation patterns of time series.
In order to achieve the purpose, the invention provides the following technical scheme:
a deep decoupling time series prediction method is applied to prediction of time series data in the traffic field, the electric power field, the medical field and the financial field, and comprises the following steps:
collecting a time sequence, and preprocessing the time sequence to obtain a preprocessed time sequence;
the method comprises the steps of constructing a time sequence prediction model, wherein the time sequence prediction model comprises a global feature encoder, an adaptive parameter generation module, a local feature encoder and a decoder, the global feature encoder is used for encoding a time sequence into a global feature representation, the adaptive parameter generation module is used for generating local feature encoder parameters according to the time sequence, the local feature encoder encodes the time sequence into a local feature representation based on the loaded local feature encoder parameters, and the decoder is used for decoding the splicing result of the global feature representation and the local feature representation and outputting predicted time sequence data;
and performing parameter optimization on the time series prediction model by using the time series data, and using the time series prediction model with the optimized parameters for prediction of the time series.
Preferably, the preprocessing includes outlier detection and removal, missing value supplementation, and normalization processing.
Preferably, the global feature encoder includes a short-term feature extractor constructed from a convolutional neural network, a vector quantization module, and a Transformer encoder composed of multiple stacked attention modules. The short-term feature extractor performs short-term feature extraction on the input time series to obtain a short-term representation of the series; the vector quantization module performs vector-quantized encoding of the input short-term representation to obtain an encoded vector; and the Transformer encoder models the long-term dependencies in the whole time series based on the encoded vector and outputs the global feature representation of the series.
Preferably, the adaptive parameter generation module encodes the time series using multi-view contrastive coding and outputs the local feature encoder parameters.
Preferably, the adaptive parameter generation module includes a context recognition network and a parameter generation network. The context recognition network comprises a convolution module, a Transformer encoder and an LSTM aggregator connected in sequence and maps the time series to a context hidden variable; the parameter generation network comprises a fully connected network and generates the parameters of the local feature encoder from the context hidden variable.
Preferably, the parameters of the local feature encoder are not involved in training and are generated by an adaptive parameter generation module, the local feature encoder includes a short-term feature extractor and a Transformer encoder composed of a plurality of attention modules stacked, wherein the short-term feature extractor is configured to perform short-term feature extraction on the input time series to obtain a short-term representation of the time series, and the Transformer encoder is configured to model a long-term dependency relationship in the entire time series based on the short-term representation and output a local feature representation of the time series.
Preferably, the decoder comprises a convolution module and a plurality of same attention modules, wherein the convolution module is used for performing convolution operation on the result of splicing the input global feature representation and the local feature representation, and the attention module is used for performing connection calculation based on the convolution result and outputting the predicted time series data.
Preferably, the loss function L used for optimizing the parameters of the time series prediction model is:

L = L_pred + L_CMC + L_VQ

where L_pred is the prediction objective function, L_CMC is the multi-view contrastive coding objective function and L_VQ is the vector quantization constraint objective function.
compared with the prior art, the invention has the beneficial effects that at least:
according to the deep decoupling time sequence prediction method provided by the invention, the dynamic decoupling of the time sequence is a global change mode and a local change mode, and the global change mode and the local change mode are respectively modeled by using a global feature encoder and a local feature encoder, so that a vector quantization global encoder is used for learning an encoding table representing the global change mode, the knowledge shared in a data set is fully utilized for modeling the global change mode, a self-adaptive parameter generation module is used for generating a specific local feature encoder parameter for each time sequence, and the heterogeneous local change mode is effectively modeled. And improving the prediction precision of the time series based on the global and local change modes.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a time-series diagram of road utilization in the background art;
FIG. 2 is an overall flowchart of a deep decoupling time series prediction method provided by the embodiment;
FIG. 3 is an overall block diagram of a deep decoupled time series prediction method provided by an embodiment;
FIG. 4 is a learning process of a global representation and a short-term representation provided by an embodiment;
fig. 5 is a schematic structural diagram of a convolutional Transformer decoder according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.
In order to improve the prediction accuracy of time series, this embodiment provides a deep decoupling time series prediction method: the dynamics of a time series are decoupled into a global variation pattern and a local variation pattern, each is modeled separately, and time series prediction is then performed based on the two patterns. The deep decoupling time series prediction method can be applied in the traffic, electric power, medical and financial fields; that is, the time series may be data such as road traffic flow, user electricity consumption or stock prices.
FIG. 2 is an overall flowchart of the deep decoupling time series prediction method provided by the embodiment; fig. 3 is an overall block diagram of the method. As shown in fig. 2 and fig. 3, the deep decoupling time series prediction method provided by the embodiment includes the following steps:
step 1, acquiring time sequence data, performing abnormal value elimination processing and normalization processing on the acquired time sequence, and dividing the processed data by using a sliding time window to obtain a training data set.
Outlier detection and removal are performed on the given time series, and invalid values (such as values outside the normal range and missing values) are filled using linear interpolation. All values in the time series are then min-max normalized so that each value falls in the range [-1, 1], with the conversion formula:

x' = 2 (x - x_min) / (x_max - x_min) - 1

where x is a value in the original time series, x_min and x_max are the minimum and maximum values in the time series, and x' is the normalized value.
The time window size T is set manually from experience, and the normalized data are divided with a fixed-length sliding step to obtain the training data set.
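As an illustrative sketch of this preprocessing (the window size, step length and toy series below are assumed example values, not those of the patent):

```python
import numpy as np

def normalize(x):
    # Min-max normalization to the range [-1, 1]:
    # x' = 2 * (x - x_min) / (x_max - x_min) - 1
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def sliding_windows(x, window, step=1):
    # Divide a normalized series into fixed-length training samples.
    return np.stack([x[i:i + window]
                     for i in range(0, len(x) - window + 1, step)])

series = normalize([10.0, 12.0, 11.0, 15.0, 14.0, 13.0, 20.0, 18.0])
samples = sliding_windows(series, window=4, step=2)  # shape (3, 4)
```

Missing-value interpolation (e.g. `np.interp` over the valid indices) would precede the normalization in practice.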
Step 2, the training data set is batched according to a fixed batch size, with a total of N batches.

The training data set is batched according to an empirically set batch size M; the total number of batches is computed as:

N = ceil( N_samples / M )

where N_samples is the total number of samples in the training data set.
Step 3, a batch of training samples with index k is selected in sequence from the training data set, where k ∈ {0, 1, …, N - 1}. Steps 4-10 are repeated for each training sample in the batch.
Step 4, the sample time series x_{1:T} is input to the vector-quantized global feature encoder, which outputs the global feature representation g_{1:T}, and the vector quantization constraint objective function L_VQ is calculated, where T denotes the input step size.
The global feature encoder shown in fig. 3(b) is used for modeling a global change pattern of a time sequence, and specifically includes the following steps:
first, time-series x samples1:TInputting the short-term representation z to a short-term feature extractor consisting of a multilayer 1D convolutional neural network to obtain a short-term representation z of each time in the sequence1:T. The convolution layer has a sliding step size of 1, a padding mechanism is used to make the output step size consistent with the input, and a small-sized convolution kernel is used to capture the short-term change pattern in the sequence.
Then, the short term representation z1:TAn input Vector Quantization (VQ) module q, the VQ module maintaining a representationCoding table of global change mode
Figure BDA0003029855260000065
It contains F d-dimensional vectors, shared among all sequences, to represent the global change pattern of the sequences, the vectors in the encoding table e being called global representation. The VQ module represents z in a short term at each time1:TRespectively mapped into vectors in a coding table e to obtain
Figure BDA0003029855260000066
In particular, it replaces the original short-term representation z by means of a nearest neighbor search
Figure BDA0003029855260000067
The calculation formula is as follows: i ═ arg minj||z-ej||2, wherein ,eiRepresenting the ith vector in the encoding table e. It should be noted that the arg min operation is not differentiable, and therefore is directly related to the objective function
Figure BDA0003029855260000068
Replaces the gradient with respect to z. In particular, in the forward propagation process, the nearest neighbor global representation
Figure BDA0003029855260000069
The input to the downstream network is, in the reverse propagation process,
Figure BDA00030298552600000610
passed back unchanged to the upstream convolutional network. Under such a training mode, if only the prediction objective function is used, the global representation e in the coding table is encodediIt is not updated, so the invention introduces a vector quantization constraint objective function for learning the global representation:
Figure BDA0003029855260000071
wherein sg () is a gradient truncation operation satisfying sg (z) ≡ z,
Figure BDA0003029855260000072
gamma is an adjustable hyper-parameter. As indicated by the dark grey arrows in fig. 4, in predicting the objective function
Figure BDA0003029855260000073
And
Figure BDA0003029855260000074
under the combined action of z, the short-term representation z selects a suitable global representation, and z and
Figure BDA0003029855260000075
the difference is as small as possible; as indicated by the light grey arrows in figure 4,
Figure BDA0003029855260000076
driving the global representation in the encoding table to move towards the original short-term representation z, when there are multiple original short-term representations mapped to the same global representation, the global representation tends towards the cluster center of these original representations, which also makes the global representation learned by the VQ module more representative.
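For illustration, the nearest-neighbor lookup and the two terms of the vector quantization constraint objective can be sketched in NumPy (the coding-table size F, dimension d and random inputs are arbitrary example values; in the model the two loss terms differ only in gradient flow, so they coincide numerically here):

```python
import numpy as np

rng = np.random.default_rng(0)
F, d, T = 8, 4, 16
e = rng.normal(size=(F, d))      # coding table: F global representations of dim d
z = rng.normal(size=(T, d))      # short-term representations z_1:T

# Nearest-neighbor search: i = argmin_j ||z - e_j||_2
d2 = ((z[:, None, :] - e[None, :, :]) ** 2).sum(axis=-1)
idx = d2.argmin(axis=1)
z_q = e[idx]                     # quantized (global) representations

# Vector quantization constraint objective; sg() only affects gradients,
# so both terms evaluate to the same number in this forward-only sketch.
gamma = 0.25
codebook_term = ((z - z_q) ** 2).sum(axis=-1).mean()   # ~ ||sg(z) - z_q||^2
commit_term = ((z - z_q) ** 2).sum(axis=-1).mean()     # ~ ||z - sg(z_q)||^2
l_vq = codebook_term + gamma * commit_term
```

In a trainable implementation the straight-through trick is typically written as `z + stop_gradient(z_q - z)` so that gradients bypass the arg min.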
Finally, a Transformer encoder is used to model the long-term dependencies in the whole sequence and to output a long-term representation for each time step of the sequence, namely the global feature representation g_{1:T}.
The Transformer encoder is composed of multiple stacked attention modules; a single module comprises a multi-head self-attention layer and a feed-forward network formed by two fully connected layers (the first layer uses a ReLU activation function, the second a linear activation function). The attention mechanism can be expressed as a mapping from a query and a set of key/value pairs to an output: it computes the matching degree between the query and each key, assigns a weight coefficient to the value corresponding to each key, and finally outputs the weighted sum of the values. The calculation is:

Attention(Q, K, V) = SoftMax( Q K^T / sqrt(d_k) ) V

where Q denotes the queries, K the keys and V the values. The calculation proceeds in three steps: first, compute the inner product of the query with each key and divide by the factor sqrt(d_k) to obtain unnormalized weight coefficients, where the factor sqrt(d_k) plays a scaling role and prevents the gradient of the SoftMax function from vanishing; second, normalize the weight coefficients with the SoftMax function; third, take the weighted sum of the values with the normalized coefficients to obtain the final output. The multi-head attention mechanism actually computes several groups of attention and concatenates the results of the groups to obtain the output:

MultiHead(Q, K, V) = Concat(head_1, …, head_h) W^O,  head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)

where Concat denotes the tensor concatenation operation, and W_i^Q, W_i^K and W_i^V are the mapping matrices of the i-th attention group, mapping the original query, key and value into their corresponding spaces. In the multi-head self-attention layer of the invention, the query Q, the key K and the value V are identical: in the first layer they are the output ẑ_{1:T} of the vector quantization module, and in subsequent layers they are the output of the previous attention module. The long-term representation produced by the Transformer encoder is g_{1:T}. The feed-forward network is a fixed substructure of the Transformer model and mainly performs a spatial mapping.
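The attention computation above can be sketched numerically (shapes and the two-head split are arbitrary example values):

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    ea = np.exp(a)
    return ea / ea.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Attention(Q, K, V) = SoftMax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

rng = np.random.default_rng(1)
T, d = 6, 8
x = rng.normal(size=(T, d))      # in self-attention, Q, K and V coincide
out = attention(x, x, x)

# Two-head attention: run two smaller attentions and concatenate the results.
h = 2
Wq, Wk, Wv = (rng.normal(size=(h, d, d // h)) for _ in range(3))
multi = np.concatenate(
    [attention(x @ Wq[i], x @ Wk[i], x @ Wv[i]) for i in range(h)], axis=-1)
```

The output projection W^O and the feed-forward sublayer are omitted for brevity.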
Step 5, the sample time series x_{1:T} is input to the adaptive parameter generation module based on multi-view contrastive coding, which generates the local feature encoder parameters φ, and the multi-view contrastive coding objective function L_CMC is calculated.

As shown in fig. 3(a), the adaptive parameter generation module based on multi-view contrastive coding includes a context recognition network and a parameter generation network. The specific steps for generating the local feature encoder parameters φ are as follows:
First, x_{1:T} is input to the context recognition network, which comprises a convolution module, a Transformer encoder and an LSTM aggregator, and maps the input sequence to a context hidden variable c. The multi-view contrastive coding (CMC) method and its corresponding KL-divergence regularization enable c to fully retain the information of the sequence's local variation pattern while filtering out global information (since the latter has already been modeled in the global feature encoder). CMC uses a contrastive learning method to maximize the mutual information between the context hidden variable c output by the LSTM aggregator and both the short-term representation v^(sh) output by the convolution module and the long-term representation v^(lo) output by the Transformer encoder, so that c effectively captures the information specific to the original sequence (its specific long-/short-term variation patterns).

CMC solves two contrastive learning tasks, each of which requires, given the context hidden variable c, selecting the correct short-term representation or long-term representation from among interference terms; the corresponding objective functions are L_sh and L_lo. In addition, a regularization term L_KL ensures that c filters out global information.
Take the short-term contrastive task as an example. Given the output c of the context recognition network and a set V^(sh) = {v_t^(sh), ṽ_1^(sh), …, ṽ_K^(sh)} that includes the short-term representation v_t^(sh) of the input time series at time t (the positive sample) and K interference terms (negative samples), the model must identify the correct short-term representation from the set V^(sh). The interference terms are obtained by uniformly sampling, over all time steps, the short-term representations of the other sample time series in the same batch. The short-term contrastive objective function is defined as:

L_sh = - E_{t∼U(T)} [ log ( exp(f_1(c, v_t^(sh)) / ε) / Σ_{v∈V^(sh)} exp(f_1(c, v) / ε) ) ]

where f_1 is an evaluation function structured as a two-layer MLP (the first layer with a ReLU activation function, the second with a linear activation function); it is applied to the concatenation of the context hidden variable c and a short-term representation and measures the degree of match between the two representations. ε is the temperature parameter of the SoftMax, and U(T) denotes uniform sampling over times 1, 2, …, T.
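A numerical sketch of this kind of contrastive objective (a plain dot-product score stands in for the two-layer MLP evaluation function f_1, and all sizes are arbitrary example values):

```python
import numpy as np

def info_nce(c, positive, negatives, eps=0.1):
    # -log( exp(f(c, v+)/eps) / sum_v exp(f(c, v)/eps) )
    # with a dot-product evaluation function standing in for the MLP f_1.
    cand = np.vstack([positive[None, :], negatives])  # positive sample first
    scores = cand @ c / eps
    scores -= scores.max()                            # numerical stability
    log_prob = scores - np.log(np.exp(scores).sum())
    return -log_prob[0]

rng = np.random.default_rng(2)
c = rng.normal(size=8)                   # context hidden variable
v_pos = c + 0.1 * rng.normal(size=8)     # matching short-term representation
v_neg = rng.normal(size=(5, 8))          # K = 5 interference terms
loss = info_nce(c, v_pos, v_neg)
```

A well-matched positive drives the loss toward zero; random positives leave it near log(K + 1).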
Similarly, given c and a set V^(lo) = {v_t^(lo), ṽ_1^(lo), …, ṽ_K^(lo)}, the long-term contrastive objective function is:

L_lo = - E_{t∼U(T)} [ log ( exp(f_2(c, v_t^(lo)) / ε) / Σ_{v∈V^(lo)} exp(f_2(c, v) / ε) ) ]

where f_2 is an evaluation function with the same structure as f_1. A property of contrastive learning is:

I(c; x) ≥ log(K + 1) - L_sh

where I(c; x) denotes the mutual information between c and x. Minimizing L_sh (and likewise L_lo) can therefore be seen as maximizing a lower bound on I(c; x), so that the context hidden variable c sufficiently retains the local (unique) information of the sequence x.
To make c filter out global information, the invention uses KL-divergence regularization:

L_KL = KL( q(c | x) || N(0, I) )

where q(c | x) is the Gaussian posterior distribution of c output by the context recognition network. The regularization term introduces a standard normal prior for c, whose goal is to constrain the amount of information carried by c to be as small as possible. Under the combined action of the two objectives of maximizing mutual information and minimizing the information content of c, global information is automatically filtered out of c, while the local information whose retention maximizes the mutual information is kept. The final multi-view contrastive objective function is computed as:

L_CMC = L_sh + L_lo + α · L_KL

where α is an adjustable hyper-parameter.
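The KL term for a diagonal-Gaussian posterior against the standard normal prior N(0, I) has a closed form; a small sketch (the diagonal-Gaussian parameterization and the example numbers are assumptions for illustration):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) )
    # = 0.5 * sum( mu^2 + exp(log_var) - log_var - 1 )
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0)

mu = np.array([0.5, -0.2, 0.0])
log_var = np.array([0.0, -0.5, 0.1])
l_kl = kl_to_standard_normal(mu, log_var)

# Final multi-view contrastive objective: L_CMC = L_sh + L_lo + alpha * L_KL
alpha, l_sh, l_lo = 0.1, 1.2, 0.9  # example values, not trained quantities
l_cmc = l_sh + l_lo + alpha * l_kl
```

The KL term is zero exactly when the posterior equals the prior, which is what pushes c to discard information not needed by the contrastive tasks.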
Finally, the context hidden variable c is input to the parameter generation network, an MLP composed of multiple fully connected layers (the hidden layers use the ReLU activation function and the output layer a linear activation function), and is mapped to the parameters φ of the local feature encoder.
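The parameter generation network is essentially a small hypernetwork: an MLP whose output vector is reshaped into the weights of the local feature encoder. A minimal sketch, generating the parameters of a single linear layer (all layer sizes are assumed example values):

```python
import numpy as np

rng = np.random.default_rng(3)

def relu(a):
    return np.maximum(a, 0.0)

# Context hidden variable c -> parameters phi of one linear layer
# of the local feature encoder, i.e. a (d_in, d_out) weight plus a bias.
d_c, d_in, d_out, hidden = 16, 8, 8, 32
W1 = rng.normal(size=(d_c, hidden)) * 0.1
W2 = rng.normal(size=(hidden, d_in * d_out + d_out)) * 0.1

def generate_parameters(c):
    phi = relu(c @ W1) @ W2                  # ReLU hidden layer, linear output
    W = phi[: d_in * d_out].reshape(d_in, d_out)
    b = phi[d_in * d_out:]
    return W, b

c = rng.normal(size=d_c)
W_loc, b_loc = generate_parameters(c)        # loaded into the local encoder
y = relu(rng.normal(size=d_in) @ W_loc + b_loc)
```

Only W1 and W2 (the generator) would be trained; W_loc and b_loc themselves receive no direct gradient updates, matching the description above.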
Step 6, the parameters φ generated by the adaptive parameter generation module are loaded into the local feature encoder.
Step 7, the sample time series x_{1:T} is input to the local feature encoder, which outputs the local feature representation l_{1:T}.

The structure of the local feature encoder is consistent with that of the global feature encoder except that it contains no VQ module q, and its parameters φ do not participate in back-propagation adjustment; they are generated directly by the adaptive parameter generation module.

The specific process is as follows: first, the sample time series x_{1:T} is input to a short-term feature extractor consisting of a multi-layer 1D convolutional neural network to obtain a short-term representation z_{1:T} for each time step of the sequence; then, z_{1:T} is input to a Transformer encoder, which models the long-term dependencies in the whole sequence and outputs the local feature representation l_{1:T} for each time step.
Step 8, the global feature representation and the local feature representation are concatenated and input to a convolutional Transformer decoder to obtain the prediction output x̂_{T+1:T+τ}, where τ denotes the prediction step size.

The convolutional Transformer decoder is formed by stacking a convolution module and several identical attention modules; its specific structure is shown in fig. 5. The structure of the convolution module is consistent with that of the convolution module in the encoder, and each attention module comprises: a masked multi-head self-attention layer over the decoder output, into which a backward mask mechanism is introduced to prevent future data from being seen when predicting the data at a given moment; a multi-head attention layer over the encoder output; and a feed-forward network consisting of two fully connected layers (the first layer uses a ReLU activation function, the second a linear activation function). The last attention module outputs the predicted values x̂_{T+1:T+τ} of the future τ steps.
Step 9, the prediction objective function L_pred is calculated, i.e., the error between the true values x_{T+1:T+τ} corresponding to the sample time series and the actually output predicted values x̂_{T+1:T+τ}.

The invention uses the mean absolute error as the prediction objective function L_pred, calculated as:

L_pred = (1/τ) Σ_{t=1}^{τ} | x_{T+t} - x̂_{T+t} |
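The mean absolute error over the τ forecast steps can be computed as, for example:

```python
import numpy as np

def mae(y_true, y_pred):
    # L_pred = (1/tau) * sum_t | x_{T+t} - x_hat_{T+t} |
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.abs(y_true - y_pred).mean()

l_pred = mae([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.0, 5.0])  # -> 0.625
```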
step 10, calculating a prediction objective function
Figure BDA00030298552600001111
Multi-view contrast coding objective function
Figure BDA00030298552600001110
Sum vector quantization constraint objective function
Figure BDA0003029855260000117
Sum of
Figure BDA00030298552600001112
Step 11, adjusting the network parameters in the entire model according to the loss $\mathcal{L}_{batch}$ of all samples in the batch, where $\mathcal{L}_{batch}$ is the sum of the losses $\mathcal{L}$ of the individual samples in the batch. According to the loss $\mathcal{L}_{batch}$, the learnable parameters $\theta$ in the entire model are adjusted. The update formula is as follows:

$$\theta \leftarrow \theta-\eta \frac{\partial \mathcal{L}_{batch}}{\partial \theta}$$

wherein $\eta$ is the learning rate.
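The update formula above is plain gradient descent; a one-parameter NumPy sketch with a toy quadratic loss (the loss function and learning rate here are illustrative, not from the patent):

```python
import numpy as np

eta = 0.1                    # learning rate (illustrative value)
theta = np.array([4.0])      # a single learnable parameter

def grad(theta):
    """Gradient of the toy loss L(theta) = (theta - 1)^2."""
    return 2.0 * (theta - 1.0)

for _ in range(100):
    theta = theta - eta * grad(theta)   # theta <- theta - eta * dL/dtheta
# theta converges toward the minimizer 1.0
```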
Step 12, repeating steps 3-11 until all batches of the training data set have participated in model training.
Step 13, repeating steps 3-12 until the specified number of iterations is reached.
Step 14, inputting the time series of the sample to be predicted into the trained model to obtain the prediction result.
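Steps 12-13 together form a standard mini-batch training loop; the skeleton below sketches it with stand-in model and loss functions (`forward` and `compute_loss_grad` are placeholders with toy data, not the patent's actual model):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(32, 24))   # 32 toy sample series of length 24
batch_size, n_epochs, eta = 8, 5, 0.01
theta = rng.normal(size=24)        # stand-in for all learnable parameters

def forward(batch, theta):
    """Placeholder for the full prediction model."""
    return batch @ theta

def compute_loss_grad(batch, theta):
    """Placeholder loss: MSE of the predictions against zero targets."""
    pred = forward(batch, theta)
    grad = 2.0 * batch.T @ pred / len(batch)
    return np.mean(pred ** 2), grad

first_loss, _ = compute_loss_grad(data, theta)
for epoch in range(n_epochs):                    # step 13: fixed number of iterations
    for i in range(0, len(data), batch_size):    # step 12: every batch participates
        batch = data[i:i + batch_size]
        loss, grad = compute_loss_grad(batch, theta)  # steps 3-10: forward pass + loss
        theta = theta - eta * grad                    # step 11: gradient update
final_loss, _ = compute_loss_grad(data, theta)
```

On this toy objective the loss decreases monotonically in expectation, mirroring the intended effect of steps 11-13.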
The above-mentioned embodiments are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only the most preferred embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions, equivalents, etc. made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (8)

1. A deep decoupling time series prediction method, applied to the prediction of time series data in the traffic, power, medical, and financial fields, comprising the following steps:
collecting a time sequence, and preprocessing the time sequence to obtain a preprocessed time sequence;
constructing a time series prediction model, wherein the time series prediction model comprises a global feature encoder, an adaptive parameter generation module, a local feature encoder and a decoder, the global feature encoder is used for encoding the time series into a global feature representation, the adaptive parameter generation module is used for generating local feature encoder parameters according to the time series, the local feature encoder is used for encoding the time series into a local feature representation based on the loaded local feature encoder parameters, and the decoder is used for decoding the concatenation result of the global feature representation and the local feature representation and outputting predicted time series data;
and performing parameter optimization on the time series prediction model by using the time series data, and using the time series prediction model with the optimized parameters for prediction of the time series.
2. The method of deep decoupled time series prediction of claim 1, wherein the preprocessing comprises outlier detection and removal, missing value supplementation, and normalization processing.
3. The method for predicting the deeply decoupled time series according to claim 1, wherein the global feature encoder comprises a short-term feature extractor constructed from a convolutional neural network, a vector quantization module, and a Transformer encoder composed of a plurality of stacked attention modules, wherein the short-term feature extractor is used for performing short-term feature extraction on an input time series to obtain a short-term representation of the time series, the vector quantization module is used for performing vectorized encoding on the input short-term representation to obtain an encoded vector, and the Transformer encoder is used for modeling long-term dependencies in the whole time series based on the encoded vector and outputting the global feature representation of the time series.
4. The method of claim 1, wherein the adaptive parameter generating module is configured to encode the time sequence and output local feature encoder parameters by using a multi-view contrast based encoding method.
5. The method of claim 4, wherein the adaptive parameter generation module comprises a context identification network and a parameter generation network, wherein the context identification network comprises a convolution module, a Transformer encoder and an LSTM aggregator connected in sequence for mapping the time series into context hidden variables, and the parameter generation network comprises a fully connected network for generating parameters of the local feature encoder according to the context hidden variables.
6. The method for predicting the deeply decoupled time series according to claim 1, wherein the parameters of the local feature encoder are not involved in training and are generated by an adaptive parameter generation module, the local feature encoder comprises a short-term feature extractor and a Transformer encoder consisting of a plurality of attention modules stacked together, wherein the short-term feature extractor is used for performing short-term feature extraction on the input time series to obtain a short-term representation of the time series, and the Transformer encoder is used for modeling long-term dependency relationships in the whole time series based on the short-term representation and outputting the local feature representation of the time series.
7. The method of deep decoupled time series prediction of claim 1, wherein the decoder comprises a convolution module for performing a convolution operation on the result of the concatenation of the input global feature representation and the local feature representation and a plurality of identical attention modules for performing a concatenation calculation based on the convolution result and outputting the predicted time series data.
8. The method of deep decoupled time series prediction according to claim 1, characterized in that the loss function $\mathcal{L}$ used in the parameter optimization of the time series prediction model is:

$$\mathcal{L}=\mathcal{L}_{pred}+\mathcal{L}_{mvc}+\mathcal{L}_{vq}$$

wherein $\mathcal{L}_{pred}$ is the prediction objective function, expressed as the mean absolute error between the true and predicted values:

$$\mathcal{L}_{pred}=\frac{1}{\tau}\sum_{t=1}^{\tau}\left|x_{T+t}-\hat{x}_{T+t}\right|$$

$\mathcal{L}_{mvc}$ is the multi-view contrast coding objective function, which contrasts the context hidden variable $c$ against the long-term representation set $V^{(lo)}$ and the short-term representation set $V^{(sh)}$ through the evaluation functions $f_1$ and $f_2$, and regularizes the Gaussian posterior distribution of $c$ with a KL-divergence term weighted by $\alpha$; and $\mathcal{L}_{vq}$ is the vector quantization constraint objective function, expressed as:

$$\mathcal{L}_{vq}=\left\|\operatorname{sg}(z)-\hat{z}\right\|_{2}^{2}+\gamma\left\|z-\operatorname{sg}(\hat{z})\right\|_{2}^{2}$$

wherein $x_{T+t}$ and $\hat{x}_{T+t}$ respectively represent the true value and the predicted value of the time series $t$ steps into the future with respect to time $T$, $\tau$ represents the prediction step length, $f_1$ and $f_2$ are the evaluation functions for contrastive learning, $c$ represents the context hidden variable, the short-term representation produced by the adaptive parameter generation module and the perturbed short-term representation are the representations being contrasted, $\epsilon$ is the temperature parameter of the SoftMax, $\mathcal{U}(T)$ denotes uniform sampling over the time steps $1,2,\ldots,T$, $q(c\mid\cdot)$ is the Gaussian posterior distribution output by the context identification network, $\mathbb{E}$ represents the mathematical expectation, $V^{(lo)}$ represents the set of long-term representations, $V^{(sh)}$ represents the set of short-term representations, $\alpha$ is an adjustable hyper-parameter, $D_{KL}$ represents the KL divergence, $\operatorname{sg}(\cdot)$ is the gradient truncation operation satisfying $\operatorname{sg}(z)=z$ in the forward pass, $\gamma$ is an adjustable hyper-parameter, $z$ represents the short-term representation produced by the global feature encoder, and $\hat{z}$ represents the vectorized encoding result of $z$.
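As an illustrative sketch of the vector quantization module of claim 3 and the constraint objective of claim 8 (a VQ-VAE-style formulation; the codebook, shapes, and the value γ = 0.25 are assumptions made for this example, not taken from the patent):

```python
import numpy as np

def quantize(z, codebook):
    """Map each short-term representation to its nearest codebook vector."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, K) squared distances
    return codebook[d.argmin(axis=1)]

def vq_loss(z, z_hat, gamma=0.25):
    """Codebook term + commitment term. sg() is the identity in the forward
    pass, so both terms are numerically equal here; only their gradients differ."""
    codebook_term = ((z - z_hat) ** 2).sum(-1).mean()  # ||sg(z) - z_hat||^2
    commit_term = ((z - z_hat) ** 2).sum(-1).mean()    # ||z - sg(z_hat)||^2
    return codebook_term + gamma * commit_term

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])   # assumed 2-entry codebook
z = np.array([[0.1, -0.1], [0.9, 1.2]])         # two short-term representations
z_hat = quantize(z, codebook)                    # -> [[0, 0], [1, 1]]
loss = vq_loss(z, z_hat)
```

The gradient truncation only matters during backpropagation: the codebook term moves the codebook toward the encoder outputs, while the commitment term moves the encoder outputs toward the codebook.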
CN202110426703.0A 2021-04-20 2021-04-20 Depth decoupling time sequence prediction method Active CN113177633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110426703.0A CN113177633B (en) 2021-04-20 2021-04-20 Depth decoupling time sequence prediction method

Publications (2)

Publication Number Publication Date
CN113177633A true CN113177633A (en) 2021-07-27
CN113177633B CN113177633B (en) 2023-04-25

Family

ID=76924167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110426703.0A Active CN113177633B (en) 2021-04-20 2021-04-20 Depth decoupling time sequence prediction method

Country Status (1)

Country Link
CN (1) CN113177633B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762356A (en) * 2021-08-17 2021-12-07 中山大学 Cluster load prediction method and system based on clustering and attention mechanism
CN114239718A (en) * 2021-12-15 2022-03-25 杭州电子科技大学 High-precision long-term time sequence prediction method based on multivariate time sequence data analysis
CN114297379A (en) * 2021-12-16 2022-04-08 中电信数智科技有限公司 Text binary classification method based on Transformer
CN114580710A (en) * 2022-01-28 2022-06-03 西安电子科技大学 Environment monitoring method based on Transformer time sequence prediction
CN114936723A (en) * 2022-07-21 2022-08-23 中国电子科技集团公司第三十研究所 Social network user attribute prediction method and system based on data enhancement
CN115659852A (en) * 2022-12-26 2023-01-31 浙江大学 Layout generation method and device based on discrete potential representation
WO2023070960A1 (en) * 2021-10-29 2023-05-04 中国华能集团清洁能源技术研究院有限公司 Wind power prediction method based on convolutional transformer architecture, and system and device
CN116153089A (en) * 2023-04-24 2023-05-23 云南大学 Traffic flow prediction system and method based on space-time convolution and dynamic diagram
CN116776228A (en) * 2023-08-17 2023-09-19 合肥工业大学 Power grid time sequence data decoupling self-supervision pre-training method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180157771A1 (en) * 2016-12-06 2018-06-07 General Electric Company Real-time adaptation of system high fidelity model in feature space
CN110718301A (en) * 2019-09-26 2020-01-21 东北大学 Alzheimer disease auxiliary diagnosis device and method based on dynamic brain function network
CN111243269A (en) * 2019-12-10 2020-06-05 福州市联创智云信息科技有限公司 Traffic flow prediction method based on depth network integrating space-time characteristics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Zhichao; Shi Zhiyu; Zhang Jie: "Modal parameter identification of time-varying systems based on improved MCPP" *


Also Published As

Publication number Publication date
CN113177633B (en) 2023-04-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant