CN113177633A - Deep decoupling time sequence prediction method - Google Patents
- Publication number
- CN113177633A (application number CN202110426703.0A)
- Authority
- CN
- China
- Prior art keywords
- time series
- term
- representation
- encoder
- short
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a deep decoupling time series prediction method, comprising the following steps: 1) preprocess given time series data to construct a training data set; 2) capture the global change patterns shared by multiple time series with a vector-quantized global feature encoder; 3) capture the local change pattern specific to each individual time series with a local feature encoder, where each time series has its own set of local feature encoder parameters produced by an adaptive parameter generation module; 4) feed the outputs of the global and local feature encoders to a decoder for prediction. The invention decouples the dynamics of a time series into global and local change patterns and models each separately, addressing the inability of existing models to fully exploit the knowledge shared within a data set or to fully model the local change pattern specific to a single series, thereby improving prediction accuracy; it has broad application prospects in traffic prediction, supply chain management, financial investment, and other fields.
Description
Technical Field
The invention relates to the field of time series data prediction, in particular to a deep decoupling time series prediction method.
Background
Time series arise widely in fields such as traffic, electricity, medical care, and finance. Time series prediction (predicting the observations at future times from the observations over a period of history) is an important research topic in data mining. In today's big-data era, a single time series rarely exists in isolation: a data set usually contains multiple correlated time series, which exhibit both global (shared by multiple series) and local (specific to a single series) change patterns. As shown in fig. 1, the road-usage time series of all roads share the same 24-hour period with a morning peak and an evening peak, i.e. the global change pattern; road 1 has slight morning and evening peaks, road 2 has a distinct morning peak but no evening peak, road 3 has a slight morning peak and a distinct evening peak, and road 4 has strong morning and evening peaks, i.e. the local change patterns. A good time series prediction model should capture both kinds of pattern simultaneously.
Time series prediction models based on statistical machine learning, such as AR, ARIMA, exponential smoothing, and linear state-space models, are trained and make predictions on a single time series; they cannot model the change patterns common to a multivariate time series data set and therefore cannot benefit from this global knowledge.
Classical deep learning models, such as prediction models based on RNNs, TCNs, and Transformers, are currently the most widely used class of methods in this field. Such models use all the data in a data set to train one set of shared model parameters, treating the information of every time series equally. However, capturing global information through simple parameter sharing is insufficient, because at prediction time the model takes only the historical data of a single time series as input and cannot explicitly introduce global information or information from related sequences.
Some recent approaches use matrix factorization to represent the original time series as linear combinations of k latent time series (k much smaller than the number of series in the data set), capturing common patterns in the multivariate data through the latent series. However, matrix factorization operates in a linear feature space and cannot capture complex global change patterns.
Disclosure of Invention
In view of the foregoing, an object of the present invention is to provide a deep decoupling time series prediction method, which can improve the prediction accuracy of time series data while reducing the computation consumption by effectively modeling global and local variation patterns of time series.
In order to achieve the purpose, the invention provides the following technical scheme:
a deep decoupling time series prediction method is applied to prediction of time series data in the traffic field, the electric power field, the medical field and the financial field, and comprises the following steps:
collecting a time sequence, and preprocessing the time sequence to obtain a preprocessed time sequence;
the method comprises the steps of constructing a time sequence prediction model, wherein the time sequence prediction model comprises a global feature encoder, an adaptive parameter generation module, a local feature encoder and a decoder, the global feature encoder is used for encoding a time sequence into a global feature representation, the adaptive parameter generation module is used for generating local feature encoder parameters according to the time sequence, the local feature encoder encodes the time sequence into a local feature representation based on the loaded local feature encoder parameters, and the decoder is used for decoding the splicing result of the global feature representation and the local feature representation and outputting predicted time sequence data;
and performing parameter optimization on the time series prediction model by using the time series data, and using the time series prediction model with the optimized parameters for prediction of the time series.
Preferably, the preprocessing includes outlier detection and removal, missing value supplementation, and normalization processing.
Preferably, the global feature encoder includes a short-term feature extractor constructed by a convolutional neural network, a vector quantization module, and a Transformer encoder composed of a plurality of attention modules stacked, where the short-term feature extractor is configured to perform short-term feature extraction on an input time sequence to obtain a short-term representation of the time sequence, and the vector quantization module is configured to perform vectorization encoding on the input short-term representation to obtain an encoded vector; and the Transformer encoder is used for establishing a long-term dependency relationship in the whole time sequence based on the encoded vector and outputting a global feature representation of the time sequence.
Preferably, the adaptive parameter generating module implements coding of the time sequence by using a multi-view contrast-based coding mode, and outputs the local feature coder parameters.
Preferably, the adaptive parameter generation module includes a context recognition network and a parameter generation network, wherein the context recognition network includes a convolution module, a Transformer encoder, and an LSTM aggregator connected in sequence and is configured to map the time series to a context hidden variable, and the parameter generation network is a fully connected network configured to generate the parameters of the local feature encoder from the context hidden variable.
Preferably, the parameters of the local feature encoder are not involved in training and are generated by an adaptive parameter generation module, the local feature encoder includes a short-term feature extractor and a Transformer encoder composed of a plurality of attention modules stacked, wherein the short-term feature extractor is configured to perform short-term feature extraction on the input time series to obtain a short-term representation of the time series, and the Transformer encoder is configured to model a long-term dependency relationship in the entire time series based on the short-term representation and output a local feature representation of the time series.
Preferably, the decoder comprises a convolution module and a plurality of identical attention modules, wherein the convolution module performs a convolution operation on the spliced result of the input global feature representation and local feature representation, and the attention modules perform attention calculations based on the convolution result and output the predicted time series data.
Preferably, the loss function used for optimizing the parameters of the time series prediction model is the sum of the prediction objective function, the multi-view contrastive coding objective function, and the vector quantization constraint objective function: L = L_pred + L_cmc + L_vq.
compared with the prior art, the invention has the beneficial effects that at least:
according to the deep decoupling time sequence prediction method provided by the invention, the dynamic decoupling of the time sequence is a global change mode and a local change mode, and the global change mode and the local change mode are respectively modeled by using a global feature encoder and a local feature encoder, so that a vector quantization global encoder is used for learning an encoding table representing the global change mode, the knowledge shared in a data set is fully utilized for modeling the global change mode, a self-adaptive parameter generation module is used for generating a specific local feature encoder parameter for each time sequence, and the heterogeneous local change mode is effectively modeled. And improving the prediction precision of the time series based on the global and local change modes.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a time-series diagram of road utilization in the background art;
FIG. 2 is an overall flowchart of a deep decoupling time series prediction method provided by the embodiment;
FIG. 3 is an overall block diagram of a deep decoupled time series prediction method provided by an embodiment;
FIG. 4 is a learning process of a global representation and a short-term representation provided by an embodiment;
fig. 5 is a schematic structural diagram of a convolutional Transformer decoder according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples, while indicating the scope of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
To improve the prediction accuracy of time series, this embodiment provides a deep decoupling time series prediction method: the dynamics of a time series are decoupled into a global change pattern and a local change pattern, each is modeled separately, and prediction is then performed based on the two change patterns. The method can be applied in the traffic, electric power, medical, and financial fields; that is, the time series may be data such as road traffic flow, user electricity consumption, or stock prices.
FIG. 2 is an overall flowchart of a deep decoupling time series prediction method provided by the embodiment; fig. 3 is an overall block diagram of the depth decoupling time series prediction method provided by the embodiment. As shown in fig. 2 and fig. 3, the depth decoupling time series prediction method provided by the embodiment includes the following steps:
step 1, acquiring time sequence data, performing abnormal value elimination processing and normalization processing on the acquired time sequence, and dividing the processed data by using a sliding time window to obtain a training data set.
Outlier detection and removal are performed on the given time series, and invalid values (values outside the normal range and missing values) are filled in by linear interpolation. All values in the time series are then min-max normalized so that each value falls in the range [-1, 1], using the conversion formula
x' = 2(x - x_min)/(x_max - x_min) - 1,
where x is a value in the original time series, x_min is the minimum value in the series, x_max is the maximum value, and x' is the normalized value.
The time window size T is set manually based on experience, and the normalized data are divided with a fixed-length sliding step to obtain the training data set.
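As a minimal sketch of this preprocessing step (helper names are illustrative; a scalar series is assumed), the normalization and sliding-window split can be written as:

```python
import numpy as np

def normalize(x):
    """Min-max normalize a series to [-1, 1], per the conversion formula."""
    x_min, x_max = x.min(), x.max()
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def sliding_windows(x, T, tau, stride=1):
    """Split a series into (input window, prediction target) pairs using
    window size T, prediction horizon tau, and a fixed sliding stride."""
    pairs = []
    for s in range(0, len(x) - T - tau + 1, stride):
        pairs.append((x[s:s + T], x[s + T:s + T + tau]))
    return pairs

series = normalize(np.arange(10.0))
dataset = sliding_windows(series, T=4, tau=2)
```

Each pair supplies the encoder input x_{1:T} and the prediction target for the following tau steps.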
Step 2, batching the training data set according to a fixed batch size, the total number of batches being N.
The training data set is divided into batches according to an empirically set batch size M, giving a total number of batches
N = ⌈N_samples / M⌉,
where N_samples is the total number of samples in the training data set.
Step 3, sequentially selecting from the training data set the batch of training samples with index k, where k ∈ {0, 1, …, N-1}. Steps 4-10 are repeated for each training sample in the batch.
Step 4, inputting the sample time series x_{1:T} into the vector-quantized global feature encoder, outputting the global feature representation h^{(g)}_{1:T}, and calculating the vector quantization constraint objective function L_vq, where T represents the input step size.
The global feature encoder shown in fig. 3(b) is used for modeling a global change pattern of a time sequence, and specifically includes the following steps:
first, time-series x samples1:TInputting the short-term representation z to a short-term feature extractor consisting of a multilayer 1D convolutional neural network to obtain a short-term representation z of each time in the sequence1:T. The convolution layer has a sliding step size of 1, a padding mechanism is used to make the output step size consistent with the input, and a small-sized convolution kernel is used to capture the short-term change pattern in the sequence.
Then, the short-term representation z_{1:T} is input to a Vector Quantization (VQ) module q. The VQ module maintains a coding table e = {e_1, …, e_F} representing global change patterns; it contains F d-dimensional vectors shared among all sequences, and the vectors in the coding table are called global representations. The VQ module maps the short-term representation z_t at each time step to a vector in the coding table e, obtaining ẑ_t. Specifically, it replaces the original short-term representation z with its nearest neighbour in the table, i = arg min_j ||z - e_j||_2, where e_j denotes the j-th vector in the coding table e. Note that the arg min operation is not differentiable, so the gradient of the objective function with respect to the selected entry e_i is used directly in place of the gradient with respect to z: in the forward pass, the nearest-neighbour global representation e_i is fed to the downstream network, while in the backward pass the gradient is passed back unchanged to the upstream convolutional network. Under this training mode, using only the prediction objective function would leave the global representations e_i in the coding table unchanged, so the invention introduces a vector quantization constraint objective function for learning the global representation:
L_vq = ||sg(z) - e_i||_2^2 + γ||z - sg(e_i)||_2^2, where sg(·) is the gradient-truncation (stop-gradient) operation satisfying sg(z) ≡ z in the forward pass with zero gradient in the backward pass, and γ is an adjustable hyper-parameter. As indicated by the dark grey arrows in fig. 4, under the combined action of the prediction objective function and L_vq, the short-term representation z selects a suitable global representation and the difference between z and e_i is made as small as possible; as indicated by the light grey arrows in fig. 4, L_vq drives the global representations in the coding table toward the original short-term representations z. When multiple original short-term representations map to the same global representation, that global representation tends toward the cluster centre of those representations, which also makes the global representations learned by the VQ module more representative.
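A numpy sketch of the nearest-neighbour lookup and the two VQ penalty terms follows; the stop-gradient sg(·) is meaningful only under autograd (e.g. `.detach()` in a framework), so here only forward values are computed, and all names are illustrative:

```python
import numpy as np

def vq_forward(z, codebook, gamma=0.25):
    """Quantize each short-term representation z_t to its nearest codebook
    vector e_i, i = argmin_j ||z_t - e_j||_2, and evaluate the VQ loss."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, F) distances
    idx = d2.argmin(axis=1)                   # nearest-neighbour indices
    e_q = codebook[idx]                       # quantized representations
    codebook_term = ((z - e_q) ** 2).sum()    # ||sg(z) - e_i||^2: moves the codebook
    commit_term = ((z - e_q) ** 2).sum()      # ||z - sg(e_i)||^2: commits z to e_i
    return e_q, idx, codebook_term + gamma * commit_term

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
z = np.array([[0.1, 0.0], [0.9, 1.0]])
e_q, idx, loss = vq_forward(z, codebook)
```

The two terms are numerically identical here; they differ only in which operand receives gradient under autograd.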
Finally, a Transformer encoder models the long-term dependencies in the whole sequence and outputs a long-term representation for each time step, namely the global feature representation h^{(g)}_{1:T}. The Transformer encoder is composed of a plurality of stacked attention modules; a single module comprises a multi-head self-attention layer and a feed-forward network formed by two fully connected layers (the first layer uses a ReLU activation function, the second a linear activation function). The attention mechanism can be expressed as a mapping from a query and a set of key/value pairs to an output: it calculates the degree of match between the query and each key, assigns a weight coefficient to the value corresponding to each key, and finally outputs the weighted sum of the values. The calculation method is as follows:
Attention(Q, K, V) = SoftMax(QK^T / √d_k) V, where Q denotes the queries, K the keys, and V the values. The calculation proceeds in three steps: first, the vector inner product of the query and each key is computed and divided by the factor √d_k, giving unnormalized weight coefficients; the scaling factor √d_k prevents the gradient of the SoftMax function from vanishing. Second, the weight coefficients are normalized with the SoftMax function. Third, the values are weighted and summed with the normalized coefficients to obtain the final output. The multi-head attention mechanism computes several groups of attention in parallel and concatenates their results to obtain the output. The calculation method is as follows:
MultiHead(Q, K, V) = Concat(head_1, …, head_h) W^O, with head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), where Concat denotes the tensor splicing operation and W_i^Q, W_i^K, W_i^V are the mapping matrices of the i-th attention group, which map the original queries, keys, and values to the corresponding subspaces. In the multi-head self-attention layer of the invention, the query Q, key K, and value V are identical: in the first layer they are the output ẑ_{1:T} of the vector quantization module, and in subsequent layers they are the output of the previous attention module. The long-term representation processed by the Transformer encoder is h^{(g)}_{1:T}. The feed-forward network is a fixed substructure of the Transformer model and mainly performs a spatial mapping.
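The attention computation above can be sketched in a few lines of numpy (shapes and weight lists are illustrative, with no batch dimension):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: SoftMax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(Q, K, V, Wq, Wk, Wv, Wo):
    """Project into one subspace per head, attend, concatenate, map out."""
    heads = [attention(Q @ wq, K @ wk, V @ wv)
             for wq, wk, wv in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo
```

Because each row of attention weights sums to 1, attending over constant values returns that constant, a quick sanity check on the SoftMax normalization.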
Step 5, inputting the sample time series x_{1:T} into the adaptive parameter generation module based on multi-view contrastive coding, generating the local feature encoder parameters φ, and calculating the multi-view contrastive coding objective function L_cmc.
As shown in fig. 3(a), the adaptive parameter generation module based on multi-view contrast coding includes a context identification network and a parameter generation network, and the specific steps of generating the local feature encoder parameter Φ are as follows:
firstly, x is1:TInput to a context recognition network comprising a convolution module, a Transformer encoder and an LSTM aggregator, the context recognition network mapping the input sequence to context hidden variablesA multi-view contrast coding (CMC) method and its corresponding KL divergence regularization may enable contextually hidden variablesIt is possible to fully retain information of the sequence local variation pattern and filter out global information (since it has been modeled in the global feature encoder). CMC utilization contrast learning method maximizationOutput context hidden variables of LSTM aggregator in context recognition networkWith short-term representation v of the convolution module output(sh)And long-term representation v of the transform encoder output(lo)Mutual information between them, thereby making the situation hidden variableThe specific information (specific long/short term variation pattern) in the original sequence can be captured effectively.
CMC solves two contrastive learning tasks: given the context hidden variable c, select the correct short-term representation and the correct long-term representation from among distractors, with corresponding objective functions L_cmc^(sh) and L_cmc^(lo). In addition, a regularization term L_KL ensures that c filters out global information.
Taking the short-term contrastive task as an example: given the output of the context recognition network, consider the context hidden variable c and a candidate set V^(sh) containing the short-term representation v_t^(sh) of the input time series at time t (the positive sample) and K distractors (negative samples); the model must identify the correct short-term representation in V^(sh). The distractors are obtained by uniformly sampling, over all time steps, the short-term representations of the other sample time series in the same batch. The short-term contrastive objective function is defined as:
L_cmc^(sh) = -E_{t~U(T)} [ log ( exp(f_1(c, v_t^(sh))/ε) / Σ_{v∈V^(sh)} exp(f_1(c, v)/ε) ) ], where f_1 is an evaluation function structured as a two-layer MLP (first layer with a ReLU activation function, second layer linear) that measures the degree of match between the context hidden variable c and a short-term representation after the two are concatenated; ε is the SoftMax temperature parameter; and U(T) denotes uniform sampling over the times 1, 2, …, T.
The long-term contrastive objective function L_cmc^(lo) is defined analogously, with an evaluation function f_2 whose structure is the same as that of f_1. A property of contrastive learning is that
I(c; x) ≥ log(K + 1) - L_cmc^(sh),
where I(c; x) denotes the mutual information between c and x; minimizing the contrastive objective can thus be seen to maximize a lower bound on I(c; x), so that the context hidden variable c sufficiently retains the local (unique) information of the sequence x.
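The contrastive selection task can be sketched as follows; here a plain dot-product scorer stands in for the MLP evaluation function f_1, and all names are hypothetical:

```python
import numpy as np

def info_nce(c, positive, negatives, f, eps=0.1):
    """Contrastive loss: negative log SoftMax score of the positive
    candidate among K negatives, with temperature eps."""
    cands = [positive] + list(negatives)
    scores = np.array([f(c, v) for v in cands]) / eps
    scores -= scores.max()                        # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return -np.log(probs[0])                      # positive at index 0

dot = lambda c, v: float(np.dot(c, v))
c = np.array([1.0, 0.0])
loss_easy = info_nce(c, np.array([1.0, 0.0]),
                     [np.array([0.0, 1.0]), np.array([0.0, -1.0])], dot)
```

When the positive clearly matches the context, the loss approaches zero; when all candidates score equally, it equals log(K + 1), the chance level.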
To ensure that the context hidden variable c filters out global information, the invention applies a KL-divergence regularization, calculated as
L_KL = KL( q(c | x) || N(0, I) ),
where q(c | x) is the Gaussian posterior distribution output by the context recognition network. The regularization term introduces a prior for c, with the goal of constraining the amount of information in c to be as small as possible. Therefore, under the combined action of the two objectives of maximizing mutual information and minimizing the information content of c, global information is automatically filtered out while the local information that maximizes the contrastive objectives is retained. The final multi-view contrastive objective function is calculated as follows:
L_cmc = L_cmc^(sh) + L_cmc^(lo) + α·L_KL, where α is an adjustable hyper-parameter.
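Assuming, as described, a diagonal-Gaussian posterior q(c|x) = N(μ, diag(σ²)) and a standard-normal prior, the regularizer has the usual closed form (a sketch; function name is illustrative):

```python
import numpy as np

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) in closed form:
    0.5 * sum( sigma^2 + mu^2 - 1 - log sigma^2 )."""
    return 0.5 * np.sum(sigma ** 2 + mu ** 2 - 1.0 - 2.0 * np.log(sigma))
```

The divergence is zero exactly when the posterior equals the prior, and grows as the posterior carries more information away from N(0, I).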
Finally, the context hidden variable c is input to a parameter generation network, an MLP composed of multiple fully connected layers (hidden layers use the ReLU activation function, the output layer a linear activation function), and is mapped to the parameters φ of the local feature encoder.
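One way to realize such a parameter generation network is a small hypernetwork-style MLP that emits a flat vector and reshapes it into the local encoder's weight tensors; the sketch below assumes illustrative shapes and names:

```python
import numpy as np

def generate_params(c, W1, b1, W2, b2, shapes):
    """Two-layer MLP (ReLU then linear) mapping the context hidden
    variable c to a flat vector, reshaped into one tensor per shape."""
    h = np.maximum(0.0, c @ W1 + b1)   # hidden layer with ReLU
    flat = h @ W2 + b2                 # linear output layer
    params, off = [], 0
    for shape in shapes:
        n = int(np.prod(shape))
        params.append(flat[off:off + n].reshape(shape))
        off += n
    return params

shapes = [(2, 3), (3,)]                # e.g. one small kernel plus bias
c = np.ones(4)
phi = generate_params(c, np.zeros((4, 8)), np.zeros(8),
                      np.zeros((8, 9)), np.zeros(9), shapes)
```

The MLP's output dimension must equal the total parameter count of the local encoder (here 2·3 + 3 = 9).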
Step 6, loading the parameters φ generated by the adaptive parameter generation module into the local feature encoder.
Step 7, inputting the sample time series x_{1:T} into the local feature encoder and outputting the local feature representation h^{(l)}_{1:T}.
The structure of the local feature encoder is identical to that of the global feature encoder except that it contains no VQ module q; the encoder parameters φ do not participate in back-propagation updates but are generated directly by the adaptive parameter generation module.
The specific process is as follows: first, the sample time series x_{1:T} is input to a short-term feature extractor consisting of a multi-layer 1D convolutional neural network, yielding the short-term representation z_{1:T} for each time step; then, z_{1:T} is input to a Transformer encoder, which models the long-term dependencies in the whole sequence and outputs the local feature representation h^{(l)}_{1:T} for each time step.
Step 8, splicing the global feature representation and the local feature representation, and inputting the result into a convolutional Transformer decoder to obtain the prediction output x̂_{T+1:T+τ}, where τ denotes the prediction step size.
The convolutional Transformer decoder is formed by stacking a convolution module and several identical attention modules; the specific structure is shown in fig. 5. The convolution module is identical in structure to the convolution module in the encoder, and each attention module comprises: a masked multi-head self-attention layer over the decoder output, into which a backward mask is introduced so that data after a given time cannot be seen when predicting the value at that time; a multi-head attention layer over the encoder output; and a feed-forward network consisting of two fully connected layers (the first with a ReLU activation function, the second linear). The last attention module outputs the predicted values x̂_{T+1:T+τ} for the future τ steps.
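The backward mask in the decoder's self-attention can be sketched as a lower-triangular boolean matrix that blocks attention to future positions (illustrative, unbatched):

```python
import numpy as np

def causal_mask(T):
    """Position t may attend only to positions <= t."""
    return np.tril(np.ones((T, T), dtype=bool))

def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention with future positions masked out."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(mask, scores, -1e9)     # effectively -inf where masked
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

V = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
Q = K = np.zeros((3, 2))
out = masked_attention(Q, K, V, causal_mask(3))
```

With zero queries and keys the unmasked weights are uniform, so the first output row attends only to itself, which makes the masking easy to verify.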
Step 9, calculating the prediction objective function L_pred, i.e. the error between the true values x_{T+1:T+τ} of the sample time series and the actual predicted output x̂_{T+1:T+τ}.
The invention uses the mean absolute error as the prediction objective function, calculated as
L_pred = (1/τ) Σ_{t=T+1}^{T+τ} | x_t - x̂_t |.
step 10, calculating a prediction objective functionMulti-view contrast coding objective functionSum vector quantization constraint objective functionSum of
Step 11, adjusting the network parameters of the entire model according to the loss L over all samples in the batch.
According to the loss L, all learnable parameters θ in the model are adjusted. The update formula is θ ← θ - η·∇_θ L,
wherein η is the learning rate.
Step 12, repeating steps 3-11 until all batches of the training data set have participated in model training.
Step 13, repeating steps 3-12 until the specified number of iterations is reached.
Step 14, inputting the time series of the sample to be predicted into the trained model to obtain the prediction result.
The above-mentioned embodiments are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only the most preferred embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions, equivalents, etc. made within the scope of the principles of the present invention should be included in the scope of the present invention.
Claims (8)
1. A deep decoupling time series prediction method, applied to the prediction of time series data in the traffic, power, medical, and financial fields, comprising the following steps:
collecting a time series and preprocessing it to obtain a preprocessed time series;
constructing a time series prediction model, wherein the model comprises a global feature encoder, an adaptive parameter generation module, a local feature encoder, and a decoder; the global feature encoder is used for encoding the time series into a global feature representation; the adaptive parameter generation module is used for generating local feature encoder parameters according to the time series; the local feature encoder encodes the time series into a local feature representation based on the loaded local feature encoder parameters; and the decoder is used for decoding the concatenation result of the global feature representation and the local feature representation and outputting predicted time series data;
and performing parameter optimization on the time series prediction model using the time series data, the parameter-optimized model being used for prediction of the time series.
2. The deep decoupling time series prediction method according to claim 1, wherein the preprocessing comprises outlier detection and removal, missing value supplementation, and normalization processing.
3. The deep decoupling time series prediction method according to claim 1, wherein the global feature encoder comprises a short-term feature extractor constructed from a convolutional neural network, a vector quantization module, and a Transformer encoder formed by stacking a plurality of attention modules, wherein the short-term feature extractor is used for performing short-term feature extraction on the input time series to obtain a short-term representation of the time series; the vector quantization module is used for performing vectorized encoding on the input short-term representation to obtain an encoded vector; and the Transformer encoder is used for modeling long-term dependencies in the whole time series based on the encoded vector and outputting the global feature representation of the time series.
4. The deep decoupling time series prediction method according to claim 1, wherein the adaptive parameter generation module is configured to encode the time series using a multi-view contrastive coding method and output the local feature encoder parameters.
5. The deep decoupling time series prediction method according to claim 4, wherein the adaptive parameter generation module comprises a context recognition network and a parameter generation network, wherein the context recognition network comprises a convolution module, a Transformer encoder, and an LSTM aggregator connected in sequence, used for mapping the time series into a context hidden variable; and the parameter generation network comprises a fully connected network used for generating the parameters of the local feature encoder according to the context hidden variable.
6. The deep decoupling time series prediction method according to claim 1, wherein the parameters of the local feature encoder do not participate in training and are generated by the adaptive parameter generation module; the local feature encoder comprises a short-term feature extractor and a Transformer encoder formed by stacking a plurality of attention modules, wherein the short-term feature extractor is used for performing short-term feature extraction on the input time series to obtain a short-term representation of the time series, and the Transformer encoder is used for modeling long-term dependencies in the whole time series based on the short-term representation and outputting the local feature representation of the time series.
7. The deep decoupling time series prediction method according to claim 1, wherein the decoder comprises a convolution module and a plurality of identical attention modules, the convolution module being used for performing a convolution operation on the concatenation result of the input global feature representation and local feature representation, and the attention modules being used for performing attention calculation based on the convolution result and outputting the predicted time series data.
8. The deep decoupling time series prediction method according to claim 1, wherein the loss function L used in the parameter optimization of the time series prediction model is the sum of the prediction objective function, the multi-view contrastive coding objective function, and the vector quantization constraint objective function,
wherein x_{T+t} and x̂_{T+t} respectively represent the true and predicted values of the time series t steps after time T; τ represents the prediction step length; f1 and f2 are the scoring functions of the contrastive learning; c represents the context hidden variable; the adaptive parameter generation module produces a short-term representation and a perturbed short-term representation; ε is the temperature parameter of the SoftMax; U(T) denotes uniform sampling over times 1, 2, …, T; the context recognition network outputs a Gaussian posterior distribution over c; E represents the mathematical expectation; V^(lo) represents the set of long-term representations and V^(sh) the set of short-term representations; α is an adjustable hyper-parameter weighting the KL divergence; sg() is the gradient truncation (stop-gradient) operation, satisfying sg(z) = z in the forward pass; γ is an adjustable hyper-parameter; z represents the short-term representation produced by the global feature encoder; and ẑ represents the vectorized encoding result for z.
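The two stop-gradient terms of the vector quantization constraint in claim 8 can be sketched as follows (numpy has no autograd, so sg() reduces to the identity here, matching sg(z) = z in the forward pass; the `vq_constraint` name and codebook shape are illustrative, not from the patent):

```python
import numpy as np

def vq_constraint(z, codebook, gamma=0.25):
    # Nearest-neighbour lookup: z_hat is the vectorized encoding of z.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    z_hat = codebook[d.argmin(axis=1)]
    # ||sg(z) - z_hat||^2 would move the codes toward the encoder outputs;
    # gamma * ||z - sg(z_hat)||^2 would commit the encoder to its codes.
    codebook_term = ((z - z_hat) ** 2).sum()
    commitment_term = ((z - z_hat) ** 2).sum()
    return float(codebook_term + gamma * commitment_term)
```

When z coincides with a codebook entry, both terms vanish; any mismatch yields a positive penalty.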
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110426703.0A CN113177633B (en) | 2021-04-20 | 2021-04-20 | Depth decoupling time sequence prediction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113177633A true CN113177633A (en) | 2021-07-27 |
CN113177633B CN113177633B (en) | 2023-04-25 |
Family
ID=76924167
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110426703.0A Active CN113177633B (en) | 2021-04-20 | 2021-04-20 | Depth decoupling time sequence prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113177633B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113762356A (en) * | 2021-08-17 | 2021-12-07 | 中山大学 | Cluster load prediction method and system based on clustering and attention mechanism |
CN114239718A (en) * | 2021-12-15 | 2022-03-25 | 杭州电子科技大学 | High-precision long-term time sequence prediction method based on multivariate time sequence data analysis |
CN114297379A (en) * | 2021-12-16 | 2022-04-08 | 中电信数智科技有限公司 | Text binary classification method based on Transformer |
CN114580710A (en) * | 2022-01-28 | 2022-06-03 | 西安电子科技大学 | Environment monitoring method based on Transformer time sequence prediction |
CN114936723A (en) * | 2022-07-21 | 2022-08-23 | 中国电子科技集团公司第三十研究所 | Social network user attribute prediction method and system based on data enhancement |
CN115659852A (en) * | 2022-12-26 | 2023-01-31 | 浙江大学 | Layout generation method and device based on discrete potential representation |
WO2023070960A1 (en) * | 2021-10-29 | 2023-05-04 | 中国华能集团清洁能源技术研究院有限公司 | Wind power prediction method based on convolutional transformer architecture, and system and device |
CN116153089A (en) * | 2023-04-24 | 2023-05-23 | 云南大学 | Traffic flow prediction system and method based on space-time convolution and dynamic diagram |
CN116776228A (en) * | 2023-08-17 | 2023-09-19 | 合肥工业大学 | Power grid time sequence data decoupling self-supervision pre-training method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180157771A1 (en) * | 2016-12-06 | 2018-06-07 | General Electric Company | Real-time adaptation of system high fidelity model in feature space |
CN110718301A (en) * | 2019-09-26 | 2020-01-21 | 东北大学 | Alzheimer disease auxiliary diagnosis device and method based on dynamic brain function network |
CN111243269A (en) * | 2019-12-10 | 2020-06-05 | 福州市联创智云信息科技有限公司 | Traffic flow prediction method based on depth network integrating space-time characteristics |
Non-Patent Citations (1)
Title |
---|
Zhang Zhichao; Shi Zhiyu; Zhang Jie: "Modal parameter identification of time-varying systems based on improved MCPP" *
Also Published As
Publication number | Publication date |
---|---|
CN113177633B (en) | 2023-04-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113177633A (en) | Deep decoupling time sequence prediction method | |
CN111027772B (en) | Multi-factor short-term load prediction method based on PCA-DBILSTM | |
CN110163429B (en) | Short-term load prediction method based on similarity day optimization screening | |
CN112990556A (en) | User power consumption prediction method based on Prophet-LSTM model | |
CN111832825B (en) | Wind power prediction method and system integrating long-term memory network and extreme learning machine | |
CN111079989B (en) | DWT-PCA-LSTM-based water supply amount prediction device for water supply company | |
CN113128113B (en) | Lean information building load prediction method based on deep learning and transfer learning | |
Bokde et al. | PSF: Introduction to R package for pattern sequence based forecasting algorithm | |
CN108876044B (en) | Online content popularity prediction method based on knowledge-enhanced neural network | |
CN115688579A (en) | Basin multi-point water level prediction early warning method based on generation of countermeasure network | |
CN112884236B (en) | Short-term load prediction method and system based on VDM decomposition and LSTM improvement | |
CN115204035A (en) | Generator set operation parameter prediction method and device based on multi-scale time sequence data fusion model and storage medium | |
CN112434891A (en) | Method for predicting solar irradiance time sequence based on WCNN-ALSTM | |
CN114925767A (en) | Scene generation method and device based on variational self-encoder | |
CN114817773A (en) | Time sequence prediction system and method based on multi-stage decomposition and fusion | |
CN116681152A (en) | Short-term load prediction method based on SOM-BP neural network improved Prophet model | |
CN116432697A (en) | Time sequence prediction method integrating long-term memory network and attention mechanism | |
CN114880538A (en) | Attribute graph community detection method based on self-supervision | |
Vogt et al. | Wind power forecasting based on deep neural networks and transfer learning | |
CN115860232A (en) | Steam load prediction method, system, electronic device and medium | |
CN115081551A (en) | RVM line loss model building method and system based on K-Means clustering and optimization | |
CN115204467A (en) | Power load prediction method, device and storage medium | |
CN111724277A (en) | New energy and multi-element load value matching method and system | |
CN113743668B (en) | Household electricity-oriented short-term load prediction method | |
Han et al. | Online aware synapse weighted autoencoder for recovering random missing data in wastewater treatment process |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||