CN113177633B - Depth decoupling time sequence prediction method - Google Patents
- Publication number: CN113177633B
- Application number: CN202110426703.0A
- Authority: CN (China)
- Legal status: Active
Classifications
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- Y02T10/10 — Internal combustion engine [ICE] based vehicles
- Y02T10/40 — Engine management systems
Abstract
The invention discloses a depth decoupling time sequence prediction method, which comprises the following steps: 1) preprocessing given time series data to construct a training data set; 2) capturing the global change pattern shared by multiple time series with a vector-quantized global feature encoder; 3) capturing the local change pattern specific to a single time series with a local feature encoder, where each time series has its own set of local feature encoder parameters generated by an adaptive parameter generation module; 4) inputting the outputs of the global and local feature encoders into a decoder for prediction. The invention decouples the dynamics of a time series into a global change pattern and a local change pattern and models them separately, which solves the problems that existing models cannot fully exploit the knowledge shared within a data set and cannot adequately model the local change pattern specific to a single series. Prediction accuracy is thereby improved, and the method has broad application prospects in fields such as traffic prediction, supply chain management, and financial investment.
Description
Technical Field
The invention relates to the field of time sequence data prediction, in particular to a depth decoupling time sequence prediction method.
Background
Time series are widely used in the transportation, electric power, medical, and financial fields. Time series prediction (i.e., predicting an observation at a future time from observations over a historical period) is an important research topic in data mining. In today's big-data era, a single time series seldom exists in isolation: a data set typically contains multiple correlated time series that exhibit both global (shared across series) and local (specific to a single series) change patterns. As shown in fig. 1, the road usage time series of all roads share the same period (24 hours) and all have a morning peak and an evening peak, i.e., a global change pattern; road 1 has slight morning and evening peaks, road 2 has a pronounced morning peak but no evening peak, road 3 has a slight morning peak and a pronounced evening peak, and road 4 has strong morning and evening peaks, i.e., local change patterns. A good time series prediction model should capture both kinds of pattern simultaneously.
Time series prediction models based on statistical machine learning, such as AR, ARIMA, exponential smoothing, and linear state-space models, are trained and make predictions on a single time series; they cannot model the change patterns common to a multivariate time series data set and therefore cannot benefit from such global knowledge.
Classical deep learning models, such as RNN-, TCN- and Transformer-based prediction models, are currently the most widely used class of methods in this field. Such models use all the data in the data set to train a single set of shared parameters and treat all series equally. However, capturing global information through simple parameter sharing is inadequate: at prediction time the model takes only the historical data of a single series as input, so global information or information from other related series cannot be introduced explicitly.
Some recent approaches use matrix factorization to represent the original time series as linear combinations of k latent time series (k much smaller than the number of series in the data set), thereby capturing patterns common to the multivariate data. However, matrix factorization acts on the feature space and fails to capture complex global change patterns.
Disclosure of Invention
In view of the foregoing, an object of the present invention is to provide a depth decoupling time series prediction method, which improves the prediction accuracy of time series data while reducing the calculation consumption by effectively modeling the global and local variation patterns of the time series.
In order to achieve the above object, the present invention provides the following technical solutions:
a depth decoupling time sequence prediction method is applied to prediction of time sequence data in traffic fields, electric power fields, medical fields and financial fields, and comprises the following steps:
collecting a time sequence, and preprocessing the time sequence to obtain a preprocessed time sequence;
the method comprises the steps of constructing a time sequence prediction model, wherein the time sequence prediction model comprises a global feature encoder, an adaptive parameter generation module, a local feature encoder and a decoder, the global feature encoder is used for encoding a time sequence into a global feature representation, the adaptive parameter generation module is used for generating local feature encoder parameters according to the time sequence, the local feature encoder is used for encoding the time sequence into a local feature representation based on the loaded local feature encoder parameters, and the decoder is used for decoding a result of splicing the global feature representation and the local feature representation and outputting predicted time sequence data;
and carrying out parameter optimization on the time sequence prediction model by utilizing the time sequence data, and using the time sequence prediction model with the parameter optimization for the prediction of the time sequence.
Preferably, the preprocessing includes outlier detection and removal, missing value replenishment, and normalization.
Preferably, the global feature encoder comprises a short-term feature extractor constructed from a convolutional neural network, a vector quantization module, and a Transformer encoder formed by stacking several attention modules, wherein the short-term feature extractor is used for extracting short-term features of the input time sequence to obtain a short-term representation of the sequence, and the vector quantization module is used for quantizing the input short-term representation to obtain an encoded vector; the Transformer encoder is used for modeling long-term dependencies in the whole time sequence based on the encoded vector and outputting the global feature representation of the sequence.
Preferably, the adaptive parameter generation module encodes the time sequence by means of multi-view contrastive coding and outputs the local feature encoder parameters.
Preferably, the adaptive parameter generation module comprises a context recognition network and a parameter generation network, wherein the context recognition network comprises a convolution module, a Transformer encoder, and an LSTM aggregator connected in sequence, which together map the time sequence into a context hidden variable, and the parameter generation network consists of a fully connected network and is used for generating the parameters of the local feature encoder from the context hidden variable.
Preferably, the parameters of the local feature encoder do not participate in training but are generated by the adaptive parameter generation module; the local feature encoder comprises a short-term feature extractor and a Transformer encoder formed by stacking several attention modules, wherein the short-term feature extractor is used for extracting short-term features of the input time sequence to obtain a short-term representation of the sequence, and the Transformer encoder is used for modeling long-term dependencies in the whole time sequence based on the short-term representation and outputting the local feature representation of the sequence.
Preferably, the decoder comprises a convolution module and several identical attention modules, wherein the convolution module is used for performing a convolution operation on the concatenation of the input global feature representation and local feature representation, and the attention modules are used for performing attention calculations on the convolution result and outputting the predicted time series data.
Preferably, the loss function L used in the parameter optimization of the time series prediction model is the sum of the prediction objective, the multi-view contrastive coding objective, and the vector quantization constraint objective:

L = L_pred + L_cmc + L_vq
compared with the prior art, the invention has the beneficial effects that at least the following steps are included:
according to the depth decoupling time sequence prediction method provided by the invention, the dynamic decoupling of the time sequences is a global and local change mode, and the global and local feature encoders are used for modeling respectively, so that the vector quantization global encoder is used for learning the encoding table for representing the global change mode, the global change mode is modeled by fully utilizing the shared knowledge in the data set, the self-adaptive parameter generation module is used for generating the specific local feature encoder parameters of each time sequence, and the heterogeneous local change mode is effectively modeled. The prediction accuracy of the time series is improved based on the global and local variation patterns.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a road usage time series diagram in the background art;
FIG. 2 is an overall flowchart of a depth decoupling time series prediction method provided by an embodiment;
FIG. 3 is an overall block diagram of a depth decoupling time series prediction method provided by an embodiment;
FIG. 4 is a learning process of a global representation and a short-term representation provided by an embodiment;
fig. 5 is a schematic diagram of the convolutional Transformer decoder according to an embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.
In order to improve the prediction accuracy of time series, the embodiment provides a depth decoupling time sequence prediction method, which decouples the dynamics of a time series into a global change pattern and a local change pattern, models the two separately, and performs prediction based on both. The method can be applied in the traffic, electric power, medical, and financial fields; that is, the time series may be data such as road traffic flow, user electricity consumption, or stock prices.
FIG. 2 is an overall flowchart of a depth decoupling time series prediction method provided by an embodiment; fig. 3 is an overall block diagram of a depth decoupling time series prediction method provided by an embodiment. As shown in fig. 2 and 3, the depth decoupling time series prediction method provided by the embodiment includes the following steps:
step 1, collecting time sequence data, carrying out outlier elimination processing and normalization processing on the collected time sequence, and dividing the processed data by utilizing a sliding time window to obtain a training data set.
Outlier detection and removal are performed on the given time series, and invalid values (such as values outside the normal range, and missing values) are filled in by linear interpolation. All values in the time series are then min-max normalized into the range [-1, 1] by the transformation

x' = 2 (x - x_min) / (x_max - x_min) - 1

where x is a value in the original time series, x_min is the minimum value in the series, x_max is the maximum value, and x' is the normalized value.
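As an illustrative sketch (not part of the claimed method), the preprocessing step — linear interpolation of invalid values followed by min-max normalization to [-1, 1] — can be written in plain Python; the function names and toy values are hypothetical:

```python
def interpolate_missing(series):
    """Fill None entries by linear interpolation between the nearest valid neighbors."""
    xs = list(series)
    n = len(xs)
    for i, v in enumerate(xs):
        if v is None:
            lo = next(j for j in range(i - 1, -1, -1) if xs[j] is not None)
            hi = next(j for j in range(i + 1, n) if xs[j] is not None)
            frac = (i - lo) / (hi - lo)
            xs[i] = xs[lo] + frac * (xs[hi] - xs[lo])
    return xs

def minmax_normalize(series):
    """Map values into [-1, 1] via x' = 2*(x - x_min)/(x_max - x_min) - 1."""
    x_min, x_max = min(series), max(series)
    return [2.0 * (x - x_min) / (x_max - x_min) - 1.0 for x in series]

raw = [10.0, None, 30.0, 40.0]
filled = interpolate_missing(raw)   # the None becomes 20.0
norm = minmax_normalize(filled)     # endpoints map to -1.0 and 1.0
```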
The normalized data are then divided with a fixed sliding stride, according to an empirically set time window size T, to obtain the training data set.
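The sliding-window split above can be sketched as follows; the helper name, stride default, and toy series are assumptions for illustration only:

```python
def sliding_windows(series, T, tau, stride=1):
    """Split a series into (input, target) pairs: x[i:i+T] -> x[i+T:i+T+tau]."""
    samples = []
    for start in range(0, len(series) - T - tau + 1, stride):
        x = series[start:start + T]
        y = series[start + T:start + T + tau]
        samples.append((x, y))
    return samples

data = list(range(10))                  # toy "time series" 0..9
pairs = sliding_windows(data, T=4, tau=2)
# first pair: input [0, 1, 2, 3], target [4, 5]
```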
Step 2: the training data set is divided into batches according to an empirically set batch size M. The total number of batches is

N = ceil(N_samples / M)

where N_samples is the total number of samples in the training data set.
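A minimal sketch of the batch-count calculation, with hypothetical example numbers:

```python
import math

def num_batches(n_samples, batch_size):
    """N = ceil(N_samples / M): number of batches covering the whole data set."""
    return math.ceil(n_samples / batch_size)

print(num_batches(1000, 64))  # the last, partial batch still counts
```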
Step 3: sequentially select from the training data set the batch of training samples with index k, k ∈ {0, 1, ..., N-1}. Steps 4-10 are repeated for each training sample in the batch.
Step 4: input the sample time series x_{1:T} into the vector-quantized global feature encoder, output the global feature representation h^{g}_{1:T}, and calculate the vector quantization constraint objective function L_vq, where T denotes the input step size.
The global feature encoder as shown in fig. 3 (b) is used for modeling the global variation pattern of the time sequence, and specifically comprises the following steps:
first, the sample time sequence x 1:T Input to a short-term feature extractor consisting of a multi-layer 1D convolutional neural network to obtain a short-term representation z of each instant in the sequence 1:T . The sliding step of the convolution layer is set to be 1, a padding (padding) mechanism is used, so that the step of output is consistent with the input, and a convolution kernel with a small size is used, so that a short-term change mode in a sequence can be captured.
Then, the short-term representation z_{1:T} is input to a Vector Quantization (VQ) module q, which maintains a coding table e = {e_1, ..., e_F} representing global change patterns: it contains F d-dimensional vectors, shared among all sequences; the vectors in the coding table e are called global representations. The VQ module maps the short-term representation z at each instant to a vector in the coding table e, obtaining \hat{z}_{1:T}. Specifically, it replaces the original short-term representation z by nearest-neighbor search:

\hat{z} = e_i,  i = argmin_j ||z - e_j||_2

where e_j denotes the j-th vector in the coding table e. It should be noted that the argmin operation is not differentiable; the gradient with respect to \hat{z} is therefore used directly in place of the gradient with respect to z. Specifically, during forward propagation the nearest global representation \hat{z} is fed to the downstream network, and during back-propagation the gradient with respect to \hat{z} is passed back unchanged to the upstream convolutional network. Under such a training mode, if only the prediction objective were used, the global representations e_i in the coding table would receive no updates; the invention therefore introduces a vector quantization constraint objective for learning the global representations:

L_vq = ||sg(z) - \hat{z}||_2^2 + γ ||z - sg(\hat{z})||_2^2

where sg(·) is the gradient cut-off (stop-gradient) operation, satisfying sg(z) ≡ z in the forward pass with zero gradient in the backward pass, and γ is an adjustable hyper-parameter. As shown by the dark grey arrows in fig. 4, under the joint action of the prediction objective and the second term of L_vq, the short-term representation z selects an appropriate global representation and the difference between z and \hat{z} is constrained to be as small as possible; as shown by the light grey arrows in fig. 4, the first term of L_vq drives the global representations in the coding table toward the original short-term representations. When several original short-term representations are mapped to the same global representation, it tends toward their cluster center, which makes the global representations learned by the VQ module more representative.
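As an illustrative sketch (not the claimed implementation), the nearest-neighbor lookup of the VQ module and the forward value of the constraint objective can be written in plain Python; the straight-through gradient handling is an autograd concern and is omitted here, and the toy codebook is hypothetical:

```python
def vq_lookup(z, codebook):
    """Replace vector z by its nearest codebook entry: i = argmin_j ||z - e_j||_2."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    i = min(range(len(codebook)), key=lambda j: dist2(z, codebook[j]))
    return i, codebook[i]

def vq_loss(z, z_q, gamma=0.25):
    """Forward value of L_vq = ||sg(z) - z_q||^2 + gamma * ||z - sg(z_q)||^2.
    With stop-gradients, both terms are numerically equal to ||z - z_q||^2."""
    d2 = sum((a - b) ** 2 for a, b in zip(z, z_q))
    return d2 + gamma * d2

codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
idx, z_q = vq_lookup([0.9, 1.2], codebook)   # nearest entry is [1.0, 1.0]
loss = vq_loss([0.9, 1.2], z_q)
```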
Finally, a Transformer encoder is used to model long-term dependencies in the whole sequence and output the long-term representation at each instant, i.e. the global feature representation h^{g}_{1:T}. The Transformer encoder is composed of a stack of attention modules; a single module comprises a multi-head self-attention layer and a feed-forward network of two fully connected layers (the first using a ReLU activation function and the second a linear activation function). The attention mechanism can be represented as a mapping from a query and a set of key/value pairs to an output: the matching degree between the query and each key is computed, a weight coefficient is assigned to the value corresponding to each key, and the weighted sum of the values is taken as the output:

Attention(Q, K, V) = SoftMax(Q K^T / sqrt(d_k)) V

where Q denotes the queries, K the keys, and V the values. The calculation proceeds in three steps: first, the inner product of the query and each key is computed and divided by the factor sqrt(d_k) to obtain unnormalized weight coefficients; this factor plays a regulating role and prevents the gradient of the SoftMax function from vanishing. Second, the weight coefficients are normalized by the SoftMax function. Third, the values are summed with the normalized weight coefficients to obtain the final output. The multi-head attention mechanism computes several groups of attention in parallel and concatenates the results, so as to capture the different types of correlation present in the data:

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O,  head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)

where Concat denotes the tensor concatenation operation and W_i^Q, W_i^K, W_i^V are the mapping matrices of the i-th group of attention, mapping the original query, key, and value into their corresponding spaces. In the multi-head self-attention layers used in the invention, the query Q, key K, and value V are identical: for the first layer they are the output \hat{z}_{1:T} of the vector quantization module, and for each subsequent layer the output of the previous attention module. The final layer outputs the long-term (global feature) representation h^{g}_{1:T}. The feed-forward network is a fixed substructure of the Transformer model and mainly performs spatial mapping.
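A minimal single-head sketch of scaled dot-product attention in plain Python (the multi-head variant concatenates several such outputs); the toy input is hypothetical:

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Attention(Q, K, V) = SoftMax(Q K^T / sqrt(d_k)) V, computed row by row."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

# toy self-attention: 2 positions, d_k = 2; Q = K = V as in self-attention
X = [[1.0, 0.0], [0.0, 1.0]]
Y = attention(X, X, X)   # each output row is a convex combination of the rows of V
```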
Step 5: input the sample time series x_{1:T} into the adaptive parameter generation module based on multi-view contrastive coding, generate the local feature encoder parameters Φ, and calculate the multi-view contrastive coding objective function L_cmc.
As shown in fig. 3 (a), the adaptive parameter generation module based on multi-view contrastive coding includes a context recognition network and a parameter generation network; the specific steps of generating the local feature encoder parameters Φ are as follows:
first, x is 1:T Input to a context recognition network comprising a convolution module, a transducer encoder, and an LSTM aggregator, the context recognition network mapping the input sequence into context hidden variablesMulti-view contrast coding (CMC) method and corresponding KL divergence regularities can enable context hidden variables ++>The information of the sequence local variation pattern can be fully preserved and the global information filtered out (as it is already modeled in the global feature encoder). CMC maximizes the output context hidden variable of LSTM aggregator in context recognition network by contrast learning method>Short term representation v with convolution module output (sh) And a long-term representation v of the output of the transducer encoder (lo) Mutual information between them, so that the context hidden variable +.>The method can effectively capture the specific information (specific long/short-term variation mode) in the original sequence.
CMC solves two contrasting learning tasks, respectively requiring hidden variables in known situationsIn the case of (1) selecting the correct short-term representation and long-term representation from the interference term, the corresponding objective functions are +.> and />Besides, regular term->Can ensure->The global information is filtered out.
Taking the contrastive task for the short-term representation as an example: given the output c of the context recognition network and a set V^{(sh)} containing the short-term representation v_t^{(sh)} at instant t of the input time series (the positive sample) and K interference terms (negative samples), the model must identify the correct short-term representation from the set V^{(sh)}. The interference terms are uniformly sampled from the short-term representations, at each instant, of the other sample time series in the same batch. The short-term contrastive objective function is defined as:

L_sh = - E_{t ~ u(T)} [ log ( exp(f_1(c, v_t^{(sh)}) / ε) / Σ_{v ∈ V^{(sh)}} exp(f_1(c, v) / ε) ) ]

where f_1 is an evaluation function whose structure is a two-layer MLP (the first layer with a ReLU activation function, the second linear); it takes the concatenation of the context hidden variable c and a short-term representation v as input and measures the matching degree between the two representations. ε is a SoftMax temperature parameter, and u(T) denotes uniform sampling over the instants 1, 2, ..., T.
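An illustrative sketch of this InfoNCE-style contrastive objective in plain Python; for simplicity the evaluation function f_1 is replaced by a dot product (the patent uses a two-layer MLP), and the toy vectors are hypothetical:

```python
import math

def info_nce(c, positive, negatives, eps=0.1):
    """-log( exp(f(c, v+)/eps) / sum_v exp(f(c, v)/eps) ), with f = dot product."""
    def f(a, b):
        return sum(x * y for x, y in zip(a, b))
    scores = [f(c, positive) / eps] + [f(c, v) / eps for v in negatives]
    m = max(scores)                                      # stable log-sum-exp
    log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
    return -(scores[0] - log_denom)

c = [1.0, 0.0]
loss_good = info_nce(c, positive=[1.0, 0.0], negatives=[[0.0, 1.0], [-1.0, 0.0]])
loss_bad = info_nce(c, positive=[-1.0, 0.0], negatives=[[0.0, 1.0], [1.0, 0.0]])
# the loss is lower when the positive sample actually matches the context
```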
f_2 is the evaluation function of the long-term contrastive task; its structure is the same as that of f_1. According to the properties of contrastive learning, these objectives lower-bound mutual information:

I(c; v^{(sh)}) ≥ log(K+1) - L_sh,  I(c; v^{(lo)}) ≥ log(K+1) - L_lo

where I(·;·) denotes mutual information. It can be seen that minimizing L_sh + L_lo maximizes a lower bound on the mutual information I(c; x) between the context hidden variable c and the sequence x, so that c fully preserves the local (sequence-specific) information of x.
To enable c to filter out global information, the invention applies a KL-divergence regularizer:

L_KL = KL( q(c | x) || N(0, I) )

where q(c | x) is the Gaussian posterior distribution of c output by the context recognition network. This regularization term introduces a prior on c with the aim of keeping the amount of information in c as small as possible. Therefore, under the joint action of the two objectives of maximizing mutual information and minimizing L_KL, c automatically filters out global information while preserving as much local information as possible. The final multi-view contrastive objective function is computed as:

L_cmc = L_sh + L_lo + α L_KL

where α is an adjustable hyper-parameter.
Finally, the context hidden variable c is input to the parameter generation network, an MLP composed of multiple fully connected layers (hidden layers using a ReLU activation function, output layer linear), which maps it to the parameters Φ of the local feature encoder.
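The parameter generation network is a hypernetwork: one network outputs the weights of another. The sketch below, in plain Python with hypothetical dimensions and random toy weights, generates the flat parameter vector of a small linear "local encoder" layer from a context vector and then applies that layer:

```python
import random

random.seed(0)

IN_DIM, OUT_DIM, CTX_DIM, HID = 3, 2, 4, 8
N_PARAMS = OUT_DIM * IN_DIM + OUT_DIM          # weights + biases of the target layer

# parameter generation network: CTX_DIM -> HID (ReLU) -> N_PARAMS (linear)
W1 = [[random.uniform(-0.5, 0.5) for _ in range(CTX_DIM)] for _ in range(HID)]
W2 = [[random.uniform(-0.5, 0.5) for _ in range(HID)] for _ in range(N_PARAMS)]

def generate_params(context):
    """Map a context hidden variable c to a flat parameter vector Phi."""
    h = [max(0.0, sum(w * x for w, x in zip(row, context))) for row in W1]
    return [sum(w * x for w, x in zip(row, h)) for row in W2]

def local_layer(x, params):
    """Apply the generated linear layer y = W x + b, with W and b unpacked from params."""
    W = [params[i * IN_DIM:(i + 1) * IN_DIM] for i in range(OUT_DIM)]
    b = params[OUT_DIM * IN_DIM:]
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

phi = generate_params([1.0, -0.5, 0.2, 0.7])   # context hidden variable c
y = local_layer([0.1, 0.2, 0.3], phi)          # per-series prediction layer
```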
And 6, loading the parameter phi generated by the adaptive parameter generation module into the local feature encoder.
Step 7: input the sample time series x_{1:T} into the local feature encoder and output the local feature representation h^{l}_{1:T}.
The structure of the local feature encoder is consistent with that of the global feature encoder, except that it does not include the VQ module q; the encoder parameters Φ do not participate in back-propagation updates and are generated directly by the adaptive parameter generation module.
The specific process is as follows: first, the sample time series x_{1:T} is input to a short-term feature extractor consisting of a multi-layer 1D convolutional neural network to obtain a short-term representation z_{1:T} at each instant of the sequence; then, z_{1:T} is input to a Transformer encoder, which models the long-term dependencies in the whole sequence and outputs the local feature representation h^{l}_{1:T} at each instant.
Step 8: concatenate the global feature representation and the local feature representation and input the result to a convolutional Transformer decoder to obtain the prediction output \hat{x}_{T+1:T+τ}, where τ denotes the prediction step size.
The convolutional Transformer decoder is formed by stacking one convolution module and several identical attention modules; the specific structure is shown in fig. 5. The convolution module has the same structure as the convolution module in the encoder, and each attention module comprises: a masked multi-head self-attention layer over the decoder output, which introduces a backward masking mechanism to prevent later data from being seen when predicting the data at a given instant; a multi-head attention layer over the encoder output; and a feed-forward network consisting of two fully connected layers (the first using a ReLU activation function and the second a linear activation function). The last attention module outputs the predicted values \hat{x}_{T+1:T+τ} of the future τ steps.
Step 9: calculate the prediction objective function L_pred, i.e. the error between the true values x_{T+1:T+τ} corresponding to the sample time series and the actually output predicted values \hat{x}_{T+1:T+τ}.
The invention uses the mean absolute error as the prediction objective function:

L_pred = (1/τ) Σ_{t=1}^{τ} | x_{T+t} - \hat{x}_{T+t} |
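The mean absolute error above is a one-liner; the sketch below uses hypothetical toy values:

```python
def mae_loss(y_true, y_pred):
    """L_pred = (1/tau) * sum_t |x_{T+t} - x_hat_{T+t}|."""
    tau = len(y_true)
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / tau

loss = mae_loss([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])   # (0.5 + 0.0 + 1.0) / 3
```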
step 10, calculating a prediction objective functionMulti-field contrast coding objective function>And vector quantization constraint objective function->Sum->
Step 11: adjust the learnable network parameters of the whole model according to the loss L over all samples in the batch.
The learnable parameters θ are updated by gradient descent:

θ ← θ - η ∇_θ L

where η is the learning rate.
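A minimal sketch of one such gradient-descent update in plain Python; the parameter and gradient values are hypothetical, and in practice the gradient would come from automatic differentiation:

```python
def sgd_step(theta, grad, eta=0.01):
    """theta <- theta - eta * grad: one gradient-descent update of the parameters."""
    return [t - eta * g for t, g in zip(theta, grad)]

theta = [1.0, -2.0]
theta = sgd_step(theta, grad=[0.5, -1.0], eta=0.1)   # each entry moves against its gradient
```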
Step 12, repeating steps 3-11 until all batches of the training dataset are involved in model training.
Step 13, repeating steps 3-12 until the specified iteration number is reached.
Step 14, inputting the sample time series to be predicted into the trained model to obtain the prediction result.
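Steps 12-14 amount to a standard epoch/batch training loop followed by inference. A minimal self-contained sketch on a toy linear model (a single weight w stands in for the full encoder-decoder; all names and data are illustrative):

```python
import numpy as np

# Toy data: ground-truth relation y = 3 * x, so the ideal weight is 3.0
rng = np.random.default_rng(0)
X = rng.normal(size=(64,))
y = 3.0 * X

w, eta, batch = 0.0, 0.1, 16
for epoch in range(50):                          # step 13: fixed number of iterations
    for i in range(0, len(X), batch):            # step 12: every batch participates
        xb, yb = X[i:i + batch], y[i:i + batch]
        grad = np.mean(2.0 * (w * xb - yb) * xb) # gradient of the batch loss
        w -= eta * grad                          # step 11: parameter update
prediction = w * 2.0                             # step 14: predict for a new input
```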
The foregoing detailed description of the preferred embodiments and advantages of the invention is merely illustrative of the presently preferred embodiments; changes, additions, substitutions, and equivalents of those embodiments are intended to be included within the scope of the invention.
Claims (8)
1. A depth decoupling time series prediction method, characterized in that it is applied to the prediction of road traffic flow in the traffic field and comprises the following steps:
collecting time sequence data, wherein the time sequence data is road traffic flow, and preprocessing the road traffic flow to obtain preprocessed road traffic flow;
the method comprises the steps of constructing a time sequence prediction model, wherein the time sequence prediction model comprises a global feature encoder, a self-adaptive parameter generation module, a local feature encoder and a decoder, the global feature encoder is used for encoding road traffic into global feature representation, the self-adaptive parameter generation module is used for generating local feature encoder parameters according to the road traffic, the local feature encoder is used for encoding the road traffic into local feature representation based on the loaded local feature encoder parameters, and the decoder is used for decoding the result of splicing the global feature representation and the local feature representation and outputting predicted road traffic;
and carrying out parameter optimization on the time sequence prediction model by using the road traffic flow, and using the time sequence prediction model with the parameter optimization for predicting the road traffic flow.
2. The depth decoupling time series prediction method of claim 1, wherein the preprocessing comprises outlier detection and removal, missing value replenishment, and normalization.
3. The depth decoupling time series prediction method of claim 1, wherein the global feature encoder comprises a short-term feature extractor constructed from a convolutional neural network, a vector quantization module, and a Transformer encoder comprising a stack of attention modules, wherein the short-term feature extractor is configured to extract short-term features of the input road traffic flow to obtain a short-term representation of the road traffic flow, and the vector quantization module is configured to vector-encode the input short-term representation to obtain an encoded vector; the Transformer encoder is configured to model long-term dependencies in the whole road traffic data based on the encoded vector and to output the global feature representation of the road traffic flow.
4. The depth decoupling time series prediction method of claim 1, wherein the adaptive parameter generation module encodes the road traffic flow by means of multi-view contrastive coding and outputs the local feature encoder parameters.
5. The depth decoupling time series prediction method of claim 4, wherein the adaptive parameter generation module comprises a context recognition network and a parameter generation network, wherein the context recognition network comprises a convolution module, a Transformer encoder, and an LSTM aggregator connected in sequence for mapping the road traffic flow into context hidden variables, and wherein the parameter generation network comprises a fully connected network for generating the parameters of the local feature encoder from the context hidden variables.
6. The depth decoupling time series prediction method of claim 1, wherein the parameters of the local feature encoder do not participate in training but are generated by the adaptive parameter generation module; the local feature encoder comprises a short-term feature extractor and a Transformer encoder comprising a stack of attention modules, wherein the short-term feature extractor is configured to extract short-term features of the input road traffic flow to obtain a short-term representation of the road traffic flow, and the Transformer encoder is configured to model long-term dependencies in the whole road traffic data based on the short-term representation and to output the local feature representation of the road traffic flow.
7. The depth decoupling time series prediction method of claim 1, wherein the decoder comprises a convolution module and a plurality of identical attention modules, wherein the convolution module is configured to perform a convolution operation on the result of concatenating the input global feature representation and local feature representation, and the attention modules are configured to perform attention calculations based on the convolution result and to output the predicted road traffic flow.
8. The depth decoupling time series prediction method of claim 1, wherein the loss function L employed in the parameter optimization of the time series prediction model is the sum of the prediction objective function, the multi-view contrastive coding objective function, and the vector quantization constraint objective function,
wherein x_{T+t} and x̂_{T+t} represent the true value and the predicted value of the road traffic flow at the future step t relative to time T, τ represents the prediction step length, f_1 and f_2 are evaluation functions learned by contrastive learning, c represents the context hidden variable, z̃ represents the short-term representation generated by the adaptive parameter generation module, z̃′ represents the perturbation term of the short-term representation, ε is the temperature parameter of the SoftMax, U(T) denotes uniform sampling over the instants 1, 2, …, T, q(c|·) denotes the Gaussian posterior distribution of the context hidden variable, E denotes the mathematical expectation, V^(lo) represents the long-term representation set, V^(sh) represents the short-term representation set, α is an adjustable hyper-parameter, D_KL denotes the KL divergence, sg(·) is the gradient cut-off operation satisfying sg(z) ≡ z, γ is an adjustable hyper-parameter, z represents the short-term representation generated by the global feature encoder, and z_q represents the vector quantization encoding result of z.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110426703.0A CN113177633B (en) | 2021-04-20 | 2021-04-20 | Depth decoupling time sequence prediction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113177633A CN113177633A (en) | 2021-07-27 |
CN113177633B true CN113177633B (en) | 2023-04-25 |
Family
ID=76924167
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110426703.0A Active CN113177633B (en) | 2021-04-20 | 2021-04-20 | Depth decoupling time sequence prediction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113177633B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113762356B (en) * | 2021-08-17 | 2023-06-16 | 中山大学 | Cluster load prediction method and system based on clustering and attention mechanism |
CN114021803A (en) * | 2021-10-29 | 2022-02-08 | 华能酒泉风电有限责任公司 | Wind power prediction method, system and equipment based on convolution transform architecture |
CN114239718B (en) * | 2021-12-15 | 2024-03-01 | 杭州电子科技大学 | High-precision long-term time sequence prediction method based on multi-element time sequence data analysis |
CN114297379A (en) * | 2021-12-16 | 2022-04-08 | 中电信数智科技有限公司 | Text binary classification method based on Transformer |
CN114580710B (en) * | 2022-01-28 | 2024-04-30 | 西安电子科技大学 | Environmental monitoring method based on transducer time sequence prediction |
CN114936723B (en) * | 2022-07-21 | 2023-04-14 | 中国电子科技集团公司第三十研究所 | Social network user attribute prediction method and system based on data enhancement |
CN115659852B (en) * | 2022-12-26 | 2023-03-21 | 浙江大学 | Layout generation method and device based on discrete potential representation |
CN116153089B (en) * | 2023-04-24 | 2023-06-27 | 云南大学 | Traffic flow prediction system and method based on space-time convolution and dynamic diagram |
CN116776228B (en) * | 2023-08-17 | 2023-10-20 | 合肥工业大学 | Power grid time sequence data decoupling self-supervision pre-training method and system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110718301A (en) * | 2019-09-26 | 2020-01-21 | 东北大学 | Alzheimer disease auxiliary diagnosis device and method based on dynamic brain function network |
CN111243269A (en) * | 2019-12-10 | 2020-06-05 | 福州市联创智云信息科技有限公司 | Traffic flow prediction method based on depth network integrating space-time characteristics |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11144683B2 (en) * | 2016-12-06 | 2021-10-12 | General Electric Company | Real-time adaptation of system high fidelity model in feature space |
Non-Patent Citations (1)
Title |
---|
张志超; 史治宇; 张杰. Modal parameter identification of time-varying systems based on improved MCPP. 低温建筑技术 (Low Temperature Architecture Technology), 2016, (12). *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113177633B (en) | Depth decoupling time sequence prediction method | |
Liu et al. | Remaining useful life prediction using a novel feature-attention-based end-to-end approach | |
Yin et al. | Deep forest regression for short-term load forecasting of power systems | |
Ding et al. | Point and interval forecasting for wind speed based on linear component extraction | |
Ibrahim et al. | Short‐Time Wind Speed Forecast Using Artificial Learning‐Based Algorithms | |
CN111079989B (en) | DWT-PCA-LSTM-based water supply amount prediction device for water supply company | |
CN113128113B (en) | Lean information building load prediction method based on deep learning and transfer learning | |
CN113159389A (en) | Financial time sequence prediction method based on deep forest generation countermeasure network | |
CN112434891A (en) | Method for predicting solar irradiance time sequence based on WCNN-ALSTM | |
CN115204035A (en) | Generator set operation parameter prediction method and device based on multi-scale time sequence data fusion model and storage medium | |
CN116643949A (en) | Multi-model edge cloud load prediction method and device based on VaDE clustering | |
CN115840893A (en) | Multivariable time series prediction method and device | |
Lyu et al. | Dynamic feature selection for solar irradiance forecasting based on deep reinforcement learning | |
CN116014722A (en) | Sub-solar photovoltaic power generation prediction method and system based on seasonal decomposition and convolution network | |
Wei et al. | A three-stage multi-objective heterogeneous integrated model with decomposition-reconstruction mechanism and adaptive segmentation error correction method for ship motion multi-step prediction | |
CN117094451B (en) | Power consumption prediction method, device and terminal | |
Vogt et al. | Wind power forecasting based on deep neural networks and transfer learning | |
Ziyabari et al. | Multi-branch resnet-transformer for short-term spatio-temporal solar irradiance forecasting | |
CN116245259B (en) | Photovoltaic power generation prediction method and device based on depth feature selection and electronic equipment | |
CN117251705A (en) | Daily natural gas load prediction method | |
Qi et al. | Using stacked auto-encoder and bi-directional LSTM for batch process quality prediction | |
Xu et al. | A hybrid model for multi-step wind speed forecasting based on secondary decomposition, deep learning, and error correction algorithms | |
CN116402194A (en) | Multi-time scale load prediction method based on hybrid neural network | |
CN115860232A (en) | Steam load prediction method, system, electronic device and medium | |
CN115544890A (en) | Short-term power load prediction method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||