CN114169394A - Multi-variable time series prediction method for multi-scale adaptive graph learning - Google Patents

Multi-variable time series prediction method for multi-scale adaptive graph learning

Info

Publication number
CN114169394A
CN114169394A (application CN202111298623.8A)
Authority
CN
China
Prior art keywords: scale, training sample, initial, sequence, time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111298623.8A
Other languages
Chinese (zh)
Inventor
陈岭
陈东辉
张友东
文波
杨成虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202111298623.8A
Publication of CN114169394A
Legal status: Pending

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING › G06F18/00 Pattern recognition › G06F18/20 Analysing › G06F18/25 Fusion techniques › G06F18/253 Fusion techniques of extracted features
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology › G06N3/045 Combinations of networks
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology › G06N3/048 Activation functions
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/08 Learning methods

Abstract

The invention discloses a multivariate time series prediction method for multi-scale adaptive graph learning, which comprises the following steps: input the training sample set into a multi-scale pyramid network to obtain the multi-scale initial subsequence set of each training sample; obtain the adjacency matrix of each scale with an adaptive graph learning module; input the value at each time step together with the adjacency matrix of the corresponding scale into a graph neural network to obtain the representation sequence of each scale; input each representation sequence into a time convolution network to obtain the final subsequence of each scale; input the multi-scale final subsequence set into a multi-scale fusion module; and input the multi-scale fusion data of each training sample into a multi-layer convolutional neural network for mapping to obtain the multivariate time series predicted value of each training sample. The model parameters are determined from the loss function, thereby determining the multivariate time series prediction model for multi-scale adaptive graph learning. The method can accurately predict values for the selected scenario.

Description

Multi-variable time series prediction method for multi-scale adaptive graph learning
Technical Field
The invention relates to the field of multivariate time series prediction, and in particular to a multivariate time series prediction method for multi-scale adaptive graph learning.
Background
Multivariate time series are ubiquitous in real-world scenarios, such as urban traffic flow and household electricity consumption in urban blocks. Multivariate time series prediction, i.e., predicting future trends from a set of historical observation time series, has been widely studied in recent years. It has broad applications: for example, a better driving route can be planned in advance from the predicted traffic flow at each intersection, and investment strategies can be designed by forecasting the near-term prices of multiple stocks.
Compared with univariate time series prediction, multivariate time series prediction must consider both the temporal correlation within a single variable and the correlations between variables (i.e., the predicted value of a single variable is influenced by other variables). Conventional methods, such as vector autoregression, temporal regularized matrix factorization, vector autoregressive moving average, and Gaussian processes, typically rely on strict stationarity assumptions and cannot capture the non-linear dependencies between variables. Deep neural networks are superior at modeling non-stationary and non-linear dependencies. In particular, two variants of recurrent neural networks (RNNs), namely long short-term memory networks (LSTM) and gated recurrent units (GRU), as well as temporal convolutional networks (TCNs), achieve good performance in time series prediction. To capture long-term and short-term temporal dependencies, existing work introduces several strategies, such as skip connections, attention mechanisms, and memory-based networks. These efforts focus on modeling the temporal dependence, treat the multivariate time series as a vector, and assume that the predicted value of a single variable is affected by all other variables, which is not reasonable in practical applications. For example, the traffic flow of a street is largely influenced by its neighbors, while the influence from distant streets is relatively small. It is therefore crucial to explicitly model the pairwise correlations between variables.
A graph is an abstract data type that represents the relationships between nodes. Graph neural networks (GNNs) can efficiently capture higher-order representations of nodes while explicitly exploiting pairwise correlations, and are considered a promising approach to processing graph data. Multivariate time series prediction can be considered from a graph modeling perspective: the variables in the multivariate time series can be regarded as nodes in a graph, and the correlations between paired variables as edges. Recently, some studies have modeled multivariate time series by combining the rich structural information of the graph (i.e., node features and weighted edges) with GNNs. These works combine GNNs and temporal convolution modules to learn the timing patterns, with good effect. However, two problems remain in the above work.
First, existing studies only consider temporal correlation on a single scale, and cannot effectively capture multi-scale timing patterns (e.g., daily, weekly, monthly, and other specific periodic patterns). Second, existing research defines the correlations between variables from prior knowledge or human experience, which makes the implicit correlations between variables difficult to describe effectively.
Disclosure of Invention
The invention provides a multivariate time series prediction method for multi-scale adaptive graph learning, which can accurately predict the value of a given scenario at the next time step.
A multivariate time series prediction method for multi-scale adaptive graph learning comprises the following steps:
S1: dividing the obtained multivariate time series with a sliding time window to obtain a training sample set, and inputting the training sample set into a multi-scale pyramid network to obtain the multi-scale initial subsequence set of each training sample, an initial node embedding vector, and an initial scale embedding vector for each scale;
S2: inputting the initial node embedding vector and the initial scale embedding vector of each scale into an adaptive graph learning module to obtain the adjacency matrix of each scale;
S3: the multi-scale initial subsequence set consists of the initial subsequences of the individual scales; each scale's initial subsequence is divided along the time dimension into values at successive time steps; the value at each time step and the adjacency matrix of the corresponding scale are input together into a graph neural network to obtain the representation of that time step; the representations of all time steps form the representation sequence of each scale; each scale's representation sequence is input into a time convolution network to obtain the final subsequence of each scale; and the final subsequences of all scales form the multi-scale final subsequence set;
S4: inputting the multi-scale final subsequence set into a multi-scale fusion module for weighted combination to obtain the multi-scale fusion data of each training sample, and inputting the multi-scale fusion data of each training sample into a multi-layer convolutional neural network for mapping to obtain the multivariate time series predicted value of each training sample;
S5: constructing the loss function of each training sample from its multivariate time series predicted value and true value, constructing the loss function of the training sample set from the individual training sample loss functions, iterating steps S2-S4 until the iteration count threshold is met, and obtaining the model parameters through the iterated training sample set loss function, thereby determining the multivariate time series prediction model for multi-scale adaptive graph learning;
S6: in application, a multivariate time series, i.e., the historical observation time series of urban traffic flow or of household electricity consumption in an urban block, is input into the multivariate time series prediction model to predict the urban traffic flow or the household electricity consumption of the urban block at the next time step.
First, the time series data are preprocessed and a training data set is constructed by sliding-window division. Second, the time series is decomposed layer by layer with a multi-scale pyramid network, constructing subsequences of different scales along the time dimension. An adaptive graph learning module is then introduced to automatically infer the graph structure at each scale from the data and fully mine the rich, implicit correlations among variables. Next, the inter-variable correlations and temporal correlations at each scale are modeled with a graph neural network and a time convolution network. Finally, a multi-scale fusion module is introduced to automatically weigh the importance of each scale representation and capture cross-scale correlations.
Dividing the obtained multivariate time sequence by a sliding time window to obtain a training sample set, comprising:
removing abnormal values from the multivariate time series, normalizing the cleaned multivariate time series, setting a time window, and dividing the multivariate time series under the sliding window into a plurality of training samples at the set time step to obtain the training sample set.
Inputting the training sample set into a multi-scale pyramid network to obtain a multi-scale initial subsequence set of each training sample, wherein the method comprises the following steps:
the multi-scale pyramid network comprises a plurality of pyramid layers, each training sample in the training sample set is input into the multi-scale pyramid network, initial sub-sequences with different scales are obtained through each pyramid layer by gradually stacking the pyramid layers, and finally the multi-scale initial sub-sequence set of each training sample is output.
Each pyramid layer comprises a first convolution network and a second convolution network which are parallel, the initial sub-sequences obtained from the previous layer are respectively input into the first convolution network and the second convolution network, and then bitwise addition is carried out on the outputs of the first convolution network and the second convolution network to obtain the initial sub-sequences of the current layer.
Inputting the initial node embedding vector and the initial scale embedding vector of each scale into the adaptive graph learning module to obtain the adjacency matrix of each scale comprises the following steps:
the self-adaptive graph learning module comprises multi-scale network layers, each scale network layer comprises initial scale embedded vectors of corresponding scales, the initial node embedded vectors are simultaneously input into the multi-scale network layers and are fused with the initial scale embedded vectors of the corresponding scales to obtain initial scale feature vectors, pairwise similarity calculation is carried out on the initial scale feature vectors, and then sparse processing is carried out on calculation results to obtain an adjacency matrix of each scale.
Inputting the value at each time step and the adjacency matrix of the corresponding scale together into the graph neural network to obtain the representation sequence comprises the following steps:
each graph neural network comprises an in-degree information capture module and an out-degree information capture module; the value at each time step and the adjacency matrix of the corresponding scale are input into the in-degree information capture module and the out-degree information capture module respectively, and the outputs of the two modules are fused to obtain the representation sequence of each scale;
the representation $h_t^k$ at time step t of the k-th scale is:

$$h_t^k = \mathrm{GNN}_{in}^k\left(x_t^k, A^k, W_{in}^k\right) + \mathrm{GNN}_{out}^k\left(x_t^k, (A^k)^T, W_{out}^k\right)$$

where $W_{in}^k$ and $W_{out}^k$ are the training parameters of the in-degree information capture module and the out-degree information capture module at the k-th scale, $A^k$ is the adjacency matrix corresponding to the k-th scale, $\mathrm{GNN}_{in}^k$ is the in-degree information capture module at the k-th scale, and $\mathrm{GNN}_{out}^k$ is the out-degree information capture module at the k-th scale.
Inputting the multi-scale final subsequence set into the multi-scale fusion module for weighted combination to obtain the multi-scale fusion data of each training sample comprises the following steps:
stacking the multi-scale final subsequence set to obtain a multi-scale final subsequence matrix, performing pooling operation on the multi-scale final subsequence matrix to obtain a multi-scale final subsequence one-dimensional vector, inputting the final subsequence one-dimensional vector to a thinning module to obtain a scale importance score vector, and fusing the scale importance score vector and the multi-scale final subsequence set to obtain multi-scale fusion data of each sample.
The scale importance score vector α is:

$$\alpha_1 = \mathrm{ReLU}(W_1 h_{pool} + b_1)$$
$$\alpha = \mathrm{Sigmoid}(W_2 \alpha_1 + b_2)$$

where $h_{pool}$ is the multi-scale final subsequence one-dimensional vector, Sigmoid and ReLU are activation functions, $b_1$ and $b_2$ are bias vectors, and $W_1$ and $W_2$ are weight matrices.
The model parameters $\theta_a$ obtained through the iterated training sample set loss function are:

$$\theta_a = \theta - \eta \nabla_\theta \mathcal{L}$$

where θ denotes the model parameters before iterative training, η the learning rate, and $\mathcal{L}$ the training sample set loss function. The model parameters $\theta_a$ comprise the final node embedding vector, the final scale embedding vectors, the graph neural network parameters, the adaptive graph learning network parameters, and the time convolution network parameters.
Compared with the prior art, the invention has the beneficial effects that:
The multivariate time series is decomposed into multi-scale subsequences with a multi-scale pyramid network, and a multi-scale fusion module is introduced to automatically weigh the importance of each scale representation and capture cross-scale correlations. An adaptive graph learning module is designed to automatically infer the graph structure at each scale within an end-to-end framework, fully mining the rich, implicit correlations among variables. By capturing cross-scale correlations and fully mining the rich, implicit correlations among variables, the method can accurately predict values for a given scenario: the historical observation time series of urban traffic flow and of household electricity consumption in an urban block are input into the multivariate time series prediction model to predict the urban traffic flow and the household and industrial electricity consumption of the urban block at the next time step. The predicted traffic flow can then guide driving routes so as to save driving time, and the predicted household and industrial electricity consumption can guide the distribution of transmitted power so as to optimize power allocation.
Drawings
FIG. 1 is a flowchart of a multivariate time series prediction method for multi-scale adaptive graph learning according to an embodiment;
FIG. 2 is a block diagram of a multivariate time series prediction method for multi-scale adaptive graph learning according to an embodiment;
FIG. 3 is an architecture diagram of an adaptive graph learning module according to an embodiment;
fig. 4 is an architecture diagram of a multi-scale fusion module according to an embodiment.
Detailed Description
The invention discloses a multivariate time sequence prediction method for multi-scale adaptive graph learning, which comprises the following specific steps as shown in figures 1 and 2:
step 1: removing abnormal values in the multivariate time series, and carrying out normalization processing on the multivariate time series with the abnormal values removed so that each value after the processing is normalized to the range of [ -1, 1], wherein the conversion formula is as follows:
$$X_i' = \frac{2\,(X_i - X_{i,min})}{X_{i,max} - X_{i,min}} - 1$$

where $X_i$ is a value in the original time series of the i-th variable, $X_{i,min}$ is the minimum of the original time series of the i-th variable, $X_{i,max}$ is the maximum of the original time series of the i-th variable, and $X_i'$ is the normalized value of the i-th variable.

The size T of the time window is set empirically, and the normalized data are divided with a fixed-length sliding step to obtain the training sample set.
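As a minimal illustrative sketch (not part of the original disclosure), this preprocessing step could be written as follows; the array shape, the window size of 168, and the function names are assumptions:

```python
import numpy as np

def normalize(data):
    # Min-max scale each variable (column) to [-1, 1], per the formula above.
    d_min = data.min(axis=0, keepdims=True)
    d_max = data.max(axis=0, keepdims=True)
    return 2.0 * (data - d_min) / (d_max - d_min) - 1.0

def make_samples(data, window=168, stride=1):
    # Slide a window of size T over the series; each training sample pairs a
    # history window with the value at the next time step (the target).
    xs, ys = [], []
    for start in range(0, len(data) - window, stride):
        xs.append(data[start:start + window])
        ys.append(data[start + window])
    return np.stack(xs), np.stack(ys)

# data: array of shape (num_timesteps, num_variables)
# x, y = make_samples(normalize(data))
```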
Step 2: batch the training sample set with a fixed batch size M; the total number of batches is N, calculated as:

$$N = \left\lceil \frac{N_{samples}}{M} \right\rceil$$

where $N_{samples}$ is the total number of training samples in the training sample set.
Step 3: randomly initialize two types of learnable parameters: the node embedding vector $E_{nodes}$ shared across all scales and the initial scale embedding vector $E_{scale}^k$ of each scale.
Step 4: sequentially select the batch of training samples with index i from the training sample set, where i ∈ {0, 1, …, N-1}. Steps 5-11 are repeated for each training sample in the batch.
Step 5: input the training sample into the multi-scale pyramid network to obtain its multi-scale initial subsequence set, i.e., the initial subsequence set of K scales $X = \{X^1, \dots, X^k, \dots, X^K\}$, where $X^1$ denotes the initial subsequence at the original scale and $X^k$ denotes the initial subsequence at the k-th scale (1 < k ≤ K). The specific steps are as follows:

The multi-scale pyramid network comprises several pyramid layers; by stacking the pyramid layers step by step, each input training sample yields an initial subsequence of a different scale from each pyramid layer, the subsequence of the k-th scale being denoted $X^k$. Each pyramid layer contains two parallel convolutional networks:
the first convolutional network is used to capture the local pattern in the time dimension, and different pyramid layers use different convolutional kernel sizes. The initial convolution kernel has a larger receptive field and is slowly reduced in each pyramid layer, which can effectively control the size of the receptive field in the high-level pyramid layer and maintain the sequence characteristics of the large-scale subsequence. The convolution step size is set to 2 to increase the time scale so that each decomposition results in a subsequence of half the length of the input sequence. The calculation formula of the k-1 layer can be expressed as:
Figure BDA0003337369230000061
wherein
Figure BDA0003337369230000062
Representing a convolution operator, ReLU representing an activation function,
Figure BDA0003337369230000063
and
Figure BDA0003337369230000064
respectively representing the convolution kernel and the offset vector of the first convolution network in the (k-1) th pyramid layer,
Figure BDA0003337369230000065
representing the output of the first convolutional network.
The second convolutional network reduces the sensitivity to the hyper-parameter settings (i.e., the convolution kernel size and the stride) that the model would have if only a single convolutional neural network were used. A convolution with a 1 × 1 kernel and a 1 × 2 pooling operation are introduced to build a structure parallel to the first convolutional network, formalized as:

$$X_{pool}^k = \mathrm{Pooling}\left(\mathrm{ReLU}\left(W_{pool}^{k-1} \ast X^{k-1} + b_{pool}^{k-1}\right)\right)$$

where $W_{pool}^{k-1}$ and $b_{pool}^{k-1}$ denote the convolution kernel and bias vector of the second convolutional network in the (k-1)-th pyramid layer, and Pooling denotes the pooling operation.
The outputs of the two convolutional networks are then added bitwise:

$$X^k = X_{conv}^k + X_{pool}^k$$

where $X^k$ denotes the subsequence of the k-th scale.
Finally, the multi-scale pyramid network generates the initial subsequence set of K scales $X = \{X^1, \dots, X^k, \dots, X^K\}$, where $X^1$ denotes the initial subsequence at the original scale and $X^k$ denotes the initial subsequence at the k-th scale (1 < k ≤ K).
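For illustration, a minimal sketch of one such pyramid layer (an assumption, not the original implementation; kernel sizes and channel counts are placeholders) could look like this:

```python
import torch
import torch.nn as nn

class PyramidLayer(nn.Module):
    """One pyramid layer: two parallel branches added element-wise."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        # Branch 1: strided convolution over time halves the sequence length.
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              stride=2, padding=kernel_size // 2)
        # Branch 2: 1x1 convolution followed by 1x2 pooling, in parallel.
        self.conv1x1 = nn.Conv1d(channels, channels, kernel_size=1)
        self.pool = nn.AvgPool1d(kernel_size=2)

    def forward(self, x):                          # x: (batch, channels, time)
        out1 = torch.relu(self.conv(x))
        out2 = self.pool(torch.relu(self.conv1x1(x)))
        t = min(out1.size(-1), out2.size(-1))      # align lengths before adding
        return out1[..., :t] + out2[..., :t]       # bitwise (element-wise) sum
```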
Step 6: feed the initial node embedding vector $E_{nodes}$ and the initial scale embedding vector $E_{scale}^k$ of each scale into the adaptive graph learning module to obtain K scale-specific adjacency matrices $\{A^1, \dots, A^k, \dots, A^K\}$, where $A^k$ denotes the adjacency matrix specific to the k-th scale (1 ≤ k ≤ K), as shown in FIG. 3. The specific steps are:

The adaptive graph learning module comprises multi-scale network layers, each of which holds the initial scale embedding vector of its scale. The initial node embedding vector is input into all scale network layers simultaneously and fused with the initial scale embedding vector of the corresponding scale to obtain an initial scale feature vector; pairwise similarities are computed over the initial scale feature vectors, and the results are sparsified to obtain the adjacency matrix of each scale.
For the k-th scale-specific network layer, the k-th initial scale embedding vector $E_{scale}^k$ is multiplied bitwise with the initial node embedding vector $E_{nodes}$:

$$E^k = E_{nodes} \odot E_{scale}^k$$

where $E^k$ denotes the scale-specific embedding in the k-th layer. Then, analogously to computing node affinity with a similarity function, the similarity of paired nodes is calculated as follows:
$$M_1^k = \tanh\left(\theta^k E^k\right)$$
$$M_2^k = \tanh\left(\varphi^k E^k\right)$$
$$\tilde{A}^k = \mathrm{ReLU}\left(\tanh\left(M_1^k (M_2^k)^T - M_2^k (M_1^k)^T\right)\right)$$

where $\theta^k$ and $\varphi^k$ are learnable parameters in layer k, and tanh and ReLU are activation functions.
The values of $\tilde{A}^k$ are normalized to the range (0, 1) and serve as weighted edges. To reduce the computational cost of graph convolution, lessen the influence of noise, and make the model more robust, the following strategy is introduced to sparsify $\tilde{A}^k$:

$$A^k = \mathrm{Softmax}\left(\mathrm{Sparse}\left(\tilde{A}^k\right)\right)$$

where $A^k$ is the final adjacency matrix of the k-th scale, normalized with the Softmax function, and the sparsification function Sparse keeps, in each row, only the entries among the TopK largest values:

$$\mathrm{Sparse}\left(\tilde{a}_{ij}\right) = \begin{cases} \tilde{a}_{ij}, & \tilde{a}_{ij} \in \mathrm{TopK}\left(\tilde{a}_{i:},\, \tau\right) \\ -\infty, & \text{otherwise} \end{cases}$$

where τ is the threshold of the TopK function, representing the maximum number of neighbors of a node. Finally, the scale-specific adjacency matrices $\{A^1, \dots, A^k, \dots, A^K\}$ are obtained.
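A minimal sketch of this adaptive graph learning layer for one scale (illustrative only; the similarity form follows the formulas above, while the class and parameter names are placeholders) might read:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphLayer(nn.Module):
    """Learns one scale-specific adjacency matrix from node embeddings."""
    def __init__(self, num_nodes, emb_dim, topk):
        super().__init__()
        self.scale_emb = nn.Parameter(torch.randn(num_nodes, emb_dim))  # E_scale^k
        self.theta = nn.Linear(emb_dim, emb_dim, bias=False)
        self.phi = nn.Linear(emb_dim, emb_dim, bias=False)
        self.topk = topk                          # tau: max neighbors per node

    def forward(self, node_emb):                  # node_emb: shared E_nodes
        e = node_emb * self.scale_emb             # bitwise (element-wise) fusion
        m1 = torch.tanh(self.theta(e))
        m2 = torch.tanh(self.phi(e))
        sim = torch.relu(torch.tanh(m1 @ m2.T - m2 @ m1.T))
        # Sparsify: keep the top-k weights per row, mask the rest to -inf,
        # then normalize each row with softmax.
        mask = torch.full_like(sim, float('-inf'))
        idx = sim.topk(self.topk, dim=-1).indices
        mask.scatter_(-1, idx, 0.0)
        return F.softmax(sim + mask, dim=-1)      # adjacency matrix A^k
```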
Step 7: input the value at each time step together with the adjacency matrix of the corresponding scale into the graph neural network to obtain the representation sequence of each scale; the graph neural networks at different time steps under the same scale share parameters. This yields the representation sequence set of all scale subsequences $\{\hat{X}^1, \dots, \hat{X}^k, \dots, \hat{X}^K\}$, where $\hat{X}^k$ denotes the representation sequence of the k-th scale subsequence (1 ≤ k ≤ K) after graph neural network processing. The specific steps are as follows:
The initial subsequence set $\{X^1, \dots, X^k, \dots, X^K\}$ and the adjacency matrices $\{A^1, \dots, A^k, \dots, A^K\}$ are processed scale by scale: the value at each time step in each scale's subsequence and the corresponding adjacency matrix are fed into the graph neural network, with the graph neural networks at different time steps under one scale sharing parameters. For the k-th scale, $X^k$ is first divided along the time dimension to obtain the per-time-step values $\{x_1^k, x_2^k, \dots\}$. Both $A^k$ and its transpose $(A^k)^T$ are introduced, and two GNN modules, the in-degree information capture module $\mathrm{GNN}_{in}^k$ and the out-degree information capture module $\mathrm{GNN}_{out}^k$, capture the in-degree and out-degree information simultaneously. The results of the two GNN modules are then added:

$$h_t^k = \mathrm{GNN}_{in}^k\left(x_t^k, A^k, W_{in}^k\right) + \mathrm{GNN}_{out}^k\left(x_t^k, (A^k)^T, W_{out}^k\right)$$

where $W_{in}^k$ and $W_{out}^k$ are the training parameters of the in-degree information capture module and the out-degree information capture module at the k-th scale, $A^k$ is the adjacency matrix corresponding to the k-th scale, $\mathrm{GNN}_{in}^k$ is the in-degree information capture module at the k-th scale, and $\mathrm{GNN}_{out}^k$ is the out-degree information capture module at the k-th scale. The outputs at all time steps $\{h_1^k, h_2^k, \dots\}$ form $\hat{X}^k$; finally, the representation sequence set of the initial subsequences of all scales $\{\hat{X}^1, \dots, \hat{X}^k, \dots, \hat{X}^K\}$ is obtained, where $\hat{X}^k$ denotes the representation sequence of the k-th scale subsequence (1 ≤ k ≤ K) after graph neural network processing.
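As an illustrative sketch only, the two-branch graph neural network at a single time step could be implemented as below, assuming a simple one-hop propagation of the form ReLU(A · x · W); the disclosure does not fix the internal GNN form, so this structure is an assumption:

```python
import torch
import torch.nn as nn

class InOutGNN(nn.Module):
    """Fuses in-degree and out-degree information at one time step."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_in = nn.Linear(in_dim, out_dim)     # in-degree branch parameters
        self.w_out = nn.Linear(in_dim, out_dim)    # out-degree branch parameters

    def forward(self, x, adj):                     # x: (nodes, in_dim)
        h_in = torch.relu(adj @ self.w_in(x))      # propagate along A^k
        h_out = torch.relu(adj.T @ self.w_out(x))  # propagate along (A^k)^T
        return h_in + h_out                        # add the two branch outputs
```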
Step 8: input each representation sequence into the time convolution network to obtain the final subsequence of each scale; the multi-scale final subsequences form the multi-scale final subsequence set $\{h^1, \dots, h^k, \dots, h^K\}$, where $h^k$ denotes the final subsequence of the k-th scale (1 ≤ k ≤ K) after time convolution processing. The specific steps are:

The representation sequence of each scale's initial subsequence is fed into its time convolution network. For the k-th scale, $\hat{X}^k$ is fed into the k-th scale time convolution network $\mathrm{TCN}^k$ to obtain the final representation $h^k$ of the scale subsequence:

$$h^k = \mathrm{TCN}^k\left(\hat{X}^k, W_{tcn}^k\right)$$

where $W_{tcn}^k$ denotes the trainable parameters in the k-th time convolution layer. Within each scale, the subsequences corresponding to the individual variables are fed into a parameter-sharing time convolution network. Finally, the final subsequence set of all scales $\{h^1, \dots, h^k, \dots, h^K\}$ is obtained.
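A minimal sketch of one dilated causal convolution block standing in for $\mathrm{TCN}^k$ (the disclosure leaves the TCN internals open, so the structure below is an assumed standard form):

```python
import torch.nn as nn
import torch.nn.functional as F

class TCNBlock(nn.Module):
    """One dilated causal convolution block over the time axis."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation    # left-pad so output is causal
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                          # x: (batch, channels, time)
        x = F.pad(x, (self.pad, 0))                # pad the past side only
        return F.relu(self.conv(x))
```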
Step 9: input the multi-scale final subsequence set into the multi-scale fusion module for weighted combination to obtain the multi-scale fusion data $h_m$ of each training sample, as shown in FIG. 4. The specific steps are:

The multi-scale fusion module combines the multi-scale final subsequence set $\{h^1, \dots, h^k, \dots, h^K\}$ by weighting. First, $\{h^1, \dots, h^k, \dots, h^K\}$ are stacked into the multi-scale final subsequence matrix H:

$$H = \mathrm{Stack}\left(h^1, \dots, h^k, \dots, h^K\right)$$

where Stack denotes the stacking operation. Then, an average pooling operation is performed over the scale dimension:

$$h_{pool} = \mathrm{AvgPool}(H)$$

where $h_{pool}$ denotes the representation after the pooling operation. Next, $h_{pool}$ is flattened into the multi-scale final subsequence one-dimensional vector and input into a thinning module consisting of two fully-connected layers so as to learn cross-scale information:

$$\alpha_1 = \mathrm{ReLU}(W_1 h_{pool} + b_1)$$
$$\alpha = \mathrm{Sigmoid}(W_2 \alpha_1 + b_2)$$

where $W_1$ and $W_2$ are weight matrices and $b_1$ and $b_2$ are bias vectors; the second layer uses the Sigmoid activation function. α is defined as the scale importance score vector, which measures the importance of all scale representations. Finally, all scale subsequences are weighted and combined by the aggregation layer to obtain the final representation $h_m$:

$$h_m = \sum_{k=1}^{K} \alpha_k h^k$$
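A minimal sketch of this fusion module (illustrative; it assumes each $h^k$ has already been brought to a common feature dimension per sample, and all names are placeholders):

```python
import torch
import torch.nn as nn

class ScaleFusion(nn.Module):
    """Weights and combines the K final scale representations."""
    def __init__(self, num_scales, hidden):
        super().__init__()
        self.fc1 = nn.Linear(num_scales, hidden)
        self.fc2 = nn.Linear(hidden, num_scales)

    def forward(self, hs):                         # hs: list of K tensors (dim,)
        h = torch.stack(hs, dim=0)                 # H: (K, dim)
        h_pool = h.mean(dim=-1)                    # pool features per scale: (K,)
        alpha1 = torch.relu(self.fc1(h_pool))
        alpha = torch.sigmoid(self.fc2(alpha1))    # scale importance scores
        return (alpha.unsqueeze(-1) * h).sum(0)    # weighted combination h_m
```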
Step 10: input the multi-scale fusion data $h_m$ of each training sample into the multi-layer convolutional neural network for mapping to obtain the multivariate time series predicted value $\hat{x}$ of each training sample. The specific steps are:

A convolutional neural network transforms $h_m$ into the desired output dimension, and a further convolutional neural network with a 1 × 1 convolution kernel yields the predicted value $\hat{x}$.
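For illustration only, such an output head could be sketched as two 1 × 1 convolutions; the channel counts and the assumed input shape (batch, channels, nodes, 1) are placeholders:

```python
import torch.nn as nn

# Maps the fused features to a one-step-ahead prediction per node.
output_head = nn.Sequential(
    nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(1, 1)),  # widen
    nn.ReLU(),
    nn.Conv2d(in_channels=128, out_channels=1, kernel_size=(1, 1)),   # project
)
```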
Step 11: calculate the prediction loss $\mathcal{L}_m$, i.e., the error between the true value x corresponding to the training sample and the multivariate time series predicted value $\hat{x}$ of each training sample. The squared error is used as the prediction loss:

$$\mathcal{L}_m = \left\lVert \hat{x} - x \right\rVert_2^2$$
Step 12: the loss function over all samples is:

$$\mathcal{L} = \frac{1}{M} \sum_{m=1}^{M} \mathcal{L}_m$$

where $\mathcal{L}_m$ is the loss of the m-th sample in a batch and M is the number of samples in each batch.
The model parameters $\theta_a$ obtained through the iterated training sample set loss function are:

$$\theta_a = \theta - \eta \nabla_\theta \mathcal{L}$$

where θ denotes the model parameters before iterative training, η the learning rate, and $\mathcal{L}$ the training sample set loss function. The model parameters $\theta_a$ comprise the final node embedding vector, the final scale embedding vectors, the graph neural network parameters, the adaptive graph learning network parameters, and the time convolution network parameters.
Step 13: repeat steps 4-12 until all batches of the training data set have participated in model training.
Step 14: repeat steps 4-13 until the specified number of iterations is reached.
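Steps 11-14 amount to a standard mini-batch training loop; a minimal sketch (assuming `model` bundles the modules above and `loader` yields (x, y) batches; both names are placeholders) is:

```python
import torch

def train(model, loader, epochs=50, lr=1e-3):
    # theta_a = theta - eta * grad(L): plain gradient descent on the batch loss.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):                    # step 14: repeat to the limit
        for x, y in loader:                    # step 13: every batch participates
            y_hat = model(x)
            loss = ((y_hat - y) ** 2).mean()   # squared-error loss of steps 11-12
            opt.zero_grad()
            loss.backward()
            opt.step()
```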
Step 15: in use, input the multivariate time series, i.e., the historical observation time series of urban traffic flow or of household electricity consumption in an urban block, into the multivariate time series prediction model to predict the urban traffic flow or the household electricity consumption of the urban block at the next time step.

Claims (9)

1. A multivariate time series prediction method for multi-scale adaptive graph learning, characterized by comprising the following steps:
S1: dividing the obtained multivariate time series with a sliding time window to obtain a training sample set, and inputting the training sample set into a multi-scale pyramid network to obtain the multi-scale initial subsequence set of each training sample, an initial node embedding vector, and an initial scale embedding vector for each scale;
S2: inputting the initial node embedding vector and the initial scale embedding vector of each scale into an adaptive graph learning module to obtain the adjacency matrix of each scale;
S3: the multi-scale initial subsequence set consists of the initial subsequences of the individual scales; each scale's initial subsequence is divided along the time dimension into values at successive time steps; the value at each time step and the adjacency matrix of the corresponding scale are input together into a graph neural network to obtain the representation of that time step; the representations of all time steps form the representation sequence of each scale; each scale's representation sequence is input into a time convolution network to obtain the final subsequence of each scale; and the final subsequences of all scales form the multi-scale final subsequence set;
S4: inputting the multi-scale final subsequence set into a multi-scale fusion module for weighted combination to obtain the multi-scale fusion data of each training sample, and inputting the multi-scale fusion data of each training sample into a multi-layer convolutional neural network for mapping to obtain the multivariate time series predicted value of each training sample;
S5: constructing the loss function of each training sample from its multivariate time series predicted value and true value, constructing the loss function of the training sample set from the individual training sample loss functions, iterating steps S2-S4 until the iteration count threshold is met, and obtaining the model parameters through the iterated training sample set loss function, thereby determining the multivariate time series prediction model for multi-scale adaptive graph learning;
S6: in application, a multivariate time series, i.e., the historical observation time series of urban traffic flow or of household electricity consumption in an urban block, is input into the multivariate time series prediction model to predict the urban traffic flow or the household and industrial electricity consumption of the urban block at the next time step; a driving route is guided based on the urban traffic flow at the next time step, and the amount of power distributed and transmitted is guided based on the household and industrial electricity consumption at the next time step.
2. The method of predicting multivariate time series for multi-scale adaptive graph learning as claimed in claim 1, wherein the step of obtaining training sample set by dividing the obtained multivariate time series by sliding time window comprises:
removing abnormal values in the multivariate time sequence, carrying out normalization processing on the multivariate time sequence with the abnormal values removed, setting a time window and a time step, and dividing the multivariate time sequence under the time window into a plurality of training samples based on the set time step to obtain a training sample set.
3. The multivariate time series prediction method for multi-scale adaptive graph learning according to claim 1, wherein inputting a training sample set into a multi-scale pyramid network to obtain a multi-scale initial subsequence set of each training sample comprises:
the multi-scale pyramid network comprises a plurality of pyramid layers, each training sample in the training sample set is input into the multi-scale pyramid network, initial sub-sequences with different scales are obtained through each pyramid layer by gradually stacking the pyramid layers, and finally the multi-scale initial sub-sequence set of each training sample is output.
4. The method as claimed in claim 3, wherein each pyramid layer includes two parallel first and second convolutional networks, the initial sub-sequence obtained from the previous layer is input to the first and second convolutional networks, and then the bit-wise addition is performed on the outputs of the first and second convolutional networks to obtain the initial sub-sequence of the current layer.
5. The method of claim 1, wherein inputting the initial node embedding vector and the initial scale embedding vector of each scale into an adaptive graph learning module to obtain the adjacency matrix of each scale comprises:
the self-adaptive graph learning module comprises multi-scale network layers, each scale network layer comprises initial scale embedded vectors of corresponding scales, the initial node embedded vectors are simultaneously input into the multi-scale network layers and are fused with the initial scale embedded vectors of the corresponding scales to obtain initial scale feature vectors, pairwise similarity calculation is carried out on the initial scale feature vectors, and then sparse processing is carried out on calculation results to obtain an adjacency matrix of each scale.
6. The method of predicting multivariate time series for multi-scale adaptive graph learning as claimed in claim 1, wherein inputting the value at each time step and the adjacency matrix of the corresponding scale together into the graph neural network to obtain the representation sequence comprises:
each graph neural network comprises an in-degree information capture module and an out-degree information capture module; the value at each time step and the adjacency matrix of the corresponding scale are input into the in-degree information capture module and the out-degree information capture module respectively, and the outputs of the two modules are fused to obtain the representation sequence of each scale;
the representation $h_t^k$ at time step t of the k-th scale is:

$$h_t^k = \mathrm{GNN}_{in}^k\left(x_t^k, A^k, W_{in}^k\right) + \mathrm{GNN}_{out}^k\left(x_t^k, (A^k)^T, W_{out}^k\right)$$

where $W_{in}^k$ and $W_{out}^k$ are the training parameters of the in-degree information capture module and the out-degree information capture module at the k-th scale, $A^k$ is the adjacency matrix corresponding to the k-th scale, $\mathrm{GNN}_{in}^k$ is the in-degree information capture module at the k-th scale, and $\mathrm{GNN}_{out}^k$ is the out-degree information capture module at the k-th scale.
7. The multivariate time series prediction method for multi-scale adaptive graph learning according to claim 1, wherein the step of inputting the multi-scale final subsequence set to a multi-scale fusion module for weighted combination to obtain the multi-scale fusion data of each training sample comprises the steps of:
stacking the multi-scale final subsequence sets to obtain a multi-scale final subsequence matrix, performing pooling operation on the multi-scale final subsequence matrix to obtain a multi-scale final subsequence one-dimensional vector, inputting the final subsequence one-dimensional vector to a thinning module to obtain a scale importance score vector, and fusing the scale importance score vector and the multi-scale final subsequence sets to obtain multi-scale fusion data of each sample.
8. The multivariate time series prediction method for multi-scale adaptive graph learning according to claim 7, wherein the scale importance score vector α is:

$$\alpha_1 = \mathrm{ReLU}(W_1 h_{pool} + b_1)$$
$$\alpha = \mathrm{Sigmoid}(W_2 \alpha_1 + b_2)$$

where $h_{pool}$ is the multi-scale final subsequence one-dimensional vector, Sigmoid and ReLU are activation functions, $b_1$ and $b_2$ are bias vectors, and $W_1$ and $W_2$ are weight matrices.
9. The multivariate time series prediction method for multi-scale adaptive graph learning according to claim 1, wherein the model parameters $\theta_a$ obtained through the iterated training sample set loss function are:

$$\theta_a = \theta - \eta \nabla_\theta \mathcal{L}$$

where θ denotes the initial model parameters before iterative training, η the learning rate, and $\mathcal{L}$ the training sample set loss function; the model parameters $\theta_a$ comprise the final node embedding vector, the final scale embedding vectors, the graph neural network parameters, the adaptive graph learning network parameters, and the time convolution network parameters.
CN202111298623.8A 2021-11-04 2021-11-04 Multi-variable time series prediction method for multi-scale adaptive graph learning Pending CN114169394A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111298623.8A CN114169394A (en) 2021-11-04 2021-11-04 Multi-variable time series prediction method for multi-scale adaptive graph learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111298623.8A CN114169394A (en) 2021-11-04 2021-11-04 Multi-variable time series prediction method for multi-scale adaptive graph learning

Publications (1)

Publication Number Publication Date
CN114169394A true CN114169394A (en) 2022-03-11

Family

ID=80477983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111298623.8A Pending CN114169394A (en) 2021-11-04 2021-11-04 Multi-variable time series prediction method for multi-scale adaptive graph learning

Country Status (1)

Country Link
CN (1) CN114169394A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114722950B (en) * 2022-04-14 2023-11-07 武汉大学 Multi-mode multi-variable time sequence automatic classification method and device
CN114722950A (en) * 2022-04-14 2022-07-08 武汉大学 Multi-modal multivariate time sequence automatic classification method and device
WO2023221701A1 (en) * 2022-05-16 2023-11-23 北京火山引擎科技有限公司 Multivariable time sequence processing method and apparatus, device and medium
CN114692788B (en) * 2022-06-01 2022-08-19 天津大学 Early warning method and device for extreme weather of Ernino based on incremental learning
CN114692788A (en) * 2022-06-01 2022-07-01 天津大学 Early warning method and device for extreme weather of Ernino based on incremental learning
CN116128130A (en) * 2023-01-31 2023-05-16 广东电网有限责任公司 Short-term wind energy data prediction method and device based on graphic neural network
CN116128130B (en) * 2023-01-31 2023-10-24 广东电网有限责任公司 Short-term wind energy data prediction method and device based on graphic neural network
CN116845889B (en) * 2023-09-01 2023-12-22 东海实验室 Hierarchical hypergraph neural network-based power load prediction method
CN116845889A (en) * 2023-09-01 2023-10-03 东海实验室 Hierarchical hypergraph neural network-based power load prediction method
CN116993185A (en) * 2023-09-28 2023-11-03 腾讯科技(深圳)有限公司 Time sequence prediction method, device, equipment and storage medium
CN116993185B (en) * 2023-09-28 2024-07-02 腾讯科技(深圳)有限公司 Time sequence prediction method, device, equipment and storage medium
CN117806972A (en) * 2024-01-03 2024-04-02 西南民族大学 Multi-scale time sequence analysis-based modified code quality assessment method
CN117806972B (en) * 2024-01-03 2024-07-02 西南民族大学 Multi-scale time sequence analysis-based modified code quality assessment method
CN118011220A (en) * 2024-04-08 2024-05-10 太湖能谷(杭州)科技有限公司 Battery pack state of charge estimation method, system and medium

Similar Documents

Publication Publication Date Title
CN114169394A (en) Multi-variable time series prediction method for multi-scale adaptive graph learning
CN111079931A (en) State space probabilistic multi-time-series prediction method based on graph neural network
Khan et al. Fuzzy cognitive maps with genetic algorithm for goal-oriented decision support
Shrivastava et al. GLAD: Learning sparse graph recovery
CN112183742B (en) Neural network hybrid quantization method based on progressive quantization and Hessian information
CN112819136A (en) Time sequence prediction method and system based on CNN-LSTM neural network model and ARIMA model
CN113852432B (en) Spectrum Prediction Sensing Method Based on RCS-GRU Model
CN115018193A (en) Time series wind energy data prediction method based on LSTM-GA model
CN112766603A (en) Traffic flow prediction method, system, computer device and storage medium
CN114817773A (en) Time sequence prediction system and method based on multi-stage decomposition and fusion
Wang et al. Optimizing deep belief echo state network with a sensitivity analysis input scaling auto-encoder algorithm
Liu et al. Adaptive multioutput gradient rbf tracker for nonlinear and nonstationary regression
CN117556949A (en) Traffic prediction method based on continuous evolution graph nerve controlled differential equation
CN116976405A (en) Variable component shadow quantum neural network based on immune optimization algorithm
CN109993282B (en) Typhoon wave and range prediction method
CN111310974A (en) Short-term water demand prediction method based on GA-ELM
CN115860232A (en) Steam load prediction method, system, electronic device and medium
CN115620046A (en) Multi-target neural architecture searching method based on semi-supervised performance predictor
CN113111308B (en) Symbolic regression method and system based on data-driven genetic programming algorithm
CN114169493B (en) Multivariable time sequence prediction method based on scale-aware neural architecture search
CN113077003A (en) Graph attention network inductive learning method based on graph sampling
Werbos How we cut prediction error in half by using a different training method
Chen et al. Multi-objective spiking neural network for optimal wind power prediction interval
Han et al. Online aware synapse weighted autoencoder for recovering random missing data in wastewater treatment process
Shi et al. A deepar-based neural network for time series forecasting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination