CN114169394A - Multi-variable time series prediction method for multi-scale adaptive graph learning - Google Patents

Multi-variable time series prediction method for multi-scale adaptive graph learning

Info

Publication number
CN114169394A
CN114169394A (application CN202111298623.8A)
Authority
CN
China
Prior art keywords: scale, training sample, initial, sequence, time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111298623.8A
Other languages
Chinese (zh)
Inventor
陈岭
陈东辉
张友东
文波
杨成虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202111298623.8A
Publication of CN114169394A
Legal status: Pending

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING › G06F18/00 Pattern recognition › G06F18/20 Analysing › G06F18/25 Fusion techniques › G06F18/253 Fusion techniques of extracted features
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology › G06N3/045 Combinations of networks
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/04 Architecture, e.g. interconnection topology › G06N3/048 Activation functions
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models › G06N3/02 Neural networks › G06N3/08 Learning methods

Abstract

The invention discloses a multivariate time series prediction method for multi-scale adaptive graph learning, which comprises the following steps: input the training sample set into a multi-scale pyramid network to obtain the multi-scale initial subsequence set of each training sample; obtain the adjacency matrix of each scale with an adaptive graph learning module; input the value at each time step together with the adjacency matrix of the corresponding scale into a graph neural network to obtain the representation sequence of each scale; input each representation sequence into a time convolution network to obtain the final subsequence of each scale; input the multi-scale final subsequence set into a multi-scale fusion module; and input the multi-scale fusion data of each training sample into a multi-layer convolutional neural network for mapping to obtain the multivariate time series predicted value of each training sample. The model parameters are determined from the loss function, thereby determining the multivariate time series prediction model for multi-scale adaptive graph learning. The method can accurately predict values for the selected scenario.

Description

Multi-variable time series prediction method for multi-scale adaptive graph learning
Technical Field
The invention relates to the field of multivariate time series prediction, and in particular to a multivariate time series prediction method for multi-scale adaptive graph learning.
Background
Multivariate time series are ubiquitous in real-world scenarios, such as urban traffic flow and household electricity consumption in urban blocks. Multivariate time series prediction, i.e., predicting future trends from a set of historical observation time series, has been widely studied in recent years. It has broad applications: for example, a better driving route can be planned in advance from the predicted traffic flow at each intersection, and investment strategies can be designed by forecasting the near-term prices of multiple stocks.
Compared with univariate time series prediction, multivariate time series prediction must consider both the temporal correlation within a single variable and the correlations between variables (i.e., the predicted value of a single variable is influenced by other variables). Conventional methods, such as vector autoregression, temporal regularized matrix factorization, vector autoregressive moving average, and Gaussian processes, typically rely on strict stationarity assumptions and cannot capture the non-linear dependencies between variables. Deep neural networks are superior at modeling non-stationary and non-linear dependencies. In particular, two variants of recurrent neural networks (RNNs), namely long short-term memory networks (LSTM) and gated recurrent units (GRU), as well as temporal convolutional networks (TCNs), achieve good performance in time series prediction. To capture long-term and short-term temporal dependencies, existing work introduces several strategies, such as skip connections, attention mechanisms, and memory-based networks. These efforts focus on modeling the temporal dependence, treat the multivariate time series as a vector, and assume that the predicted value of a single variable is affected by all other variables, which is not reasonable in practical applications. For example, the traffic flow of a street is largely influenced by its neighbors, while the influence from distant streets is relatively small. It is therefore crucial to explicitly model the pairwise correlations between variables.
A graph is an abstract data type that represents the relationships between nodes. Graph neural networks (GNNs) can efficiently capture higher-order representations of nodes while explicitly exploiting pairwise correlations, and are considered a promising approach to processing graph data. Multivariate time series prediction can be considered from a graph modeling perspective: the variables in the multivariate time series can be regarded as nodes in a graph, and the correlations between paired variables as edges. Recently, some studies have modeled multivariate time series by combining the rich structural information of the graph (i.e., node features and weighted edges) with GNNs. These works combine GNNs and temporal convolution modules to learn the timing patterns, with good effect. However, two problems remain in the above work.
First, existing studies only consider temporal correlation on a single scale, and cannot effectively capture multi-scale timing patterns (e.g., daily, weekly, monthly, and other specific periodic patterns). Second, existing research defines the correlations between variables from prior knowledge or human experience, which makes the implicit correlations between variables difficult to describe effectively.
Disclosure of Invention
The invention provides a multivariate time series prediction method for multi-scale adaptive graph learning, which can accurately predict the value of a given scenario at the next time step.
A multivariate time series prediction method for multi-scale adaptive graph learning comprises the following steps:
S1: dividing the obtained multivariate time series with a sliding time window to obtain a training sample set, and inputting the training sample set into a multi-scale pyramid network to obtain the multi-scale initial subsequence set of each training sample, an initial node embedding vector, and an initial scale embedding vector for each scale;
S2: inputting the initial node embedding vector and the initial scale embedding vector of each scale into an adaptive graph learning module to obtain the adjacency matrix of each scale;
S3: the multi-scale initial subsequence set consists of the initial subsequences of the individual scales; each scale's initial subsequence is divided along the time dimension into values at successive time steps; the value at each time step and the adjacency matrix of the corresponding scale are input together into a graph neural network to obtain the representation of that time step; the representations of all time steps form the representation sequence of each scale; each scale's representation sequence is input into a time convolution network to obtain the final subsequence of each scale; and the final subsequences of all scales form the multi-scale final subsequence set;
S4: inputting the multi-scale final subsequence set into a multi-scale fusion module for weighted combination to obtain the multi-scale fusion data of each training sample, and inputting the multi-scale fusion data of each training sample into a multi-layer convolutional neural network for mapping to obtain the multivariate time series predicted value of each training sample;
S5: constructing the loss function of each training sample from its multivariate time series predicted value and true value, constructing the loss function of the training sample set from the individual training sample loss functions, iterating steps S2-S4 until the iteration count threshold is met, and obtaining the model parameters through the iterated training sample set loss function, thereby determining the multivariate time series prediction model for multi-scale adaptive graph learning;
S6: in application, a multivariate time series, i.e., the historical observation time series of urban traffic flow or of household electricity consumption in an urban block, is input into the multivariate time series prediction model to predict the urban traffic flow or the household electricity consumption of the urban block at the next time step.
First, the time series data are preprocessed and a training data set is constructed by sliding-window division. Second, the time series is decomposed layer by layer with a multi-scale pyramid network, constructing subsequences of different scales along the time dimension. An adaptive graph learning module is then introduced to automatically infer the graph structure at each scale from the data and fully mine the rich, implicit correlations among variables. Next, the inter-variable correlations and temporal correlations at each scale are modeled with a graph neural network and a time convolution network. Finally, a multi-scale fusion module is introduced to automatically weigh the importance of each scale representation and capture cross-scale correlations.
Dividing the obtained multivariate time sequence by a sliding time window to obtain a training sample set, comprising:
removing abnormal values from the multivariate time series, normalizing the cleaned multivariate time series, setting a time window, and dividing the multivariate time series under the sliding window into a plurality of training samples at the set time step to obtain the training sample set.
Inputting the training sample set into a multi-scale pyramid network to obtain a multi-scale initial subsequence set of each training sample, wherein the method comprises the following steps:
the multi-scale pyramid network comprises a plurality of pyramid layers, each training sample in the training sample set is input into the multi-scale pyramid network, initial sub-sequences with different scales are obtained through each pyramid layer by gradually stacking the pyramid layers, and finally the multi-scale initial sub-sequence set of each training sample is output.
Each pyramid layer comprises a first convolution network and a second convolution network which are parallel, the initial sub-sequences obtained from the previous layer are respectively input into the first convolution network and the second convolution network, and then bitwise addition is carried out on the outputs of the first convolution network and the second convolution network to obtain the initial sub-sequences of the current layer.
Inputting the initial node embedding vector and the initial scale embedding vector of each scale into the adaptive graph learning module to obtain the adjacency matrix of each scale comprises the following steps:
the self-adaptive graph learning module comprises multi-scale network layers, each scale network layer comprises initial scale embedded vectors of corresponding scales, the initial node embedded vectors are simultaneously input into the multi-scale network layers and are fused with the initial scale embedded vectors of the corresponding scales to obtain initial scale feature vectors, pairwise similarity calculation is carried out on the initial scale feature vectors, and then sparse processing is carried out on calculation results to obtain an adjacency matrix of each scale.
Inputting the value at each time step and the adjacency matrix of the corresponding scale together into the graph neural network to obtain the representation sequence comprises the following steps:
each graph neural network comprises an in-degree information capture module and an out-degree information capture module; the value at each time step and the adjacency matrix of the corresponding scale are input into the in-degree information capture module and the out-degree information capture module respectively, and the outputs of the two modules are fused to obtain the representation sequence of each scale;
the representation $h_t^k$ at time step t of the k-th scale is:

$$h_t^k = \mathrm{GNN}_{in}^k\left(x_t^k, A^k, W_{in}^k\right) + \mathrm{GNN}_{out}^k\left(x_t^k, (A^k)^T, W_{out}^k\right)$$

where $W_{in}^k$ and $W_{out}^k$ are the training parameters of the in-degree information capture module and the out-degree information capture module at the k-th scale, $A^k$ is the adjacency matrix corresponding to the k-th scale, $\mathrm{GNN}_{in}^k$ is the in-degree information capture module at the k-th scale, and $\mathrm{GNN}_{out}^k$ is the out-degree information capture module at the k-th scale.
Inputting the multi-scale final subsequence set into the multi-scale fusion module for weighted combination to obtain the multi-scale fusion data of each training sample comprises the following steps:
stacking the multi-scale final subsequence set to obtain a multi-scale final subsequence matrix, performing pooling operation on the multi-scale final subsequence matrix to obtain a multi-scale final subsequence one-dimensional vector, inputting the final subsequence one-dimensional vector to a thinning module to obtain a scale importance score vector, and fusing the scale importance score vector and the multi-scale final subsequence set to obtain multi-scale fusion data of each sample.
The scale importance score vector α is:

$$\alpha_1 = \mathrm{ReLU}(W_1 h_{pool} + b_1)$$
$$\alpha = \mathrm{Sigmoid}(W_2 \alpha_1 + b_2)$$

where $h_{pool}$ is the multi-scale final subsequence one-dimensional vector, Sigmoid and ReLU are activation functions, $b_1$ and $b_2$ are bias vectors, and $W_1$ and $W_2$ are weight matrices.
The model parameters $\theta_a$ obtained through the iterated training sample set loss function are:

$$\theta_a = \theta - \eta \nabla_\theta \mathcal{L}$$

where θ denotes the model parameters before iterative training, η the learning rate, and $\mathcal{L}$ the training sample set loss function. The model parameters $\theta_a$ comprise the final node embedding vector, the final scale embedding vectors, the graph neural network parameters, the adaptive graph learning network parameters, and the time convolution network parameters.
Compared with the prior art, the invention has the beneficial effects that:
The multivariate time series is decomposed into multi-scale subsequences with a multi-scale pyramid network, and a multi-scale fusion module is introduced to automatically weigh the importance of each scale representation and capture cross-scale correlations. An adaptive graph learning module is designed to automatically infer the graph structure at each scale within an end-to-end framework, fully mining the rich, implicit correlations among variables. By capturing cross-scale correlations and fully mining the rich, implicit correlations among variables, the method can accurately predict values for a given scenario: the historical observation time series of urban traffic flow and of household electricity consumption in an urban block are input into the multivariate time series prediction model to predict the urban traffic flow and the household and industrial electricity consumption of the urban block at the next time step. The predicted traffic flow can then guide driving routes so as to save driving time, and the predicted household and industrial electricity consumption can guide the distribution of transmitted power so as to optimize power allocation.
Drawings
FIG. 1 is a flowchart of a multivariate time series prediction method for multi-scale adaptive graph learning according to an embodiment;
FIG. 2 is a block diagram of a multivariate time series prediction method for multi-scale adaptive graph learning according to an embodiment;
FIG. 3 is an architecture diagram of an adaptive graph learning module according to an embodiment;
fig. 4 is an architecture diagram of a multi-scale fusion module according to an embodiment.
Detailed Description
The invention discloses a multivariate time sequence prediction method for multi-scale adaptive graph learning, which comprises the following specific steps as shown in figures 1 and 2:
step 1: removing abnormal values in the multivariate time series, and carrying out normalization processing on the multivariate time series with the abnormal values removed so that each value after the processing is normalized to the range of [ -1, 1], wherein the conversion formula is as follows:
$$X_i' = \frac{2\,(X_i - X_{i,min})}{X_{i,max} - X_{i,min}} - 1$$

where $X_i$ is a value in the original time series of the i-th variable, $X_{i,min}$ is the minimum of the original time series of the i-th variable, $X_{i,max}$ is the maximum of the original time series of the i-th variable, and $X_i'$ is the normalized value of the i-th variable.

The size T of the time window is set empirically, and the normalized data are divided with a fixed-length sliding step to obtain the training sample set.
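As a minimal illustrative sketch (not part of the original disclosure), this preprocessing step could be written as follows; the array shape, the window size of 168, and the function names are assumptions:

```python
import numpy as np

def normalize(data):
    # Min-max scale each variable (column) to [-1, 1], per the formula above.
    d_min = data.min(axis=0, keepdims=True)
    d_max = data.max(axis=0, keepdims=True)
    return 2.0 * (data - d_min) / (d_max - d_min) - 1.0

def make_samples(data, window=168, stride=1):
    # Slide a window of size T over the series; each training sample pairs a
    # history window with the value at the next time step (the target).
    xs, ys = [], []
    for start in range(0, len(data) - window, stride):
        xs.append(data[start:start + window])
        ys.append(data[start + window])
    return np.stack(xs), np.stack(ys)

# data: array of shape (num_timesteps, num_variables)
# x, y = make_samples(normalize(data))
```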
Step 2: batch the training sample set with a fixed batch size M; the total number of batches is N, calculated as:

$$N = \left\lceil \frac{N_{samples}}{M} \right\rceil$$

where $N_{samples}$ is the total number of training samples in the training sample set.
Step 3: randomly initialize two types of learnable parameters: the node embedding vector $E_{nodes}$ shared across all scales and the initial scale embedding vector $E_{scale}^k$ of each scale.
Step 4: sequentially select the batch of training samples with index i from the training sample set, where i ∈ {0, 1, …, N-1}. Steps 5-11 are repeated for each training sample in the batch.
Step 5: input the training sample into the multi-scale pyramid network to obtain its multi-scale initial subsequence set, i.e., the initial subsequence set of K scales $X = \{X^1, \dots, X^k, \dots, X^K\}$, where $X^1$ denotes the initial subsequence at the original scale and $X^k$ denotes the initial subsequence at the k-th scale (1 < k ≤ K). The specific steps are as follows:

The multi-scale pyramid network comprises several pyramid layers; by stacking the pyramid layers step by step, each input training sample yields an initial subsequence of a different scale from each pyramid layer, the subsequence of the k-th scale being denoted $X^k$. Each pyramid layer contains two parallel convolutional networks:
the first convolutional network is used to capture the local pattern in the time dimension, and different pyramid layers use different convolutional kernel sizes. The initial convolution kernel has a larger receptive field and is slowly reduced in each pyramid layer, which can effectively control the size of the receptive field in the high-level pyramid layer and maintain the sequence characteristics of the large-scale subsequence. The convolution step size is set to 2 to increase the time scale so that each decomposition results in a subsequence of half the length of the input sequence. The calculation formula of the k-1 layer can be expressed as:
Figure BDA0003337369230000061
wherein
Figure BDA0003337369230000062
Representing a convolution operator, ReLU representing an activation function,
Figure BDA0003337369230000063
and
Figure BDA0003337369230000064
respectively representing the convolution kernel and the offset vector of the first convolution network in the (k-1) th pyramid layer,
Figure BDA0003337369230000065
representing the output of the first convolutional network.
The second convolutional network reduces the sensitivity to the hyper-parameter settings (i.e., the convolution kernel size and the stride) that the model would have if only a single convolutional neural network were used. A convolution with a 1 × 1 kernel and a 1 × 2 pooling operation are introduced to build a structure parallel to the first convolutional network, formalized as:

$$X_{pool}^k = \mathrm{Pooling}\left(\mathrm{ReLU}\left(W_{pool}^{k-1} \ast X^{k-1} + b_{pool}^{k-1}\right)\right)$$

where $W_{pool}^{k-1}$ and $b_{pool}^{k-1}$ denote the convolution kernel and bias vector of the second convolutional network in the (k-1)-th pyramid layer, and Pooling denotes the pooling operation.
The outputs of the two convolutional networks are then added bitwise:

$$X^k = X_{conv}^k + X_{pool}^k$$

where $X^k$ denotes the subsequence of the k-th scale.
Finally, the multi-scale pyramid network generates the initial subsequence set of K scales $X = \{X^1, \dots, X^k, \dots, X^K\}$, where $X^1$ denotes the initial subsequence at the original scale and $X^k$ denotes the initial subsequence at the k-th scale (1 < k ≤ K).
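For illustration, a minimal sketch of one such pyramid layer (an assumption, not the original implementation; kernel sizes and channel counts are placeholders) could look like this:

```python
import torch
import torch.nn as nn

class PyramidLayer(nn.Module):
    """One pyramid layer: two parallel branches added element-wise."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        # Branch 1: strided convolution over time halves the sequence length.
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              stride=2, padding=kernel_size // 2)
        # Branch 2: 1x1 convolution followed by 1x2 pooling, in parallel.
        self.conv1x1 = nn.Conv1d(channels, channels, kernel_size=1)
        self.pool = nn.AvgPool1d(kernel_size=2)

    def forward(self, x):                          # x: (batch, channels, time)
        out1 = torch.relu(self.conv(x))
        out2 = self.pool(torch.relu(self.conv1x1(x)))
        t = min(out1.size(-1), out2.size(-1))      # align lengths before adding
        return out1[..., :t] + out2[..., :t]       # bitwise (element-wise) sum
```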
Step 6: feed the initial node embedding vector $E_{nodes}$ and the initial scale embedding vector $E_{scale}^k$ of each scale into the adaptive graph learning module to obtain K scale-specific adjacency matrices $\{A^1, \dots, A^k, \dots, A^K\}$, where $A^k$ denotes the adjacency matrix specific to the k-th scale (1 ≤ k ≤ K), as shown in FIG. 3. The specific steps are:

The adaptive graph learning module comprises multi-scale network layers, each of which holds the initial scale embedding vector of its scale. The initial node embedding vector is input into all scale network layers simultaneously and fused with the initial scale embedding vector of the corresponding scale to obtain an initial scale feature vector; pairwise similarities are computed over the initial scale feature vectors, and the results are sparsified to obtain the adjacency matrix of each scale.
For the k-th scale-specific network layer, the k-th initial scale embedding vector $E_{scale}^k$ is multiplied bitwise with the initial node embedding vector $E_{nodes}$:

$$E^k = E_{nodes} \odot E_{scale}^k$$

where $E^k$ denotes the scale-specific embedding in the k-th layer. Then, analogously to computing node affinity with a similarity function, the similarity of paired nodes is calculated as follows:
$$M_1^k = \tanh\left(\theta^k E^k\right)$$
$$M_2^k = \tanh\left(\varphi^k E^k\right)$$
$$\tilde{A}^k = \mathrm{ReLU}\left(\tanh\left(M_1^k (M_2^k)^T - M_2^k (M_1^k)^T\right)\right)$$

where $\theta^k$ and $\varphi^k$ are learnable parameters in layer k, and tanh and ReLU are activation functions.
The values of $\tilde{A}^k$ are normalized to the range (0, 1) and serve as weighted edges. To reduce the computational cost of graph convolution, lessen the influence of noise, and make the model more robust, the following strategy is introduced to sparsify $\tilde{A}^k$:

$$A^k = \mathrm{Softmax}\left(\mathrm{Sparse}\left(\tilde{A}^k\right)\right)$$

where $A^k$ is the final adjacency matrix of the k-th scale, normalized with the Softmax function, and the sparsification function Sparse keeps, in each row, only the entries among the TopK largest values:

$$\mathrm{Sparse}\left(\tilde{a}_{ij}\right) = \begin{cases} \tilde{a}_{ij}, & \tilde{a}_{ij} \in \mathrm{TopK}\left(\tilde{a}_{i:},\, \tau\right) \\ -\infty, & \text{otherwise} \end{cases}$$

where τ is the threshold of the TopK function, representing the maximum number of neighbors of a node. Finally, the scale-specific adjacency matrices $\{A^1, \dots, A^k, \dots, A^K\}$ are obtained.
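A minimal sketch of this adaptive graph learning layer for one scale (illustrative only; the similarity form follows the formulas above, while the class and parameter names are placeholders) might read:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphLayer(nn.Module):
    """Learns one scale-specific adjacency matrix from node embeddings."""
    def __init__(self, num_nodes, emb_dim, topk):
        super().__init__()
        self.scale_emb = nn.Parameter(torch.randn(num_nodes, emb_dim))  # E_scale^k
        self.theta = nn.Linear(emb_dim, emb_dim, bias=False)
        self.phi = nn.Linear(emb_dim, emb_dim, bias=False)
        self.topk = topk                          # tau: max neighbors per node

    def forward(self, node_emb):                  # node_emb: shared E_nodes
        e = node_emb * self.scale_emb             # bitwise (element-wise) fusion
        m1 = torch.tanh(self.theta(e))
        m2 = torch.tanh(self.phi(e))
        sim = torch.relu(torch.tanh(m1 @ m2.T - m2 @ m1.T))
        # Sparsify: keep the top-k weights per row, mask the rest to -inf,
        # then normalize each row with softmax.
        mask = torch.full_like(sim, float('-inf'))
        idx = sim.topk(self.topk, dim=-1).indices
        mask.scatter_(-1, idx, 0.0)
        return F.softmax(sim + mask, dim=-1)      # adjacency matrix A^k
```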
Step 7: input the value at each time step together with the adjacency matrix of the corresponding scale into the graph neural network to obtain the representation sequence of each scale; the graph neural networks at different time steps under the same scale share parameters. This yields the representation sequence set of all scale subsequences $\{\hat{X}^1, \dots, \hat{X}^k, \dots, \hat{X}^K\}$, where $\hat{X}^k$ denotes the representation sequence of the k-th scale subsequence (1 ≤ k ≤ K) after graph neural network processing. The specific steps are as follows:
The initial subsequence set $\{X^1, \dots, X^k, \dots, X^K\}$ and the adjacency matrices $\{A^1, \dots, A^k, \dots, A^K\}$ are processed scale by scale: the value at each time step in each scale's subsequence and the corresponding adjacency matrix are fed into the graph neural network, with the graph neural networks at different time steps under one scale sharing parameters. For the k-th scale, $X^k$ is first divided along the time dimension to obtain the per-time-step values $\{x_1^k, x_2^k, \dots\}$. Both $A^k$ and its transpose $(A^k)^T$ are introduced, and two GNN modules, the in-degree information capture module $\mathrm{GNN}_{in}^k$ and the out-degree information capture module $\mathrm{GNN}_{out}^k$, capture the in-degree and out-degree information simultaneously. The results of the two GNN modules are then added:

$$h_t^k = \mathrm{GNN}_{in}^k\left(x_t^k, A^k, W_{in}^k\right) + \mathrm{GNN}_{out}^k\left(x_t^k, (A^k)^T, W_{out}^k\right)$$

where $W_{in}^k$ and $W_{out}^k$ are the training parameters of the in-degree information capture module and the out-degree information capture module at the k-th scale, $A^k$ is the adjacency matrix corresponding to the k-th scale, $\mathrm{GNN}_{in}^k$ is the in-degree information capture module at the k-th scale, and $\mathrm{GNN}_{out}^k$ is the out-degree information capture module at the k-th scale. The outputs at all time steps $\{h_1^k, h_2^k, \dots\}$ form $\hat{X}^k$; finally, the representation sequence set of the initial subsequences of all scales $\{\hat{X}^1, \dots, \hat{X}^k, \dots, \hat{X}^K\}$ is obtained, where $\hat{X}^k$ denotes the representation sequence of the k-th scale subsequence (1 ≤ k ≤ K) after graph neural network processing.
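As an illustrative sketch only, the two-branch graph neural network at a single time step could be implemented as below, assuming a simple one-hop propagation of the form ReLU(A · x · W); the disclosure does not fix the internal GNN form, so this structure is an assumption:

```python
import torch
import torch.nn as nn

class InOutGNN(nn.Module):
    """Fuses in-degree and out-degree information at one time step."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_in = nn.Linear(in_dim, out_dim)     # in-degree branch parameters
        self.w_out = nn.Linear(in_dim, out_dim)    # out-degree branch parameters

    def forward(self, x, adj):                     # x: (nodes, in_dim)
        h_in = torch.relu(adj @ self.w_in(x))      # propagate along A^k
        h_out = torch.relu(adj.T @ self.w_out(x))  # propagate along (A^k)^T
        return h_in + h_out                        # add the two branch outputs
```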
Step 8: input each representation sequence into the time convolution network to obtain the final subsequence of each scale; the multi-scale final subsequences form the multi-scale final subsequence set $\{h^1, \dots, h^k, \dots, h^K\}$, where $h^k$ denotes the final subsequence of the k-th scale (1 ≤ k ≤ K) after time convolution processing. The specific steps are:

The representation sequence of each scale's initial subsequence is fed into its time convolution network. For the k-th scale, $\hat{X}^k$ is fed into the k-th scale time convolution network $\mathrm{TCN}^k$ to obtain the final representation $h^k$ of the scale subsequence:

$$h^k = \mathrm{TCN}^k\left(\hat{X}^k, W_{tcn}^k\right)$$

where $W_{tcn}^k$ denotes the trainable parameters in the k-th time convolution layer. Within each scale, the subsequences corresponding to the individual variables are fed into a parameter-sharing time convolution network. Finally, the final subsequence set of all scales $\{h^1, \dots, h^k, \dots, h^K\}$ is obtained.
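A minimal sketch of one dilated causal convolution block standing in for $\mathrm{TCN}^k$ (the disclosure leaves the TCN internals open, so the structure below is an assumed standard form):

```python
import torch.nn as nn
import torch.nn.functional as F

class TCNBlock(nn.Module):
    """One dilated causal convolution block over the time axis."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation    # left-pad so output is causal
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                          # x: (batch, channels, time)
        x = F.pad(x, (self.pad, 0))                # pad the past side only
        return F.relu(self.conv(x))
```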
Step 9: input the multi-scale final subsequence set into the multi-scale fusion module for weighted combination to obtain the multi-scale fusion data $h_m$ of each training sample, as shown in FIG. 4. The specific steps are:

The multi-scale fusion module combines the multi-scale final subsequence set $\{h^1, \dots, h^k, \dots, h^K\}$ by weighting. First, $\{h^1, \dots, h^k, \dots, h^K\}$ are stacked into the multi-scale final subsequence matrix H:

$$H = \mathrm{Stack}\left(h^1, \dots, h^k, \dots, h^K\right)$$

where Stack denotes the stacking operation. Then, an average pooling operation is performed over the scale dimension:

$$h_{pool} = \mathrm{AvgPool}(H)$$

where $h_{pool}$ denotes the representation after the pooling operation. Next, $h_{pool}$ is flattened into the multi-scale final subsequence one-dimensional vector and input into a thinning module consisting of two fully-connected layers so as to learn cross-scale information:

$$\alpha_1 = \mathrm{ReLU}(W_1 h_{pool} + b_1)$$
$$\alpha = \mathrm{Sigmoid}(W_2 \alpha_1 + b_2)$$

where $W_1$ and $W_2$ are weight matrices and $b_1$ and $b_2$ are bias vectors; the second layer uses the Sigmoid activation function. α is defined as the scale importance score vector, which measures the importance of all scale representations. Finally, all scale subsequences are weighted and combined by the aggregation layer to obtain the final representation $h_m$:

$$h_m = \sum_{k=1}^{K} \alpha_k h^k$$
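A minimal sketch of this fusion module (illustrative; it assumes each $h^k$ has already been brought to a common feature dimension per sample, and all names are placeholders):

```python
import torch
import torch.nn as nn

class ScaleFusion(nn.Module):
    """Weights and combines the K final scale representations."""
    def __init__(self, num_scales, hidden):
        super().__init__()
        self.fc1 = nn.Linear(num_scales, hidden)
        self.fc2 = nn.Linear(hidden, num_scales)

    def forward(self, hs):                         # hs: list of K tensors (dim,)
        h = torch.stack(hs, dim=0)                 # H: (K, dim)
        h_pool = h.mean(dim=-1)                    # pool features per scale: (K,)
        alpha1 = torch.relu(self.fc1(h_pool))
        alpha = torch.sigmoid(self.fc2(alpha1))    # scale importance scores
        return (alpha.unsqueeze(-1) * h).sum(0)    # weighted combination h_m
```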
Step 10: input the multi-scale fusion data $h_m$ of each training sample into the multi-layer convolutional neural network for mapping to obtain the multivariate time series predicted value $\hat{x}$ of each training sample. The specific steps are:

A convolutional neural network transforms $h_m$ into the desired output dimension, and a further convolutional neural network with a 1 × 1 convolution kernel yields the predicted value $\hat{x}$.
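For illustration only, such an output head could be sketched as two 1 × 1 convolutions; the channel counts and the assumed input shape (batch, channels, nodes, 1) are placeholders:

```python
import torch.nn as nn

# Maps the fused features to a one-step-ahead prediction per node.
output_head = nn.Sequential(
    nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(1, 1)),  # widen
    nn.ReLU(),
    nn.Conv2d(in_channels=128, out_channels=1, kernel_size=(1, 1)),   # project
)
```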
Step 11: calculate the prediction loss $\mathcal{L}_m$, i.e., the error between the true value x corresponding to the training sample and the multivariate time series predicted value $\hat{x}$ of each training sample. The squared error is used as the prediction loss:

$$\mathcal{L}_m = \left\lVert \hat{x} - x \right\rVert_2^2$$
Step 12: the loss function over all samples is:

$$\mathcal{L} = \frac{1}{M} \sum_{m=1}^{M} \mathcal{L}_m$$

where $\mathcal{L}_m$ is the loss of the m-th sample in a batch and M is the number of samples in each batch.
The model parameters $\theta_a$ obtained through the iterated training sample set loss function are:

$$\theta_a = \theta - \eta \nabla_\theta \mathcal{L}$$

where θ denotes the model parameters before iterative training, η the learning rate, and $\mathcal{L}$ the training sample set loss function. The model parameters $\theta_a$ comprise the final node embedding vector, the final scale embedding vectors, the graph neural network parameters, the adaptive graph learning network parameters, and the time convolution network parameters.
Step 13: repeat steps 4-12 until all batches of the training data set have participated in model training.
Step 14: repeat steps 4-13 until the specified number of iterations is reached.
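Steps 11-14 amount to a standard mini-batch training loop; a minimal sketch (assuming `model` bundles the modules above and `loader` yields (x, y) batches; both names are placeholders) is:

```python
import torch

def train(model, loader, epochs=50, lr=1e-3):
    # theta_a = theta - eta * grad(L): plain gradient descent on the batch loss.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):                    # step 14: repeat to the limit
        for x, y in loader:                    # step 13: every batch participates
            y_hat = model(x)
            loss = ((y_hat - y) ** 2).mean()   # squared-error loss of steps 11-12
            opt.zero_grad()
            loss.backward()
            opt.step()
```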
Step 15: in use, input the multivariate time series, i.e., the historical observation time series of urban traffic flow or of household electricity consumption in an urban block, into the multivariate time series prediction model to predict the urban traffic flow or the household electricity consumption of the urban block at the next time step.

Claims (9)

1. A multivariate time series prediction method for multi-scale adaptive graph learning, characterized by comprising the following steps:
S1: dividing the obtained multivariate time series with a sliding time window to obtain a training sample set, and inputting the training sample set into a multi-scale pyramid network to obtain the multi-scale initial subsequence set of each training sample, an initial node embedding vector, and an initial scale embedding vector for each scale;
S2: inputting the initial node embedding vector and the initial scale embedding vector of each scale into an adaptive graph learning module to obtain the adjacency matrix of each scale;
S3: the multi-scale initial subsequence set consists of the initial subsequences of the individual scales; each scale's initial subsequence is divided along the time dimension into values at successive time steps; the value at each time step and the adjacency matrix of the corresponding scale are input together into a graph neural network to obtain the representation of that time step; the representations of all time steps form the representation sequence of each scale; each scale's representation sequence is input into a time convolution network to obtain the final subsequence of each scale; and the final subsequences of all scales form the multi-scale final subsequence set;
S4: inputting the multi-scale final subsequence set into a multi-scale fusion module for weighted combination to obtain the multi-scale fusion data of each training sample, and inputting the multi-scale fusion data of each training sample into a multi-layer convolutional neural network for mapping to obtain the multivariate time series predicted value of each training sample;
S5: constructing the loss function of each training sample from its multivariate time series predicted value and true value, constructing the loss function of the training sample set from the individual training sample loss functions, iterating steps S2-S4 until the iteration count threshold is met, and obtaining the model parameters through the iterated training sample set loss function, thereby determining the multivariate time series prediction model for multi-scale adaptive graph learning;
S6: in application, a multivariate time series, i.e., the historical observation time series of urban traffic flow or of household electricity consumption in an urban block, is input into the multivariate time series prediction model to predict the urban traffic flow or the household and industrial electricity consumption of the urban block at the next time step; a driving route is guided based on the urban traffic flow at the next time step, and the amount of power distributed and transmitted is guided based on the household and industrial electricity consumption at the next time step.
2. The method of predicting multivariate time series for multi-scale adaptive graph learning as claimed in claim 1, wherein the step of obtaining training sample set by dividing the obtained multivariate time series by sliding time window comprises:
removing abnormal values in the multivariate time sequence, carrying out normalization processing on the multivariate time sequence with the abnormal values removed, setting a time window and a time step, and dividing the multivariate time sequence under the time window into a plurality of training samples based on the set time step to obtain a training sample set.
3. The multivariate time series prediction method for multi-scale adaptive graph learning according to claim 1, wherein inputting a training sample set into a multi-scale pyramid network to obtain a multi-scale initial subsequence set of each training sample comprises:
the multi-scale pyramid network comprises a plurality of pyramid layers, each training sample in the training sample set is input into the multi-scale pyramid network, initial sub-sequences with different scales are obtained through each pyramid layer by gradually stacking the pyramid layers, and finally the multi-scale initial sub-sequence set of each training sample is output.
4. The method as claimed in claim 3, wherein each pyramid layer includes two parallel first and second convolutional networks, the initial sub-sequence obtained from the previous layer is input to the first and second convolutional networks, and then the bit-wise addition is performed on the outputs of the first and second convolutional networks to obtain the initial sub-sequence of the current layer.
5. The method of claim 1, wherein inputting the initial node embedding vector and the initial scale embedding vector of each scale into an adaptive graph learning module to obtain the adjacency matrix of each scale comprises:
the self-adaptive graph learning module comprises multi-scale network layers, each scale network layer comprises initial scale embedded vectors of corresponding scales, the initial node embedded vectors are simultaneously input into the multi-scale network layers and are fused with the initial scale embedded vectors of the corresponding scales to obtain initial scale feature vectors, pairwise similarity calculation is carried out on the initial scale feature vectors, and then sparse processing is carried out on calculation results to obtain an adjacency matrix of each scale.
6. The method of predicting multivariate time series for multi-scale adaptive graph learning as claimed in claim 1, wherein inputting the value at each time step and the adjacency matrix of the corresponding scale together into the graph neural network to obtain the representation sequence comprises:
each graph neural network comprises an in-degree information capture module and an out-degree information capture module; the value at each time step and the adjacency matrix of the corresponding scale are input into the in-degree information capture module and the out-degree information capture module respectively, and the outputs of the two modules are fused to obtain the representation sequence of each scale;
the representation $h_t^k$ at time step t of the k-th scale is:

$$h_t^k = \mathrm{GNN}_{in}^k\left(x_t^k, A^k, W_{in}^k\right) + \mathrm{GNN}_{out}^k\left(x_t^k, (A^k)^T, W_{out}^k\right)$$

where $W_{in}^k$ and $W_{out}^k$ are the training parameters of the in-degree information capture module and the out-degree information capture module at the k-th scale, $A^k$ is the adjacency matrix corresponding to the k-th scale, $\mathrm{GNN}_{in}^k$ is the in-degree information capture module at the k-th scale, and $\mathrm{GNN}_{out}^k$ is the out-degree information capture module at the k-th scale.
7. The multivariate time series prediction method for multi-scale adaptive graph learning according to claim 1, wherein the step of inputting the multi-scale final subsequence set to a multi-scale fusion module for weighted combination to obtain the multi-scale fusion data of each training sample comprises the steps of:
stacking the multi-scale final subsequence sets to obtain a multi-scale final subsequence matrix, performing pooling operation on the multi-scale final subsequence matrix to obtain a multi-scale final subsequence one-dimensional vector, inputting the final subsequence one-dimensional vector to a thinning module to obtain a scale importance score vector, and fusing the scale importance score vector and the multi-scale final subsequence sets to obtain multi-scale fusion data of each sample.
8. The multivariate time series prediction method for multi-scale adaptive graph learning according to claim 7, wherein the scale importance score vector α is:

$$\alpha_1 = \mathrm{ReLU}(W_1 h_{pool} + b_1)$$
$$\alpha = \mathrm{Sigmoid}(W_2 \alpha_1 + b_2)$$

where $h_{pool}$ is the multi-scale final subsequence one-dimensional vector, Sigmoid and ReLU are activation functions, $b_1$ and $b_2$ are bias vectors, and $W_1$ and $W_2$ are weight matrices.
9. The multivariate time series prediction method for multi-scale adaptive graph learning according to claim 1, wherein the model parameters $\theta_a$ obtained through the iterated training sample set loss function are:

$$\theta_a = \theta - \eta \nabla_\theta \mathcal{L}$$

where θ denotes the initial model parameters before iterative training, η the learning rate, and $\mathcal{L}$ the training sample set loss function; the model parameters $\theta_a$ comprise the final node embedding vector, the final scale embedding vectors, the graph neural network parameters, the adaptive graph learning network parameters, and the time convolution network parameters.
CN202111298623.8A 2021-11-04 2021-11-04 Multi-variable time series prediction method for multi-scale adaptive graph learning Pending CN114169394A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111298623.8A CN114169394A (en) 2021-11-04 2021-11-04 Multi-variable time series prediction method for multi-scale adaptive graph learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111298623.8A CN114169394A (en) 2021-11-04 2021-11-04 Multi-variable time series prediction method for multi-scale adaptive graph learning

Publications (1)

Publication Number Publication Date
CN114169394A true CN114169394A (en) 2022-03-11

Family

ID=80477983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111298623.8A Pending CN114169394A (en) 2021-11-04 2021-11-04 Multi-variable time series prediction method for multi-scale adaptive graph learning

Country Status (1)

Country Link
CN (1) CN114169394A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114722950B (en) * 2022-04-14 2023-11-07 武汉大学 Multi-mode multi-variable time sequence automatic classification method and device
CN114722950A (en) * 2022-04-14 2022-07-08 武汉大学 Multi-modal multivariate time sequence automatic classification method and device
WO2023221701A1 (en) * 2022-05-16 2023-11-23 北京火山引擎科技有限公司 Multivariable time sequence processing method and apparatus, device and medium
CN114692788B (en) * 2022-06-01 2022-08-19 天津大学 Early warning method and device for extreme weather of Ernino based on incremental learning
CN114692788A (en) * 2022-06-01 2022-07-01 天津大学 Early warning method and device for extreme weather of Ernino based on incremental learning
CN116128130A (en) * 2023-01-31 2023-05-16 广东电网有限责任公司 Short-term wind energy data prediction method and device based on graphic neural network
CN116128130B (en) * 2023-01-31 2023-10-24 广东电网有限责任公司 Short-term wind energy data prediction method and device based on graphic neural network
CN116845889B (en) * 2023-09-01 2023-12-22 东海实验室 Hierarchical hypergraph neural network-based power load prediction method
CN116845889A (en) * 2023-09-01 2023-10-03 东海实验室 Hierarchical hypergraph neural network-based power load prediction method
CN116993185A (en) * 2023-09-28 2023-11-03 腾讯科技(深圳)有限公司 Time sequence prediction method, device, equipment and storage medium
CN116993185B (en) * 2023-09-28 2024-07-02 腾讯科技(深圳)有限公司 Time sequence prediction method, device, equipment and storage medium
CN117806972A (en) * 2024-01-03 2024-04-02 西南民族大学 Multi-scale time sequence analysis-based modified code quality assessment method
CN117806972B (en) * 2024-01-03 2024-07-02 西南民族大学 Multi-scale time sequence analysis-based modified code quality assessment method
CN118011220A (en) * 2024-04-08 2024-05-10 太湖能谷(杭州)科技有限公司 Battery pack state of charge estimation method, system and medium

Similar Documents

Publication Publication Date Title
CN114169394A (en) Multi-variable time series prediction method for multi-scale adaptive graph learning
CN111079931A (en) State space probabilistic multi-time-series prediction method based on graph neural network
Khan et al. Fuzzy cognitive maps with genetic algorithm for goal-oriented decision support
Shrivastava et al. GLAD: Learning sparse graph recovery
CN112183742B (en) Neural network hybrid quantization method based on progressive quantization and Hessian information
CN112819136A (en) Time sequence prediction method and system based on CNN-LSTM neural network model and ARIMA model
CN113852432B (en) Spectrum Prediction Sensing Method Based on RCS-GRU Model
CN115018193A (en) Time series wind energy data prediction method based on LSTM-GA model
CN112766603A (en) Traffic flow prediction method, system, computer device and storage medium
CN114817773A (en) Time sequence prediction system and method based on multi-stage decomposition and fusion
Wang et al. Optimizing deep belief echo state network with a sensitivity analysis input scaling auto-encoder algorithm
Liu et al. Adaptive multioutput gradient rbf tracker for nonlinear and nonstationary regression
CN117556949A (en) Traffic prediction method based on continuous evolution graph nerve controlled differential equation
CN116976405A (en) Variable component shadow quantum neural network based on immune optimization algorithm
CN109993282B (en) Typhoon wave and range prediction method
CN111310974A (en) Short-term water demand prediction method based on GA-ELM
CN115860232A (en) Steam load prediction method, system, electronic device and medium
CN115620046A (en) Multi-target neural architecture searching method based on semi-supervised performance predictor
CN113111308B (en) Symbolic regression method and system based on data-driven genetic programming algorithm
CN114169493B (en) Multivariable time sequence prediction method based on scale-aware neural architecture search
CN113077003A (en) Graph attention network inductive learning method based on graph sampling
Werbos How we cut prediction error in half by using a different training method
Chen et al. Multi-objective spiking neural network for optimal wind power prediction interval
Han et al. Online aware synapse weighted autoencoder for recovering random missing data in wastewater treatment process
Shi et al. A deepar-based neural network for time series forecasting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination