CN112232543A

CN112232543A - Multi-site prediction method based on graph convolution network

Info

Publication number: CN112232543A
Application number: CN202010895383.9A
Authority: CN
Inventors: 刘博; 贺玺
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2020-08-31
Filing date: 2020-08-31
Publication date: 2021-01-15
Anticipated expiration: 2040-08-31
Also published as: CN112232543B

Abstract

The invention discloses a multi-site prediction method based on a graph convolution network. Atmospheric visibility related data are acquired, cleaned and preprocessed. A prediction model based on the graph convolution network is then built, comparison experiments are run with the same configuration, and the results are compared. The method improves on the graph convolution network: it exploits the ability of graph convolution to process non-Euclidean data to extract features in the spatial and temporal dimensions, and then introduces an attention mechanism to improve model performance. Compared with other models, the method achieves some improvement in multi-station atmospheric visibility prediction.

Description

Multi-site prediction method based on graph convolution network

Technical Field

The invention belongs to the technical field of machine learning, and particularly relates to a multi-station atmospheric visibility prediction task based on a graph convolution network and other related deep learning technologies.

Background

Atmospheric visibility is closely tied to daily life, and to a certain extent it reflects the quality of the atmospheric environment in a region. In recent years atmospheric visibility has declined and air quality has worsened. A growing number of cities experience frequent haze, which seriously affects people's life and work, and air quality has drawn the attention of both governments and the public. Accurate prediction of atmospheric visibility at multiple stations, especially of low-visibility weather, helps ensure traffic safety, allows air pollution events to be controlled and prevented in a targeted manner, and reduces the losses caused by polluted weather. It therefore has positive significance for traffic management departments, for the travelling public and for maintaining a good air environment. In addition, as the cost of data collection and storage has fallen and machine learning has developed rapidly in recent years, meteorological data for most regions can be accurately observed and stored, which provides abundant data and a basis for algorithm research in the meteorological field.

Neural networks have developed dramatically since 2012, and deep learning has achieved a number of breakthroughs in computer vision and natural language processing. In a traditional image classification pipeline, features must first be extracted manually and then fed into a classifier to obtain the classification result. A deep learning model such as a convolutional neural network instead takes the image directly as input in a fixed encoding format and directly outputs the predicted label. The two steps of feature extraction and classification are thus merged into one, the tedious manual feature extraction is avoided, and learning is end to end. Compared with traditional methods, higher-dimensional features and patterns can be learned, which improves accuracy while reducing manual effort.

Convolutional neural networks are limited in the type of data they can handle: the input must lie in a Euclidean domain. The most significant characteristic of Euclidean data is a regular spatial structure. For example, an image is a regular square grid and speech is a regular one-dimensional sequence, and both can be represented by a matrix. In many practical problems, such as electronic transactions, brain signals, recommendation systems and the multi-site atmospheric visibility prediction studied here, most of the data have no regular spatial structure and are called non-Euclidean data. In such data each node is connected differently, that is, the degree of each node may differ. The graph convolution network (GCN) was introduced to carry the convolution operation over to non-Euclidean data. A graph convolution generally has three steps. The first is sending: each node transforms its own feature information and sends it to its neighbours, which extracts and transforms the feature information of each node. The second is receiving: each node aggregates the feature information of its neighbours, which fuses the local structure information of the nodes. The third is transformation: a non-linear transformation is applied to the aggregated information, which increases the expressive power of the model.

In addition, the GCN model has three properties of deep learning: a hierarchical structure (features are extracted layer by layer, each layer more abstract and higher-level than the one before), non-linear transformation (which increases the expressive power of the model), and end-to-end training (no rules need to be defined; given only labels on the nodes of the graph, the model learns by itself and fuses feature information with structural information). The GCN also shares two properties with convolutional neural networks. The first is local parameter sharing. The second is that the receptive field grows with the number of layers: after the first layer each node contains the information of its directly adjacent nodes, and after the second layer it also contains the information of its second-order neighbours. The more layers there are, the wider the receptive field and the more information takes part in the computation.
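The send, receive and transform steps, and the way the receptive field widens with depth, can be illustrated with a minimal sketch. The toy 4-node path graph, the identity weights and the symmetric normalization with self-loops are assumptions made only for illustration; none of them is specified in the text above.

```python
import numpy as np

def gcn_layer(a_hat, h, w):
    """One graph-convolution layer: transform (h @ w), aggregate over
    neighbours (a_hat @ ...), then apply a ReLU non-linearity."""
    return np.maximum(a_hat @ (h @ w), 0.0)

# Toy path graph 0-1-2-3 (assumed for illustration only).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

a_self = adj + np.eye(4)                          # add self-loops
d_inv_sqrt = 1.0 / np.sqrt(a_self.sum(axis=1))
a_hat = a_self * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 (A+I) D^-1/2

h0 = np.eye(4)   # one-hot features, so each column tracks one source node
w = np.eye(4)    # identity weights keep the example readable

h1 = gcn_layer(a_hat, h0, w)
h2 = gcn_layer(a_hat, h1, w)

# Which source nodes contribute to node 0 after one and two layers.
reach_1 = np.flatnonzero(h1[0] > 0).tolist()
reach_2 = np.flatnonzero(h2[0] > 0).tolist()
```

With one layer, node 0 only receives information from itself and its direct neighbour; with two layers, the second-order neighbour is included as well.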

Disclosure of Invention

The technical problem to be solved by the invention is to provide a multi-site prediction method based on a graph convolution network, and the used data set is atmospheric visibility related data of 16 sites.

The technical scheme adopted by the invention is a multi-site prediction method based on a graph convolution network, which comprises the following steps:

step 1, obtaining atmospheric visibility related data, comprising seven features, including apparent temperature, wind force, humidity, precipitation, atmospheric pressure and atmospheric visibility, at district level for the 16 districts of Beijing from 2018 to 2019. Visibility data are collected with a forward-scattering visibility meter, model DNQ1, installed across the 16 districts of Beijing: Huairou, Miyun, Yanqing, Changping, Shunyi, Pinggu, Mentougou, Shijingshan, Daxing, Tongzhou, Fangshan, Fengtai, Haidian, Dongcheng, Xicheng and Chaoyang. The measurements cover longitudes 115.7° E to 117.4° E and latitudes 39.4° N to 41.6° N. The data are cleaned after acquisition.

step 2, preprocessing the data to handle missing values, non-stationarity and similar problems in the atmospheric visibility data collected in step 1.

step 3, based on the prediction model of the graph convolution network, constructing a training set and a test set from the preprocessed experimental data, and optimizing the hyper-parameters with gradient descent during the experiment to obtain the optimal solution of the prediction model.

step 4, performing prediction with the same configuration in comparison experiments, and finally comparing the results.

Preferably, step 2 specifically comprises the following steps:

step 2.1, filling each missing value in the data with the average of the values at the previous and next time points;

step 2.2, converting non-stationary time series into stationary series by first-order differencing;

step 2.3, based on the positions of the 16 meteorological sites, constructing an adjacency matrix of the sites from the geographical distance between each pair of sites, and using the adjacency matrix as a further input to the graph convolution network;
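Steps 2.1 and 2.2 can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function names and the sample readings are hypothetical, and the sketch assumes that only isolated gaps (a missing value with valid neighbours on both sides) are filled.

```python
import numpy as np

def fill_with_neighbor_mean(series):
    """Replace each isolated NaN with the mean of the previous and next values.
    Gaps at the ends or runs of consecutive NaNs are left untouched (assumption)."""
    x = np.asarray(series, dtype=float).copy()
    for t in np.flatnonzero(np.isnan(x)):
        if 0 < t < len(x) - 1 and not np.isnan(x[t - 1]) and not np.isnan(x[t + 1]):
            x[t] = 0.5 * (x[t - 1] + x[t + 1])
    return x

def first_difference(series):
    """First-order differencing: d[t] = x[t + 1] - x[t]."""
    return np.diff(np.asarray(series, dtype=float))

# Hypothetical hourly visibility readings (km) with one missing value.
visibility_km = [10.0, np.nan, 14.0, 15.0, 13.0]
filled = fill_with_neighbor_mean(visibility_km)
diffed = first_difference(filled)
```

Differencing shortens the series by one element, so predictions made on the differenced series have to be cumulatively summed back onto the last observed value to recover visibility.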

Preferably, step 3 specifically comprises the following steps:

step 3.1, dividing the initial data set into a training set, a validation set and a test set in proportions of 60%, 20% and 20% respectively. The training set is used to train the parameters of the model, and the validation set is used to check the accuracy of the current model after each iteration;

step 3.2, the model consists of two spatio-temporal convolution modules, each consisting of two temporal convolution layers and one spatial convolution layer; temporal and spatial features are extracted by convolving the input data along the time dimension and the space dimension respectively, so as to achieve accurate prediction, and an attention mechanism is added to assign different weights to the input data;

step 3.3, defining the loss function: the mean absolute error (MAE) loss is used together with Adam, a variant of stochastic gradient descent; 12 hours are set as the historical time steps and 3, 6, 9 and 12 hours as the prediction time steps, that is, visibility data from the past 12 hours are used to predict atmospheric visibility 3, 6, 9 and 12 hours ahead;

step 3.4, fitting the model for 512 training epochs with a batch size of 128, and continuously adjusting the hyper-parameters during training to obtain the optimal solution of the model;
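The 12-hour history window, the 3/6/9/12-hour prediction horizons and the 60/20/20 split can be sketched as below. The helper names and the synthetic series are hypothetical, and splitting in chronological order is an assumption; the text only gives the proportions.

```python
import numpy as np

def make_windows(data, n_hist=12, horizons=(3, 6, 9, 12)):
    """data: (T, N) hourly series for N stations.
    Returns X of shape (S, n_hist, N) and Y of shape (S, len(horizons), N)."""
    data = np.asarray(data, dtype=float)
    max_h = max(horizons)
    xs, ys = [], []
    for start in range(len(data) - n_hist - max_h + 1):
        end = start + n_hist                 # first index after the history window
        xs.append(data[start:end])
        # The value h hours after the last history step sits at index end + h - 1.
        ys.append(np.stack([data[end + h - 1] for h in horizons]))
    return np.stack(xs), np.stack(ys)

def chronological_split(x, y, train_pct=60, val_pct=20):
    """Split samples in time order into train / validation / test parts."""
    n = len(x)
    i = n * train_pct // 100
    j = n * (train_pct + val_pct) // 100
    return (x[:i], y[:i]), (x[i:j], y[i:j]), (x[j:], y[j:])

# Synthetic stand-in: 123 hourly readings for 16 stations.
series = np.arange(123 * 16, dtype=float).reshape(123, 16)
X, Y = make_windows(series)
train_set, val_set, test_set = chronological_split(X, Y)
```

A chronological split keeps future observations out of the training set, which is the usual choice for time series; a random split would be a different design.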

Preferably, step 4 specifically comprises the following steps:

step 4.1, using models based on the seq2seq structure as comparison models, namely seq2seq-LSTM, whose encoder and decoder are both LSTMs; seq2seq-LSTM-AM, whose encoder and decoder are both LSTMs with an added attention mechanism; seq2seq-GRU, whose encoder and decoder are both GRUs; and seq2seq-GRU-AM, whose encoder and decoder are both GRUs with an added attention mechanism;

step 4.2, performing visibility prediction with the comparison models under the same parameter configuration, obtaining the results and measuring the error with the metrics MAE, MSE and RMSE. As the experimental results in FIGS. 4-7 show, the model of the invention achieves some improvement in both time and accuracy over seq2seq models that model and predict each station separately.
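The three error metrics named in step 4.2 follow their standard definitions, sketched here. The function name and the sample values are hypothetical and are not experimental results.

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """Mean absolute error, mean squared error and root mean squared error."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    mae = float(np.mean(np.abs(err)))
    mse = float(np.mean(err ** 2))
    return {"MAE": mae, "MSE": mse, "RMSE": float(np.sqrt(mse))}

# Hypothetical observed and predicted visibility values (km).
observed = [10.0, 12.0, 8.0, 6.0]
predicted = [11.0, 10.0, 8.0, 9.0]
scores = error_metrics(observed, predicted)
```

MSE and RMSE penalize large errors more heavily than MAE, so reporting all three shows whether a model's error is dominated by a few large misses.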

Compared with the prior art, the invention has the following obvious advantages:

The method improves on the graph convolution network: it exploits the ability of graph convolution to process non-Euclidean data to extract features in the spatial and temporal dimensions, and then introduces an attention mechanism to improve model performance. Compared with other models, the method achieves some improvement in multi-station atmospheric visibility prediction.

Description of the drawings:

FIG. 1 is a structure of a time convolution layer in a model of the present invention;

FIG. 2 is a schematic representation of a spatio-temporal convolution layer in a model of the present invention;

FIG. 3 is a flow chart of a method to which the present invention relates;

FIGS. 4-7 are comparisons of experimental results of the present invention.

Detailed Description

The present invention will be described in further detail below with reference to specific embodiments and with reference to the attached drawings.

The hardware used by the invention comprises one PC and two 1080 graphics cards.

As shown in FIG. 3, the invention provides a multi-site prediction method based on a graph convolution network, which specifically comprises the following steps:

Step 1, acquiring the relevant time series data and cleaning the data.

Step 2, preprocessing the data to handle missing values, non-stationarity and similar problems.

Step 2.1, filling each missing value in the data with the average of the values at the previous and next time points.

Step 2.2, converting non-stationary time series into stationary series by first-order differencing.

Step 2.3, based on the positions of the 16 meteorological sites, constructing an adjacency matrix of the sites from the distances between them.
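One way to build a distance-based adjacency matrix is sketched below. The text does not say how distances are turned into edge weights, so the great-circle (haversine) distance, the Gaussian kernel, and the values of `sigma_km` and `threshold` are all assumptions. The station coordinates are hypothetical points inside the stated measurement range.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def distance_adjacency(coords, sigma_km=30.0, threshold=0.1):
    """Weighted adjacency: w_ij = exp(-d_ij^2 / sigma^2), zeroed below threshold.
    The kernel form and both parameters are illustrative assumptions."""
    coords = np.asarray(coords, dtype=float)
    lat, lon = coords[:, 0], coords[:, 1]
    dist = haversine_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :])
    weights = np.exp(-(dist ** 2) / (sigma_km ** 2))
    weights[weights < threshold] = 0.0   # drop weak, long-distance links
    np.fill_diagonal(weights, 0.0)       # no self-loops
    return weights

# Hypothetical (latitude, longitude) pairs; two nearby stations and one far away.
stations = [(39.9, 116.4), (40.0, 116.5), (41.5, 115.8)]
adj = distance_adjacency(stations)
```

Thresholding keeps the matrix sparse, so that distant stations do not exchange information directly in the graph convolution.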

Step 3, building the model, constructing the training set and test set, and optimizing the parameters to obtain the optimal solution of the prediction model.

Step 3.1, dividing the initial data set into a training set, a validation set and a test set following a widely used conventional split, in proportions of 60%, 20% and 20% respectively. The training set is used to train the parameters of the model, and the validation set is used to check the accuracy of the current model after each iteration.

Step 3.2, defining the model, which consists of two spatio-temporal convolution modules, each consisting of two temporal convolution layers and one spatial convolution layer; an attention mechanism is added to assign different weights to the input data.

Step 3.3, defining the loss function: the mean absolute error (MAE) loss is used together with the efficient Adam variant of stochastic gradient descent. 12 hours are set as the historical time steps and 3, 6, 9 and 12 hours as the prediction time steps, that is, visibility data from the past 12 hours are used to predict atmospheric visibility 3, 6, 9 and 12 hours ahead.

Step 3.4, fitting the model for 512 training epochs with a batch size of 128, and continuously adjusting the hyper-parameters during training to obtain the optimal solution of the model.

Step 4, performing prediction with the same configuration in comparison experiments, and finally comparing the results.

Step 4.1, the comparison models are mainly based on the seq2seq structure: seq2seq-LSTM, whose encoder and decoder are both LSTMs; seq2seq-LSTM-AM, whose encoder and decoder are both LSTMs with an added attention mechanism; seq2seq-GRU, whose encoder and decoder are both GRUs; and seq2seq-GRU-AM, whose encoder and decoder are both GRUs with an added attention mechanism.

Step 4.2, performing visibility prediction with the comparison models under the same parameter configuration, obtaining the results and measuring the error with the metrics MAE, MSE and RMSE. As the experimental results in FIGS. 4-7 show, the model of the invention achieves some improvement in both time and accuracy over seq2seq models that model and predict each station separately.

The network of weather stations is essentially a graph structure, and the features of each node can be regarded as signals on the graph. To make full use of the topological characteristics of the network, graph convolution based on spectral graph theory is applied directly to the signals on each time slice, mining the signal correlations on the network in the spatial dimension. The spectral method converts a graph into an algebraic form in order to analyse its topological properties, such as its connectivity. In spectral graph analysis a graph is represented by its Laplacian matrix $L$, and the properties of the graph structure are obtained by analysing the Laplacian matrix and its eigenvalues. A GCN replaces the classical convolution operator with a linear operator that is diagonalized in the Fourier domain and, on this basis, filters the signal $x$ on the graph $G$ with the kernel $g_\theta$, as shown in the following equation:

$$g_\theta *_G x = g_\theta(L)\,x = g_\theta(U \Lambda U^T)\,x = U\,g_\theta(\Lambda)\,U^T x$$

where $*_G$ denotes the graph convolution operation and $L = U \Lambda U^T$ is the eigendecomposition of the Laplacian matrix. Because the convolution of graph signals equals the product of those signals after they have been transformed into the spectral domain by the graph Fourier transform, the equation can be read as follows: $g_\theta$ and $x$ are each Fourier transformed, the transformed results are multiplied, and the product is inverse Fourier transformed to give the final result of the convolution. However, when the graph has many nodes and the data are large, performing the eigendecomposition of the Laplacian matrix directly is time consuming. The invention therefore approximates the kernel with Chebyshev polynomials, which solves this problem effectively, as shown in the following formula:

$$g_\theta *_G x = g_\theta(L)\,x \approx \sum_{k=0}^{K-1} \theta_k\,T_k(\tilde{L})\,x$$

where $\theta \in \mathbb{R}^K$ is the vector of polynomial coefficients and

$$\tilde{L} = \frac{2}{\lambda_{max}} L - I_N$$

where $\lambda_{max}$ is the maximum eigenvalue of the Laplacian matrix and $I_N$ is the identity matrix. The Chebyshev polynomials are defined by the recursion

$$T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$$

where $T_0(x) = 1$ and $T_1(x) = x$. Using the Chebyshev polynomial expansion is equivalent to using the convolution kernel $g_\theta$ to extract, for each node in the graph, the information of its neighbours from order 0 to order $K-1$. ReLU is used as the final activation function.
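The Chebyshev approximation can be sketched numerically as follows. The sketch assumes the standard form, a sum over k of theta_k T_k(L~) x with L~ = 2L/lambda_max - I. The use of the combinatorial Laplacian L = D - A, the toy path graph and the random parameters are further assumptions not taken from the text.

```python
import numpy as np

def scaled_laplacian(adj):
    """L~ = 2 L / lambda_max - I, with L = D - A (assumed Laplacian form)."""
    lap = np.diag(adj.sum(axis=1)) - adj
    lam_max = np.linalg.eigvalsh(lap).max()
    return 2.0 * lap / lam_max - np.eye(len(adj))

def cheb_polynomials(l_tilde, k):
    """Return [T_0(L~), ..., T_{k-1}(L~)] using T_k = 2 L~ T_{k-1} - T_{k-2}."""
    polys = [np.eye(len(l_tilde)), l_tilde.copy()]
    for _ in range(2, k):
        polys.append(2.0 * l_tilde @ polys[-1] - polys[-2])
    return polys[:k]

def cheb_graph_conv(x, polys, theta):
    """x: (N, C_in); theta: (K, C_in, C_out).
    Computes ReLU(sum_k T_k(L~) @ x @ theta[k])."""
    out = sum(t_k @ x @ theta[k] for k, t_k in enumerate(polys))
    return np.maximum(out, 0.0)

# Toy path graph 0-1-2-3 with K = 3, i.e. neighbours of order 0 to 2.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
l_tilde = scaled_laplacian(adj)
polys = cheb_polynomials(l_tilde, k=3)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))          # 4 nodes, 2 input channels
theta = rng.normal(size=(3, 2, 5))   # K = 3, 2 -> 5 channels
out = cheb_graph_conv(x, polys, theta)
```

No eigenvectors are needed, only the largest eigenvalue, and the term T_2 has a zero entry between nodes 0 and 3, which are three hops apart. This is the locality property described above.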

For temporal feature extraction, RNN-based models are common in time series analysis, but a recurrent model for multi-site atmospheric visibility requires time-consuming iteration, its complex gate mechanism cannot adapt promptly when the number of sites changes dynamically, and the model may even need to be retrained. By contrast, the graph convolution network model has certain advantages: it simplifies part of the work and has no dependency on previous steps. A convolution structure along the time axis captures the temporal dynamics, and a multi-layer convolution structure forms a hierarchical representation that can be trained in parallel, which shortens the training time of the model. As shown in FIG. 1, the temporal convolution layer contains a 1-D convolution with kernel width $k_t$, followed by a gated linear unit (GLU) that provides the non-linear transformation. For each node in the graph, the temporal convolution layer convolves the $k_t$ neighbouring elements of the input along the time axis to obtain the temporal features. The above describes how the spatial and temporal features of the nodes in the graph are extracted. Finally, the two modules are joined to form a spatio-temporal convolution layer, as shown in FIG. 2, to further extract the spatio-temporal correlations between the nodes of the graph, and a fully connected layer is added at the end of the model to ensure that the final output has the same size and shape as the prediction target.
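The temporal convolution followed by a gated linear unit can be sketched as below. The tensor layout, the absence of padding, the omission of any residual connection and the random weights are assumptions made for illustration; the text only specifies a 1-D convolution of width k_t followed by a GLU.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_temporal_conv(x, w, b):
    """x: (T, N, C_in); w: (k_t, C_in, 2 * C_out); b: (2 * C_out,).
    1-D convolution along the time axis without padding, applied to every
    node independently, followed by a GLU: P * sigmoid(Q)."""
    k_t, _, two_c = w.shape
    t_out = x.shape[0] - k_t + 1
    conv = np.zeros((t_out, x.shape[1], two_c))
    for t in range(t_out):
        for k in range(k_t):
            conv[t] += x[t + k] @ w[k]      # (N, C_in) @ (C_in, 2 * C_out)
    conv += b
    p, q = np.split(conv, 2, axis=-1)       # value half and gate half
    return p * sigmoid(q)

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 16, 1))            # 12 hours, 16 stations, 1 channel
w = rng.normal(size=(3, 1, 2 * 8))          # k_t = 3, 1 -> 8 channels
b = np.zeros(2 * 8)
out = gated_temporal_conv(x, w, b)
```

Without padding, each layer shortens the time axis by k_t - 1 steps, so 12 input steps become 10 here. All output steps are computed from the input alone, which is what allows parallel training, in contrast to a recurrent model.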

In addition, a spatial attention mechanism and a temporal attention mechanism are introduced to capture the dynamic temporal and spatial correlations on the network of meteorological sites. In the spatial dimension, meteorological sites at different locations influence one another, and this influence changes dynamically. An attention mechanism is used to adaptively learn the dynamic correlations among nodes in the spatial dimension. In its formula (not reproduced in this text), the input is the input to the $r$-th spatio-temporal convolution module and $C_{r-1}$ is the number of channels of the input data of the $r$-th layer; $v_s$, $b_s$ and the accompanying weight matrices are parameters to be learned, and $\sigma$ is the sigmoid function, used as the activation function. The attention matrix $S$ is updated dynamically in each layer according to the input, and the value of $S_{i,j}$ represents the semantic strength of the correlation between node $i$ and node $j$. A softmax function is then applied so that the attention weights of each node sum to 1. During graph convolution, the adjacency matrix $A$ and the normalized spatial attention matrix together dynamically adjust the influence weights between nodes.
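A spatial attention matrix of this kind can be sketched as follows. The score formula itself is not reproduced in the text, so the bilinear form below is an assumption modelled on common attention-based spatio-temporal graph convolution designs, as are the weight names `w1`, `w2`, `w3` and the element-wise combination with the adjacency matrix. Only the sigmoid activation, the parameters v_s and b_s, and the softmax normalization come from the description.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax_rows(s):
    e = np.exp(s - s.max(axis=1, keepdims=True))   # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def spatial_attention(x, w1, w2, w3, v_s, b_s):
    """x: (N, C, T). Returns an (N, N) attention matrix whose rows sum to 1.
    Assumed score: v_s @ sigmoid((x w1) w2 (w3 x)^T + b_s)."""
    left = (x @ w1) @ w2                           # (N, C) @ (C, T) -> (N, T)
    right = np.einsum("c,nct->nt", w3, x)          # (N, T)
    scores = v_s @ sigmoid(left @ right.T + b_s)   # (N, N)
    return softmax_rows(scores)

rng = np.random.default_rng(0)
n, c, t = 16, 7, 12                                # stations, features, hours
x = rng.normal(size=(n, c, t))
att = spatial_attention(
    x,
    w1=rng.normal(size=t),
    w2=rng.normal(size=(c, t)),
    w3=rng.normal(size=c),
    v_s=rng.normal(size=(n, n)),
    b_s=rng.normal(size=(n, n)),
)

# Assumed combination: attention rescales the existing edges of a toy adjacency.
adj = (rng.random((n, n)) < 0.3).astype(float)
weighted_adj = adj * att
```

Because the attention matrix is recomputed from each input, the effective edge weights change from sample to sample while the underlying graph stays fixed.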

In the time dimension, atmospheric visibility at different moments is also correlated, and the correlation differs from one situation to another. An attention mechanism is likewise used to adaptively assign different importance to different data. In its formula (not reproduced in this text), $v_e$ and the accompanying weights are learnable parameters. The temporal attention matrix $E$ is determined by the varying input, and its element $E_{i,j}$ represents the strength of the dependency between time $i$ and time $j$. Finally, $E$ is normalized with the softmax function. The normalized temporal attention matrix is applied directly to the input, so that the input is dynamically adjusted by incorporating the relevant information. By combining the temporal attention module and the spatial attention module, the spatio-temporal convolution module automatically pays more attention to the valuable information.
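The temporal attention step can be sketched in the same spirit. As with the spatial case, the score formula is not reproduced in the text, so the bilinear form and the weight names `u1`, `u2`, `u3` and `b_e` are assumptions. What follows the description is the softmax normalization of E and the application of the normalized matrix directly to the input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def temporal_attention(x, u1, u2, u3, v_e, b_e):
    """x: (N, C, T). Returns a (T, T) attention matrix whose rows sum to 1.
    Assumed score: v_e @ sigmoid((x^T u1) u2 (u3 x) + b_e)."""
    left = (x.transpose(2, 1, 0) @ u1) @ u2        # (T, C) @ (C, N) -> (T, N)
    right = np.einsum("c,nct->nt", u3, x)          # (N, T)
    scores = v_e @ sigmoid(left @ right + b_e)     # (T, T)
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
n, c, t = 16, 7, 12                                # stations, features, hours
x = rng.normal(size=(n, c, t))
e_norm = temporal_attention(
    x,
    u1=rng.normal(size=n),
    u2=rng.normal(size=(c, n)),
    u3=rng.normal(size=c),
    v_e=rng.normal(size=(t, t)),
    b_e=rng.normal(size=(t, t)),
)

# Apply the normalized attention to the input along the time axis.
x_hat = x @ e_norm                                 # (N, C, T) @ (T, T) -> (N, C, T)
```

The reweighted input keeps its original shape, so it can be passed on to the spatial attention and convolution layers unchanged.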

This completes the description of the model structure provided by the invention. Extraction of the spatial and temporal features in the multi-station atmospheric visibility data is carried out mainly by the spatio-temporal convolution layers. Each spatio-temporal convolution layer comprises two temporal convolution layers and one spatial convolution layer, so that the model learns both the temporal and the spatial features in the data. In addition, an attention mechanism is introduced so that the model gives higher weight to the more important information in the input data, which improves the accuracy of the model.

The above embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and the scope of the present invention is defined by the claims. Various modifications and equivalents may be made by those skilled in the art within the spirit and scope of the present invention, and such modifications and equivalents should also be considered as falling within the scope of the present invention.

Claims

1. A multi-site prediction method based on a graph convolution network, characterized in that the method comprises the following steps:

step 1, obtaining atmospheric visibility related data from meteorological stations, comprising seven features, including apparent temperature, wind force, humidity, precipitation, atmospheric pressure and atmospheric visibility;

step 2, preprocessing the data to handle missing values and non-stationarity in the atmospheric visibility data collected in step 1;

step 3, based on a prediction model of the graph convolution network, constructing a training set and a test set from the preprocessed experimental data, and optimizing the hyper-parameters with gradient descent during the experiment to obtain the optimal solution of the prediction model;

2. The multi-site prediction method based on a graph convolution network as claimed in claim 1, wherein step 2 specifically comprises the following steps:

step 2.3, based on the positions of the 16 meteorological sites, constructing an adjacency matrix of the sites from the geographical distance between each pair of sites, and using the adjacency matrix as a further input to the graph convolution network.

3. The multi-site prediction method based on a graph convolution network as claimed in claim 1, wherein step 3 specifically comprises the following steps:

step 3.1, dividing the initial data set into a training set, a validation set and a test set in proportions of 60%, 20% and 20% respectively; the training set is used to train the parameters of the model, and the validation set is used to check the accuracy of the current model after each iteration;

step 3.2, the model consists of two spatio-temporal convolution modules, each consisting of two temporal convolution layers and one spatial convolution layer; temporal and spatial features are extracted by convolving the input data along the time dimension and the space dimension respectively, and an attention mechanism is added to assign different weights to the input data;

4. The multi-site prediction method based on a graph convolution network as claimed in claim 1, wherein step 4 specifically comprises the following steps:

step 4.2, performing visibility prediction with the comparison models under the same parameter configuration, obtaining the results and measuring the error with the metrics MAE, MSE and RMSE; compared with seq2seq models that model and predict each station separately, the model achieves some improvement in both time and accuracy.