CN114662791A - Long time sequence pm2.5 prediction method and system based on space-time attention - Google Patents
- Publication number
- CN114662791A (application number CN202210424395.2A)
- Authority
- CN
- China
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention belongs to the field of PM2.5 time-series prediction and relates to a long-time-series PM2.5 prediction method and system based on space-time attention. The method comprises: acquiring data and preprocessing it; inputting the preprocessed data into a feature extraction network for feature extraction; connecting and fusing the features extracted from different sites with a spatial attention network; passing the features processed by the spatial attention network through a multi-layer bidirectional LSTM to obtain past features; extracting the known future feature data of the period to be predicted through a neural network to obtain future features, and connecting them with the past features to obtain a prediction result; iteratively training the network until convergence with a loss function that accounts for both the standard-deviation fluctuation and the mean error of the data; and inputting the data of the station under test into the trained space-time-attention PM2.5 prediction network to output a prediction result. The method can accurately predict PM2.5 over long time series.
Description
Technical Field
The invention belongs to the field of PM2.5 time-series prediction, and in particular relates to a long-time-series PM2.5 prediction method and system based on space-time attention.
Background
PM2.5 (particulate matter with an aerodynamic diameter of less than 2.5 μm) is a major atmospheric pollutant and has attracted extensive attention in the forecasting field because of its negative impact on ambient air quality, public health and socioeconomic development. In particular, PM2.5 concentration prediction is important for controlling and reducing air pollution: it helps governments make effective early-warning decisions and helps the public plan healthy travel. More accurate PM2.5 concentration predictions from effective predictive models will therefore become increasingly important. However, because PM2.5 is affected by external factors such as weather, traffic flow, wind direction, wind speed and other meteorological factors, its temporal dynamics are complexly entangled, which makes long-time-series PM2.5 prediction challenging.
PM2.5 concentration prediction methods fall into two main categories: physical-model methods and data-driven methods. Physical-model approaches such as the CMAQ and WRF/Chem models are widely used for air quality prediction. They are usually based on domain knowledge of the physical processes that change atmospheric pollutants, and construct a model of pollutant concentration change from that perspective. Their main advantages are broad applicability and a clear view of the rules and principles by which all elements interact under given environmental conditions. However, PM2.5 composition varies markedly, its physical propagation process is very complex, and the corresponding knowledge is incomplete, so it is difficult to account comprehensively for every condition that may arise. Moreover, environments differ greatly between regions (wind direction, weather and climate, the construction, location and density of local industrial facilities, and changes in traffic flow), so the transport and reaction of air pollutants differ greatly from place to place. Compared with physical-model methods, data-driven statistical methods are therefore simpler to model and perform well.
In the prior art, because the atmospheric environment is dynamic, recurrent neural networks (RNNs) are used: they can process input sequences of arbitrary length, which gives them the ability to learn temporal patterns, and they are particularly suitable for simulating the temporal evolution of atmospheric pollutant distributions. However, when the lag time is long, conventional RNNs suffer from problems such as vanishing and exploding gradients. Long short-term memory (LSTM) networks can alleviate this problem to some extent.
Recently, more complex models have become a trend. Some methods use convolutional neural networks (CNNs) to mine the nonlinear spatial correlation of the data and thereby further improve model performance, and many researchers use multi-layer CNN-LSTM neural networks to learn the intrinsic spatio-temporal correlation of air pollution time-series data. However, such CNNs apply two-dimensional convolution, which destroys the original structural information of the data and ignores temporal correlation.
Existing PM2.5 concentration prediction methods ignore the dynamic influence of the spatio-temporal states of different sites on future PM2.5 concentration, and most of them cannot model the spatial and temporal dependence of PM2.5 concentration at the same time. Moreover, processing each site's own features and the spatial relationships between sites simultaneously can introduce errors into feature extraction. Given the diversity of environmental characteristics across regions, an urgent open problem is how to extract features adaptively according to the characteristics of the data, learn the spatial correlation among different sites, and capture complex PM2.5 periodic patterns.
Disclosure of Invention
To solve these problems, the invention provides an adaptive long-time-series PM2.5 prediction method and an adaptive long-time-series PM2.5 prediction system based on a space-time attention mechanism. The method comprises the following steps:
acquiring different pollutant concentration data and meteorological factor data, and carrying out normalization and missing value filling on the data;
inputting the preprocessed meteorological data of different sites into a corresponding feature extraction network for feature extraction;
connecting and fusing the features extracted from different sites by using a space attention network;
putting the processed features into a multilayer bidirectional LSTM to obtain the forward and reverse trends of the data, and extracting the complex periodic features of PM2.5 concentration;
obtaining known future feature data, obtaining future features through an embedding layer by using a feature extraction network, connecting the past features, and finally outputting a regression result through a linear layer;
iteratively training the network using a loss function that accounts for standard deviation fluctuations and mean errors of the data until convergence;
and inputting the data of the station to be tested into the trained PM2.5 prediction network based on space-time attention, and outputting a prediction result.
Further, inputting the preprocessed meteorological data of different sites into the corresponding feature extraction network for feature extraction includes:
FEN(f)=GLU(μ0)+μ1;
μ0=tanh(w0f+b0);
μ1=w1f+b1;
where FEN(f) is the feature extraction network; f is the preprocessed meteorological data of a station; GLU() is a gated linear unit; w0 and w1 are feature weights; and b0 and b1 are bias terms.
Further, the extraction of nonlinear features by the gated linear unit is expressed as:
GLU(μ0)=(σ(w1*μ0+b2)⊙(w1*μ0+b3));
where GLU(μ0) is the nonlinear feature extracted from the input data; w1 is a hidden feature weight; b2 and b3 are bias terms; ⊙ denotes the element-wise (Hadamard) product; and σ() denotes the sigmoid function.
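A minimal pure-Python sketch of the feature extraction network FEN(f) = GLU(μ0) + μ1 described above, for one time step of one site. The helper names (`matvec`, `glu`, `fen`) and the small weight values are hypothetical. One assumption departs from the notation: the text reuses w1 both in μ1 and inside the GLU, which only works when the input and hidden sizes match, so this sketch gives the GLU its own matrix `W_g`.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Row-major matrix-vector product W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equally sized vectors."""
    return [x + y for x, y in zip(a, b)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def glu(mu0: Vector, W_g: Matrix, b2: Vector, b3: Vector) -> Vector:
    """GLU(mu0) = sigmoid(W_g mu0 + b2) * (W_g mu0 + b3), element-wise."""
    z = matvec(W_g, mu0)
    gate = [sigmoid(v + b) for v, b in zip(z, b2)]   # in (0, 1): how much of each channel passes
    value = [v + b for v, b in zip(z, b3)]           # candidate nonlinear features
    return [g * v for g, v in zip(gate, value)]


def fen(f: Vector, W0: Matrix, b0: Vector, W1: Matrix, b1: Vector,
        W_g: Matrix, b2: Vector, b3: Vector) -> Vector:
    """FEN(f) = GLU(mu0) + mu1, with mu0 = tanh(W0 f + b0) and mu1 = W1 f + b1."""
    mu0 = [math.tanh(v) for v in add(matvec(W0, f), b0)]  # nonlinear branch
    mu1 = add(matvec(W1, f), b1)                           # linear residual branch
    return add(glu(mu0, W_g, b2, b3), mu1)


if __name__ == "__main__":
    f = [1.0, 2.0]                      # two input features of one site (illustrative)
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Gate bias -50 nearly closes the first channel; +50 fully opens the second.
    print(fen(f, zeros, [0.0, 0.0], identity, [0.0, 0.0],
              zeros, [-50.0, 50.0], [2.0, -4.0]))
```

In the usage example the first channel's gate is almost closed, so its output is essentially just the linear residual (about 1.0), while the open second channel adds its nonlinear value to the residual (-4.0 + 2.0). This is how the residual connection lets the block fall back to a simpler linear fit when the gate suppresses the nonlinear path.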
Further, the features extracted from different stations are connected and fused by a spatial attention network: the extracted features are input into a feed-forward neural network to obtain the feature factors of the stations, expressed as:
h0=wtarhtar+btar;
αi=concat(hi,h0);
The feature factor of the target site is spliced with the feature factor of each other site, and the attention score of each spliced value is computed through a hyperbolic tangent (tanh) activation function:
$$e_i^j = \tanh\left(w_i \alpha_i^j + b_i\right)$$
The attention weight of each site is then calculated from these scores through softmax:
$$\beta_i^j = \frac{\exp\left(e_i^j\right)}{\sum_{k=1}^{L} \exp\left(e_i^k\right)}$$
where $h_0$ is the target feature influence weight; $w_{tar}$ is the target site spatial weight; $b_{tar}$ is the target site spatial offset; $h_i$ is the feature factor of a non-target site i; $\alpha_i$ is the feature obtained by splicing $h_0$ and $h_i$; $\alpha_i^j$ denotes the j-th dimension of $\alpha_i$; $w_i$ is a feature weight; $b_i$ is the spatial offset of the i-th station; $e_i^j$ represents the importance weight of the j-th dimension of air station i; $\beta_i^j$ represents the spatial attention weight of the j-th dimension of air station i; L is the feature dimension of a site; exp denotes the exponential function; and $h_{tar}$ is the target site feature sequence.
Further, the loss function Loss accounts for both the standard-deviation fluctuation and the mean error of the data. It is built from the following quantities: MSE, the mean square error of the parameter estimate; $std^*$, the standard deviation of the predicted sequence; $std$, the standard deviation of the true sequence; and $w_2$, an L2 regularization term computed from the regularization parameter λ and the parameters $w_i$ (i = 1, …, M) of the neural network, where M is the number of parameters in the neural network.
Further, the mean square error of the parameter estimate is the expected value of the squared difference between the estimated value and the true value, expressed as:
$$MSE = \frac{1}{N}\sum_{j=1}^{N}\left(\hat{y}^{(j)} - y^{(j)}\right)^2$$
where N is the number of sites; $\hat{y}^{(j)}$ is the estimated value; and $y^{(j)}$ is the true value.
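Further, the standard deviations of the predicted sequence and the true sequence are calculated as:
$$std^* = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(\hat{y}^{(j)} - \bar{y}^{*}\right)^2}, \qquad std = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(y^{(j)} - \bar{y}\right)^2}$$
where N is the number of sites; $\hat{y}^{(j)}$ is a predicted sequence point value; $\bar{y}^{*}$ is the predicted sequence mean; $y^{(j)}$ is a true sequence point value; and $\bar{y}$ is the true sequence mean.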
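The text lists the ingredients of the loss, but the way they are combined is not stated in this text. The sketch below assumes the simplest combination consistent with those definitions, Loss = MSE + |std* − std| + λ·Σ w_i², with population (1/N) standard deviations. The absolute-difference form of the fluctuation term and the function names are assumptions, not the patent's stated formula.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean squared error over N points."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def std(xs: Sequence[float]) -> float:
    """Population standard deviation (divides by N)."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))


def fluctuation_aware_loss(pred: Sequence[float], true: Sequence[float],
                           params: Sequence[float], lam: float) -> float:
    """Assumed form: MSE + |std* - std| + lam * sum(w_i^2)."""
    l2 = lam * sum(w * w for w in params)   # L2 regularization term w2
    return mse(pred, true) + abs(std(pred) - std(true)) + l2


if __name__ == "__main__":
    true = [1.0, 2.0, 3.0]
    flat = [2.0, 2.0, 2.0]   # correct mean, but no fluctuation at all
    print(mse(flat, true), fluctuation_aware_loss(flat, true, [], 0.0))
```

A flat forecast at the true mean scores an MSE of 2/3 but pays an extra sqrt(2/3) ≈ 0.816 through the fluctuation term. That is the behaviour the text motivates: a prediction that smooths away the variability of a long series is penalized even when its average error looks acceptable.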
The invention also provides a long-time-series PM2.5 prediction system based on space-time attention, which implements the above prediction method. The system comprises a time-series data acquisition module, a feature extraction module, a spatial attention network, a multi-layer bidirectional LSTM, a time-series feature extraction module, a feature connection module and a prediction module, wherein:
the time-series data acquisition module acquires pollutant concentration data and meteorological factor data of different sites, including historical data and real-time data; the system is trained on the historical data, and real-time data are input into the trained system for real-time prediction;
the feature extraction module extracts features from the data acquired by the time-series data acquisition module, inputting the preprocessed meteorological data of different sites into the corresponding feature extraction networks, and is used to compare the features of the target site (the site to be predicted) with the features of other sites;
the spatial attention network obtains the attention weight of each auxiliary station: the station to be predicted is taken as the target station and the other stations as auxiliary stations; the features obtained by the feature extraction module are passed through a feed-forward neural network to obtain the feature factors of the stations; the feature factor of the target station is spliced with the feature factor of each auxiliary station, an attention score is calculated through a tanh activation function, and the attention weight of each station is calculated from the scores through softmax;
the multi-layer bidirectional LSTM extracts periodic features from the output features of the spatial attention network;
the time-series feature extraction module acquires known future feature data, namely the season of the period to be predicted and information on upcoming festivals and holidays, converts this information into dimension vectors through an embedding operation, and extracts features from these vectors through a neural network;
and the prediction module linearly weights the features output by the multi-layer bidirectional LSTM and the features output by the time-series feature extraction module to obtain a regression prediction result, which is taken as the output of the prediction system.
Compared with the prior art, the beneficial technical effects of the invention comprise:
(1) Compared with other PM2.5 time-series prediction models, the method achieves high precision and can perform feature extraction and compute space-time attention parameters for data sets of any input feature dimension.
(2) An adaptive feature selection network is designed that dynamically captures the linear and nonlinear features of the data and adaptively determines the fitting complexity of the model according to the characteristics of different data sets, which improves the model's flexibility.
(3) A new attention mechanism is designed for the model that enables accurate spatial interpretation. It obtains the attention weights between the target site and the auxiliary sites simultaneously, adaptively weights different feature states of different regions, and captures the complex dynamic relationships between each auxiliary time series and the target time series.
(4) A temporal feature extraction module is added after spatial feature extraction; it captures the forward and backward temporal trends of the data and can be used to extract complex periodic features.
(5) A time-series feature enhancement module is provided that uses known future data, such as holiday periods, to widen the model's receptive field, and finally connects past high-dimensional hidden features with future hidden features to obtain multi-scale time-series feature data.
(6) An error measure is provided that uses not only the absolute error between predicted and observed values but also the degree of fluctuation of long time-series data, so that it reflects the difference between predicted and observed values more comprehensively.
Drawings
FIG. 1 is a flowchart of a long-time-series PM2.5 prediction method based on a spatio-temporal attention mechanism according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a site-adaptive feature extraction module according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a station spatial attention weight acquisition module according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an LSTM module according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a long-time-series PM2.5 prediction network based on a spatio-temporal attention mechanism according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a training process according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating an application effect of the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides an adaptive long-time-series PM2.5 prediction method based on a space-time attention mechanism, comprising the following steps:
acquiring different pollutant concentration data and meteorological factor data, and carrying out normalization and missing value filling on the data;
inputting the preprocessed meteorological data of different sites into a corresponding feature extraction network for feature extraction;
connecting and fusing the features extracted from different sites by using a space attention network;
putting the processed features into a multilayer bidirectional LSTM to obtain the forward and reverse trends of the data, and extracting the complex periodic features of PM2.5 concentration;
obtaining known future feature data, obtaining future features through an embedding layer by using a feature extraction network, connecting the past features, and finally outputting a regression result through a linear layer;
iteratively training the network using a loss function that accounts for standard deviation fluctuations and mean errors of the data until convergence;
and inputting the data of the station to be tested into the trained PM2.5 prediction network based on space-time attention, and outputting a prediction result.
The long-time-series PM2.5 prediction method based on space-time attention provided by the invention can be applied in the following scenarios:
First, weather forecast scenario
To meet climate-prediction needs, the method can serve as a technical aid for relevant institutions: a back end calls the PM2.5 time-series prediction model provided by this application and returns PM2.5 predictions for a future period to users.
Second, travel reminder scenario
For example, when a user schedules a travel plan, the air quality for a future period can be provided to help plan the trip and reduce harm caused by unexpected weather conditions.
Third, air pollution early-warning scenario
The system can give relevant government departments data support for reliable decisions, so that cities and individuals can respond in advance to possible air pollution warnings and reduce potential harm, for example by restricting traffic, temporarily closing heavily polluting factories, limiting the outdoor activities of vulnerable groups (such as people with respiratory diseases), and having individuals wear masks in advance.
For ease of understanding, this embodiment presents a specific implementation of the long-time-series PM2.5 prediction method based on a spatio-temporal attention mechanism which, as shown in FIG. 1, includes:
101. Acquire pollutant concentration data and meteorological factor data from different sites, and preprocess the data.
Specifically, the existing Xi'an and Beijing air quality data sets are preprocessed. This embodiment takes the Beijing air quality data set as an example; it consists of three parts: air quality features, meteorological features and time features. The air quality data come from the UCI public data repository. A total of 43824 hourly air quality records from 12 monitoring stations in the Beijing region, covering 1 January 2010 to 31 December 2014, are selected. Each air quality record contains six pollutants, i.e. the pollutant concentration data include PM2.5, PM10, NO2, CO, O3 and SO2. Weather is recorded at the same time, i.e. the meteorological factor data comprise 7 attributes: time, weather, temperature, pressure, humidity, wind speed and wind direction.
These data sets fluctuate widely in both mean and variance. Each data set is split in chronological order into training, validation and test sets of 60%, 10% and 30%. Runs of consecutive missing values are filled by linear interpolation between the preceding and following data. This embodiment also adds time information, including year, month, day of the week, etc., to each data block.
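A minimal sketch of the two preprocessing steps just described, on plain Python lists: linear interpolation across runs of missing readings (marked `None`) and a chronological 60/10/30 split with no shuffling. Copying the nearest observed value into leading or trailing gaps is an assumption, since the text only covers gaps with data on both sides; `fill_linear` and `chrono_split` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def fill_linear(xs: Sequence[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between the nearest observed neighbours."""
    known = [i for i, v in enumerate(xs) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out: List[float] = [0.0] * len(xs)
    for i in known:
        out[i] = xs[i]
    # Edge gaps: copy the nearest observed value (assumption; the text does not cover them).
    for i in range(known[0]):
        out[i] = xs[known[0]]
    for i in range(known[-1] + 1, len(xs)):
        out[i] = xs[known[-1]]
    # Interior gaps: straight line between the observations on either side.
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = xs[a] + t * (xs[b] - xs[a])
    return out


def chrono_split(rows: Sequence[T], train: float = 0.6,
                 val: float = 0.1) -> Tuple[List[T], List[T], List[T]]:
    """Split in time order (no shuffling) into train / validation / test."""
    rows = list(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]
```

For example, `fill_linear([1.0, None, None, 4.0])` yields values 1.0, 2.0, 3.0, 4.0 (up to floating-point rounding). Splitting by position rather than at random keeps every validation and test hour strictly after the training period, which matters for a forecasting model.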
Finally, before the data set is input into the network, feature data of different scales are normalized so that every feature is treated equally by the model and features with small absolute values are not dominated by features with large absolute values.
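The text does not name the normalization scheme. As an assumption, the sketch below uses min-max scaling to [0, 1], fitting the minimum and maximum on the training portion only so that no information from the validation or test period leaks into training. It also keeps an inverse transform so that predicted PM2.5 values can be mapped back to concentration units. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_minmax(train_col: Sequence[float]) -> Tuple[float, float]:
    """Learn (min, max) of one feature from the training split only."""
    return min(train_col), max(train_col)


def transform(col: Sequence[float], lo: float, hi: float) -> List[float]:
    """Scale with training statistics; a constant feature maps to 0."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in col]
    return [(x - lo) / span for x in col]


def inverse_transform(col: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map scaled values (e.g. network outputs) back to original units."""
    return [x * (hi - lo) + lo for x in col]


if __name__ == "__main__":
    lo, hi = fit_minmax([10.0, 20.0, 30.0])     # illustrative training values
    scaled = transform([15.0, 30.0, 45.0], lo, hi)
    print(scaled)                                # later values may fall outside [0, 1]
    print(inverse_transform(scaled, lo, hi))
```

Here the test-period value 45.0 scales to 1.75. Values outside [0, 1] are expected when later data exceed the training range, and refitting on the full series to avoid them would leak future information.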
102. And inputting the preprocessed meteorological data of different sites into a feature extraction network for feature extraction.
The preprocessed training time-series samples are input into the feature extraction network (Feature Extraction Network, FEN). Time-series samples from different sites enter their respective feature extraction networks as training samples, where the input f is the multi-dimensional time-series features of a single site; any categorical features are converted into dimension vectors by an embedding operation to unify the data. First, a linear layer with a tanh activation maps the data to nonlinear features in a hidden layer; then a gated linear unit (GLU) adaptively selects among these nonlinear features, amplifying important ones and suppressing ones that may be ineffective. The GLU is formulated as:
GLU(μ0)=(σ(w1*μ0+b2)⊙(w1*μ0+b3));
Meanwhile, so that the model can adaptively select its fitting complexity, a residual connection is applied to the input: the original input passes through a linear layer to obtain simple linear features, and the extracted linear and nonlinear features are added to perform adaptive feature selection. The overall structure is:
μ0=tanh(w0f+b0)
μ1=w1f+b1
FEN(f)=GLU(μ0)+μ1
103. Connect and fuse the extracted features of the different sites using a spatial attention network.
In this embodiment, the spatial attention network acquires spatial attention for the features of each station after feature extraction. First, the target site feature sequence $h_{tar}$ is passed through a shallow perceptron, i.e. a feed-forward neural network (FNN), to obtain the target feature influence weight $h_0$; then the feature sequence $h_i$ of each site is connected with the feature factor of the target site to obtain the sequence feature $\alpha_i$:
h0=wtarhtar+btar
αi=concat(hi,h0)
where $w_{tar}$ is the target site spatial weight and $b_{tar}$ is the target site spatial offset.
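A sketch of the two formulas above for a single time step, treating $h_{tar}$ and each $h_i$ as plain feature vectors. The target site's features are projected by one linear layer (the shallow FNN) into $h_0$, which is then appended to every other site's feature vector to form $\alpha_i$. The function names and toy weights are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def linear(W: Sequence[Sequence[float]], x: Sequence[float], b: Sequence[float]) -> Vector:
    """y = W x + b for a row-major weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def target_factor(h_tar: Sequence[float], W_tar: Sequence[Sequence[float]],
                  b_tar: Sequence[float]) -> Vector:
    """h0 = W_tar h_tar + b_tar: the target feature influence weight."""
    return linear(W_tar, h_tar, b_tar)


def splice_with_target(aux_features: Sequence[Sequence[float]],
                       h0: Sequence[float]) -> List[Vector]:
    """alpha_i = concat(h_i, h0) for every non-target site i."""
    return [list(h_i) + list(h0) for h_i in aux_features]


if __name__ == "__main__":
    h0 = target_factor([1.0, 2.0], [[1.0, 1.0]], [0.5])   # 1*1 + 1*2 + 0.5 = 3.5
    alphas = splice_with_target([[0.1, 0.2], [0.3, 0.4]], h0)
    print(alphas)
```

Because the same $h_0$ is appended to every auxiliary site, each $\alpha_i$ carries both "what this site looks like" and "what the target site looks like", which is what lets the following attention step score each site relative to the target.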
Then the sequence features are passed through a shallow perceptron with a hyperbolic tangent (tanh) activation function to obtain the attention scores:
$$e_i^j = \tanh\left(w_i \alpha_i^j + b_i\right)$$
Next, the spatial attention weight of each site is estimated by the softmax formula:
$$\beta_i^j = \frac{\exp\left(e_i^j\right)}{\sum_{k=1}^{L} \exp\left(e_i^k\right)}$$
where $\beta_i^j$ is the normalized attention weight assigned to the different features of the different sites; i is the site index, j denotes the j-th feature dimension of the site, and L is the number of features of each site.
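This sketch assumes the attention score takes the form e_i^j = tanh(w_i·α_i^j + b_i), with a scalar weight w_i and offset b_i per site, normalized by a softmax over the L dimensions of α_i. That form is a reconstruction from the symbol definitions, not a formula confirmed by the filing. Subtracting the maximum inside `softmax` is a standard numerical-stability trick and does not change the result. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def attention_scores(alpha_i: Sequence[float], w_i: float, b_i: float) -> List[float]:
    """Importance of each dimension j of alpha_i: tanh(w_i * alpha_i[j] + b_i)."""
    return [math.tanh(w_i * a + b_i) for a in alpha_i]


def softmax(xs: Sequence[float]) -> List[float]:
    """exp(x_j) / sum_k exp(x_k); subtracting the max avoids overflow."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(alpha_i: Sequence[float], w_i: float, b_i: float) -> List[float]:
    """Normalized spatial attention weights over the L dimensions of site i."""
    return softmax(attention_scores(alpha_i, w_i, b_i))


if __name__ == "__main__":
    weights = spatial_attention([0.1, 0.2, 3.5], w_i=1.0, b_i=0.0)
    print(weights, sum(weights))
```

One consequence of this form is that tanh keeps every score in (-1, 1), so no dimension can receive more than about e² ≈ 7.4 times the weight of another. The weights therefore redistribute emphasis rather than switching dimensions fully on or off.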
104. Put the processed features into a multi-layer bidirectional LSTM and capture the forward and reverse trends of the data to extract the complex periodic features of PM2.5 concentration.
Thanks to its gating mechanism, an LSTM can selectively store and discard information in long sequences, so compared with a conventional RNN it effectively alleviates gradient explosion and gradient vanishing when modeling long-term dependencies. The basic structure of the LSTM unit is shown in FIG. 4. It consists of a memory cell $c_t$ that stores historical information, the hidden state $h_t$ at the current moment, and three gates that control the flow of information. The forget gate $f_t$, determined by $h_{t-1}$ and $x_t$, decides how much information is kept from the previous cell state $c_{t-1}$; the input gate $i_t$ decides how much information is taken from the input $x_t$ and the previous hidden state $h_{t-1}$; and the output gate $o_t$ decides how much of the current $c_t$ is output as $h_t$. The formulas are as follows:
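$$f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)$$
$$i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)$$
$$o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \sigma_c(W_c x_t + U_c h_{t-1} + b_c)$$
$$h_t = o_t \odot \tanh(c_t)$$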
where $\sigma_g(\cdot)$ is the sigmoid function; $x_t$ is the input at the current time step; $W_f$ and $U_f$ are the forget-gate weights and $b_f$ is the forget-gate bias; $W_i$ and $U_i$ are the input-gate weights and $b_i$ is the input-gate bias; $W_o$ and $U_o$ are the output-gate weights and $b_o$ is the output-gate bias; $i_t$ is the output of the input gate; $\sigma_c(\cdot)$ is described in this embodiment as a sigmoid function as well, the subscripts g and c only indicating where the function is applied: $\sigma_g(\cdot)$ is used in the forget, input and output gates, and $\sigma_c(\cdot)$ for the current cell state; $W_c$ and $U_c$ are the current cell-state weights and $b_c$ is the current cell-state bias.
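A minimal sketch of one LSTM time step, written with plain Python lists so that it maps one-to-one onto the five equations above. The parameter dictionary layout (`"W_f"`, `"U_f"`, `"b_f"`, ...) and the function name `lstm_step` are hypothetical. Because the text describes $\sigma_c$ as a sigmoid while the common LSTM formulation uses tanh for the candidate cell state, the candidate activation is left as a parameter.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lstm_step(x_t, h_prev, c_prev, p, cand_act=math.tanh):
    """One LSTM step: returns (h_t, c_t).

    p maps "W_f", "U_f", "b_f" (and likewise i, o, c) to weights/biases.
    cand_act plays the role of sigma_c; pass sigmoid to follow the text literally.
    """
    def gate(name, act):
        pre = [a + b + c for a, b, c in zip(matvec(p["W_" + name], x_t),
                                            matvec(p["U_" + name], h_prev),
                                            p["b_" + name])]
        return [act(z) for z in pre]

    f_t = gate("f", sigmoid)      # forget gate
    i_t = gate("i", sigmoid)      # input gate
    o_t = gate("o", sigmoid)      # output gate
    c_hat = gate("c", cand_act)   # candidate cell state
    c_t = [f * c0 + i * g for f, c0, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights and biases at zero, every gate outputs 0.5 and the tanh candidate is 0, so the cell simply halves its previous state; this makes a convenient sanity check.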
The Bi-LSTM module acts as a periodic time simulator: it captures both the forward and the reverse trends of the data and can therefore be used to extract the periodic features of PM2.5 concentration. A multi-layer bidirectional LSTM is accordingly chosen as the temporal feature extraction network. Compared with the periodic features obtained by a unidirectional LSTM, its output carries richer feature information, captures the forward and backward trends of periodic data simultaneously, and makes more efficient use of the temporal information in the data, which allows the time series to be predicted more accurately.
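To show what capturing forward and reverse trends means operationally, the sketch below runs a toy LSTM with hidden size 1 over the sequence in both directions and pairs the two hidden states at each time step. It is a single-layer, scalar simplification under assumed toy parameters (`scalar_lstm` and `bidirectional` are hypothetical names); in a multi-layer Bi-LSTM the concatenated outputs of one layer would become the inputs of the next.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def scalar_lstm(xs, w):
    """Hidden-size-1 LSTM over scalar inputs; returns h at every step.

    w maps each gate name in "fioc" to a toy (W, U, b) triple of scalars.
    """
    h, c, out = 0.0, 0.0, []
    for x in xs:
        pre = {g: w[g][0] * x + w[g][1] * h + w[g][2] for g in "fioc"}
        f, i, o = sigmoid(pre["f"]), sigmoid(pre["i"]), sigmoid(pre["o"])
        c = f * c + i * math.tanh(pre["c"])
        h = o * math.tanh(c)
        out.append(h)
    return out

def bidirectional(xs, w_fwd, w_bwd):
    """Pair forward and backward hidden states for each time step."""
    fwd = scalar_lstm(xs, w_fwd)
    bwd = scalar_lstm(xs[::-1], w_bwd)[::-1]  # realign to original time order
    return list(zip(fwd, bwd))
```

The backward states are reversed again before pairing so that index t of both directions refers to the same time step; at t = 0 the backward state has already seen the whole sequence, which is what lets the model use "future" context within the input window.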
105. Pass the known future feature data through an embedding layer, use a feature extraction network to obtain the future features, concatenate them with the past features, and finally output the regression result through a linear layer.
PM2.5 shows significant multi-scale periodicity in many regions of the world, driven by human and natural activity and by meteorological conditions. Known future feature data include the season, upcoming holidays, weekends and so on. Unlike continuous numerical features such as PM2.5, these features of the future time period are available in advance; they correspond directly in time to the predicted values and help the final prediction.
To unify numerical and categorical features, an embedding operation converts the categorical features into dense vectors. This is similar to word embedding in natural language processing tasks, and the embeddings are trained together with the network. In this way the categorical features acquire a "semantic" meaning and can be input directly into the neural network. After feature extraction on the embedded future time features, they are concatenated and fused with the processed past time-series features to obtain the regression prediction.
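The embedding-and-fusion step can be sketched as follows, assuming season, holiday and weekend are each coded as small integer categories (the coding in `SEASONS` and all function and class names are hypothetical). The embedding tables are randomly initialized here and would be learned with the rest of the network in practice; the feature extraction network that the text applies to the embedded future features is omitted for brevity.

```python
import random

class Embedding:
    """Toy lookup table: category id -> vector (random init, untrained)."""

    def __init__(self, num_categories, dim, seed=0):
        rng = random.Random(seed)
        self.table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                      for _ in range(num_categories)]

    def __call__(self, idx):
        return self.table[idx]

# Hypothetical category coding for the known future features.
SEASONS = {"spring": 0, "summer": 1, "autumn": 2, "winter": 3}

def make_embeddings(dim=3):
    return {
        "season": Embedding(len(SEASONS), dim, seed=1),
        "holiday": Embedding(2, dim, seed=2),
        "weekend": Embedding(2, dim, seed=3),
    }

def future_vector(season, is_holiday, is_weekend, emb):
    """Concatenate the embeddings of the known future categorical features."""
    return (emb["season"](SEASONS[season])
            + emb["holiday"](int(is_holiday))
            + emb["weekend"](int(is_weekend)))

def regression_head(past_features, future_features, W, b):
    """Concatenate past and future features, then apply a linear layer."""
    fused = past_features + future_features
    return [sum(w * x for w, x in zip(row, fused)) + bj
            for row, bj in zip(W, b)]
```

Each category maps to its own row of a table, so the same season always yields the same vector; training then moves rows for similar categories closer together, which is the "semantic" effect described above.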
106. Iteratively train the network until convergence, using a loss function that accounts for both the standard-deviation fluctuation and the mean error of the data.
In the invention, the model parameters are updated with a loss function that includes L2 weight regularization, which helps prevent the deep network from overfitting and keeps the model's predictions from being dominated by any single feature. The loss function may be defined as follows:
where MSE is the mean squared error of the estimate, i.e. the expectation of the squared difference between the estimated value and the true value, and N is the number of predicted samples.
Std* and Std are defined as follows. The standard deviation, also called the experimental standard deviation, is the measure of statistical dispersion most commonly used in probability and statistics; it is the arithmetic square root of the variance and reflects how spread out a data set is. Two data sets with the same mean do not necessarily have the same standard deviation, so to capture the difference in fluctuation between the predicted sequence and the true sequence, the standard deviation is added to the loss. Here Std* is the standard deviation of the predicted sequence and Std that of the true sequence; $\bar{y}$ is the mean of the true sequence; $\bar{y}^*$ is the mean of the predicted sequence; $y^{(j)}$ is the value of the j-th element of the true sequence; and $y^{*(j)}$ is the value of the j-th element of the predicted sequence, each element of the predicted and true sequences corresponding to one station.
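The two ingredients of the loss can be computed directly. This sketch assumes the population standard deviation (dividing by N); the text does not state whether N or N − 1 is used. The function names are hypothetical.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error over N predicted samples."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

def std(values):
    """Population standard deviation (divides by N; an assumption)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)
```

For a true sequence [1, 2, 3, 4] and a flat prediction of 2.5 at every point, `mse` gives 1.25, while `std` is 0 for the prediction against about 1.118 for the true sequence. The flat prediction matches the mean but not the fluctuation, which is exactly the gap the standard-deviation term is meant to penalize.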
$w_2$ is the L2 regularization term, expressed as:
where λ is the regularization parameter, $w_i$ is the i-th weight parameter of the neural network, and M is the number of parameters in the neural network. The term $w_2$ penalizes having too many or too large parameters, which keeps the model from becoming overly complex. For example, if a polynomial model includes high-order terms, it can become too complex and overfit easily; to prevent this, the weights of the high-order terms can be pushed toward 0, which effectively reduces the model from a high-order to a low-order form.
Therefore, when the length of the predicted sequence is N, the loss function can also be written in the following form.
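Because the exact loss formula is not reproduced in this text, the sketch below assumes one plausible combination, Loss = MSE + |Std* − Std| + $w_2$ with $w_2 = \lambda \sum_{i=1}^{M} w_i^2$. Both the absolute difference of the standard deviations and this form of the L2 term are assumptions, and the function names are hypothetical.

```python
import math

def l2_penalty(weights, lam):
    """Assumed L2 term: lam times the sum of squared network weights."""
    return lam * sum(w * w for w in weights)

def combined_loss(y_true, y_pred, weights, lam):
    """Assumed form: MSE + |Std* - Std| + L2 penalty."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

    def std(v):  # population standard deviation
        m = sum(v) / len(v)
        return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))

    return mse + abs(std(y_pred) - std(y_true)) + l2_penalty(weights, lam)
```

A perfect prediction with zero weights gives a loss of 0, while a flat prediction at the true mean is charged both its MSE and the full standard deviation of the true sequence.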
107. Input the data of the station to be tested into the trained spatio-temporal-attention-based PM2.5 prediction network and output the prediction result.
In this embodiment, the sequence data to be tested can be fed into the neural network, and the prediction length can be chosen dynamically according to actual requirements. The network can output a regression result from the past sequence data alone, or, combined with the future-time feature network, a result that fuses past and future features. The model can be applied to any task involving multivariate spatio-temporal sequences, not only PM2.5, and it also provides spatio-temporal interpretability for its predictions.
In other embodiments, the invention further provides a spatio-temporal-attention-based long time-series PM2.5 prediction system, comprising:
a time-series data acquisition module for acquiring pollutant concentration data and meteorological factor data of different stations;
a data preprocessing module for preprocessing the pollutant concentration data and meteorological factor data of different stations;
a spatio-temporal-attention-based neural network training module, which inputs the preprocessed training time-series samples into the feature extraction network so that linear features are obtained from the raw input while nonlinear features are extracted simultaneously, followed by adaptive feature selection; uses the spatial attention extraction network to obtain the attention weight of each station from its extracted features; uses a multi-layer bidirectional LSTM as the temporal feature extraction network to capture the forward and backward trends of periodic data simultaneously; passes the known future feature data through the embedding layer and the feature extraction network, concatenates the result with the past features, and outputs the regression result through a linear layer; and iteratively trains the neural network until convergence using a loss function that accounts for both the standard-deviation fluctuation and the mean error of the data;
and an output module for outputting the prediction result for the time-series data of the station to be tested.
In some embodiments, the invention may use the Adam optimizer for training; after several rounds of training the neural network stabilizes and iterative training ends. The training process, shown in Fig. 6, is as follows:
after the pollutant concentration data and meteorological factor data of different stations are acquired, the data set is preprocessed;
the spatio-temporal-attention-based neural network model is constructed;
the neural network is trained on the data set over multiple iterations;
the loss between the network output and the true time-series values is computed until the loss stabilizes;
training then ends and the trained neural network model is saved.
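The training procedure above can be sketched as a generic loop using the standard Adam update rule. Here it minimizes a toy one-parameter loss, $(\theta - 3)^2$, and stops when the change in loss between iterations falls below a tolerance, mirroring "until the loss stabilizes". The function name, the default hyperparameters and the toy objective are assumptions; a real implementation would apply a deep-learning framework's Adam optimizer to the full network.

```python
import math

def adam_train(loss_and_grad, theta, lr=0.1, beta1=0.9, beta2=0.999,
               eps=1e-8, tol=1e-10, max_steps=10_000):
    """Minimise a scalar loss with Adam; stop once the loss stops changing.

    loss_and_grad(theta) -> (loss, gradient) is a hypothetical callback.
    """
    m = v = 0.0
    prev_loss = float("inf")
    loss = prev_loss
    for t in range(1, max_steps + 1):
        loss, g = loss_and_grad(theta)
        if abs(prev_loss - loss) < tol:      # loss has stabilised: stop
            break
        prev_loss = loss
        m = beta1 * m + (1 - beta1) * g      # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)         # bias corrections
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, loss
```

For example, `adam_train(lambda th: ((th - 3.0) ** 2, 2.0 * (th - 3.0)), 0.0)` drives θ from 0 toward 3. Stopping on a small change in loss, rather than a fixed number of epochs, corresponds to ending training once the loss has stabilized.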
The trained neural network model is shown in Fig. 5; the adaptive feature extraction module is shown in Fig. 2, the spatial attention feature extraction module in Fig. 3, and the temporal attention extraction module in Fig. 4.
In some embodiments, the neural network training module comprises a past feature network module and a future feature network module. The past feature network module consists of the adaptive feature extraction module, the spatial attention feature extraction module and the temporal attention extraction module, and extracts the past features; the future feature network module consists of a feature embedding module and a feature extraction module.
The adaptive feature extraction module obtains linear features from each station's raw input while simultaneously extracting nonlinear features, and then performs adaptive feature selection. The spatial attention feature extraction module obtains the spatial attention weight of each station; the temporal attention extraction module captures the complex periodic patterns of the data; and the future feature network module fuses the future time features with the regression result based on past features to perform the final regression prediction.
Fig. 7 shows the time-series prediction results of the invention. The original past and future time-series data are obtained and preprocessed by missing-value filling, data normalization, data alignment and data cleaning, which makes the input more stable and reliable; the processed data are then fed into the spatio-temporal-attention-based PM2.5 time-series prediction network for feature extraction and prediction. The figure shows the final output, with the predicted values plotted against the true values.
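The missing-value filling and normalization steps can be illustrated with two common choices, linear interpolation and min-max scaling. The text does not say which filling or normalization method is used, so both choices and the function names are assumptions.

```python
def fill_missing(series):
    """Fill None values: linear interpolation inside the series, nearest
    observed value at the edges. Assumes at least one observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:                 # leading gap: copy first observation
            out[i] = series[right]
        elif right is None:              # trailing gap: copy last observation
            out[i] = series[left]
        else:                            # interior gap: interpolate linearly
            frac = (i - left) / (right - left)
            out[i] = series[left] + frac * (series[right] - series[left])
    return out

def min_max_normalize(series):
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]
```

In practice the minimum and maximum would be taken from the training data only and reused for the test data, so that no information from the test period leaks into training.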
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (8)
1. A spatio-temporal-attention-based long time-series PM2.5 prediction method, characterized by comprising the following steps:
acquiring pollutant concentration data and meteorological factor data, and normalizing these data and filling their missing values;
inputting the preprocessed meteorological data of different stations into the corresponding feature extraction networks for feature extraction;
concatenating and fusing the features extracted from different stations using a spatial attention network;
extracting past features from the output of the spatial attention network through a multi-layer bidirectional LSTM;
obtaining the known future feature data for the period to be predicted, namely the season and holiday information of that period, converting it into dense vectors through an embedding layer, and extracting future features through a neural network;
concatenating the past features with the future features and outputting a regression result through a linear layer to obtain the prediction result;
iteratively training the network until convergence using a loss function that accounts for the standard-deviation fluctuation and the mean error of the data;
and inputting the data of the station to be tested into the trained spatio-temporal-attention-based PM2.5 prediction network and outputting the prediction result.
2. The spatio-temporal-attention-based long time-series PM2.5 prediction method according to claim 1, wherein inputting the preprocessed meteorological data of different stations into the corresponding feature extraction networks for feature extraction comprises:
$$\mathrm{FEN}(f) = \mathrm{GLU}(\mu_0) + \mu_1$$
$$\mu_0 = \tanh(w_0 f + b_0)$$
$$\mu_1 = w_1 f + b_1$$
where FEN(f) is the output of the feature extraction network; f is the preprocessed meteorological data of the different stations; GLU(·) is a gated linear unit; $w_0$ and $w_1$ are feature weights; and $b_0$ and $b_1$ are bias terms.
3. The spatio-temporal-attention-based long time-series PM2.5 prediction method according to claim 2, characterized in that the extraction of nonlinear features by the gated linear unit is expressed as:
$$\mathrm{GLU}(\mu_0) = \sigma(w_1 \mu_0 + b_2) \odot (w_1 \mu_0 + b_3)$$
where $\mathrm{GLU}(\mu_0)$ is the nonlinear feature extracted from the input data; $w_1$ is the hidden feature weight; $b_2$ and $b_3$ are bias terms; ⊙ denotes the element-wise product; and σ(·) is the sigmoid function.
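The feature extraction network of claims 2 and 3 can be sketched as a residual combination of a gated nonlinear branch and a linear branch. The sketch follows the claims literally, including the reuse of $w_1$ in both $\mu_1$ and the GLU, which requires $w_0$ to be square so that $\mu_0$ and f have the same size; whether a separate hidden weight was intended is not clear from this text. The function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, x, b):
    """Dense layer W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def glu(mu0, w1, b2, b3):
    """GLU(mu0) = sigmoid(w1 mu0 + b2) ⊙ (w1 mu0 + b3), as written in claim 3."""
    gate = [sigmoid(z) for z in affine(w1, mu0, b2)]
    value = affine(w1, mu0, b3)
    return [g * v for g, v in zip(gate, value)]

def fen(f, w0, b0, w1, b1, b2, b3):
    """FEN(f) = GLU(mu0) + mu1 from claim 2: gated nonlinear branch plus linear branch."""
    mu0 = [math.tanh(z) for z in affine(w0, f, b0)]  # nonlinear features
    mu1 = affine(w1, f, b1)                          # linear features
    return [a + b for a, b in zip(glu(mu0, w1, b2, b3), mu1)]
```

The linear branch $\mu_1$ acts like a skip connection: when the gate or the nonlinear value is near zero, the output falls back to a linear function of the input, and the sigmoid gate decides how much nonlinear signal to add on top.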
4. The spatio-temporal-attention-based long time-series PM2.5 prediction method according to claim 1, characterized in that the features extracted from different stations are concatenated and fused using a spatial attention network, i.e. the extracted features are input into a feed-forward neural network to obtain the feature factors of the stations, expressed as:
$$h_0 = w_{tar} h_{tar} + b_{tar}$$
$$\alpha_i = \mathrm{concat}(h_i, h_0)$$
the feature factor of the target station is concatenated with the feature factor of each other station, and an attention importance value is computed from each concatenated vector through a hyperbolic tangent (tanh) activation, expressed as:
the attention weight of each station is computed from these importance values via softmax, expressed as:
where $h_0$ is the target feature influence weight; $w_{tar}$ is the spatial weight of the target station; $b_{tar}$ is the spatial bias of the target station; $h_i$ is the feature factor of a non-target station; $\alpha_i$ is the feature obtained by concatenating $h_0$ and $h_i$; $\alpha_i^j$ denotes the j-th dimension of $\alpha_i$; $w_i$ is a feature weight; $b_i$ is the spatial bias of the i-th station; the tanh output is the importance weight of the j-th dimension of air station i, and the softmax output is the spatial attention weight of the j-th dimension of air station i; L is the feature dimension of a station; exp denotes the exponential function; and $h_{tar}$ is the feature sequence of the target station.
5. The spatio-temporal-attention-based long time-series PM2.5 prediction method according to claim 1, characterized in that the loss function accounting for the standard-deviation fluctuation and mean error of the data is expressed as:
where Loss is the loss function; MSE is the mean squared error of the parameter estimate; Std* is the standard deviation of the predicted sequence; Std is the standard deviation of the true sequence; and $w_2$ is the L2 regularization term, in which λ is the regularization parameter, $w_i$ is the i-th parameter of the neural network, and M is the number of parameters in the neural network.
6. The spatio-temporal-attention-based long time-series PM2.5 prediction method according to claim 4, wherein the mean squared error of the parameter estimate is the expected value of the squared difference between the estimated value and the true value of the parameter, expressed as:
7. The spatio-temporal-attention-based long time-series PM2.5 prediction method according to claim 4, wherein the computation of the standard deviations of the predicted sequence and the true sequence comprises:
8. A spatio-temporal-attention-based long time-series PM2.5 prediction system, characterized in that the system is used to implement the spatio-temporal-attention-based long time-series PM2.5 prediction method of any one of claims 1 to 7 and comprises a time-series data acquisition module, a feature extraction module, a spatial attention network, a multi-layer bidirectional LSTM, a time-series feature extraction module, a feature connection module and a prediction module; wherein:
the time-series data acquisition module acquires pollutant concentration data and meteorological factor data of different stations, including historical data and real-time data; the system is trained on the historical data, and the real-time data are fed into the trained system for real-time prediction;
the feature extraction module extracts features from the data acquired by the time-series data acquisition module by inputting the preprocessed meteorological data of different stations into the corresponding feature extraction networks, and is used to compare the features of the target station, i.e. the station to be predicted, with the features of the other stations;
the spatial attention network obtains the attention weight of each auxiliary station: the station to be predicted is taken as the target station and the other stations as auxiliary stations; the features from the feature extraction module are passed through a feed-forward neural network to obtain the feature factors of the stations; the feature factor of the target station is concatenated with that of each auxiliary station and an attention value is computed through a hyperbolic tangent activation; and the attention weight of each station is computed from these values via softmax;
the multi-layer bidirectional LSTM extracts periodic features from the output features of the spatial attention network;
the time-series feature extraction module acquires the known future feature data, namely the season of the period to be predicted and information on upcoming festivals and holidays, converts this information into dense vectors with an embedding operation, and extracts features from them through a neural network;
and the prediction module linearly weights the features output by the multi-layer bidirectional LSTM and the features output by the time-series feature extraction module to obtain the regression prediction result, which is taken as the output of the system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210424395.2A CN114662791A (en) | 2022-04-22 | 2022-04-22 | Long time sequence pm2.5 prediction method and system based on space-time attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114662791A true CN114662791A (en) | 2022-06-24 |
Family
ID=82037089
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210424395.2A Pending CN114662791A (en) | 2022-04-22 | 2022-04-22 | Long time sequence pm2.5 prediction method and system based on space-time attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114662791A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180073759A1 (en) * | 2016-09-13 | 2018-03-15 | Board Of Trustees Of Michigan State University | Intelligent Sensing System For Indoor Air Quality Analytics |
CN109214592A (en) * | 2018-10-17 | 2019-01-15 | 北京工商大学 | A kind of Air Quality Forecast method of the deep learning of multi-model fusion |
US20210018210A1 (en) * | 2019-07-16 | 2021-01-21 | Airthinx, Inc | Environment monitoring and management systems and methods |
CN113887143A (en) * | 2021-10-21 | 2022-01-04 | 重庆邮电大学 | Spatial interpolation method and device for multi-source heterogeneous air pollutants and computer equipment |
Non-Patent Citations (2)
Title |
---|
JUNYOUNG CHOI: "Air Quality Prediction with 1-Dimensional Convolution and Attention on Multi-modal Features", 《2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING》, 10 March 2021 (2021-03-10), pages 196 - 202 * |
XIAOXIA ZHANG: "An adaptive spatio-temporal neural network for PM2.5 concentration forecasting", 《ARTIFICIAL INTELLIGENCE REVIEW》, vol. 56, 31 May 2023 (2023-05-31), pages 14483 - 14510 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116936103A (en) * | 2023-09-12 | 2023-10-24 | 神州医疗科技股份有限公司 | User health prediction management method and system based on homodromous network |
CN116936103B (en) * | 2023-09-12 | 2023-12-15 | 神州医疗科技股份有限公司 | User health prediction management method and system based on homodromous network |
CN116913098A (en) * | 2023-09-14 | 2023-10-20 | 华东交通大学 | Short-time traffic flow prediction method integrating air quality and vehicle flow data |
CN116913098B (en) * | 2023-09-14 | 2023-12-22 | 华东交通大学 | Short-time traffic flow prediction method integrating air quality and vehicle flow data |
CN117609792A (en) * | 2024-01-18 | 2024-02-27 | 北京英视睿达科技股份有限公司 | Water quality prediction model training method |
CN117609792B (en) * | 2024-01-18 | 2024-06-11 | 北京英视睿达科技股份有限公司 | Water quality prediction model training method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |