CN113095550B - Air quality prediction method based on variational recursive network and self-attention mechanism - Google Patents
- Publication number: CN113095550B
- Application number: CN202110322814.7A
- Authority: CN (China)
- Prior art keywords: data, hidden layer, encoder, information, model
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses an air quality prediction method based on a variational recurrent neural network and a self-attention mechanism, which comprises the following steps. Air quality data and weather data are acquired and preprocessed to construct input data and output data. The input data of the encoder include pollutant data and historical meteorological data; the input data of the decoder include the output of the encoder, weather forecast data, and the pollutant data of the previous time step. The data are divided into training data and test data. The Seq2Seq model is trained with the training data, and the prediction results are tested with the test data. The invention predicts air quality using a Seq2Seq model. First, a self-attention mechanism is introduced at the input stage of the encoder, so that characteristic factors are selected and long-term dependencies are captured; then a VRNN replaces the RNN of the decoder, so that the complex dependencies between different time steps at the output end are further captured. This effectively reduces error accumulation and thereby improves prediction accuracy.
Description
Technical Field
The invention belongs to the technical field of data mining and is mainly used for establishing an air quality prediction model.
Background
Accurate air quality prediction not only makes the trends of air pollution easier to grasp, but also provides important guidance in fields such as urban environmental pollution control, urban construction, and public health; many scholars have therefore focused on air quality prediction research in recent decades. In recent years, deep learning methods have been widely applied to time series prediction problems, developing gradually from RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory), and GRU (Gated Recurrent Unit) to the current mainstream model, Seq2Seq (Sequence-to-Sequence). Time series prediction suits air quality prediction very well, because the task is to obtain a pollutant sequence for a future period from historical pollutant and weather information sequences; current studies therefore generally employ Seq2Seq with attention mechanisms. However, current research faces two problems. First, training Seq2Seq is too slow: when deep learning is used to predict air quality, a model is generally built for each monitoring station, and since the prediction accuracy of such a statistical model degrades over time, retraining is often needed after a period; training a large number of models simultaneously consumes a great deal of time, so training must be accelerated. Second, air quality data are spatio-temporally heterogeneous and contain a large amount of noise; the current mainstream models cannot capture the high variability of the data to be predicted, so the prediction accuracy fluctuates strongly and remains low.
Disclosure of Invention
The invention aims to solve the problem of the slow training speed of the Seq2Seq model, and introduces latent semantic variables that capture the strong dependencies between prediction time steps, so as to improve prediction accuracy.
For the slow training of Seq2Seq, the root cause is the slow training of the RNN: the calculation at each time step must wait for the previous time step to finish, so the computation cannot be parallelized. Moreover, because of the vanishing-gradient problem when processing long-range dependencies, RNN sequence encoding is only suitable for short-range dependencies. A fully connected network could establish long-range dependencies between input positions, but it cannot process variable-length sequences; therefore an attention model that dynamically generates weights replaces the fully connected layer, and position codes are added to retain the temporal ordering of the input sequence. With the self-attention mechanism, all time steps can be computed in parallel and variable-length sequences can be processed, and since self-attention captures the dependencies within the input sequence, the training speed is effectively improved. In addition, VRNN recursive prediction is applied in the decoder, as shown in fig. 1. The large fluctuation of the prediction error arises because air quality data are spatio-temporally heterogeneous, highly structured, and strongly disturbed by environmental noise; the error of the first few predicted time steps is relatively small, but because prediction is recursive, the input at each step is the prediction of the previous step, which already carries error, so the error of later time steps grows.
Replacing the decoder with a VRNN captures the latent semantic information between different time steps in the prediction stage and examines their internal relations: latent random variables are introduced into the Seq2Seq model to guide the generation of the hidden layer variables, and since the prediction output depends on the hidden layer state, the introduced latent random variables indirectly influence the generation of the prediction output. Meanwhile, to train the posterior probability model in a deep learning setting, a neural network and the reparameterization method are adopted to approximate the posterior probability. In this way, the different time steps of the prediction stage constrain one another, a robust model of complex dependencies is produced, and global context semantics are captured, which improves the performance of the Seq2Seq model and reduces the error.
The technical scheme adopted by the invention is an air quality prediction method based on a variational recursive network and a self-attention mechanism, and the method comprises the following steps:
step 1, acquiring air quality data and atmospheric data, performing preprocessing operations such as sorting and cleaning on the data, and constructing input data and output data; the input data of the encoder include pollutant data and historical meteorological data; the input data of the decoder include the output of the encoder, weather forecast data, and the pollutant data of the previous time step;
step 2, dividing the data into training data and test data;
step 3, constructing an AVAQP model, and training the AVAQP model by using training data:
1) Input the input data and the position codes into the encoder to obtain the hidden layer state of the encoder at each time step.
2) Construct the variational inference model of the latent random variable and compute the latent random variable z_τ.
3) Take the prediction result and the latent semantic variable of the previous time step as the input of the current time step and obtain the hidden layer state of the decoder VRNN.
4) Derive the context vector from the decoder hidden layer state and the encoder states.
5) Generate the predictive probability distribution from the input data at the next time step (the predicted concentration at the previous step and the weather data at the next step), the latent random information, the decoder hidden layer state, and the context information.
6) Construct the loss function and optimize it using a gradient descent algorithm.
Step 4, testing the prediction result by using the test data.
The present invention predicts air quality using the Seq2Seq model. A self-attention model replaces the RNN of the encoder, and position coding preserves the temporal order of the input sequence, which accelerates training while preserving prediction accuracy. The prediction process adopts n-step recursive prediction, which effectively reduces error accumulation and improves prediction accuracy.
Drawings
Fig. 1 is a flow chart of the AVAQP training.
Fig. 2 is an internal structural diagram of the GRU.
Fig. 3 is a schematic diagram of a single decoding time step of the AVAQP.
Detailed Description
Taking air quality prediction as an example, the following is a detailed description of the present invention with reference to the examples and the accompanying drawings.
The present invention uses a PC and requires a GPU with sufficient computing power to accelerate training. As shown in fig. 1, the air quality prediction method based on a variational recursive network and a self-attention mechanism provided by the invention comprises the following specific steps:
step 1, acquiring data and preprocessing the data to construct input and output;
the acquired data typically includes air quality data and weather data that need to be processed into an input sequence and an output sequence, typically the input sequence includes contaminant data and weather data over a period of time. Let D= { X, Y } beThe data set after processing. Where X is the input sequence, i.e., historical data, including contaminant data and weather data. For each input sequence x εR S×Q The length of the device is S, namely historical data of the past S hours, and the device has Q characteristics, namely pollutant data such as PM2.5, carbon monoxide, sulfur dioxide and the like and weather data such as temperature, humidity and the like. For each target sequence y εR T And has a length of T, i.e., pollutant data for a future T hours. In practice, y may contain multiple targets, such as e.g. PM2.5, carbon monoxide, sulphur dioxide, etc. as predicted by time.
And 2, dividing the data into training data and test data.
The samples obtained in step 1 are divided into training data and test data; the training data are used to train the model, and the test data are used to test the performance of the model.
And 3, training the AVAQP model by using training data.
1) And inputting the input data and the position codes into the encoder to obtain the hidden layer state of the encoder at each moment.
Input the input data and the position codes into the encoder to obtain the hidden layer state of the encoder at each moment. A linear transformation of the input data yields three groups of vector sequences Q, K, V; the query vector sequence, key vector sequence, and value vector sequence in the self-attention mechanism are calculated as follows:

Q = W_Q(X + PE)
K = W_K(X + PE)
V = W_V(X + PE)

wherein W_Q, W_K, W_V are parameter matrices, and PE is a position coding matrix with the same dimensions as the input data; position codes are added to supplement the sequence position information; each row corresponds to an input sequence position.
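The projection step above can be sketched as follows. The sinusoidal position code is an assumed concrete choice (the text only requires PE to have the same dimensions as the input), and the names `pos_encoding` and `project`, as well as the identity weight matrices in the demo, are illustrative.

```python
import math

# Sketch of Q/K/V projection with additive position codes, Q = W_Q(X + PE).
# Sinusoidal PE is an assumed choice; any PE matching the input shape fits the text.

def pos_encoding(S, d):
    pe = [[0.0] * d for _ in range(S)]
    for p in range(S):
        for i in range(d):
            angle = p / (10000 ** (2 * (i // 2) / d))
            pe[p][i] = math.sin(angle) if i % 2 == 0 else math.cos(angle)
    return pe

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def project(X, W_Q, W_K, W_V):
    S, d = len(X), len(X[0])
    PE = pos_encoding(S, d)
    Xp = [[x + p for x, p in zip(rx, rp)] for rx, rp in zip(X, PE)]  # X + PE
    return matmul(Xp, W_Q), matmul(Xp, W_K), matmul(Xp, W_V)

# Demo with identity weights, so Q = K = V = X + PE.
X = [[1.0, 2.0], [3.0, 4.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
Q, K, V = project(X, I2, I2, I2)
```

In practice W_Q, W_K, W_V would be learned matrices rather than the identity; the identity simply makes the positional offset visible in the output.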
Input the transformed vector sequences into the encoder to obtain the hidden layer state of the encoder at each moment. The hidden layer state of the encoder is calculated as:

h_j = tanh( Σ_{i=1}^{N} α_ij v_i )

wherein h_j is the hidden layer state, i, j ∈ [1, N] are the positions of the current time step and of the other positions in the sequence respectively, and the connection weights α_ij are dynamically generated by the attention mechanism. Note also that the activation function here is tanh, chosen to be consistent with the activation function of the decoder; it is defined as:

tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x})

The attention scoring function uses a scaled dot product, which can be written as:

α_ij = softmax( q_j · k_i / √d_s )

wherein d_s is a manually set hyperparameter whose purpose is to make the gradient more stable.
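The encoder step above can be sketched compactly: attention weights from scaled dot products of queries and keys, then a tanh of the weighted sum of value vectors. The function name `self_attention` and the toy 2-dimensional inputs are illustrative.

```python
import math

# Sketch of one self-attention encoder pass: alpha_ij = softmax(q_j . k_i / sqrt(d_s)),
# h_j = tanh(sum_i alpha_ij * v_i). All positions are computed independently,
# which is what allows the parallelism discussed in the text.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(Q, K, V, d_s):
    H = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_s) for k in K]
        alpha = softmax(scores)  # dynamically generated connection weights
        ctx = [sum(a * v[i] for a, v in zip(alpha, V)) for i in range(len(V[0]))]
        H.append([math.tanh(c) for c in ctx])  # tanh, matching the decoder
    return H

# Demo on two orthogonal positions.
Q = K = V = [[1.0, 0.0], [0.0, 1.0]]
H = self_attention(Q, K, V, d_s=2)
```

Because each position attends to all positions, variable-length sequences are handled naturally: the loop bounds adapt to the length of K and V.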
2) Construct the variational inference model of the latent random variable and compute the latent random variable z_τ. The key to the VRNN is modeling the distributions associated with the latent random variables. The posterior probability and the prior probability are each fitted with a neural network, where the posterior probability model can be expressed as q(z_τ | x_{≤τ}, z_{<τ}) = N(μ_τ, σ_τ²). The mean and variance are computed as:

μ_τ = f_μ(h_{zτ}),  σ_τ = f_σ(h_{zτ})

wherein h_{zτ} is the semantic space of the latent random variables, estimated by a nonlinear fitting method, and f_μ, f_σ are neural networks. The prior probability model has the same form as the posterior probability model, but note that their parameters are not shared. z_τ is computed by the reparameterization:

z_τ = μ_τ + σ_τ ⊙ ε

wherein ε ~ N(0, I) is the introduced noise, which makes z_τ stochastic at each time step, further improving the prediction robustness.
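The reparameterization step above can be sketched in a few lines: sampling z_τ as μ_τ + σ_τ ⊙ ε with ε drawn from a standard normal keeps the sample differentiable with respect to μ and σ during gradient-based training. The name `sample_z` is illustrative.

```python
import random

# Sketch of the reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
# The randomness lives entirely in eps, so gradients flow through mu and sigma.
def sample_z(mu, sigma, rng):
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

z = sample_z([1.0, 2.0], [0.0, 0.0], random.Random(0))  # sigma = 0 -> z equals mu
```

With σ = 0 the sample collapses to the mean, which makes the deterministic limit of the model easy to check.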
3) Take the prediction result and the latent semantic variable of the previous time step as the input of the current time step and obtain the hidden layer state of the decoder VRNN. The decoder adopts a gated recurrent unit (GRU), whose hidden state at each moment serves as its output. First compute the value of the update gate in the GRU, which controls the information entering the current unit; the update gate for time step τ+1 is:

u_{τ+1} = σ(W_u h_τ + U_u x_{τ+1} + C_u c_τ + V_u z_τ + b_u)

wherein u_{τ+1} is the update gate; W_u, U_u, C_u, V_u and b_u respectively represent the weights and bias of the update gate; h_τ is the hidden layer state of the GRU at the previous moment, i.e., the features obtained after GRU processing at the previous step; x_{τ+1} represents the input data at the current moment, which may be y_τ, the prediction result of the previous time step; when weather forecast data are available they can be input together, namely x_{τ+1} = [y_τ, wf_τ], wherein wf_τ is the weather forecast data required by the current time step; c_τ is the context variable calculated at the current moment. Notably, z_τ has an important influence on the representation of the decoder hidden layer state and captures the relationships between the prediction outputs of adjacent time steps. σ represents the logistic function, defined as:

σ(x) = 1 / (1 + e^{−x})

Then compute the value of the reset gate, which selectively forgets previous information; for example, if the wind is blowing at the current moment, the information that it was previously calm can be forgotten. The meaning and calculation of the reset gate parameters are analogous to those of the update gate:

r_{τ+1} = σ(W_r h_τ + U_r x_{τ+1} + C_r c_τ + V_r z_τ + b_r)

Next compute the candidate output h̃_{τ+1}, which represents the new information obtained by fusing the information of the previous step with the current information:

h̃_{τ+1} = tanh(W_h (r_{τ+1} ⊙ h_τ) + U_h x_{τ+1} + C_h c_τ + V_h z_τ + b_h)

Here the reset gate controls how much of the information obtained in the previous step is forgotten; since the range of the logistic function is (0, 1), the range of the reset gate is also (0, 1). When the value of the reset gate is close to 0, the information of the previous step is almost completely forgotten, achieving the reset effect; when it is close to 1, the information of the previous step is almost completely retained. Finally compute the GRU hidden layer state:

h_{τ+1} = (1 − u_{τ+1}) ⊙ h_τ + u_{τ+1} ⊙ h̃_{τ+1}

The update gate controls the proportion of the new information versus the information of the previous step: when the update gate is close to 1, the proportion of new information approaches 100%; when it is close to 0, the proportion of the information of the previous step approaches 100%.
4) Derive the context vector from the decoder hidden layer state and the encoder states. The attention vector determines the importance of each moment of the encoding result, which is measured by the similarity of the decoder hidden layer state and the encoder hidden layer states. The importance of each moment of the encoding result can therefore be calculated by:

e_{τi} = h_τ · h_i^{enc}

After normalizing the result with the softmax function, the attention vector can be obtained:

a_{τi} = exp(e_{τi}) / Σ_k exp(e_{τk})

The greater the value of a_{τi}, the greater the impact of encoder moment i on the current decoding moment. Using a_τ, a weighted average of the encoding results is calculated to obtain the context c_τ, which represents the features of past pollution and meteorological data useful for the prediction at the current moment:

c_τ = Σ_i a_{τi} h_i^{enc}

Finally, the prediction result can be obtained by the following formula:

y_τ = W_p [h_τ, c_τ, z_τ] + b_p
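The decoder attention step above can be sketched as follows: dot-product similarity scores between the decoder state and each encoder state, softmax normalization into the attention vector, and the weighted average as the context. The name `context_vector` and the 2-dimensional toy states are illustrative.

```python
import math

# Sketch of decoder-side attention: score each encoder hidden state by its
# dot-product similarity with the decoder state, softmax-normalize into the
# attention vector, and return the attention-weighted average (the context).

def context_vector(h_dec, enc_states):
    scores = [sum(a * b for a, b in zip(h_dec, h_e)) for h_e in enc_states]
    m = max(scores)
    es = [math.exp(s - m) for s in scores]
    total = sum(es)
    attn = [e / total for e in es]
    d = len(enc_states[0])
    ctx = [sum(a * h[i] for a, h in zip(attn, enc_states)) for i in range(d)]
    return ctx, attn

# Demo: the decoder state aligns with the first encoder state, so the first
# attention weight dominates.
c, attn = context_vector([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

The context would then be concatenated with h_τ and z_τ for the linear output layer described in the text.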
5) Generate the predictive probability distribution using the input data at the next moment (the predicted concentration at the previous moment and the weather data at the next moment), the latent random information, the decoder hidden layer state, and the context information, defined as:

p(y_τ | X, y_{<τ}, z_τ) = exp{ g(W_d [y_{τ−1}; h_τ; c_τ; z_τ] + b_d) }

where g is the activation function.
6) Construct the loss function and optimize it using a gradient descent algorithm. As is usual for deep learning model training, mini-batch gradient descent is adopted, and because the loss involves a probabilistic expectation, the expectation is approximated by the Monte Carlo method. For a mini-batch of data, the loss function is calculated as the negative of the variational lower bound:

Loss = −(1/L) Σ_{l=1}^{L} Σ_{τ=1}^{T} [ E_{q(z_τ|·)} log p(y_τ^{(l)} | X^{(l)}, y_{<τ}^{(l)}, z_τ) − KL( q(z_τ|·) ‖ p(z_τ|·) ) ]

where L is the number of samples in the mini-batch, q is the posterior model, and p(z_τ|·) is the prior model. The parameters in the model are ultimately adjusted with a gradient descent algorithm to minimize the loss function, while the gradient used for gradient descent is calculated using a backpropagation algorithm or an automatic differentiation tool.
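The KL term between two diagonal Gaussians (posterior and prior) has a closed form, sketched below. This is a standard identity, not taken verbatim from the patent; the name `kl_diag_gauss` is illustrative.

```python
import math

# Closed-form KL divergence KL(N(mu_q, sig_q^2) || N(mu_p, sig_p^2)) for
# diagonal Gaussians, summed over dimensions. Used as the regularization
# term of a variational loss alongside the Monte Carlo reconstruction term.

def kl_diag_gauss(mu_q, sig_q, mu_p, sig_p):
    return sum(
        math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
        for mq, sq, mp, sp in zip(mu_q, sig_q, mu_p, sig_p)
    )
```

The KL term vanishes when posterior and prior coincide and grows as the posterior mean drifts from the prior, which is what penalizes over-informative latents during training.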
Step 4, testing the prediction result by using the test data
Input the test data into the AVAQP model to obtain the prediction sequence of each sample; if the test result is not ideal, adjust the parameters of the neural network to obtain a better result.
The above embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, the scope of which is defined by the claims. Various modifications and equivalent arrangements of this invention will occur to those skilled in the art, and are intended to be within the spirit and scope of the invention.
Claims (4)
1. An air quality prediction method based on a self-attention mechanism and a variational recursive network, characterized in that the method comprises the following steps:
step 1, acquiring air quality data and atmospheric data, performing sorting and cleaning preprocessing on the data, and constructing input data and output data; the input data of the encoder include pollutant data and historical meteorological data; the input data of the decoder include the output of the encoder, weather forecast data, and the pollutant data of the previous time step;
step 2, dividing the data into training data and test data;
step 3, constructing an AVAQP model, and training the AVAQP model by using training data:
1) Inputting the input data and the position codes into an encoder to obtain the hidden layer state of the encoder at each moment;
2) Constructing a variational inference model of the latent random variable, and calculating the latent random variable z_τ;
3) Taking the prediction result and the latent semantic variable obtained in the last time step as the input of the current time step, and obtaining the hidden layer state of the decoder VRNN;
4) Obtaining a context vector using the decoder hidden layer state and the encoder state;
5) Generating a predictive probability distribution using input data at a next time, including a predicted concentration at a previous time and weather data at a next time, potentially random information, decoder hidden layer state, and context information;
6) Constructing a loss function and optimizing using a gradient descent algorithm;
Step 4, testing the prediction result by using the test data;
in step 3, constructing an AVAQP model, and training the AVAQP model by using training data;
1) Inputting the input data and the position codes into an encoder to obtain the hidden layer state of the encoder at each moment; performing linear transformation on the input data to obtain three groups of vector sequences Q, K, V; the query vector sequence, key vector sequence and value vector sequence in the self-attention mechanism are calculated as follows:
Q=W Q (X+PE)
K=W K (X+PE)
V=W V (X+PE)
wherein W_Q, W_K, W_V are parameter matrices, and PE is a position coding matrix with the same dimensions as the input data; position codes are added to supplement sequence position information; each row corresponds to an input sequence;
inputting the converted vector sequences into the encoder to obtain the hidden layer state of the encoder at each moment; the hidden layer state of the encoder is calculated as:

h_j = tanh( Σ_{i=1}^{N} α_ij v_i )

wherein h_j is the hidden layer state, i, j ∈ [1, N] are the positions of the current time step and of the other positions in the sequence respectively, and the connection weights α_ij are dynamically generated by the attention mechanism; note also that the activation function here is tanh, chosen to be consistent with the activation function of the decoder, defined as:

tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x})

the attention scoring function uses a scaled dot product, written as:

α_ij = softmax( q_j · k_i / √d_s )

wherein d_s is an artificially set hyperparameter, whose purpose is to make the gradient more stable;
2) Constructing a variational inference model of the latent random variable, and calculating the latent random variable z_τ; the key to the VRNN is modeling the distributions associated with the latent random variables; the posterior probability and the prior probability are respectively fitted by two neural networks, wherein the posterior probability model is expressed as q(z_τ | x_{≤τ}, z_{<τ}) = N(μ_τ, σ_τ²), with mean and variance calculated as:

μ_τ = f_μ(h_{zτ}),  σ_τ = f_σ(h_{zτ})

wherein h_{zτ} is the semantic space of the latent random variables, estimated by a nonlinear fitting method, and f_μ, f_σ are neural networks; the prior probability model has the same form as the posterior probability model, but the parameters between them are not shared; z_τ is calculated by:

z_τ = μ_τ + σ_τ ⊙ ε

wherein ε ~ N(0, I) is the introduced noise, which makes z_τ stochastic at each time step, further improving the prediction robustness;
3) Taking the prediction result and the latent semantic variable obtained in the last time step as the input of the current time step, and obtaining the hidden layer state of the decoder VRNN; the decoder adopts a gate control circulation unit GRU, and the output of each moment of the GRU is output; firstly, calculating the value of an update gate in the GRU, and updating the information of the gate control entering the current unit;
the update gate calculation formula for the τ+1th time step is:
wherein u is τ Is an update door, W u 、U u 、C u 、V u And b u Respectively represent the weight and bias of the update gate, h τ The hidden layer state of the GRU at the previous moment is the characteristic obtained after GRU processing at the previous moment, and x is the value of the hidden layer τ+1 Input data y representing the current time τ I.e. the result of the prediction of the last time step; the weather forecast data are input together under the condition of weather forecast, namely [ y ] τ ,wf τ ]Wherein wf τ Is weather forecast data required by the current time step; c τ Is the context variable calculated at the current moment; it is to be noted that,the method has important influence on the representation of the hidden layer state of the decoder, and can capture the characteristics between the prediction outputs of adjacent time steps; sigma represents a logistic function, which is defined as follows:
Next, the value of the reset gate is computed. The reset gate selectively forgets previous information; for example, if the wind is blowing at the current moment, the earlier information that there was no wind can be forgotten. The meaning and calculation of the reset gate parameters are the same as those of the update gate:

r_{τ+1} = σ(W_r x_{τ+1} + U_r h_τ + C_r c_τ + V_r z_τ + b_r)
Next, the candidate output h̃_{τ+1} is computed; it represents the new information obtained by fusing the information of the previous step with the current information:

h̃_{τ+1} = tanh(W_h x_{τ+1} + U_h (r_{τ+1} ⊙ h_τ) + C_h c_τ + V_h z_τ + b_h)
Here, the reset gate controls how much of the information from the previous step is forgotten. Since the range of the logistic function is (0, 1), the range of the reset gate is also (0, 1): when the reset gate takes the value 0, the information of the previous step is completely forgotten, achieving the reset effect; when it takes the value 1, the information of the previous step is almost entirely retained. Finally, the hidden layer state of the GRU is computed as:

h_{τ+1} = u_{τ+1} ⊙ h̃_{τ+1} + (1 − u_{τ+1}) ⊙ h_τ

The update gate controls the proportion between the new information and the information of the previous step: when the update gate takes the value 1, the new information accounts for 100%; when it takes the value 0, the information of the previous step accounts for 100%.
4) Obtaining a context vector using the decoder hidden layer state and the encoder states. The attention vector determines the importance of each moment of the encoding result, where importance is measured by the similarity between the decoder hidden layer state and the encoder hidden layer states. The importance of the i-th moment of the encoding result is therefore computed as:

e_{τ,i} = score(h_τ, h̄_i)

where h̄_i is the encoder hidden state at moment i. After normalizing the scores, the attention vector is obtained:

a_{τ,i} = exp(e_{τ,i}) / Σ_j exp(e_{τ,j})
a τ the greater the value, the greater the impact it has on the current decoding moment; use a τ Calculating a weighted average for the encoding result to obtain context c τ It represents a feature of past contamination and meteorological data useful for current time prediction; finally, the prediction result can be obtained by the following formula:
5) Generating the predictive probability distribution using the input data at the next time step (including the predicted concentration of the previous step and the weather data of the next step), the latent random information, the decoder hidden layer state, and the context information, defined as:

y_{τ+1} = g(W_y [x_{τ+1}; z_τ; h_{τ+1}; c_τ] + b_y)

where g is an activation function and W_y, b_y are the output-layer weights and bias.
6) Constructing a loss function and optimizing it with a gradient descent algorithm. As is usual for deep learning model training, mini-batch gradient descent is adopted; because the loss involves a probabilistic expectation, the expectation is approximated by the Monte Carlo method. For a mini-batch of data, the loss function is therefore computed as:

Loss = (1/L) Σ_{l=1}^{L} Σ_τ [ KL(q(z_τ | ·) ‖ p(z_τ | ·)) − log p(y_τ | ·) ]

where L is the number of samples in the mini-batch. Finally, the gradient descent algorithm adjusts the model parameters to minimize the loss function, and the gradients used for gradient descent are computed with a back-propagation algorithm or an automatic differentiation tool.
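A one-sample Monte Carlo estimate of such a loss can be sketched for diagonal Gaussians: the KL term has a closed form and the reconstruction term is a Gaussian negative log-likelihood. This decomposition is the standard one for variational models and is an assumption about the patent's exact loss:

```python
import numpy as np

def kl_diag_gaussians(mu_q, sig_q, mu_p, sig_p):
    # closed-form KL(q || p) for diagonal Gaussians
    return np.sum(np.log(sig_p / sig_q)
                  + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2) - 0.5)

def gaussian_nll(y, mu, sig):
    # negative log-likelihood of y under N(mu, sig^2)
    return np.sum(0.5 * np.log(2 * np.pi * sig**2) + (y - mu)**2 / (2 * sig**2))

def batch_loss(samples):
    # each sample: (y, mu_y, sig_y, mu_q, sig_q, mu_p, sig_p)
    total = 0.0
    for y, mu_y, sig_y, mu_q, sig_q, mu_p, sig_p in samples:
        total += gaussian_nll(y, mu_y, sig_y) + kl_diag_gaussians(mu_q, sig_q, mu_p, sig_p)
    return total / len(samples)

# toy batch of one sample whose prediction matches the target exactly,
# with the posterior equal to the prior (so the KL term vanishes)
y = np.zeros(2)
loss = batch_loss([(y, np.zeros(2), np.ones(2),
                    np.zeros(3), np.ones(3), np.zeros(3), np.ones(3))])
```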
2. An air quality prediction method based on a self-attention mechanism and a variational recursive network according to claim 1, wherein: the implementation process of step 1 is as follows,
the atmospheric data crawled through python comprises atmospheric pollutant data and weather data, and is preprocessed, wherein the preprocessing comprises the steps of deleting repeated values, filling the missing values, and then carrying out normalization processing to divide the input sequence and the output sequence; the input data includes contaminant data and weather data for 72 hours of history; let d= { X, Y } be the dataset after processing; wherein X is an input sequence, i.e., historical data, including contaminant data and weather data; for each input sequence x εR S×Q The length of the sensor is S, namely historical data of the past S hours, and the sensor has Q characteristics, namely PM2.5, carbon monoxide, sulfur dioxide pollutant data and temperature and humidity weather data; for each target sequence y εR T The length of the sample is T, namely pollutant data of the future T hours; y contains multiple targets.
3. An air quality prediction method based on a self-attention mechanism and a variational recursive network according to claim 1, wherein: in step 3, the samples obtained in step 2 are divided into training data and test data, the training data being used to train the model and the test data being used to test the effect of the model.
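For time-series data this split is usually chronological, so that no future information leaks into training; a minimal sketch (the 80/20 ratio is an assumption, not from the patent):

```python
import numpy as np

def chronological_split(X, Y, train_frac=0.8):
    # earlier windows train the model, later windows test it
    n = int(len(X) * train_frac)
    return (X[:n], Y[:n]), (X[n:], Y[n:])

X = np.arange(100).reshape(100, 1)
Y = np.arange(100)
(train_X, train_Y), (test_X, test_Y) = chronological_split(X, Y)
```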
4. An air quality prediction method based on a self-attention mechanism and a variational recursive network according to claim 1, wherein: the implementation process of step 4 is as follows,
The test data are input into the AVAQP model to obtain the prediction sequence of each sample; if the test result is not ideal, the parameters of the neural network are adjusted to obtain a better result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110322814.7A CN113095550B (en) | 2021-03-26 | 2021-03-26 | Air quality prediction method based on variational recursive network and self-attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113095550A CN113095550A (en) | 2021-07-09 |
CN113095550B true CN113095550B (en) | 2023-12-08 |
Family
ID=76669979
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110322814.7A Active CN113095550B (en) | 2021-03-26 | 2021-03-26 | Air quality prediction method based on variational recursive network and self-attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113095550B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113743648B (en) * | 2021-07-30 | 2022-07-15 | 中科三清科技有限公司 | Air quality ensemble forecasting method, device, equipment and readable storage medium |
CN113762351B (en) * | 2021-08-12 | 2023-12-05 | 吉林大学 | Air quality prediction method based on deep transition network |
CN113657122B (en) * | 2021-09-07 | 2023-12-15 | 内蒙古工业大学 | Mongolian machine translation method of pseudo parallel corpus integrating transfer learning |
CN114492974A (en) * | 2022-01-18 | 2022-05-13 | 国网浙江省电力有限公司电力科学研究院 | GIS gas state prediction method and system |
CN114403486B (en) * | 2022-02-17 | 2022-11-22 | 四川大学 | Intelligent control method of airflow type cut-tobacco drier based on local peak value coding circulation network |
CN114611792B (en) * | 2022-03-11 | 2023-05-02 | 南通大学 | Atmospheric ozone concentration prediction method based on mixed CNN-converter model |
CN117111646B (en) * | 2023-09-10 | 2024-05-24 | 福建天甫电子材料有限公司 | Etching solution concentration automatic control system |
CN117316334B (en) * | 2023-11-30 | 2024-03-12 | 南京邮电大学 | Water plant coagulant dosage prediction method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197736A (en) * | 2017-12-29 | 2018-06-22 | 北京工业大学 | A kind of Air Quality Forecast method based on variation self-encoding encoder and extreme learning machine |
CN109142171A (en) * | 2018-06-15 | 2019-01-04 | 上海师范大学 | The city PM10 concentration prediction method of fused neural network based on feature expansion |
CN110070224A (en) * | 2019-04-20 | 2019-07-30 | 北京工业大学 | A kind of Air Quality Forecast method based on multi-step recursive prediction |
Non-Patent Citations (1)
Title |
---|
A Sequence-to-Sequence Air Quality Predictor Based on the n-Step Recurrent Prediction; Bo Liu et al.; IEEE Access; pp. 43331-43343 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113095550B (en) | Air quality prediction method based on variational recursive network and self-attention mechanism | |
CN108197736B (en) | Air quality prediction method based on variational self-encoder and extreme learning machine | |
Zhou et al. | A model for real-time failure prognosis based on hidden Markov model and belief rule base | |
Fan et al. | A novel machine learning method based approach for Li-ion battery prognostic and health management | |
CN110070224A (en) | A kind of Air Quality Forecast method based on multi-step recursive prediction | |
CN113065703A (en) | Time series prediction method combining multiple models | |
CN110987436B (en) | Bearing fault diagnosis method based on excitation mechanism | |
CN114218872B (en) | DBN-LSTM semi-supervised joint model-based residual service life prediction method | |
CN112949894B (en) | Output water BOD prediction method based on simplified long-short-term memory neural network | |
CN115542429A (en) | XGboost-based ozone quality prediction method and system | |
CN116187835A (en) | Data-driven-based method and system for estimating theoretical line loss interval of transformer area | |
CN115018191A (en) | Carbon emission prediction method based on small sample data | |
CN116757057A (en) | Air quality prediction method based on PSO-GA-LSTM model | |
Osman et al. | Soft Sensor Modeling of Key Effluent Parameters in Wastewater Treatment Process Based on SAE‐NN | |
CN117114184A (en) | Urban carbon emission influence factor feature extraction and medium-long-term prediction method and device | |
CN113762591A (en) | Short-term electric quantity prediction method and system based on GRU and multi-core SVM counterstudy | |
CN113537539B (en) | Multi-time-step heat and gas consumption prediction model based on attention mechanism | |
CN117521511A (en) | Granary temperature prediction method based on improved wolf algorithm for optimizing LSTM | |
CN116861256A (en) | Furnace temperature prediction method, system, equipment and medium for solid waste incineration process | |
CN110648023A (en) | Method for establishing data prediction model based on quadratic exponential smoothing improved GM (1,1) | |
CN116307115A (en) | Secondary water supply water consumption prediction method based on improved transducer model | |
CN115062528A (en) | Prediction method for industrial process time sequence data | |
Kang et al. | Research on forecasting method for effluent ammonia nitrogen concentration based on GRA-TCN | |
Liu et al. | A water quality prediction method based on long short-term memory neural network optimized by Cuckoo search algorithm | |
CN115879569B (en) | Online learning method and system for IoT observation data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||