CN113095550B - Air quality prediction method based on variational recursive network and self-attention mechanism - Google Patents


Info

Publication number: CN113095550B (granted publication of application CN113095550A)
Application number: CN202110322814.7A
Authority: CN (China)
Legal status: Active (granted; the listed status is an assumption, not a legal conclusion)
Inventors: 刘博 (Liu Bo), 李依楠 (Li Yinan)
Original and current assignee: Beijing University of Technology
Application filed by Beijing University of Technology; priority to CN202110322814.7A


Classifications

    • G06Q 10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06F 17/18 — Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06F 30/27 — Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM]
    • G06N 3/045 — Combinations of networks
    • G06N 3/084 — Backpropagation, e.g. using gradient descent
    • G06Q 50/26 — Government or public services
    • Y02A 90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention discloses an air quality prediction method based on a variational recurrent neural network and a self-attention mechanism, which comprises the following steps. Air quality data and weather data are acquired and preprocessed to construct input data and output data. The input data of the encoder include pollutant data and historical meteorological data. The input data of the decoder include the output result of the encoder, weather forecast data and the pollutant data of the previous moment. The data are partitioned into training data and test data. The Seq2Seq model is trained using the training data, and the prediction results are tested using the test data. The present invention predicts air quality using the Seq2Seq model. First, a self-attention mechanism is introduced at the input stage of the encoder, so that characteristic factors are selected and long-term dependency relations are captured; then a VRNN replaces the RNN of the decoder in the model, so that complex dependency relations between different time steps at the output end are further captured and error accumulation is effectively reduced, thereby improving prediction accuracy.

Description

Air quality prediction method based on variational recursive network and self-attention mechanism
Technical Field
The invention belongs to the technical field of data mining, and is mainly used for establishing an air quality prediction model.
Background
Accurate air quality prediction not only makes the trend of air pollution easier to grasp, but also has important guiding significance for urban environmental pollution treatment, urban construction, public health and other fields, and many scholars have focused on air quality prediction research in recent decades. In recent years, deep learning methods have been widely applied to time-series prediction problems, developing gradually from RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) to the current mainstream Seq2Seq (Sequence-to-Sequence) model. Time-series prediction is well suited to air quality prediction, because the task is to obtain a pollutant sequence for a future period from historical pollutant and weather information sequences. Current studies generally employ Seq2Seq together with attention mechanisms. However, current research faces two problems. First, the training of Seq2Seq is too slow: a model is generally built for each monitoring station when deep learning is used to predict air quality, the prediction accuracy of such a statistical model degrades over time so that retraining is frequently required, and training a large number of models simultaneously consumes a large amount of time; training therefore needs to be accelerated. Second, air quality data are spatio-temporally heterogeneous and contain a large amount of noise, so the current mainstream models cannot model the high variability of the predicted data; as a result the prediction accuracy fluctuates greatly and remains low.
Disclosure of Invention
The invention aims to solve the problem of the slow training speed of the Seq2Seq model, and introduces latent semantic variables to capture the strong dependency relations among prediction time steps, thereby improving prediction accuracy.
For the problem of slow Seq2Seq training, the root cause is that RNN training is slow: the computation of each RNN time step must wait for the previous time step to finish, so it cannot be parallelized. Moreover, due to the vanishing gradient problem when processing long-distance dependency relations, the sequence encoding of the RNN is only suitable for short-distance dependencies. A fully-connected network could establish long-distance dependency relations between input sequences, but it cannot process variable-length sequences; therefore an attention model that dynamically generates weights is used to replace the fully-connected layer, and position codes are added to retain the temporal information of the input sequence. After each time step uses the self-attention mechanism, all time steps can complete their computation in parallel and variable-length sequences can be processed; since the self-attention mechanism captures the dependency relations of the input sequence, the training speed is effectively improved. In addition, VRNN recursive prediction is applied to the decoder, as shown in fig. 1. The reason for the large fluctuation of the prediction error is that air quality data are spatio-temporally heterogeneous, highly structured, and subject to extreme fluctuation due to environmental noise. The error of predicting the first few time steps is relatively small, but because the prediction is recursive, the input of each later time step is the prediction result of the previous step, which already carries error, so the prediction error of the current time step grows as later time steps are reached.
Replacing the decoder with a VRNN makes it possible to capture latent semantic information among different time steps in the prediction stage and to examine the internal relations of the different time steps. Latent random variables are introduced into the Seq2Seq model to guide the generation of the hidden layer variables; since the prediction output depends on the hidden layer state, the introduced latent random variables indirectly influence the generation of the prediction output. Meanwhile, in order to train the posterior probability model in a deep learning environment, a neural network and the reparameterization method are adopted to approximate the posterior probability. Therefore, different time steps in the prediction stage can mutually constrain each other to produce a robust model of complex dependency relations and to capture global context semantics, thereby improving the performance of the Seq2Seq model and reducing the error.
The technical scheme adopted by the invention is an air quality prediction method based on a variation recursive network and a self-attention mechanism, and the method comprises the following steps:
step 1, acquire air quality data and atmospheric data, perform preprocessing operations such as arranging and cleaning on the data, and construct input data and output data; the input data of the encoder include pollutant data and historical meteorological data; the input data of the decoder include the output result of the encoder, weather forecast data and the pollutant data at the last moment;
step 2, dividing the data into training data and test data;
step 3, constructing an AVAQP model, and training the AVAQP model by using training data:
1) Input the input data and the position codes into the encoder to obtain the hidden layer state of the encoder at each moment.
2) Construct a variational inference model of the latent random variable, and calculate the latent random variable z_τ.
3) Take the prediction result and the latent semantic variable obtained in the last time step as the input of the current time step, and obtain the hidden layer state of the decoder VRNN.
4) Derive the context vector using the decoder hidden layer state and the encoder states.
5) Generate a predictive probability distribution using the input data at the next moment (the predicted concentration at the previous moment and the weather data at the next moment), the latent random information, the decoder hidden layer state, and the context information.
6) Construct a loss function and optimize it using a gradient descent algorithm.
Step 4, test the prediction result using the test data.
The present invention predicts air quality using the Seq2Seq model. A self-attention model replaces the RNN of the encoder, and position coding preserves the temporal relations of the input sequence, which accelerates training while maintaining prediction accuracy. The prediction process adopts n-step recursive prediction, which effectively reduces error accumulation and improves prediction accuracy.
Drawings
Fig. 1 is a flow chart of the AVAQP training.
Fig. 2 is an internal structural diagram of the GRU.
Fig. 3 is a schematic diagram of a single decoding time step of the AVAQP.
Detailed Description
Taking air quality prediction as an example, the following is a detailed description of the present invention with reference to the examples and the accompanying drawings.
The present invention uses a PC and requires a GPU with sufficient computing power to accelerate training. As shown in fig. 1, the air quality prediction method based on the variational recursive network and self-attention mechanism provided by the invention comprises the following specific steps:
step 1, acquiring data and preprocessing the data to construct input and output;
the acquired data typically includes air quality data and weather data that need to be processed into an input sequence and an output sequence, typically the input sequence includes contaminant data and weather data over a period of time. Let D= { X, Y } beThe data set after processing. Where X is the input sequence, i.e., historical data, including contaminant data and weather data. For each input sequence x εR S×Q The length of the device is S, namely historical data of the past S hours, and the device has Q characteristics, namely pollutant data such as PM2.5, carbon monoxide, sulfur dioxide and the like and weather data such as temperature, humidity and the like. For each target sequence y εR T And has a length of T, i.e., pollutant data for a future T hours. In practice, y may contain multiple targets, such as e.g. PM2.5, carbon monoxide, sulphur dioxide, etc. as predicted by time.
Step 2, divide the data into training data and test data.
The samples obtained in step 1 are divided into training data and test data; the training data are used to train the model, and the test data are used to evaluate its performance.
Step 3, train the AVAQP model using the training data.
1) Input the input data and the position codes into the encoder to obtain the hidden layer state of the encoder at each moment. A linear transformation is performed on the input data to obtain three groups of vector sequences Q, K, V, i.e., the query vector sequence, key vector sequence and value vector sequence in the self-attention mechanism, calculated as follows:
Q = W_Q(X + PE)
K = W_K(X + PE)
V = W_V(X + PE)
where W_Q, W_K and W_V are linear-transformation parameter matrices, and PE is a position-encoding matrix with the same dimensions as the input data; position codes are added to supplement sequence position information; each row corresponds to one input position.
The converted vector sequences are input into the encoder to obtain the hidden layer state of the encoder at each moment. The hidden layer state of the encoder is calculated as follows:

h_j = tanh( Σ_{i=1}^{N} α_{ij} v_i )

where h_j is the hidden layer state, and i, j ∈ [1, N] are the positions of the current time step and of the other sequence elements, respectively. The connection weight α_{ij} is dynamically generated by the attention mechanism. Note also that the activation function here is tanh, chosen to be consistent with the activation function of the decoder, defined as:

tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x})
the attention scoring function uses a scaled dot product, which can be written as:
wherein d is s Is a manually set super parameter, in order to make the gradient more stable.
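The encoder computation above (linear maps of X + PE, scaled dot-product scores, and a tanh-activated weighted sum) can be sketched in numpy. This is a hedged illustration: the sinusoidal form of the position encoding and the toy dimensions are assumptions of this sketch, not values fixed by the patent.

```python
import numpy as np

def positional_encoding(S, Q):
    # sinusoidal position codes with the same shape as the input sequence
    pos = np.arange(S)[:, None]
    i = np.arange(Q)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / Q)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def self_attention(X, Wq, Wk, Wv, d_s):
    Xp = X + positional_encoding(*X.shape)      # add position codes to X
    Q, K, V = Xp @ Wq, Xp @ Wk, Xp @ Wv         # linear maps of (X + PE)
    scores = Q @ K.T / np.sqrt(d_s)             # scaled dot-product scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax -> alpha_ij
    return np.tanh(weights @ V)                 # tanh matches the decoder

rng = np.random.default_rng(1)
S, Qf, d = 8, 4, 16                              # toy sequence / feature / model sizes
X = rng.random((S, Qf))
Wq, Wk, Wv = (rng.random((Qf, d)) for _ in range(3))
H = self_attention(X, Wq, Wk, Wv, d_s=d)
print(H.shape)  # (8, 16)
```

Because every output position attends to all input positions at once, all S hidden states are computed in a single matrix product, which is the parallelism argument made above.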
2) Construct a variational inference model of the latent random variable, and calculate the latent random variable z_τ. The key to the VRNN is modeling the distributions associated with the latent random variable. The posterior probability and the prior probability are fitted with two neural networks, respectively, where the posterior model can be expressed as q(z_τ | y_{≤τ}) = N(μ_τ, diag(σ_τ²)); the mean and variance are calculated as:

μ_τ = f_μ(h^z_τ),  σ_τ = f_σ(h^z_τ)

where h^z_τ is the semantic space of the latent random variable, estimated by a nonlinear fitting method (f_μ and f_σ denote the fitted nonlinear mappings). The prior model is similar to the posterior model, but note that the parameters between them are not shared. z_τ is computed as:
z_τ = μ_τ + σ_τ ⊙ ε

where ε is the introduced noise, ε ∼ N(0, I), which makes z_τ at each time step non-stationary and further improves the prediction robustness.
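The reparameterized sampling z_τ = μ_τ + σ_τ ⊙ ε can be sketched as follows. As an assumption of this sketch, the mean/variance heads f_μ and f_σ are reduced to single linear layers, with an exponential to keep σ strictly positive:

```python
import numpy as np

def gaussian_params(h, W_mu, b_mu, W_sig, b_sig):
    """Map a hidden vector h to the mean and std of a diagonal Gaussian."""
    mu = h @ W_mu + b_mu
    sigma = np.exp(0.5 * (h @ W_sig + b_sig))  # exp keeps sigma > 0
    return mu, sigma

def reparameterize(mu, sigma, rng):
    """z = mu + sigma * eps with eps ~ N(0, I); sampling stays differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

rng = np.random.default_rng(0)
h = rng.random(8)                               # toy hidden vector h^z
W_mu, W_sig = rng.random((8, 4)), rng.random((8, 4))
b_mu, b_sig = np.zeros(4), np.zeros(4)
mu, sigma = gaussian_params(h, W_mu, b_mu, W_sig, b_sig)
z = reparameterize(mu, sigma, rng)
print(z.shape)  # (4,)
```

Moving the randomness into ε is what allows gradients to flow through μ_τ and σ_τ during training, which is why the text calls this the reparameterization method.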
3) Take the prediction result and the latent semantic variable obtained in the last time step as the input of the current time step, and obtain the hidden layer state of the decoder VRNN. The decoder adopts a gated recurrent unit (GRU), whose hidden state at each moment serves as the output. First, the value of the update gate in the GRU is calculated; the update gate controls the information entering the current unit. The update gate calculation formula for the (τ+1)-th time step is:

u_{τ+1} = σ(W_u h_τ + U_u x_{τ+1} + C_u c_τ + V_u z_τ + b_u)

where u_{τ+1} is the update gate; W_u, U_u, C_u, V_u and b_u represent the weights and bias of the update gate; h_τ is the hidden layer state of the GRU at the previous moment, i.e., the features obtained after the GRU processing of the previous moment; x_{τ+1} represents the input data at the current moment, which may be y_τ, i.e., the prediction result of the last time step; weather forecast data can also be input together when available, i.e., [y_τ, wf_τ], where wf_τ is the weather forecast data required by the current time step; c_τ is the context variable calculated at the current moment. Notably, z_τ has an important influence on the representation of the decoder hidden layer state and can capture the characteristics between the prediction outputs of adjacent time steps. σ represents the logistic function, defined as follows:

σ(x) = 1 / (1 + e^{−x})
then calculating the value of a reset gate, wherein the reset gate is used for selectively forgetting the previous information, and if the current moment is winded, forgetting the information that the current moment is not winded; the meaning and calculation mode of the reset gate parameter are similar to those of the update gate, and the calculation formula is as follows:
r τ+1 =σ(W r h τ +U r x τ+1 +C r c τ +V r z τ +b r )
next, candidate output is calculatedIt represents new information obtained by fusing the information of the last step and the current information, and the calculation formula is as follows:
at the moment, the reset gate is responsible for controlling the information obtained in the last step to be forgotten, and the value range of the logistic function is (0, 1), so that the value range of the reset gate is (0, 1); when the value of the reset gate is close to 0, the information of the last step is almost completely forgotten, so that the effect of resetting is achieved; when the value of the reset gate is close to 1, the information of the last step is almost completely reserved; and finally, calculating the state of the GRU hidden layer, wherein the calculation formula is as follows:
the update gate controls the proportion of the new information and the information of the last step, and when the update gate value is close to 1, the new information proportion is close to 100%; when the value of the update gate is close to 0, the information of the last step is close to 100%.
4) The context vector is derived using the decoder hidden layer state and the encoder states. The attention vector determines the importance of each moment of the encoding result, which is measured by the similarity of the decoder hidden layer state and the encoder hidden layer states. The importance of each moment of the encoding result can therefore be calculated by the following formula:

e_{τi} = h_τ^T h_i^{enc}

After normalizing the result, the attention vector is obtained:

a_{τi} = exp(e_{τi}) / Σ_{k=1}^{N} exp(e_{τk})
a τ the greater the value, the greater the impact it has on the current decoding time. Use a τ Calculating a weighted average for the encoding result to obtain context c τ It represents a feature of the past contamination and meteorological data that is useful for predicting the current moment. Finally, the prediction result can be obtained by the following formula:
y τ =W p *[h τ ,c τ ,z τ ]+b p
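The attention-weighted context and the final linear prediction can be sketched as follows; the dot-product similarity and the toy shapes are assumptions of this illustration.

```python
import numpy as np

def attention_context(h_dec, H_enc):
    """Weight each encoder state by its similarity to the decoder state."""
    scores = H_enc @ h_dec                    # dot-product similarities e_{tau,i}
    a = np.exp(scores - scores.max())
    a /= a.sum()                              # softmax -> attention vector a_tau
    return a @ H_enc, a                       # context c_tau and the weights

rng = np.random.default_rng(0)
N, H = 6, 8
H_enc = rng.random((N, H))                    # encoder hidden states, one per moment
h_dec = rng.random(H)                         # current decoder hidden state
c, a = attention_context(h_dec, H_enc)

# final prediction: linear map of the concatenation [h_dec, c, z]
z = rng.random(4)                             # toy latent variable
W_p, b_p = rng.random((1, H + H + 4)), np.zeros(1)
y = W_p @ np.concatenate([h_dec, c, z]) + b_p
print(c.shape, round(float(a.sum()), 6))  # (8,) 1.0
```

Since the weights sum to 1, c is a convex combination of the encoder states, so encoder moments with higher similarity to the decoder state dominate the context.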
5) Generating a predictive probability distribution using the input data at the next time, including the predicted concentration at the previous time and the weather data at the next time, the potentially random information, the decoder hidden layer state, and the context information, defined as:
p(y_τ | X, y_{<τ}, z_τ) = exp{ g(W_d [y_{τ−1}; h_τ; c_τ; z_τ] + b_d) }
where g is the activation function.
6) Construct the loss function and optimize it using a gradient descent algorithm. For deep learning model training, mini-batch gradient descent is adopted, and because the loss involves a probabilistic expectation, the Monte Carlo method is adopted to approximate the expectation. For a mini-batch of data, the loss function is calculated by the following formula:

J = (1/L) Σ_{l=1}^{L} Σ_{τ=1}^{T} [ −log p(y_τ^{(l)} | X^{(l)}, y^{(l)}_{<τ}, z_τ^{(l)}) + KL( q(z_τ^{(l)}) ‖ p(z_τ^{(l)}) ) ]

where L is the number of samples in the mini-batch. The parameters in the model can ultimately be adjusted using a gradient descent algorithm to minimize the loss function, while the gradient used for gradient descent is calculated using a back-propagation algorithm or an automatic differentiation tool.
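The mini-batch loss combines a reconstruction term with a KL term between posterior and prior. A minimal numpy sketch under stated assumptions: squared error stands in for the negative log-likelihood, and the KL divergence between the two diagonal Gaussians is computed in closed form rather than by Monte Carlo; the function names are illustrative.

```python
import numpy as np

def kl_diag_gaussians(mu_q, sig_q, mu_p, sig_p):
    """Closed-form KL(q || p) between diagonal Gaussians, summed over dims."""
    return 0.5 * np.sum(
        2.0 * np.log(sig_p / sig_q)
        + (sig_q ** 2 + (mu_q - mu_p) ** 2) / sig_p ** 2
        - 1.0, axis=-1)

def minibatch_loss(y_true, y_pred, mu_q, sig_q, mu_p, sig_p):
    recon = np.mean(np.sum((y_true - y_pred) ** 2, axis=-1))   # reconstruction term
    kl = np.mean(kl_diag_gaussians(mu_q, sig_q, mu_p, sig_p))  # posterior/prior gap
    return recon + kl

# sanity check: identical posterior and prior -> the KL term vanishes
mu = np.zeros((5, 4)); sig = np.ones((5, 4))
y = np.zeros((5, 2))
print(minibatch_loss(y, y, mu, sig, mu, sig))  # 0.0
```

In practice this scalar would be minimized with an automatic differentiation framework, which matches the back-propagation remark above.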
Step 4, testing the prediction result by using the test data
Inputting the test data into an AVAQP model to obtain a prediction sequence of each sample, and adjusting parameters of the neural network to obtain a better result if the test result is not ideal.
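For the test stage, prediction quality is commonly summarized with RMSE and MAE; the patent does not fix particular metrics, so the following numpy sketch is only one reasonable choice.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error over a prediction sequence."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error over a prediction sequence."""
    return float(np.mean(np.abs(y_true - y_pred)))

# toy hourly PM2.5 values (illustrative, not patent data)
y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 33.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```

If the test error is not ideal, these scores guide the hyperparameter adjustment mentioned above.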
The above embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, the scope of which is defined by the claims. Various modifications and equivalent arrangements of this invention will occur to those skilled in the art, and are intended to be within the spirit and scope of the invention.

Claims (4)

1. An air quality prediction method based on a self-attention mechanism and a variation recursion network is characterized in that: the method comprises the following steps:
step 1, acquire air quality data and atmospheric data, perform arranging and cleaning preprocessing on the data, and construct input data and output data; the input data of the encoder include pollutant data and historical meteorological data; the input data of the decoder include the output result of the encoder, weather forecast data and the pollutant data at the last moment;
step 2, dividing the data into training data and test data;
step 3, constructing an AVAQP model, and training the AVAQP model by using training data:
1) Input the input data and the position codes into the encoder to obtain the hidden layer state of the encoder at each moment;
2) Construct a variational inference model of the latent random variable, and calculate the latent random variable z_τ;
3) Take the prediction result and the latent semantic variable obtained in the last time step as the input of the current time step, and obtain the hidden layer state of the decoder VRNN;
4) Obtain the context vector using the decoder hidden layer state and the encoder states;
5) Generate a predictive probability distribution using the input data at the next moment (the predicted concentration at the previous moment and the weather data at the next moment), the latent random information, the decoder hidden layer state, and the context information;
6) Construct a loss function and optimize it using a gradient descent algorithm;
Step 4, testing the prediction result by using the test data;
in step 3, constructing an AVAQP model, and training the AVAQP model by using training data;
1) Inputting the input data and the position codes into an encoder to obtain the hidden layer state of the encoder at each moment; performing linear transformation on the input data to obtain three groups of vector sequences Q, K, V; the query vector sequence, key vector sequence and value vector sequence in the self-attention mechanism are calculated as follows:
Q = W_Q(X + PE)
K = W_K(X + PE)
V = W_V(X + PE)
where W_Q, W_K and W_V are linear-transformation parameter matrices, and PE is a position-encoding matrix with the same dimensions as the input data; position codes are added to supplement sequence position information; each row corresponds to one input position;
the converted vector sequences are input into the encoder to obtain the hidden layer state of the encoder at each moment; the hidden layer state of the encoder is calculated as follows:

h_j = tanh( Σ_{i=1}^{N} α_{ij} v_i )

where h_j is the hidden layer state, and i, j ∈ [1, N] are the positions of the current time step and of the other sequence elements, respectively; the connection weight α_{ij} is dynamically generated by the attention mechanism; note that the activation function here is tanh, chosen to be consistent with the activation function of the decoder, defined as:

tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x})

the attention scoring function uses a scaled dot product, written as:

α_{ij} = softmax( q_j^T k_i / √d_s )

where d_s is a manually set hyperparameter whose purpose is to make the gradient more stable;
2) Construct a variational inference model of the latent random variable, and calculate the latent random variable z_τ; the key to the VRNN is modeling the distributions associated with the latent random variable; the posterior probability and the prior probability are respectively fitted by two neural networks, where the posterior model is expressed as q(z_τ | y_{≤τ}) = N(μ_τ, diag(σ_τ²)); the mean and variance are calculated as:

μ_τ = f_μ(h^z_τ),  σ_τ = f_σ(h^z_τ)

where h^z_τ is the semantic space of the latent random variable, estimated by a nonlinear fitting method; the prior model is similar to the posterior model, and the parameters between them are not shared; z_τ is calculated as:

z_τ = μ_τ + σ_τ ⊙ ε

where ε is the introduced noise, ε ∼ N(0, I), which makes z_τ at each time step non-stationary and further improves the prediction robustness;
3) Take the prediction result and the latent semantic variable obtained in the last time step as the input of the current time step, and obtain the hidden layer state of the decoder VRNN; the decoder adopts a gated recurrent unit (GRU), whose hidden state at each moment serves as the output; first, the value of the update gate in the GRU is calculated, the update gate controlling the information entering the current unit; the update gate calculation formula for the (τ+1)-th time step is:

u_{τ+1} = σ(W_u h_τ + U_u x_{τ+1} + C_u c_τ + V_u z_τ + b_u)

where u_{τ+1} is the update gate; W_u, U_u, C_u, V_u and b_u represent the weights and bias of the update gate; h_τ is the hidden layer state of the GRU at the previous moment, i.e., the features obtained after GRU processing at the previous moment; x_{τ+1} represents the input data at the current moment, namely y_τ, i.e., the prediction result of the last time step; the weather forecast data are input together when available, i.e., [y_τ, wf_τ], where wf_τ is the weather forecast data required by the current time step; c_τ is the context variable calculated at the current moment; notably, z_τ has an important influence on the representation of the decoder hidden layer state and can capture the characteristics between the prediction outputs of adjacent time steps; σ represents the logistic function, defined as follows:

σ(x) = 1 / (1 + e^{−x})
then the value of the reset gate is calculated; the reset gate selectively forgets previous information — for example, if it is windy at the current moment, earlier information corresponding to windless conditions can be forgotten; the meaning and calculation of the reset gate parameters are consistent with those of the update gate:

r_{τ+1} = σ(W_r h_τ + U_r x_{τ+1} + C_r c_τ + V_r z_τ + b_r)

next, the candidate output h̃_{τ+1} is calculated; it represents the new information obtained by fusing the information of the last step with the current information:

h̃_{τ+1} = tanh(W_h (r_{τ+1} ⊙ h_τ) + U_h x_{τ+1} + C_h c_τ + V_h z_τ + b_h)

here the reset gate controls how much of the information obtained in the last step is forgotten; since the range of the logistic function is (0, 1), the range of the reset gate is also (0, 1); when the value of the reset gate approaches 0, the information of the last step is almost completely forgotten, achieving the reset effect; when it approaches 1, the information of the last step is almost completely retained; finally, the GRU hidden layer state is calculated as:

h_{τ+1} = u_{τ+1} ⊙ h̃_{τ+1} + (1 − u_{τ+1}) ⊙ h_τ

the update gate controls the proportions of the new information and the information of the last step: when the update gate approaches 1, the new information accounts for nearly 100%; when it approaches 0, the information of the last step accounts for nearly 100%;
4) Obtaining a context vector using the decoder hidden layer state and the encoder states; the attention vector determines the importance of each moment of the encoding result, the importance being measured by the similarity of the decoder hidden layer state h_τ and the encoder hidden layer state h̄_s; the importance of each moment s of the encoding result is therefore calculated by the following formula: e_{τ,s} = score(h_τ, h̄_s), where score(·,·) is a similarity measure such as the inner product h_τᵀ h̄_s;
after normalizing the result with the softmax function, the attention vector is obtained: a_{τ,s} = exp(e_{τ,s}) / Σ_{s'} exp(e_{τ,s'})
the greater the value of a_{τ,s}, the greater its influence on the current decoding moment; using a_τ, a weighted average of the encoding results is calculated to obtain the context c_τ = Σ_s a_{τ,s} h̄_s, which represents the features of past pollution and meteorological data that are useful for the current prediction; finally, the prediction result is obtained from the predictive distribution defined in step 5);
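The attention computation of this step can be sketched as follows; dot-product similarity is an assumption, since the claim only specifies that importance is measured by the similarity between the decoder and encoder hidden states.

```python
import numpy as np

def attention_context(dec_h, enc_hs):
    """Attention-weighted context over encoder states.

    dec_h:  decoder hidden state h_tau, shape (d,)
    enc_hs: encoder hidden states, one per input moment, shape (S, d)
    returns (context c_tau, attention weights a_tau)
    """
    scores = enc_hs @ dec_h                    # similarity e_{tau,s} (dot product)
    scores = scores - scores.max()             # shift for numerical stability
    a = np.exp(scores) / np.exp(scores).sum()  # softmax-normalized attention
    context = a @ enc_hs                       # weighted average of encoder states
    return context, a
```

The weights a sum to 1, so the context is a convex combination of the encoder states, with the largest weight on the moment most similar to the current decoder state.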
5) Generating the predictive probability distribution using the input data at the next time (comprising the predicted concentration of the previous time and the weather data of the next time), the latent random information, the decoder hidden layer state, and the context information, defined as: p(y_{τ+1} | ·) = g(x_{τ+1}, z_τ, h_{τ+1}, c_τ)
wherein g is an activation function;
6) Constructing a loss function and optimizing it with a gradient descent algorithm; as is typical for deep learning model training, small-batch gradient descent is adopted, and because the loss involves a probabilistic expectation, the expectation is approximated by the Monte Carlo method; thus for a small batch of data, the loss function is calculated by the following formula: Loss = −(1/L) Σ_{l=1}^{L} Σ_τ [ log p(y_τ^{(l)} | ·) − KL( q(z_τ^{(l)} | ·) ‖ p(z_τ^{(l)} | ·) ) ]
where L is the number of samples in the small batch; finally, the gradient descent algorithm is used to adjust the parameters of the model to minimize the loss function, and the gradients required by gradient descent are calculated using a back-propagation algorithm or an automatic differentiation tool.
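The Monte Carlo small-batch loss above can be sketched as follows, under the assumption of a Gaussian predictive distribution and diagonal-Gaussian variational posterior and prior for the latent variable; these distributional choices and the field names are illustrative, not specified by the claim.

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    # negative log-likelihood of y under N(mu, sigma^2), summed over dimensions
    return 0.5 * np.sum(np.log(2.0 * np.pi * sigma**2) + ((y - mu) / sigma)**2)

def kl_diag_gaussians(mu_q, sig_q, mu_p, sig_p):
    # KL( N(mu_q, sig_q^2) || N(mu_p, sig_p^2) ) for diagonal Gaussians
    return np.sum(np.log(sig_p / sig_q)
                  + (sig_q**2 + (mu_q - mu_p)**2) / (2.0 * sig_p**2) - 0.5)

def minibatch_loss(batch):
    """Negative evidence lower bound averaged over a small batch.

    batch: list of samples; each sample is a list of per-time-step dicts
    holding the target 'y', predictive parameters 'mu'/'sigma', and the
    posterior/prior parameters 'mu_q'/'sig_q'/'mu_p'/'sig_p' of the latent
    variable (one Monte Carlo sample per step; structure is hypothetical).
    """
    total = 0.0
    for sample in batch:
        for step in sample:
            total += gaussian_nll(step['y'], step['mu'], step['sigma'])
            total += kl_diag_gaussians(step['mu_q'], step['sig_q'],
                                       step['mu_p'], step['sig_p'])
    return total / len(batch)
```

Minimizing this quantity simultaneously improves the reconstruction likelihood of the observed concentrations and keeps the approximate posterior close to the prior; in practice the gradient would come from an automatic differentiation framework rather than hand-derived back-propagation.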
2. The air quality prediction method based on a self-attention mechanism and a variational recursive network according to claim 1, wherein the implementation process of step 1 is as follows:
the atmospheric data crawled through python comprises atmospheric pollutant data and weather data, and is preprocessed, wherein the preprocessing comprises the steps of deleting repeated values, filling the missing values, and then carrying out normalization processing to divide the input sequence and the output sequence; the input data includes contaminant data and weather data for 72 hours of history; let d= { X, Y } be the dataset after processing; wherein X is an input sequence, i.e., historical data, including contaminant data and weather data; for each input sequence x εR S×Q The length of the sensor is S, namely historical data of the past S hours, and the sensor has Q characteristics, namely PM2.5, carbon monoxide, sulfur dioxide pollutant data and temperature and humidity weather data; for each target sequence y εR T The length of the sample is T, namely pollutant data of the future T hours; y contains multiple targets.
3. The air quality prediction method based on a self-attention mechanism and a variational recursive network according to claim 1, wherein: in step 3, the samples obtained in step 2 are divided into training data and test data; the training data are used to train the model, and the test data are used to test the effect of the model.
4. The air quality prediction method based on a self-attention mechanism and a variational recursive network according to claim 1, wherein the implementation process of step 4 is as follows:
inputting the test data into an AVAQP model to obtain a prediction sequence of each sample, and adjusting parameters of the neural network to obtain a better result if the test result is not ideal.
CN202110322814.7A 2021-03-26 2021-03-26 Air quality prediction method based on variational recursive network and self-attention mechanism Active CN113095550B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110322814.7A CN113095550B (en) 2021-03-26 2021-03-26 Air quality prediction method based on variational recursive network and self-attention mechanism


Publications (2)

Publication Number Publication Date
CN113095550A CN113095550A (en) 2021-07-09
CN113095550B true CN113095550B (en) 2023-12-08

Family

ID=76669979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110322814.7A Active CN113095550B (en) 2021-03-26 2021-03-26 Air quality prediction method based on variational recursive network and self-attention mechanism

Country Status (1)

Country Link
CN (1) CN113095550B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743648B (en) * 2021-07-30 2022-07-15 中科三清科技有限公司 Air quality ensemble forecasting method, device, equipment and readable storage medium
CN113762351B (en) * 2021-08-12 2023-12-05 吉林大学 Air quality prediction method based on deep transition network
CN113657122B (en) * 2021-09-07 2023-12-15 内蒙古工业大学 Mongolian machine translation method of pseudo parallel corpus integrating transfer learning
CN114492974A (en) * 2022-01-18 2022-05-13 国网浙江省电力有限公司电力科学研究院 GIS gas state prediction method and system
CN114403486B (en) * 2022-02-17 2022-11-22 四川大学 Intelligent control method of airflow type cut-tobacco drier based on local peak value coding circulation network
CN114611792B (en) * 2022-03-11 2023-05-02 南通大学 Atmospheric ozone concentration prediction method based on mixed CNN-converter model
CN117111646B (en) * 2023-09-10 2024-05-24 福建天甫电子材料有限公司 Etching solution concentration automatic control system
CN117316334B (en) * 2023-11-30 2024-03-12 南京邮电大学 Water plant coagulant dosage prediction method and system

Citations (3)

Publication number Priority date Publication date Assignee Title
CN108197736A (en) * 2017-12-29 2018-06-22 北京工业大学 A kind of Air Quality Forecast method based on variation self-encoding encoder and extreme learning machine
CN109142171A (en) * 2018-06-15 2019-01-04 上海师范大学 The city PM10 concentration prediction method of fused neural network based on feature expansion
CN110070224A (en) * 2019-04-20 2019-07-30 北京工业大学 A kind of Air Quality Forecast method based on multi-step recursive prediction


Non-Patent Citations (1)

Title
A Sequence-to-Sequence Air Quality Predictor Based on the n-Step Recurrent Prediction; BO LIU et al.; IEEE Access; pp. 43331-43343 *

Also Published As

Publication number Publication date
CN113095550A (en) 2021-07-09

Similar Documents

Publication Publication Date Title
CN113095550B (en) Air quality prediction method based on variational recursive network and self-attention mechanism
CN108197736B (en) Air quality prediction method based on variational self-encoder and extreme learning machine
Zhou et al. A model for real-time failure prognosis based on hidden Markov model and belief rule base
Fan et al. A novel machine learning method based approach for Li-ion battery prognostic and health management
CN110070224A (en) A kind of Air Quality Forecast method based on multi-step recursive prediction
CN113065703A (en) Time series prediction method combining multiple models
CN110987436B (en) Bearing fault diagnosis method based on excitation mechanism
CN114218872B (en) DBN-LSTM semi-supervised joint model-based residual service life prediction method
CN112949894B (en) Output water BOD prediction method based on simplified long-short-term memory neural network
CN115542429A (en) XGboost-based ozone quality prediction method and system
CN116187835A (en) Data-driven-based method and system for estimating theoretical line loss interval of transformer area
CN115018191A (en) Carbon emission prediction method based on small sample data
CN116757057A (en) Air quality prediction method based on PSO-GA-LSTM model
Osman et al. Soft Sensor Modeling of Key Effluent Parameters in Wastewater Treatment Process Based on SAE‐NN
CN117114184A (en) Urban carbon emission influence factor feature extraction and medium-long-term prediction method and device
CN113762591A (en) Short-term electric quantity prediction method and system based on GRU and multi-core SVM counterstudy
CN113537539B (en) Multi-time-step heat and gas consumption prediction model based on attention mechanism
CN117521511A (en) Granary temperature prediction method based on improved wolf algorithm for optimizing LSTM
CN116861256A (en) Furnace temperature prediction method, system, equipment and medium for solid waste incineration process
CN110648023A (en) Method for establishing data prediction model based on quadratic exponential smoothing improved GM (1,1)
CN116307115A (en) Secondary water supply water consumption prediction method based on improved transducer model
CN115062528A (en) Prediction method for industrial process time sequence data
Kang et al. Research on forecasting method for effluent ammonia nitrogen concentration based on GRA-TCN
Liu et al. A water quality prediction method based on long short-term memory neural network optimized by Cuckoo search algorithm
CN115879569B (en) Online learning method and system for IoT observation data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant