CN112884243A - Air quality analysis and prediction method based on deep learning and Bayesian model - Google Patents
- Publication number
- CN112884243A (application CN202110282474.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- model
- deep learning
- prediction
- network model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention discloses an air quality analysis and prediction method based on deep learning and a Bayesian model. The method comprises: obtaining AQI data of a target monitoring point; preprocessing and normalizing the AQI data; constructing a deep learning convolutional network model, a recurrent neural network model and a Bayesian dynamic linear model; inputting the AQI data into the convolutional network model and the Bayesian dynamic linear model respectively, the Bayesian dynamic linear model outputting first predicted AQI data; inputting the features extracted by the convolutional network model into the recurrent neural network model, which outputs second predicted AQI data; and inputting the first and second predicted AQI data into a hybrid model, which outputs the final predicted AQI data.
Description
Technical Field
The invention relates to an atmospheric pollutant concentration prediction method, in particular to an air quality analysis prediction method based on deep learning and a Bayesian model.
Background
Air quality has received continuing attention in recent years, and a large number of air quality monitoring stations have been added in China to monitor local air quality and meteorological data. The air quality data monitored by a station comprise six factors: particulate matter (PM2.5 and PM10) and gaseous pollutants (NO2, CO, O3 and SO2), collectively called AQI data. In addition, the monitoring points can acquire meteorological data for the area, such as weather, temperature, pressure, humidity, wind direction and wind speed, collectively called MEO data.
Because meteorological and environmental factors are complex, predicting atmospheric pollutant concentrations has always been a difficult problem. Commonly used prediction methods currently include mechanistic forecasting based on atmospheric chemical transport models and statistical forecasting based on machine learning models. The former is widely applied in engineering practice, but because the atmosphere is a very complex system that is difficult to describe and fully quantify in theory, mechanistic forecasting has large errors.
At present, the forecasts of weather conditions and pollutant concentrations issued by the national meteorological bureau are obtained by running a coupled meteorology-chemistry model (WRF-Chem). Because both the numerical model calculations and the emission inventory data contain errors of varying degree, the model's predictions of pollutant concentration are not ideal.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide an air quality analysis and prediction method based on deep learning and a Bayesian model, which can analyze and predict air quality, evaluate progress in air quality improvement, identify pollution sources and provide suggestions for air pollution prevention and control.
To achieve this purpose, the invention provides the following technical solution: an air quality analysis and prediction method based on deep learning and a Bayesian model, comprising the following steps:
step S1: acquiring AQI data of a target monitoring point;
step S2: preprocessing the AQI data: identifying abnormal values in the data sequence according to the Pauta (3σ) criterion and removing them, and filling in data missing at any moment by linear interpolation;
step S3: normalizing the AQI data;
step S4: constructing a deep learning convolutional network model, a recurrent neural network model and a Bayesian dynamic linear model;
step S5: inputting the normalized AQI data into the deep learning convolutional network model and the Bayesian dynamic linear model respectively, wherein the convolutional network model converts the long input sequence into a short sequence of high-level features, and the Bayesian dynamic linear model outputs first predicted AQI data;
step S6: inputting the sequence of features extracted by the deep learning convolutional network model into the recurrent neural network model, which outputs second predicted AQI data;
step S7: constructing a hybrid model, inputting the first predicted AQI data and the second predicted AQI data into the hybrid model, and outputting the final predicted AQI data from the hybrid model.
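The cleaning described in step S2 could be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes the criterion named in step S2 is the 3σ rule, uses NumPy, and the function names and sample readings are hypothetical.

```python
import numpy as np

def remove_outliers_3sigma(series):
    """Replace values farther than 3 standard deviations from the mean with NaN."""
    x = np.asarray(series, dtype=float)
    mean = np.nanmean(x)          # statistics ignore values that are already missing
    std = np.nanstd(x)
    cleaned = x.copy()
    if std > 0:
        cleaned[np.abs(x - mean) > 3.0 * std] = np.nan
    return cleaned

def fill_linear(series):
    """Fill NaN gaps by linear interpolation between the nearest valid neighbours."""
    x = np.asarray(series, dtype=float)
    idx = np.arange(x.size)
    valid = ~np.isnan(x)
    return np.interp(idx, idx[valid], x[valid])

# Hypothetical hourly PM2.5 readings with one missing value and one spike.
raw = np.array([50.0] * 10 + [np.nan, 52.0, 900.0, 54.0] + [50.0] * 10)
clean = fill_linear(remove_outliers_3sigma(raw))
```

Removing outliers before interpolating means a spike is treated like any other gap, so it cannot distort the values interpolated around it.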
The invention is further configured as follows: the normalization in step S3 keeps the data within a relatively small range of values so as to reduce the influence of differing orders of magnitude or units. The feature distribution is assumed to be normal, and each feature is mapped to the standard normal distribution using its mean and standard deviation:

$$y' = \frac{y - y_{mean}}{y_{std}}$$

where $y$ is a sample value, $y'$ is its normalized value, $y_{mean}$ is the mean of all sample data and $y_{std}$ is the standard deviation of all sample data.
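The normalization of step S3 can be illustrated with a short NumPy sketch. The function names and the sample matrix are hypothetical; the inverse function is an addition for convenience, since predictions made on the normalized scale usually need to be mapped back to AQI units.

```python
import numpy as np

def zscore_fit(train):
    """Return per-feature mean and standard deviation of the training samples."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)   # guard against constant features
    return mean, std

def zscore_apply(y, mean, std):
    """Map samples to zero mean and unit standard deviation."""
    return (y - mean) / std

def zscore_invert(y_norm, mean, std):
    """Map normalized values back to the original units."""
    return y_norm * std + mean

# Hypothetical samples: rows are hours, columns are two pollutant features.
data = np.array([[35.0, 60.0], [80.0, 120.0], [50.0, 90.0], [15.0, 30.0]])
mean, std = zscore_fit(data)
normed = zscore_apply(data, mean, std)
restored = zscore_invert(normed, mean, std)
```

In practice the mean and standard deviation would be estimated on the training split only and reused for the test split.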
The invention is further configured as follows: step S4 specifically includes:
step S41, selecting training data and test data from the AQI data according to the constructed models, and initializing the deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model;
step S42, training the deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model with the training data;
step S43, obtaining test prediction results from the test data with the trained deep learning convolutional network model, recurrent neural network model and Bayesian dynamic linear model;
and step S44, making predictions with the trained deep learning convolutional network model, recurrent neural network model and Bayesian dynamic linear model.
The invention is further configured as follows: in step S5, the Bayesian dynamic linear model comprises an observation equation, a state equation and initial information; the forecast distribution is treated as a conditional probability distribution and derived from the prior information, the posterior information is obtained with Bayes' formula, and the prior information is corrected accordingly to obtain the predicted value.
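The invention is further configured as follows: for the recurrent neural network model, the loss function of the training phase is the mean square error:

$$L = \frac{1}{n}\sum_{i=1}^{n}\left(a_i - y_i\right)^2$$

where $a_i$ is the predicted value and $y_i$ is the sample value of the $i$-th of $n$ samples.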
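As a small worked example of this loss, the sketch below computes the mean square error and its gradient with respect to the predictions. It assumes the plain 1/n form of the loss; the function names and values are hypothetical.

```python
import numpy as np

def mse_loss(a, y):
    """Mean square error between predictions a and sample values y."""
    a = np.asarray(a, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((a - y) ** 2))

def mse_grad(a, y):
    """Gradient of the mean square error with respect to each prediction."""
    a = np.asarray(a, dtype=float)
    y = np.asarray(y, dtype=float)
    return 2.0 * (a - y) / a.size

predicted = [1.0, 2.0, 3.0]
observed = [1.0, 2.0, 5.0]
loss = mse_loss(predicted, observed)   # (0 + 0 + 4) / 3
grad = mse_grad(predicted, observed)
```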
The invention is further configured as follows: the recurrent neural network model is further trained with the Adam algorithm and the Dropout algorithm;
the Adam algorithm computes first-moment and second-moment estimates of the gradient to set an independent adaptive learning rate for each parameter;
the Dropout algorithm reduces the dependency between features and thereby the probability of overfitting.
The invention is further configured as follows: the method further comprises step S8, obtaining MEO data;
step S9, performing correlation analysis based on the MEO data and the AQI data;
step S10, performing backward trajectory and potential source contribution analysis based on the MEO data and the AQI data;
and step S11, combining the correlation analysis result and the backward trajectory and potential source contribution analysis result with the final predicted AQI data to obtain comprehensive improvement suggestions.
The invention is further configured as follows: the correlation analysis in step S9 specifically takes PM2.5 and PM10 as the first variables and weather, temperature, air pressure, humidity, wind speed and wind direction as the second variables, and applies the following formula:

$$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \sum_{i}(y_i - \bar{y})^2}}$$

where $x_i$ and $y_i$ are the two variables being compared (taken as rank values for the Spearman coefficient), $\bar{x}$ is the mean of $x_i$, $\bar{y}$ is the mean of $y_i$, and $r$ is the Spearman correlation coefficient; $r$ is +1 or -1 when the two variables are perfectly monotonically correlated, and $r$ is 0 when they are uncorrelated.
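The coefficient can be computed by ranking both variables and applying the formula to the ranks. The following NumPy sketch does this with average ranks for ties; the function names and the wind speed and PM2.5 values are hypothetical.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(v.size, dtype=float)
    ranks[order] = np.arange(1, v.size + 1)
    for val in np.unique(v):
        tied = v == val
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_r(x, y):
    """Spearman coefficient: the correlation formula applied to rank values."""
    rx, ry = average_ranks(x), average_ranks(y)
    dx, dy = rx - rx.mean(), ry - ry.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Hypothetical data: PM2.5 falls steadily as wind speed rises.
wind_speed = [1.0, 2.0, 3.0, 4.0, 5.0]
pm25 = [120.0, 90.0, 70.0, 40.0, 35.0]
r_wind = spearman_r(wind_speed, pm25)
```

Because only ranks enter the calculation, a relationship that is monotonic but not linear still yields a coefficient of +1 or -1.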
The invention is further configured as follows: the backward trajectory and potential source contribution analysis in step S10 specifically divides the study area into i × j grid cells by longitude and latitude, and the PSCF is calculated as:

$$PSCF_{ij} = \frac{m_{ij}}{n_{ij}}$$

where $n_{ij}$ is the number of all airflow trajectories passing through grid cell (i, j) and $m_{ij}$ is the number of polluted trajectories passing through grid cell (i, j).
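A minimal sketch of the PSCF calculation is given below. It assumes each backward trajectory is available as a list of (longitude, latitude) points and has already been labelled as polluted or not (for example by comparing the arrival concentration with a threshold, which the source does not specify). All names and sample trajectories are hypothetical.

```python
import numpy as np

def pscf_grid(trajectories, polluted, lon_edges, lat_edges):
    """Return PSCF = m_ij / n_ij per grid cell, counting each trajectory once per cell."""
    shape = (len(lon_edges) - 1, len(lat_edges) - 1)
    n = np.zeros(shape)   # all trajectories passing through each cell
    m = np.zeros(shape)   # polluted trajectories passing through each cell
    for points, is_polluted in zip(trajectories, polluted):
        pts = np.asarray(points, dtype=float)
        hits, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=[lon_edges, lat_edges])
        visited = hits > 0
        n += visited
        if is_polluted:
            m += visited
    # Cells never crossed by any trajectory are left at 0.
    return np.divide(m, n, out=np.zeros(shape), where=n > 0)

lon_edges = [0.0, 1.0, 2.0]
lat_edges = [0.0, 1.0, 2.0]
trajectories = [
    [(0.5, 0.5), (1.5, 0.5)],
    [(0.5, 0.5), (0.5, 1.5)],
    [(1.5, 0.5), (1.5, 1.5)],
]
polluted = [True, False, True]
scores = pscf_grid(trajectories, polluted, lon_edges, lat_edges)
```

Cells with a high PSCF value are those that polluted trajectories cross disproportionately often, which marks them as candidate source regions.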
In conclusion, the invention has the following beneficial effects. Historical monitoring data of the air quality data AQI (PM2.5, PM10, NO2, CO, O3, SO2) are obtained. Taking the time-series character of air quality data into account, abnormal values in the data sequence are identified by the Pauta (3σ) criterion and removed, and data missing at any moment are filled in by linear interpolation. Before modeling, the different features are mapped onto the same scale by normalization. The deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model are then constructed.
The deep learning convolutional neural network (CNN) is used for feature extraction. Air quality data are high-dimensional and their features are difficult to extract. The CNN extracts features locally through convolution kernels with shared weights, which overcomes the excessive parameter count of a fully connected artificial neural network and gives good feature extraction. With this capability, the CNN converts a long input sequence into a short sequence of high-level features, and that feature sequence is used as the input of the recurrent neural network, a long short-term memory (LSTM) network.
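How a convolution followed by pooling shortens a sequence can be shown with a NumPy sketch. The kernel width, number of filters, pooling size and window length are assumptions for illustration, and the kernels here are random rather than trained.

```python
import numpy as np

def conv1d_relu(x, kernels, bias):
    """Valid 1D convolution over time with shared kernels, followed by ReLU.

    x: (T, C) sequence; kernels: (K, C, F); returns (T - K + 1, F).
    """
    k = kernels.shape[0]
    windows = np.stack([x[t:t + k] for t in range(x.shape[0] - k + 1)])
    out = np.einsum("tkc,kcf->tf", windows, kernels) + bias
    return np.maximum(out, 0.0)

def max_pool1d(x, size):
    """Keep the maximum of every `size` consecutive time steps."""
    steps = x.shape[0] // size
    return x[:steps * size].reshape(steps, size, x.shape[1]).max(axis=1)

rng = np.random.default_rng(0)
hours = rng.normal(size=(48, 7))               # 48 hourly steps, AQI plus six pollutants
kernels = rng.normal(scale=0.1, size=(3, 7, 16))
features = max_pool1d(conv1d_relu(hours, kernels, np.zeros(16)), size=2)
```

The 48-step input becomes a 23-step sequence of 16 features, which is the kind of shorter sequence that would then be passed to the LSTM.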
The recurrent neural network model (an LSTM network) is used as a prediction model. Because air pollutant concentrations are strongly correlated over time, the LSTM is well suited to this memory-dependent problem. The LSTM is an improvement on the ordinary RNN that alleviates the vanishing gradient problem during training: its structure contains a set of interconnected memory blocks that replace the hidden units of the ordinary RNN, so it is easier to train than the ordinary RNN, and it has achieved good results in many fields.
The LSTM input is the hourly feature vector, namely the AQI and the six pollutant indexes at a given moment, and the output is a single neuron giving the predicted AQI.
Bayesian dynamic linear model (DLM): Bayesian forecasting is a prediction method developed to meet the need to forecast sudden events. It predicts not only from historical measurement data and knowledge of the model, but also incorporates expert experience and subjective judgment, which makes it particularly useful for forecasting sudden events.
The basic idea of Bayesian forecasting is to establish a dynamic model, treat the forecast distribution as a conditional probability distribution, derive the forecast distribution from the prior information, obtain the posterior information with Bayes' formula, and use it to correct the prior information and obtain the predicted value. The Bayesian dynamic linear model consists of an observation equation, a state equation and initial information.
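The prior, forecast and posterior cycle can be made concrete with the simplest dynamic linear model, the local level model. The source does not state which observation and state equations it uses, so the model below is an assumption chosen for illustration: observation y_t = θ_t + v_t with variance V, state θ_t = θ_{t-1} + w_t with variance W, and initial information θ_0 ~ N(m0, C0). All names and numbers are hypothetical.

```python
def dlm_local_level(observations, m0, c0, obs_var, state_var):
    """Run one-step-ahead Bayesian forecasting for a local level DLM.

    Returns the list of one-step forecasts and the final posterior mean and variance.
    """
    m, c = m0, c0
    forecasts = []
    for y in observations:
        # Prior for the state at time t, from the state equation.
        a, r = m, c + state_var
        # One-step forecast distribution, from the observation equation.
        f, q = a, r + obs_var
        forecasts.append(f)
        # Posterior via Bayes' formula: correct the prior with the forecast error.
        gain = r / q
        m = a + gain * (y - f)
        c = r - gain * gain * q
    return forecasts, m, c

# Hypothetical hourly AQI observations; m0 and c0 encode the initial (expert) information.
aqi = [100.0, 104.0, 98.0, 120.0]
forecasts, post_mean, post_var = dlm_local_level(aqi, m0=100.0, c0=50.0,
                                                 obs_var=20.0, state_var=5.0)
next_forecast = post_mean   # forecast for the next hour under the local level model
```

A larger initial variance expresses weaker confidence in the prior information, so the first observations pull the estimate more strongly.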
Hybrid model: after the model framework is built, a hybrid of the LSTM network and the Bayesian dynamic linear model (DLM) is constructed. The input of the LSTM model is the historical AQI data and the six pollutant indexes, and its output is a predicted AQI; the input of the Bayesian dynamic linear model is the historical AQI data and empirical information, and its output is a predicted AQI. The AQI outputs of the two prediction models are fused to obtain a new prediction, so that the hybrid draws on more diverse features and is intended to achieve stronger learning ability and higher prediction accuracy.
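The source does not say how the two predictions are fused. One simple possibility, shown here purely as an assumption, is a weighted average whose weights are inversely proportional to each model's mean square error on held-out data. The function names and numbers are hypothetical.

```python
import numpy as np

def inverse_mse_weights(y_true, pred_dlm, pred_lstm):
    """Weight each model by the inverse of its mean square error on validation data."""
    y_true = np.asarray(y_true, dtype=float)
    errors = np.array([
        np.mean((y_true - np.asarray(pred_dlm, dtype=float)) ** 2),
        np.mean((y_true - np.asarray(pred_lstm, dtype=float)) ** 2),
    ])
    inv = 1.0 / np.maximum(errors, 1e-12)   # avoid division by zero for a perfect model
    return inv / inv.sum()

def fuse(pred_dlm, pred_lstm, weights):
    """Combine the first (DLM) and second (LSTM) predictions into the final prediction."""
    return weights[0] * np.asarray(pred_dlm, dtype=float) + \
           weights[1] * np.asarray(pred_lstm, dtype=float)

# Hypothetical validation data: the LSTM is closer to the truth than the DLM.
truth = [100.0, 120.0, 90.0]
dlm_pred = [110.0, 130.0, 100.0]
lstm_pred = [105.0, 125.0, 95.0]
weights = inverse_mse_weights(truth, dlm_pred, lstm_pred)
final_pred = fuse(dlm_pred, lstm_pred, weights)
```

Other fusion schemes, such as a small regression model trained on both outputs, would fit the description equally well.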
Drawings
Fig. 1 is a schematic block diagram of an air quality analysis prediction method.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. In which like parts are designated by like reference numerals. It should be noted that the terms "front," "back," "left," "right," "upper" and "lower" used in the following description refer to directions in the drawings, and the terms "bottom" and "top," "inner" and "outer" refer to directions toward and away from, respectively, the geometric center of a particular component.
The first embodiment is as follows: referring to Fig. 1, the invention provides an air quality analysis and prediction method based on deep learning and a Bayesian model, comprising the following steps:
step S1: acquiring AQI data of a target monitoring point;
step S2: preprocessing the AQI data: identifying abnormal values in the data sequence according to the Pauta (3σ) criterion and removing them, and filling in data missing at any moment by linear interpolation;
step S3: normalizing the AQI data;
step S4: constructing a deep learning convolutional network model, a recurrent neural network model and a Bayesian dynamic linear model;
step S5: inputting the normalized AQI data into the deep learning convolutional network model and the Bayesian dynamic linear model respectively, wherein the convolutional network model converts the long input sequence into a short sequence of high-level features, and the Bayesian dynamic linear model outputs first predicted AQI data;
step S6: inputting the sequence of features extracted by the deep learning convolutional network model into the recurrent neural network model, which outputs second predicted AQI data;
step S7: constructing a hybrid model, inputting the first predicted AQI data and the second predicted AQI data into the hybrid model, and outputting the final predicted AQI data from the hybrid model.
The design of the invention is as follows. Historical monitoring data of the air quality data AQI (PM2.5, PM10, NO2, CO, O3, SO2) are obtained. Taking the time-series character of air quality data into account, abnormal values in the data sequence are identified by the Pauta (3σ) criterion and removed, and data missing at any moment are filled in by linear interpolation. Before modeling, the different features are mapped onto the same scale by normalization. The deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model are then constructed.
The normalization in step S3 keeps the data within a relatively small range of values so as to reduce the influence of differing orders of magnitude or units. The feature distribution is assumed to be normal, and each feature is mapped to the standard normal distribution using its mean and standard deviation:

$$y' = \frac{y - y_{mean}}{y_{std}}$$

where $y$ is a sample value, $y'$ is its normalized value, $y_{mean}$ is the mean of all sample data and $y_{std}$ is the standard deviation of all sample data.
Step S4 specifically includes:
step S41, selecting training data and test data from the AQI data according to the constructed models, and initializing the deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model;
step S42, training the deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model with the training data;
step S43, obtaining test prediction results from the test data with the trained deep learning convolutional network model, recurrent neural network model and Bayesian dynamic linear model;
and step S44, making predictions with the trained deep learning convolutional network model, recurrent neural network model and Bayesian dynamic linear model.
In step S5, the Bayesian dynamic linear model comprises an observation equation, a state equation and initial information; the forecast distribution is treated as a conditional probability distribution and derived from the prior information, the posterior information is obtained with Bayes' formula, and the prior information is corrected accordingly to obtain the predicted value.
The performance and optimization target of the LSTM neural network model are defined by a loss function, which measures the discrepancy between the model's predicted value and the true value. The optimization aims to minimize the loss function, adjusting the network parameters according to how close the predicted values are to the true values so as to obtain the optimal model. Air quality prediction is a regression problem, so a mean square error loss function is adopted, defined as:

$$L = \frac{1}{n}\sum_{i=1}^{n}\left(a_i - y_i\right)^2$$

where $a_i$ is the predicted value and $y_i$ is the sample value of the $i$-th of $n$ samples.
The recurrent neural network model is further trained with the Adam algorithm and the Dropout algorithm.
The Adam algorithm computes first-moment and second-moment estimates of the gradient and uses them to set an independent adaptive learning rate for each parameter. It combines the advantages of the adaptive gradient algorithm (AdaGrad) and root mean square propagation (RMSProp): like RMSProp, it scales each parameter's step by a running average of the squared gradient (the second moment), and it additionally uses a running average of the gradient itself (the first moment). Adam copes with sparse gradients, non-stationary objectives and noisy gradients, is computationally efficient, adjusts its step sizes automatically and is suitable for most applications.
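A single Adam update can be written in a few lines of NumPy. The sketch below uses the commonly cited default decay rates and applies the update to a toy quadratic objective; the objective, learning rate and step count are hypothetical choices for illustration, not values from the source.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new parameter and moment estimates."""
    m = beta1 * m + (1.0 - beta1) * grad          # first moment: running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # second moment: running mean of squares
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for the zero start
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy objective: f(w) = sum((w - target)^2), with gradient 2 * (w - target).
target = np.array([3.0, -2.0])
w = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 2001):
    grad = 2.0 * (w - target)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)
```

Dividing by the square root of the second moment gives every parameter its own effective step size, which is the adaptive learning rate the text refers to.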
The Dropout algorithm reduces the dependency between features and thereby the probability of overfitting, which improves prediction accuracy. When a complex feedforward neural network is trained on few samples, the trained model overfits easily. During training, Dropout randomly discards part of the network units, temporarily removing them from the training pass: in forward propagation, the activation of each neuron is switched off with a certain probability p. This makes the model generalize better, reduces the training load and speeds up training.
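The forward pass of Dropout can be sketched as follows. This uses the common "inverted" variant, in which surviving activations are rescaled during training so that nothing needs to change at prediction time; that variant, the function name and the sample values are assumptions, since the source only describes switching neurons off with probability p.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Zero each activation with probability drop_prob and rescale the survivors."""
    if not training or drop_prob == 0.0:
        return activations                      # dropout is disabled when predicting
    keep = rng.random(activations.shape) >= drop_prob
    return activations * keep / (1.0 - drop_prob)

rng = np.random.default_rng(42)
hidden = np.ones((1000, 64))                    # hypothetical hidden-layer activations
dropped = dropout(hidden, 0.3, rng)
unchanged = dropout(hidden, 0.3, rng, training=False)
```

Rescaling by 1 / (1 - p) keeps the expected activation the same with and without dropout.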
After the data are prepared and the model and parameters are set, the deep learning model is trained and validated repeatedly until a model that best fits the target is obtained.
Step S8, obtaining MEO data;
step S9, performing correlation analysis based on the MEO data and the AQI data: the correlation between the monitored meteorological data and the air quality data is analyzed with the Spearman correlation coefficient. Meteorological conditions are one of the important factors governing air quality and influence the generation, diffusion and transport of air pollutants. The Spearman correlation coefficient method is used to analyze the relationship between the AQI, the six air pollutants and the meteorological factors. The Spearman coefficient evaluates how well the relationship between two statistical variables can be described by a monotonic function: it is +1 or -1 when the two variables are perfectly monotonically correlated, and 0 when they are uncorrelated.
Step S10, performing backward trajectory and potential source contribution analysis based on the MEO data and the AQI data: the potential source regions affecting the pollutant concentration, and the contribution of each source region to it, are analyzed. The backward trajectory model analyzes pollutant diffusion and movement paths from meteorological parameters such as temperature, air pressure and wind direction, and is widely used to study pollutant transport paths. The potential source contribution factor (PSCF) method combines backward trajectories with pollutant concentrations to analyze the potential sources and distribution of a specific pollutant. The method divides the study area into i × j grid cells by longitude and latitude, records the number of all airflow trajectories passing through grid cell (i, j) as $n_{ij}$, and records the number of polluted trajectories passing through grid cell (i, j) as $m_{ij}$.
And step S11, combining the correlation analysis result and the backward trajectory and potential source contribution analysis result with the final predicted AQI data to obtain comprehensive improvement suggestions.
The correlation analysis in step S9 specifically includes: taking PM2.5 and PM10 as the first variables and weather, temperature, air pressure, humidity, wind speed and wind direction as the second variables, and applying the following formula:
$$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \sum_{i}(y_i - \bar{y})^2}}$$
where $x_i$ and $y_i$ are the two variables whose correlation is compared, $\bar{x}$ is the mean of $x_i$, $\bar{y}$ is the mean of $y_i$, and r is the Spearman correlation coefficient; r is +1 or -1 when the two variables are perfectly monotonically correlated, and r is 0 when the two variables are uncorrelated.
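As a non-limiting illustration, the Spearman coefficient between a pollutant series and a meteorological series can be sketched as follows. This sketch assumes the usual construction in which both series are first replaced by their ranks (tied values sharing an average rank) and the mean-centred correlation is then computed on the ranks; the function names and the sample readings are hypothetical.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for val in np.unique(values):
        tied = values == val
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_r(x, y):
    """Mean-centred correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    dx, dy = rx - rx.mean(), ry - ry.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Hypothetical readings: PM2.5 falls monotonically as wind speed rises.
pm25 = [150.0, 120.0, 80.0, 40.0, 35.0]
wind_speed = [0.5, 1.0, 2.5, 4.0, 6.0]
r = spearman_r(pm25, wind_speed)
```

In this made-up sample the relationship is perfectly monotonically decreasing, so `r` is -1 up to floating-point rounding.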
The backward trajectory and potential source contribution analysis in step S10 specifically includes: dividing the research area into i × j grid cells by longitude and latitude, where the PSCF calculation formula is:
$$PSCF_{ij} = \frac{m_{ij}}{n_{ij}}$$
where $n_{ij}$ is the number of all airflow trajectories passing through grid cell (i, j), and $m_{ij}$ is the number of polluted trajectories passing through grid cell (i, j).
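As a non-limiting illustration, the PSCF grid can be computed from the two count grids as follows. The sketch assumes the usual definition of PSCF as the ratio of polluted trajectories to all trajectories in each cell; assigning 0 to cells that no trajectory crosses is a further assumption, and the counts are made up.

```python
import numpy as np

def pscf(n_counts, m_counts):
    """Ratio of polluted trajectories (m) to all trajectories (n) per grid cell."""
    n = np.asarray(n_counts, dtype=float)
    m = np.asarray(m_counts, dtype=float)
    out = np.zeros_like(n)                  # assumption: cells with n == 0 get PSCF 0
    np.divide(m, n, out=out, where=n > 0)   # divide only where trajectories exist
    return out

# Hypothetical 2 x 2 grid of trajectory counts.
n_grid = [[4, 0], [10, 2]]
m_grid = [[1, 0], [5, 2]]
pscf_grid = pscf(n_grid, m_grid)
```

Cells with values close to 1 are those in which nearly every passing trajectory was associated with high pollutant concentration at the receptor.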
Example two:
The spatial correlation among atmospheric pollutants is studied, and a spatial conversion method is provided. Through spatial partitioning, spatial aggregation and spatial interpolation, the area around the target monitoring station is partitioned so that every region provides atmospheric quality data and meteorological data in the same format. The spatially sparse atmospheric quality data is thereby converted into uniform, consistent input, from which the features shared among the spatial data are extracted.
By collecting historical atmospheric quality observation data and meteorological data, the following are acquired: the set $S = \{S_1, S_2, S_3, \ldots, S_n\}$ of the central monitoring station of the target area and the monitoring stations in adjacent areas, the historical atmospheric quality monitoring data of each monitoring station, and the historical meteorological monitoring data of each monitoring station. The three are used as the input of the deep learning model to obtain the atmospheric quality data of the central monitoring point of the target area for a future period of time.
Since atmospheric pollutants drift over a wide geographic space and diffuse continuously under the influence of time and terrain, predicting the atmospheric quality index of the target area for the next 48 hours requires considering not only the historical atmospheric quality index and historical meteorological monitoring data of the target area itself, but also the same two kinds of data from the surrounding regions $S = \{S_1, S_2, S_3, \ldots, S_n\}$, so that the spatial correlation between them is taken into account comprehensively.
1) Diffusivity of atmospheric pollution. Atmospheric pollutants are dispersed across different locations and diffuse and transfer over time according to the regional geographic environment, so data from neighboring areas provides additional information for prediction.
2) Spatial correlation. Spatial partitioning merges the dispersed atmospheric quality data around a target region, with finer granularity for closer regions and coarser granularity for farther regions. In addition, regions at different distances exert different effects, which vary as a function of distance.
3) Scalability. By fixing an upper limit on the input (the number of regions), the method reduces complexity compared with conventional spatial aggregation methods. In addition, spatial interpolation overcomes spatial sparsity by filling in the missing values of the partitioned regions and generating consistent inputs for all monitoring stations, which makes it possible to train one model on data from different stations together and increases the accuracy of the model to a certain extent.
The spatial conversion method proceeds as follows. First, the target atmospheric quality monitoring station to be predicted is taken as the center of a circle, and an inner monitoring area is generated with a first radius of 5 kilometers. An outer ring is generated with a second radius of 20 kilometers, and the area outside the inner monitoring area and inside the outer ring is taken as the outer monitoring area. All monitoring stations in the inner monitoring area are connected to the target monitoring point, and the inner monitoring angle formed at the target monitoring point by each pair of adjacent monitoring stations is obtained; the bisector of the smallest of these inner monitoring angles is taken as the starting axis, and every 45 degrees forms one inner sector area, giving 8 inner sector areas. Likewise, all monitoring stations in the outer monitoring area are connected to the target monitoring point, and the outer monitoring angle formed at the target monitoring point by each pair of adjacent monitoring stations is obtained; the bisector of the smallest of these outer monitoring angles is taken as the starting axis, and every 45 degrees forms one outer sector area, giving 8 outer sector areas.
In this way, as many sector areas as possible contain a monitoring station, which reduces the use of virtual monitoring stations and improves accuracy.
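As a non-limiting illustration, assigning a monitoring station to a ring and a 45-degree sector can be sketched as follows. The sketch assumes station positions are given as planar offsets in kilometers from the target station and that the starting axis has already been determined (the angle-bisector step is not shown); `assign_sector` and its parameters are hypothetical names.

```python
import math

def assign_sector(dx_km, dy_km, start_axis_deg=0.0, inner_radius_km=5.0,
                  outer_radius_km=20.0, sector_width_deg=45.0):
    """Return (ring, sector index 0..7) for a station, or None beyond the outer ring."""
    distance = math.hypot(dx_km, dy_km)
    if distance > outer_radius_km:
        return None                                   # outside both monitoring areas
    ring = "inner" if distance <= inner_radius_km else "outer"
    bearing = math.degrees(math.atan2(dy_km, dx_km)) % 360.0
    # Angle measured from the starting axis, then cut into 45-degree sectors.
    sector = int(((bearing - start_axis_deg) % 360.0) // sector_width_deg)
    return ring, sector

near_station = assign_sector(3.0, 0.0)        # 3 km from the target station
far_station = assign_sector(1.0, 10.0)        # about 10 km from the target station
remote_station = assign_sector(30.0, 0.0)     # beyond the 20 km outer ring
```

Since the inner and outer areas each have their own starting axis, the function would be called with a different `start_axis_deg` for each ring.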
Then each sector area is examined. If one or more monitoring stations exist in an area, weights are assigned to the recorded data of each monitoring station in the area according to its distance from the target monitoring station, and a regression operation is performed to obtain the average monitoring data of the area. If an area has no monitoring station, a virtual monitoring station is generated at the center of the area, and its data is interpolated using a classical spatial interpolation method, inverse distance weighting (IDW).
The key point of this method is to designate one feature as the main feature and the other features as auxiliary features. The main features are the historical atmospheric quality index and the historical meteorological monitoring data of the target monitoring station, which come from the same monitoring station as the prediction target data; the auxiliary features are the corresponding data from the monitoring stations in the 16 surrounding sectors.
Spatial aggregation algorithm: when the space is partitioned, the uneven geographic distribution of monitoring stations and other limiting factors may leave several monitoring stations in one area, so that the data is excessive and redundant. Weights are therefore assigned to the recorded data of each monitoring station in the area and a regression operation is performed to obtain the average monitoring data of the area, calculated using the following formula:
where y is the average monitoring data of the area and W denotes the weight values, whose size is determined by the distance between each monitoring point in the area and the target monitoring point.
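As a non-limiting illustration, the distance-based aggregation of several stations in one sector can be sketched as follows. The description states only that the weights depend on each station's distance from the target monitoring point; the use of normalized inverse-distance weights here is an assumption of the sketch, and the readings are made up.

```python
import numpy as np

def aggregate_sector(values, distances_km):
    """Distance-weighted average of the station readings in one sector."""
    values = np.asarray(values, dtype=float)
    # Assumption: closer stations get larger weights, proportional to 1/distance.
    weights = 1.0 / np.asarray(distances_km, dtype=float)
    weights /= weights.sum()                 # normalise so the weights sum to 1
    return float(np.dot(weights, values))

# Hypothetical sector with two stations, 1 km and 4 km from the target station.
sector_value = aggregate_sector([50.0, 100.0], [1.0, 4.0])
```

Here the nearer station receives weight 0.8 and the farther one 0.2, so the sector's aggregated reading is 60.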
Spatial interpolation algorithm: when the space is partitioned, some of the regions obtained around remote target monitoring stations contain no monitoring station. A virtual monitoring station is generated in such a region to fill in its missing values, and the data of the virtual monitoring station is generated from the captured data of the monitoring stations in the surrounding regions. Inverse distance weighting is used, which computes the value assigned to an unknown point as a linearly weighted combination of the available values at known points, using the following formula:
where Z(x, y) is the interpolated output, (x, y) are the coordinates of the interpolation point, $(x_i, y_i)$ are the coordinates of a discrete point, and $w_i$ is the weight of that discrete point.
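As a non-limiting illustration, inverse distance weighting for a virtual monitoring station can be sketched as follows. The weight form 1/d^power with `power=2`, the normalization by the sum of the weights, and the handling of a query point that coincides with a known station are assumptions of this sketch; the coordinates and readings are made up.

```python
import numpy as np

def idw_interpolate(x, y, known_xy, known_values, power=2.0):
    """Estimate the value at (x, y) from known points by inverse distance weighting."""
    known_xy = np.asarray(known_xy, dtype=float)
    known_values = np.asarray(known_values, dtype=float)
    dist = np.hypot(known_xy[:, 0] - x, known_xy[:, 1] - y)
    if np.any(dist == 0.0):
        return float(known_values[dist == 0.0][0])   # query point is a known station
    w = 1.0 / dist ** power                          # nearer points weigh more
    return float(np.sum(w * known_values) / np.sum(w))

# Hypothetical virtual station at the origin with real stations 1 km and 2 km away.
virtual_value = idw_interpolate(0.0, 0.0, [(1.0, 0.0), (2.0, 0.0)], [10.0, 20.0])
```

With these inputs the weights are 1 and 0.25, so the virtual station's interpolated reading is 12.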
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.
Claims (9)
1. An air quality analysis and prediction method based on deep learning and Bayesian model is characterized by comprising the following steps:
step S1: acquiring AQI data of a target monitoring point;
step S2: preprocessing the AQI data, identifying abnormal values in the data sequence according to the Pauta criterion (3σ rule), removing the abnormal values, and filling in data missing at a given moment by linear interpolation;
step S3: carrying out normalization processing on the AQI data;
step S4: respectively constructing a deep learning convolutional network model, a recurrent neural network model and a Bayesian dynamic linear model;
step S5: respectively inputting the normalized AQI data into the deep learning convolutional network model and the Bayesian dynamic linear model, wherein the deep learning convolutional network model, after operating, converts the long input sequence into a short sequence of high-level features, and the Bayesian dynamic linear model, after operating, outputs first predicted AQI data;
step S6: inputting the sequence of features extracted by the deep learning convolutional network model into the recurrent neural network model, and outputting second predicted AQI data after the recurrent neural network model operates;
step S7: constructing a mixed model, inputting the first predicted AQI data and the second predicted AQI data into the mixed model, and outputting final predicted AQI data after the mixed model operates.
2. The air quality analysis and prediction method based on the deep learning and Bayesian model as recited in claim 1, wherein: the normalization processing in step S3 reduces the influence of different orders of magnitude or different dimensions on the data by keeping the value range of the data within a relatively small fluctuation range; the feature distribution is taken to be a normal distribution, and the feature is mapped to the standard normal distribution using the mean and the standard deviation, with the calculation formula:
$$y' = \frac{y - y_{mean}}{y_{std}}$$
where y is a sample value, y' is its normalized value, $y_{mean}$ is the mean of all sample data, and $y_{std}$ is the standard deviation of all sample data.
3. The air quality analysis and prediction method based on the deep learning and Bayesian model as recited in claim 1, wherein: the step S4 specifically includes:
step S41, selecting training data and test data from the AQI data according to the constructed models, and completing initialization of the deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model;
step S42, training the deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model using the training data;
step S43, obtaining a test prediction result from the test data using the trained deep learning convolutional network model, the trained recurrent neural network model and the trained Bayesian dynamic linear model;
and step S44, predicting using the trained deep learning convolutional network model, the trained recurrent neural network model and the trained Bayesian dynamic linear model.
4. The air quality analysis and prediction method based on the deep learning and Bayesian model as recited in claim 3, wherein: in step S5, the Bayesian dynamic linear model includes an observation equation, a state equation and initial information; the predictive distribution is regarded as a conditional probability distribution and is obtained from the prior information, the posterior information is obtained using the Bayes formula, and the prior information is corrected to obtain the predicted value.
6. The air quality analysis and prediction method based on the deep learning and Bayesian model as recited in claim 3, wherein: the recurrent neural network model further comprises an Adam algorithm and a Dropout algorithm;
the Adam algorithm is used for calculating a first moment estimation and a second moment estimation of the gradient to design independent adaptive learning rates for different parameters;
the Dropout algorithm is used to reduce the dependency between features, reducing the probability of over-fitting occurring.
7. The air quality analysis and prediction method based on the deep learning and Bayesian model as recited in claim 1, wherein: step S8, obtaining MEO data;
step S9, carrying out correlation analysis based on the MEO data and the AQI data;
step S10, carrying out backward trajectory and potential source contribution analysis based on the MEO data and the AQI data;
and step S11, combining the correlation analysis result and the backward trajectory and potential source contribution analysis result with the final predicted AQI data to obtain comprehensive improvement suggestions.
8. The air quality analysis and prediction method based on deep learning and Bayesian model as claimed in claim 7, wherein: the correlation analysis in step S9 specifically includes: taking PM2.5 and PM10 as the first variables and weather, temperature, air pressure, humidity, wind speed and wind direction as the second variables, and applying the following formula:
$$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \sum_{i}(y_i - \bar{y})^2}}$$
where $x_i$ and $y_i$ are the two variables whose correlation is compared, $\bar{x}$ is the mean of $x_i$, $\bar{y}$ is the mean of $y_i$, and r is the Spearman correlation coefficient; r is +1 or -1 when the two variables are perfectly monotonically correlated, and r is 0 when the two variables are uncorrelated.
9. The air quality analysis and prediction method based on deep learning and Bayesian model as claimed in claim 7, wherein: the backward trajectory and potential source contribution analysis in step S10 specifically includes: dividing the research area into i × j grid cells by longitude and latitude, where the PSCF calculation formula is:
$$PSCF_{ij} = \frac{m_{ij}}{n_{ij}}$$
where $n_{ij}$ is the number of all airflow trajectories passing through grid cell (i, j), and $m_{ij}$ is the number of polluted trajectories passing through grid cell (i, j).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110282474.XA CN112884243A (en) | 2021-03-16 | 2021-03-16 | Air quality analysis and prediction method based on deep learning and Bayesian model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112884243A true CN112884243A (en) | 2021-06-01 |
Family
ID=76042656
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110282474.XA Pending CN112884243A (en) | 2021-03-16 | 2021-03-16 | Air quality analysis and prediction method based on deep learning and Bayesian model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112884243A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107909206A (en) * | 2017-11-15 | 2018-04-13 | 电子科技大学 | A kind of PM2.5 Forecasting Methodologies based on deep structure Recognition with Recurrent Neural Network |
CN108268935A (en) * | 2018-01-11 | 2018-07-10 | 浙江工业大学 | A kind of PM2.5 concentration values Forecasting Methodology and system based on sequential Recognition with Recurrent Neural Network |
CN108898261A (en) * | 2018-07-24 | 2018-11-27 | 深圳市源广浩电子有限公司 | A kind of air quality monitoring method and system based on environmentally friendly big data |
CN109492830A (en) * | 2018-12-17 | 2019-03-19 | 杭州电子科技大学 | A kind of mobile pollution source concentration of emission prediction technique based on space-time deep learning |
CN110161183A (en) * | 2019-05-30 | 2019-08-23 | 广东柯内特环境科技有限公司 | A kind of air quality monitoring method |
CN111798051A (en) * | 2020-07-02 | 2020-10-20 | 杭州电子科技大学 | Air quality space-time prediction method based on long-short term memory neural network |
CN112115004A (en) * | 2020-07-29 | 2020-12-22 | 西安交通大学 | Hard disk service life prediction method based on back propagation Bayes deep learning |
Non-Patent Citations (1)
Title |
---|
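Kong Shanshan et al.: "Discussion of the source distribution and transport characteristics of PM2.5 in Beijing based on the backward trajectory model", 中国环境管理 (Chinese Journal of Environmental Management), no. 1, 6 April 2017 (2017-04-06), pages 86 - 90 * |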
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220316734A1 (en) * | 2021-04-14 | 2022-10-06 | Jiangnan University | Deep Spatial-Temporal Similarity Method for Air Quality Prediction |
US11512864B2 (en) * | 2021-04-14 | 2022-11-29 | Jiangnan University | Deep spatial-temporal similarity method for air quality prediction |
CN113988400B (en) * | 2021-10-22 | 2024-04-09 | 重庆工商大学 | AQI index prediction method based on PEARSON-LSTM multi-step fusion network |
CN113988400A (en) * | 2021-10-22 | 2022-01-28 | 重庆工商大学 | AQI index prediction method based on PEARSON-LSTM multi-step fusion network |
CN114219345A (en) * | 2021-12-24 | 2022-03-22 | 武汉工程大学 | Secondary air quality prediction optimization method based on data mining |
CN114757296A (en) * | 2022-04-29 | 2022-07-15 | 广东技术师范大学 | Cooperative data-based pollutant analysis method and device |
CN114757296B (en) * | 2022-04-29 | 2022-12-13 | 广东技术师范大学 | Cooperative data-based pollutant analysis method and device |
CN115512849A (en) * | 2022-09-15 | 2022-12-23 | 北京理工大学 | Low-oxygen closed-loop intervention system of plateau pre-learning clothes |
CN115878695B (en) * | 2023-02-20 | 2023-05-19 | 中国民用航空局空中交通管理局航空气象中心 | Data visualization adjustment method and system based on meteorological database |
CN115878695A (en) * | 2023-02-20 | 2023-03-31 | 中国民用航空局空中交通管理局航空气象中心 | Data visualization adjusting method and system based on meteorological database |
CN116451853A (en) * | 2023-04-06 | 2023-07-18 | 湖南工商大学 | Atmospheric quality monitoring method, system, electronic equipment and storage medium |
CN116451853B (en) * | 2023-04-06 | 2023-12-15 | 湖南工商大学 | Atmospheric quality monitoring method, system, electronic equipment and storage medium |
CN116304913A (en) * | 2023-04-07 | 2023-06-23 | 中国长江三峡集团有限公司 | Water quality state monitoring method and device based on Bayesian model and electronic equipment |
CN117909931A (en) * | 2024-01-19 | 2024-04-19 | 江苏智伦数字技术研究有限公司 | Air quality deducing method, terminal and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||