CN112884243A - Air quality analysis and prediction method based on deep learning and Bayesian model - Google Patents


Info

Publication number
CN112884243A
Authority
CN
China
Prior art keywords
data
model
deep learning
prediction
network model
Prior art date
Legal status
Pending
Application number
CN202110282474.XA
Other languages
Chinese (zh)
Inventor
富众杰
林海平
黃炳强
Current Assignee
Hangzhou Vocational and Technical College
Original Assignee
Hangzhou Vocational and Technical College
Priority date
Filing date
Publication date
Application filed by Hangzhou Vocational and Technical College
Priority to CN202110282474.XA
Publication of CN112884243A
Legal status: Pending (current)


Classifications

    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Mathematical Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Mathematics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an air quality analysis and prediction method based on deep learning and a Bayesian model. The technical scheme comprises: obtaining AQI data of a target monitoring point; preprocessing and normalizing the AQI data; constructing a deep learning convolutional network model, a recurrent neural network model and a Bayesian dynamic linear model; inputting the AQI data into the deep learning convolutional network model and into the Bayesian dynamic linear model respectively, the Bayesian dynamic linear model outputting first predicted AQI data; inputting the features extracted by the deep learning convolutional network model into the recurrent neural network model, which outputs second predicted AQI data; and inputting the first and second predicted AQI data into a hybrid model, which outputs the final predicted AQI data.

Description

Air quality analysis and prediction method based on deep learning and Bayesian model
Technical Field
The invention relates to a method for predicting atmospheric pollutant concentrations, in particular to an air quality analysis and prediction method based on deep learning and a Bayesian model.
Background
Air quality has received continuous attention in recent years, and China has added a large number of monitoring stations to monitor local air quality and meteorological data. The air quality data measured by a monitoring station comprise six factors: particulate matter (PM2.5 and PM10) and gaseous pollutants (NO2, CO, O3 and SO2); these data are collectively called AQI data. In addition, monitoring points can acquire meteorological data for the area, such as weather, temperature, pressure, humidity, wind direction and wind speed, which are collectively called MEO data.
Because meteorological and environmental factors are complex, predicting atmospheric pollutant concentrations has always been a difficult problem. Commonly used prediction methods include mechanistic prediction based on atmospheric chemical transport models and statistical prediction based on machine learning models. The former is widely applied in engineering practice, but because the atmosphere is a highly complex system that is difficult to describe theoretically and to quantify fully, mechanistic forecasting suffers from large errors.
At present, the forecasts of weather conditions and pollutant concentrations issued by the national meteorological authorities are produced by running the coupled meteorology-chemistry model WRF-Chem. Because both the numerical model calculations and the emission source inventory data contain errors of varying degrees, the model's predictions of pollutant concentrations are not ideal.
Disclosure of Invention
To address these shortcomings of the prior art, the invention aims to provide an air quality analysis and prediction method based on deep learning and a Bayesian model that can analyze and predict air quality, evaluate improvements in atmospheric conditions, identify pollution sources and provide recommendations for air pollution prevention and control.
In order to achieve the purpose, the invention provides the following technical scheme: an air quality analysis and prediction method based on deep learning and Bayesian model comprises the following steps:
step S1: acquiring AQI data of a target monitoring point;
step S2: preprocessing the AQI data: identifying outliers in the data sequence according to the Pauta (3σ) criterion, removing them, and filling in missing data at any time point by linear interpolation;
step S3: normalizing the AQI data;
step S4: constructing a deep learning convolutional network model, a recurrent neural network model and a Bayesian dynamic linear model;
step S5: inputting the normalized AQI data into the deep learning convolutional network model and into the Bayesian dynamic linear model respectively; the deep learning convolutional network model converts the long input sequence into a short sequence of high-level features, and the Bayesian dynamic linear model outputs first predicted AQI data;
step S6: inputting the sequence of features extracted by the deep learning convolutional network model into the recurrent neural network model, which outputs second predicted AQI data;
step S7: constructing a hybrid model, inputting the first and second predicted AQI data into it, and outputting the final predicted AQI data.
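The preprocessing in step S2 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the data sequence is a plain Python list with `None` marking missing hours, uses the population standard deviation for the 3σ (Pauta) test, and leaves gaps at the very start or end of the series unfilled. The function names `pauta_filter` and `linear_interpolate` are hypothetical.

```python
import math
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing hour


def pauta_filter(series: Series, k: float = 3.0) -> Series:
    """Pauta (3-sigma) criterion: replace values more than k standard deviations
    from the mean with None. Missing values are ignored when computing the stats."""
    observed = [v for v in series if v is not None]
    if len(observed) < 2:
        return list(series)
    mean = sum(observed) / len(observed)
    # Population standard deviation of the observed values.
    std = math.sqrt(sum((v - mean) ** 2 for v in observed) / len(observed))
    return [None if v is not None and abs(v - mean) > k * std else v for v in series]


def linear_interpolate(series: Series) -> Series:
    """Fill interior gaps by straight-line interpolation between the nearest
    known neighbours; gaps at the start or end are left as None."""
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        slope = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + slope * (i - left)
    return out


if __name__ == "__main__":
    hourly = [50.0] * 10 + [500.0] + [50.0] * 10  # one spurious spike
    cleaned = linear_interpolate(pauta_filter(hourly))
    print(cleaned[10])  # 50.0
```

With ten or fewer points no single value can exceed 3σ under the population standard deviation (Samuelson's inequality bounds the largest z-score by the square root of n - 1), so the test needs a reasonably long window to be useful.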
The invention is further configured as follows: the normalization in step S3 keeps the data within a relatively small range, which reduces the influence of differing orders of magnitude and dimensions. Assuming the features are normally distributed, each feature is mapped to the standard normal distribution using its mean and standard deviation:
$$y' = \frac{y - y_{\mathrm{mean}}}{y_{\mathrm{std}}}$$
where $y_{\mathrm{mean}}$ is the mean of all sample data and $y_{\mathrm{std}}$ is the standard deviation of all sample data.
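A minimal z-score sketch for step S3, assuming the statistics are computed from the training portion only and then reused both for the test data and for converting normalized predictions back to AQI units. The names `fit_zscore`, `zscore` and `inverse_zscore` are hypothetical.

```python
import statistics
from typing import List, Tuple


def fit_zscore(train: List[float]) -> Tuple[float, float]:
    """Return (y_mean, y_std) computed from the training data only."""
    mean = statistics.fmean(train)
    std = statistics.pstdev(train)
    if std == 0:
        raise ValueError("a constant series cannot be standardized")
    return mean, std


def zscore(values: List[float], mean: float, std: float) -> List[float]:
    """y' = (y - y_mean) / y_std"""
    return [(v - mean) / std for v in values]


def inverse_zscore(scaled: List[float], mean: float, std: float) -> List[float]:
    """Map normalized predictions back to AQI units: y = y' * y_std + y_mean"""
    return [z * std + mean for z in scaled]


if __name__ == "__main__":
    train = [35.0, 42.0, 58.0, 61.0, 74.0]
    mean, std = fit_zscore(train)
    z = zscore(train, mean, std)
    print([round(v, 3) for v in z])
    print(inverse_zscore(z, mean, std))
```

Fitting the mean and standard deviation on the training period alone avoids leaking information from the test period into the model.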
The invention is further configured as follows: step S4 specifically includes:
step S41: selecting training data and test data from the AQI data according to the model being constructed, and initializing the deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model;
step S42: training the deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model with the training data;
step S43: obtaining test prediction results from the test data with the trained deep learning convolutional network model, recurrent neural network model and Bayesian dynamic linear model;
step S44: making predictions with the trained deep learning convolutional network model, recurrent neural network model and Bayesian dynamic linear model.
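Step S41 does not say how training and test data are selected. One common choice for time series, shown here as an assumption rather than the patent's procedure, is to turn the normalized series into (window, next value) pairs and split them chronologically so that the test set lies strictly after the training set. `make_windows` and `chronological_split` are hypothetical helpers, and the window length and 80/20 ratio are illustrative.

```python
from typing import List, Tuple

Window = List[float]
Pairs = Tuple[List[Window], List[float]]


def make_windows(series: List[float], window: int, horizon: int = 1) -> Pairs:
    """Pair each run of `window` consecutive values with the value `horizon` steps after it."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        y.append(series[start + window + horizon - 1])
    return X, y


def chronological_split(X: List[Window], y: List[float],
                        train_ratio: float = 0.8) -> Tuple[Pairs, Pairs]:
    """Earlier samples go to training, later ones to testing (no shuffling)."""
    cut = int(len(X) * train_ratio)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


if __name__ == "__main__":
    hourly = [float(h) for h in range(48)]          # stand-in for normalized AQI
    X, y = make_windows(hourly, window=24)          # predict hour t from the previous 24
    (train_X, train_y), (test_X, test_y) = chronological_split(X, y)
    print(len(train_X), len(test_X))
```

Shuffling before splitting would let the model see hours that come after the test hours, which overstates accuracy for forecasting.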
The invention is further configured as follows: in step S5, the Bayesian dynamic linear model comprises an observation equation, a state equation and initial information. The predictive distribution is treated as a conditional probability distribution and derived from the prior information; the posterior is then obtained with Bayes' formula, and the prior is corrected accordingly to yield the predicted value.
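The simplest member of the Bayesian dynamic linear model family is the local level model, sketched below to show how the observation equation, state equation and initial information combine into prior, predictive and posterior distributions. It assumes the observation variance V and evolution variance W are known constants; the patent does not state which DLM form or variances it uses, and `LocalLevelDLM` is a hypothetical class. Expert experience would enter through the initial information (the starting values of `m` and `C`).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LocalLevelDLM:
    """Local level DLM (West & Harrison notation), all variances assumed known:
       observation equation: y_t  = mu_t + v_t,       v_t ~ N(0, V)
       state equation:       mu_t = mu_{t-1} + w_t,   w_t ~ N(0, W)
       initial information:  mu_0 | D_0 ~ N(m, C)"""
    V: float  # observation noise variance
    W: float  # state evolution variance
    m: float  # current posterior mean of the level (starts at m0)
    C: float  # current posterior variance of the level (starts at C0)

    def forecast(self) -> Tuple[float, float]:
        """One-step-ahead predictive distribution y_t | D_{t-1} ~ N(f, Q)."""
        a, R = self.m, self.C + self.W  # prior for mu_t: N(a, R)
        return a, R + self.V

    def update(self, y: float) -> Tuple[float, float]:
        """Return the forecast made before seeing y, then apply Bayes' rule."""
        f, Q = self.forecast()
        R = self.C + self.W
        A = R / Q                       # adaptive coefficient (Kalman gain)
        self.m = self.m + A * (y - f)   # posterior mean corrects the prior mean
        self.C = A * self.V             # posterior variance, equal to R - A^2 * Q
        return f, Q


def one_step_forecasts(model: LocalLevelDLM, ys: List[float]) -> List[float]:
    """Predictive means produced before each observation is assimilated."""
    return [model.update(y)[0] for y in ys]


if __name__ == "__main__":
    dlm = LocalLevelDLM(V=25.0, W=4.0, m=60.0, C=100.0)  # illustrative values
    print(one_step_forecasts(dlm, [62.0, 65.0, 71.0, 68.0]))
    print(dlm.forecast())  # mean and variance for the next hour
```

A larger W relative to V makes the level react faster to new observations; W = 0 reduces the model to estimating a constant level.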
The invention is further configured as follows: for the recurrent neural network model, the loss function used in the training phase is the mean squared error:
$$L = \frac{1}{n}\sum_{i=1}^{n}\left(a_i - y_i\right)^2$$
where $a_i$ is the predicted value, $y_i$ is the sample value and $n$ is the number of samples.
The invention is further configured as follows: the recurrent neural network model further employs the Adam algorithm and the Dropout algorithm;
the Adam algorithm computes first-moment and second-moment estimates of the gradient to assign an independent adaptive learning rate to each parameter;
the Dropout algorithm reduces co-dependence between features, lowering the probability of overfitting.
The invention is further configured to include: step S8, obtaining MEO data;
step S9, performing correlation analysis on the MEO data and the AQI data;
step S10, performing backward trajectory and potential source contribution analysis based on the MEO data and the AQI data;
step S11, combining the correlation analysis results and the backward trajectory and potential source contribution results with the final predicted AQI data to derive comprehensive improvement recommendations.
The invention is further configured as follows: the correlation analysis in step S9 takes PM2.5 and PM10 as the first variables and weather, temperature, air pressure, humidity, wind speed and wind direction as the second variables, and uses the following formula:
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
where $x_i$ and $y_i$ are the two variables whose correlation is compared (for the Spearman coefficient, the ranks of the observations), $\bar{x}$ is the mean of $x_i$, $\bar{y}$ is the mean of $y_i$, and $r$ is the Spearman correlation coefficient: $r$ is +1 or -1 when the two variables are perfectly monotonically related, and $r$ is 0 when they have no monotonic correlation.
The invention is further configured as follows: the backward trajectory and potential source contribution analysis in step S10 divides the study area into an i × j grid by longitude and latitude, and PSCF is calculated as:
$$\mathrm{PSCF}_{ij} = \frac{m_{ij}}{n_{ij}}$$
where $n_{ij}$ is the number of all airflow trajectories passing through grid cell (i, j) and $m_{ij}$ is the number of polluted trajectories passing through grid cell (i, j).
In conclusion, the invention has the following beneficial effects. Historical monitoring data for the air quality data AQI (PM2.5, PM10, NO2, CO, O3, SO2) are obtained. Taking into account the time-series nature of air quality data, outliers in the data sequence are identified with the Pauta criterion and removed, and missing data at any time point are filled in by linear interpolation. Before modeling, the different features are mapped onto the same scale by normalization, and then the deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model are constructed.
The deep learning convolutional neural network (CNN) is used for feature extraction. Air quality data are multidimensional, which makes feature extraction difficult. A CNN extracts local features through convolution kernels whose weights are shared, which avoids the excessive number of parameters of a fully connected artificial neural network and gives the CNN strong feature extraction capability. It can convert a long input sequence into a short sequence of high-level features, and this feature sequence is used as the input to the recurrent neural network, namely a long short-term memory (LSTM) network.
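The sequence-shortening effect of the convolutional feature extractor can be illustrated without a deep learning framework. The sketch below is a hypothetical stand-in for one trained CNN layer: it slides each shared kernel over a 1-D input (valid cross-correlation, as deep learning libraries compute it), applies ReLU and then max-pools, so a 24-step input with kernel width 3 and pool size 2 becomes 11 feature vectors. The kernel values are arbitrary here; in the method they would be learned.

```python
from typing import List

Vector = List[float]


def conv1d(x: Vector, kernel: Vector, bias: float = 0.0) -> Vector:
    """'Valid' 1-D cross-correlation with a single shared kernel."""
    k = len(kernel)
    return [sum(w * x[i + j] for j, w in enumerate(kernel)) + bias
            for i in range(len(x) - k + 1)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def max_pool1d(x: Vector, size: int) -> Vector:
    """Non-overlapping max pooling; an incomplete trailing window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def extract_features(seq: Vector, kernels: List[Vector], pool: int) -> List[Vector]:
    """One feature map per kernel, regrouped into a shorter sequence of
    feature vectors (time steps x channels) suitable as recurrent-network input."""
    maps = [max_pool1d(relu(conv1d(seq, k)), pool) for k in kernels]
    return [list(step) for step in zip(*maps)]


if __name__ == "__main__":
    hourly_aqi = [float(50 + (h % 6)) for h in range(24)]  # toy input
    feats = extract_features(hourly_aqi, [[0.5, 0.5, 0.0], [-1.0, 0.0, 1.0]], pool=2)
    print(len(hourly_aqi), "->", len(feats))  # 24 -> 11
```

Because every kernel is reused at every time step, the parameter count depends only on kernel width and channel count, not on sequence length.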
The recurrent neural network model (LSTM) is used as the prediction model. Because air pollutant concentrations are strongly correlated over time, the LSTM, which is designed to retain information over long spans, is well suited to the problem. The LSTM improves on the ordinary RNN and mitigates the vanishing-gradient problem during training: its structure contains a set of interconnected memory blocks that replace the hidden units of an ordinary RNN. The LSTM is easier to train than an ordinary RNN and has achieved good results in many fields.
The LSTM input consists of hourly features, namely the AQI and the six pollutant indices at each time step, and the output is a single neuron that predicts the AQI.
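A single forward pass of an LSTM cell, written out in plain Python to show how the forget, input and output gates update the memory cell for each hourly input vector (for example the AQI plus the six pollutant indices, i.e. 7 features). This is an untrained, hypothetical `LSTMCell` with randomly initialized weights; in practice a framework layer such as PyTorch's `torch.nn.LSTM` or Keras's `LSTM` would be used and trained with the MSE loss, Adam and Dropout described in this document.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: List[Vector], b: Vector, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


class LSTMCell:
    """Untrained single-layer LSTM cell (hypothetical, for illustration)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        n_cat = n_in + n_hidden

        def weights() -> List[Vector]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(n_cat)] for _ in range(n_hidden)]

        self.n_hidden = n_hidden
        self.Wf, self.Wi, self.Wo, self.Wg = weights(), weights(), weights(), weights()
        self.bf = [1.0] * n_hidden  # forget-gate bias of 1 is a common initialization
        self.bi = [0.0] * n_hidden
        self.bo = [0.0] * n_hidden
        self.bg = [0.0] * n_hidden

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # concatenation [x_t, h_{t-1}]
        f = [sigmoid(v) for v in affine(self.Wf, self.bf, z)]    # forget gate
        i = [sigmoid(v) for v in affine(self.Wi, self.bi, z)]    # input gate
        o = [sigmoid(v) for v in affine(self.Wo, self.bo, z)]    # output gate
        g = [math.tanh(v) for v in affine(self.Wg, self.bg, z)]  # candidate memory
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def predict_aqi(cell: LSTMCell, sequence: List[Vector], w_out: Vector, b_out: float) -> float:
    """Run the cell over the hourly sequence and map the last hidden state to one AQI value."""
    h = [0.0] * cell.n_hidden
    c = [0.0] * cell.n_hidden
    for x in sequence:
        h, c = cell.step(x, h, c)
    return sum(w * v for w, v in zip(w_out, h)) + b_out


if __name__ == "__main__":
    cell = LSTMCell(n_in=7, n_hidden=4)
    day = [[0.1] * 7 for _ in range(24)]  # 24 hours x 7 normalized features
    print(predict_aqi(cell, day, [0.5] * 4, 0.0))
```

The additive cell update `f * c + i * g` is what lets gradients flow across many hours without vanishing as quickly as in an ordinary RNN.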
Bayesian dynamic linear model (DLM): Bayesian forecasting is a prediction method developed to meet the need to forecast sudden events. Besides predicting from historical measurements according to the model, it also incorporates expert experience and subjective judgment, which makes it particularly useful for forecasting sudden events.
The basic idea of Bayesian forecasting is to establish a dynamic model, treat the predictive distribution as a conditional probability distribution derived from the prior information, obtain the posterior with Bayes' formula, and correct the prior accordingly to obtain the predicted value. The Bayesian dynamic linear model consists of an observation equation, a state equation and initial information.
Hybrid model: once the model framework is built, a hybrid LSTM + Bayesian DLM model is constructed. The LSTM model takes historical AQI data and the six pollutant indices as input and outputs a predicted AQI; the Bayesian dynamic linear model takes historical AQI data and empirical information as input and outputs a predicted AQI. The AQI outputs of the two prediction models are fused into a new prediction, which diversifies the features used and gives the model stronger learning ability and higher prediction accuracy.
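The document does not specify how the two AQI predictions are fused. One simple, assumed option is a weighted average whose weights are inversely proportional to each model's mean squared error over a held-out validation period; `inverse_mse_weights` and `fuse` are hypothetical helpers. A learned combiner, such as a small regression on the two predictions, would be another option.

```python
from typing import List, Tuple


def mse(true: List[float], pred: List[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true)


def inverse_mse_weights(val_true: List[float], val_lstm: List[float],
                        val_dlm: List[float], eps: float = 1e-12) -> Tuple[float, float]:
    """Weight each model in inverse proportion to its validation MSE (weights sum to 1)."""
    w_lstm = 1.0 / (mse(val_true, val_lstm) + eps)
    w_dlm = 1.0 / (mse(val_true, val_dlm) + eps)
    total = w_lstm + w_dlm
    return w_lstm / total, w_dlm / total


def fuse(pred_lstm: List[float], pred_dlm: List[float],
         weights: Tuple[float, float]) -> List[float]:
    """Final AQI prediction as a weighted average of the two model outputs."""
    w_lstm, w_dlm = weights
    return [w_lstm * a + w_dlm * b for a, b in zip(pred_lstm, pred_dlm)]


if __name__ == "__main__":
    weights = inverse_mse_weights([80.0, 95.0, 110.0], [82.0, 93.0, 112.0], [70.0, 99.0, 120.0])
    print(weights, fuse([101.0], [96.0], weights))
```

The `eps` term only guards against division by zero when a model fits the validation period perfectly.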
Drawings
Fig. 1 is a schematic block diagram of an air quality analysis prediction method.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. In which like parts are designated by like reference numerals. It should be noted that the terms "front," "back," "left," "right," "upper" and "lower" used in the following description refer to directions in the drawings, and the terms "bottom" and "top," "inner" and "outer" refer to directions toward and away from, respectively, the geometric center of a particular component.
The first embodiment is as follows: referring to fig. 1, in order to achieve the above object, the present invention provides the following technical solutions: an air quality analysis and prediction method based on deep learning and Bayesian model comprises the following steps:
step S1: acquiring AQI data of a target monitoring point;
step S2: preprocessing the AQI data: identifying outliers in the data sequence according to the Pauta (3σ) criterion, removing them, and filling in missing data at any time point by linear interpolation;
step S3: normalizing the AQI data;
step S4: constructing a deep learning convolutional network model, a recurrent neural network model and a Bayesian dynamic linear model;
step S5: inputting the normalized AQI data into the deep learning convolutional network model and into the Bayesian dynamic linear model respectively; after operation, the deep learning convolutional network model converts the long input sequence into a short sequence of high-level features, and the Bayesian dynamic linear model outputs first predicted AQI data;
step S6: inputting the sequence of features extracted by the deep learning convolutional network model into the recurrent neural network model, which outputs second predicted AQI data;
step S7: constructing a hybrid model, inputting the first and second predicted AQI data into it, and outputting the final predicted AQI data.
The design of the invention is as follows. Historical monitoring data for the air quality data AQI (PM2.5, PM10, NO2, CO, O3, SO2) are obtained. Taking into account the time-series nature of air quality data, outliers in the data sequence are identified with the Pauta criterion and removed, and missing data at any time point are filled in by linear interpolation. Before modeling, the different features are mapped onto the same scale by normalization, and then the deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model are constructed.
The deep learning convolutional neural network (CNN) is used for feature extraction. Air quality data are multidimensional, which makes feature extraction difficult. A CNN extracts local features through convolution kernels whose weights are shared, which avoids the excessive number of parameters of a fully connected artificial neural network and gives the CNN strong feature extraction capability. It can convert a long input sequence into a short sequence of high-level features, and this feature sequence is used as the input to the recurrent neural network, namely a long short-term memory (LSTM) network.
The recurrent neural network model (LSTM) is used as the prediction model. Because air pollutant concentrations are strongly correlated over time, the LSTM, which is designed to retain information over long spans, is well suited to the problem. The LSTM improves on the ordinary RNN and mitigates the vanishing-gradient problem during training: its structure contains a set of interconnected memory blocks that replace the hidden units of an ordinary RNN. The LSTM is easier to train than an ordinary RNN and has achieved good results in many fields.
The LSTM input consists of hourly features, namely the AQI and the six pollutant indices at each time step, and the output is a single neuron that predicts the AQI.
Bayesian dynamic linear model (DLM): Bayesian forecasting is a prediction method developed to meet the need to forecast sudden events. Besides predicting from historical measurements according to the model, it also incorporates expert experience and subjective judgment, which makes it particularly useful for forecasting sudden events.
The basic idea of Bayesian forecasting is to establish a dynamic model, treat the predictive distribution as a conditional probability distribution derived from the prior information, obtain the posterior with Bayes' formula, and correct the prior accordingly to obtain the predicted value. The Bayesian dynamic linear model consists of an observation equation, a state equation and initial information.
Hybrid model: once the model framework is built, a hybrid LSTM + Bayesian DLM model is constructed. The LSTM model takes historical AQI data and the six pollutant indices as input and outputs a predicted AQI; the Bayesian dynamic linear model takes historical AQI data and empirical information as input and outputs a predicted AQI. The AQI outputs of the two prediction models are fused into a new prediction, which diversifies the features used and gives the model stronger learning ability and higher prediction accuracy.
The normalization in step S3 keeps the data within a relatively small range, which reduces the influence of differing orders of magnitude and dimensions. Assuming the features are normally distributed, each feature is mapped to the standard normal distribution using its mean and standard deviation:
$$y' = \frac{y - y_{\mathrm{mean}}}{y_{\mathrm{std}}}$$
where $y_{\mathrm{mean}}$ is the mean of all sample data and $y_{\mathrm{std}}$ is the standard deviation of all sample data.
Step S4 specifically includes:
step S41: selecting training data and test data from the AQI data according to the model being constructed, and initializing the deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model;
step S42: training the deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model with the training data;
step S43: obtaining test prediction results from the test data with the trained deep learning convolutional network model, recurrent neural network model and Bayesian dynamic linear model;
step S44: making predictions with the trained deep learning convolutional network model, recurrent neural network model and Bayesian dynamic linear model.
In step S5, the Bayesian dynamic linear model comprises an observation equation, a state equation and initial information. The predictive distribution is treated as a conditional probability distribution and derived from the prior information; the posterior is then obtained with Bayes' formula, and the prior is corrected accordingly to yield the predicted value.
The performance and optimization objective of the LSTM network are defined by a loss function, which measures the discrepancy between the network's predicted values and the true values. Training minimizes this loss, adjusting the network parameters according to how close the predictions are to the true values, to obtain the optimal model. Air quality prediction is a regression problem, so the mean squared error loss is adopted:
$$L = \frac{1}{n}\sum_{i=1}^{n}\left(a_i - y_i\right)^2$$
where $a_i$ is the predicted value, $y_i$ is the sample value and $n$ is the number of samples.
The recurrent neural network model further employs the Adam algorithm and the Dropout algorithm;
the Adam algorithm computes first-moment and second-moment estimates of the gradient and assigns independent adaptive learning rates to different parameters;
Adam combines the advantages of the adaptive gradient algorithm (AdaGrad) and root mean square propagation (RMSProp). Like RMSProp, it scales each parameter's learning rate by a running average of the squared gradients (the second moment); in addition, it uses a running average of the gradients themselves (the first moment) as momentum. Adam copes well with sparse gradients, non-stationary objectives and noisy gradients, is computationally efficient, adapts its step sizes automatically and is suitable for most applications.
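The Adam update can be written in a few lines. The sketch below follows the standard update with bias-corrected first and second moment estimates and the usual defaults β1 = 0.9, β2 = 0.999, ε = 1e-8; as a hypothetical usage it fits a straight line a = w·x + b by minimizing the mean squared error loss. `adam_minimize` and `mse_grad_line` are illustrative names, not part of the patent.

```python
import math
from typing import Callable, List


def adam_minimize(grad: Callable[[List[float]], List[float]], theta: List[float],
                  lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8, steps: int = 1000) -> List[float]:
    """Plain Adam: per-parameter steps scaled by bias-corrected moment estimates."""
    theta = list(theta)
    m = [0.0] * len(theta)  # first moment: running mean of gradients
    v = [0.0] * len(theta)  # second moment: running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad(theta)
        for k in range(len(theta)):
            m[k] = beta1 * m[k] + (1 - beta1) * g[k]
            v[k] = beta2 * v[k] + (1 - beta2) * g[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)  # correct the bias toward zero
            v_hat = v[k] / (1 - beta2 ** t)
            theta[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def mse_grad_line(params: List[float], xs: List[float], ys: List[float]) -> List[float]:
    """Gradient of L = (1/n) * sum((w*x + b - y)^2) with respect to (w, b)."""
    w, b = params
    n = len(xs)
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    return [2.0 / n * sum(r * x for r, x in zip(residuals, xs)),
            2.0 / n * sum(residuals)]


if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # y = 2x + 1
    print(adam_minimize(lambda p: mse_grad_line(p, xs, ys), [0.0, 0.0], lr=0.05, steps=3000))
```

Dividing by the square root of the second moment makes the step size roughly independent of each parameter's gradient scale, which is the "independent adaptive learning rate" described above.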
The Dropout algorithm reduces co-dependence between features, lowering the probability of overfitting.
Dropout effectively mitigates overfitting and improves prediction accuracy. When a complex feedforward neural network is trained on a small sample, the trained model easily overfits. During training, Dropout randomly discards a portion of the network units, temporarily removing them from the training process: during forward propagation, each neuron's activation is set to zero with probability p. This makes the model generalize better, reduces the training load and speeds up training.
After the data are prepared and the model and its parameters are set, the deep learning model is trained and validated repeatedly until a best-fitting model that meets the target is obtained.
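A sketch of Dropout as it is commonly implemented ("inverted" dropout, the variant used by frameworks such as PyTorch's `nn.Dropout`): during training each activation is zeroed with probability p and the survivors are scaled by 1/(1-p), so no rescaling is needed at inference time. The `dropout` function is a hypothetical helper, and the patent does not state the dropout rate it uses.

```python
import random
from typing import List, Optional


def dropout(activations: List[float], p: float, training: bool,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: zero each activation with probability p during training
    and scale the survivors by 1/(1-p); pass values through unchanged otherwise."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


if __name__ == "__main__":
    hidden = [0.4, -1.2, 0.9, 0.3]
    print(dropout(hidden, p=0.5, training=True, rng=random.Random(42)))
    print(dropout(hidden, p=0.5, training=False))  # unchanged at inference
```

Scaling the kept units keeps the expected activation equal between training and inference.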
Step S8, obtaining MEO data;
step S9, performing correlation analysis on the MEO data and the AQI data: the Spearman correlation coefficient is used to analyze the correlation between the monitored meteorological data and the air quality data. Meteorological conditions are among the most important factors governing air quality, affecting the formation, diffusion and transport of air pollutants. The Spearman correlation coefficient is used to analyze the relationship between the AQI, the six air pollutants and the meteorological factors. It evaluates the correlation between two statistical variables using a monotonic function: when the two variables are perfectly monotonically related, the coefficient is +1 or -1; a coefficient of 0 indicates no monotonic correlation.
Step S10, performing backward trajectory and potential source contribution analysis based on the MEO data and the AQI data, to identify the potential source regions affecting pollutant concentrations and the contribution of each source region. A backward trajectory model traces pollutant diffusion and movement paths from meteorological parameters such as temperature, air pressure and wind direction, and is widely used to study pollutant transport paths. The potential source contribution factor (PSCF) method combines backward trajectories with pollutant concentrations to analyze the potential sources and distribution of a specific pollutant. The study area is divided into an i × j grid by longitude and latitude; the number of all airflow trajectories passing through a grid cell (i, j) is recorded as $n_{ij}$, and the number of polluted trajectories passing through the cell is recorded as $m_{ij}$.
Step S11, combining the correlation analysis results and the backward trajectory and potential source contribution results with the final predicted AQI data to derive comprehensive improvement recommendations.
The correlation analysis in step S9 takes PM2.5 and PM10 as the first variables and weather, temperature, air pressure, humidity, wind speed and wind direction as the second variables, and uses the following formula:
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
where $x_i$ and $y_i$ are the two variables whose correlation is compared (for the Spearman coefficient, the ranks of the observations), $\bar{x}$ is the mean of $x_i$, $\bar{y}$ is the mean of $y_i$, and $r$ is the Spearman correlation coefficient: $r$ is +1 or -1 when the two variables are perfectly monotonically related, and $r$ is 0 when they have no monotonic correlation.
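The Spearman coefficient is the Pearson-style formula above applied to ranks. The sketch below assigns tied values the average of their rank positions and then applies that formula; `average_ranks` and `spearman` are hypothetical names. For real analyses, `scipy.stats.spearmanr` computes the same coefficient with the same tie handling.

```python
import math
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: List[float], y: List[float]) -> float:
    """Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    pm25 = [35.0, 80.0, 120.0, 60.0, 150.0]       # toy hourly values
    humidity = [40.0, 55.0, 70.0, 55.0, 85.0]
    print(spearman(pm25, humidity))
```

Because only the ordering matters, any monotonic relationship (for example y = x²) gives r = 1, which is why the coefficient suits nonlinear meteorological effects.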
The backward trajectory and potential source contribution analysis in step S10 divides the study area into an i × j grid by longitude and latitude, and PSCF is calculated as:
$$\mathrm{PSCF}_{ij} = \frac{m_{ij}}{n_{ij}}$$
where $n_{ij}$ is the number of all airflow trajectories passing through grid cell (i, j) and $m_{ij}$ is the number of polluted trajectories passing through grid cell (i, j).
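A minimal PSCF sketch following the definition above: each backward trajectory is counted once in every grid cell it passes through, and it counts as polluted when the concentration at the receptor exceeds a threshold. The threshold choice (for example a concentration standard or a percentile), the grid origin and cell size, and the helper name `pscf` are all assumptions. Many implementations instead count trajectory endpoints per cell and multiply PSCF by a weighting function where $n_{ij}$ is small.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[int, int]  # (latitude index i, longitude index j)


def pscf(trajectories: Iterable[Tuple[List[Tuple[float, float]], float]],
         threshold: float, lat0: float, lon0: float, cell_deg: float) -> Dict[Cell, float]:
    """trajectories: (list of (lat, lon) endpoints, receptor concentration) pairs.
    Returns PSCF_ij = m_ij / n_ij for every grid cell visited by at least one trajectory."""
    n: Dict[Cell, int] = defaultdict(int)  # all trajectories passing through the cell
    m: Dict[Cell, int] = defaultdict(int)  # polluted trajectories passing through the cell
    for endpoints, concentration in trajectories:
        cells = {(int((lat - lat0) // cell_deg), int((lon - lon0) // cell_deg))
                 for lat, lon in endpoints}  # count each trajectory once per cell
        for cell in cells:
            n[cell] += 1
            if concentration > threshold:
                m[cell] += 1
    return {cell: m[cell] / n[cell] for cell in n}


if __name__ == "__main__":
    trajs = [
        ([(30.2, 120.1), (30.8, 119.4), (31.5, 118.6)], 110.0),  # polluted arrival
        ([(30.2, 120.1), (29.6, 121.0)], 40.0),                    # clean arrival
    ]
    print(pscf(trajs, threshold=75.0, lat0=29.0, lon0=118.0, cell_deg=0.5))
```

Cells with PSCF close to 1 are crossed mostly by trajectories that arrive with high concentrations, marking them as likely source areas.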
Example two:
This example studies the spatial correlation among atmospheric pollutants and provides a spatial transformation method. Through spatial partitioning, spatial aggregation and spatial interpolation, the area around the target monitoring station is divided into regions so that each region yields air quality and meteorological data in the same format; the spatially sparse air quality data are thereby converted into uniform, consistent inputs, and features across the spatial data are extracted.
Historical atmospheric quality observation data and meteorological data are collected. From them, the method obtains the set $S = \{S_1, S_2, S_3, \ldots, S_n\}$ of the central monitoring station and the monitoring stations in the areas adjacent to the target area. It also obtains the historical atmospheric quality monitoring data and the historical meteorological monitoring data of each monitoring station. These three are used as the input of a deep learning model to obtain the atmospheric quality data at the central monitoring point of the target area for a future period of time.
Atmospheric pollutants drift over a wide geographic space and are constantly moving and diffusing under the influence of time and terrain. Predicting the atmospheric quality index of the target area over the next 48 hours therefore requires more than the target area's own historical atmospheric quality index and historical meteorological monitoring data. The same two kinds of data from the surrounding region $S = \{S_1, S_2, S_3, \ldots, S_n\}$ must also be considered, taking their spatial correlation into account comprehensively.
1) Diffusivity of atmospheric pollution. Atmospheric pollutants are dispersed across different places and diffuse and migrate over time with the regional geographic environment. Data from neighboring areas therefore provide additional information for prediction.
2) Spatial correlation. Spatial-domain partitioning merges the scattered atmospheric quality data into regions around the target. Nearer regions have finer granularity and farther regions have coarser granularity. In addition, regions at different distances have different effects depending on their distance.
3) Scalability. The method fixes an upper limit on the input size (the number of regions), which reduces complexity compared with conventional spatial aggregation methods. In addition, spatial interpolation overcomes spatial sparsity by filling missing values in the partitioned regions and generating consistent inputs for all monitoring stations. This allows a single model to be trained on data from different stations together, improving model accuracy to a certain extent.
The spatial conversion method proceeds as follows.

First, the target atmospheric quality monitoring station to be predicted is taken as the center. An inner monitoring area is generated with a first radius of 5 km. An outer ring is generated with a second radius of 20 km, and the area outside the inner monitoring area but inside the outer ring is taken as the outer monitoring area.

All monitoring stations in the inner monitoring area are connected to the target monitoring point, giving the inner monitoring angle formed at the target by each pair of adjacent stations. The bisector of the smallest of these inner monitoring angles is taken as the initial axis, and the inner area is divided into 8 inner sectors of 45° each.

Likewise, all monitoring stations in the outer monitoring area are connected to the target monitoring point, giving the outer monitoring angle formed by each pair of adjacent stations. The bisector of the smallest outer monitoring angle is taken as the initial axis, and the outer area is divided into 8 outer sectors of 45° each.
In this way, as many sectors as possible contain a real monitoring station, which reduces the use of virtual monitoring stations and improves accuracy.
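The sector partitioning can be sketched as follows. This minimal illustration makes several assumptions that the source does not state:

- Station positions have already been projected to planar kilometre coordinates centred on the target.
- Angles are measured counter-clockwise from the x-axis.
- When a ring contains fewer than two stations, 0° is used as the initial axis.
- Stations farther than 20 km are ignored.

Because the initial axis bisects the narrowest angular gap, the two closest-spaced stations end up in sectors 7 and 0 rather than sharing a sector when that gap is narrower than 90°. This is the intent stated in the text. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y) in km; projection centred on the target is assumed

INNER_RADIUS_KM = 5.0   # first radius
OUTER_RADIUS_KM = 20.0  # second radius
SECTOR_DEG = 45.0       # 8 sectors per ring


def bearing_deg(origin: Point, p: Point) -> float:
    """Counter-clockwise angle of p as seen from origin, in [0, 360)."""
    return math.degrees(math.atan2(p[1] - origin[1], p[0] - origin[0])) % 360.0


def start_axis(angles: Sequence[float]) -> float:
    """Bisector of the smallest angle between angularly adjacent stations."""
    if len(angles) < 2:
        return 0.0  # assumption: no adjacent pair, so fall back to a fixed axis
    a = sorted(angles)
    best_gap, best_axis = math.inf, 0.0
    for k in range(len(a)):
        lo, hi = a[k], a[(k + 1) % len(a)]
        gap = (hi - lo) % 360.0  # the last pair wraps around through 0 degrees
        if gap < best_gap:
            best_gap, best_axis = gap, (lo + gap / 2.0) % 360.0
    return best_axis


def partition(target: Point, stations: Dict[str, Point]) -> Dict[Tuple[str, int], List[str]]:
    """Assign stations to ('inner' | 'outer', sector 0..7); stations beyond 20 km are dropped."""
    rings: Dict[str, Dict[str, float]] = {"inner": {}, "outer": {}}
    for sid, p in stations.items():
        d = math.dist(target, p)
        if d == 0.0:
            continue  # the target station itself is not a neighbour
        if d <= INNER_RADIUS_KM:
            rings["inner"][sid] = bearing_deg(target, p)
        elif d <= OUTER_RADIUS_KM:
            rings["outer"][sid] = bearing_deg(target, p)
    sectors: Dict[Tuple[str, int], List[str]] = {(r, s): [] for r in rings for s in range(8)}
    for ring, angles in rings.items():
        axis = start_axis(list(angles.values()))
        for sid, theta in angles.items():
            # "% 8" guards against a float remainder that rounds up to exactly 360.
            s = int(((theta - axis) % 360.0) // SECTOR_DEG) % 8
            sectors[(ring, s)].append(sid)
    return sectors
```

Empty entries in the returned mapping are the sectors that need a virtual monitoring station.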
Next, each sector is examined. If one or more monitoring stations exist in a sector, each station's recorded data is weighted by its distance from the target monitoring station, and a regression operation yields the average monitoring data of the sector. If the sector has no monitoring station, a virtual monitoring station is generated at the center of the sector. Its data are interpolated using inverse distance weighting (IDW), a classical spatial interpolation method.
The key point of this method is to designate one feature as the primary feature and the other features as auxiliary features. The primary feature is the historical atmospheric quality index and historical meteorological monitoring data of the target monitoring station. These data and the predicted target data all come from the same monitoring station. The auxiliary features, namely the corresponding atmospheric quality and meteorological data, come from the monitoring stations in the 16 surrounding sectors.
Spatial-domain aggregation algorithm: monitoring stations are unevenly distributed geographically and subject to other constraints. As a result, some sectors may contain several monitoring stations after partitioning, which produces excess data and increases redundancy. Weights are therefore assigned to the recorded data of each monitoring station in the sector, and a regression operation gives the average monitoring data of the sector:

$$y = \sum_{k=1}^{K} W_k x_k$$

where y is the average monitoring data of the sector, $x_k$ is the recorded data of the k-th of the K monitoring stations in the sector, and $W_k$ is its weight. The size of $W_k$ is determined by the distance between that monitoring station and the target monitoring point.
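A sketch of the spatial-domain aggregation follows. It assumes each station record is a feature vector (for example AQI, PM2.5 and temperature at the same timestamp). It also assumes the distance-dependent weights are inverse distances normalised to sum to 1. The source says only that W depends on distance, so this weighting scheme and the function name are assumptions.

```python
from typing import List, Sequence


def aggregate_sector(records: Sequence[Sequence[float]], distances_km: Sequence[float]) -> List[float]:
    """Combine several stations' feature vectors into one sector-level vector.

    Computes y = sum_k W_k * x_k, with W_k proportional to 1 / d_k and
    normalised so that sum_k W_k = 1 (an assumed weighting scheme).
    """
    if not records or len(records) != len(distances_km):
        raise ValueError("need one distance per station record")
    inv = [1.0 / max(d, 1e-6) for d in distances_km]  # guard against d == 0
    total = sum(inv)
    weights = [w / total for w in inv]
    n_features = len(records[0])
    return [sum(w * rec[f] for w, rec in zip(weights, records)) for f in range(n_features)]
```

For example, two stations at 2 km and 4 km receive weights 2/3 and 1/3, so the nearer station dominates the sector value.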
Spatial-domain interpolation algorithm: during partitioning, some sectors far from the target monitoring station contain no monitoring station. A virtual monitoring station is generated in such a sector to fill in its missing values, and its data are generated from data captured by monitoring stations in the surrounding areas. Inverse distance weighting is used, which computes the value at an unknown point as a linear weighted combination of the available values at known points:

$$Z(x, y) = \sum_{i=1}^{n} w_i \, Z(x_i, y_i), \qquad w_i = \frac{d_i^{-p}}{\sum_{j=1}^{n} d_j^{-p}}, \qquad d_i = \sqrt{(x - x_i)^2 + (y - y_i)^2}$$

where:

- Z(x, y) is the interpolated output;
- (x, y) are the coordinates of the interpolation point;
- $(x_i, y_i)$ are the coordinates of the discrete (known) points;
- $w_i$ is the weight of the i-th discrete point;
- $d_i$ is the distance of the i-th discrete point from the interpolation point;
- p is the power parameter.
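The virtual-station value can be computed with standard (Shepard) inverse distance weighting. The sketch below makes three assumptions: planar coordinates, a power parameter p = 2 (a common default; the source does not state p), and known points that are the neighbouring stations or sector-level values around the empty sector. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Known = Tuple[Tuple[float, float], float]  # ((x_i, y_i), Z(x_i, y_i))


def idw(x: float, y: float, known: Sequence[Known], power: float = 2.0) -> float:
    """Inverse distance weighted estimate Z(x, y) from known points."""
    if not known:
        raise ValueError("IDW needs at least one known point")
    num = den = 0.0
    for (xi, yi), zi in known:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return zi  # the interpolation point coincides with a known point
        w = d ** -power  # unnormalised weight; dividing by den normalises it
        num += w * zi
        den += w
    return num / den


if __name__ == "__main__":
    # Virtual station at the origin, known values 1 km and 2 km away.
    print(idw(0.0, 0.0, [((1.0, 0.0), 10.0), ((2.0, 0.0), 40.0)]))  # 16.0
```

With p = 2 the point twice as far away gets a quarter of the weight. Larger p makes the estimate more local.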
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (9)

1. An air quality analysis and prediction method based on deep learning and Bayesian model is characterized by comprising the following steps:
step S1: acquiring AQI data of a target monitoring point;
step S2: preprocessing the AQI data: identifying abnormal values in the data sequence according to the Pauta (3σ) criterion, removing them, and filling in missing data at any time point by linear interpolation;
step S3: carrying out normalization processing on the AQI data;
step S4: constructing a deep learning convolutional network model, a recurrent neural network model and a Bayesian dynamic linear model respectively;
step S5: inputting the normalized AQI data into the deep learning convolutional network model and the Bayesian dynamic linear model respectively, wherein the deep learning convolutional network model converts a long input sequence into a short sequence of high-level features, and the Bayesian dynamic linear model outputs first predicted AQI data;
step S6: inputting the sequence of features extracted by the deep learning convolutional network model into the recurrent neural network model, which outputs second predicted AQI data;
step S7: constructing a hybrid model, inputting the first predicted AQI data and the second predicted AQI data into the hybrid model, and outputting final predicted AQI data after the hybrid model operates.
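Step S2 can be sketched as follows, under these assumptions:

- The Pauta criterion is applied in a single pass, using the sample mean and sample standard deviation of the valid values. A value is abnormal when it deviates from the mean by more than 3σ.
- Removed values and originally missing values are both represented as `None`.
- Gaps at the start or end of the series, which linear interpolation cannot bridge, are filled with the nearest valid value.

The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Series = Sequence[Optional[float]]


def pauta_filter(series: Series, k: float = 3.0) -> List[Optional[float]]:
    """Mark values with |x - mean| > k * std as missing (None); single pass."""
    valid = [v for v in series if v is not None]
    if len(valid) < 2:
        return list(series)
    mean = sum(valid) / len(valid)
    std = math.sqrt(sum((v - mean) ** 2 for v in valid) / (len(valid) - 1))  # sample std
    return [None if v is None or abs(v - mean) > k * std else v for v in series]


def linear_fill(series: Series) -> List[float]:
    """Fill None gaps by linear interpolation between the nearest valid neighbours."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("no valid observations to interpolate from")
    out: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            out.append(v)
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            out.append(series[right])  # leading gap: copy first valid value (assumption)
        elif right is None:
            out.append(series[left])  # trailing gap: copy last valid value (assumption)
        else:
            t = (i - left) / (right - left)
            out.append(series[left] + t * (series[right] - series[left]))
    return out


def preprocess(series: Series) -> List[float]:
    """Step S2 sketch: remove Pauta outliers, then interpolate all gaps."""
    return linear_fill(pauta_filter(series))
```

A spike such as a single 500 reading in an otherwise smooth hourly series is flagged, removed, and replaced by the straight-line value between its neighbours.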
2. The air quality analysis and prediction method based on the deep learning and Bayesian model as recited in claim 1, wherein: the normalization in step S3 keeps the value range of the data within a relatively small fluctuation range, reducing the influence of different orders of magnitude or different dimensions on the data. Assuming the feature follows a normal distribution, it is mapped to the standard normal distribution using the mean and standard deviation, calculated as:

$$y' = \frac{y - y_{mean}}{y_{std}}$$

where y is an original sample value, $y'$ is its normalized value, $y_{mean}$ is the mean of all sample data, and $y_{std}$ is the standard deviation of all sample data.
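A sketch of the claim 2 normalization follows. It uses the population standard deviation and estimates y_mean and y_std on the training data only, so that test data do not leak into the statistics. Fitting on training data is a common practice rather than something the source states. The inverse transform maps normalized predictions back to AQI units. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_zscore(train: Sequence[float]) -> Tuple[float, float]:
    """Return (y_mean, y_std) estimated from the training samples."""
    n = len(train)
    if n == 0:
        raise ValueError("empty training data")
    mean = sum(train) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in train) / n)  # population std
    if std == 0.0:
        raise ValueError("constant series cannot be standardised")
    return mean, std


def transform(values: Sequence[float], mean: float, std: float) -> List[float]:
    """y' = (y - y_mean) / y_std."""
    return [(v - mean) / std for v in values]


def inverse_transform(z: Sequence[float], mean: float, std: float) -> List[float]:
    """Map normalised model outputs back to the AQI scale."""
    return [v * std + mean for v in z]
```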
3. The air quality analysis and prediction method based on the deep learning and Bayesian model as recited in claim 1, wherein: the step S4 specifically includes:
step S41, selecting training data and test data from the AQI data according to the model being constructed, and initializing the deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model;
step S42, training the deep learning convolutional network model, the recurrent neural network model and the Bayesian dynamic linear model using the training data;
step S43, obtaining test prediction results on the test data using the trained deep learning convolutional network model, recurrent neural network model and Bayesian dynamic linear model;
and step S44, performing prediction using the trained deep learning convolutional network model, recurrent neural network model and Bayesian dynamic linear model.
4. The air quality analysis and prediction method based on the deep learning and Bayesian model as recited in claim 3, wherein: in step S5, the Bayesian dynamic linear model comprises an observation equation, a state equation and initial information. The predictive distribution is treated as a conditional probability distribution and is derived from the prior information. The posterior information is obtained using Bayes' formula, and the prior information is thereby updated to obtain the predicted value.
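Claim 4 describes the Bayesian dynamic linear model only in terms of its ingredients. As a hedged illustration, the simplest such model is a local-level (random walk plus noise) DLM with known observation variance V and evolution variance W. The choice of this model form, the known variances and all names are assumptions, since the source does not specify the model's components. Each step:

1. forms the prior for the level;
2. forms the one-step predictive distribution, i.e. the conditional distribution of the next AQI value given past data;
3. forms the posterior via Bayes' formula, which becomes the prior for the next step.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LocalLevelDLM:
    """Observation equation: y_t = mu_t + v_t,      v_t ~ N(0, V)
    State equation:          mu_t = mu_{t-1} + w_t, w_t ~ N(0, W)
    Initial information:     mu_0 ~ N(m, C)
    V and W are assumed known."""
    V: float
    W: float
    m: float
    C: float

    def forecast(self) -> Tuple[float, float]:
        """One-step predictive distribution N(f, Q) given the data seen so far."""
        R = self.C + self.W  # prior variance of mu_t
        return self.m, R + self.V

    def update(self, y: float) -> Tuple[float, float]:
        """Bayes update with observation y; returns the forecast made before seeing y."""
        a, R = self.m, self.C + self.W  # prior:      mu_t ~ N(a, R)
        f, Q = a, R + self.V             # predictive: y_t ~ N(f, Q)
        A = R / Q                        # adaptive coefficient (Kalman gain)
        self.m = a + A * (y - f)         # posterior mean
        self.C = A * self.V              # posterior variance, equal to R - A^2 * Q
        return f, Q


def one_step_forecasts(model: LocalLevelDLM, ys: Sequence[float]) -> List[float]:
    """Forecast each y_t from y_1..y_{t-1}, then absorb y_t into the posterior."""
    return [model.update(y)[0] for y in ys]
```

Under these assumptions, applying `one_step_forecasts` to normalized AQI values yields a series that could serve as the first predicted AQI data in step S5. A fuller DLM would typically add trend or seasonal components and estimate V and W, or use discount factors.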
5. The air quality analysis and prediction method based on the deep learning and Bayesian model as recited in claim 3, wherein: for the recurrent neural network model, the loss function of the training phase is as follows:
[Loss-function formula: image in the original publication, not reproduced]
where a is the prediction value and y is the sample value.
6. The air quality analysis and prediction method based on the deep learning and Bayesian model as recited in claim 3, wherein: the recurrent neural network model further employs the Adam algorithm and the Dropout algorithm;
the Adam algorithm computes first-moment and second-moment estimates of the gradient to set independent adaptive learning rates for different parameters;
the Dropout algorithm reduces co-dependence between features, lowering the probability of overfitting.
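Claim 6 names Adam and Dropout without giving their update rules. Their standard forms are sketched below in framework-free Python: Adam with bias-corrected first- and second-moment estimates, and inverted dropout, which rescales surviving activations so nothing changes at inference. The default hyperparameters (lr = 1e-3, beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are the commonly used defaults, not values from the source. The class and function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


class Adam:
    """Per-parameter adaptive steps from bias-corrected 1st/2nd gradient moments."""

    def __init__(self, lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m: List[float] = []  # first-moment (mean) estimates
        self.v: List[float] = []  # second-moment (uncentred variance) estimates
        self.t = 0

    def step(self, params: Sequence[float], grads: Sequence[float]) -> List[float]:
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


def dropout(x: Sequence[float], rate: float, rng: random.Random, training: bool = True) -> List[float]:
    """Inverted dropout: zero each unit with probability `rate`, rescale survivors."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]
```

On the first Adam step each parameter moves by roughly `lr` in the direction opposite its gradient, whatever the gradient's magnitude. This is the per-parameter adaptivity the claim refers to. In a framework such as PyTorch, the same roles are filled by `torch.optim.Adam` and `torch.nn.Dropout`.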
7. The air quality analysis and prediction method based on the deep learning and Bayesian model as recited in claim 1, further comprising: step S8, obtaining MEO data;
step S9, performing correlation analysis based on the MEO data and the AQI data;
step S10, performing backward-trajectory and potential source contribution analysis based on the MEO data and the AQI data;
and step S11, combining the correlation analysis results and the backward-trajectory and potential source contribution analysis results with the final predicted AQI data to obtain comprehensive improvement suggestions.
8. The air quality analysis and prediction method based on deep learning and Bayesian model as claimed in claim 7, wherein: the correlation analysis in step S9 specifically includes taking PM2.5 and PM10 as the first variables and weather, temperature, air pressure, humidity, wind speed and wind direction as the second variables, and introducing the following formula:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

where $x_i$ and $y_i$ are the observations of the two variables being compared, $\bar{x}$ is the mean of $x_i$, and $\bar{y}$ is the mean of $y_i$. r is the Spearman correlation coefficient; for Spearman's coefficient, $x_i$ and $y_i$ are the ranks of the observations. r is +1 or −1 when the two variables are perfectly monotonically correlated, and r is 0 when they are uncorrelated.
9. The air quality analysis and prediction method based on deep learning and Bayesian model as claimed in claim 7, wherein: the backward-trajectory and potential source contribution analysis in step S10 specifically includes dividing the study area into i × j grid cells by longitude and latitude, with PSCF calculated as:

$$\mathrm{PSCF}_{ij} = \frac{m_{ij}}{n_{ij}}$$

where $n_{ij}$ is the number of all airflow trajectories passing through grid cell (i, j), and $m_{ij}$ is the number of pollution trajectories passing through grid cell (i, j).
CN202110282474.XA 2021-03-16 2021-03-16 Air quality analysis and prediction method based on deep learning and Bayesian model Pending CN112884243A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110282474.XA CN112884243A (en) 2021-03-16 2021-03-16 Air quality analysis and prediction method based on deep learning and Bayesian model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110282474.XA CN112884243A (en) 2021-03-16 2021-03-16 Air quality analysis and prediction method based on deep learning and Bayesian model

Publications (1)

Publication Number Publication Date
CN112884243A true CN112884243A (en) 2021-06-01

Family

ID=76042656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110282474.XA Pending CN112884243A (en) 2021-03-16 2021-03-16 Air quality analysis and prediction method based on deep learning and Bayesian model

Country Status (1)

Country Link
CN (1) CN112884243A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909206A (en) * 2017-11-15 2018-04-13 电子科技大学 A kind of PM2.5 Forecasting Methodologies based on deep structure Recognition with Recurrent Neural Network
CN108268935A (en) * 2018-01-11 2018-07-10 浙江工业大学 A kind of PM2.5 concentration values Forecasting Methodology and system based on sequential Recognition with Recurrent Neural Network
CN108898261A (en) * 2018-07-24 2018-11-27 深圳市源广浩电子有限公司 A kind of air quality monitoring method and system based on environmentally friendly big data
CN109492830A (en) * 2018-12-17 2019-03-19 杭州电子科技大学 A kind of mobile pollution source concentration of emission prediction technique based on space-time deep learning
CN110161183A (en) * 2019-05-30 2019-08-23 广东柯内特环境科技有限公司 A kind of air quality monitoring method
CN111798051A (en) * 2020-07-02 2020-10-20 杭州电子科技大学 Air quality space-time prediction method based on long-short term memory neural network
CN112115004A (en) * 2020-07-29 2020-12-22 西安交通大学 Hard disk service life prediction method based on back propagation Bayes deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
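Kong Shanshan et al.: "Discussion on the source distribution and transport characteristics of PM2.5 in Beijing based on the backward trajectory model", 中国环境管理 (Chinese Journal of Environmental Management), no. 1, 6 April 2017 (2017-04-06), pages 86-90 *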

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220316734A1 (en) * 2021-04-14 2022-10-06 Jiangnan University Deep Spatial-Temporal Similarity Method for Air Quality Prediction
US11512864B2 (en) * 2021-04-14 2022-11-29 Jiangnan University Deep spatial-temporal similarity method for air quality prediction
CN113988400B (en) * 2021-10-22 2024-04-09 重庆工商大学 AQI index prediction method based on PEARSON-LSTM multi-step fusion network
CN113988400A (en) * 2021-10-22 2022-01-28 重庆工商大学 AQI index prediction method based on PEARSON-LSTM multi-step fusion network
CN114219345A (en) * 2021-12-24 2022-03-22 武汉工程大学 Secondary air quality prediction optimization method based on data mining
CN114757296A (en) * 2022-04-29 2022-07-15 广东技术师范大学 Cooperative data-based pollutant analysis method and device
CN114757296B (en) * 2022-04-29 2022-12-13 广东技术师范大学 Cooperative data-based pollutant analysis method and device
CN115512849A (en) * 2022-09-15 2022-12-23 北京理工大学 Low-oxygen closed-loop intervention system of plateau pre-learning clothes
CN115878695B (en) * 2023-02-20 2023-05-19 中国民用航空局空中交通管理局航空气象中心 Data visualization adjustment method and system based on meteorological database
CN115878695A (en) * 2023-02-20 2023-03-31 中国民用航空局空中交通管理局航空气象中心 Data visualization adjusting method and system based on meteorological database
CN116451853A (en) * 2023-04-06 2023-07-18 湖南工商大学 Atmospheric quality monitoring method, system, electronic equipment and storage medium
CN116451853B (en) * 2023-04-06 2023-12-15 湖南工商大学 Atmospheric quality monitoring method, system, electronic equipment and storage medium
CN116304913A (en) * 2023-04-07 2023-06-23 中国长江三峡集团有限公司 Water quality state monitoring method and device based on Bayesian model and electronic equipment
CN117909931A (en) * 2024-01-19 2024-04-19 江苏智伦数字技术研究有限公司 Air quality deducing method, terminal and storage medium

Similar Documents

Publication Publication Date Title
CN112884243A (en) Air quality analysis and prediction method based on deep learning and Bayesian model
Chang et al. An LSTM-based aggregated model for air pollution forecasting
Seng et al. Spatiotemporal prediction of air quality based on LSTM neural network
CN109492830B (en) Mobile pollution source emission concentration prediction method based on time-space deep learning
Farmani et al. Pipe failure prediction in water distribution systems considering static and dynamic factors
Lindemann et al. Anomaly detection and prediction in discrete manufacturing based on cooperative LSTM networks
CN111080070B (en) Urban land utilization cellular automaton simulation method based on space errors
CN110555551B (en) Air quality big data management method and system for smart city
CN112131731B (en) Urban growth cellular simulation method based on spatial feature vector filtering
CN113902580B (en) Historical farmland distribution reconstruction method based on random forest model
CN111814956B (en) Multi-task learning air quality prediction method based on multi-dimensional secondary feature extraction
CN101893674A (en) Pollution flashover index forecasting method for regional power grid
CN110533239B (en) Smart city air quality high-precision measurement method
CN113011455B (en) Air quality prediction SVM model construction method
CN104506162A (en) Fault prognosis method for high-order particle filter on basis of LS-SVR (least squares support vector regression) modeling
Ashkboos et al. Ens-10: A dataset for post-processing ensemble weather forecasts
CN116805439A (en) Drought prediction method and system based on artificial intelligence and atmospheric circulation mechanism
CN115542429A (en) XGboost-based ozone quality prediction method and system
Li et al. A multi-factor combination prediction model of carbon emissions based on improved CEEMDAN
Panjapornpon et al. Energy efficiency and savings analysis with multirate sampling for petrochemical process using convolutional neural network-based transfer learning
CN116525135B (en) Method for predicting epidemic situation development situation by space-time model based on meteorological factors
CN117972625A (en) Attention neural network data assimilation method based on four-dimensional variation constraint
CN117313351A (en) Safe data center optimized cold prediction method and system
CN116070669B (en) Workshop energy consumption prediction method and management system based on improved deep belief network
CN114970745B (en) Intelligent security and environment big data system of Internet of things

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination