CN117408128A - Air quality simulation and observation machine learning NO2 coupling forecasting method
- Publication number
- CN117408128A (application number CN202310394676.2A)
- Authority
- CN
- China
- Prior art keywords
- machine learning
- lstm
- model
- cmaq
- wrf
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention discloses an NO2 coupling forecasting method based on air quality simulation and observation machine learning. The method comprises the following steps: establishing a database; establishing a feature selection method based on the greedy strategy and determining an optimal feature variable set, where the features comprise meteorological indexes and pollutants; determining, based on the optimal feature variable set, a machine learning model for correcting the WRF-CMAQ simulation, correcting the WRF-CMAQ forecast with that model, and recording the corrected result as Pred_wrf-cmaq; establishing an LSTM forecasting model for NO2 based on monitoring data, and forecasting the NO2 concentration to obtain the monitoring-based prediction Pred_lstm; and coupling the corrected WRF-CMAQ result Pred_wrf-cmaq with the LSTM prediction Pred_lstm through a Lasso model to obtain the final predicted NO2 concentration. The invention can provide NO2 concentration forecasts at each air quality monitoring site of large cities, and thus better supports the prevention of nitrogen oxide pollution in weather unfavorable to pollutant diffusion.
Description
Technical Field
The invention belongs to the technical field of air quality management, and in particular relates to an NO2 coupling forecasting method based on air quality simulation and observation machine learning.
Background
Many air quality early-warning and forecasting systems already exist. For example, Chinese patent publication No. CN110489836A discloses a pre-driving medium- and long-term air quality forecasting system and method, and Chinese patent publication No. CN115639628A discloses a BP neural network air quality forecasting method based on spatial grouping modeling. However, existing systems use only a numerical model or only a machine learning method, so forecast deviations are large in some extreme weather, and even the forecast air quality grade can be wrong. At present, forecasting of the main air pollutant concentrations relies on numerical model simulation plus manual correction; accurate air quality forecasting cannot be achieved by the model alone.
Therefore, a method for accurately forecasting air pollutants is urgently needed for air quality early warning and decision support, so that more cities and regions of different scales can effectively improve air quality, meet air quality standards sooner, and reach the World Health Organization guideline values sooner.
From the above analysis, the problems and defects of the prior art are as follows: 1) existing air quality forecasting systems have low accuracy in extreme weather and cannot guide early warning of pollutant concentrations in such weather; 2) the prior art predicts air pollutants using only a numerical model or only a machine learning method, so the results have low accuracy and poor interpretability; 3) most existing air quality forecasting systems require manual consultation for correction, which is costly and inaccurate.
Disclosure of Invention
Aiming at the inaccurate NO2 concentration forecasts of traditional air quality models and the low interpretability of forecasts from a single machine learning model, the invention provides an NO2 coupling forecasting method based on air quality simulation and observation machine learning.
To achieve this aim, the invention provides an NO2 coupling forecasting method based on secondary modeling, the method comprising the following steps:
S11, establishing a database;
S12, establishing a feature selection method based on the greedy strategy and determining an optimal feature variable set, where the features comprise meteorological indexes and pollutants; determining, based on the optimal feature variable set, a machine learning model for correcting the WRF-CMAQ simulation, correcting the WRF-CMAQ forecast with that model, and recording the corrected result as Pred_wrf-cmaq;
S13, establishing an LSTM forecasting model for NO2 based on monitoring data, and forecasting the NO2 concentration to obtain the monitoring-based prediction Pred_lstm;
S14, coupling the corrected WRF-CMAQ result Pred_wrf-cmaq with the LSTM prediction Pred_lstm through a Lasso model to obtain a coupled forecasting model, and thereby the final predicted NO2 concentration.
Further, establishing the database comprises the following steps:
S111, selecting the cities and sites that require air quality early warning and forecasting;
S112, acquiring hourly forecast values of the meteorological indexes from the WRF model; acquiring hourly concentration forecasts of each pollutant from the CMAQ model; acquiring hourly concentration data of each atmospheric pollutant from the real-time air quality publishing network and from provincial and municipal atmospheric pollutant monitoring networks; and acquiring hourly meteorological monitoring data from weather stations, then cleaning the data and storing them in the database.
Further, the meteorological indexes in step S112 comprise 14 items: wind direction, wind speed, temperature, humidity, specific humidity, rainfall, cloud cover, air pressure, boundary layer height, sensible heat flux, latent heat flux, long-wave radiation, short-wave radiation and surface solar radiation; the pollutants comprise NO2, SO2, PM10, PM2.5, O3 and CO.
Further, the specific steps of step S12 include:
S121, selecting NO2 as the target value, and inputting all collected features one by one into a plurality of machine learning models for training; for each machine learning model, placing the feature with the lowest mean absolute error into the feature variable set, then introducing each remaining feature on that basis, training and again keeping the best one, and repeating until the error no longer decreases, at which point no further features are introduced, thereby obtaining the optimal feature variable set for each machine learning model;
S122, inputting the optimal feature variable set of each machine learning model into the corresponding model for training, and pre-correcting the NO2 concentration forecast by WRF-CMAQ for a preset number of future days;
S123, evaluating the accuracy of each model's correction result and determining the optimal machine learning model;
S124, correcting the WRF-CMAQ forecast with the optimal machine learning model, and recording the corrected result as Pred_wrf-cmaq.
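As a non-limiting illustration, the greedy selection loop of step S121 can be sketched as follows. The function name, the feature names and the `evaluate_mae` callback are hypothetical; in practice the callback would train one machine learning model on the given feature subset and return its mean absolute error on held-out data, which the source does not specify in detail.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_forward_selection(
    candidates: Sequence[str],
    evaluate_mae: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Grow the feature set one feature at a time.

    In each round every remaining feature is tried together with the
    features already selected; the one giving the lowest MAE is kept.
    Selection stops as soon as the best trial no longer lowers the MAE.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best_mae = float("inf")
    while remaining:
        # Score each remaining feature when added to the current set.
        trials = [(evaluate_mae(selected + [name]), name) for name in remaining]
        trial_mae, trial_name = min(trials)
        if trial_mae >= best_mae:
            break  # the error no longer decreases: stop introducing features
        selected.append(trial_name)
        remaining.remove(trial_name)
        best_mae = trial_mae
    return selected, best_mae


def toy_mae(subset: List[str]) -> float:
    """Hypothetical stand-in for model training: useful features lower the
    error, any other feature raises it slightly."""
    gain = {"wind_speed": 3.0, "temperature": 2.0, "boundary_layer_height": 1.0}
    return 10.0 - sum(gain.get(name, -0.5) for name in subset)


features = ["wind_speed", "temperature", "boundary_layer_height", "cloud_cover"]
best_set, best_error = greedy_forward_selection(features, toy_mae)
print(best_set, best_error)
```

Running the loop once per candidate model (for example XGBoost, SVR, RF and FNN) would give one optimal feature variable set per model, as the step describes.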
Further, in step S121, the collected feature data are expressed as:
$$X = (x_1, x_2, \ldots, x_n), \qquad x_i = (x_{i1}, x_{i2}, \ldots, x_{im})^T, \quad i = 1, 2, \ldots, n$$
where n is the number of features and m indexes the feature variable, i.e. the hourly values of each feature. The i-th feature can be expressed as the vector (x_i1, x_i2, …, x_im)^T, i = 1, 2, …, n, where x_im is the value of the i-th feature at the m-th hour and the superscript T denotes transpose. The target pollutant NO2 is denoted Y = {y_1, y_2, …, y_m}, representing the target pollutant NO2 concentration values for the preset future days.
Further, in step S121, the mean absolute error is expressed as
$$\mathrm{MAE}_d = \frac{1}{N} \sum_{t=1}^{N} \left| \mathrm{sim}_{dt} - \mathrm{obs}_t \right|$$
where N is the number of target values, sim_dt is the NO2 value predicted by model d for day t, and obs_t is the monitored NO2 concentration on day t.
Further, the machine learning models include an XGBoost model, an SVR model, an RF model, an FNN model, a GBDT model, a LightGBM model and a GRU model.
Further, in step S123, the correlation coefficient, the average absolute error and the root mean square error are selected to evaluate the accuracy of the model correction result, and the machine learning model with the highest accuracy is selected as the optimal machine learning model through comparison.
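For illustration only, the three evaluation metrics named in step S123 can be computed as in the sketch below. The function names are hypothetical and the correlation coefficient is assumed to be the Pearson coefficient, which the source does not state explicitly.

```python
import math
from typing import Sequence


def mae(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error between predicted and monitored values."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def rmse(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error between predicted and monitored values."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def pearson_r(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumed meaning of R)."""
    mean_s = sum(sim) / len(sim)
    mean_o = sum(obs) / len(obs)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)


# Made-up numbers, used only to exercise the functions.
sim_example = [1.0, 2.0, 3.0]
obs_example = [2.0, 4.0, 6.0]
print(pearson_r(sim_example, obs_example), mae(sim_example, obs_example),
      rmse(sim_example, obs_example))
```

A model with higher R and lower MAE and RMSE would be preferred when comparing correction results.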
Further, in step S13, the optimal feature set Inp_lstm is input into the LSTM to forecast NO2 and obtain the prediction Pred_lstm, where the optimal feature set Inp_lstm is obtained as follows:
the acquired hourly atmospheric pollutant monitoring data and hourly meteorological monitoring data are input into the LSTM model one by one for training;
the feature with the lowest mean absolute error is placed into the feature variable set, each remaining feature is then introduced on that basis, trained and the best one kept, and the steps are repeated until the error no longer decreases, at which point no further features are introduced, finally giving the optimal feature variable set Inp_lstm for the LSTM model.
Further, in step S14, the obtained coupled forecasting model is
$$\gamma_j = \sum_{f=1}^{n} b_f \, x_{j,f} + \varepsilon$$
where γ_j is the j-th forecast value, i.e. the coupled NO2 forecast for day j; x_j,f denotes the inputs for day j, namely the corrected result Pred_wrf-cmaq and the prediction Pred_lstm; b_f is the regression coefficient of the f-th input variable; ε is the offset; and n = 2, indicating that the two model outputs, the WRF-CMAQ correction and the LSTM prediction, are coupled.
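As a minimal sketch of the coupling in step S14, the Lasso regression over the two inputs can be fitted by coordinate descent as shown below. This is an illustration under stated assumptions, not the implementation of the invention: the objective is assumed to be (1/(2m)) times the squared error plus alpha times the sum of |b_f|, a common convention that the source does not specify, and the penalty weight `alpha`, the function names and the sample numbers are all hypothetical.

```python
from typing import List, Sequence, Tuple


def soft_threshold(rho: float, alpha: float) -> float:
    """Shrink rho toward zero by alpha (the effect of the L1 penalty)."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def fit_lasso(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    alpha: float = 0.1,
    n_sweeps: int = 200,
) -> Tuple[List[float], float]:
    """Fit y ~ sum_f b_f * x_f + eps by coordinate descent on centered data."""
    m, n = len(y), len(X[0])
    x_mean = [sum(row[f] for row in X) / m for f in range(n)]
    y_mean = sum(y) / m
    Xc = [[row[f] - x_mean[f] for f in range(n)] for row in X]
    yc = [t - y_mean for t in y]
    b = [0.0] * n
    for _ in range(n_sweeps):
        for f in range(n):
            rho = 0.0  # correlation of feature f with the partial residual
            z = 0.0    # squared norm of feature f
            for row, t in zip(Xc, yc):
                partial = t - sum(b[k] * row[k] for k in range(n) if k != f)
                rho += row[f] * partial
                z += row[f] * row[f]
            b[f] = soft_threshold(rho / m, alpha) / (z / m) if z > 0 else 0.0
    eps = y_mean - sum(b[f] * x_mean[f] for f in range(n))
    return b, eps


def couple(pred_wrf_cmaq: float, pred_lstm: float,
           b: Sequence[float], eps: float) -> float:
    """Coupled forecast gamma_j = b_1 * Pred_wrf-cmaq + b_2 * Pred_lstm + eps."""
    return b[0] * pred_wrf_cmaq + b[1] * pred_lstm + eps


# Made-up training pairs (Pred_wrf-cmaq, Pred_lstm) and observed NO2 values.
inputs = [[10.0, 50.0], [20.0, 10.0], [30.0, 40.0], [40.0, 20.0], [50.0, 30.0]]
observed = [27.0, 17.0, 35.0, 33.0, 43.0]
coefs, offset = fit_lasso(inputs, observed, alpha=0.0)
print(coefs, offset, couple(30.0, 30.0, coefs, offset))
```

With `alpha` set to zero the fit reduces to ordinary least squares; increasing `alpha` shrinks the two coefficients and can set one of them to zero, which is how the penalty limits the complexity of the coupled model.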
Compared with the prior art, the invention at least has the following beneficial effects:
Based on the air quality model forecasts, the method builds a machine learning forecasting model that includes pollutant concentrations and meteorological observation data, so that trends in both pollutants and meteorological parameters are considered together, which addresses the low interpretability of statistical models built on observations alone. During model construction, to account for collinearity between meteorological parameters and air pollutants and among the air pollutants themselves, the invention introduces, for the first time, a feature selection method based on the greedy strategy to resolve collinearity among features. The method has clear advantages, parameters with definite physical meaning, and strong applicability.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flow chart of the steps of the NO2 coupling forecasting method based on air quality simulation and observation machine learning provided by an embodiment of the present invention.
FIG. 2 is a schematic view of the forecast area according to an embodiment of the present invention.
FIG. 3 compares the evaluation of the NO2 concentration forecasts for the next three days at the ten town and street sites in the forecast area made by the coupled forecasting model, WRF-CMAQ, FNN and LSTM in an embodiment of the invention, where (a) compares the correlation coefficient R, (b) compares the mean absolute error MAE, and (c) compares the root mean square error RMSE.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific examples. The exemplary embodiments and their descriptions serve only to explain the invention and are not to be construed as limiting it.
Referring to FIG. 1, the present invention provides an NO2 coupling forecasting method based on air quality simulation and observation machine learning, comprising the following steps:
Step 11: establishing the database required by the NO2 coupled forecasting model from the meteorological indexes, pollutant concentrations and meteorological monitoring data.
The method specifically comprises the following steps:
S111, selecting the cities and sites that require air quality early warning and forecasting according to air pollution prevention and control requirements;
S112, acquiring from the WRF model the hourly forecast values of 14 meteorological indexes: wind direction, wind speed, temperature, humidity, specific humidity, rainfall, cloud cover, air pressure, boundary layer height, sensible heat flux, latent heat flux, long-wave radiation, short-wave radiation and surface solar radiation; acquiring from the CMAQ model the hourly concentration forecasts of the six conventional pollutants NO2, SO2, PM10, PM2.5, O3 and CO; acquiring hourly concentration data of each atmospheric pollutant from the real-time air quality publishing network and from provincial and municipal atmospheric pollutant monitoring networks; and acquiring hourly meteorological monitoring data from weather stations, then cleaning the data and storing them in the database.
In some embodiments of the present invention, a region is selected as the area to be forecast. As shown in FIG. 2, the nested domains d01, d02, d03 and d04 cover successively smaller areas; domain d04, which includes 10 towns and streets, is selected as the forecast area. Four-level nested WRF-CMAQ simulation provides the meteorological data (wind direction, wind speed, temperature, humidity, specific humidity, rainfall, cloud cover, air pressure, boundary layer height, sensible heat flux, latent heat flux, long-wave radiation, short-wave radiation, surface solar radiation) and the data for the six conventional pollutants (NO2, SO2, PM10, PM2.5, O3, CO); hourly concentration data of each atmospheric pollutant are acquired from the real-time air quality publishing network and from provincial and municipal atmospheric pollutant monitoring networks; hourly meteorological monitoring data are obtained from weather stations; and the data are cleaned and stored in the database.
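The source does not state how the data are cleaned or what database is used. Purely as an illustrative sketch, the example below assumes that cleaning means dropping missing or negative concentrations and duplicate records, and stores the result in an SQLite table; the table name `no2_obs`, the record layout and the sample values are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

# (hour in ISO format, site identifier, NO2 concentration or None if missing)
Record = Tuple[str, str, Optional[float]]


def clean_records(records: Iterable[Record]) -> List[Tuple[str, str, float]]:
    """Drop missing or negative values and repeated (hour, site) pairs."""
    seen = set()
    cleaned = []
    for hour, site, value in records:
        if value is None or value < 0:
            continue  # missing reading or sentinel such as -999
        key = (hour, site)
        if key in seen:
            continue  # keep only the first record for each hour and site
        seen.add(key)
        cleaned.append((hour, site, value))
    return cleaned


def store_records(conn: sqlite3.Connection,
                  records: Iterable[Tuple[str, str, float]]) -> None:
    """Create the (hypothetical) observation table if needed and insert rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS no2_obs ("
        "hour TEXT, site TEXT, no2 REAL, PRIMARY KEY (hour, site))"
    )
    conn.executemany("INSERT OR REPLACE INTO no2_obs VALUES (?, ?, ?)", records)
    conn.commit()


raw = [
    ("2023-01-01T00:00", "site_A", 35.0),
    ("2023-01-01T00:00", "site_A", 35.0),    # duplicate
    ("2023-01-01T01:00", "site_A", None),    # missing
    ("2023-01-01T02:00", "site_A", -999.0),  # sentinel value
    ("2023-01-01T00:00", "site_B", 20.5),
]
cleaned_rows = clean_records(raw)
connection = sqlite3.connect(":memory:")
store_records(connection, cleaned_rows)
row_count = connection.execute("SELECT COUNT(*) FROM no2_obs").fetchone()[0]
print(row_count)
```

The same pattern could be repeated for the WRF and CMAQ forecast values and the meteorological observations, keyed by hour and site so that forecasts and observations can later be joined.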
Step 12: establishing a feature selection method based on the greedy strategy and determining an optimal feature variable set, where the features comprise meteorological indexes and pollutants; determining, based on the optimal feature variable set, a machine learning model for correcting the WRF-CMAQ simulation, correcting the WRF-CMAQ forecast with that model, and recording the corrected result as Pred_wrf-cmaq.
Feature selection reduces the redundant information introduced by the variables and improves modeling efficiency and accuracy. In some embodiments of the invention, the NO2 concentration for the next three days is selected as the target value.
The method comprises the following steps:
Step 121: inputting all collected features one by one into a plurality of machine learning models for training, and obtaining the optimal feature variable set for each machine learning model through greedy feature selection;
All collected features are expressed as
$$X = (x_1, x_2, \ldots, x_n), \qquad x_i = (x_{i1}, x_{i2}, \ldots, x_{im})^T, \quad i = 1, 2, \ldots, n$$
where n is the number of features (the features include (1) meteorological data: wind direction, wind speed, temperature, humidity, specific humidity, rainfall, cloud cover, air pressure, boundary layer height, sensible heat flux, latent heat flux, long-wave radiation, short-wave radiation and surface solar radiation; and (2) pollutant concentration data: SO2, NO2, PM10, PM2.5, O3 and CO), and m indexes the feature variable, i.e. the hourly values of each feature. The i-th feature can be expressed as the vector (x_i1, x_i2, …, x_im)^T, i = 1, 2, …, n, where x_im is the value of the i-th feature at the m-th hour and the superscript T denotes transpose. The target pollutant NO2 is denoted Y = {y_1, y_2, …, y_m}, representing the target pollutant NO2 concentration values for the preset future days, where y_m is the target pollutant NO2 concentration on future day m. In some embodiments of the present invention, Y = {y_1, y_2, y_3}, representing the target pollutant NO2 concentrations for the next 3 days.
Step 122: inputting the optimal feature variable set of each machine learning model into the corresponding model for training, and pre-correcting the NO2 concentration forecast by WRF-CMAQ for the next three days;
Step 123: evaluating the accuracy of each machine learning model's correction result and determining the optimal machine learning model;
Step 124: correcting the WRF-CMAQ forecast with the optimal machine learning model, and recording the corrected result as Pred_wrf-cmaq.
The machine learning models comprise an XGBoost model, an SVR model, an RF model and an FNN model. The method is not limited to these four models; other machine learning models such as GBDT, LightGBM and GRU models can also be used. Specifically, in some embodiments of the present invention, in step 121 all collected features are input one by one into the XGBoost, SVR, RF and FNN models for training; the feature with the lowest mean absolute error (MAE) is placed into the feature variable set; each remaining feature is then introduced on that basis, trained and the best one kept; and the steps are repeated until the error no longer decreases, at which point no further features are introduced. This yields the optimal feature variable sets Inp_xgb, Inp_svr, Inp_rf and Inp_fnn for the respective machine learning models; see Table 1.
The mean absolute error is expressed as:
$$\mathrm{MAE}_d = \frac{1}{N} \sum_{t=1}^{N} \left| \mathrm{sim}_{dt} - \mathrm{obs}_t \right|$$
where N is the number of target values, sim_dt is the NO2 value predicted by model d for day t, and obs_t is the monitored NO2 concentration on day t.
Table 1. Initial simulation variables and feature variables of each correction model
Based on the feature selection results, the optimal feature variable sets Inp_xgb, Inp_svr, Inp_rf and Inp_fnn of the individual models are input into the XGBoost, SVR, RF and FNN machine learning models respectively for training, and the pollutant concentrations forecast by WRF-CMAQ for the next three days are pre-corrected; the results are recorded as Pred_xgb, Pred_svr, Pred_rf and Pred_fnn;
The correlation coefficient (R), mean absolute error (MAE) and root mean square error (RMSE) are used to evaluate the accuracy of the correction results Pred_xgb, Pred_svr, Pred_rf and Pred_fnn of the XGBoost, SVR, RF and FNN models; the results are shown in Table 2;
By comparison, the FNN, which has the highest accuracy, is selected as the correction model for WRF-CMAQ; the WRF-CMAQ forecast is corrected with the FNN model, and the corrected result is recorded as Pred_wrf-cmaq.
Table 2. Accuracy comparison of the correction models
(Note: DAY_1, DAY_2, DAY_3 and AVE in the table denote the first, second and third forecast days and the three-day average, respectively.)
Step 13: establishing an LSTM forecasting model for NO2 based on monitoring data, and forecasting the NO2 concentration to obtain the monitoring-based prediction Pred_lstm.
The inputs are the hourly monitoring data of atmospheric pollutants (SO2, NO2, PM10, PM2.5, O3, CO) and the hourly meteorological monitoring data (temperature, humidity, air pressure, wind direction, wind speed); in some embodiments of the invention, the target pollutant NO2 to be forecast is selected as the label value;
To reduce redundancy among the input features and improve computation speed and prediction accuracy, the greedy feature selection method is used to select the input features of the LSTM; the selected optimal feature set is denoted Inp_lstm, see Table 3 for details.
In the greedy feature selection, the acquired hourly monitoring data of atmospheric pollutants (SO2, NO2, PM10, PM2.5, O3 and CO) and the hourly meteorological monitoring data (temperature, humidity, air pressure, wind direction and wind speed) are input into the LSTM model one by one for training. As before, the feature with the lowest mean absolute error (MAE) is placed into the feature variable set, each remaining feature is then introduced on that basis, trained and the best one kept, and the steps are repeated until the error no longer decreases, at which point no further features are introduced, finally giving the optimal feature variable set Inp_lstm for the LSTM model; see Table 3. The optimal feature set Inp_lstm is then input into the LSTM model to forecast the NO2 concentration for the next three days, giving the monitoring-based prediction Pred_lstm.
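The source does not describe how the monitoring series are arranged into LSTM training samples. One common arrangement, shown here only as an assumed sketch, is a sliding window in which a fixed number of past time steps forms one input sequence and the following steps form the label; the function name and the `lookback` and `horizon` parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[Sequence[float]],
    targets: Sequence[float],
    lookback: int,
    horizon: int,
) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Build (input sequence, label) pairs for a sequence model.

    series[t] is the feature vector (e.g. the Inp_lstm features) at step t,
    targets[t] is the NO2 value at step t. Each input holds `lookback`
    consecutive feature vectors; each label holds the next `horizon` targets.
    """
    inputs: List[List[List[float]]] = []
    labels: List[List[float]] = []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        end = start + lookback
        inputs.append([list(vector) for vector in series[start:end]])
        labels.append(list(targets[end:end + horizon]))
    return inputs, labels


# Made-up single-feature series with six time steps.
demo_series = [[float(t)] for t in range(6)]
demo_targets = [10.0 * t for t in range(6)]
demo_inputs, demo_labels = make_windows(demo_series, demo_targets,
                                        lookback=2, horizon=3)
print(demo_inputs, demo_labels)
```

With `horizon=3` each label corresponds to a three-step forecast, matching the three-day forecast target used in this embodiment if the steps are daily values.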
Table 3. Initial monitoring variables and LSTM model feature variables
Step 14: coupling the corrected WRF-CMAQ result Pred_wrf-cmaq with the LSTM prediction Pred_lstm through a Lasso model to obtain a coupled forecasting model, and obtaining the final predicted NO2 concentration from the coupled forecasting model.
In some embodiments of the present invention, the two preceding steps, WRF-CMAQ correction modeling and LSTM monitoring-based modeling, yield the daily NO2 correction value Pred_wrf-cmaq and the LSTM predicted output Pred_lstm for the next three days.
Integrating the two outputs into a coupled forecasting model combines the strengths of the different models and gives a more accurate result than either model alone. Because both the correction model and the LSTM model have strong learning capacity, a more complex integration method would aggravate overfitting of the coupled forecast; the Lasso method is therefore used for integration.
The obtained coupled forecasting model is
$$\gamma_j = \sum_{f=1}^{n} b_f \, x_{j,f} + \varepsilon$$
where γ_j is the j-th forecast value, i.e. the coupled NO2 forecast for day j, j = 1, 2, 3; x_j,f denotes the inputs for day j, namely Pred_wrf-cmaq and Pred_lstm; b_f is the regression coefficient of the f-th input variable; ε is the offset; and n = 2, indicating that the two model outputs, the WRF-CMAQ correction value and the LSTM predicted value, are coupled.
The method shrinks the regression coefficients by constructing a penalty term, thereby reducing model complexity. Pred_wrf-cmaq and Pred_lstm are used as the inputs to Lasso, and the coupled forecast values are obtained by Lasso regression; see Table 4. It can be seen that, compared with the traditional air quality forecasting method (WRF-CMAQ) and the single machine learning forecasting methods (FNN, RF, SVR, XGBoost, LSTM), the NO2 coupling forecasting method based on air quality simulation and observation machine learning greatly improves forecast accuracy and can provide decision support for air quality forecasting and early warning.
Table 4. Evaluation of each model's NO2 forecasts during the test period
(Note: DAY_1, DAY_2, DAY_3 and AVE in the table denote the first, second and third forecast days and the three-day average, respectively.)
In some embodiments of the invention, the coupled model of the invention is used to forecast the NO2 concentration for the next three days at the ten town and street sites in domain d04; the evaluation of the forecasts is shown in FIG. 3 (a)-(c). The figure shows that the coupled forecasting model provided by the invention forecasts NO2 far better than the other models and can be used for NO2 concentration forecasting, and that it has good forecasting capability for regional NO2 concentrations.
Building on traditional air quality numerical simulation, the coupling forecasting method provided by the embodiment of the invention applies secondary modeling with machine learning methods, making full use of machine learning's ability to fit nonlinear problems, and achieves accurate NO2 forecasting. The method has clear advantages, parameters with definite physical meaning, and strong applicability.
The sequence numbers before the steps of the method are used for convenience of description, and the sequence of the steps is not limited.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. Air quality simulation and observation machine learning NO 2 The coupling forecasting method is characterized by comprising the following steps of:
s11, establishing a database;
s12, establishing a greedy thought-based feature selection method, and determining an optimal feature variable set, wherein the features comprise meteorological indexes and pollutants; determining a machine learning model for correcting the WRF-CMAQ simulation based on the optimal characteristic variable set, correcting a WRF-CMAQ forecasting result by adopting the machine learning model, and marking the obtained correction result as Pred wrf-cmaq ;
S13, establishing a long and short memory neural network LSTM pair NO based on the monitoring data 2 Is used for NO 2 Concentration is predicted to obtain a prediction result Pred based on the monitoring data lstm ;
S14, modifying result Pred based on Lasso model coupling WRF-CMAQ simulation wrf-cmaq Pred with LSTM prediction result lstm Obtaining a coupling prediction model, and obtaining NO based on the coupling prediction model 2 Is a final predicted concentration of (c).
2. The air quality simulation and observation machine learning NO₂ coupling forecasting method according to claim 1, characterized in that establishing the database comprises the following steps:
S111, selecting the cities and sites that require air quality early warning and forecasting;
S112, acquiring hourly forecast values of the meteorological indexes from the WRF model; acquiring hourly concentration forecast values of each pollutant from the CMAQ model; acquiring hourly concentration data of each atmospheric pollutant from the real-time air quality release network and from the provincial-level and municipal-level atmospheric pollutant monitoring networks; acquiring hourly meteorological monitoring data from weather stations; and cleaning the data and storing them in the database.
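As a rough illustration of step S112, the sketch below joins hourly WRF forecasts, CMAQ forecasts and monitoring data on a (site, hour) key and drops incomplete rows. The record layout, the field names (`temp`, `no2`) and the cleaning rules are assumptions made for this example; the claim does not specify them.

```python
from datetime import datetime


def build_hourly_table(wrf, cmaq, obs):
    """Join three hourly sources on (site, hour) and drop incomplete rows.

    Each argument maps (site_id, datetime) -> dict of field values.
    The layout and field names are hypothetical, not taken from the patent.
    """
    rows = []
    # Only hours present in all three sources can be used for training.
    for key in sorted(set(wrf) & set(cmaq) & set(obs)):
        merged = {"site": key[0], "time": key[1]}
        merged.update({f"wrf_{k}": v for k, v in wrf[key].items()})
        merged.update({f"cmaq_{k}": v for k, v in cmaq[key].items()})
        merged.update({f"obs_{k}": v for k, v in obs[key].items()})
        # Assumed cleaning rule 1: skip rows with any missing value.
        if any(v is None for v in merged.values()):
            continue
        # Assumed cleaning rule 2: skip physically impossible concentrations.
        if merged.get("obs_no2", 0) < 0:
            continue
        rows.append(merged)
    return rows


t0, t1, t2 = (datetime(2023, 1, 1, h) for h in range(3))
wrf = {("A", t0): {"temp": 5.0}, ("A", t1): {"temp": 4.5}, ("A", t2): {"temp": 4.0}}
cmaq = {("A", t0): {"no2": 30.0}, ("A", t1): {"no2": 28.0}, ("A", t2): {"no2": 27.0}}
obs = {("A", t0): {"no2": 35.0}, ("A", t1): {"no2": None}}
table = build_hourly_table(wrf, cmaq, obs)
```

In this toy input only the first hour survives: the second hour has a missing observation and the third has no observation record at all.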
3. The air quality simulation and observation machine learning NO₂ coupling forecasting method according to claim 2, characterized in that the meteorological indexes in step S112 comprise 14 items: wind direction, wind speed, temperature, humidity, specific humidity, rainfall, cloud cover, air pressure, boundary layer height, sensible heat flux, latent heat flux, long wave radiation, short wave radiation and ground solar radiation; and the pollutants comprise NO₂, SO₂, PM10, PM2.5, O₃ and CO.
4. The air quality simulation and observation machine learning NO₂ coupling forecasting method according to claim 1, characterized in that step S12 specifically comprises:
S121, taking NO₂ as the target value, inputting all collected features one by one into each of a plurality of machine learning models for training; for each machine learning model, selecting the feature with the lowest mean absolute error and placing it in the feature variable set; on that basis, introducing the next feature, training, and again selecting the best one; repeating until the error no longer decreases, at which point no further feature is introduced, thereby obtaining the optimal feature variable set for each machine learning model;
S122, inputting the optimal feature variable set of each machine learning model into the corresponding machine learning model for training, and pre-correcting the NO₂ concentration values forecast by WRF-CMAQ for a preset number of future days;
S123, evaluating the accuracy of the correction results of each model and determining the optimal machine learning model;
S124, correcting the WRF-CMAQ forecast result with the optimal machine learning model, and denoting the obtained correction result as Pred_wrf-cmaq.
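The greedy forward selection of step S121 can be sketched as follows. This is a minimal illustration, not the patented implementation: `greedy_select` and `toy_evaluate` are hypothetical names, and the "model" is a toy predictor that simply sums the selected columns, standing in for training XGBoost, SVR, or any other model and returning its mean absolute error.

```python
def mean_absolute_error(pred, obs):
    """Mean absolute error between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def greedy_select(candidates, evaluate):
    """Forward selection: add the feature that lowers the error most, stop when none does.

    candidates: iterable of feature names.
    evaluate:   callable taking a list of feature names and returning the MAE
                of a model trained on exactly those features.
    """
    selected = []
    best_err = float("inf")
    remaining = list(candidates)
    while remaining:
        # Try adding each remaining feature to the current set.
        trial_errs = {name: evaluate(selected + [name]) for name in remaining}
        best_name = min(trial_errs, key=trial_errs.get)
        # Stop as soon as the best candidate no longer reduces the error.
        if trial_errs[best_name] >= best_err:
            break
        selected.append(best_name)
        remaining.remove(best_name)
        best_err = trial_errs[best_name]
    return selected, best_err


# Toy data: the target is exactly a + b; c is an unrelated column.
columns = {"a": [1, 2, 3, 4], "b": [2, 1, 0, 3], "c": [5, -3, 2, -4]}
no2 = [3, 3, 3, 7]


def toy_evaluate(subset):
    # Stand-in for "train a model on these features and return its MAE".
    pred = [sum(columns[name][t] for name in subset) for t in range(len(no2))]
    return mean_absolute_error(pred, no2)


selected, err = greedy_select(columns, toy_evaluate)
```

Passing the evaluation as a callable keeps the selection loop independent of the model, so the same loop could be run once per candidate model as the claim describes.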
5. The air quality simulation and observation machine learning NO₂ coupling forecasting method according to claim 4, characterized in that, in step S121, the collected feature data are expressed as a set of n feature vectors, where n is the number of features and m is the number of hourly values of each feature; the i-th feature can be expressed as the vector $(x_{i1}, x_{i2}, \ldots, x_{im})^T$, $i = 1, 2, \ldots, n$, where $x_{im}$ is the value of the i-th feature at the m-th hour and the superscript T denotes the transpose; the target pollutant NO₂ is denoted $Y = \{y_1, y_2, \ldots, y_m\}$, where $y_m$ is the concentration value of the target pollutant NO₂ on the m-th future day.
6. The air quality simulation and observation machine learning NO₂ coupling forecasting method according to claim 4, characterized in that, in step S121, the mean absolute error is expressed as
$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left| sim_{dt} - obs_t \right|$$
where N is the number of target values, $sim_{dt}$ is the NO₂ value predicted by model d for day t, and $obs_t$ is the monitored NO₂ concentration on day t.
7. The air quality simulation and observation machine learning NO₂ coupling forecasting method according to claim 4, characterized in that the machine learning models comprise the extreme gradient boosting tree XGBoost, support vector regression SVR, random forest RF, the feedforward neural network FNN, the gradient boosting decision tree GBDT, the distributed gradient boosting framework LightGBM and the gated recurrent unit GRU.
8. The air quality simulation and observation machine learning NO₂ coupling forecasting method according to claim 4, characterized in that, in step S123, the correlation coefficient, the mean absolute error and the root mean square error are used to evaluate the accuracy of the model correction results, and the machine learning model with the highest accuracy is selected by comparison as the optimal machine learning model.
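The three evaluation metrics named for step S123 can be sketched in plain Python as below. The claim does not say how the three metrics are combined into a single ranking, so the ordering used in `rank_models` (lowest MAE first, then lowest RMSE, then highest correlation) is an assumption made for this example; all function names are hypothetical.

```python
import math


def mae(sim, obs):
    """Mean absolute error."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def rmse(sim, obs):
    """Root mean square error."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def pearson_r(sim, obs):
    """Pearson correlation coefficient between predictions and observations."""
    n = len(obs)
    mean_s, mean_o = sum(sim) / n, sum(obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    denom = math.sqrt(var_s * var_o)
    # A constant series has no defined correlation; treat it as 0 (assumption).
    return cov / denom if denom > 0 else 0.0


def rank_models(predictions, obs):
    """Sort model names from best to worst by (MAE, RMSE, -R); assumed rule."""
    def key(name):
        sim = predictions[name]
        return (mae(sim, obs), rmse(sim, obs), -pearson_r(sim, obs))
    return sorted(predictions, key=key)


obs = [10, 20, 30, 40]
predictions = {
    "model_a": [12, 18, 33, 39],
    "model_b": [15, 15, 35, 45],
}
ranking = rank_models(predictions, obs)
```

Here `model_a` ranks first because its mean absolute error (2.0) is lower than that of `model_b` (5.0).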
9. The air quality simulation and observation machine learning NO₂ coupling forecasting method according to claim 1, characterized in that, in step S13, the optimal feature set Inp_lstm is input into the LSTM to predict NO₂, obtaining the prediction result Pred_lstm, wherein the optimal feature set Inp_lstm is obtained as follows:
inputting the acquired hourly monitoring data of atmospheric pollutants and the hourly meteorological monitoring data one by one into the LSTM model for training;
selecting the feature with the lowest mean absolute error and placing it in the feature variable set; on that basis, introducing the next feature, training, and again selecting the best one; repeating until the error no longer decreases, at which point no further feature is introduced, finally obtaining the optimal feature variable set Inp_lstm for the LSTM model.
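Before hourly monitoring series can be fed to an LSTM, they are usually cut into fixed-length input sequences paired with a later target value. The claim does not describe this step, so the sketch below is only one common way to do it; `make_windows`, `lookback` and `horizon` are hypothetical names and the window length is an arbitrary choice.

```python
def make_windows(features, target, lookback, horizon=1):
    """Cut an hourly series into (input sequence, future target) pairs.

    features: list of per-hour feature vectors (the selected Inp_lstm columns).
    target:   list of per-hour NO2 concentrations, same length as features.
    lookback: number of consecutive hours in each input sequence.
    horizon:  how many hours after the window's last hour the target lies.
    """
    X, y = [], []
    last_start = len(target) - lookback - horizon
    for start in range(last_start + 1):
        end = start + lookback
        X.append(features[start:end])        # hours start .. end-1
        y.append(target[end + horizon - 1])  # value to be predicted
    return X, y


# Six hours of toy data with two features per hour.
hourly_features = [[h, 2 * h] for h in range(6)]
hourly_no2 = [10, 11, 12, 13, 14, 15]
X_seq, y_seq = make_windows(hourly_features, hourly_no2, lookback=3)
```

Each element of `X_seq` has shape (lookback, number of features), which matches the (time steps, features) input layout that LSTM layers in common deep learning libraries expect per sample.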
10. The air quality simulation and observation machine learning NO₂ coupling forecasting method according to any one of claims 1-9, characterized in that, in step S14, the obtained coupling prediction model is
$$y_j = \sum_{f=1}^{n} b_f x_{j,f} + \varepsilon$$
where $y_j$ is the j-th forecast value, corresponding to the coupled NO₂ forecast value for day j; $x_{j,f}$ denotes the correction result Pred_wrf-cmaq and the prediction result Pred_lstm for day j; $b_f$ is the regression coefficient of the f-th input variable; $\varepsilon$ is the offset; and n = 2, indicating that two model outputs are coupled, namely the WRF-CMAQ correction value and the LSTM prediction value.
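A Lasso model of this form can be fitted by coordinate descent with soft-thresholding. The sketch below is a minimal pure-Python version under stated assumptions: it minimises the common objective (1/(2m)) Σ (y_j − ε − Σ b_f x_{j,f})² + α Σ |b_f|, and the penalty weight `alpha`, the iteration count, the function names and the toy numbers are all illustrative choices that the patent does not specify.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam (the Lasso proximal step)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def fit_lasso(X, y, alpha=0.1, n_iter=500):
    """Fit y ~ sum_f b[f] * X[.][f] + eps with an L1 penalty on b.

    X: list of rows [Pred_wrf-cmaq, Pred_lstm]; y: observed NO2 values.
    Returns (b, eps). Data are centred so the offset eps is not penalised.
    """
    m, n = len(X), len(X[0])
    x_mean = [sum(row[f] for row in X) / m for f in range(n)]
    y_mean = sum(y) / m
    Xc = [[row[f] - x_mean[f] for f in range(n)] for row in X]
    yc = [v - y_mean for v in y]
    b = [0.0] * n
    for _ in range(n_iter):
        for f in range(n):
            # Correlation of feature f with the residual that excludes f.
            rho = 0.0
            for j in range(m):
                partial = sum(b[k] * Xc[j][k] for k in range(n) if k != f)
                rho += Xc[j][f] * (yc[j] - partial)
            rho /= m
            z = sum(Xc[j][f] ** 2 for j in range(m)) / m
            b[f] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    eps = y_mean - sum(b[f] * x_mean[f] for f in range(n))
    return b, eps


def couple(pred_wrf_cmaq, pred_lstm, b, eps):
    """Apply the fitted coupling model to the two forecast series."""
    return [b[0] * p + b[1] * q + eps for p, q in zip(pred_wrf_cmaq, pred_lstm)]


# Toy training data: columns are Pred_wrf-cmaq and Pred_lstm; y is observed NO2.
X_train = [[10, 30], [20, 10], [30, 40], [40, 20], [50, 50]]
y_train = [19, 17, 35, 33, 51]
b, eps = fit_lasso(X_train, y_train, alpha=0.5)
final_no2 = couple([25, 35], [30, 45], b, eps)
```

With `alpha=0` the fit reduces to ordinary least squares; larger values shrink the two weights towards zero and can drop one of the two forecasts entirely.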
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310394676.2A CN117408128A (en) | 2023-04-13 | 2023-04-13 | Air quality simulation and observation machine learning NO₂ coupling forecasting method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117408128A (en) | 2024-01-16 |
Family
ID=89489639
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310394676.2A Pending CN117408128A (en) | Air quality simulation and observation machine learning NO₂ coupling forecasting method | 2023-04-13 | 2023-04-13 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117408128A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110261547A (en) * | 2019-07-04 | 2019-09-20 | 北京思路创新科技有限公司 | A kind of Urban Air Pollution Methods and equipment |
CN113280378A (en) * | 2021-05-27 | 2021-08-20 | 华南理工大学 | Online oil smoke monitoring system with self-cleaning function and control method |
CN113627529A (en) * | 2021-08-11 | 2021-11-09 | 成都佳华物链云科技有限公司 | Air quality prediction method, device, electronic equipment and storage medium |
CN115881239A (en) * | 2022-09-13 | 2023-03-31 | 重庆市生态环境大数据应用中心 | Method for dividing atmospheric pollution weak diffusion area based on extended WRF and CMAQ models |
Non-Patent Citations (2)
Title |
---|
LU, H等: "Adjusting prediction of ozone concentration based on CMAQ model and machine learning methods in Sichuan-Chongqing region, China", 《ATMOSPHERIC POLLUTION RESEARCH》, vol. 12, no. 6, 30 June 2021 (2021-06-30), pages 1 - 13 * |
康俊锋等: "XGBoost-LSTM 变权组合模型支持下短期PM2.5 浓度预测——以上海为例", 《中国环境科学》, vol. 41, no. 9, 30 April 2021 (2021-04-30), pages 4016 - 4025 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Heydari et al. | A novel composite neural network based method for wind and solar power forecasting in microgrids | |
Ibrahim et al. | A novel hybrid model for hourly global solar radiation prediction using random forests technique and firefly algorithm | |
Ramsami et al. | A hybrid method for forecasting the energy output of photovoltaic systems | |
CN110782093B (en) | PM2.5 hourly concentration prediction method and system fusing SSAE deep feature learning and LSTM | |
CN111695731B (en) | Load prediction method, system and equipment based on multi-source data and hybrid neural network | |
CN111665575B (en) | Medium-and-long-term rainfall grading coupling forecasting method and system based on statistical power | |
CN113537600B (en) | Medium-long-term precipitation prediction modeling method for whole-process coupling machine learning | |
CN116451879B (en) | Drought risk prediction method and system and electronic equipment | |
CN111833202B (en) | Farmland evapotranspiration short-term prediction method considering crop coefficient dynamic change and rainfall | |
CN113919231A (en) | PM2.5 concentration space-time change prediction method and system based on space-time diagram neural network | |
CN111488896B (en) | Distribution line time-varying fault probability calculation method based on multi-source data mining | |
CN110533239B (en) | Smart city air quality high-precision measurement method | |
CN116013426A (en) | Site ozone concentration prediction method with high space-time resolution | |
Wen et al. | Applying an artificial neural network to simulate and predict Chinese fir (Cunninghamia lanceolata) plantation carbon flux in subtropical China | |
CN114240003A (en) | New energy output prediction method, system, storage medium and equipment | |
CN117031582A (en) | Ozone hour concentration forecasting method based on recursive space-time learning and simulation monitoring fusion | |
CN117408128A (en) | Air quality simulation and observation machine learning NO₂ coupling forecasting method | |
CN113723670B (en) | Photovoltaic power generation power short-term prediction method with variable time window | |
CN113344290B (en) | Method for correcting sub-season rainfall weather forecast based on U-Net network | |
CN115859789A (en) | Method for improving inversion accuracy of polar atmosphere temperature profile | |
CN115639628A (en) | BP neural network air quality forecasting method based on space grouping modeling | |
CN114565136B (en) | Air quality prediction optimization method based on generation countermeasure network | |
CN116702610B (en) | GBDT and numerical mode-based wind speed prediction method and system | |
CN117829368A (en) | Urban temperature rapid prediction method integrating meteorological simulation and neural network | |
CN117057490A (en) | Prediction method and system for wet stress heat wave-flood composite disaster and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||