CN117408128A - Air quality simulation and observation machine learning NO2 coupling forecasting method - Google Patents


Info

Publication number
CN117408128A
Authority
CN
China
Prior art keywords
machine learning
lstm
model
cmaq
wrf
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310394676.2A
Other languages
Chinese (zh)
Inventor
朱云
刘子义
李金盈
黄泳熙
游志强
龙世程
田勇
朱振华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huayun Chuangxin Guangdong Ecological Environment Technology Co ltd
South China University of Technology SCUT
Original Assignee
Huayun Chuangxin Guangdong Ecological Environment Technology Co ltd
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huayun Chuangxin Guangdong Ecological Environment Technology Co ltd and South China University of Technology SCUT
Priority to CN202310394676.2A
Publication of CN117408128A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Abstract

The invention discloses an air quality simulation and observation machine learning NO2 coupling forecasting method, the method comprising the following steps: establishing a database; establishing a feature selection method based on a greedy strategy and determining an optimal feature variable set, wherein the features comprise meteorological indexes and pollutants; determining, based on the optimal feature variable set, a machine learning model for correcting the WRF-CMAQ simulation, correcting the WRF-CMAQ forecast result with the machine learning model, and denoting the correction result Pred_wrf-cmaq; establishing an LSTM forecasting model for NO2 based on monitoring data and predicting the NO2 concentration to obtain a monitoring-based prediction result Pred_lstm; and coupling the WRF-CMAQ correction result Pred_wrf-cmaq with the LSTM prediction result Pred_lstm by means of a Lasso model to obtain the final predicted NO2 concentration. The invention can provide NO2 concentration forecasts for each air quality monitoring site in large cities, and thereby better supports the prevention of nitrogen oxide pollution under weather conditions unfavorable to pollutant dispersion.

Description

Air quality simulation and observation machine learning NO2 coupling forecasting method
Technical Field
The invention belongs to the technical field of air quality management, and particularly relates to an air quality simulation and observation machine learning NO2 coupling forecasting method.
Background
Many air quality early-warning and forecasting systems already exist. For example, Chinese patent publication No. CN110489836A discloses a pre-driving medium-and-long-term air quality forecasting system and method, and Chinese patent publication No. CN115639628A discloses a BP neural network air quality forecasting method based on spatial grouping modeling. However, existing systems use either a numerical model alone or a machine learning method alone, so forecasts under some extreme weather conditions deviate substantially, and even the forecast air quality grade can be wrong. In current practice, concentrations of the main air pollutants are forecast by numerical model simulation followed by manual correction; accurate air quality forecasts cannot be obtained from the model alone.
Therefore, a method for accurately forecasting air pollutants is urgently needed for air quality early warning and decision support, so that more cities and regions of different scales can effectively improve air quality and meet the air quality standards and the World Health Organization guideline values sooner.
The above analysis shows that the prior art has the following problems and defects: 1) Existing air quality forecasting systems have low accuracy under extreme weather and cannot guide early warning of pollutant concentrations in such weather. 2) The prior art predicts air pollutants with either a numerical model or a machine learning method alone, and the prediction results have low accuracy and poor interpretability. 3) Most existing air quality forecasting systems rely on manual consultation for correction, which is costly and of low accuracy.
Disclosure of Invention
Aiming at the problems that traditional air quality models forecast NO2 concentration inaccurately and that forecasts from a single machine learning model have low interpretability, the invention provides an air quality simulation and observation machine learning NO2 coupling forecasting method.
To achieve this aim, the invention provides an NO2 coupling forecasting method based on secondary modeling, the method comprising the following steps:
S11, establishing a database;
S12, establishing a feature selection method based on a greedy strategy and determining an optimal feature variable set, wherein the features comprise meteorological indexes and pollutants; determining, based on the optimal feature variable set, a machine learning model for correcting the WRF-CMAQ simulation, correcting the WRF-CMAQ forecast result with the machine learning model, and denoting the correction result Pred_wrf-cmaq;
S13, establishing an LSTM forecasting model for NO2 based on monitoring data, and predicting the NO2 concentration to obtain a monitoring-based prediction result Pred_lstm;
S14, coupling the WRF-CMAQ correction result Pred_wrf-cmaq with the LSTM prediction result Pred_lstm by means of a Lasso model to obtain a coupled forecast model, and obtaining the final predicted NO2 concentration.
Further, establishing the database comprises the following steps:
S111, selecting the cities and sites that require air quality early warning and forecasting;
S112, acquiring hourly forecast values of the meteorological indexes from the WRF model; acquiring hourly concentration forecast values of each pollutant from the CMAQ model; acquiring hourly concentration data of each atmospheric pollutant from the real-time air quality release network and the provincial and municipal atmospheric pollutant monitoring networks; and acquiring hourly meteorological monitoring data from weather stations; the data are cleaned and stored in the database.
Further, the meteorological indexes in step S112 comprise 14 items: wind direction, wind speed, temperature, humidity, specific humidity, rainfall, cloud cover, air pressure, boundary layer height, sensible heat flux, latent heat flux, long-wave radiation, short-wave radiation and surface solar radiation; the pollutants comprise NO2, SO2, PM10, PM2.5, O3 and CO.
Further, step S12 specifically comprises:
S121, selecting NO2 as the target value, and inputting all collected features one by one into each of a plurality of machine learning models for training; for each machine learning model, placing the feature with the lowest mean absolute error into the feature variable set, then introducing each remaining feature in turn on that basis, training again and keeping the best one; repeating these steps until the error no longer decreases, then stopping, to obtain the optimal feature variable set for each machine learning model;
S122, inputting the optimal feature variable set of each machine learning model into the corresponding model for training, and pre-correcting the NO2 concentration values forecast by WRF-CMAQ for a preset number of future days;
S123, evaluating the accuracy of the correction results of each model, and determining the optimal machine learning model;
S124, correcting the WRF-CMAQ forecast result with the optimal machine learning model, and denoting the correction result Pred_wrf-cmaq.
Further, in step S121, the collected feature data are expressed as:
$$X = \{x_1, x_2, \ldots, x_n\}$$
where n is the number of features and m is the number of hourly values recorded for each feature; the i-th feature can be expressed as the vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})^T$, $i = 1, 2, \ldots, n$, where $x_{im}$ is the value of the i-th feature at the m-th hour and the superscript T denotes transpose; the target pollutant NO2 is denoted $Y = \{y_1, y_2, \ldots, y_m\}$, representing the target pollutant NO2 concentration values for the preset future days.
Further, in step S121, the mean absolute error is expressed as:
$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|Sim_{dt} - Obs_t\right|$$
where N is the number of target values, $Sim_{dt}$ is the NO2 value predicted by model d for day t, and $Obs_t$ is the monitored NO2 concentration on day t.
Further, the machine learning models include an XGBoost model, an SVR model, an RF model, an FNN model, a GBDT model, a LightGBM model and a GRU model.
Further, in step S123, the correlation coefficient, the mean absolute error and the root mean square error are used to evaluate the accuracy of each model's correction result, and the machine learning model with the highest accuracy is selected by comparison as the optimal machine learning model.
Further, in step S13, the optimal feature set Inp_lstm is input into the LSTM forecasting model for NO2 to obtain the prediction result Pred_lstm, wherein the optimal feature set Inp_lstm is obtained as follows:
the acquired hourly monitoring data of the atmospheric pollutants and the hourly meteorological monitoring data are input one by one into the LSTM model for training;
the feature with the lowest mean absolute error is placed into the feature variable set, each remaining feature is then introduced in turn on that basis, and training and selection are repeated until the error no longer decreases, whereupon introduction stops and the optimal feature variable set Inp_lstm for the LSTM model is obtained.
Further, in step S14, the coupled forecast model obtained is:
$$\gamma_j = \sum_{f=1}^{n} b_f\, x_{j,f} + \varepsilon$$
where $\gamma_j$ is the j-th forecast value, i.e. the coupled NO2 forecast for day j; $x_{j,f}$ denotes the day-j correction result Pred_wrf-cmaq and prediction result Pred_lstm; $b_f$ is the regression coefficient of the f-th input variable; $\varepsilon$ is the offset; and n = 2, indicating that the output values of two models, the WRF-CMAQ correction and the LSTM prediction, are coupled.
Compared with the prior art, the invention has at least the following beneficial effects:
Starting from the forecast results of the air quality model, the method builds a machine learning forecast model that incorporates pollutant concentrations and meteorological observation data, so that the trends of pollutants and meteorological parameters are considered together, which addresses the low interpretability of statistical modeling based on observations alone. Because collinearity exists between meteorological parameters and air pollutants, and among the air pollutants themselves, the invention first introduces a feature selection method based on a greedy strategy during model construction to address collinearity among features. The method offers clear advantages, physically meaningful parameters and strong applicability.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flow chart of the steps of the air quality simulation and observation machine learning NO2 coupling forecasting method provided by an embodiment of the present invention.
FIG. 2 is a schematic view of the forecast area according to an embodiment of the present invention.
FIG. 3 compares the forecasting performance of the coupled forecast model with that of WRF-CMAQ, FNN and LSTM for NO2 concentration over the next three days at ten town-street sites in the forecast area in an embodiment of the invention, wherein (a) compares the correlation coefficient R, (b) compares the mean absolute error MAE, and (c) compares the root mean square error RMSE.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific examples. The exemplary embodiments and their descriptions serve only to explain the invention and are not to be construed as limiting it.
Referring to FIG. 1, the present invention provides an air quality simulation and observation machine learning NO2 coupling forecasting method, comprising the following steps:
Step 11: establishing the database required by the coupled NO2 forecast model, based on the meteorological indexes, pollutant concentrations and meteorological monitoring data.
This step specifically comprises:
S111, selecting, according to air pollution prevention and control requirements, the cities and sites that require air quality early warning and forecasting;
S112, acquiring from the WRF model hourly forecast values of 14 meteorological indexes: wind direction, wind speed, temperature, humidity, specific humidity, rainfall, cloud cover, air pressure, boundary layer height, sensible heat flux, latent heat flux, long-wave radiation, short-wave radiation and surface solar radiation; acquiring from the CMAQ model hourly concentration forecast values of six conventional pollutants: NO2, SO2, PM10, PM2.5, O3 and CO; acquiring hourly concentration data of each atmospheric pollutant from the real-time air quality release network and the provincial and municipal atmospheric pollutant monitoring networks; and acquiring hourly meteorological monitoring data from weather stations; the data are cleaned and stored in the database.
In some embodiments of the present invention, an area is selected as the area to be forecast. As shown in FIG. 2, the areas covered by d01, d02, d03 and d04 decrease in that order; d04 is selected as the forecast area and contains 10 town streets. A four-layer nested WRF-CMAQ simulation provides the meteorological data (wind direction, wind speed, temperature, humidity, specific humidity, rainfall, cloud cover, air pressure, boundary layer height, sensible heat flux, latent heat flux, long-wave radiation, short-wave radiation, surface solar radiation) and the data for six conventional pollutants (NO2, SO2, PM10, PM2.5, O3, CO); hourly concentration data of each atmospheric pollutant are acquired from the real-time air quality release network and the provincial and municipal atmospheric pollutant monitoring networks; hourly meteorological monitoring data are acquired from weather stations; the data are cleaned and stored in the database.
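The database step can be pictured with a minimal sketch. It assumes, purely for illustration, that the simulated and monitored hourly values arrive as two pandas tables keyed by site and time; the column names, the function name `build_training_table` and the cleaning rule (drop duplicated site-hours and rows with missing values) are hypothetical and are not specified by the description.

```python
import pandas as pd

def build_training_table(forecast: pd.DataFrame, observed: pd.DataFrame) -> pd.DataFrame:
    """Join hourly WRF-CMAQ forecasts with hourly observations per site and hour."""
    keys = ["site", "time"]
    # Columns present in both tables get _sim / _obs suffixes.
    merged = forecast.merge(observed, on=keys, how="inner", suffixes=("_sim", "_obs"))
    # Assumed cleaning rule: drop duplicated site-hours and incomplete rows.
    merged = merged.drop_duplicates(subset=keys).dropna()
    return merged.sort_values(keys).reset_index(drop=True)

# Toy values, invented for the example only.
hours = pd.to_datetime(["2023-01-01 00:00", "2023-01-01 01:00",
                        "2023-01-01 02:00", "2023-01-01 03:00"])
forecast = pd.DataFrame({"site": "A", "time": hours,
                         "no2": [30.0, 32.0, 35.0, 31.0],
                         "wind_speed": [2.1, 1.8, 1.5, 2.4]})
observed = pd.DataFrame({"site": "A", "time": hours,
                         "no2": [28.0, None, 36.0, 30.0]})

table = build_training_table(forecast, observed)
print(table)
```

The hour with a missing observation is dropped, so each remaining row pairs a simulated value with a monitored one.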
Step 12: establishing a feature selection method based on a greedy strategy and determining an optimal feature variable set, wherein the features comprise meteorological indexes and pollutants; determining, based on the optimal feature variable set, a machine learning model for correcting the WRF-CMAQ simulation, correcting the WRF-CMAQ forecast result with the machine learning model, and denoting the correction result Pred_wrf-cmaq.
Feature selection reduces the redundant information introduced by the variables and improves modeling efficiency and accuracy. In some embodiments of the invention, the NO2 concentration for the next three days is selected as the target value.
The specific steps are as follows:
Step 121: inputting all collected features one by one into a plurality of machine learning models for training, and obtaining, through greedy feature selection, the optimal feature variable set for each machine learning model;
All collected features are expressed as:
$$X = \{x_1, x_2, \ldots, x_n\}$$
where n is the number of features. The features include (1) meteorological data: wind direction, wind speed, temperature, humidity, specific humidity, rainfall, cloud cover, air pressure, boundary layer height, sensible heat flux, latent heat flux, long-wave radiation, short-wave radiation and surface solar radiation; and (2) pollutant concentration data: SO2, NO2, PM10, PM2.5, O3 and CO. m is the number of hourly values recorded for each feature. The i-th feature can be expressed as the vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})^T$, $i = 1, 2, \ldots, n$, where $x_{im}$ is the value of the i-th feature at the m-th hour and the superscript T denotes transpose. The target pollutant NO2 is denoted $Y = \{y_1, y_2, \ldots, y_m\}$, representing the target pollutant NO2 concentration values for the preset future days, where $y_m$ is the target pollutant NO2 concentration on future day m. In some embodiments of the present invention, $Y = \{y_1, y_2, y_3\}$, representing the target pollutant NO2 concentration values for the next 3 days.
Step 122: inputting the optimal feature variable set of each machine learning model into the corresponding model for training, and pre-correcting the NO2 concentration values forecast by WRF-CMAQ for the next three days;
Step 123: evaluating the accuracy of the correction result of each machine learning model, and determining the optimal machine learning model;
Step 124: correcting the WRF-CMAQ forecast result with the optimal machine learning model, and denoting the correction result Pred_wrf-cmaq.
The machine learning models comprise an XGBoost model, an SVR model, an RF model and an FNN model. The method is not limited to these four models; other machine learning models such as GBDT, LightGBM and GRU may also be used. Specifically, in some embodiments of the present invention, in step 121 all collected features are input one by one into the XGBoost, SVR, RF and FNN models for training; the feature with the lowest mean absolute error (MAE) is placed into the feature variable set; each remaining feature is then introduced in turn on that basis, and training and selection are repeated until the error no longer decreases, whereupon introduction stops. This yields the optimal feature variable sets Inp_xgb, Inp_svr, Inp_rf and Inp_fnn for the respective machine learning models; see Table 1.
The mean absolute error is expressed as:
$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|Sim_{dt} - Obs_t\right|$$
where N is the number of target values, $Sim_{dt}$ is the NO2 value predicted by model d for day t, and $Obs_t$ is the monitored NO2 concentration on day t.
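The greedy procedure of step 121 can be sketched as forward selection scored by MAE. The sketch below is illustrative only: the description does not say how the error is measured during selection, so scoring on a held-out validation split is an assumption, and an ordinary least-squares fit stands in for the XGBoost, SVR, RF and FNN models. The names `greedy_select` and `fit_predict_linear` and the synthetic data are hypothetical.

```python
import numpy as np

def mae(pred, obs):
    """Mean absolute error between predictions and observations."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(obs))))

def fit_predict_linear(X_train, y_train, X_val):
    """Stand-in model (assumption): least squares with an intercept."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([X_val, np.ones(len(X_val))]) @ coef

def greedy_select(X_train, y_train, X_val, y_val, fit_predict=fit_predict_linear):
    """Add one feature at a time; stop when validation MAE no longer decreases."""
    selected, best_err = [], np.inf
    remaining = list(range(X_train.shape[1]))
    while remaining:
        # Score every remaining feature when added to the current set.
        scores = []
        for f in remaining:
            cols = selected + [f]
            pred = fit_predict(X_train[:, cols], y_train, X_val[:, cols])
            scores.append((mae(pred, y_val), f))
        err, f = min(scores)
        if err >= best_err:
            break  # no candidate lowers the error: stop introducing features
        selected.append(f)
        remaining.remove(f)
        best_err = err
    return selected, best_err

# Synthetic data: only columns 0 and 3 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

selected, best_err = greedy_select(X[:150], y[:150], X[150:], y[150:])
print(selected, round(best_err, 3))
```

Passing a different `fit_predict` callable would let the same loop produce one feature set per candidate model, in the spirit of Inp_xgb, Inp_svr, Inp_rf and Inp_fnn.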
Table 1. Initial variables of the simulated values and feature variables of each correction model
Based on the feature selection results, the optimal feature variable sets Inp_xgb, Inp_svr, Inp_rf and Inp_fnn of the individual models are input into the XGBoost, SVR, RF and FNN machine learning models, respectively, for training, and the pollutant concentration values forecast by WRF-CMAQ for the next three days are pre-corrected; the results are denoted Pred_xgb, Pred_svr, Pred_rf and Pred_fnn.
The correlation coefficient (R), mean absolute error (MAE) and root mean square error (RMSE) are used to evaluate the accuracy of the correction results Pred_xgb, Pred_svr, Pred_rf and Pred_fnn of the XGBoost, SVR, RF and FNN models; see Table 2.
By comparison, the FNN, which has the highest accuracy, is selected as the correction model for WRF-CMAQ; the WRF-CMAQ forecast result is corrected with the FNN model, and the correction result is denoted Pred_wrf-cmaq.
Table 2. Comparison of the accuracy of the correction models
(Note: DAY_1, DAY_2 and DAY_3 in the table denote the first, second and third forecast days, respectively, and AVE denotes the three-day average.)
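The three evaluation metrics can be computed as in the following sketch. The description only says the most accurate model is chosen "by comparison", so the ranking rule used here (lowest MAE, ties broken by RMSE) is an assumption, and the model names and numbers are toy values rather than results from Table 2.

```python
import numpy as np

def evaluate(pred, obs):
    """Return the correlation coefficient R, MAE and RMSE of a forecast."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = float(np.corrcoef(pred, obs)[0, 1])
    mae = float(np.mean(np.abs(pred - obs)))
    rmse = float(np.sqrt(np.mean((pred - obs) ** 2)))
    return {"R": r, "MAE": mae, "RMSE": rmse}

# Toy observations and two hypothetical correction models.
obs = [20.0, 25.0, 30.0, 35.0, 40.0]
candidates = {
    "model_a": [21.0, 24.0, 31.0, 34.0, 41.0],
    "model_b": [25.0, 20.0, 35.0, 30.0, 45.0],
}

scores = {name: evaluate(pred, obs) for name, pred in candidates.items()}
# Assumed ranking rule: lowest MAE first, then lowest RMSE.
best = min(scores, key=lambda name: (scores[name]["MAE"], scores[name]["RMSE"]))
print(best, scores[best])
```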
Step 13: establishing an LSTM forecasting model for NO2 based on monitoring data, and predicting the NO2 concentration to obtain the monitoring-based prediction result Pred_lstm.
The data used are the hourly monitoring data of atmospheric pollutants (SO2, NO2, PM10, PM2.5, O3, CO) and the hourly meteorological monitoring data (temperature, humidity, air pressure, wind direction, wind speed); in some embodiments of the invention, the target pollutant NO2 to be predicted is selected as the label value;
to reduce redundancy among the input features and improve computation speed and prediction accuracy, the greedy feature selection method is used to select the input features of the LSTM; the optimal feature set selected is denoted Inp_lstm; see Table 3 for details.
In the greedy feature selection, the acquired hourly monitoring data of atmospheric pollutants (SO2, NO2, PM10, PM2.5, O3 and CO) and the hourly meteorological monitoring data (temperature, humidity, air pressure, wind direction and wind speed) are input one by one into the LSTM model for training. As before, the feature with the lowest mean absolute error (MAE) is placed into the feature variable set, each remaining feature is then introduced in turn on that basis, and training and selection are repeated until the error no longer decreases, whereupon introduction stops; this yields the optimal feature variable set Inp_lstm for the LSTM model; see Table 3. The optimal feature set Inp_lstm is then input into the LSTM model to predict the NO2 concentration for the next three days, giving the monitoring-based prediction result Pred_lstm.
Table 3. Initial variables of the monitored values and feature variables of the LSTM model
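An LSTM consumes sequences, so the selected monitoring features have to be arranged into fixed-length windows before training. The sketch below shows one common way to do that; the description does not state the window length or how the forecast horizon is laid out, so `lookback`, `horizon` and the function name `make_windows` are assumptions made for illustration.

```python
import numpy as np

def make_windows(features, target, lookback, horizon):
    """Slice a time series into (input window, future target) pairs.

    features: array of shape (T, n_features) with the selected inputs
    target:   array of shape (T,) with the NO2 label values
    Returns X of shape (samples, lookback, n_features)
    and y of shape (samples, horizon).
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(target) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested window sizes")
    X = np.stack([features[i:i + lookback] for i in range(n)])
    y = np.stack([target[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, y

# Toy series: 10 time steps, 2 features.
features = np.arange(20).reshape(10, 2)
target = np.arange(10)
X, y = make_windows(features, target, lookback=4, horizon=3)
print(X.shape, y.shape)
```

The resulting three-dimensional array matches the (samples, time steps, features) layout that common LSTM implementations expect as input.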
Step 14: coupling the WRF-CMAQ correction result Pred_wrf-cmaq with the LSTM prediction result Pred_lstm by means of a Lasso model to obtain a coupled forecast model, and obtaining the final predicted NO2 concentration from the coupled forecast model.
In some embodiments of the present invention, the preceding two steps, WRF-CMAQ correction modeling and LSTM monitoring-based modeling, yield the daily NO2 correction values Pred_wrf-cmaq and the LSTM predicted output values Pred_lstm for the next three days.
Integrating the two outputs into a coupled forecast model combines the strengths of the different models and gives more accurate results than the output of any single model. Because both the correction model and the LSTM model have strong learning ability, choosing a more complex integration method would aggravate overfitting of the coupled forecast; the Lasso method is therefore used for the integration.
The coupled forecast model obtained is:
$$\gamma_j = \sum_{f=1}^{n} b_f\, x_{j,f} + \varepsilon$$
where $\gamma_j$ is the j-th forecast value, i.e. the coupled NO2 forecast for day j, j = 1, 2, 3; $x_{j,f}$ denotes the day-j values of Pred_wrf-cmaq and Pred_lstm; $b_f$ is the regression coefficient of the f-th input variable; $\varepsilon$ is the offset; and n = 2, indicating that the output values of two models, the WRF-CMAQ correction and the LSTM prediction, are coupled.
Lasso shrinks the regression coefficients by constructing a penalty term, thereby reducing model complexity. Pred_wrf-cmaq and Pred_lstm are used as the inputs to Lasso, and the coupled forecast values are obtained by Lasso regression; see Table 4. It can be seen that, compared with the traditional air quality forecasting method (WRF-CMAQ) and the single machine learning forecasting methods (FNN, RF, SVR, XGBoost, LSTM), the air quality simulation and observation machine learning NO2 coupling forecasting method greatly improves forecast accuracy and can provide decision support for air quality forecasting and early warning.
Table 4. Evaluation of each model's NO2 forecasting performance during the test period
(Note: DAY_1, DAY_2 and DAY_3 in the table denote the first, second and third forecast days, respectively, and AVE denotes the three-day average.)
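The coupling of step 14 amounts to a two-input Lasso regression, which can be sketched with scikit-learn as below. The data are synthetic stand-ins for Pred_wrf-cmaq, Pred_lstm and the monitored NO2 values, and the penalty strength `alpha=0.1` and the train/test split are assumptions; the description does not report the values actually used.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic stand-ins (not data from the patent): two noisy forecasts of one truth.
rng = np.random.default_rng(0)
obs = rng.uniform(10, 60, size=200)                # "monitored" NO2
pred_wrf_cmaq = obs + rng.normal(0, 6, size=200)   # stand-in for Pred_wrf-cmaq
pred_lstm = obs + rng.normal(0, 4, size=200)       # stand-in for Pred_lstm

# Two input variables per sample, matching n = 2 in the coupled model.
X = np.column_stack([pred_wrf_cmaq, pred_lstm])
train, test = slice(0, 150), slice(150, 200)

# alpha is the L1 penalty weight; 0.1 is an arbitrary illustrative choice.
coupler = Lasso(alpha=0.1)
coupler.fit(X[train], obs[train])

b = coupler.coef_          # regression coefficients b_f
eps = coupler.intercept_   # offset term
coupled = coupler.predict(X[test])

mae_coupled = float(np.mean(np.abs(coupled - obs[test])))
mae_wrf = float(np.mean(np.abs(pred_wrf_cmaq[test] - obs[test])))
print(b, eps, mae_coupled, mae_wrf)
```

With only two inputs the fitted model stays easy to inspect: the two coefficients show how much weight the coupled forecast places on the corrected simulation and on the monitoring-based prediction.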
In some embodiments of the invention, the coupled model of the invention is used to forecast the NO2 concentration for the next three days at ten town-street sites in area d04; the evaluation of the forecasting performance is shown in FIG. 3 (a)-(c). As the figure shows, the coupled forecast model provided by the invention forecasts NO2 far better than the other models and can be used for NO2 concentration forecasting; it also shows that the coupled forecast model forecasts NO2 concentration in this area well.
Through secondary modeling with machine learning methods on the basis of traditional air quality numerical simulation, the coupling forecasting method provided by the embodiments of the invention fully exploits the ability of machine learning to fit nonlinear problems and achieves accurate forecasting of NO2. The method offers clear advantages, physically meaningful parameters and strong applicability.
The sequence numbers before the method steps are used for convenience of description and do not limit the order of the steps.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An air quality simulation and observation machine learning NO2 coupling forecasting method, characterized by comprising the following steps:
S11, establishing a database;
S12, establishing a feature selection method based on a greedy strategy and determining an optimal feature variable set, wherein the features comprise meteorological indexes and pollutants; determining, based on the optimal feature variable set, a machine learning model for correcting the WRF-CMAQ simulation, correcting the WRF-CMAQ forecast result with the machine learning model, and denoting the correction result Pred_wrf-cmaq;
S13, establishing a long short-term memory neural network (LSTM) forecasting model for NO2 based on monitoring data, and predicting the NO2 concentration to obtain a monitoring-based prediction result Pred_lstm;
S14, coupling the WRF-CMAQ correction result Pred_wrf-cmaq with the LSTM prediction result Pred_lstm by means of a Lasso model to obtain a coupled forecast model, and obtaining the final predicted NO2 concentration from the coupled forecast model.
2. The air quality simulation and observation machine learning NO2 coupling forecasting method according to claim 1, characterized in that establishing the database comprises the following steps:
S111, selecting the cities and sites that require air quality early warning and forecasting;
S112, acquiring hourly forecast values of the meteorological indexes from the WRF model; acquiring hourly concentration forecast values of each pollutant from the CMAQ model; acquiring hourly concentration data of each atmospheric pollutant from the real-time air quality release network and the provincial and municipal atmospheric pollutant monitoring networks; and acquiring hourly meteorological monitoring data from weather stations; the data are cleaned and stored in the database.
3. The air quality simulation and observation machine learning NO2 coupling forecasting method according to claim 2, characterized in that the meteorological indexes in step S112 comprise 14 items: wind direction, wind speed, temperature, humidity, specific humidity, rainfall, cloud cover, air pressure, boundary layer height, sensible heat flux, latent heat flux, long-wave radiation, short-wave radiation and surface solar radiation; and the pollutants comprise NO2, SO2, PM10, PM2.5, O3 and CO.
4. The air quality simulation and observation machine learning NO2 coupling forecasting method according to claim 1, characterized in that step S12 specifically comprises:
S121, taking NO2 as the target value and inputting all the collected features one by one into a plurality of machine learning models for training; for each machine learning model, selecting the single feature with the lowest mean absolute error and putting it into the feature variable set; on that basis, introducing each remaining feature in turn, training, and keeping the best one; repeating these steps until the error no longer decreases, then stopping the introduction of further features, thereby obtaining the optimal feature variable set for each machine learning model;
S122, inputting the optimal feature variable set of each machine learning model into the corresponding model for training, and pre-correcting the NO2 concentration values forecast by WRF-CMAQ for a preset number of future days;
S123, evaluating the accuracy of the correction results of each model and determining the optimal machine learning model;
S124, correcting the WRF-CMAQ forecast with the optimal machine learning model, and denoting the resulting correction as Pred_wrf-cmaq.
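The greedy feature selection in step S121 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and `toy_model_mae` is a stand-in scorer that predicts the target as the average of the selected columns, used only so that the example runs without a machine learning library. In practice the scorer would train one of the claimed models and return its validation MAE.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def mean_absolute_error(pred: Sequence[float], obs: Sequence[float]) -> float:
    """MAE between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def toy_model_mae(columns: List[Sequence[float]], y: Sequence[float]) -> float:
    """Stand-in scorer (assumption): predict y as the mean of the selected columns."""
    pred = [sum(values) / len(values) for values in zip(*columns)]
    return mean_absolute_error(pred, y)


def forward_select(
    features: Dict[str, Sequence[float]],
    y: Sequence[float],
    score: Callable[[List[Sequence[float]], Sequence[float]], float] = toy_model_mae,
) -> Tuple[List[str], float]:
    """Add one feature per round, keeping the one that gives the lowest MAE.

    Stops as soon as no remaining feature lowers the MAE any further.
    """
    selected: List[str] = []
    remaining = list(features)
    best_mae = float("inf")
    while remaining:
        # Score every candidate feature on top of the current selection.
        trials = [
            (score([features[k] for k in selected + [name]], y), name)
            for name in remaining
        ]
        trial_mae, trial_name = min(trials, key=lambda t: t[0])
        if trial_mae >= best_mae:
            break  # error no longer decreases: stop introducing features
        selected.append(trial_name)
        remaining.remove(trial_name)
        best_mae = trial_mae
    return selected, best_mae


# Synthetic demonstration data (not from the patent).
demo_features = {
    "a": [2.0, 3.0, 4.0, 5.0],
    "b": [0.0, 1.0, 2.0, 3.0],
    "c": [10.0, 0.0, 10.0, 0.0],
}
demo_target = [1.0, 2.0, 3.0, 4.0]
demo_selected, demo_mae = forward_select(demo_features, demo_target)
```

Running the same routine once per candidate model, each with its own scorer, would give the per-model optimal feature sets that step S122 consumes.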
5. The air quality simulation and observation machine learning NO2 coupling forecasting method according to claim 4, characterized in that in step S121 the collected feature data are expressed as:
$$X = (X_1, X_2, \ldots, X_n)$$
where n is the number of features and m is the number of hourly values of each feature; the i-th feature can be expressed as the vector $X_i = (x_{i1}, x_{i2}, \ldots, x_{im})^T$, $i = 1, 2, \ldots, n$, where $x_{im}$ is the value of the i-th feature at the m-th hour and the superscript T denotes transpose; the target pollutant NO2 is denoted $Y = \{y_1, y_2, \ldots, y_m\}$, where $y_m$ is the NO2 concentration value on the m-th future day.
6. The air quality simulation and observation machine learning NO2 coupling forecasting method according to claim 4, characterized in that in step S121 the mean absolute error is expressed as
$$MAE_d = \frac{1}{N} \sum_{t=1}^{N} \left| Sim_{dt} - Obs_t \right|$$
where N is the number of target values, $Sim_{dt}$ is the NO2 value predicted by model d for day t, and $Obs_t$ is the monitored NO2 concentration on day t.
7. The air quality simulation and observation machine learning NO2 coupling forecasting method according to claim 4, characterized in that the machine learning models comprise the extreme gradient boosting tree XGBoost, support vector regression SVR, random forest RF, the feedforward neural network FNN, the gradient boosting decision tree GBDT, the distributed gradient boosting framework LightGBM and the gated recurrent unit GRU.
8. The air quality simulation and observation machine learning NO2 coupling forecasting method according to claim 4, characterized in that in step S123 the accuracy of each model's correction result is evaluated using the correlation coefficient, the mean absolute error and the root mean square error, and the machine learning model with the highest accuracy in this comparison is selected as the optimal machine learning model.
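The three evaluation metrics named in this claim can be computed as in the sketch below. The claim does not say how the three metrics are combined into a single ranking, so the rule used here (lowest RMSE, then lowest MAE, then highest correlation) is an assumption, as are the function names and the synthetic numbers.

```python
import math
from typing import Dict, Sequence, Tuple


def mae(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def rmse(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def pearson_r(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation coefficient (undefined for a constant series)."""
    mean_s = sum(sim) / len(sim)
    mean_o = sum(obs) / len(obs)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)


def evaluate_models(
    corrections: Dict[str, Sequence[float]], obs: Sequence[float]
) -> Tuple[str, Dict[str, Dict[str, float]]]:
    """Score every model's corrected forecast and pick the best one.

    Ranking rule (assumption): lowest RMSE, ties broken by lowest MAE,
    then by highest correlation coefficient.
    """
    scores = {
        name: {"r": pearson_r(sim, obs), "mae": mae(sim, obs), "rmse": rmse(sim, obs)}
        for name, sim in corrections.items()
    }
    best = min(
        scores,
        key=lambda n: (scores[n]["rmse"], scores[n]["mae"], -scores[n]["r"]),
    )
    return best, scores


# Synthetic corrected forecasts and observations (not from the patent).
demo_obs = [10.0, 20.0, 30.0, 40.0]
demo_corrections = {
    "XGBoost": [11.0, 19.0, 31.0, 39.0],
    "SVR": [15.0, 25.0, 35.0, 45.0],
}
demo_best, demo_scores = evaluate_models(demo_corrections, demo_obs)
```

The SVR series in the demonstration is perfectly correlated with the observations but carries a constant bias, which is why correlation alone is not enough to rank the models.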
9. The air quality simulation and observation machine learning NO2 coupling forecasting method according to claim 1, characterized in that in step S13 the optimal feature set Inp_lstm is input into the LSTM to forecast NO2, giving the prediction result Pred_lstm, wherein the optimal feature set Inp_lstm is obtained as follows:
the acquired hourly monitoring data of the atmospheric pollutants and the hourly meteorological monitoring data are input one by one into the LSTM model for training;
the single feature with the lowest mean absolute error is selected and put into the feature variable set; on that basis, each remaining feature is introduced in turn, trained, and the best one kept; these steps are repeated until the error no longer decreases, at which point no further features are introduced, finally giving the feature variable set Inp_lstm that is optimal for the LSTM model.
10. The air quality simulation and observation machine learning NO2 coupling forecasting method according to any one of claims 1-9, characterized in that in step S14 the coupling forecasting model obtained is
$$\gamma_j = \sum_{f=1}^{n} b_f \, x_{j,f} + \varepsilon$$
where $\gamma_j$ is the j-th forecast value, i.e. the coupled NO2 forecast for day j; $x_{j,f}$ is the f-th input variable on day j, namely the correction result Pred_wrf-cmaq or the prediction result Pred_lstm; $b_f$ is the regression coefficient of the f-th input variable; $\varepsilon$ is the offset; and n = 2, indicating that the two model outputs, the WRF-CMAQ corrected value and the LSTM predicted value, are coupled.
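The coupling model is a two-input linear regression, so its coefficients can be estimated from past days on which both model outputs and the observed NO2 value are available. The sketch below assumes ordinary least squares solved through the normal equations. This section of the claims does not state the fitting method, and the function names and numbers are illustrative only.

```python
from typing import List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the inputs are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_coupling(
    pred_wrf_cmaq: Sequence[float], pred_lstm: Sequence[float], obs: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares estimate of (b_1, b_2, epsilon) for the n = 2 coupling model."""
    rows = [[w, l, 1.0] for w, l in zip(pred_wrf_cmaq, pred_lstm)]
    k = 3
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, obs)) for i in range(k)]
    b1, b2, eps = solve_linear_system(xtx, xty)
    return b1, b2, eps


def couple(b1: float, b2: float, eps: float, wrf_cmaq: float, lstm: float) -> float:
    """Coupled forecast gamma_j = b_1 * Pred_wrf-cmaq + b_2 * Pred_lstm + epsilon."""
    return b1 * wrf_cmaq + b2 * lstm + eps


# Synthetic training data generated from known coefficients (not from the patent).
demo_wrf = [30.0, 42.0, 25.0, 50.0, 38.0]
demo_lstm = [28.0, 45.0, 20.0, 47.0, 41.0]
demo_obs = [0.6 * w + 0.4 * l + 1.0 for w, l in zip(demo_wrf, demo_lstm)]
demo_b1, demo_b2, demo_eps = fit_coupling(demo_wrf, demo_lstm, demo_obs)
```

Because the demonstration targets are generated from fixed coefficients without noise, the fit recovers them up to rounding. With real observations the coefficients simply weight whichever of the two forecasts has tracked the monitoring data more closely.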
CN202310394676.2A 2023-04-13 2023-04-13 Air quality simulation and observation machine learning NO2 coupling forecasting method Pending CN117408128A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310394676.2A CN117408128A (en) 2023-04-13 2023-04-13 Air quality simulation and observation machine learning NO2 coupling forecasting method

Publications (1)

Publication Number Publication Date
CN117408128A true CN117408128A (en) 2024-01-16

Family

ID=89489639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310394676.2A Pending CN117408128A (en) 2023-04-13 2023-04-13 Air quality simulation and observation machine learning NO2 coupling forecasting method

Country Status (1)

Country Link
CN (1) CN117408128A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110261547A (en) * 2019-07-04 2019-09-20 北京思路创新科技有限公司 A kind of Urban Air Pollution Methods and equipment
CN113280378A (en) * 2021-05-27 2021-08-20 华南理工大学 Online oil smoke monitoring system with self-cleaning function and control method
CN113627529A (en) * 2021-08-11 2021-11-09 成都佳华物链云科技有限公司 Air quality prediction method, device, electronic equipment and storage medium
CN115881239A (en) * 2022-09-13 2023-03-31 重庆市生态环境大数据应用中心 Method for dividing atmospheric pollution weak diffusion area based on extended WRF and CMAQ models

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LU, H et al.: "Adjusting prediction of ozone concentration based on CMAQ model and machine learning methods in Sichuan-Chongqing region, China", Atmospheric Pollution Research, vol. 12, no. 6, 30 June 2021 (2021-06-30), pages 1 - 13 *
Kang Junfeng et al.: "Short-term PM2.5 concentration prediction supported by an XGBoost-LSTM variable-weight combination model: a case study of Shanghai", China Environmental Science, vol. 41, no. 9, 30 April 2021 (2021-04-30), pages 4016 - 4025 *

Similar Documents

Publication Publication Date Title
Heydari et al. A novel composite neural network based method for wind and solar power forecasting in microgrids
Ibrahim et al. A novel hybrid model for hourly global solar radiation prediction using random forests technique and firefly algorithm
Ramsami et al. A hybrid method for forecasting the energy output of photovoltaic systems
CN110782093B (en) PM2.5 hourly concentration prediction method and system fusing SSAE deep feature learning and LSTM
CN111695731B (en) Load prediction method, system and equipment based on multi-source data and hybrid neural network
CN111665575B (en) Medium-and-long-term rainfall grading coupling forecasting method and system based on statistical power
CN113537600B (en) Medium-long-term precipitation prediction modeling method for whole-process coupling machine learning
CN116451879B (en) Drought risk prediction method and system and electronic equipment
CN111833202B (en) Farmland evapotranspiration short-term prediction method considering crop coefficient dynamic change and rainfall
CN113919231A (en) PM2.5 concentration space-time change prediction method and system based on space-time diagram neural network
CN111488896B (en) Distribution line time-varying fault probability calculation method based on multi-source data mining
CN110533239B (en) Smart city air quality high-precision measurement method
CN116013426A (en) Site ozone concentration prediction method with high space-time resolution
Wen et al. Applying an artificial neural network to simulate and predict Chinese fir (Cunninghamia lanceolata) plantation carbon flux in subtropical China
CN114240003A (en) New energy output prediction method, system, storage medium and equipment
CN117031582A (en) Ozone hour concentration forecasting method based on recursive space-time learning and simulation monitoring fusion
CN117408128A (en) Air quality simulation and observation machine learning NO2 coupling forecasting method
CN113723670B (en) Photovoltaic power generation power short-term prediction method with variable time window
CN113344290B (en) Method for correcting sub-season rainfall weather forecast based on U-Net network
CN115859789A (en) Method for improving inversion accuracy of polar atmosphere temperature profile
CN115639628A (en) BP neural network air quality forecasting method based on space grouping modeling
CN114565136B (en) Air quality prediction optimization method based on generation countermeasure network
CN116702610B (en) GBDT and numerical mode-based wind speed prediction method and system
CN117829368A (en) Urban temperature rapid prediction method integrating meteorological simulation and neural network
CN117057490A (en) Prediction method and system for wet stress heat wave-flood composite disaster and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication Publication
SE01 Entry into force of request for substantive examination Entry into force of request for substantive examination