CN115639628A - BP neural network air quality forecasting method based on space grouping modeling - Google Patents


Info

Publication number
CN115639628A
Authority
CN
China
Prior art keywords
data
forecast
model
forecasting
air quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211234483.2A
Other languages
Chinese (zh)
Inventor
朱媛媛
王淑莹
刘冰
尹翠芳
李翔宇
穆宏蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHINA NATIONAL ENVIRONMENTAL MONITORING CENTRE
Beijing Ladbrokes Victory Environmental Technology Co ltd
Original Assignee
CHINA NATIONAL ENVIRONMENTAL MONITORING CENTRE
Beijing Ladbrokes Victory Environmental Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a BP neural network air quality forecasting method based on space grouping modeling, relates to the technical field of air quality forecasting, and comprises a model training stage and an automatic forecasting stage. The method uses high-frequency, high-precision measured data as the sample data for model training and resamples the heavy pollution weather samples. It performs space grouping modeling in combination with a correlation matrix and dynamically screens, according to the wind direction, the space model group used for each station's forecast. The optimal model in the space model group is then selected using general statistical indexes and the classification statistical indexes designed in the invention, and is used to simulate and generate the forecast result. The accuracy of the forecast data is thereby improved, and the forecast data output by the method can be resolved down to district and county monitoring sites.

Description

BP neural network air quality forecasting method based on space grouping modeling
Technical Field
The invention relates to the technical field of air quality prediction, in particular to a BP neural network air quality prediction method based on space grouping modeling.
Background
The relationship between air quality and human health has attracted a great deal of attention and has been studied by a number of scholars. Air quality prediction is particularly important in the aspects of public early warning and disaster reduction plan making before a pollution event occurs, and domestic and foreign scholars also predict air quality by using a numerical mechanism model and a statistical model to try to accurately predict the occurrence of the pollution event and provide support for a disaster reduction plan.
At present, the air quality forecasting methods commonly used at home and abroad include the potential forecasting method, the numerical mechanism forecasting method and the statistical forecasting method. The statistical forecasting method collects meteorological data and monitored pollutant concentration data over a long period, establishes a correlation between meteorological conditions and pollutant concentration data by statistical methods such as multivariate linear regression, neural networks or machine learning, and then calculates or predicts pollutant concentrations from the forecast meteorological conditions in the subsequent forecasting process.
In the potential forecasting method, a forecaster makes a subjective judgment based on weather conditions; the method has large uncertainty and is rarely used on its own. The numerical mechanism forecasting method is based on a pollution source emission inventory and weather forecast data, and simulates and predicts the concentrations of pollutants in ambient air after the emitted pollutants undergo a series of physical and chemical reactions. For secondary pollutants such as PM2.5 and O3, numerical mechanism simulation and prediction is a very complex systematic process, and existing numerical mechanism forecasting methods cannot fully describe the mechanisms by which secondary pollutants form. Moreover, the forecasting accuracy of the numerical mechanism method depends heavily on the accuracy of the emission inventory, and an accurate emission inventory is currently difficult to collect. Although most cities have carried out air quality forecasting with the numerical mechanism forecasting method, the results are not ideal, and manual judgment is still needed to improve forecasting accuracy. Because of these shortcomings, more and more environmental protection units have begun to adopt the statistical forecasting method. The BP neural network algorithm is one of the commonly used statistical forecasting methods; its forecasting accuracy is generally higher than that of the numerical mechanism forecasting method, and it has been applied with good results in some cities. However, because it is not sensitive enough to extreme pollution values, its forecasting accuracy is low during periods of heavy pollution.
The three forecasting methods have certain limitations, and the forecasting effect is not ideal. Although the statistical forecasting method based on the BP neural network algorithm has higher forecasting accuracy on general polluted weather, the forecasting accuracy on the heavy pollution event is still lower. Therefore, the existing algorithm needs to be continuously optimized to improve the forecast accuracy, which is also urgently needed by the environmental protection business department.
On this basis, a BP neural network air quality forecasting method based on space grouping modeling is provided.
Disclosure of Invention
The invention aims to overcome the problems in the prior art by providing a BP neural network air quality forecasting method based on space grouping modeling that improves the accuracy of forecast data; the forecast data output by the method can also be resolved down to county monitoring sites.
In order to achieve the technical purpose and achieve the technical effect, the invention is realized by the following technical scheme:
A BP neural network air quality forecasting method based on space grouping modeling comprises a model training stage and an automatic forecasting stage, and specifically comprises the following steps:
S1: Model training stage
S101: Data acquisition: collecting hourly-resolution pollutant concentration data of the past three years from the national control monitoring sites and county monitoring sites of a target forecast area;
S102: Data cleaning: automatically cleaning the acquired data in various ways, and checking whether the cleaned data conforms to a normal distribution;
S103: Resampling the heavy pollution weather samples, and retaining the resampled data for later use;
S104: Meteorological data preprocessing: converting the downloaded weather forecast data into the required format, and standardizing the meteorological factor data;
S105: Analyzing the spatial dependence between each site of the target forecast area and the county sites, namely analyzing the correlation between the pollutant concentration of each national control monitoring site and the pollutant concentration of the county sites;
S106: Performing space grouping modeling according to the spatial dependence between each site of the target forecast area and the surrounding county sites;
S107: Analyzing the correlation between pollutant concentration data and meteorological factor data, namely analyzing the correlation between the pollutant concentration data of each national control monitoring site in the target forecast area and each meteorological factor, and screening a meteorological parameter set;
S108: Configuring and constructing a plurality of air quality forecasting models, wherein each configured model comprises an input layer, 4 hidden layers and an output layer;
S109: Model training: training the plurality of constructed air quality forecasting models in two modes, one being a conventional mode and the other a pollution exceeding mode; the training data account for 70% of the total number of samples, and the remaining 30% are used for model verification;
S110: Model verification: the verification uses two general statistical indexes, the root mean square error and the coefficient of determination, and three classification statistical indexes, the correct proportion of exceedance-day forecasts, the success index and the error early warning proportion;
S111: According to the general statistical indexes and classification statistical indexes described in step S110, automatically selecting the optimal model for each pollutant, each site and each forecast period as the model used in the automatic forecasting stage;
S2: Automatic forecasting stage
S201: Monitoring data acquisition: writing data acquisition programs for the national control monitoring sites and county monitoring sites of the target forecast area, and continuously acquiring hourly-resolution pollutant concentration data in real time as input data of the automatic forecasting stage;
S202: Meteorological data acquisition: writing a program that automatically downloads GFS weather forecast data and converts it into the format required by the automatic forecasting stage, and standardizing the meteorological factor data;
S203: Judging the wind direction: automatically judging the wind direction of the forecast day with a wind direction judging program;
S204: Dynamically screening, according to the wind direction, the space model group used for each station's forecast;
S205: Automatically determining the final forecast result: when the forecast value of the conventional mode reaches a certain value, the forecast result of the pollution exceeding mode is used as the final forecast result; otherwise, the forecast result of the conventional mode is used as the final forecast result.
Preferably, the method of resampling is as follows:
$$X' = X_{\min} + \mathrm{rand}() \times (X_{\max} - X_{\min})$$
where $X_{\max}$ is the maximum value in the sample;
$X_{\min}$ is the minimum value in the sample;
$\mathrm{rand}()$ is a uniformly distributed random real number between 0 and 1;
and $X'$ is the resampled data.
Preferably, the standardization processing is performed by:
$$Y = \log(1 + y)$$
where $y$ is the actual meteorological factor data;
and $Y$ is the meteorological factor data after the standardization processing.
Preferably, the input layer comprises a space factor, a meteorological factor and a time factor; 4 layers are selected as hidden layers; the output layer is the forecast pollutant concentration.
Preferably, the conventional mode trains the model directly with the conventional pollutant concentration data and meteorological factor data obtained after the data cleaning in step S102.
Preferably, the pollution exceeding mode trains the model with the sample data obtained by resampling the heavy pollution weather samples in step S103, together with the meteorological factor data.
Preferably, the root mean square error is calculated as:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_{model,i} - X_{OBS,i}\right)^{2}}$$
the calculation formula of the coefficient of determination is as follows:
Figure BDA0003882215530000051
in the above formulas, $X_{OBS,i}$ is the monitored value on day $i$; $X_{model,i}$ is the forecast value on day $i$; and $n$ is the number of samples used for model validation.
Preferably, the calculation formula of the correct proportion for exceeding days prediction is as follows:
Figure BDA0003882215530000052
the success index calculation formula is:
Figure BDA0003882215530000053
the error early warning proportion calculation formula is as follows:
Figure BDA0003882215530000054
in the formulas, $N_1$ is the number of days on which both the monitored value and the forecast value meet the standard; $N_2$ is the number of days on which the monitored value meets the standard but the forecast value exceeds it; $N_3$ is the number of days on which both the monitored value and the forecast value exceed the standard; and $N_4$ is the number of days on which the monitored value exceeds the standard but the forecast value meets it.
In summary, the present invention includes at least one of the following advantages:
The method uses high-frequency, high-precision measured data as the sample data for model training and resamples the heavy pollution weather samples. It performs space grouping modeling in combination with a correlation matrix and dynamically screens, according to the wind direction, the space model group used for each station's forecast. The optimal model in the space model group is then selected using general statistical indexes and the classification statistical indexes designed in the invention, and is used to simulate and generate the forecast result. The accuracy of the forecast data is thereby improved, and the forecast data output by the method can be resolved down to county monitoring sites.
Drawings
FIG. 1 is a flow chart of a training phase of the present invention;
FIG. 2 is a flow chart of the automatic run phase of the present invention;
FIG. 3 is a diagram illustrating a class statistical indicator according to the present invention;
FIG. 4 is a schematic diagram of the present invention for modeling a spatial grouping of regions.
Detailed Description
The invention is described in further detail below with reference to figures 1-4.
Example 1
The embodiment provided by the invention comprises the following steps: as shown in fig. 1-4, a BP neural network air quality prediction method based on spatial grouping modeling includes two stages, namely, a model training stage and an automatic prediction stage. The two phases will be described in detail below:
S1: Model training stage
S101: Data acquisition: collecting hourly-resolution pollutant concentration data of the past three years from the national control monitoring sites and county monitoring sites of the target forecast area.
S102: data cleaning: carrying out automatic cleaning on the acquired data in various modes, such as cleaning unreasonable maximum and minimum values, and cleaning abnormal values by using hampel filtering; for the data with abnormal values or missing values which are cleaned, a plurality of methods can be used for completing, such as an adjacent interpolation method, a historical synchronization data completing method and the like; and (4) checking whether the cleaned data conforms to normal distribution or not so as to ensure the reasonability of the data cleaning process.
S103: Resampling the heavy pollution weather samples: heavy pollution weather samples make up only a small proportion of all samples, so they are resampled. The resampling method is:
$$X' = X_{\min} + \mathrm{rand}() \times (X_{\max} - X_{\min})$$
where $X_{\max}$ is the maximum value in the sample;
$X_{\min}$ is the minimum value in the sample;
$\mathrm{rand}()$ is a uniformly distributed random real number between 0 and 1;
and $X'$ is the resampled data.
The resampled data is retained for later use.
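As a minimal sketch of the resampling formula, the function below draws new values uniformly between the minimum and maximum of the heavy pollution samples. The function name, the number of new samples and the example concentrations are hypothetical, and applying the formula to one variable at a time is an assumption.

```python
import random


def resample_heavy_pollution(samples, n_new, rng=None):
    """Draw n_new values as X' = X_min + rand() * (X_max - X_min)."""
    rng = rng or random.Random()
    x_min, x_max = min(samples), max(samples)
    return [x_min + rng.random() * (x_max - x_min) for _ in range(n_new)]


# Illustrative heavy pollution PM2.5 concentrations (not from the source).
heavy = [152.0, 180.5, 201.3, 260.8]
extra = resample_heavy_pollution(heavy, n_new=10, rng=random.Random(0))
```

Every generated value lies between the smallest and largest observed heavy pollution value, so the resampled set enlarges the heavy pollution class without leaving its observed range.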
S104: Meteorological data preprocessing: converting the downloaded weather forecast data into the required format, and standardizing the meteorological factor data by:
$$Y = \log(1 + y)$$
where $y$ is the actual meteorological factor data;
and $Y$ is the meteorological factor data after the standardization processing.
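A short sketch of this transform follows. The source does not state the base of the logarithm, so the natural logarithm is assumed here; the function names are hypothetical. Note that the expression is only defined for y > -1.

```python
import math


def standardize(y):
    """Y = log(1 + y); natural logarithm assumed, defined for y > -1."""
    return math.log1p(y)


def destandardize(big_y):
    """Inverse transform, y = exp(Y) - 1."""
    return math.expm1(big_y)


wind_speed = 12.5  # illustrative value in m/s
scaled = standardize(wind_speed)
restored = destandardize(scaled)
```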
S105: Analyzing the spatial dependence between each site of the target forecast area and the county sites, namely analyzing the correlation between the pollutant concentration of each national control monitoring site and the pollutant concentration of the surrounding county sites.
S106: Performing space grouping modeling according to the spatial dependence between each site of the target forecast area and the surrounding county sites.
S107: Analyzing the correlation between pollutant concentration data and meteorological factor data, namely analyzing the correlation between the pollutant concentration data of each national control monitoring site in the target forecast area and each meteorological factor, and screening a meteorological parameter set.
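The correlation analysis in these steps can be sketched with a Pearson coefficient. The source does not name the correlation measure or any cutoff, so the use of Pearson correlation, the 0.3 threshold, the function names and the data are all assumptions. The same `pearson` helper could be applied between two sites' concentration series for the spatial analysis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series with non-zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def screen_factors(pollutant, factors, min_abs_r=0.3):
    """Keep factors whose |r| with the pollutant reaches the (assumed) cutoff."""
    scores = {name: pearson(pollutant, series) for name, series in factors.items()}
    return {name: r for name, r in scores.items() if abs(r) >= min_abs_r}


# Illustrative series for one site (not from the source).
pm25 = [35, 60, 80, 120, 150]
factors = {
    "boundary_layer_height": [1500, 1100, 900, 600, 400],
    "wind_speed": [4.0, 3.0, 2.5, 1.5, 1.0],
    "unrelated": [1, 3, 2, 3, 1],
}
selected = screen_factors(pm25, factors)
```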
S108: configuring and constructing a plurality of air quality forecasting models, wherein the configured models comprise: an input layer, 4 hidden layers and an output layer;
The input layer comprises: space factors (space grouping modeling), meteorological factors (namely meteorological factors that correlate strongly with the pollutant concentration at the target forecast site, such as planetary boundary layer height, temperature, relative humidity, wind direction, wind speed and temperature inversion intensity), and time factors (namely whether the target forecast day is a working day, a public holiday, and so on);
Four layers are selected as hidden layers. Traditional neural network algorithms generally use 1 to 3 hidden layers; the invention uses 4 hidden layers together with an algorithm that initializes the weights by unsupervised training and then fine-tunes them backward by supervised training, which achieves the learning effect of deep, multi-layer perception while avoiding excessively long training times. The tanh function is selected as the activation function; it works well when feature differences are pronounced and continuously amplifies the feature effect over the training iterations. In specific applications the tanh function is often more advantageous than the sigmoid function, mainly because the sigmoid function is sensitive to changes in its input only within [-1, 1] and saturates once the input approaches or exceeds that range, whereas the tanh function is centered at 0 and converges faster. tanh is therefore preferred over sigmoid in practice.
The output layer is the forecast pollutant concentration.
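The layer structure described above can be sketched as a plain forward pass with four tanh hidden layers and a linear output. This is only a structural illustration: the layer widths, the number of input features and the random initialization are assumptions, and the unsupervised weight initialization and supervised fine-tuning described in the text are not implemented here.

```python
import math
import random


def init_network(n_inputs, hidden_sizes, n_outputs, rng):
    """Create (weights, biases) per layer with small random weights."""
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        bound = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """tanh on every hidden layer, identity on the output layer."""
    for idx, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        is_output = idx == len(layers) - 1
        x = z if is_output else [math.tanh(v) for v in z]
    return x


# Hypothetical sizes: 6 input factors, 4 hidden layers, 1 output concentration.
HIDDEN_SIZES = [16, 16, 8, 8]
net = init_network(n_inputs=6, hidden_sizes=HIDDEN_SIZES, n_outputs=1,
                   rng=random.Random(42))
features = [0.8, 0.1, 0.5, 0.3, 0.2, 1.0]  # already standardized inputs
prediction = forward(net, features)
```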
S109: Model training: the plurality of constructed air quality forecasting models are trained in two modes. One is the conventional mode, in which the models are trained directly with the conventional pollutant concentration data, meteorological factor data and the like obtained after the data cleaning in step S102. The other is the pollution exceeding mode, in which the models are trained with the sample data obtained by resampling the heavy pollution weather samples in step S103, together with the meteorological factor data. The training data account for 70% of the total number of samples, and the remaining 30% are used for model verification.
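The 70/30 division can be sketched as follows. The source does not say whether samples are shuffled or split chronologically; a seeded random shuffle is assumed here, and the function name is hypothetical.

```python
import random


def split_samples(samples, train_fraction=0.7, seed=0):
    """Shuffle with a fixed seed, then cut into training and verification sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


train, valid = split_samples(range(100))
```

For time series data a chronological split (earliest 70% for training) is a common alternative, since it avoids verifying on samples that precede the training data.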
S110: Model verification: for the model to achieve a good final effect, the most suitable model must be selected for each pollutant at each site, and the basis for model selection is the key. The invention designs a set of evaluation indexes for forecast results, which provide the basis for evaluating models and selecting the optimal model. They comprise two general statistical indexes, the root mean square error (RMSE) and the coefficient of determination (R²), and three classification statistical indexes, the correct proportion of exceedance-day forecasts (FCF), the success index (SI) and the error early warning proportion (FFA). These indexes are applied together to measure the quality of a model comprehensively.
The root mean square error RMSE is calculated as:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_{model,i} - X_{OBS,i}\right)^{2}}$$
where $X_{OBS,i}$ is the monitored value on day $i$;
$X_{model,i}$ is the forecast value on day $i$;
and $n$ is the number of samples used for model validation.
The coefficient of determination R² is calculated as:
Figure BDA0003882215530000091
where $X_{OBS,i}$ is the monitored value on day $i$;
$X_{model,i}$ is the forecast value on day $i$;
and $n$ is the number of samples used for model validation.
The correct proportion of exceedance-day forecasts (FCF) is calculated as:
Figure BDA0003882215530000092
the Success Index (SI) is calculated as:
Figure BDA0003882215530000093
the calculation formula of the error early warning ratio (FFA) is as follows:
Figure BDA0003882215530000094
In the above formulas, $N_1$, $N_2$, $N_3$ and $N_4$ are as shown in FIG. 3:
$N_1$ is the number of days on which both the monitored value and the forecast value meet the standard;
$N_2$ is the number of days on which the monitored value meets the standard but the forecast value exceeds it;
$N_3$ is the number of days on which both the monitored value and the forecast value exceed the standard;
$N_4$ is the number of days on which the monitored value exceeds the standard but the forecast value meets it.
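The following sketch shows one way the five indexes could be computed. RMSE follows its standard definition and the counts N1 to N4 follow the definitions above. The remaining formulas are assumptions, because their formula images do not survive in this text: R² uses the common form 1 - SS_res/SS_tot, and FCF, SI and FFA use definitions often seen in air quality forecast verification (FCF = N3/(N3+N4), FFA = N2/(N2+N3), SI = N3/(N3+N4) + N1/(N1+N2) - 1). The patent's own formulas may differ. The limit of 75 and all data values are illustrative.

```python
import math


def rmse(obs, pred):
    """Root mean square error between monitored and forecast values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Assumed form: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def contingency(obs, pred, limit):
    """Count N1..N4; a value 'exceeds the standard' when it is above limit."""
    n1 = n2 = n3 = n4 = 0
    for o, p in zip(obs, pred):
        if o <= limit and p <= limit:
            n1 += 1  # both meet the standard
        elif o <= limit:
            n2 += 1  # monitored meets, forecast exceeds
        elif p > limit:
            n3 += 1  # both exceed
        else:
            n4 += 1  # monitored exceeds, forecast meets
    return n1, n2, n3, n4


def categorical_indexes(n1, n2, n3, n4):
    """Assumed definitions of FCF, SI and FFA (see lead-in)."""
    fcf = n3 / (n3 + n4)
    si = n3 / (n3 + n4) + n1 / (n1 + n2) - 1.0
    ffa = n2 / (n2 + n3)
    return fcf, si, ffa


# Illustrative daily PM2.5 values and an illustrative limit.
obs = [40, 60, 80, 120, 70, 90]
pred = [45, 80, 85, 70, 65, 100]
counts = contingency(obs, pred, limit=75)
fcf, si, ffa = categorical_indexes(*counts)
rmse_value = rmse(obs, pred)
r2_value = r_squared(obs, pred)
```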
S111: Using the general statistical indexes and classification statistical indexes described in step S110, the optimal model is automatically selected for each pollutant, each site and each forecast period, and is used as the model in the automatic forecasting stage.
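The source does not state how the five indexes are combined into a single choice. As a purely hypothetical ranking rule, the sketch below prefers the highest success index and breaks ties with the lowest RMSE; the model names and scores are invented for illustration.

```python
def select_best(candidates):
    """Hypothetical rule: highest SI first, then lowest RMSE."""
    return min(candidates, key=lambda m: (-m["si"], m["rmse"]))


# Candidate models for one (pollutant, site, forecast period) combination.
candidates = [
    {"name": "model_1", "rmse": 18.2, "si": 0.41},
    {"name": "model_2", "rmse": 15.9, "si": 0.55},
    {"name": "model_3", "rmse": 14.1, "si": 0.55},
]
best = select_best(candidates)
```

In practice the selection would be repeated for every pollutant, site and forecast period, storing one best model per combination.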
S2: automatic forecast phase
S201: Monitoring data acquisition: writing data acquisition programs for the national control monitoring sites and county monitoring sites of the target forecast area (the national control monitoring sites and the county monitoring sites currently belong to different monitoring networks), and continuously acquiring hourly-resolution pollutant concentration data in real time as input data of the automatic forecasting stage.
S202: Meteorological data acquisition: writing a program that automatically downloads GFS weather forecast data and converts it into the format required by the automatic forecasting stage, and standardizing the meteorological factor data by:
$$Y = \log(1 + y)$$
where $y$ is the actual meteorological factor data;
and $Y$ is the meteorological factor data after the standardization processing.
S203: Judging the wind direction: automatically judging the wind direction of the forecast day with a wind direction judging program.
S204: Dynamically screening, according to the wind direction, the space model group used for each station's forecast. For example, for one station, the wind direction of the area where the station is located is judged first, the space model group of that wind direction is then selected for the station, and the forecast result is simulated and generated with the optimal model of that space model group, as screened in the model verification step of the training stage. The forecast results comprise both the forecast result in the conventional mode and the forecast result in the pollution exceeding mode.
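Steps S203 and S204 can be sketched as a lookup from wind direction to one of eight sector groups. The sketch assumes that a representative pair of wind components (u eastward, v northward) for the forecast day is available from the weather forecast data; how the patent's wind direction program derives the daily direction is not stated. The function names and the placeholder group contents are hypothetical.

```python
import math

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def wind_direction_deg(u, v):
    """Direction the wind blows FROM, in degrees clockwise from north."""
    return math.degrees(math.atan2(-u, -v)) % 360.0


def sector_of(direction_deg):
    """Map a direction to one of eight 45-degree sectors centered on N, NE, ..."""
    return SECTORS[int(((direction_deg + 22.5) % 360.0) // 45.0)]


def select_model_group(model_groups, u, v):
    """Pick the space model group matching the forecast-day wind direction."""
    return model_groups[sector_of(wind_direction_deg(u, v))]


# Placeholder groups; in practice each entry would hold the trained models.
model_groups = {name: "models_for_" + name for name in SECTORS}
northerly = select_model_group(model_groups, u=0.0, v=-5.0)   # wind from north
southwest = select_model_group(model_groups, u=3.0, v=3.0)    # wind from SW
```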
S205: Automatically determining the final forecast result: model verification shows that, in the conventional mode, the forecast trend correlates well with the actual monitored values, and the forecast is too low only in individual extreme-value cases. The invention therefore designs a mechanism: when the forecast value of the conventional mode reaches a certain value, the forecast result of the pollution exceeding mode is used as the final forecast result; otherwise, the forecast result of the conventional mode is used as the final forecast result.
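The switching mechanism amounts to a single threshold test. The source only says "a certain value", so the threshold is left as a parameter; the value 115 and the concentrations used below are illustrative assumptions.

```python
def final_forecast(conventional, exceedance, switch_threshold):
    """Use the exceeding-mode result once the conventional forecast
    reaches the switch threshold; otherwise keep the conventional result."""
    return exceedance if conventional >= switch_threshold else conventional


clean_day = final_forecast(conventional=60.0, exceedance=95.0,
                           switch_threshold=115.0)
polluted_day = final_forecast(conventional=130.0, exceedance=180.0,
                              switch_threshold=115.0)
```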
Example 2
A BP neural network air quality forecasting method based on space grouping modeling comprises two stages:
1. A model training stage, comprising the following steps:
(1) Data acquisition: collecting hourly-resolution pollutant concentration data of the past three years from the national control monitoring sites and county monitoring sites in a certain area.
(2) Data cleaning: the collected data is cleaned automatically in several ways, for example by removing unreasonable maximum and minimum values and by removing outliers with Hampel filtering. Data points that were removed as outliers, or that were missing, are filled in by methods such as nearest-neighbor interpolation or filling with data from the same period in previous years. The cleaned data is also tested for conformity to a normal distribution to confirm that the cleaning process is reasonable.
(3) Resampling the heavy pollution weather samples: heavy pollution weather samples make up only a small proportion of all samples, so they are resampled; the resampling method is as described in step S103 of the model training stage in Example 1.
(4) Meteorological data preprocessing: converting the downloaded weather forecast data into the required format, and standardizing the meteorological factor data.
(5) Analyzing the spatial dependence between the sites of the target forecast area and the county sites, namely analyzing the correlation between the pollutant concentration of each national control monitoring site and the pollutant concentration of the county sites.
(6) Analyzing the correlation between pollutant concentration data and meteorological factor data, namely analyzing the correlation between the pollutant concentration data of each national control monitoring site in the target forecast area and each meteorological factor, and screening a meteorological parameter set.
(7) Configuring and constructing a plurality of air quality forecasting models, wherein each configured model comprises an input layer, 4 hidden layers and an output layer. The input layer comprises: space factors (space grouping modeling), meteorological factors (namely meteorological factors that correlate strongly with the pollutant concentration at the target forecast site, such as planetary boundary layer height, temperature, relative humidity, wind direction, wind speed and temperature inversion intensity), and time factors (namely whether the target forecast day is a working day, a public holiday, and so on). Four layers are selected as hidden layers: traditional neural network algorithms generally use 1 to 3 hidden layers, whereas the invention uses 4 hidden layers together with an algorithm that initializes the weights by unsupervised training and then fine-tunes them backward by supervised training, which achieves the learning effect of deep, multi-layer perception while avoiding excessively long training times. The output layer is the forecast pollutant concentration.
(7.1) Space grouping modeling method: taking the site to be forecast as the center, a circle with a radius of 20 km is drawn and divided into 8 sectors according to the 8 wind directions. The pollutant concentration data of the sites (national control sites and county sites) contained in each sector are modeled as a group, giving 8 groups of models. Within each group, 7 models are built by combining different meteorological factors, so that 8 groups totaling 56 models are built for each forecast site, as shown in FIG. 4.
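The grouping of neighboring sites into eight sectors can be sketched with a great-circle distance and an initial bearing from the forecast site. The coordinates, site names and the choice of the haversine formula are assumptions for illustration; the source only specifies the 20 km radius, the 8 sectors and the 7 models per group.

```python
import math

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
EARTH_RADIUS_KM = 6371.0
MODELS_PER_GROUP = 7  # from the text: 7 meteorological-factor combinations


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Haversine distance (km) and initial bearing (degrees from north)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    dist = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return dist, math.degrees(math.atan2(y, x)) % 360.0


def group_sites(center, sites, radius_km=20.0):
    """Assign each site within radius_km of center to one of 8 sectors."""
    groups = {name: [] for name in SECTORS}
    for site, (lat, lon) in sites.items():
        dist, bearing = distance_and_bearing(center[0], center[1], lat, lon)
        if 0 < dist <= radius_km:  # skips the forecast site itself
            sector = SECTORS[int(((bearing + 22.5) % 360.0) // 45.0)]
            groups[sector].append(site)
    return groups


# Hypothetical coordinates (latitude, longitude) in degrees.
center = (39.90, 116.40)
sites = {
    "north_site": (40.00, 116.40),  # about 11 km north
    "east_site": (39.90, 116.55),   # about 13 km east
    "far_site": (40.50, 116.40),    # about 67 km away, outside the circle
}
groups = group_sites(center, sites)
total_models = len(groups) * MODELS_PER_GROUP
```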
(8) Model training: the plurality of constructed air quality forecasting models are trained in two modes. One is the conventional mode, in which the models are trained directly with the conventional pollutant concentration data, meteorological factor data and the like obtained after the data cleaning in step (2). The other is the pollution exceeding mode, in which the models are trained with the sample data obtained by resampling the heavy pollution weather samples in step (3), together with the meteorological factor data. The training data account for 70% of the total number of samples, and the remaining 30% are used for model verification.
(9) Model verification: for the model to achieve a good final effect, the most suitable model must be selected for each pollutant at each site, and the basis for model selection is the key. The invention designs a set of evaluation indexes for forecast results, which provide the basis for evaluating models and selecting the optimal model. They comprise two general statistical indexes, the root mean square error (RMSE) and the coefficient of determination (R²), and three classification statistical indexes, the correct proportion of exceedance-day forecasts (FCF), the success index (SI) and the error early warning proportion (FFA). These indexes are applied together to measure the quality of a model comprehensively.
(10) Using the general statistical indexes and classification statistical indexes described in step (9), the optimal model in each space model group is automatically selected for each pollutant, each site and each forecast period, and is used as the model in the automatic forecasting stage. Each space model group contains one optimal model for the conventional mode and one optimal model for the pollution exceeding mode.
2. An automatic forecasting stage, comprising the following steps:
(11) Monitoring data acquisition: and compiling a data acquisition program for the state control monitoring site and the county monitoring site in a certain area (the current state control monitoring site and the county monitoring site belong to different monitoring networks), and continuously acquiring the pollutant concentration data with the hour resolution in real time to be used as input data of an automatic forecasting stage.
(12) Meteorological data acquisition: a program is written to download GFS weather forecast data automatically, convert the data into the format required by the automatic forecasting stage, and standardize the meteorological factor data.
(13) Wind direction determination: the wind direction on the forecast day is determined automatically by a wind direction determination program.
(14) Dynamic screening of spatial model groups: the spatial model group used for each site is selected dynamically according to the wind direction. For example, for a given site, the wind direction in the area where the site is located is determined first, the spatial model group for that wind direction is then selected for the site, and the forecast is generated with the optimal models of that group, as screened in the model verification step of the training stage. The forecast results include both a conventional-mode forecast and a pollution standard exceeding mode forecast.
(15) Automatic determination of the final forecast: model verification shows that the forecast trend in the conventional mode correlates well with the actual monitored values, and that the forecast is too low only at individual extreme values. The invention therefore provides a switching mechanism: when the conventional-mode forecast value reaches a certain value, the forecast from the pollution standard exceeding mode is used as the final forecast; otherwise the conventional-mode forecast is used as the final forecast.
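Steps (13) to (15) can be sketched in a few lines. Everything below is a hedged illustration under stated assumptions: the division into eight compass sectors, the names `wind_sector`, `final_forecast` and `forecast_site`, and the `threshold` parameter are hypothetical, since the text only says that the switch happens when the conventional forecast "reaches a certain value".

```python
SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def wind_sector(direction_deg):
    """Map a wind direction in degrees to one of eight sectors.

    Assumption: eight 45-degree sectors centred on the compass points.
    """
    width = 360.0 / len(SECTORS)
    index = int(((direction_deg % 360.0) + width / 2) // width) % len(SECTORS)
    return SECTORS[index]

def final_forecast(conventional, exceeding, threshold):
    """Use the exceeding-mode forecast once the conventional forecast reaches
    the threshold; otherwise keep the conventional forecast.

    Assumption: threshold is a hypothetical, user-supplied concentration.
    """
    return exceeding if conventional >= threshold else conventional

def forecast_site(direction_deg, model_groups, features, threshold):
    """Select the spatial model group for the wind sector, run both models,
    and return the sector together with the final forecast.

    model_groups maps a sector label to a pair of callables
    (conventional_model, exceeding_model).
    """
    sector = wind_sector(direction_deg)
    conventional_model, exceeding_model = model_groups[sector]
    conventional = conventional_model(features)
    exceeding = exceeding_model(features)
    return sector, final_forecast(conventional, exceeding, threshold)
```

Keeping the switch in its own function makes the threshold easy to tune per pollutant without touching the model selection logic.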
Example 3
The forecasting accuracy of the forecasting model configured for a certain area using the method of the invention was evaluated. The evaluation period was November 1, 2021 to June 23, 2022, and the evaluation indexes were the forecast accuracy of the AQI level, the PM2.5 level and the O3-8h level. The evaluation followed the National Environmental Protection Standard of the People's Republic of China, Technical Specification for Numerical Forecasting of Ambient Air Quality (HJ 1130-2020). According to the evaluation, the method is clearly superior to the conventional method; the results are shown in the tables below. Table 1 gives the AQI, PM2.5 and O3-8h level forecast accuracy for the next 24 hours, and Table 2 gives the AQI, PM2.5 and O3-8h level forecast accuracy for the next 72 hours. HJ 1130-2020 specifies that the AQI level forecast accuracy shall be not less than 60%. For the model configured for the area with the method of the invention, the mean level forecast accuracy exceeds 85% for the next 24 hours and 75% for the next 72 hours, well above the standard.
Table 1. Level forecast accuracy for the next 24 hours of the forecasting model configured for a certain area based on the present invention
[Table provided as image BDA0003882215530000141 in the original publication]
Table 2. Level forecast accuracy for the next 72 hours of the forecasting model configured for a certain area based on the present invention
[Table provided as image BDA0003882215530000142 in the original publication]
The above are preferred embodiments of the present invention and do not limit its scope of protection. Accordingly, equivalent changes made according to the structure, shape and principle of the invention shall fall within the scope of protection of the invention.

Claims (8)

1. A BP neural network air quality forecasting method based on space grouping modeling, characterized in that the method comprises a model training stage and an automatic forecasting stage, specifically comprising the following steps:
S1: Model training stage
S101: Data acquisition: collecting hourly-resolution pollutant concentration data from the national control monitoring sites and county monitoring sites of a target forecast area for the most recent three years;
S102: Data cleaning: automatically cleaning the acquired data by multiple methods, and testing whether the cleaned data follow a normal distribution;
S103: Resampling the heavy-pollution weather samples, and retaining the resampled data for later use;
S104: Meteorological data preprocessing: converting the downloaded weather forecast data into the required format, and standardizing the meteorological factor data;
S105: Analyzing the spatial dependence between each site of the target forecast area and the district and county sites, namely analyzing the correlation between the pollutant concentration at each national control monitoring site and the pollutant concentration at the surrounding district and county sites;
S106: Performing space grouping modeling according to the spatial dependence between each site of the target forecast area and the surrounding district and county sites;
S107: Analyzing the correlation between the pollutant concentration data and the meteorological factor data, namely analyzing the correlation between the pollutant concentration data of each national control monitoring site in the target forecast area and each item of meteorological factor data, and screening a meteorological parameter set;
S108: Configuring and constructing a plurality of air quality forecasting models, wherein each configured model comprises an input layer, 4 hidden layers and an output layer;
S109: Model training: training the plurality of constructed air quality forecasting models, wherein the training is divided into two modes, one being a conventional mode and the other being a pollution standard exceeding mode; the training data account for 70% of the total number of samples, and the remaining 30% of the samples are used for model verification;
S110: Model verification: using two general statistical indexes, namely the root mean square error and the coefficient of determination, and three categorical statistical indexes, namely the proportion of exceedance days correctly forecast, the success index and the false alarm proportion;
S111: Automatically selecting, according to the general statistical indexes and categorical statistical indexes described in step S110, an optimal model for each pollutant, each site and each forecast period as the model used in the automatic forecasting stage;
S2: Automatic forecasting stage
S201: Monitoring data acquisition: writing a data acquisition program for the national control monitoring sites and county monitoring sites of the target forecast area, and continuously collecting hourly-resolution pollutant concentration data in real time as input data for the automatic forecasting stage;
S202: Meteorological data acquisition: writing a program that automatically downloads GFS weather forecast data, converts the data into the format required by the automatic forecasting stage, and standardizes the meteorological factor data;
S203: Wind direction determination: automatically determining the wind direction on the forecast day by a wind direction determination program;
S204: Dynamically screening the spatial model group used for the forecast of each site according to the wind direction;
S205: Automatically determining the final forecast result: when the forecast value of the conventional mode reaches a certain value, the forecast result of the pollution standard exceeding mode is used as the final forecast result; otherwise, the forecast result of the conventional mode is used as the final forecast result.
2. The BP neural network air quality forecasting method based on space grouping modeling as claimed in claim 1, wherein the resampling method is:
$$X' = X_{\min} + \mathrm{rand}() \times (X_{\max} - X_{\min})$$
where $X_{\max}$ is the maximum value in the sample;
$X_{\min}$ is the minimum value in the sample;
$\mathrm{rand}()$ is a uniformly distributed random real number between 0 and 1;
and $X'$ is the resampled data.
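The resampling formula of claim 2 draws new values uniformly between the minimum and maximum of the heavy-pollution samples. A minimal sketch, assuming NumPy; the function name `resample_heavy_pollution` and the parameters `n_new` and `seed` are hypothetical and do not come from the patent.

```python
import numpy as np

def resample_heavy_pollution(samples, n_new, seed=None):
    """Generate n_new values X' = X_min + rand() * (X_max - X_min).

    samples: heavy-pollution concentration values.
    n_new:   number of resampled values to draw (hypothetical parameter).
    """
    samples = np.asarray(samples, dtype=float)
    x_min, x_max = samples.min(), samples.max()
    rng = np.random.default_rng(seed)
    # rng.random draws uniformly from [0, 1), matching rand() in the formula
    return x_min + rng.random(n_new) * (x_max - x_min)
```

Because the draws are uniform over the observed range, the resampled values never fall outside the minimum and maximum of the original heavy-pollution samples.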
3. The BP neural network air quality forecasting method based on space grouping modeling as claimed in claim 1, wherein the standardization method is:
$$Y = \log(1 + y)$$
where $y$ is the actual meteorological factor data;
and $Y$ is the meteorological factor data after standardization.
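The standardization of claim 3 can be illustrated as below. This is a sketch under two assumptions that the claim does not settle: the logarithm is taken as the natural logarithm, and inputs with y ≤ -1 (for which log(1 + y) is undefined) are rejected rather than shifted. The function name `standardize` is hypothetical.

```python
import numpy as np

def standardize(y):
    """Apply Y = log(1 + y) element-wise.

    Assumptions: natural logarithm; values y <= -1 raise an error because
    the claim does not say how such values are handled.
    """
    y = np.asarray(y, dtype=float)
    if np.any(y <= -1.0):
        raise ValueError("log(1 + y) is undefined for y <= -1")
    # log1p computes log(1 + y) accurately even for small y
    return np.log1p(y)
```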
4. The BP neural network air quality forecasting method based on space grouping modeling as claimed in claim 1, wherein the input layer comprises spatial factors, meteorological factors and time factors; there are 4 hidden layers; and the output layer is the forecast pollutant concentration.
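The structure in claim 4 (an input layer, 4 hidden layers and a single concentration output) can be sketched as a small back-propagation network in NumPy. The hidden-layer width of 16, the tanh activation, the mean squared error loss and the learning rate are all assumptions made for illustration; the claim fixes only the number of layers.

```python
import numpy as np

def init_network(n_inputs, hidden_sizes=(16, 16, 16, 16), seed=0):
    """Create weights for input -> 4 hidden layers -> 1 output.

    Assumption: 16 units per hidden layer (not specified in the patent).
    """
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, *hidden_sizes, 1]
    return [(rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Return the activations of every layer (tanh hidden, linear output)."""
    activations = [X]
    for k, (W, b) in enumerate(params):
        z = activations[-1] @ W + b
        is_output = k == len(params) - 1
        activations.append(z if is_output else np.tanh(z))
    return activations

def train_step(params, X, y, lr=0.01):
    """One gradient-descent step on mean squared error via back-propagation."""
    acts = forward(params, X)
    # Gradient of the mean squared error with respect to the linear output
    delta = 2.0 * (acts[-1] - y.reshape(-1, 1)) / len(X)
    updated = []
    for k in reversed(range(len(params))):
        W, b = params[k]
        grad_W = acts[k].T @ delta
        grad_b = delta.sum(axis=0)
        if k > 0:
            # Propagate the error backwards through the tanh derivative
            delta = (delta @ W.T) * (1.0 - acts[k] ** 2)
        updated.append((W - lr * grad_W, b - lr * grad_b))
    return updated[::-1]
```

The columns of `X` would hold the spatial, meteorological and time factors named in the claim, and `forward(params, X)[-1]` gives the forecast concentration for each sample.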
5. The BP neural network air quality forecasting method based on space grouping modeling as claimed in claim 1, wherein in the conventional mode the model is trained directly on the conventional pollutant concentration data cleaned in step S102 together with the meteorological factor data and other inputs.
6. The BP neural network air quality forecasting method based on space grouping modeling as claimed in claim 1, wherein in the pollution standard exceeding mode the model is trained on the sample data obtained by resampling the heavy-pollution weather samples in step S103 together with the meteorological factor data.
7. The BP neural network air quality forecasting method based on space grouping modeling as claimed in claim 1, wherein the root mean square error is calculated as:
[Formula provided as image FDA0003882215520000031 in the original publication]
and the coefficient of determination is calculated as:
[Formula provided as image FDA0003882215520000032 in the original publication]
where $X_{OBS,i}$ is the monitored value on day $i$; $X_{model,i}$ is the forecast value on day $i$; and $n$ is the number of samples used for model verification.
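The two general statistical indexes of claim 7 can be computed as in the sketch below. The RMSE follows its standard definition. For the coefficient of determination the patent's own formula is available only as an image, so the squared Pearson correlation used here is an assumption; another common definition is 1 - SS_res / SS_tot, which can give different values.

```python
import numpy as np

def rmse(obs, model):
    """Root mean square error over the n verification samples."""
    obs = np.asarray(obs, dtype=float)
    model = np.asarray(model, dtype=float)
    return float(np.sqrt(np.mean((model - obs) ** 2)))

def r_squared(obs, model):
    """Coefficient of determination.

    Assumption: squared Pearson correlation between monitored and forecast
    values; the patent's exact formula is not reproduced in the text.
    """
    r = np.corrcoef(np.asarray(obs, dtype=float),
                    np.asarray(model, dtype=float))[0, 1]
    return float(r ** 2)
```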
8. The BP neural network air quality forecasting method based on space grouping modeling as claimed in claim 1, wherein the proportion of exceedance days correctly forecast is calculated as:
[Formula provided as image FDA0003882215520000033 in the original publication]
the success index is calculated as:
[Formula provided as image FDA0003882215520000041 in the original publication]
and the false alarm proportion is calculated as:
[Formula provided as image FDA0003882215520000042 in the original publication]
where N1 is the number of days on which the monitored value meets the standard and the forecast value also meets the standard; N2 is the number of days on which the monitored value meets the standard and the forecast value exceeds the standard; N3 is the number of days on which the monitored value exceeds the standard and the forecast value also exceeds the standard; and N4 is the number of days on which the monitored value exceeds the standard and the forecast value meets the standard.
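The day counts N1 to N4 of claim 8 follow directly from their definitions and can be computed as below. The three ratios built on them are a different matter: the patent's formulas are available only as images, so the expressions in `categorical_indexes` are assumptions based on commonly used categorical forecast scores, not the patented formulas. The function names and the `limit` parameter are hypothetical.

```python
import numpy as np

def contingency_counts(obs, forecast, limit):
    """Count days by whether monitored and forecast values exceed the limit.

    Returns (N1, N2, N3, N4) as defined in claim 8.
    """
    obs_exceeds = np.asarray(obs, dtype=float) > limit
    fc_exceeds = np.asarray(forecast, dtype=float) > limit
    n1 = int(np.sum(~obs_exceeds & ~fc_exceeds))  # both meet the standard
    n2 = int(np.sum(~obs_exceeds & fc_exceeds))   # false alarm
    n3 = int(np.sum(obs_exceeds & fc_exceeds))    # exceedance correctly forecast
    n4 = int(np.sum(obs_exceeds & ~fc_exceeds))   # missed exceedance
    return n1, n2, n3, n4

def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def categorical_indexes(n1, n2, n3, n4):
    """Illustrative FCF, SI and FFA.

    Assumptions (not taken from the patent):
      FCF = N3 / (N3 + N4)
      FFA = N2 / (N2 + N3)
      SI  = N3 / (N3 + N4) + N1 / (N1 + N2) - 1
    """
    fcf = _ratio(n3, n3 + n4)
    ffa = _ratio(n2, n2 + n3)
    si = fcf + _ratio(n1, n1 + n2) - 1.0
    return fcf, si, ffa
```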
CN202211234483.2A 2022-10-10 2022-10-10 BP neural network air quality forecasting method based on space grouping modeling Pending CN115639628A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211234483.2A CN115639628A (en) 2022-10-10 2022-10-10 BP neural network air quality forecasting method based on space grouping modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211234483.2A CN115639628A (en) 2022-10-10 2022-10-10 BP neural network air quality forecasting method based on space grouping modeling

Publications (1)

Publication Number Publication Date
CN115639628A true CN115639628A (en) 2023-01-24

Family

ID=84942063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211234483.2A Pending CN115639628A (en) 2022-10-10 2022-10-10 BP neural network air quality forecasting method based on space grouping modeling

Country Status (1)

Country Link
CN (1) CN115639628A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117668531A (en) * 2023-12-07 2024-03-08 无锡中科光电技术有限公司 EMMD-BP neural network atmospheric pollutant forecasting method based on principal component analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination