CN117332906B

CN117332906B - Machine learning-based three-dimensional space-time grid air quality prediction method and system

Info

Publication number: CN117332906B
Application number: CN202311628815.XA
Authority: CN
Inventors: 王新锋; 韩子祯; 关天奕; 辛鑫; 宋晓萌; 王一丹; 张庆竹; 任鹏杰; 陈竹敏; 王桥
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2023-12-01
Filing date: 2023-12-01
Publication date: 2024-03-15
Anticipated expiration: 2043-12-01
Also published as: CN117332906A

Abstract

The invention relates to a machine learning-based three-dimensional space-time grid air quality prediction method and system, in the technical field of air quality prediction. The method comprises the following steps: acquiring meteorological data and source emission list data at different heights, preprocessing the meteorological data, and constructing a three-dimensional space-time grid data set from the preprocessed meteorological data and the source emission list data; comparing the predicted values of a machine learning composite model with the measured values, and correcting the machine learning composite model according to the comparison result; and predicting the air quality of the region to be detected by using the machine learning composite model, and evaluating the local emission contribution and the emission reduction effect through scene simulation. The invention better shows the space-time distribution characteristics of atmospheric pollutants over a regional space in a future time period, and allows the contribution of local source emissions and the emission reduction effect to be evaluated quantitatively to a certain extent.

Description

Machine learning-based three-dimensional space-time grid air quality prediction method and system

Technical Field

The invention relates to the technical field of air quality prediction, in particular to a three-dimensional space-time grid air quality prediction method and system based on machine learning.

Background

With the acceleration of urbanization and industrial production, atmospheric PM₂.₅ and O₃ pollution has become severe. Air pollution markedly affects human health and the ecological environment and has become a prominent environmental problem of public concern. Predicting future air quality, in order to trace emission sources or to evaluate the effect of control measures, is therefore of great significance to society.

The continuous development of computer technology helps to improve the accuracy of regional air quality prediction and to shorten prediction time. In recent years, air quality prediction models have gradually evolved from physical modeling and data-driven models to machine learning models. Conventional regional air quality models based on the physical and chemical mechanisms of the atmosphere often suffer from a large computational load, high resource consumption, long run times and low spatial resolution. Existing big-data-based air quality prediction websites or platforms mostly output near-ground two-dimensional results, do not predict the space-time distribution of air quality on a three-dimensional space-time grid well, have difficulty screening and evaluating elevated pollution sources, generally do not directly consider the influence of source emissions, and cannot quantitatively evaluate the contribution of local emissions or the emission reduction effect. It is therefore of great value to explore air quality prediction methods based on data-driven machine learning with three-dimensional spatiotemporal feature extraction.

A search of existing patents and related technologies shows that existing air quality prediction methods include the following:

(1) Chinese patent publication No. CN 111340288A provides a time series prediction method for urban air quality that takes space-time correlation into account. The method uses a random forest model to predict the PM₂.₅ concentration 1 to 12 hours ahead. A space-time correlation cube is introduced to extract space-time information, and a model coupling singular spectrum analysis with random forest is designed to fit the air quality of the future stage. However, the patent uses a single data source, namely monitoring station data only, and lacks prediction of air quality in the vertical direction.

(2) Chinese patent publication No. CN 115730684A provides an air quality detection system based on an LSTM-CNN model. The system expands the vertical profile of satellite data by using an LSTM-CNN model and an interpolation method to obtain stereoscopic observation data, raising the horizontal and vertical resolution of the stereoscopic observation data to more than 2 times and 4 times that of the original observation data, respectively. However, the system can only predict the O₃ and PM₂.₅ concentrations of the current day from historical data and cannot predict pollutant concentrations over a future time period. In addition, the LSTM-CNN model is often time-consuming when processing computationally heavy data, and the prediction results are not corrected after the prediction is finished.

Therefore, it can be seen that although the current air quality prediction model based on machine learning realizes the simulation of air pollutants in different time periods or different horizontal and vertical spaces, the comprehensive prediction of air quality in a three-dimensional space in a future period of time is lacking, and assimilation fusion of multiple source data sets is lacking, so that accurate prediction is difficult to realize.

Disclosure of Invention

Aiming at the defects existing in the prior art, the invention aims to provide a three-dimensional space-time grid air quality prediction method and system based on machine learning, which can well reproduce horizontal advection and vertical diffusion processes and continuously predict air quality in future time periods of regional space ranges.

In order to achieve the above object, the present invention is realized by the following technical scheme:

A first aspect of the invention provides a machine learning-based three-dimensional space-time grid air quality prediction method, which comprises the following steps:

acquiring meteorological data and source emission list data with different heights, preprocessing the meteorological data, and constructing a three-dimensional space-time grid data set according to the preprocessed meteorological data and source emission list data;

training a machine learning composite model by utilizing a three-dimensional space-time grid data set, wherein a random forest algorithm is firstly adopted to train an ozone number density model and an optical thickness model, prediction is carried out according to the trained model, a prediction result and an actual measurement value are filled into the three-dimensional space-time grid data set, and then the filled data set is utilized to train an atmospheric pollutant model through a random forest, a cross network or a deep neural network, so that the machine learning composite model is obtained;

comparing the predicted value and the actual measured value of the machine learning composite model, and correcting the machine learning composite model according to the comparison result;

and predicting the air quality of the region to be detected by using a machine learning composite model, and evaluating the local emission contribution and the emission reduction effect through scene simulation.

Further, the three-dimensional space-time grid data set is a set of data sets with time periods of warm seasons and cold seasons, the total time period of the three-dimensional space-time grid data set is more than or equal to 2 weeks, and the time difference between the three-dimensional space-time grid data set and the prediction day is less than or equal to 2 years.

Furthermore, the training of the atmospheric pollutant model adopts a cross network or deep neural network training model when the continuous data quantity of the three-dimensional space-time grid data set time sequence exceeds 1000, and adopts a random forest training model when the continuous data quantity of the three-dimensional space-time grid data set time sequence is lower than 1000.
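The selection rule above can be sketched as a small helper. This is a minimal illustration rather than the patented implementation: the function name `choose_pollutant_model` and the returned labels are hypothetical, and because the text does not say what happens at exactly 1000 samples, the sketch assumes that case falls back to the random forest.

```python
# Hypothetical helper illustrating the data-volume rule for the pollutant model.
THRESHOLD = 1000  # continuous time-series sample count named in the text


def choose_pollutant_model(n_continuous_samples: int) -> str:
    """Return a label for the model family to train.

    Assumption: exactly 1000 samples is treated like the "lower than 1000"
    case, because the text leaves that boundary unspecified.
    """
    if n_continuous_samples > THRESHOLD:
        return "cross_network_or_dnn"
    return "random_forest"


print(choose_pollutant_model(500))
print(choose_pollutant_model(5000))
```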

Further, when meteorological data and source emission list data of different heights are acquired, meteorological data and source emission list data from different sources and at different resolutions are acquired, and monitoring point data from different sources together with satellite profile data of optical thickness and ozone number density are acquired at the same time.

Further, the specific steps of constructing the three-dimensional space-time grid data set include:

standardizing the meteorological data set, wherein the time series of the standardized data set comprises a training period and a prediction period; and merging the meteorological data set and the source emission inventory data set, so that the source emission inventory data are merged into the three-dimensional space-time grid data set, and finally, the satellite profile data of optical thickness and ozone number density are merged into the three-dimensional space-time grid data set.

Further, the source emission list data is integrated into the three-dimensional space-time grid data set by acquiring surface height data and time information from the meteorological data and, according to the height information of the meteorological data, merging the source emission list data at the corresponding latitude and longitude.

Further, the specific steps of comparing the predicted value and the measured value of the machine learning composite model and correcting the machine learning composite model according to the comparison result include:

calculating the ratio of the measured value to the predicted value of the atmospheric pollutants at the current time;

interpolating and filling the calculated ratio over the future time period, and dynamically correcting the predicted value of the atmospheric pollutants according to the ratio;

and carrying out secondary dynamic correction on the predicted value of the atmospheric pollutants by using a relative humidity exponential decay formula according to the relative humidity changes of the air at different heights.

Further, an atmospheric pollutant model is constructed according to the atmospheric pollutant conditions at the monitoring points, and the atmospheric pollutant is one or more of PM₂.₅, PM₁₀, NO₂, SO₂, O₃ and CO.

Further, the specific steps of predicting the air quality and the emission reduction effect of the region to be detected by using the machine learning composite model include:

processing meteorological data and source emission list data of the region to be detected by using a machine learning composite model to obtain an air quality prediction result of the region to be detected;

setting the emission intensity of the local source to be zero, re-inputting the emission intensity of the local source into a machine learning composite model, and obtaining the contribution of the local emission according to the change of the predicted concentration;

and reducing pollutant emission concentration in the source emission list data according to preset requirements, and re-inputting the pollutant emission concentration into the machine learning composite model to obtain an emission reduction effect prediction result.

A second aspect of the present invention provides a machine learning based three-dimensional space-time grid air quality prediction system comprising:

the data acquisition module is configured to acquire meteorological data and source emission list data with different heights, preprocess the meteorological data and construct a three-dimensional space-time grid data set according to the preprocessed meteorological data and source emission list data;

the model training module is configured to train a machine learning composite model by utilizing a three-dimensional space-time grid data set, wherein a random forest algorithm is firstly adopted to train an ozone number density model and an optical thickness model, prediction is carried out according to the trained models, a prediction result and an actual measurement value are filled into the three-dimensional space-time grid data set, and then the filled data set is utilized to train an atmospheric pollutant model through a random forest, a cross network or a deep neural network, so that the machine learning composite model is obtained;

the model correction module is configured to compare the predicted value and the actual measured value of the machine learning composite model and correct the machine learning composite model according to the comparison result;

and the prediction module is configured to predict the air quality of the region to be detected by using the machine learning composite model and evaluate the local emission contribution and the emission reduction effect through scene simulation.

The one or more of the above technical solutions have the following beneficial effects:

The invention discloses a machine learning-based three-dimensional space-time grid air quality prediction method and system. It provides a data-driven three-dimensional space-time grid representation of the space-time distribution, which displays well and facilitates the evaluation of regional transport of air pollution in three-dimensional space.

The invention trains twice on the processed and fused three-dimensional space-time grid data set using algorithms such as random forests. Ozone number density (O₃) and optical thickness (AOD) models are trained first, the resulting satellite remote sensing vertical profiles are added to the total data set, and the atmospheric pollutant (such as PM₂.₅) model is then trained, which helps to improve the accuracy of air quality prediction. In addition, the method combines known measured data with relative humidity changes at different heights to dynamically correct the predicted data twice, which can further improve the accuracy and reliability of the prediction model.

The method takes publicly available three-dimensional meteorological analysis data, source emission lists, satellite remote sensing vertical profile data and ambient air monitoring site data as model input data, and establishes a three-dimensional space-time grid air quality prediction model and prediction method based on machine learning algorithms. It can further be used for air quality scene simulation, for evaluating local emission contribution and emission reduction effect, for analyzing high-altitude transport of atmospheric pollutants, and the like.

Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.

FIG. 1 is a flow chart of a machine learning based three-dimensional space-time grid air quality prediction method in accordance with a first embodiment of the present invention;

FIG. 2 is a flow chart of a machine learning composite model data processing in accordance with a first embodiment of the present invention;

FIG. 3 is a flowchart of training, verifying and predicting a random forest model in accordance with a first embodiment of the present invention;

FIG. 4 is a diagram of multi-source dataset formation in accordance with a first embodiment of the present invention;

FIG. 5 is a schematic diagram of the prediction performance of an AOD model according to a first embodiment of the present invention;

FIG. 6 is a schematic diagram of the prediction performance of the O₃ model according to a first embodiment of the present invention;

FIG. 7 is a schematic diagram of the prediction performance of the PM₂.₅ model according to a first embodiment of the present invention.

Detailed Description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It should be noted that the embodiments of the present invention involve meteorological data and source emission data from monitoring stations, organizations, institutions and the like. When the embodiments are applied to specific products or technologies, user permission or consent must be obtained, and the collection, use and processing of the relevant data must comply with the applicable laws, regulations and standards of the relevant countries and regions.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise. Furthermore, it is to be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

Embodiment one:

The first embodiment of the invention provides a machine learning-based three-dimensional space-time grid air quality prediction method. As shown in FIG. 1, taking PM₂.₅ as an example, meteorological data are first acquired and standardized; features are extracted from the meteorological data and the emission list data, which are then resampled and interpolated; satellite profile data are merged in to obtain a multi-source three-dimensional space-time grid data set; and this data set is input into a machine learning composite model to obtain the PM₂.₅ prediction result. As shown in FIG. 2, the machine learning composite model obtains AOD and O₃ from the multi-source three-dimensional space-time grid data set through random forest models, generates a new three-dimensional space-time grid data set together with the PM₂.₅ data of the monitoring stations, and processes the new data set with a cross network, a deep neural network or a random forest model to obtain the PM₂.₅ prediction result. Altitude correction and relative humidity correction are then applied to the PM₂.₅ prediction, and the corrected model is used to evaluate the emission contribution and the emission reduction result.

The method specifically comprises the following steps:

Step 1: acquire meteorological data and source emission list data at different heights, preprocess the meteorological data, and construct a three-dimensional space-time grid data set from the preprocessed meteorological data and the source emission list data.

Step 2: train a machine learning composite model using the three-dimensional space-time grid data set, based on algorithms such as random forests, cross networks and deep neural networks. The training, verification and prediction process of the random forest model is shown in FIG. 3. It includes screening the three-dimensional space-time grid training set and test set; the results are corrected as required according to historical data patterns, scientific principles and the like; a three-dimensional air quality prediction model is then generated and combined with newly added real-time three-dimensional space-time grid data to produce the three-dimensional space-time grid prediction data.

Step 3: compare the predicted values of the machine learning composite model with the measured values, and correct the machine learning composite model according to the comparison result.

Step 4: predict the air quality of the region to be detected using the machine learning composite model, and evaluate the local emission contribution and the emission reduction effect through scene simulation.

In step 1, when the meteorological data and source emission list data of different heights are acquired, meteorological data and source emission list data from different sources and at different resolutions are acquired, that is, the data come from different monitoring sites, organizations or institutions. PM₂.₅ data from monitoring points of different sources, optical thickness and ozone number density are also acquired at the same time.

The three-dimensional space-time grid data set is a set of data sets with time periods of warm seasons and cold seasons, the total time period of the three-dimensional space-time grid data set is more than or equal to 2 weeks, and the time difference between the three-dimensional space-time grid data set and the prediction current day is less than or equal to 2 years.

The method for constructing the three-dimensional space-time grid data set comprises the following specific steps of:

standardizing the meteorological data set, wherein the time series of the standardized data set comprises a training period and a prediction period; and merging the meteorological data set and the source emission inventory data set, so that the source emission inventory data are merged into the three-dimensional space-time grid data set, and finally, the satellite remote sensing vertical profile data of optical thickness and ozone number density are merged into the three-dimensional space-time grid data set.

The method of the meteorological data set standardization is to adjust each space-time resolution of the data set to a uniform size, wherein the space-time resolution comprises longitude, latitude, altitude and time resolution. The specific method is to divide the height of the data set into a certain layer number, group the data according to the time, latitude, longitude and layer number dimension of the data, and calculate the average value of each group. Then, interpolation is carried out on the data in longitude and latitude, then interpolation is carried out in height dimension, and finally time resampling is carried out, so that time series data of different characteristics of the same coordinates and different time stamps are obtained.

In this embodiment, as shown in fig. 4, the preprocessed two-dimensional multi-source data is combined into a three-dimensional space-time grid data set by using left connection, and the interpolation method of data set filling may be one or more of linear interpolation, polynomial interpolation, spline interpolation, forward filling interpolation or backward filling interpolation.

The source emission list data is integrated into the three-dimensional space-time grid data set by acquiring surface height data and time information from the meteorological data and, according to the height information of the meteorological data, merging the source emission list data at the corresponding latitude and longitude.

In this embodiment, the processing and merging of multi-source datasets handles the standardization and merging of multiple data simultaneously by creating a multi-process and multi-threaded environment.

In a specific embodiment, a three-dimensional space-time grid air quality data set is constructed. Taking Shandong province as an example, an urban atmospheric pollutant source emission list, monitoring data from national-control ambient air quality stations, satellite remote sensing vertical profile data and three-dimensional meteorological simulation data are acquired. The source emission list data come from the bloom emission list data set, the meteorological simulation data from the FNL/GFS meteorological data set, the satellite remote sensing vertical profile data from the CALIOP satellite data set, and the environmental monitoring data from the national atmospheric environment monitoring site data set.

Data collection and preprocessing: air quality detection data is collected over a specified date and altitude range, including emissions inventory data, weather data, monitoring point data, and satellite profile data. Selecting data of 2017, 6, 23, 7, 11, 23, 12, 7 days, preprocessing the data, and matching according to the space site position and acquisition time to obtain time sequence data of different characteristics of the same coordinate and different time stamps;

The data are preprocessed in the following order. First, the meteorological data set, which has the best data integrity, is standardized. Next, the large-volume source list data are converted to three dimensions: surface height data and time information are acquired from the meteorological data, the data at the corresponding latitude and longitude are merged according to the height information, and the actual height is calculated by adding the ground height to the height above the surface. Then, when the satellite remote sensing vertical profile data are preprocessed, the height data of the meteorological data and the CALIOP satellite vertical data are merged using the corrected relative humidity. Finally, the data sets are merged into one file, and a file containing the target variable is generated.

Data set standardization matches each data item according to spatial site position and acquisition time. The data are layered by height, with each layer set to 60 m, and the average of the data in each layer is calculated. The processed data are grouped by time, latitude, longitude and layer number, and the average of each group is calculated. Spatial linear interpolation is then performed, first in the longitude and latitude dimensions and then in the height dimension. Finally, time resampling is performed and each data set is unified into a standard format, yielding time series data of different features at the same coordinates and different time stamps.
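As a rough illustration of the layering, group averaging and time resampling described above, the following pandas sketch processes one variable. It is a minimal example under stated assumptions: the column names (`time`, `lat`, `lon`, `height_m`, `value`), the function name `standardize` and the hourly step are hypothetical, and the spatial interpolation in longitude, latitude and height is left out for brevity.

```python
import pandas as pd

LAYER_HEIGHT_M = 60.0  # layer thickness used in this embodiment


def standardize(df: pd.DataFrame,
                step: pd.Timedelta = pd.Timedelta(hours=1)) -> pd.DataFrame:
    """Layer, average and time-resample one variable.

    Expects the (hypothetical) columns: time, lat, lon, height_m, value.
    """
    df = df.copy()
    # Assign each record to a 60 m layer (layer 0 covers 0-60 m, and so on).
    df["layer"] = (df["height_m"] // LAYER_HEIGHT_M).astype(int)
    # Average all records that share time, latitude, longitude and layer.
    binned = df.groupby(["time", "lat", "lon", "layer"],
                        as_index=False)["value"].mean()
    pieces = []
    for (lat, lon, layer), cell in binned.groupby(["lat", "lon", "layer"]):
        # Resample to a regular time step and fill the gaps linearly.
        series = cell.set_index("time")["value"].resample(step).mean()
        piece = series.interpolate(method="linear").reset_index()
        piece["lat"], piece["lon"], piece["layer"] = lat, lon, layer
        pieces.append(piece)
    return pd.concat(pieces, ignore_index=True)


raw = pd.DataFrame({
    "time": pd.to_datetime(["2017-06-23 00:00", "2017-06-23 00:00",
                            "2017-06-23 02:00"]),
    "lat": [36.0, 36.0, 36.0],
    "lon": [117.0, 117.0, 117.0],
    "height_m": [10.0, 50.0, 30.0],
    "value": [1.0, 3.0, 4.0],
})
grid = standardize(raw)
print(grid)
```

In this toy input the two records at 00:00 fall into the same layer and are averaged, and the missing 01:00 value is filled by linear interpolation between the neighbouring hours.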

The preprocessed data undergo height processing, precision conversion and similar operations. All preprocessed data sets are read, keeping only the time, latitude, longitude, layer number, O₃ and AOD columns. The preprocessed two-dimensional multi-source data are merged into a three-dimensional space-time grid data set using left joins on latitude, longitude, time and layer number.
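The left join on time, latitude, longitude and layer number can be sketched with `pandas.DataFrame.merge`. The column names, the helper `merge_sources` and the toy values below are assumptions made for illustration; the meteorological table is taken as the left-hand side so that every meteorological grid cell is kept.

```python
import pandas as pd

KEYS = ["time", "lat", "lon", "layer"]  # join keys named in the text


def merge_sources(met: pd.DataFrame, *others: pd.DataFrame) -> pd.DataFrame:
    """Left-join each additional source onto the meteorological grid."""
    grid = met
    for other in others:
        # how="left" keeps all grid cells; cells without a match get NaN.
        grid = grid.merge(other, on=KEYS, how="left")
    return grid


t = pd.Timestamp("2017-06-23 00:00")
met = pd.DataFrame({"time": [t, t], "lat": [36.0, 36.0], "lon": [117.0, 117.0],
                    "layer": [0, 1], "rh": [55.0, 40.0]})
emis = pd.DataFrame({"time": [t], "lat": [36.0], "lon": [117.0],
                     "layer": [0], "emission": [2.5]})
sat = pd.DataFrame({"time": [t], "lat": [36.0], "lon": [117.0],
                    "layer": [1], "aod": [0.3]})
grid = merge_sources(met, emis, sat)
print(grid)
```

Cells that remain NaN after the merge are the ones the interpolation and filling methods mentioned in the text would then address.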

The data is processed and merged by creating a multi-process and multi-threaded environment. For each generated date, a new thread is created and started to run.
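A minimal sketch of the one-thread-per-date pattern is given below, using Python's standard `threading` module. The function names and the placeholder work inside `process_date` are hypothetical; the actual standardization and merging code is not given in the text.

```python
import threading
from datetime import date, timedelta


def process_date(day: date, results: dict, lock: threading.Lock) -> None:
    """Placeholder for standardizing and merging one day's data."""
    summary = f"merged {day.isoformat()}"  # stand-in for the real work
    with lock:  # protect the shared dictionary
        results[day] = summary


def run_per_date(start: date, n_days: int) -> dict:
    results, lock, threads = {}, threading.Lock(), []
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        # One thread is created and started for each generated date.
        worker = threading.Thread(target=process_date,
                                  args=(day, results, lock))
        worker.start()
        threads.append(worker)
    for worker in threads:
        worker.join()  # wait until every date has been processed
    return results


done = run_per_date(date(2017, 6, 23), 3)
print(sorted(done))
```

For CPU-bound preprocessing in Python, a process pool would typically complement the threads, which is consistent with the multi-process and multi-threaded environment the text mentions.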

In step 2, the measured ozone number density and aerosol optical thickness (AOD) are sparse, strip-shaped data covering only a small portion of the study region, so they must first be predicted spatially and expanded to the entire spatial region, and then predicted for the future time period. Ozone number density and aerosol optical thickness are pollutant/atmospheric parameters different from PM₂.₅; the vertical profiles of O₃ and AOD are used to predict the concentration distribution of PM₂.₅ in the vertical direction.

In a specific embodiment, the ozone number density model and the optical thickness model are first trained with a random forest algorithm to obtain vertical profiles with high spatial coverage. The prediction results and the measured values of the atmospheric pollutant (such as PM₂.₅) are filled into the three-dimensional space-time grid data set, and the filled data set is then used to train the atmospheric pollutant model with a random forest, a cross network or a deep neural network, giving the machine learning composite model. Once the machine learning composite model is obtained, three-dimensional meteorological data for the period to be predicted are input into it to predict hourly atmospheric pollutant concentrations for the current day and the following 3 days.

The atmospheric pollutant model is constructed according to the atmospheric pollutant conditions at the monitoring points, and the atmospheric pollutant is one or more of PM₂.₅, PM₁₀, NO₂, SO₂, O₃ and CO. In this embodiment, the machine learning composite model is illustrated using PM₂.₅ as an example.

In a specific embodiment, the preprocessed three-dimensional space-time grid data are read, and the features and target variables of the AOD and O₃ models are defined. The models are trained and used for prediction under the Scikit-learn framework. After the data are divided into training and test sets, a regression model such as a random forest, a cross network or a deep neural network is trained and its score on the test set is calculated, yielding a simulated satellite vertical profile. The prediction performance of the AOD model is shown in FIG. 5, and that of the O₃ model in FIG. 6.

The AOD and O₃ results predicted by the AOD and O₃ models are filled into the data set, and the PM₂.₅ composite model is trained using the updated data set. After the data are divided into training and test sets, the model is trained with an algorithm such as a random forest, a cross network or a deep neural network, its score on the test set is calculated, and the model is saved. For prediction, the trained AOD and O₃ models are loaded and the prediction data are read. AOD and O₃ sub-data sets are created by selecting different columns of the total data set, and AOD and O₃ are predicted with the AOD and O₃ models. The trained PM₂.₅ composite model is then loaded, the predicted AOD and O₃ are added to the data set to obtain a new data set, and the PM₂.₅ composite model is used for prediction. The prediction result is shown in FIG. 7.
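The two-stage training described above can be sketched with Scikit-learn's `RandomForestRegressor`. This is an illustrative outline only: the function `train_composite`, the synthetic data, the 80/20 split and the hyperparameters are assumptions, NaN is used here to mark grid cells without a measurement, and only the random forest variant of the pollutant model is shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def train_composite(X_met, aod_obs, o3_obs, pm25_obs, seed=0):
    """Two-stage training; NaN marks grid cells without a measurement."""
    def fit_profile(y):
        seen = ~np.isnan(y)  # the satellite swath covers part of the grid
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        return model.fit(X_met[seen], y[seen])

    # Stage 1: profile models, then predictions for every grid cell.
    aod_model, o3_model = fit_profile(aod_obs), fit_profile(o3_obs)
    X_full = np.column_stack(
        [X_met, aod_model.predict(X_met), o3_model.predict(X_met)])

    # Stage 2: PM2.5 model on cells that have a monitoring-station value.
    seen = ~np.isnan(pm25_obs)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_full[seen], pm25_obs[seen], test_size=0.2, random_state=seed)
    pm_model = RandomForestRegressor(n_estimators=50, random_state=seed)
    pm_model.fit(X_tr, y_tr)
    return aod_model, o3_model, pm_model, pm_model.score(X_te, y_te)


# Synthetic stand-in data: 3 meteorological features on 300 grid cells.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
aod = 0.5 + 0.2 * X[:, 0] + rng.normal(scale=0.01, size=300)
o3 = 1.0 - 0.3 * X[:, 1] + rng.normal(scale=0.01, size=300)
pm25 = 40 + 10 * aod + 5 * o3 + rng.normal(scale=0.1, size=300)
aod[rng.random(300) < 0.7] = np.nan   # most cells lack satellite data
o3[rng.random(300) < 0.7] = np.nan
pm25[rng.random(300) < 0.5] = np.nan  # half the cells lack station data
aod_model, o3_model, pm_model, r2 = train_composite(X, aod, o3, pm25)
print(f"PM2.5 test R^2: {r2:.2f}")
```

The second-stage model sees five features (three meteorological inputs plus the predicted AOD and O₃), which mirrors the idea of filling the predicted profiles into the data set before training the pollutant model.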

In step 3, the specific steps of comparing the predicted value and the measured value of the machine learning composite model and correcting the machine learning composite model according to the comparison result include:

S1: Calculate the ratio of the measured value to the predicted value of the atmospheric pollutant (for example PM₂.₅) at the current time:

$$P_{ri} = \frac{P_i}{P_{vi}} \tag{1}$$

where $P_{ri}$ is the ratio of the observed value to the predicted value of the pollutant on the i-th day, $P_i$ is the measured PM₂.₅ value on the i-th day, and $P_{vi}$ is the predicted value of the pollutant on the i-th day.

S2: and filling interpolation of the calculated ratio in a future time period, and dynamically correcting the predicted value of the atmospheric pollutants according to the ratio result.

This ratio data is then interpolated over an hour-by-hour time of 3 days into the future. Linear interpolation methods, forward fill and backward fill methods are used to interpolate first in the time and height dimensions and then in the horizontal dimensions (longitude and latitude). And then, correcting the predicted value, including twice correction, and one or more times of correction can be used according to actual conditions. The predicted values of different heights are corrected for the first time, and the formula is as follows:

$$P_{a1i} = P_{vi} \times P_{ri} \tag{2}$$

where $P_{ri}$ is the ratio of the observed value to the predicted value of the pollutant on the i-th day, $P_{vi}$ is the predicted value of the pollutant on the i-th day, and $P_{a1i}$ is the i-th first-correction value of the pollutant.
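The filling and first correction can be illustrated with a one-dimensional sketch along the time axis. It is a simplification under stated assumptions: the function names are hypothetical, gaps are marked with `None`, and only a single series is handled, whereas the embodiment also interpolates along height, longitude, and latitude.

```python
def fill_ratio_series(ratios):
    """Fill None gaps: linear interpolation inside, forward/backward fill at the edges."""
    known = [i for i, r in enumerate(ratios) if r is not None]
    if not known:
        raise ValueError("at least one ratio must be known")
    filled = list(ratios)
    # Backward fill: leading gaps take the first known ratio.
    for i in range(known[0]):
        filled[i] = ratios[known[0]]
    # Forward fill: trailing gaps (e.g. future hours) take the last known ratio.
    for i in range(known[-1] + 1, len(ratios)):
        filled[i] = ratios[known[-1]]
    # Linear interpolation between consecutive known points.
    for left, right in zip(known, known[1:]):
        step = (ratios[right] - ratios[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = ratios[left] + step * (i - left)
    return filled


def first_correction(predicted, ratios):
    """Formula (2): P_a1i = P_vi * P_ri for each time step."""
    return [p_vi * p_ri for p_vi, p_ri in zip(predicted, ratios)]
```

With ratios known only up to the current hour, the forward fill carries the last known ratio across the forecast period, so every future predicted value is scaled by it.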

S3: Perform a secondary dynamic correction of the predicted value of the atmospheric pollutant using a relative humidity exponential decay formula, according to the changes in relative humidity of the air at different heights.

After the missing values are filled, the second dynamic correction is performed. If $L_i < 12$, the correction formula is

$$P_{a2i} = P_{a1i} \tag{3}$$

where $P_{a2i}$ is the i-th secondary correction value;

if $L_i > 11$, the relative humidity exponential decay formula is used:

(4).

where $L_i$ is the layer value of the i-th data record.
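The layer-dependent branching of the second correction can be sketched as below. Because formula (4) is not reproduced in this text, the decay itself is left as a caller-supplied function. The name `rh_decay` and its `(value, relative_humidity)` signature are assumptions made for illustration only, not the formula of the embodiment.

```python
from typing import Callable


def second_correction(
    p_a1: float,
    layer: int,
    relative_humidity: float,
    rh_decay: Callable[[float, float], float],
) -> float:
    """Apply the second dynamic correction to one first-corrected value."""
    if layer < 12:
        # Formula (3): layers below 12 keep the first-corrected value unchanged.
        return p_a1
    # Layers above 11: apply the relative humidity decay (formula (4)),
    # which must be supplied by the caller because it is not given here.
    return rh_decay(p_a1, relative_humidity)
```

For integer layer indices the two conditions $L_i < 12$ and $L_i > 11$ are complementary, so a single `if` covers both cases.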

In step 4, the specific steps of predicting the air quality and the emission reduction effect of the region to be detected by using the machine learning composite model include:

The pollutant emission concentrations in the source emission list data are reduced according to preset requirements and re-input into the machine learning composite model to obtain the emission reduction prediction result, that is, the PM₂.₅ concentration values after emission reduction. The emission contribution and the emission reduction effect are then evaluated from this result.

Specifically, a curve of PM₂.₅ concentration over time at a selected location is plotted according to the $P_{ri}$ value of that location, and the PM₂.₅ concentration values after emission reduction at the selected location are used to draw a line graph of the emission reduction and its effect. In addition, a specific time, height, and longitude and latitude can be selected to draw a heat map of the horizontal distribution of the atmospheric pollutant concentration, a heat map of its vertical distribution, a two-dimensional ray diagram of the predicted atmospheric pollutant concentration, and the like.
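The scenario simulation in step 4 amounts to scaling the emission inputs and running the model again. The sketch below assumes a generic `predict` callable standing in for the trained composite model; the function name, the column name `nox`, and the output keys are hypothetical and chosen only for illustration.

```python
def emission_reduction_effect(rows, emission_cols, reduction, predict):
    """Compare predictions before and after scaling emissions by (1 - reduction)."""
    baseline = predict(rows)
    # Copy each row and scale only the emission columns.
    reduced_rows = [
        {**row, **{c: row[c] * (1.0 - reduction) for c in emission_cols}}
        for row in rows
    ]
    reduced = predict(reduced_rows)
    results = []
    for before, after in zip(baseline, reduced):
        decrease = before - after
        results.append({
            "baseline": before,
            "reduced": after,
            "decrease": decrease,
            # Fraction of the baseline concentration removed by the reduction.
            "relative_decrease": decrease / before if before else 0.0,
        })
    return results


def toy_predict(rows):
    """Toy linear stand-in for the composite model (an assumption, not the real model)."""
    return [5.0 + 2.0 * row["nox"] for row in rows]


scenario = emission_reduction_effect([{"nox": 10.0}], ["nox"], 0.5, toy_predict)
```

The resulting decrease attributable to the scaled emissions can then be plotted over time for a selected location, as the line graph described above.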

The method in this embodiment uses a monitoring function to achieve fully automatic operation. Specifically, the watchdog library is used to monitor the file system: it watches a specified folder for the creation of files of a particular format.
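A folder monitor of this kind might look like the following sketch, which uses the watchdog library's `Observer` and `FileSystemEventHandler`. The class name, the callback, and the idea of filtering by file suffix are assumptions made here. The text does not state which file format is watched or what is run when a file appears.

```python
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


class NewDataHandler(FileSystemEventHandler):
    """Call on_new_file(path) when a file with the given suffix is created."""

    def __init__(self, suffix, on_new_file):
        super().__init__()
        self.suffix = suffix            # hypothetical, e.g. ".nc"
        self.on_new_file = on_new_file  # hypothetical pipeline entry point

    def on_created(self, event):
        # Ignore directories and files of other formats.
        if not event.is_directory and str(event.src_path).endswith(self.suffix):
            self.on_new_file(event.src_path)


def watch_folder(folder, suffix, on_new_file):
    """Start watching one folder (non-recursively) and return the running observer."""
    observer = Observer()
    observer.schedule(NewDataHandler(suffix, on_new_file), folder, recursive=False)
    observer.start()
    return observer
```

The caller keeps the returned observer alive for as long as monitoring is needed and ends it with `observer.stop()` followed by `observer.join()`.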

To further demonstrate the effectiveness of the model of the present invention, its predictive performance was validated against measured values. Ozone number density, AOD, and PM₂.₅ concentration were selected for the comparison. The R² values of the linear regression equations relating the predicted values to the measured values were greater than 0.96, indicating that the three-dimensional air quality model has good predictive performance and a high degree of fit.
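For reference, the R² of a simple linear regression between measured and predicted values can be computed as in the sketch below. The function name and the choice of measured values as the independent variable are assumptions made here. For a one-variable least-squares fit, R² equals the squared Pearson correlation and so does not depend on that choice.

```python
def linear_fit_r2(measured, predicted):
    """Least-squares fit predicted = slope * measured + intercept, plus its R^2."""
    n = len(measured)
    if n < 2 or n != len(predicted):
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(measured) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, predicted))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined for a constant series")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2
```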

In this embodiment, the AOD and O₃ models are based on random forest prediction in three-dimensional space with spatial resampling, whereas the PM₂.₅ model is based on random forest or neural network prediction with temporal interpolation over the future time period. The method designs a three-dimensional space-time grid that fuses multi-source data and uses algorithms such as random forest, cross network, or deep neural network to predict AOD, O₃, and PM₂.₅ concentrations over different space-time sequences. Dynamically correcting the predicted PM₂.₅ values by combining the measured PM₂.₅ values with the relative humidity changes at different heights further improves the accuracy and reliability of the prediction model. The method achieves rapid forecasting, early warning, and evaluation of future atmospheric pollution. It can automatically extract and dynamically display the three-dimensional space-time grid distribution and variation trend of atmospheric pollutants, identify the time periods and areas where atmospheric pollution is likely to occur, evaluate the local emission contribution and the expected emission reduction effect, and facilitate the assessment of regional atmospheric pollution transport.

The method solves the problems of large calculation amount, long time consumption and low spatial resolution of the traditional regional air quality model, realizes the prediction of the three-dimensional space-time grid air quality and the rapid prediction and evaluation of the future atmospheric pollution, can identify the time period and the region which are likely to generate the atmospheric pollution through the prediction result, and is beneficial to accurately evaluating the influence of a large pollution source.

Embodiment two:

The second embodiment of the invention provides a machine learning-based three-dimensional space-time grid air quality prediction system, which comprises:

The steps involved in the second embodiment correspond to those of the first embodiment of the method, and the detailed description of the second embodiment can be found in the related description section of the first embodiment.

It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented by a general-purpose computing device. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; they may each be made into a separate integrated circuit module; or several of the modules or steps may be made into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.

While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.

Claims

1. The machine learning-based three-dimensional space-time grid air quality prediction method is characterized by comprising the following steps of:

monitoring point data from different sources and satellite profile data of optical thickness and ozone number density are also acquired at the same time;

training a machine learning composite model by utilizing a three-dimensional space-time grid data set, wherein a random forest algorithm is firstly adopted to train an ozone number density model and an optical thickness model, prediction is carried out according to the trained model, a prediction result and an actual measurement value are filled into the three-dimensional space-time grid data set, and then the filled data set is utilized to train an atmospheric pollutant model through a random forest, a cross network or a deep neural network algorithm, so that the machine learning composite model is obtained;

the specific steps for constructing the three-dimensional space-time grid data set include: standardizing the meteorological data set, wherein the time series of the standardized data set comprises a training period and a prediction period; combining the meteorological data set and the source emission list data set, so that the source emission list data are integrated into the three-dimensional space-time grid data set, and finally integrating satellite remote sensing vertical profile data of optical thickness and ozone number density into the three-dimensional space-time grid data set;

comparing the predicted value and the measured value of the machine learning composite model, and correcting the machine learning composite model according to the comparison result, wherein the specific steps comprise: calculating the ratio of the measured value to the predicted value of the atmospheric pollutants at the current time; filling interpolation of the calculated ratio in a future time period, and dynamically correcting a predicted value of the atmospheric pollutants according to a ratio result; performing secondary dynamic correction on the predicted value of the atmospheric pollutants by using a relative humidity exponential decay formula according to the relative humidity changes of the air at different heights;

2. The machine learning based three-dimensional space-time grid air quality prediction method of claim 1, wherein the three-dimensional space-time grid data set is a set of data sets having time periods of warm seasons and cool seasons, and the total time period of the three-dimensional space-time grid data set is 2 weeks or more and the time difference from the prediction day is 2 years or less.

3. The machine learning based three-dimensional space-time grid air quality prediction method of claim 1, wherein the atmospheric contaminant model uses a cross network or deep neural network training model when the amount of data continuous in the three-dimensional space-time grid data set time sequence exceeds 1000, and uses a random forest training model when the amount of data continuous in the three-dimensional space-time grid data set time sequence is lower than 1000.

4. The machine learning based three-dimensional space-time grid air quality prediction method of claim 1, wherein the source emission inventory data are merged into the three-dimensional space-time grid data set by acquiring surface altitude data and time information from the meteorological data and, according to the altitude information of the meteorological data, combining the source emission inventory data at the corresponding longitude and latitude.

5. The machine learning-based three-dimensional space-time grid air quality prediction method according to claim 1, wherein the atmospheric pollutant model is constructed according to the condition of atmospheric pollutants at monitoring points, and the atmospheric pollutants are one or more of PM₂.₅, PM₁₀, NO₂, SO₂, O₃, and CO.

6. The machine-learning-based three-dimensional space-time grid air quality prediction method according to claim 1, wherein the specific step of performing prediction evaluation on the air quality, the local emission contribution and the emission reduction effect of the region to be measured by using the machine-learning composite model comprises the following steps:

7. A machine learning based three-dimensional space-time grid air quality prediction system for implementing the method of any of claims 1-6, comprising: