CN112990609A - Air quality prediction method based on space-time bandwidth self-adaptive geographical weighted regression - Google Patents

Info

Publication number
CN112990609A
Authority
CN
China
Prior art keywords
data
air quality
time
bandwidth
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110477876.5A
Other languages
Chinese (zh)
Other versions
CN112990609B (en
Inventor
仇阿根
杨毅
赵阳阳
张钰娟
陈才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese Academy of Surveying and Mapping
Original Assignee
Chinese Academy of Surveying and Mapping
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese Academy of Surveying and Mapping
Priority to CN202110477876.5A
Publication of CN112990609A
Application granted
Publication of CN112990609B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The air quality prediction method based on space-time bandwidth adaptive geographical weighted regression comprises the following steps: acquiring and preprocessing the relevant data, constructing an air quality prediction model based on geographical weighted regression, adaptively calculating the optimal space-time bandwidth, and locally estimating the air quality. Compared with other methods, the method achieves higher prediction accuracy, reveals the spatio-temporal variation of air quality more accurately, and provides data and scientific support for future policy making in areas such as environmental protection, social and economic development, and urban planning.

Description

Air quality prediction method based on space-time bandwidth self-adaptive geographical weighted regression
Technical Field
The invention relates to a method for predicting air quality. In particular, the present invention relates to a method and a storage medium for predicting air quality according to a spatiotemporal bandwidth adaptive geographical weighted regression method.
Background
Air quality bears on people's life and health and is attracting ever wider attention. It is affected by many factors, including pollution source factors such as the location of pollution sources and the emission concentration of pollutants, atmospheric environmental factors such as temperature, wind speed, humidity and air pressure, and surface environmental factors such as land use and vegetation coverage. When the influencing factors are known, air quality can be predicted by establishing a regression model between air quality and those factors. The air quality at a given moment is influenced by the air quality at the previous moment, and the air quality at a given place is influenced by the air quality and related influencing factors of the surrounding area. Air quality modeling is therefore spatio-temporally non-stationary, that is, the same influencing factor affects air quality to different degrees at different times and places. Modeling air quality is thus a complex process that involves both time and space and must account for spatio-temporal non-stationarity.
At present, most air quality modeling does not consider spatial non-stationarity, and temporal modeling generally assumes that the temporal effect is constant in space, which can make the predicted air quality inconsistent with reality. Geographically and Temporally Weighted Regression (GTWR) explores the spatio-temporal variation of a research object and its driving factors by establishing a local regression equation for each sample point in space and time. Because it accounts for spatio-temporal non-stationarity in the modeling, it achieves higher accuracy and is widely applied in many disciplines and fields, such as geology, environmental science, landscape ecology and health surveys. GTWR was developed from Geographically Weighted Regression (GWR), which explores the spatial variation of a research object and its driving factors by establishing a local regression equation for each sample point in space. Bai et al. studied the relationship between PM2.5 concentration and factors such as planetary boundary layer height, relative humidity, wind speed and temperature using OLS, GWR, TWR and GTWR models, and the results show that the GTWR model estimates PM2.5 concentration most accurately.
At present, air quality prediction with GTWR has the following problem: the space-time bandwidth greatly affects the accuracy of GTWR, yet bandwidth selection mostly relies on experience and manual trial, which is inefficient and imprecise. How to select and quantify the optimal bandwidth has therefore become a technical problem to be solved in the prior art.
Disclosure of Invention
The invention aims to provide an air quality prediction method and a storage medium based on space-time bandwidth adaptive geographical weighted regression.
In order to achieve the purpose, the invention adopts the following technical scheme:
an air quality prediction method based on space-time bandwidth adaptive geographical weighted regression comprises the following steps:
data acquisition and preprocessing step S110:
(1) setting the data types: obtaining PM2.5 concentration data, meteorological data, statistical data and AOD data for a given region and time range, wherein the PM2.5 concentration is obtained from air quality monitoring stations and is point data; the meteorological data comprise temperature, wind speed, relative humidity and atmospheric pressure, are obtained from meteorological stations and are point data; the statistical data comprise population and GDP data, are obtained by spatializing statistical data and are raster data; and the AOD (Aerosol Optical Depth) data are selected from remote sensing images and are raster data;
(2) acquiring and preprocessing related data: establishing a 3 km × 3 km grid covering the whole region, the centre point of each grid cell being an air quality sampling point, at which the PM2.5 concentration data, meteorological data, statistical data and AOD data are obtained;
step S120 of constructing an air quality prediction model based on geographical weighted regression:
an air quality prediction model based on geographical weighted regression is constructed according to formula (1):
$$y_i = \beta_{i0} + \sum_{k=1}^{p} \beta_{ik}\, x_{ik} + \varepsilon_i \tag{1}$$
where $y_i$ is the PM2.5 concentration at air quality sample point $(u_i, v_i, t_i)$, in which $u$ denotes longitude, $v$ denotes latitude and $t$ denotes time; $x_{ik}$ is the value of the $k$-th influencing factor at air quality sample point $(u_i, v_i, t_i)$, the influencing factors being AOD, population, GDP, temperature, wind speed, relative humidity and atmospheric pressure; $\varepsilon_i$ is the error term of sample point $i$; $p$ is the number of influencing factors; $\beta_{i0}$ is the intercept; and $\beta_{ik}$ is the regression coefficient of the $k$-th influencing factor at air quality sample point $(u_i, v_i, t_i)$, which is estimated by the least squares method:
$$\hat{\beta}(u_i, v_i, t_i) = \left[ X^{T} W(u_i, v_i, t_i)\, X \right]^{-1} X^{T} W(u_i, v_i, t_i)\, Y \tag{2}$$
where $X$ is the matrix of influencing-factor values, $Y$ is the vector of PM2.5 concentrations, and $W(u_i, v_i, t_i)$ is the diagonal spatio-temporal weight matrix of regression point $i$;
the spatio-temporal kernel function $K_{ST}$ of the air quality prediction model is calculated according to formula (3):
$$K_{ST} = K_s(d_{sij}, b_{st}) \times K_T(d_{tij}, b_T) \tag{3}$$
where $j$ denotes an air quality sample point, $i$ denotes the regression point to which sample point $j$ contributes, $K_s$ is the spatial kernel function, $d_{sij}$ is the spatial distance between $i$ and $j$, $b_{st}$ is the spatial bandwidth of the air quality sample points at time $t$, $K_T$ is the temporal kernel function, $d_{tij}$ is the time interval between $i$ and $j$, and $b_T$ is the temporal bandwidth. Expressing the weight kernel function $K_{ST}$ in matrix form gives the weight matrix:
$$W(u_i, v_i, t_i) = \mathrm{diag}\left( w_{1}^{t}, \ldots, w_{n_t}^{t},\; w_{1}^{t-1}, \ldots, w_{n_{t-1}}^{t-1},\; \ldots,\; w_{1}^{t-q}, \ldots, w_{n_{t-q}}^{t-q} \right) \tag{4}$$
the diagonal of the matrix consists of $q+1$ groups of elements, $q$ being the number of time periods: $w_{1}^{t}, \ldots, w_{n_t}^{t}$ are the weights of the $n_t$ data points of time period $t$; $w_{1}^{t-1}, \ldots, w_{n_{t-1}}^{t-1}$ are the weights of the $n_{t-1}$ data points of time period $t-1$; and so on, the weights of the $n_{t-q}$ data points of time period $t-q$ being $w_{1}^{t-q}, \ldots, w_{n_{t-q}}^{t-q}$.
An adaptive calculation step S130 of the optimal spatio-temporal bandwidth:
(1) setting the time bandwidth $b_T$ to 1 time unit;
(2) constructing the spatio-temporal kernel function for time period $t$, for which $d_{tij}^{2}$ is zero and the spatio-temporal kernel function becomes
$$K_{ST} = K_s(d_{sij}, b_{st}) \times K_T(0, b_T) \tag{5}$$
The weights are then calculated by formula (5), the air quality prediction model based on geographical weighted regression is established according to formula (1), and the optimal spatial bandwidth $b^{*}_{st}$ is obtained by minimizing the CV (cross-validation) function:
$$CV(b_{st}) = \sum_{i=1}^{n} \left[ y_i - \hat{y}_{-i}(b_{st}) \right]^{2} \tag{6}$$
where $y_i$ is the PM2.5 concentration, the dependent variable, at air quality sample point $i$, and $\hat{y}_{-i}$ is the fitted value of $y_i$ obtained with point $i$ excluded from the calibration; after the optimal spatial bandwidth $b^{*}_{st}$ is obtained, the first group of diagonal elements of the weight matrix of formula (4) is obtained through formula (5);
(3) to calculate the second group of diagonal elements of the weight matrix, the air quality data points of time period $t-1$ are added to the model, so that GWR calibrates the air quality prediction model at the regression points of time period $t$ using the air quality data points of both time periods $t$ and $t-1$. The data points of time period $t$ were already weighted in step (2) with the optimal spatial bandwidth $b^{*}_{st}$, which gives the first group of diagonal elements. The data points of time period $t-1$ are weighted with the spatio-temporal kernel function defined by formula (3) to obtain the optimal spatial bandwidth $b^{*}_{s(t-1)}$. For time period $t-1$, $d_{tij}^{2} = 1$, and the spatio-temporal kernel function applied to the data points of time period $t-1$ becomes
$$K_{ST} = K_s(d_{sij}, b_{s(t-1)}) \times K_T(1, b_T)$$
As in step (2), the optimal spatial bandwidth $b^{*}_{s(t-1)}$ is obtained by minimizing the CV function; substituting $b^{*}_{s(t-1)}$ into formula (3) for time period $t-1$ yields the second group of $n_{t-1}$ diagonal elements of the weight matrix of formula (4);
(4) repeating step (3), successively introducing the air quality data points of time periods $t-2$, $t-3$, $t-4$, …, $t-q$ to obtain the optimal spatial bandwidth of each of these time periods and the corresponding group of diagonal elements of the weight matrix;
(5) after the weight matrix of the air quality data points of time periods $t$ to $t-q$ is obtained, performing geographical weighted regression with the weights given by the weight matrix of formula (4) to calibrate the air quality prediction model at the regression points of time period $t$, the GWR calibration yielding a CV value that corresponds to the time bandwidth of one time unit assumed in step (1) and is denoted $CV_{b_T=1}$;
(6) repeating the process of steps (2) to (5) for the other $q-1$ possible time bandwidths in the model, namely $b_T$ equal to 2, 3, 4, …, or $q$ time units, each time bandwidth used to calibrate the model yielding a CV value, denoted $CV_{b_T=1}, CV_{b_T=2}, CV_{b_T=3}, \ldots, CV_{b_T=q}$ respectively;
(7) selecting the time bandwidth corresponding to the minimum of $CV_{b_T=1}, CV_{b_T=2}, CV_{b_T=3}, \ldots, CV_{b_T=q}$ as the optimal time bandwidth $b^{*}_{T}$, the corresponding optimal spatial bandwidth set being:
$$\left\{ b^{*}_{st},\; b^{*}_{s(t-1)},\; \ldots,\; b^{*}_{s(t-q)} \right\}$$
air quality local estimation step S140:
the air quality in time period t is locally estimated using formula (2), that is, local weighted least squares is used to perform point-by-point parameter estimation on a narrow area containing only one sample point; the range of air quality sample points participating in the parameter estimation is limited by the bandwidth, and the closer a sample point is to the regression point, the higher its weight, and the farther away, the lower its weight; the diagonal elements of the weight diagonal matrix W of formula (4) are computed with the optimal spatio-temporal bandwidth set obtained in step (7) of the adaptive calculation step S130 of the optimal spatio-temporal bandwidth.
Optionally, the preprocessing the related data further includes:
specifically, for the point data of PM2.5 concentration and meteorological data, kriging interpolation is used to obtain raster values over the whole region, and the data value at each grid centre point is then extracted; for the raster data of AOD, population and GDP, the data value at each grid centre point is extracted directly.
Optionally, in the step S130 of adaptively calculating the optimal spatiotemporal bandwidth, the time unit is year, month or day.
The invention further discloses a storage medium for storing computer executable instructions, which is characterized in that:
the computer executable instructions, when executed by a processor, perform the air quality prediction method based on spatio-temporal bandwidth adaptive geo-weighted regression described above.
The method predicts the air quality at a specific place from features of that place such as Aerosol Optical Depth (AOD), population, GDP, temperature, wind speed, relative humidity and atmospheric pressure. Compared with other methods, it achieves higher prediction accuracy, reveals the spatio-temporal variation of air quality more accurately, and provides data and scientific support for future policy making in areas such as environmental protection, social and economic development, and urban planning.
Drawings
FIG. 1 is a flow chart of an air quality prediction method based on spatiotemporal bandwidth adaptive geo-weighted regression according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
The invention uses the PM2.5 concentration to reflect air quality. PM2.5 is particulate matter in ambient air with an aerodynamic equivalent diameter of 2.5 microns or less. It can remain suspended in the air for a long time, and the higher its concentration, the more serious the air pollution.
Referring to fig. 1, there is shown a flow chart of an air quality prediction method based on spatio-temporal bandwidth adaptive geo-weighted regression according to the present invention, the method comprising the steps of:
data acquisition and preprocessing step S110:
(1) setting the data types: obtaining PM2.5 concentration data, meteorological data, statistical data and AOD data for a given region and time range, wherein the PM2.5 concentration is obtained from air quality monitoring stations and is point data; the meteorological data comprise PM2.5 concentration, temperature, wind speed, relative humidity and atmospheric pressure, are obtained from the air quality monitoring stations and atmospheric pressure detection stations and are point data; the statistical data comprise population and GDP data, are obtained by spatializing statistical data and are raster data; and the AOD (Aerosol Optical Depth) data are selected from remote sensing images and are raster data;
(2) acquiring and preprocessing related data: establishing a 3 km × 3 km grid covering the whole region, the centre point of each grid cell being an air quality sampling point, at which the PM2.5 concentration data, meteorological data, statistical data and AOD data are obtained;
specifically, for the point data of PM2.5 concentration and meteorological data, kriging interpolation is used to obtain raster values over the whole region, and the data value at each grid centre point is then extracted; for the raster data of AOD, population and GDP, the data value at each grid centre point is extracted directly.
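The grid construction of step (2) can be illustrated with a minimal sketch. It assumes planar coordinates already expressed in kilometres (a real study area given in longitude and latitude would first need a map projection), and the function name `grid_centres` and its parameters are hypothetical, not taken from the patent.

```python
import math
from typing import List, Tuple


def grid_centres(x_min: float, y_min: float, x_max: float, y_max: float,
                 cell_km: float = 3.0) -> List[Tuple[float, float]]:
    """Return the centre points of a regular grid covering a bounding box.

    Assumption: coordinates are planar and measured in kilometres.
    The last row/column may extend past the box so the whole area is covered.
    """
    n_cols = math.ceil((x_max - x_min) / cell_km)
    n_rows = math.ceil((y_max - y_min) / cell_km)
    centres = []
    for row in range(n_rows):
        for col in range(n_cols):
            # The centre of a cell is half a cell width away from its lower-left corner.
            centres.append((x_min + (col + 0.5) * cell_km,
                            y_min + (row + 0.5) * cell_km))
    return centres
```

Each returned centre point plays the role of an air quality sampling point at which the interpolated or raster values would then be read.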
Therefore, the data type to be utilized next is obtained and a specific value is acquired through the step.
It should be noted that the air quality monitoring station is used for measuring the PM2.5 concentration, the air quality sampling point is a place for air quality sampling, and the air quality sample point is the sampled air quality, specifically, the air quality sample point is obtained by sampling at the air quality sampling point.
Step S120 of constructing an air quality prediction model based on geographical weighted regression:
an air quality prediction model is constructed according to formula (1):
$$y_i = \beta_{i0} + \sum_{k=1}^{p} \beta_{ik}\, x_{ik} + \varepsilon_i \tag{1}$$
where $y_i$ is the PM2.5 concentration at air quality sample point $(u_i, v_i, t_i)$, in which $u$ denotes longitude, $v$ denotes latitude and $t$ denotes time; $x_{ik}$ is the value of the $k$-th influencing factor at air quality sample point $(u_i, v_i, t_i)$, the influencing factors being AOD, population, GDP, temperature, wind speed, relative humidity and atmospheric pressure; $\varepsilon_i$ is the error term of sample point $i$; $p$ is the number of influencing factors; $\beta_{i0}$ is the intercept; and $\beta_{ik}$ is the regression coefficient of the $k$-th influencing factor at air quality sample point $(u_i, v_i, t_i)$, which is estimated by the least squares method:
$$\hat{\beta}(u_i, v_i, t_i) = \left[ X^{T} W(u_i, v_i, t_i)\, X \right]^{-1} X^{T} W(u_i, v_i, t_i)\, Y \tag{2}$$
where $X$ is the matrix of influencing-factor values, $Y$ is the vector of PM2.5 concentrations, and $W(u_i, v_i, t_i)$ is the diagonal spatio-temporal weight matrix of regression point $i$;
the spatio-temporal kernel function $K_{ST}$ of the air quality prediction model is calculated according to formula (3):
$$K_{ST} = K_s(d_{sij}, b_{st}) \times K_T(d_{tij}, b_T) \tag{3}$$
where $j$ denotes an air quality sample point, $i$ denotes the regression point to which sample point $j$ contributes, $K_s$ is the spatial kernel function, $d_{sij}$ is the spatial distance between $i$ and $j$, $b_{st}$ is the spatial bandwidth of the air quality sample points at time $t$, $K_T$ is the temporal kernel function, $d_{tij}$ is the time interval between $i$ and $j$, and $b_T$ is the temporal bandwidth. Expressing the weight kernel function $K_{ST}$ in matrix form gives the weight matrix:
$$W(u_i, v_i, t_i) = \mathrm{diag}\left( w_{1}^{t}, \ldots, w_{n_t}^{t},\; w_{1}^{t-1}, \ldots, w_{n_{t-1}}^{t-1},\; \ldots,\; w_{1}^{t-q}, \ldots, w_{n_{t-q}}^{t-q} \right) \tag{4}$$
the diagonal of the matrix consists of $q+1$ groups of elements, $q$ being the number of time periods: $w_{1}^{t}, \ldots, w_{n_t}^{t}$ are the weights of the $n_t$ data points of time period $t$; $w_{1}^{t-1}, \ldots, w_{n_{t-1}}^{t-1}$ are the weights of the $n_{t-1}$ data points of time period $t-1$; and so on, the weights of the $n_{t-q}$ data points of time period $t-q$ being $w_{1}^{t-q}, \ldots, w_{n_{t-q}}^{t-q}$.
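The construction of the diagonal weights can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the spatio-temporal kernel is taken to be the product of a spatial and a temporal kernel, both kernels are assumed Gaussian of the form exp(-(d/b)^2), the bandwidths are fixed distances rather than proportions of data points, and coordinates are planar. The names `gaussian_kernel`, `spatiotemporal_weight` and `weight_diagonal` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def gaussian_kernel(distance: float, bandwidth: float) -> float:
    """Gaussian distance-decay kernel (assumed form): exp(-(d / b)^2)."""
    return math.exp(-(distance / bandwidth) ** 2)


def spatiotemporal_weight(d_s: float, b_s: float, d_t: float, b_t: float) -> float:
    """Assumed product form: spatial kernel times temporal kernel."""
    return gaussian_kernel(d_s, b_s) * gaussian_kernel(d_t, b_t)


def weight_diagonal(regression_point: Tuple[float, float, int],
                    samples: Sequence[Tuple[float, float, int]],
                    spatial_bandwidths: Dict[int, float],
                    time_bandwidth: float) -> List[float]:
    """Diagonal of the weight matrix for one regression point.

    samples are (u, v, t) tuples; spatial_bandwidths maps the time lag
    (0 for period t, 1 for period t-1, ...) to that period's spatial bandwidth.
    """
    u0, v0, t0 = regression_point
    weights = []
    for u, v, t in samples:
        lag = t0 - t                              # time interval between i and j
        d_s = math.hypot(u - u0, v - v0)          # planar spatial distance
        weights.append(spatiotemporal_weight(d_s, spatial_bandwidths[lag],
                                             lag, time_bandwidth))
    return weights
```

For a sample in the same time period as the regression point the time interval is zero, so the temporal kernel equals 1 and the weight reduces to the spatial kernel alone.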
An adaptive calculation step S130 of the optimal spatio-temporal bandwidth:
(1) setting the time bandwidth $b_T$ to 1 time unit (e.g., year, month, day);
(2) constructing the spatio-temporal kernel function for time period $t$, for which $d_{tij}^{2}$ is zero and the spatio-temporal kernel function becomes
$$K_{ST} = K_s(d_{sij}, b_{st}) \times K_T(0, b_T) \tag{5}$$
The weights are then calculated by formula (5), the air quality prediction model based on geographical weighted regression is established according to formula (1), and the optimal spatial bandwidth $b^{*}_{st}$ is obtained by minimizing the CV (cross-validation) function:
$$CV(b_{st}) = \sum_{i=1}^{n} \left[ y_i - \hat{y}_{-i}(b_{st}) \right]^{2} \tag{6}$$
where $y_i$ is the PM2.5 concentration, the dependent variable, at air quality sample point $i$, and $\hat{y}_{-i}$ is the fitted value of $y_i$ obtained with point $i$ excluded from the calibration; after the optimal spatial bandwidth $b^{*}_{st}$ is obtained, the first group of diagonal elements of the weight matrix of formula (4) is obtained through formula (5);
(3) to calculate the second group of diagonal elements of the weight matrix, the air quality data points of time period $t-1$ are added to the model, so that GWR calibrates the air quality prediction model at the regression points of time period $t$ using the air quality data points of both time periods $t$ and $t-1$. The data points of time period $t$ were already weighted in step (2) with the optimal spatial bandwidth $b^{*}_{st}$, which gives the first group of diagonal elements. The data points of time period $t-1$ are weighted with the spatio-temporal kernel function defined by formula (3) to obtain the optimal spatial bandwidth $b^{*}_{s(t-1)}$. For time period $t-1$, $d_{tij}^{2} = 1$, and the spatio-temporal kernel function applied to the data points of time period $t-1$ becomes
$$K_{ST} = K_s(d_{sij}, b_{s(t-1)}) \times K_T(1, b_T)$$
As in step (2), the optimal spatial bandwidth $b^{*}_{s(t-1)}$ is obtained by minimizing the CV function; substituting $b^{*}_{s(t-1)}$ into formula (3) for time period $t-1$ yields the second group of $n_{t-1}$ diagonal elements of the weight matrix of formula (4);
(4) repeating step (3), successively introducing the air quality data points of time periods $t-2$, $t-3$, $t-4$, …, $t-q$ to obtain the optimal spatial bandwidth of each of these time periods and the corresponding group of diagonal elements of the weight matrix;
(5) after the weight matrix of the air quality data points of time periods $t$ to $t-q$ is obtained, performing geographical weighted regression with the weights given by the weight matrix of formula (4) to calibrate the air quality prediction model at the regression points of time period $t$, the GWR calibration yielding a CV value that corresponds to the time bandwidth of one time unit assumed in step (1) and is denoted $CV_{b_T=1}$;
(6) repeating the process of steps (2) to (5) for the other $q-1$ possible time bandwidths in the model, namely $b_T$ equal to 2, 3, 4, …, or $q$ time units, each time bandwidth used to calibrate the model yielding a CV value, denoted $CV_{b_T=1}, CV_{b_T=2}, CV_{b_T=3}, \ldots, CV_{b_T=q}$ respectively;
(7) selecting the time bandwidth corresponding to the minimum of $CV_{b_T=1}, CV_{b_T=2}, CV_{b_T=3}, \ldots, CV_{b_T=q}$ as the optimal time bandwidth $b^{*}_{T}$, the corresponding optimal spatial bandwidth set being:
$$\left\{ b^{*}_{st},\; b^{*}_{s(t-1)},\; \ldots,\; b^{*}_{s(t-q)} \right\}$$
therefore, the optimal time bandwidth corresponds to the optimal space bandwidth set in the step, so that the optimal space-time bandwidth is obtained.
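The one-by-one search of steps (1) to (7) can be sketched in outline as follows. The sketch treats the cross-validation score as a caller-supplied function `cv(spatial_bandwidths, time_bandwidth)`, which is an assumption made to keep the example short; `select_bandwidths` and its parameters are hypothetical names, and the candidate list is whatever grid of spatial bandwidths the user chooses.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: CV score for the spatial bandwidths fixed so far
# (one per period t, t-1, ...) under a given time bandwidth.
CvFunction = Callable[[List[float], int], float]


def select_bandwidths(cv: CvFunction,
                      spatial_candidates: Sequence[float],
                      q: int) -> Tuple[int, List[float], float]:
    """Return (optimal time bandwidth, spatial bandwidths for t..t-q, CV value)."""
    if q < 1 or not spatial_candidates:
        raise ValueError("need q >= 1 and at least one spatial candidate")
    best = None
    for time_bandwidth in range(1, q + 1):          # steps (1) and (6)
        fixed: List[float] = []
        for _lag in range(q + 1):                   # steps (2)-(4): t, t-1, ..., t-q
            # Earlier bandwidths stay fixed; only the newest one is searched.
            b_star = min(spatial_candidates,
                         key=lambda b: cv(fixed + [b], time_bandwidth))
            fixed.append(b_star)
        score = cv(fixed, time_bandwidth)           # step (5)
        if best is None or score < best[2]:         # step (7)
            best = (time_bandwidth, fixed, score)
    return best
```

Because each spatial bandwidth is fixed before the next period is introduced, the number of CV evaluations grows linearly with the number of periods instead of exponentially.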
Air quality local estimation step S140:
the air quality in time period t is locally estimated using formula (2), that is, local weighted least squares is used to perform point-by-point parameter estimation on a narrow area containing only one sample point; the range of air quality sample points participating in the parameter estimation is limited by the bandwidth, and the closer a sample point is to the regression point, the higher its weight, and the farther away, the lower its weight; the diagonal elements of the weight diagonal matrix W of formula (4) are computed with the optimal spatio-temporal bandwidth set obtained in step (7) of the adaptive calculation step S130 of the optimal spatio-temporal bandwidth.
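A minimal pure-Python sketch of the local weighted least squares estimate and of a leave-one-out cross-validation score is given below. It assumes the standard weighted least squares form with an intercept column, and it implements leave-one-out by setting the weight of the held-out point to zero, which is a common convention rather than something stated in the text. The names `solve_linear`, `local_wls` and `cv_score` are hypothetical.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def local_wls(x_rows: Sequence[Sequence[float]], y: Sequence[float],
              weights: Sequence[float]) -> List[float]:
    """Weighted least squares: solve (X^T W X) beta = X^T W y, intercept first."""
    design = [[1.0] + list(row) for row in x_rows]
    k = len(design[0])
    xtwx = [[sum(w * r[p] * r[s] for r, w in zip(design, weights))
             for s in range(k)] for p in range(k)]
    xtwy = [sum(w * r[p] * yi for r, yi, w in zip(design, y, weights))
            for p in range(k)]
    return solve_linear(xtwx, xtwy)


def cv_score(x_rows: Sequence[Sequence[float]], y: Sequence[float],
             weight_rows: Sequence[Sequence[float]]) -> float:
    """Sum of squared leave-one-out errors; weight_rows[i] weights all samples for point i."""
    total = 0.0
    for i, row in enumerate(x_rows):
        w = list(weight_rows[i])
        w[i] = 0.0                                   # exclude point i from its own fit
        beta = local_wls(x_rows, y, w)
        y_hat = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
        total += (y[i] - y_hat) ** 2
    return total
```

In a full workflow the rows of `weight_rows` would come from the spatio-temporal kernel evaluated at each regression point, and `cv_score` would be the quantity minimized during bandwidth selection.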
Examples
The PM2.5 concentration in the Beijing-Tianjin-Hebei (Jing-Jin-Ji) region is estimated, taking the PM2.5 concentration as the dependent variable and meteorological data (temperature, wind speed, relative humidity and atmospheric pressure), AOD data, and socioeconomic data (population density and GDP) as characteristic variables, as listed in Table 1. The geographical extent of the study area is 35.5° N to 43° N and 113° E to 120° E, the time range is January 1, 2018 to December 31, 2018, the spatial resolution is 3 km, and PM2.5 is estimated for each month.
Table 1 Description of variables
Variable name          Symbol   Unit
PM2.5 concentration    PM       µg/m³
Temperature            TEM
Wind speed             WS       m/s
Relative humidity      RH       %
Atmospheric pressure   PRE      Pa
AOD                    AOD      -
GDP                    GDP      hundred million yuan
Population density     POP      thousand people per square kilometre
To keep the spatio-temporal resolution of the dependent and independent variables consistent, all data must be preprocessed before the PM2.5 estimation. In terms of time resolution, the PM2.5 concentration, temperature, wind speed, relative humidity and AOD are all set to monthly resolution. In terms of spatial resolution, a 3 km × 3 km grid of the whole region is created; kriging interpolation is applied to the point data of PM2.5 concentration and meteorological data, and the daily and monthly mean values at the grid centre points are then extracted by resampling. The AOD data are batch-processed by programs written in C# with ArcGIS Engine and Visual Studio 2013, and the daily and monthly mean values at the grid centre points are then obtained by resampling. For the monthly average population and GDP, the statistical data are assigned to spatial data through place-name association and attribute assignment, and the monthly mean statistical values at the grid centre points are extracted by geographic calculation.
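The temporal aggregation to monthly resolution can be illustrated with a short sketch that averages daily values at one grid centre point by calendar month. The function name `monthly_means` and the input layout (pairs of date and value) are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def monthly_means(daily: Iterable[Tuple[date, float]]) -> Dict[Tuple[int, int], float]:
    """Average daily values into monthly means keyed by (year, month)."""
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for day, value in daily:
        key = (day.year, day.month)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

The same aggregation would be applied independently to each variable and each grid centre point.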
According to the determined dependent variable and characteristic variables, the model function expression is constructed:
$$PM_i = \beta_{i0} + \beta_{i1}\,TEM_i + \beta_{i2}\,WS_i + \beta_{i3}\,RH_i + \beta_{i4}\,PRE_i + \beta_{i5}\,AOD_i + \beta_{i6}\,GDP_i + \beta_{i7}\,POP_i + \varepsilon_i$$
where $PM_i$ is the PM2.5 estimated by the model; $TEM_i$, $WS_i$, $RH_i$, $PRE_i$, $AOD_i$, $GDP_i$ and $POP_i$ are the values of the characteristic variables; and the $\beta_i$ are the regression coefficients of the characteristic variables determined by the spatio-temporal kernel function. To investigate the effect of different models on the experimental results, the following table reports the estimation results of a multiple linear regression model, a geographical weighted regression model, a space-time geographical weighted regression model (whose parameters are space-time factors) and the adaptive geographical weighted regression model (whose parameters are time and space factors), using three indices: goodness of fit $R^2$, mean squared error (MSE) and residual sum of squares (RSS).
TABLE 2 statistics of various regression models
[Table 2 appears as an image in the original publication; its values are not recoverable from the text.]
The results show that the fitting effect of the adaptive space-time geographical weighted regression is superior to that of the multiple linear regression, geographical weighted regression and space-time geographical weighted regression models, and that the mean squared error and residual sum of squares are greatly reduced: $R^2$ increases from 0.478 to 0.821, the MSE decreases from 0.156 to 0.009, and the RSS decreases from 512.647 to 86.963. The adaptive geographical weighted regression model obtains the best fit.
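For reference, the three indices can be computed from observed and predicted values as in the sketch below. It uses the usual textbook definitions (RSS as the sum of squared residuals, MSE as RSS divided by the number of points, and R² as one minus RSS over the total sum of squares); whether the patent's figures were computed with exactly these conventions is an assumption, and `fit_statistics` is a hypothetical name.

```python
from typing import Dict, Sequence


def fit_statistics(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return R2, MSE and RSS under the usual textbook definitions."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mean_obs = sum(observed) / n
    tss = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    return {"R2": 1.0 - rss / tss, "MSE": rss / n, "RSS": rss}
```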
The spatio-temporal bandwidth of the spatio-temporal kernel is adaptive, with values of the spatio-temporal bandwidth between 0-1, and serves to limit the proportion of air quality data points in the local model calibration. The spatiotemporal weights are computed from gaussian kernel functions. The data set in the case includes PM2.5 data from 1 month to 12 months per month in 2018. Thus, if the model of the regression point for which spatio-temporal geoweighted regression was used for calibration is 12 months 2018, then there are 13 parameters for the CV function of equation (6):
Figure 365038DEST_PATH_IMAGE023
they correspond to 12 spatial bandwidths and 1 temporal bandwidth. However, in computing the CV function, these 13 bandwidths can be combined in a very large number of ways, which makes the computation very expensive. For example, under the space-time bandwidth adaptive method the spatial bandwidth lies between 0 and 1; sampling it every 0.05 gives the candidate values [0.05, 0.1, 0.15, 0.2, 0.25, …, 0.9, 0.95, 1]. Each month therefore has 20 candidate bandwidths, and with 12 months in total there are 20^12 combinations of spatial bandwidths. Furthermore, there are 12 possible choices of time bandwidth, so the CV value would have to be computed 12 × 20^12 times in total. The model must therefore be simplified.
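The size of this exhaustive search can be checked with a few lines of arithmetic. The sketch below only reproduces the counts stated above; the variable names are illustrative.

```python
# Candidate spatial bandwidths: 0.05, 0.10, ..., 1.00 (sampled every 0.05).
grid = [round(0.05 * k, 2) for k in range(1, 21)]
n_candidates = len(grid)                  # 20 candidate values per month

n_months = 12                             # one spatial bandwidth per month
n_time_bandwidths = 12                    # possible choices of time bandwidth

spatial_combinations = n_candidates ** n_months                   # 20^12
cv_evaluations = n_time_bandwidths * spatial_combinations         # 12 * 20^12

print(f"{spatial_combinations:,} spatial combinations")
print(f"{cv_evaluations:,} CV evaluations in total")
```

The total is roughly 4.9 × 10^16 evaluations of the CV function, which is why an exhaustive search is impractical.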
Because data farther from the regression point have less effect on it, air quality data points that are far from the regression point in the time dimension are discarded. In this case, the selection of air quality data points in the time dimension is reduced from 12 months to 4 months; when the regression point lies in March, February or January, however, only 3, 2 or 1 months of data can be used for local estimation. The feasibility of this method rests on the assumption that air quality data points more than 4 months away from the regression point have little or negligible effect on the parameter estimates at the regression point. The number of CV values to be calculated is thereby reduced to 4 × 20^4. Even so, minimizing the CV value still requires a very large amount of computation. To solve this problem, the selection method proposed in the "selection step of the optimal temporal bandwidth and the optimal spatial bandwidth" is adopted; that is, bandwidth selection is optimized by deriving the optimal bandwidths one by one. First, a time bandwidth b_T is given. Then the air quality data points in the same time period t as the regression point are substituted into the model, the optimal spatial bandwidth b*_st of that period is calculated, and its value is fixed as a constant. Next, the air quality data points of time period t-1 are put into the model, their weights are calculated using equation (4), and the optimal spatial bandwidth b*_s(t-1) of that period is obtained. The process is repeated: the optimal spatial bandwidth of each time period is obtained by introducing the air quality data points of the corresponding period step by step.
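The one-by-one derivation of the spatial bandwidths can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the kernel formulas appear only as figures in the source, so a Gaussian spatial kernel whose scale is the distance to the ceil(b × n)-th nearest point, and a temporal factor exp(-(Δt / b_T)²), are assumed here. All function names are hypothetical.

```python
import numpy as np

def gaussian_weights(dist, proportion):
    # Assumed adaptive scale: distance to the ceil(proportion * n)-th nearest point.
    k = max(1, int(np.ceil(proportion * dist.size)))
    h = max(np.sort(dist)[k - 1], 1e-12)
    return np.exp(-(dist / h) ** 2)

def cv_score(coords, times, X, y, t, bandwidths, b_T):
    """Leave-one-out CV over the regression points of period t.

    `bandwidths` maps each period introduced so far to its spatial bandwidth.
    """
    total = 0.0
    for i in np.flatnonzero(times == t):
        w = np.zeros(len(y))
        for s, prop in bandwidths.items():
            mask = times == s
            mask[i] = False                      # exclude the regression point itself
            if not mask.any():
                continue
            dist = np.linalg.norm(coords[mask] - coords[i], axis=1)
            # Assumed temporal factor: equals 1 for the regression period itself.
            w[mask] = gaussian_weights(dist, prop) * np.exp(-((t - s) / b_T) ** 2)
        sw = np.sqrt(w)
        # Weighted least squares via an ordinary solve on sqrt-weighted rows.
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        total += (y[i] - X[i] @ beta) ** 2
    return float(total)

def select_spatial_bandwidths(coords, times, X, y, periods, grid, b_T):
    """Fix the bandwidth of periods[0] (the regression period) first, then
    introduce earlier periods one at a time, keeping earlier choices constant."""
    t = periods[0]
    chosen = {}
    for s in periods:
        chosen[s] = min(
            grid,
            key=lambda b: cv_score(coords, times, X, y, t, {**chosen, s: b}, b_T),
        )
    return chosen, cv_score(coords, times, X, y, t, chosen, b_T)
```

With 20 candidate values and 4 periods, a search of this kind evaluates the CV function 4 × 20 = 80 times per time bandwidth instead of 20^4 = 160,000 times, at the cost of no longer guaranteeing the jointly optimal combination.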
The time bandwidth is then selected. It can take 4 different values, 1, 2, 3 and 4, and the selection of the spatial bandwidths is repeated for each time bandwidth to obtain the corresponding set of optimal spatial bandwidths, which can be expressed as
Figure DEST_PATH_IMAGE024
Thus, a CV value can be calculated for each time bandwidth and its corresponding set of optimal spatial bandwidths. The time bandwidth that minimizes the CV value, together with its corresponding set of spatial bandwidths, is the optimal space-time bandwidth.
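The final selection amounts to taking the minimum over the candidate time bandwidths. A minimal sketch follows, assuming a hypothetical callable `calibrate(b_T)` that returns the set of optimal spatial bandwidths and the CV value obtained for a given time bandwidth; neither name appears in the source.

```python
def select_time_bandwidth(candidates, calibrate):
    """Return (best b_T, its spatial bandwidth set, its CV value).

    `calibrate` is a hypothetical callable mapping a time bandwidth to a
    pair (spatial_bandwidth_set, cv_value).
    """
    results = {b_T: calibrate(b_T) for b_T in candidates}
    best = min(results, key=lambda b_T: results[b_T][1])   # smallest CV value wins
    spatial_set, cv_value = results[best]
    return best, spatial_set, cv_value
```

For the case described above, `candidates` would be `[1, 2, 3, 4]`; if two time bandwidths give the same CV value, this sketch keeps the first one.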
The invention further discloses a storage medium for storing computer executable instructions, which is characterized in that:
the computer executable instructions, when executed by a processor, perform the air quality prediction method based on spatio-temporal bandwidth adaptive geo-weighted regression described above.
In summary, the invention has the following advantages:
(1) The invention provides a method for predicting air quality from meteorological data and socio-economic data; spatio-temporal non-stationarity is taken into account in the modeling, which improves the prediction accuracy.
(2) The invention provides an algorithm for automatically selecting the bandwidths of the geographically and temporally weighted regression (GTWR) model, which solves the low efficiency and low accuracy caused by current bandwidth selection relying on experience and manual trial.
It will be apparent to those skilled in the art that the various elements or steps of the invention described above may be implemented using a general purpose computing device, they may be centralized on a single computing device, or alternatively, they may be implemented using program code that is executable by a computing device, such that they may be stored in a memory device and executed by a computing device, or they may be separately fabricated into various integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
While the invention has been described in further detail with reference to specific preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (4)

1. An air quality prediction method based on space-time bandwidth adaptive geographical weighted regression comprises the following steps:
data acquisition and preprocessing step S110:
(1) setting the data types: obtaining PM2.5 concentration data, meteorological data, statistical data and AOD data for a certain region within a certain time range, wherein the PM2.5 concentration data are point data obtained from air quality monitoring stations; the meteorological data comprise temperature, wind speed, relative humidity and atmospheric pressure and are point data obtained from meteorological monitoring stations; the statistical data comprise population and GDP data, which are obtained through spatialization and are raster data; and the aerosol optical depth (AOD) data are taken from remote sensing images and are raster data;
(2) acquiring and preprocessing the related data: establishing a 3 km × 3 km grid covering the whole region, the center point of each grid cell being an air quality sampling point, so as to obtain the PM2.5 concentration data, meteorological data, statistical data and AOD data;
an air quality prediction model based on geographical weighted regression step S120:
an air quality prediction model based on geographical weighted regression is constructed according to the formula (1):
Figure 240364DEST_PATH_IMAGE001
wherein y_i is the PM2.5 concentration at air quality sample point (u_i, v_i, t_i), where u represents the longitude, v represents the latitude and t represents time; x_ik is the value of the k-th influencing factor at air quality sample point (u_i, v_i, t_i), the influencing factors including AOD, population, GDP, temperature, wind speed, relative humidity and atmospheric pressure; ε_i is the error term of sample point i; p represents the number of influencing factors; and β_ik is the regression coefficient of the k-th influencing factor at air quality sample point (u_i, v_i, t_i), which is estimated by the least squares method:
Figure 694479DEST_PATH_IMAGE002
wherein
Figure 269817DEST_PATH_IMAGE003
the spatio-temporal kernel function of the air quality prediction model
Figure 809382DEST_PATH_IMAGE004
is calculated according to equation (3):
Figure 66051DEST_PATH_IMAGE005
where j represents an air quality sample point, i represents a regression point for predicting the air quality sample point j, K_s is the spatial kernel function, d_sij is the spatial distance between i and j, b_st represents the spatial bandwidth of the air quality sample points at time t, K_T is the temporal kernel function, d_tij is the time interval between i and j, and b_T is the time bandwidth; the weight kernel function
Figure 374673DEST_PATH_IMAGE004
is expressed in matrix form to obtain the weight matrix, specifically:
Figure 120912DEST_PATH_IMAGE006
the weights in the matrix consist of q+1 groups of diagonal elements, q being the number of time periods,
Figure 882195DEST_PATH_IMAGE007
,…,
Figure 942554DEST_PATH_IMAGE008
are the weights of the n_t data points of time period t,
Figure 902420DEST_PATH_IMAGE009
,…,
Figure 22823DEST_PATH_IMAGE010
are the weights of the n_(t-1) data points of time period t-1, and so on; the weights of the n_(t-q) data points of time period t-q are expressed as
Figure 35516DEST_PATH_IMAGE011
,…,
Figure 633988DEST_PATH_IMAGE012
An adaptive calculation step S130 of the optimal spatio-temporal bandwidth:
(1) setting a time bandwidth to 1 time unit;
(2) constructing the spatio-temporal kernel function for time period t, in which case d²_tij is zero and the spatio-temporal kernel function is
Figure 448360DEST_PATH_IMAGE013
then the weights are calculated using formula (5), the air quality prediction model based on geographically weighted regression is established according to formula (1), and the optimal spatial bandwidth b*_st is obtained by minimizing the cross-validation (CV) function
Figure 5243DEST_PATH_IMAGE014
wherein y_i is the PM2.5 concentration, the dependent variable, at air quality sample point i, and ŷ_(-i) is the predicted value of y_i obtained with point i excluded from the calibration process; after the optimal spatial bandwidth b*_st is obtained, the first group of diagonal elements of the weight matrix of formula (4) is obtained through formula (5);
(3) to calculate the second group of diagonal elements of the weight matrix, the air quality data points of time period t-1 are introduced into the model, so that GWR calibrates the air quality prediction model at the regression points of time period t using the air quality data points from time periods t and t-1; in step (2), the air quality data points of time period t have already been weighted with b*_st, corresponding to the first group of diagonal elements of the diagonal matrix; the air quality data points of time period t-1 are weighted using the spatio-temporal kernel function defined by formula (3) in order to obtain the optimal spatial bandwidth b*_s(t-1); for time period t-1, d²_tij = 1, and the spatio-temporal kernel function applied to the air quality data points of time period t-1 becomes
Figure 475539DEST_PATH_IMAGE015
as in step (2), the optimal spatial bandwidth b*_s(t-1) is obtained by minimizing the CV function; the weights obtained by substituting the optimal spatial bandwidth b*_s(t-1) into formula (3) for time period t-1 are the second group of n_(t-1) diagonal elements of the weight matrix of formula (4);
(4) repeating the step (3), and introducing the air quality data points from time periods t-2, t-3, t-4, …, t-q successively to obtain the optimal spatial bandwidth of the time periods and the corresponding diagonal element sets in the weight matrix;
(5) after the weight matrix of the air quality data points of time periods t to t-q is obtained, geographically weighted regression is performed using the weights given in the weight matrix of formula (4) to calibrate the air quality prediction model at the air quality data points of time period t, and a CV value is obtained from the GWR calibration; this CV value corresponds to the time bandwidth of one time unit assumed in step (1) and is denoted CV_(bT=1);
(6) repeating the process of steps (2) to (5), according to the number of time periods in the model, for the other q-1 possible time bandwidths, namely for the time bandwidth b_T equal to 2, 3, 4, …, or q time units; a CV value is obtained for each time bandwidth used to calibrate the model, namely CV_(bT=1), CV_(bT=2), CV_(bT=3), …, CV_(bT=q);
(7) selecting the time bandwidth corresponding to the minimum of CV_(bT=1), CV_(bT=2), CV_(bT=3), …, CV_(bT=q) as the optimal time bandwidth b*_T, the corresponding set of optimal spatial bandwidths being:
Figure 143281DEST_PATH_IMAGE016
air quality local estimation step S140:
the air quality of time period t is locally estimated using equation (2); that is, a locally weighted least squares method is used to perform point-by-point parameter estimation on a narrow area containing only one sample point, the range of air quality sample points participating in the parameter estimation being limited by the bandwidth; the closer an air quality sample point is to the regression point, the higher its weight, and the farther away it is, the lower its weight; the diagonal elements of the diagonal weight matrix W of equation (4) are determined by the optimal space-time bandwidth set obtained in step (7) of the adaptive calculation step S130 of the optimal space-time bandwidth.
2. The air quality prediction method of claim 1, wherein:
preprocessing the relevant data further comprises:
specifically, for the point data of the PM2.5 concentration and the meteorological data, kriging interpolation is used to obtain raster values over the whole region, and the data value at each grid center point is then extracted; for the raster data of AOD, population and GDP, the data value at each grid center point is extracted.
3. The air quality prediction method of claim 1, wherein:
in the adaptive calculation step S130 of the optimal spatiotemporal bandwidth, the time unit is year, month or day.
4. A storage medium for storing computer-executable instructions, characterized in that:
the computer executable instructions, when executed by a processor, perform the air quality prediction method based on spatio-temporal bandwidth adaptive geo-weighted regression of any of claims 1-3.
CN202110477876.5A 2021-04-30 2021-04-30 Air quality prediction method based on space-time bandwidth self-adaptive geographical weighted regression Active CN112990609B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110477876.5A CN112990609B (en) 2021-04-30 2021-04-30 Air quality prediction method based on space-time bandwidth self-adaptive geographical weighted regression

Publications (2)

Publication Number Publication Date
CN112990609A true CN112990609A (en) 2021-06-18
CN112990609B CN112990609B (en) 2021-09-14

Family

ID=76336682

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110477876.5A Active CN112990609B (en) 2021-04-30 2021-04-30 Air quality prediction method based on space-time bandwidth self-adaptive geographical weighted regression

Country Status (1)

Country Link
CN (1) CN112990609B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113901348A (en) * 2021-11-10 2022-01-07 江苏省血吸虫病防治研究所 Oncomelania snail distribution influence factor identification and prediction method based on mathematical model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107103392A (en) * 2017-05-24 2017-08-29 北京航空航天大学 A kind of identification of bus passenger flow influence factor and Forecasting Methodology based on space-time Geographical Weighted Regression
CN107729293A (en) * 2017-09-27 2018-02-23 中南大学 A kind of geographical space method for detecting abnormal based on Multivariate adaptive regression splines
WO2019082009A1 (en) * 2017-10-25 2019-05-02 International Business Machines Corporation Regression for metric dataset
CN110046771A (en) * 2019-04-25 2019-07-23 河南工业大学 A kind of PM2.5 concentration prediction method and apparatus
CN111210052A (en) * 2019-12-16 2020-05-29 天津职业技术师范大学(中国职业培训指导教师进修中心) Traffic accident prediction method based on mixed geography weighted regression
CN111507514A (en) * 2020-04-13 2020-08-07 中国矿业大学(北京) Atmospheric aerosol data prediction method
CN111896680A (en) * 2020-07-08 2020-11-06 天津师范大学 Greenhouse gas emission analysis method and system based on satellite remote sensing data

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Bo Huang et al.: "Geographically and temporally weighted regression for modeling spatio-temporal variation in house prices", International Journal of Geographical Information Science *
Jiping Liu et al.: "A geographically temporal weighted regression approach with travel distance for house price estimation", Entropy *
Jiping Liu et al.: "A mixed geographically and temporally weighted regression: Exploring spatial-temporal variations from global and local perspectives", Entropy *
Wei Zhang et al.: "Regional Precipitation Model Based on Geographically and Temporally Weighted Regression Kriging", Remote Sensing *
Zhang Junjie et al.: "Analysis of housing price evolution in Shanghai based on spatio-temporal heterogeneity detection", Science of Surveying and Mapping *
Zhang Xiaolu et al.: "Spatio-temporal trend fitting of PM2.5 concentration considering stationary characteristics", Science of Surveying and Mapping *

Also Published As

Publication number Publication date
CN112990609B (en) 2021-09-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant