CN114936957A - Urban PM2.5 concentration distribution simulation and scene analysis model based on mobile monitoring data - Google Patents


Publication number
CN114936957A
Authority
CN
China
Prior art keywords
concentration
scene
simulation
factors
urban
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210561494.5A
Other languages
Chinese (zh)
Inventor
李代超
谢晓苇
吴升
赵志远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Application filed by Fuzhou University
Priority to CN202210561494.5A
Publication of CN114936957A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/26: Government or public services
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G06N20/20: Ensemble learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A50/00: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
    • Y02A50/20: Air quality improvement or preservation, e.g. vehicle emission control or emission reduction by using catalytic converters

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an urban PM2.5 concentration distribution simulation and scene analysis model based on mobile monitoring data, comprising: step S1, performing space-time correction of the mobile PM2.5 concentration monitoring data based on fixed monitoring stations, and constructing a space-time consistent PM2.5 concentration training data set; step S2, analyzing the correlation between pollution-source-related factors and PM2.5 concentration, and constructing a PM2.5 concentration spatial differentiation simulation model based on the geographically weighted regression method; step S3, based on the gradient boosting tree method and combining pollution-diffusion-related factors and scene factors, further fitting the residual of the PM2.5 concentration spatial differentiation simulation model, and constructing a PM2.5 concentration simulation and scene analysis model; and step S4, analyzing the response characteristics of PM2.5 concentration to the scene factors using the partial dependence plot method. With this technical scheme, both the spatial heterogeneity of PM2.5 concentration and the nonlinear influence of meteorological and urban scene factors on PM2.5 concentration are taken into account, and the spatial resolution of PM2.5 concentration distribution simulation within the city is improved.

Description

Urban PM2.5 concentration distribution simulation and scene analysis model based on mobile monitoring data
Technical Field
The invention relates to the technical field of spatial information, in particular to an urban PM2.5 concentration distribution simulation and scene analysis model based on mobile monitoring data.
Background
At present, simulation of the spatial distribution of PM2.5 relies mainly on satellite remote sensing imagery and ground monitoring station data. However, satellite imagery suffers from long revisit periods and from data loss caused by cloud and rain cover, while ground monitoring stations are sparsely distributed and air pollution differs greatly between local areas, so fine-scale simulation at the urban scale is difficult with these two types of data.
Mobile monitoring is a flexible, accurate, and high-resolution method of spatial data acquisition that can reach deep into different urban scenes. It provides a new technical means of monitoring urban PM2.5 concentration, enables refined simulation of the spatial distribution of urban PM2.5 concentration, and supports PM2.5 pollution control and urban planning in different scenes as well as the prevention of PM2.5 exposure risk for high-risk groups such as the elderly and children.
At present, research based on mobile air pollution monitoring data mainly explores the relationship between monitored air pollution concentrations within cities and influencing factors such as the built environment and weather, or analyzes the PM2.5 concentration distribution within cities using simple interpolation methods [1-2]. Regarding the influences on PM2.5 concentration, scholars in China and abroad have studied the effects of meteorological factors, urban landscape patterns, land cover types, and other factors, but the complex nonlinear influence of scene factors on PM2.5 concentration in different time periods has not been explored.
In constructing PM2.5 concentration distribution simulation models, existing research mainly adopts spatial interpolation [3], statistical regression [4], machine learning [5], and hybrid models. Spatial interpolation considers only the spatial correlation of the PM2.5 concentration distribution; the model is simple but its accuracy is low. Statistical regression models include the Land Use Regression model (LUR), the Geographically Weighted Regression model (GWR), and the Geographically and Temporally Weighted Regression model (GTWR), and can incorporate the effects of multiple influencing factors on PM2.5 concentration. The GWR model accounts for the spatial heterogeneity of the PM2.5 distribution, and the GTWR model additionally considers the temporal correlation of PM2.5 concentration changes, but these models can only fit linear relationships between the influencing factors and PM2.5 concentration. Machine learning models [6-8] fit the nonlinear relationship between the influencing factors and PM2.5 concentration better, and improve simulation accuracy compared with interpolation and statistical regression. To address the poor nonlinear fitting of statistical regression models and the fact that machine learning models ignore the spatial correlation and heterogeneity of PM2.5 concentration, some scholars have fused the two kinds of model into hybrid models [9-10], which account for both the spatial heterogeneity of the PM2.5 concentration distribution and the nonlinear influence of the influencing factors, improving fitting accuracy. However, such models are mostly purely quantitative and lack analysis of the nonlinear relationships between variables.
Among machine learning models, the Gradient Boosting Decision Tree (GBDT) model, combined with partial dependence plots [11], offers good interpretability of the relationship between independent and dependent variables and can show the degree of nonlinear influence between them.
In summary: (1) research based on mobile air pollution monitoring data currently focuses on exploring the relationship between monitored air pollution concentrations within cities and influencing factors such as the built environment and weather, and the related PM2.5 concentration distribution simulations rely only on simple interpolation methods; (2) regarding the influences on PM2.5 concentration, scholars in China and abroad have studied the effects of meteorological factors, urban landscape patterns, land cover types, and other factors, but not the complex nonlinear influence of scene factors on PM2.5 concentration in different time periods; (3) in constructing PM2.5 concentration distribution simulation models, existing research fails to account simultaneously for the spatial heterogeneity of the PM2.5 concentration distribution, the nonlinear relationship between PM2.5 concentration and related factors, and the interpretability of the model.
Disclosure of Invention
In view of this, the present invention provides an urban PM2.5 concentration distribution simulation and scene analysis model based on mobile monitoring data, which takes into account both the spatial heterogeneity of PM2.5 concentration and the nonlinear influence of meteorological and urban scene factors on PM2.5 concentration, and improves the spatial resolution of PM2.5 concentration distribution simulation within the city.
To achieve this purpose, the invention adopts the following technical scheme: the urban PM2.5 concentration distribution simulation and scene analysis model based on mobile monitoring data comprises the following steps:
step S1, performing space-time correction of the mobile PM2.5 concentration monitoring data based on fixed monitoring stations, and constructing a space-time consistent PM2.5 concentration training data set;
step S2, analyzing the correlation between pollution-source-related factors and PM2.5 concentration, and constructing a PM2.5 concentration spatial differentiation simulation model based on the geographically weighted regression method;
step S3, based on the gradient boosting tree method and combining pollution-diffusion-related factors and scene factors, further fitting the residual of the PM2.5 concentration spatial differentiation simulation model, and constructing a PM2.5 concentration distribution simulation and scene analysis model;
and step S4, analyzing the response characteristics of PM2.5 concentration to the scene factors using the partial dependence plot method.
In a preferred embodiment, step S1 specifically includes:
step S11, preprocessing the mobile PM2.5 monitoring data: handling outliers and missing values in the mobile PM2.5 concentration monitoring data;
step S12, dividing the study area into grids: setting the size of the grid cells in the vertical and horizontal directions, dividing the study area into grids rightward and upward starting from its left and lower boundaries, and coding the grids;
step S13, time correction of the mobile monitoring data: correcting the mobile-monitored PM2.5 concentration based on the PM2.5 concentration trend at the fixed monitoring station nearest to the mobile monitoring position;
step S14, averaging the PM2.5 concentration in each grid for each time period: calculating the mean of the PM2.5 concentrations monitored in each grid in each time period, taking it as the PM2.5 concentration of that grid in that time period, and constructing a multi-period, space-time consistent PM2.5 concentration training data set.
In a preferred embodiment, step S2 specifically includes:
step S21, preprocessing the pollution-source-related data: constructing buffers of different radii (100, 200, 300, 500, 1000, and 1500 m) around the center point of each grid, reclassifying the land use/cover data, calculating within each buffer the area ratio of each land use/cover type, the lengths of primary and secondary roads, and the number of catering establishments, and constructing a pollution-source-related factor data set;
step S22, testing the spatial autocorrelation of the PM2.5 concentration distribution: using the global Moran index to check whether the PM2.5 concentration distribution is spatially autocorrelated, and using the geographically weighted regression method if it is;
step S23, screening the pollution-source-related factors: screening the factors in the pollution-source-related factor data set with the stepwise method and selecting the influencing factors highly correlated with PM2.5 concentration; considering the practical meaning of the variables, when variables of the same type but different buffer radii are all significant for PM2.5 concentration, deleting the variables with lower correlation and screening the remaining variables with the stepwise method again, until no two influencing factors of the same type remain, thereby obtaining the optimal combination of pollution-source-related influencing factors;
step S24, constructing the PM2.5 concentration spatial differentiation simulation model: modeling the spatial autocorrelation of PM2.5 concentration with the geographically weighted regression method, incorporating the screened optimal combination of pollution-source-related influencing factors, and constructing the PM2.5 concentration spatial differentiation simulation model.
In a preferred embodiment, step S3 specifically includes:
step S31, preprocessing the pollution diffusion factors and urban scene factors: obtaining the spatial distribution of wind speed, mean temperature, and humidity in the study area by empirical kriging interpolation of meteorological station monitoring data, calculating the pollution source wind direction index of each grid from the city's prevailing wind direction and the position of the grid relative to its nearest pollution source, calculating the area ratio of each scene type in each grid by overlay analysis, and constructing a data set of pollution-diffusion-related factors and scene factors;
step S32, constructing the PM2.5 concentration simulation and scene analysis model: using the gradient boosting tree method to incorporate the pollution-diffusion-related factors and urban scene factors, further fitting the residual of the PM2.5 concentration spatial differentiation simulation model, and constructing the PM2.5 concentration simulation and scene analysis model.
In a preferred embodiment, step S4 specifically includes: nonlinear quantitative calculation of the influence of urban scenes on PM2.5 concentration; calculating the nonlinear influence of urban scenes on PM2.5 concentration in different time periods based on the PM2.5 concentration simulation and scene analysis model combined with partial dependence plots; and visualizing the nonlinear influence of urban scenes on PM2.5 concentration in different time periods and analyzing the results in light of actual conditions.
In a preferred embodiment, the numbers of rows and columns in step S12 are calculated according to formulas (1) to (3);
[Formula (1): equation image, defining $count_{lng}$ from maxlng, minlng, and $d$]
[Formula (2): equation image, defining $count_{lat}$ from maxlat, minlat, and $d$]
$$N = count_{lng} \times count_{lat} \tag{3}$$
where maxlng and minlng are the maximum and minimum longitude coordinates of the study area, maxlat and minlat are the maximum and minimum latitude coordinates of the study area, $count_{lng}$ and $count_{lat}$ are the numbers of grid columns and rows, respectively, $d$ is the size of the regular grid cell, and $N$ is the total number of grids;
step S13 specifically performs time-consistency correction of the mobile PM2.5 monitoring data using the hourly PM2.5 concentration trend at the fixed atmospheric environment monitoring station, correcting the data to the same time; the correction method for the mobile PM2.5 monitoring data is shown in formula (4);
[Formula (4): equation image]
in the formula, the four quantities are, in order: the corrected PM2.5 concentration of the mobile monitoring data at position $i$ at time $t_3$; the PM2.5 concentration observed by mobile monitoring at position $i$ at time $t_1$; the PM2.5 concentration observed at the atmospheric environment monitoring station at position $l$ at time $t_3$; and the PM2.5 concentration observed at the atmospheric environment monitoring station at position $l$ at time $t_2$. Here $t_1$ is the mobile PM2.5 monitoring time, $t_2$ is the whole hour preceding $t_1$, and $t_3$ is the whole hour following $t_1$;
step S14 specifically substitutes the longitude and latitude of each mobile PM2.5 concentration monitoring point for maxlng and maxlat in formulas (1) and (2), respectively, to determine the grid to which each monitored value belongs, then calculates the mean PM2.5 concentration of each grid in each time period, takes it as the PM2.5 concentration of that grid in that time period, and constructs a multi-period, space-time consistent PM2.5 concentration training data set.
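By way of illustration only, steps S12 and S14 can be sketched in Python as follows. The sketch assumes that coordinates and the cell size `d` share the same units, that partial cells at the right and upper boundaries are kept (ceiling rounding), and that grids are coded row by row from the lower-left corner starting at 0; these choices and all function names are hypothetical, since the rounding rule and coding scheme are not stated in the text.

```python
import math
from collections import defaultdict

def grid_shape(minlng, maxlng, minlat, maxlat, d):
    """Return (count_lng, count_lat, N) for a regular grid of cell size d.

    Assumption: partial cells at the right/upper edge are kept (ceiling).
    """
    count_lng = math.ceil((maxlng - minlng) / d)
    count_lat = math.ceil((maxlat - minlat) / d)
    return count_lng, count_lat, count_lng * count_lat

def grid_code(lng, lat, minlng, minlat, d, count_lng):
    """Code of the cell containing (lng, lat).

    Hypothetical coding scheme: left to right, then bottom to top, from 0.
    """
    col = int((lng - minlng) // d)
    row = int((lat - minlat) // d)
    return row * count_lng + col

def grid_hourly_mean(records, minlng, minlat, d, count_lng):
    """Average PM2.5 per (grid code, hour).

    records: iterable of (lng, lat, hour, pm25) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for lng, lat, hour, pm25 in records:
        key = (grid_code(lng, lat, minlng, minlat, d, count_lng), hour)
        sums[key][0] += pm25
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}
```

Each key of the returned dictionary identifies one grid in one time period, which corresponds to one record of the space-time consistent training data set.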
In a preferred embodiment, step S21 is specifically: with reference to high-resolution remote sensing images, the land use/cover data is reclassified according to research needs into 9 types: cultivated land, high-density forest, low-density forest, high-density residential areas, low-density residential areas, water, urban green land, dust-raising surfaces, and other building areas; primary and secondary roads are extracted from the road vector data according to road grade; catering data is crawled from an open data platform, and POI data for three catering types (Chinese restaurants, Western restaurants, and snack and fast-food restaurants) is selected based on the influence of catering source emissions on PM2.5 concentration reported in existing research; buffers of 100 m, 200 m, 300 m, 500 m, 1000 m, and 1500 m are constructed around the center point of each grid, the area ratio of each land use/cover type, the lengths of primary and secondary roads, and the number of catering establishments are calculated within each buffer, and the pollution-source-related factor data set is constructed;
step S22 specifically includes: testing the spatial autocorrelation of the PM2.5 concentration distribution using the global Moran index, whose principle is shown in formulas (5) and (6);
$$I = \frac{n}{S_0} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j}\, z_i z_j}{\sum_{i=1}^{n} z_i^2} \tag{5}$$
$$S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j} \tag{6}$$
where $w_{i,j}$ is the spatial weight between grids $i$ and $j$, $S_0$ is the sum of all spatial weights, $z_i$ and $z_j$ are the deviations of the PM2.5 concentrations of grids $i$ and $j$ from the global mean PM2.5 concentration, $n$ is the total number of elements, and $I$ is the global Moran index. The global Moran index describes the average degree of association between all spatial units and their surroundings over the whole area, and takes values between -1.0 and 1.0. $I > 0$ indicates positive spatial correlation of the attribute values, i.e., similar values tend to cluster together; $I = 0$ indicates a random distribution with no spatial correlation; $I < 0$ indicates negative spatial correlation, i.e., dissimilar values tend to be located together;
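As a minimal sketch of the quantities defined above, the global Moran index can be computed directly from a list of grid values and a spatial weight matrix. The function name and the binary adjacency weights used in the example are assumptions for illustration; the text does not state how the weights are built.

```python
def global_morans_i(values, weights):
    """Global Moran index for `values` and an n x n spatial weight matrix.

    weights[i][j] is the spatial weight between units i and j
    (diagonal entries are expected to be 0).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                 # deviations from the mean
    s0 = sum(sum(row) for row in weights)          # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)

# Four grids in a line; neighbouring grids get weight 1 (assumed scheme).
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
trend_i = global_morans_i([1.0, 2.0, 3.0, 4.0], W)         # smooth gradient
alternating_i = global_morans_i([1.0, -1.0, 1.0, -1.0], W)  # checkerboard
```

A smoothly varying series gives a positive index and an alternating series gives a negative one, matching the interpretation of the sign of $I$ given above.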
step S23 specifically includes: introducing the land use/cover type ratios, road lengths, and catering counts of the different buffers into a regression model one by one, a candidate variable being introduced when the P value of the largest F value among the candidates is less than or equal to 0.05; removing a previously introduced variable when it is no longer significant after later variables are introduced, i.e., when the P value of the smallest F value is greater than or equal to 0.1; repeating this process until no significant variable remains to be entered into the equation and no insignificant independent variable remains to be removed from it; and, when variables of the same type but different buffer radii are all significant for PM2.5 concentration, deleting the less significant ones to obtain the optimal combination of related factors, which is used to construct the GWR model;
step S24 specifically includes: taking into account the influence of geographic location on the dependent variable, a geographically weighted regression model is established based on the optimal combination of related factors, the principle of which is shown in formula (7);
$$y_i = \beta_0(U_i, V_i) + \beta_1(U_i, V_i) X_{i1} + \beta_2(U_i, V_i) X_{i2} + \dots + \beta_p(U_i, V_i) X_{ip} + \varepsilon_i, \quad i = 1, 2, \dots, n \tag{7}$$
where $(U_i, V_i)$ is the position of observation point $i$; $\beta_0(U_i, V_i), \beta_1(U_i, V_i), \dots, \beta_p(U_i, V_i)$ are the regression coefficients at the $i$-th geospatial location; $X_{ip}$ is the observed value of the $p$-th independent variable at position $i$; and $\varepsilon_i$ is the random error term.
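To illustrate how location-specific coefficients such as $\beta_0(U_i, V_i)$ and $\beta_1(U_i, V_i)$ can be estimated, the following sketch fits a one-variable weighted least squares regression at a target location. The Gaussian distance kernel, the fixed `bandwidth`, and the restriction to a single independent variable are assumptions made for brevity; the text does not specify the kernel or the bandwidth selection.

```python
import math

def gwr_local_coefficients(target, coords, x, y, bandwidth):
    """Estimate (beta0, beta1) of y = beta0 + beta1 * x at `target`.

    Observations are weighted by a Gaussian kernel of their distance to
    `target` (assumed kernel), then fitted by weighted least squares.
    """
    u0, v0 = target
    w = [math.exp(-0.5 * (math.hypot(u - u0, v - v0) / bandwidth) ** 2)
         for u, v in coords]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Synthetic check: y = 2 + 3x everywhere, so every location recovers (2, 3).
coords = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 3.0 * xi for xi in x]
b0, b1 = gwr_local_coefficients((0.5, 0.5), coords, x, y, bandwidth=1.0)
```

Repeating the fit with each grid center as the target yields one coefficient set per location, which is what allows the model to express spatial heterogeneity.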
In a preferred embodiment, step S31 is specifically: the PM2.5 pollution-diffusion-related factors comprise mean wind speed, temperature, humidity, and the wind direction index; the influence of wind direction on PM2.5 concentration is expressed by the pollution source wind direction index, as shown in formula (8):
[Formula (8): equation image]
where $Wind_{index}$ is the pollution source wind direction index, representing the strength of the influence of the pollution source; $\theta$ is the Euclidean direction from the road or dust-raising surface nearest to the monitoring point to the PM2.5 concentration monitoring station; and $\beta$ is the 2-minute mean wind direction at that moment at the meteorological station nearest to the monitoring point. The pollution source wind direction index ranges from 0 to 1: when the PM2.5 concentration monitoring station is downwind of, or inside, the nearest pollution source, the index is 1; when the station is upwind of the nearest pollution source, the index is 0.
In a preferred embodiment, step S32 specifically includes the following steps:
step S321, drawing M samples from the N data records;
step S322, calculating the residual of each sample;
step S323, selecting the optimal split node among the m-dimensional features by minimizing the loss function, with the residuals as training data;
step S324, re-partitioning the samples according to the optimal split node to obtain new leaf nodes, and updating the model;
step S325, iterating steps S322 to S324 until the mean square error is minimized;
with the meteorological and scene factors as independent variables and the PM2.5 concentration residuals calculated by the PM2.5 concentration spatial differentiation simulation model as the dependent variable, the hyperparameters of the GBDT model are solved using a Bayesian optimization algorithm: the mean of the ten-fold cross-validation results of the GBDT is taken as the objective function, the posterior distribution of the objective function is updated by continually adding sample points, and the optimal hyperparameter combination is finally obtained, thereby constructing the PM2.5 concentration simulation and scene analysis model.
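The iteration in steps S322 to S324 can be sketched, under simplifying assumptions, as gradient boosting with squared loss on the GWR residuals. This illustration uses depth-one trees (stumps), a fixed learning rate, and no sample subsampling or Bayesian hyperparameter search; all names and settings are hypothetical and are not the configuration of the invention.

```python
def fit_stump(X, r):
    """Find the single split (feature, threshold) minimizing squared error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X))[:-1]:
            left = [ri for row, ri in zip(X, r) if row[f] <= t]
            right = [ri for row, ri in zip(X, r) if row[f] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - lm) ** 2 for v in left)
                   + sum((v - rm) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:                      # all features constant: no split
        mean = sum(r) / len(r)
        return 0, float("inf"), mean, mean
    return best[1:]

def predict_stump(stump, row):
    f, t, left_mean, right_mean = stump
    return left_mean if row[f] <= t else right_mean

def fit_gbdt(X, gwr_residual, n_rounds=50, learning_rate=0.1):
    """Boost stumps on the GWR residual (the dependent variable here)."""
    base = sum(gwr_residual) / len(gwr_residual)
    pred = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        r = [y - p for y, p in zip(gwr_residual, pred)]   # cf. step S322
        stump = fit_stump(X, r)                           # cf. step S323
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, row)
                for p, row in zip(pred, X)]               # cf. step S324
    return base, stumps, learning_rate

def predict_gbdt(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)
```

Under this scheme the final simulated PM2.5 concentration for a grid would be the GWR prediction plus `predict_gbdt(model, row)` for that grid's meteorological and scene factors.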
In a preferred embodiment, step S4 is specifically: based on partial dependence plots, the nonlinear influence of the scene factors on PM2.5 concentration in the PM2.5 concentration simulation and scene analysis model is analyzed; with the other variables held constant, the area ratio of a scene factor is varied and the overall effect of that change on the simulated PM2.5 concentration is calculated, the principle being shown in formula (9):
$$\hat{f}_{x_S}(x_S) = E_{x_C}\left[\hat{f}(x_S, x_C)\right] \tag{9}$$
where $x_S$ is the scene variable; $x_C$ denotes the variables in the PM2.5 concentration simulation and scene analysis model other than $x_S$; $\hat{f}$ is the trained PM2.5 concentration simulation and scene analysis model; $E_{x_C}[\cdot]$ is the expectation of the trained model over the variables $x_C$; and $\hat{f}_{x_S}(x_S)$ is the corresponding change in PM2.5 concentration for different values of $x_S$.
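In practice the expectation over $x_C$ is commonly approximated by averaging the model's predictions over the training samples while the scene variable is fixed at each value of interest. The following minimal sketch shows that approximation; `predict` stands for any trained model's prediction function, and the toy model in the example is purely illustrative.

```python
def partial_dependence(predict, X, feature, grid):
    """Partial dependence of `predict` on column `feature`.

    For each value in `grid`, the feature is set to that value in every
    row of X and the predictions are averaged over all rows.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)       # copy so X is left unchanged
            modified[feature] = value
            total += predict(modified)
        curve.append(total / len(X))
    return curve

# Toy model (illustrative only): prediction = 2 * x0 + x1.
toy_predict = lambda row: 2.0 * row[0] + row[1]
X = [[0.0, 1.0], [5.0, 3.0]]
curve = partial_dependence(toy_predict, X, feature=0, grid=[0.0, 1.0, 2.0])
```

Plotting `curve` against `grid` for a scene factor's area ratio, separately for each time period, gives response curves of the kind shown in FIG. 3 to FIG. 9.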
Compared with the prior art, the invention has the following beneficial effects:
(1) Based on mobile PM2.5 concentration monitoring data, and combining pollution-source-related factors with pollution-diffusion-related factors, the method accounts for the differences in PM2.5 concentration between urban scenes, the spatial heterogeneity of PM2.5 concentration, and the nonlinear influence of meteorological and urban scene factors on PM2.5 concentration, and improves the spatial resolution of PM2.5 concentration distribution simulation within the city.
(2) The method takes the interpretability of the PM2.5 concentration distribution model into account. Combined with partial dependence plots, it can quantitatively analyze the degree of influence of urban scenes on PM2.5 concentration in different time periods, supporting refined PM2.5 pollution control in different scenes, urban planning, and the prevention of PM2.5 exposure risk for key groups such as the elderly and children in scenes such as hospitals and schools.
Drawings
FIG. 1 is a flow chart of PM2.5 concentration distribution simulation and scene analysis in a preferred embodiment of the present invention;
FIG. 2 is a fine-scale simulated distribution map of PM2.5 concentration in a preferred embodiment of the present invention;
FIG. 3 shows the response characteristics of PM2.5 concentration to road scene factors in a preferred embodiment of the present invention;
FIG. 4 shows the response characteristics of PM2.5 concentration to industrial area scene factors in a preferred embodiment of the present invention;
FIG. 5 shows the response characteristics of PM2.5 concentration to park genre service area scene factors in a preferred embodiment of the present invention;
FIG. 6 shows the response characteristics of PM2.5 concentration to construction site scene factors in a preferred embodiment of the present invention;
FIG. 7 shows the response characteristics of PM2.5 concentration to educational and medical unit scene factors in a preferred embodiment of the present invention;
FIG. 8 shows the response characteristics of PM2.5 concentration to business circle scene factors in a preferred embodiment of the present invention;
FIG. 9 shows the response characteristics of PM2.5 concentration to residential scene factors in a preferred embodiment of the present invention;
FIG. 10 is a legend distinguishing the line segments for each time period in the response characteristics of PM2.5 concentration to scene factors in a preferred embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; and it should be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Referring to FIG. 1 to FIG. 10, the urban PM2.5 concentration distribution simulation and scene analysis model based on mobile monitoring data uses mobile PM2.5 concentration monitoring data from different types of urban scenes. First, to address the inconsistent monitoring times of the mobile PM2.5 monitoring data, space-time correction is applied to it using PM2.5 monitoring data from the urban atmospheric environment stations. Then, given the spatial differentiation of the PM2.5 concentration distribution, a GWR-based PM2.5 concentration spatial differentiation simulation model is constructed with pollution-source-related data such as land use/cover types, the traffic network, and catering POIs as independent variables. On this basis, the nonlinear response of PM2.5 concentration to meteorological and scene factors is incorporated: wind speed, temperature, humidity, the wind direction index, and the scene factors are taken as independent variables, the residual of the PM2.5 concentration spatial differentiation simulation model is taken as the dependent variable, and the GBDT method, which has strong explanatory power for nonlinear relationships, is combined with the GWR simulation results to construct a GWR-GBDT-based PM2.5 simulation and scene analysis model and obtain hourly PM2.5 concentration distribution maps. Finally, the complex nonlinear relationship between the scene factors and PM2.5 concentration is analyzed quantitatively with partial dependence plots, and the response characteristics of PM2.5 concentration to the scene factors are explored.
The method mainly comprises the following steps:
step S1, performing space-time correction of the mobile PM2.5 concentration monitoring data based on fixed monitoring stations, and constructing a space-time consistent PM2.5 concentration training data set;
step S2, analyzing the correlation between pollution-source-related factors and PM2.5 concentration, and constructing a PM2.5 concentration spatial differentiation simulation model based on the geographically weighted regression method;
step S3, based on the gradient boosting decision tree method, combining pollution-diffusion-related factors and scene factors to further fit the residual of the PM2.5 concentration spatial differentiation simulation model, and constructing a PM2.5 concentration simulation and scene analysis model;
and step S4, analyzing the response characteristics of PM2.5 concentration to the scene factors with the partial dependence plot method.
The overall flow chart is shown in fig. 1, and the specific steps are as follows.
Step S1, space-time correction of the mobile PM2.5 concentration monitoring data
Step S11, preprocessing the mobile PM2.5 monitoring data. Abnormal values and missing values in the mobile PM2.5 concentration monitoring data are processed.
Step S12, gridding the study area. The vertical and horizontal sizes of the grid cells are set, the study area is divided into grids rightward and upward starting from its left and lower boundaries, and the grids are coded.
Step S13, time correction of the mobile monitoring data. The mobile PM2.5 concentrations are corrected based on the PM2.5 concentration trend of the fixed monitoring station closest to the mobile monitoring position.
Step S14, averaging the PM2.5 concentration of each grid in each period. The mean of the PM2.5 concentrations monitored in each grid in each period is taken as the PM2.5 concentration of that grid in that period, and a multi-period, space-time consistent PM2.5 concentration training data set is constructed.
Step S2, constructing the PM2.5 concentration spatial differentiation simulation model
Step S21, preprocessing the pollution-source-related data. Buffers of different widths (100 m, 200 m, 300 m, 500 m, 1000 m, 1500 m, etc.) are constructed around the center point of each grid, the land use/cover data are reclassified, and the area ratio of each land use/cover type, the lengths of primary and secondary roads and the number of restaurants within each buffer are calculated to construct the pollution-source-related factor data set.
Step S22, testing the spatial autocorrelation of the PM2.5 concentration distribution. The global Moran index is used to check whether the PM2.5 concentration distribution is spatially autocorrelated; if it is, the geographically weighted regression method is used.
Step S23, screening the pollution-source-related factors. The factors in the pollution-source-related factor data set are screened by stepwise regression, and the influence factors highly correlated with PM2.5 concentration are selected. Considering the practical meaning of the variables, when variables of the same type but different buffer widths are all significant for PM2.5 concentration, the variables with lower correlation are deleted and the remaining variables are screened again by stepwise regression, until no two selected factors are of the same type, which yields the optimal combination of pollution-source-related influence factors.
Step S24, constructing the PM2.5 concentration spatial differentiation simulation model. The spatial autocorrelation of PM2.5 concentration is modeled with the geographically weighted regression method, using the screened optimal combination of pollution-source-related influence factors, to construct the PM2.5 concentration spatial differentiation simulation model.
Step S3, constructing the PM2.5 concentration simulation and scene analysis model
Step S31, preprocessing the pollution diffusion factors and urban scene factors. Based on meteorological station monitoring data, the spatial distributions of wind speed, average temperature and humidity in the study area are obtained by empirical Kriging interpolation; the pollution source wind direction index of each grid is calculated from the prevailing wind direction of the city and the position of the grid relative to the nearest pollution source; the area ratio of each scene type in each grid is calculated by overlay analysis; and the pollution-diffusion-related factor and scene factor data set is constructed.
Step S32, constructing the PM2.5 concentration simulation and scene analysis model. The gradient boosting decision tree method is used to combine the pollution-diffusion-related factors and urban scene factors and further fit the residual of the PM2.5 concentration spatial differentiation simulation model, thereby constructing the PM2.5 concentration simulation and scene analysis model.
Step S4, analyzing the response characteristics of PM2.5 concentration to urban scene factors
The nonlinear influence of urban scenes on PM2.5 concentration is quantified. Based on the PM2.5 concentration simulation and scene analysis model combined with partial dependence plots, the nonlinear influence of urban scenes on PM2.5 concentration in different periods is calculated, displayed visually, and interpreted in light of actual conditions.
Among the above steps, the space-time correction of the mobile PM2.5 monitoring data, the construction of the PM2.5 concentration spatial differentiation simulation model, the construction of the PM2.5 concentration simulation and scene analysis model, and the analysis of the response of PM2.5 concentration to urban scene factors are the key points of the present invention. These four steps are discussed in detail in the following subsections.
Space-time correction of the mobile PM2.5 concentration monitoring data
Step S11, preprocessing the mobile PM2.5 monitoring data. Owing to objective constraints during collection, some mobile PM2.5 concentration records are incomplete and some monitored PM2.5 values are abnormal. Missing values are found by screening for null values, abnormal values are found by drawing line charts and box plots, and both are deleted to ensure data integrity.
Step S12, gridding the study area. First, the vertical and horizontal sizes of the grid cells are set. The study area is then divided into grids rightward and upward starting from its left and lower boundaries, and the grids are coded by their row and column numbers, which are calculated according to formulas (1) to (3).
Figure BDA0003656454470000091
Figure BDA0003656454470000092
$$N = count_{lng} \times count_{lat} \tag{3}$$
where maxlng and minlng are the maximum and minimum longitude coordinates of the study area, maxlat and minlat are the maximum and minimum latitude coordinates of the study area, $count_{lng}$ and $count_{lat}$ are the total numbers of grid cells along the longitude and latitude directions, d is the regular grid size, and N is the total number of grids.
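The gridding in step S12 can be sketched as follows. Formulas (1) and (2) are not reproduced in this text, so the sketch assumes that each count is the extent of the study area divided by d and rounded up; the function names `grid_shape` and `grid_code` and the row-major coding scheme are hypothetical choices for illustration, not taken from the patent.

```python
import math


def grid_shape(minlng, maxlng, minlat, maxlat, d):
    """Return (count_lng, count_lat, N) for a regular grid of cell size d.

    Assumption: the counts are rounded up so the grid covers the whole area.
    """
    count_lng = math.ceil((maxlng - minlng) / d)
    count_lat = math.ceil((maxlat - minlat) / d)
    return count_lng, count_lat, count_lng * count_lat


def grid_code(lng, lat, minlng, minlat, d, count_lng):
    """Code of the cell containing (lng, lat), numbered row by row
    from the lower-left corner (a hypothetical coding scheme)."""
    col = int((lng - minlng) // d)  # cells to the right of the left boundary
    row = int((lat - minlat) // d)  # cells above the lower boundary
    return row * count_lng + col
```

For example, an area spanning 10 units of longitude and 5 units of latitude with d = 2.5 gives 4 × 2 = 8 cells, and a point at (3.0, 2.6) falls in column 1 of row 1.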
Step S13, time correction of the mobile monitoring data. The mobile PM2.5 monitoring data are collected at inconsistent times, and PM2.5 concentration fluctuates slightly over short periods, so simulating the PM2.5 concentration distribution directly from mobile monitoring data collected at different moments introduces errors into the simulation result. The method adopts the basic assumption of classical mobile measurement of meteorological parameters in existing research, namely that the general trend of PM2.5 concentration change within a certain range over a short time is consistent. The hourly PM2.5 concentration trend of the fixed atmospheric environment monitoring stations is used to correct the mobile PM2.5 monitoring data to the same moment; the correction method is shown in formula (4).
Figure BDA0003656454470000101
In the formula:
Figure BDA0003656454470000102
is the corrected PM2.5 concentration value of the mobile monitoring data at position i at time $t_3$;
Figure BDA0003656454470000103
is the PM2.5 concentration observed by mobile monitoring at position i at time $t_1$;
Figure BDA0003656454470000104
is the PM2.5 concentration observed by the atmospheric environment monitoring station at position l at time $t_3$;
Figure BDA0003656454470000105
is the PM2.5 concentration observed by the atmospheric environment monitoring station at position l at time $t_2$; $t_1$ is the mobile PM2.5 monitoring time; $t_2$ is the whole hour immediately before $t_1$, and $t_3$ is the whole hour immediately after $t_1$.
Step S14, averaging the PM2.5 concentration of each grid in each period. The coordinates (lng, lat) of each mobile PM2.5 monitoring point are substituted for maxlng and maxlat in formulas (1) and (2) to determine the grid to which the monitored value belongs. The mean of the PM2.5 concentrations monitored in each grid in each period is then calculated and taken as the PM2.5 concentration of that grid in that period, and a multi-period, space-time consistent PM2.5 concentration training data set is constructed.
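A minimal sketch of the per-grid, per-period averaging in step S14 follows. It assumes the records have already been time-corrected and are given as (lng, lat, hour, pm25) tuples; the record layout and the function name `grid_hour_means` are hypothetical.

```python
from collections import defaultdict


def grid_hour_means(records, minlng, minlat, d):
    """Average time-corrected PM2.5 readings per (row, col, hour).

    records: iterable of (lng, lat, hour, pm25) tuples (assumed layout).
    Returns a dict mapping (row, col, hour) to the mean PM2.5 concentration.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for lng, lat, hour, pm25 in records:
        col = int((lng - minlng) // d)
        row = int((lat - minlat) // d)
        acc = sums[(row, col, hour)]
        acc[0] += pm25
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}
```

Each key of the returned dictionary identifies one grid cell in one period, which is the unit of the training data set described above.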
Construction of the PM2.5 concentration spatial differentiation simulation model
The PM2.5 concentration distribution is spatially autocorrelated, so the urban PM2.5 concentration distribution can be simulated better by jointly considering the pollution-source-related factors and the spatial autocorrelation of PM2.5 concentration. The invention uses the geographically weighted regression method to construct the PM2.5 concentration spatial differentiation simulation model from 12 indexes in 3 categories (land use/cover type, road length and number of catering outlets). The specific steps are as follows:
Step S21, preprocessing the pollution-source-related data. Based on research requirements and with reference to high-resolution remote sensing images, the land use/cover data are reclassified into 9 types: cultivated land, high-density forest, low-density forest, high-density residential areas, low-density residential areas, water bodies, urban green space, dust-generating surfaces and other built-up areas. Primary and secondary roads are extracted from the road vector data according to road grade. The catering data are crawled from the Baidu Maps open data platform; following existing research on the influence of catering emissions on PM2.5 concentration, POI data for three types of catering outlets (Chinese restaurants, Western restaurants and snack or fast food restaurants) are selected. Buffers of 100 m, 200 m, 300 m, 500 m, 1000 m and 1500 m are then constructed around the center point of each grid, and the area ratio of each land use/cover type, the lengths of primary and secondary roads and the number of restaurants within each buffer are calculated to construct the pollution-source-related factor data set.
Step S22, testing the spatial autocorrelation of the PM2.5 concentration distribution. PM2.5 concentration distributions are often spatially autocorrelated, and the geographically weighted regression method is appropriate when such autocorrelation exists. The global Moran index is therefore used to verify the autocorrelation of the spatial distribution of PM2.5 concentration; the principle is shown in formulas (5) and (6).
Figure BDA0003656454470000111
Figure BDA0003656454470000112
where $w_{i,j}$ is the spatial weight between grids i and j, $S_0$ is the sum of all spatial weights, $z_i$ and $z_j$ are the deviations of the PM2.5 concentration values of grids i and j from the global mean PM2.5 concentration, n is the total number of elements, and I is the global Moran index. The global Moran index describes the average degree of association between all spatial units and their surroundings across the whole region, and its value lies between -1.0 and 1.0. I > 0 indicates positive spatial correlation, meaning that similar attribute values (high with high, low with low) tend to cluster together; I = 0 indicates a random spatial distribution with no spatial correlation; I < 0 indicates negative spatial correlation, meaning that similar attribute values tend to be dispersed rather than clustered.
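The test in step S22 can be illustrated with the standard definition of the global Moran index, I = (n / S_0) · Σ_i Σ_j w_{i,j} z_i z_j / Σ_i z_i², with S_0 = Σ_i Σ_j w_{i,j}. Formulas (5) and (6) are not reproduced in this text; the sketch below assumes they match this standard form, which is consistent with the symbols defined above. The function name and the choice of a dense weight matrix are illustrative assumptions, and the significance test (z-score) is omitted.

```python
import numpy as np


def global_morans_i(values, weights):
    """Global Moran's I for one attribute.

    values:  length-n sequence of PM2.5 concentrations, one per grid.
    weights: n x n spatial weight matrix w[i, j] with a zero diagonal.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()   # deviations from the global mean
    s0 = w.sum()       # sum of all spatial weights
    n = x.size
    return (n / s0) * (z @ w @ z) / (z @ z)
```

With four grids in a row and neighbour weights of 1, steadily increasing values give a positive index, while strictly alternating values give an index of -1.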
Step S23, screening the pollution-source-related factors. The invention screens the pollution-source-related variables by stepwise regression. The basic idea is as follows: the land use/cover type ratios, road lengths and catering counts of the different buffers are introduced into the regression model one by one, and a candidate variable is introduced when the P value of the largest F value among the candidate variables is less than or equal to 0.05. When a previously introduced variable is no longer significant because of a later introduction, that is, when the P value of the smallest F value is greater than or equal to 0.1, it is eliminated. This process is repeated until no significant variable remains to be added to the equation and no insignificant variable remains to be removed from it. Considering the practical meaning of the variables, when variables of the same type but different buffer widths are all significant for PM2.5 concentration, the less significant ones are deleted. The resulting optimal factor combination is used to construct the GWR model.
Step S24, constructing the PM2.5 concentration spatial differentiation simulation model. Geographically weighted regression accounts for geospatial heterogeneity by letting the influence of the independent variables on the dependent variable vary with location. A geographically weighted regression model is established from the optimal factor combination; the principle is shown in formula (7).
$$y_i = \beta_0(U_i, V_i) + \beta_1(U_i, V_i)X_{i1} + \beta_2(U_i, V_i)X_{i2} + \dots + \beta_p(U_i, V_i)X_{ip} + \varepsilon_i, \quad i = 1, 2, \dots, n \tag{7}$$
where $(U_i, V_i)$ is the position of observation point i; $\beta_0(U_i, V_i), \beta_1(U_i, V_i), \dots, \beta_p(U_i, V_i)$ are the regression coefficients at the i-th geospatial location; $X_{ip}$ is the observed value of the p-th independent variable at position i; and $\varepsilon_i$ is the random error term.
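Formula (7) can be illustrated with a small local weighted least squares sketch: at each location, the coefficients are estimated from all observations, weighted by their distance to that location. The text does not specify the kernel or how the bandwidth is chosen, so the fixed Gaussian kernel, the `bandwidth` parameter and the function name `gwr_fit` are assumptions made only for this illustration.

```python
import numpy as np


def gwr_fit(coords, X, y, bandwidth):
    """Estimate location-specific coefficients beta(U_i, V_i).

    coords: n x 2 array of positions (U_i, V_i).
    X:      n x p array of independent variables.
    y:      length-n array of PM2.5 concentrations.
    bandwidth: Gaussian kernel bandwidth (assumed fixed, same units as coords).
    Returns (betas, fitted): n x (p + 1) coefficients and length-n fitted values.
    """
    coords = np.asarray(coords, dtype=float)
    X1 = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    betas = np.empty((len(y), X1.shape[1]))
    for i, point in enumerate(coords):
        dist = np.linalg.norm(coords - point, axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)  # Gaussian distance decay
        xtw = X1.T * w                              # equals X1.T @ diag(w)
        betas[i] = np.linalg.solve(xtw @ X1, xtw @ y)
    fitted = np.sum(X1 * betas, axis=1)
    return betas, fitted
```

The difference `y - fitted` is the residual that the later GBDT step takes as its dependent variable.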
Construction of the PM2.5 concentration simulation and scene analysis model
Step S31, preprocessing the pollution diffusion factors and urban scene factors. The PM2.5 pollution-diffusion-related factors comprise average wind speed, temperature, humidity and the wind direction index. The hourly observations of meteorological stations at or above the county level cannot represent local meteorological conditions, so the spatial distributions of average wind speed, temperature and humidity are calculated by empirical Kriging interpolation. Because wind direction affects each area differently, it must be quantified together with the relative position of the pollution source and the fitted area. The influence of wind direction on PM2.5 concentration is therefore represented by a pollution source wind direction index, as shown in formula (8):
Figure BDA0003656454470000121
where $Wind_{index}$ is the pollution source wind direction index, representing the intensity of the influence of the pollution source; θ is the Euclidean direction from the road or dust-generating surface nearest the monitoring point to the PM2.5 concentration monitoring point; and β is the 2-minute average wind direction at that moment at the meteorological station nearest the monitoring point. The pollution source wind direction index ranges from 0 to 1: when the PM2.5 concentration monitoring point is downwind of the nearest pollution source or inside the pollution source, the index is 1; when it is upwind of the nearest pollution source, the index is 0.
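Formula (8) is not reproduced in this text. Purely as an illustration of an index with the stated properties (range 0 to 1, equal to 1 downwind of or inside the source, equal to 0 upwind), the sketch below uses a cosine form. The cosine form, the convention that β is the bearing the wind blows from, and the `inside_source` flag are all assumptions and should not be read as the patent's formula.

```python
import math


def wind_direction_index(theta_deg, beta_deg, inside_source=False):
    """Illustrative wind direction index in [0, 1] (assumed cosine form).

    theta_deg: bearing from the nearest pollution source to the monitoring point.
    beta_deg:  wind direction as the bearing the wind blows FROM (assumed).
    inside_source: True when the monitoring point lies inside the source.
    """
    if inside_source:
        return 1.0
    delta = math.radians(theta_deg - beta_deg)
    # Wind from the opposite bearing carries pollution toward the point (1);
    # wind from the same bearing carries it away from the point (0).
    return (1.0 - math.cos(delta)) / 2.0
```

For a monitoring point due east of the source (θ = 90°), a west wind (β = 270°) gives 1, an east wind (β = 90°) gives 0, and a north wind (β = 0°) gives 0.5.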
Urban scenes are classified according to land parcel function and geographic conditions such as patterns of human activity, social function and combinations of land surface cover. They are divided into 8 types: roads, industrial areas, residential areas, educational and medical units, park and cultural service areas, business districts, construction sites and other areas. The specific definitions are given in Table 1.
Table 1. Specification for dividing urban scene types
Figure BDA0003656454470000122
Figure BDA0003656454470000131
Step S32, constructing the PM2.5 concentration simulation and scene analysis model. Meteorological factors and urban scene factors influence PM2.5 concentration nonlinearly. The gradient boosting decision tree method is an improved ensemble model based on decision trees; it is a flexible and efficient machine learning algorithm that can simulate the nonlinear relationship between independent and dependent variables well. Its core is to use the negative gradient of the loss function in the boosting tree algorithm as an approximation of the residual and to minimize the loss function by gradually reducing the residual. The specific steps are: (1) extract M samples from the N data records; (2) calculate the residual of each sample; (3) with the residuals as training data, select the optimal split node among the m-dimensional features by minimizing the loss function; (4) re-split the samples at the optimal split node to obtain new leaf nodes and update the model; (5) iterate steps (2) to (4) until the mean square error is minimized. In this method, the meteorological and scene factors are the independent variables and the PM2.5 concentration residual of the PM2.5 concentration spatial differentiation simulation model is the dependent variable. The hyperparameters of the GBDT model are tuned with a Bayesian optimization algorithm: the mean ten-fold cross-validation score of the GBDT is the objective function, the posterior distribution of the objective function is updated by continually adding sample points, and the optimal hyperparameter combination is finally obtained, completing the construction of the PM2.5 concentration simulation and scene analysis model.
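The residual-fitting idea of step S32 can be sketched as below: a GBDT is trained on the difference between observed PM2.5 and the GWR output, and the final simulation is the GWR output plus the predicted residual. The text does not name a software library; scikit-learn's `GradientBoostingRegressor` is used here as an assumed stand-in, the function names are hypothetical, and the Bayesian hyperparameter search with ten-fold cross-validation is left out for brevity.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def fit_residual_model(features, observed, gwr_fitted, **gbdt_params):
    """Train a GBDT on the residual of the GWR model.

    features:   n x k array of meteorological and scene factors.
    observed:   length-n observed PM2.5 concentrations.
    gwr_fitted: length-n PM2.5 values simulated by the GWR model.
    gbdt_params: optional hyperparameters (e.g. n_estimators, max_depth).
    """
    residual = np.asarray(observed, dtype=float) - np.asarray(gwr_fitted, dtype=float)
    model = GradientBoostingRegressor(random_state=0, **gbdt_params)
    model.fit(features, residual)
    return model


def combined_prediction(model, features, gwr_fitted):
    """GWR-GBDT simulation: GWR output plus the GBDT-predicted residual."""
    return np.asarray(gwr_fitted, dtype=float) + model.predict(features)
```

Keeping the two stages separate means the GWR coefficients remain interpretable as spatially varying source effects, while the GBDT absorbs only what the linear local model could not explain.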
Response characteristic analysis of PM2.5 concentration to urban scene factors
The invention uses partial dependence plots to analyze the nonlinear influence of the scene factors in the PM2.5 concentration simulation and scene analysis model on PM2.5 concentration: with the other variables held constant, the area ratio of a scene factor is varied and the overall effect of that change on the simulated PM2.5 concentration is calculated. The principle is shown in formula (9):
Figure BDA0003656454470000132
In the formula: $x_S$ is the scene variable; $x_C$ denotes the variables in the PM2.5 concentration simulation and scene analysis model other than $x_S$;
Figure BDA0003656454470000133
is the trained PM2.5 concentration simulation and scene analysis model;
Figure BDA0003656454470000134
is the expectation of the trained PM2.5 concentration simulation and scene analysis model over the variables $x_C$; and
Figure BDA0003656454470000135
represents the corresponding change in PM2.5 concentration for different values of $x_S$.
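The partial dependence calculation can be sketched with the usual empirical estimate: for each candidate value of the scene variable, that column is overwritten in every sample, the model is evaluated, and the predictions are averaged over the remaining variables. Formula (9) is not reproduced in this text; the sketch assumes it takes this standard form, and the function name and the generic `predict` callable are illustrative.

```python
import numpy as np


def partial_dependence_curve(predict, X, feature, grid):
    """Empirical partial dependence of the model output on one feature.

    predict: callable mapping an n x k array to length-n predictions.
    X:       n x k array of training samples.
    feature: column index of the scene variable x_S.
    grid:    values of x_S at which to evaluate the curve.
    """
    X = np.asarray(X, dtype=float)
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value            # fix x_S, keep x_C as observed
        curve.append(predict(X_mod).mean())  # average over x_C
    return np.array(curve)
```

Plotting `curve` against `grid` for each period gives the kind of line chart used to read off how PM2.5 concentration responds to the area ratio of a scene.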
With this method, the nonlinear influence of urban scenes on PM2.5 concentration in different periods is calculated and displayed as line charts, the characteristic periods in which PM2.5 concentration responds strongly are extracted, and the reasons for the response of PM2.5 concentration to the urban scene factors are analyzed in light of actual conditions, which can inform urban air pollution prevention and control.
The method is based on mobile monitoring data and integrates multi-source influence factors. It jointly considers the spatial heterogeneity of the PM2.5 concentration distribution and the nonlinear influence of pollution diffusion and scene factors on PM2.5 concentration, and combines geographically weighted regression with the gradient boosting decision tree method to construct the PM2.5 simulation and scene analysis model, which improves the spatial resolution of urban PM2.5 concentration distribution simulation. The invention also provides a method for quantitatively analyzing the nonlinear influence of urban scenes on PM2.5 concentration. By combining the PM2.5 concentration simulation and scene analysis model with the partial dependence plot method, the degree of nonlinear influence of urban scenes on PM2.5 concentration in different periods is quantified, the periods in which each scene produces a stronger PM2.5 response are extracted, and the causes are analyzed in light of actual conditions. This can support fine-grained PM2.5 pollution control in different scenes, urban planning, and the prevention of PM2.5 exposure risks for key groups such as the elderly and children in scenes such as hospitals and schools.
To analyze and test the effectiveness of the method for simulating the spatial distribution of PM2.5 concentration and analyzing urban scenes in different periods, multiple mobile sensors were used to continuously collect 10 hours of PM2.5 concentration data in the main urban area of a large city in China. The PM2.5 concentration distribution simulated by the method is shown in fig. 2, with a spatial resolution of 100 m × 100 m and a temporal resolution of 1 h. The model fitting R² for the individual periods ranges from 0.73 to 0.99. The model not only improves the accuracy of the PM2.5 concentration spatial distribution simulation but also offers strong interpretability of the nonlinear relationship between the scene factors and PM2.5 concentration.
By combining the PM2.5 concentration simulation and scene analysis model with the partial dependence plot method, the response of PM2.5 concentration to different urban scenes in different periods can be quantitatively mined. The periods in which each scene has an obvious influence on PM2.5 concentration were extracted, visualized and analyzed; the results are shown in figs. 3 to 9. The results show that the response of PM2.5 concentration to the scene factors is temporally heterogeneous, and that the spatio-temporal differences in human activity and the intervention measures against PM2.5 influence the PM2.5 concentration distribution to a certain extent. Elderly people should avoid activities in residential communities during commuting peak hours to reduce additional exposure to PM2.5 pollution, and can instead go to parks for recreation. In urban planning and management, road grades should be planned rationally to disperse traffic flow and reduce local increases in PM2.5 concentration caused by concentrated vehicle emissions, and PM2.5 pollution control at traffic intersections and on urban main roads should be strengthened. Traffic around schools should be further managed during school arrival and dismissal periods to reduce the harm of PM2.5 pollution to children during those periods. Scenes such as small parks can be planned within industrial parks, and measures such as watering for dust suppression can be adopted to reduce the PM2.5 concentration there.
In this method, the mobile PM2.5 concentration monitoring data are corrected for time consistency using the monitoring data of the fixed atmospheric environment stations, which reduces the errors caused by inconsistent mobile monitoring times. Based on the time-consistent mobile PM2.5 concentration monitoring data, the method jointly considers the spatially heterogeneous influence of urban pollution source factors on the PM2.5 concentration distribution and the complex nonlinear relationship between pollution diffusion factors, urban scenes and PM2.5 concentration, and combines the geographically weighted regression method with the gradient boosting decision tree method to construct a PM2.5 concentration simulation and scene analysis model that simulates the spatial distribution of urban PM2.5 concentration well. The method fits the nonlinear relationship among meteorological factors, scene factors and PM2.5 concentration better, simulates the spatial differences of PM2.5 concentration within the city at a fine scale, and improves the spatial resolution of urban PM2.5 concentration distribution simulation. Combined with partial dependence plots, it quantifies the influence of urban scene factors on PM2.5 concentration and supports fine-grained PM2.5 pollution control in different scenes, urban planning, and the prevention of PM2.5 exposure risks for high-risk groups such as the elderly and children.

Claims (10)

1. An urban PM2.5 concentration distribution simulation and scene analysis model based on mobile monitoring data, characterized by comprising the following steps:
step S1, performing space-time correction of the mobile PM2.5 concentration monitoring data based on fixed monitoring stations, and constructing a space-time consistent PM2.5 concentration training data set;
step S2, analyzing the correlation between pollution-source-related factors and PM2.5 concentration, and constructing a PM2.5 concentration spatial differentiation simulation model based on the geographically weighted regression method;
step S3, based on the gradient boosting decision tree method, combining pollution-diffusion-related factors and scene factors to further fit the residual of the PM2.5 concentration spatial differentiation simulation model, and constructing a PM2.5 concentration distribution simulation and scene analysis model;
and step S4, analyzing the response characteristics of PM2.5 concentration to the scene factors with the partial dependence plot method.
2. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scene analysis model according to claim 1, wherein step S1 specifically comprises:
step S11, preprocessing the mobile PM2.5 monitoring data; processing abnormal values and missing values in the mobile PM2.5 concentration monitoring data;
step S12, gridding the study area; setting the vertical and horizontal sizes of the grid cells, dividing the study area into grids rightward and upward starting from its left and lower boundaries, and coding the grids;
step S13, time correction of the mobile monitoring data; correcting the mobile PM2.5 concentrations based on the PM2.5 concentration trend of the fixed monitoring station closest to the mobile monitoring position;
step S14, averaging the PM2.5 concentration of each grid in each period; calculating the mean of the PM2.5 concentrations monitored in each grid in each period as the PM2.5 concentration of that grid in that period, and constructing a multi-period, space-time consistent PM2.5 concentration training data set.
3. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scene analysis model according to claim 1, wherein step S2 specifically comprises:
step S21, preprocessing the pollution-source-related data; constructing buffers of different widths (100 m, 200 m, 300 m, 500 m, 1000 m, 1500 m, etc.) around the center point of each grid, reclassifying the land use/cover data, and calculating the area ratio of each land use/cover type, the lengths of primary and secondary roads and the number of restaurants within each buffer to construct the pollution-source-related factor data set;
step S22, testing the spatial autocorrelation of the PM2.5 concentration distribution; checking with the global Moran index whether the PM2.5 concentration distribution is spatially autocorrelated, and using the geographically weighted regression method if it is;
step S23, screening the pollution-source-related factors; screening the factors in the pollution-source-related factor data set by stepwise regression, and selecting the influence factors highly correlated with PM2.5 concentration; considering the practical meaning of the variables, when variables of the same type but different buffer widths are all significant for PM2.5 concentration, deleting the variables with lower correlation and screening the remaining variables again by stepwise regression, until no two selected factors are of the same type, so as to obtain the optimal combination of pollution-source-related influence factors;
step S24, constructing the PM2.5 concentration spatial differentiation simulation model; modeling the spatial autocorrelation of PM2.5 concentration with the geographically weighted regression method, using the screened optimal combination of pollution-source-related influence factors, to construct the PM2.5 concentration spatial differentiation simulation model.
4. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scene analysis model according to claim 1, wherein step S3 specifically comprises:
step S31, preprocessing the pollution diffusion factors and urban scene factors; based on meteorological station monitoring data, obtaining the spatial distributions of wind speed, average temperature and humidity in the study area by empirical Kriging interpolation, calculating the pollution source wind direction index of each grid from the prevailing wind direction of the city and the position of the grid relative to the nearest pollution source, calculating the area ratio of each scene type in each grid by overlay analysis, and constructing the pollution-diffusion-related factor and scene factor data set;
step S32, constructing the PM2.5 concentration simulation and scene analysis model; using the gradient boosting decision tree method to combine the pollution-diffusion-related factors and urban scene factors and further fit the residual of the PM2.5 concentration spatial differentiation simulation model, thereby constructing the PM2.5 concentration simulation and scene analysis model.
5. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scene analysis model according to claim 1, wherein step S4 specifically comprises: the influence of the urban scene on the PM2.5 concentration is subjected to nonlinear quantitative calculation; calculating the nonlinear influence of urban scenes on the PM2.5 concentration in different periods based on a partial dependency graph combining PM2.5 concentration simulation and a scene analysis model; and (4) visually displaying the nonlinear influence of the urban scenes in different time periods on the PM2.5 concentration, and analyzing the result by combining actual conditions.
6. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scene analysis model according to claim 2, wherein the numbers of rows and columns in step S12 are calculated with reference to formulas (1) to (3);
Figure FDA0003656454460000031
Figure FDA0003656454460000032
$$N = count_{lng} \times count_{lat} \tag{3}$$
wherein maxlng and minlng are the maximum and minimum longitude coordinates of the study area, maxlat and minlat are the maximum and minimum latitude coordinates of the study area, $count_{lng}$ and $count_{lat}$ are the total numbers of grid cells along the longitude and latitude directions, d is the regular grid size, and N is the total number of grids;
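The grid sizing described above can be sketched as follows. Formulas (1) and (2) are only available as images here, so the rounding rule is an assumption: the extent is divided by d and rounded up so the grid covers the whole study area. The function name `grid_counts` and the sample coordinates are hypothetical.

```python
import math

def grid_counts(minlng, maxlng, minlat, maxlat, d):
    """Return (count_lng, count_lat, N) for a regular grid of cell size d.

    Assumption (not confirmed by the source): each extent is divided by d
    and rounded up. d is expressed in the same units as the coordinates.
    """
    count_lng = math.ceil((maxlng - minlng) / d)
    count_lat = math.ceil((maxlat - minlat) / d)
    # Formula (3): total number of grids.
    return count_lng, count_lat, count_lng * count_lat

# Hypothetical study area, 0.5 degrees wide and 0.25 degrees high.
counts = grid_counts(119.0, 119.5, 26.0, 26.25, 0.125)
```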
step S13 specifically comprises: using the hourly PM2.5 concentration trend of the fixed atmospheric environment monitoring stations to apply a time-consistency correction to the mobile PM2.5 monitoring data, so that the mobile PM2.5 monitoring data are corrected to the same moment; the correction method is shown in formula (4);
Figure FDA0003656454460000041
in the formula:
Figure FDA0003656454460000042
is the PM2.5 concentration of the mobile monitoring data at position i, corrected to time $t_3$;
Figure FDA0003656454460000043
is the PM2.5 concentration observed by mobile monitoring at position i at time $t_1$;
Figure FDA0003656454460000044
is the PM2.5 concentration observed by the atmospheric environment monitoring station at position l at time $t_3$;
Figure FDA0003656454460000045
is the PM2.5 concentration observed by the atmospheric environment monitoring station at position l at time $t_2$; $t_1$ is the mobile PM2.5 monitoring time; $t_2$ is the whole hour immediately before $t_1$; and $t_3$ is the whole hour immediately after $t_1$;
step S14 specifically comprises: substituting the coordinates of each mobile PM2.5 monitoring point for maxlng and maxlat in formulas (1) and (2), respectively, to determine the grid to which each monitoring value belongs; calculating the PM2.5 concentration value of each grid in each time period and taking it as the PM2.5 concentration of that grid for that period; and constructing a multi-period, spatio-temporally consistent PM2.5 concentration training data set.
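A minimal sketch of the grid assignment in step S14 follows. Two things are assumptions rather than statements of the claim: the cell index is obtained by rounding up (matching a rounded-up form of formulas (1) and (2)), and the per-grid, per-period value is taken as the arithmetic mean of the readings that fall in the cell. The names `cell_of` and `grid_period_means` are hypothetical.

```python
import math
from collections import defaultdict

def cell_of(lng, lat, minlng, minlat, d):
    """1-based (column, row) index of the grid cell containing a point."""
    col = max(1, math.ceil((lng - minlng) / d))
    row = max(1, math.ceil((lat - minlat) / d))
    return col, row

def grid_period_means(records, minlng, minlat, d):
    """Mean PM2.5 per (cell, period).

    records: iterable of (lng, lat, period, pm25) tuples, where pm25 is
    assumed to be already time-corrected as in step S13.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for lng, lat, period, pm25 in records:
        key = (cell_of(lng, lat, minlng, minlat, d), period)
        sums[key][0] += pm25
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}

# Hypothetical readings: two fall in the same cell, one in another.
readings = [
    (119.05, 26.05, "morning", 30.0),
    (119.10, 26.10, "morning", 40.0),
    (119.30, 26.05, "morning", 20.0),
]
means = grid_period_means(readings, minlng=119.0, minlat=26.0, d=0.125)
```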
7. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scene analysis model according to claim 3, wherein:
step S21 specifically comprises: classifying the land use/cover data, with the aid of high-resolution remote sensing images and according to the research requirements, into 9 types, namely cultivated land, high-density forest, low-density forest, high-density residential areas, low-density residential areas, water bodies, urban green space, dust-generating surfaces and other built-up areas; extracting main roads and secondary roads from the road vector data according to road grade; crawling catering data from an open data platform and, based on the influence of catering-source emissions on PM2.5 concentration reported in existing research, selecting POI data for three catering types, namely Chinese restaurants, Western restaurants and snack/fast-food restaurants; constructing buffers of 100 m, 200 m, 300 m, 500 m, 1000 m and 1500 m around the centre point of each grid, calculating within each buffer the area ratio of each land use/cover type, the lengths of main and secondary roads and the number of restaurants, and constructing the pollution-source-related factor data set;
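As an illustration of the buffer-based factors in step S21, the sketch below counts restaurant POIs inside each buffer around one grid centre. It assumes coordinates are already projected to metres and uses circular buffers; the function name and sample points are hypothetical, and the area-ratio and road-length factors would need a GIS library and are not shown.

```python
import math

# Buffer radii listed in step S21, in metres.
BUFFER_RADII_M = (100, 200, 300, 500, 1000, 1500)

def poi_counts_by_buffer(centre, pois, radii=BUFFER_RADII_M):
    """Number of POIs within each circular buffer around a grid centre.

    centre and pois are (x, y) pairs in a projected metric coordinate
    system (an assumption of this sketch).
    """
    distances = [math.dist(centre, p) for p in pois]
    return {r: sum(1 for dist in distances if dist <= r) for r in radii}

# Hypothetical restaurants at roughly 50 m, 250 m and 922 m from the centre.
restaurant_counts = poi_counts_by_buffer((0.0, 0.0), [(50.0, 0.0), (0.0, 250.0), (600.0, 700.0)])
```

Each buffer width yields one candidate variable per factor type, which is why the later screening step has to choose among several widths of the same type.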
step S22 specifically comprises: verifying the spatial autocorrelation of the PM2.5 concentration distribution with the global Moran index, the principle of which is shown in formulas (5) and (6);
Figure FDA0003656454460000051
Figure FDA0003656454460000052
wherein $w_{i,j}$ is the spatial weight between grids i and j, $S_0$ is the sum of all spatial weights, $z_i$ and $z_j$ are the deviations of the PM2.5 concentration values of grids i and j from the global mean PM2.5 concentration, n is the total number of elements, and I is the global Moran index; the global Moran index describes the average degree of association between every spatial unit and its surrounding area over the whole region, and takes values between -1.0 and 1.0; I > 0 indicates positive spatial correlation among the attribute values of the regions, i.e. similar attribute values tend to cluster together; I = 0 indicates that the regions are randomly distributed, with no spatial correlation; I < 0 indicates negative spatial correlation, i.e. dissimilar attribute values tend to be located together;
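The global Moran index defined above can be sketched with the standard textbook expression, I = (n / S_0) · Σ_i Σ_j w_{i,j} z_i z_j / Σ_i z_i². The formula images are not available here, so treating them as this standard form is an assumption; the function name and the toy weight matrix are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the global mean
    s0 = sum(weights[i][j] for i in range(n) for j in range(n))
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    denom = sum(zi * zi for zi in z)
    return (n / s0) * cross / denom

# Four grids in a row; each grid neighbours the grids directly beside it.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 2.0, 3.0, 4.0], line_weights)      # smooth gradient
alternating = morans_i([1.0, -1.0, 1.0, -1.0], line_weights)  # checkerboard
```

A smooth gradient gives a positive index and an alternating pattern a negative one, matching the interpretation of I > 0 and I < 0 given in the claim.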
step S23 specifically comprises: introducing the land use/cover type ratios, road lengths and restaurant counts of the different buffers into the regression model one at a time; a candidate variable is entered when the P value associated with the largest F value among the candidates is less than or equal to 0.05; a previously entered variable is removed when it is no longer significant after later variables are introduced, i.e. when the P value associated with the smallest F value is greater than or equal to 0.1; the process is repeated until no significant variable remains to be entered and no insignificant variable remains to be removed from the regression equation; when variables of the same type but with different buffer widths are all significant for the PM2.5 concentration, the variables with lower significance are deleted, giving the optimal combination of related factors, which is used to construct the GWR model;
step S24 specifically comprises: taking the influence of geographic location on the dependent variable into account, and establishing a geographically weighted regression model based on the optimal combination of related factors, the principle of which is shown in formula (7);
$$y_i = \beta_0(U_i, V_i) + \beta_1(U_i, V_i) X_{i1} + \beta_2(U_i, V_i) X_{i2} + \dots + \beta_p(U_i, V_i) X_{ip} + \varepsilon_i, \quad i = 1, 2, \dots, n \tag{7}$$
wherein $(U_i, V_i)$ is the location of observation point i, $\beta_0(U_i, V_i), \beta_1(U_i, V_i), \dots, \beta_p(U_i, V_i)$ are the regression coefficients at the i-th geospatial location, $X_{ip}$ is the observed value of the p-th explanatory variable at location i, and $\varepsilon_i$ is the random error term.
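To make the location-dependent coefficients of the geographically weighted regression concrete, the sketch below estimates a local intercept and slope at one target location by weighted least squares. It is deliberately reduced to a single explanatory variable, and the Gaussian kernel and the bandwidth value are assumptions, since the claim does not specify a kernel. The name `gwr_local_fit` is hypothetical.

```python
import math

def gwr_local_fit(coords, x, y, target, bandwidth):
    """Local (b0, b1) at `target` for y_i = b0(U,V) + b1(U,V) * x_i + e_i.

    Observations are weighted by a Gaussian kernel of their distance to
    the target location, so nearby grids influence the fit the most.
    """
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Toy data that follow y = 2 + 3x everywhere, so every local fit recovers (2, 3).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 8.0, 11.0, 14.0]
b0, b1 = gwr_local_fit(coords, x, y, target=(0.0, 0.0), bandwidth=1.0)
```

Repeating the fit with each grid centre as the target yields one coefficient set per location, which is what distinguishes this model from a single global regression.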
8. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scene analysis model according to claim 4, wherein:
step S31 specifically comprises: the PM2.5 pollution-diffusion-related factors comprise the average wind speed, temperature, humidity and wind direction index, and the influence of wind direction on the PM2.5 concentration is expressed by the pollution source wind direction index shown in formula (8):
Figure FDA0003656454460000061
wherein $Wind_{index}$ is the pollution source wind direction index, which represents the intensity of the pollution source's influence; θ is the Euclidean direction from the road or dust-generating surface nearest to the monitoring point to the PM2.5 concentration monitoring station; and β is the 2-minute average wind direction at that moment at the meteorological station nearest to the monitoring point; the pollution source wind direction index ranges from 0 to 1: it is 1 when the PM2.5 concentration monitoring station is downwind of, or inside, the nearest pollution source, and 0 when the station is upwind of the nearest pollution source.
9. The model for urban PM2.5 concentration distribution simulation and scene analysis based on mobile monitoring data according to claim 4, wherein step S32 specifically comprises the following steps:
step S321, extracting M samples from the N data sets;
step S322, calculating the residual error of each sample;
step S323, selecting an optimal division node from the m-dimensional features through a minimized loss function, and using a residual error as training data;
step S324, re-dividing the sample according to the optimal dividing node to obtain a new leaf node and update the model;
step S325, iterating steps S322 to S324 until the mean square error is minimized;
taking the meteorological and scene factors as independent variables and the PM2.5 concentration residuals calculated by the PM2.5 concentration spatial differentiation simulation model as the dependent variable, tuning the GBDT model with a Bayesian optimization algorithm that uses the GBDT ten-fold cross-validation average as the objective function and updates the posterior distribution of the objective function by continually adding sample points, and finally obtaining the optimal hyper-parameter combination, thereby constructing the PM2.5 concentration simulation and scene analysis model.
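The residual-fitting loop of steps S322 to S325 can be sketched as gradient boosting with squared-error loss, where each round fits a one-split tree (a stump) to the current residuals. This is a simplified illustration, not the claimed implementation: the sample extraction of step S321 and the Bayesian hyper-parameter search are omitted, and the number of rounds and learning rate are fixed, assumed values. All function names are hypothetical.

```python
def fit_stump(X, r):
    """Best (feature, threshold, left_mean, right_mean) by squared error on r."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [ri for row, ri in zip(X, r) if row[f] <= t]
            right = [ri for row, ri in zip(X, r) if row[f] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]

def predict_stump(stump, row):
    f, t, lm, rm = stump
    return lm if row[f] <= t else rm

def fit_gbdt(X, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on the residuals of the running prediction."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # step S322
        stump = fit_stump(X, residuals)                   # step S323
        stumps.append(stump)                              # step S324
        pred = [pi + learning_rate * predict_stump(stump, row) for pi, row in zip(pred, X)]
    return base, stumps, learning_rate

def predict_gbdt(model, row):
    base, stumps, lr = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)

# Hypothetical inputs: [wind index, scene area ratio] per grid; y holds the
# residuals left by the first-stage spatial model.
X = [[0.1, 1.0], [0.2, 3.0], [0.8, 2.0], [0.9, 4.0]]
y = [-1.0, -1.0, 1.0, 1.0]
model = fit_gbdt(X, y)
```

In the two-stage design described by the claims, the final PM2.5 estimate for a grid would be the first-stage spatial prediction plus the residual predicted by this second stage.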
10. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scene analysis model of claim 5, wherein:
step S4 specifically comprises: analyzing, with partial dependence plots, the nonlinear influence of the scene factors on the PM2.5 concentration in the PM2.5 concentration simulation and scene analysis model; with all other variables held constant, the area ratio of a scene factor is varied and the resulting overall influence of the scene factor change on the PM2.5 concentration is calculated, the principle of which is shown in formula (9):
Figure FDA0003656454460000071
in the formula: $x_S$ is the scene variable, and $x_C$ denotes the variables in the PM2.5 concentration simulation and scene analysis model other than $x_S$;
Figure FDA0003656454460000072
is the trained PM2.5 concentration simulation and scene analysis model;
Figure FDA0003656454460000073
is the expectation of the trained PM2.5 concentration simulation and scene analysis model over the variables $x_C$; and
Figure FDA0003656454460000074
denotes the change in PM2.5 concentration corresponding to different values of $x_S$.
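The partial dependence calculation can be sketched with its usual empirical estimate: for each candidate value of the scene variable $x_S$, that value is written into every sample, the trained model is evaluated, and the predictions are averaged over the observed values of the other variables $x_C$. Reading formula (9) as this standard estimator is an assumption, and the stand-in model and data below are hypothetical.

```python
def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value.

    predict: callable mapping one feature row to a prediction.
    X: list of feature rows supplying the observed values of the other variables.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)       # copy so the data set is not mutated
            modified[feature] = value  # vary x_S, keep x_C as observed
            total += predict(modified)
        curve.append(total / len(X))
    return curve

# Stand-in for a trained model: nonlinear in feature 0, linear in feature 1.
def toy_model(row):
    return row[0] ** 2 + row[1]

samples = [[0.0, 1.0], [0.5, 3.0]]
pd_curve = partial_dependence(toy_model, samples, feature=0, grid=[0.0, 1.0, 2.0])
```

Plotting the resulting curve against the grid of area ratios gives the partial dependence plot used to read off the nonlinear effect of a scene type.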
CN202210561494.5A 2022-05-23 2022-05-23 Urban PM25 concentration distribution simulation and scene analysis model based on mobile monitoring data Pending CN114936957A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210561494.5A CN114936957A (en) 2022-05-23 2022-05-23 Urban PM25 concentration distribution simulation and scene analysis model based on mobile monitoring data

Publications (1)

Publication Number Publication Date
CN114936957A (en) 2022-08-23

Family

ID=82864513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210561494.5A Pending CN114936957A (en) 2022-05-23 2022-05-23 Urban PM25 concentration distribution simulation and scene analysis model based on mobile monitoring data

Country Status (1)

Country Link
CN (1) CN114936957A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120315920A1 (en) * 2011-06-10 2012-12-13 International Business Machines Corporation Systems and methods for analyzing spatiotemporally ambiguous events
CN112069673A (en) * 2020-08-31 2020-12-11 河南大学 Method for estimating surface PM2.5 concentration based on gradient lifting decision tree
CN113297527A (en) * 2021-06-09 2021-08-24 四川大学 PM2.5 overall domain space-time calculation inference method based on multisource city big data
CN113901384A (en) * 2021-09-24 2022-01-07 武汉大学 Ground PM2.5 concentration modeling method considering global spatial autocorrelation and local heterogeneity
CN114462316A (en) * 2022-02-09 2022-05-10 中南大学 Air quality optimization simulation method for homeland space planning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
许珊等: "面向场景的城市PM2.5浓度空间分布精细模拟", 中国环境科学, vol. 39, no. 11, 31 December 2019 (2019-12-31), pages 4570 - 4579 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115600919B (en) * 2022-09-19 2023-08-22 江苏蓝创智能科技股份有限公司 Method for real-time unorganized emission location and total amount of campus emissions calculation
CN115600919A (en) * 2022-09-19 2023-01-13 江苏蓝创智能科技股份有限公司(Cn) Method for real-time unorganized emission localization and campus emission total calculation
CN115453064A (en) * 2022-09-22 2022-12-09 山东大学 Fine particle air pollution cause analysis method and system
CN115238245A (en) * 2022-09-22 2022-10-25 中科三清科技有限公司 Pollutant monitoring method and device, storage medium and electronic equipment
CN115453064B (en) * 2022-09-22 2023-09-05 山东大学 Fine particulate matter air pollution cause analysis method and system
CN115861011A (en) * 2023-02-15 2023-03-28 山东优嘉环境科技有限公司 Smart city optimization management method and system based on multi-source data fusion
CN115861011B (en) * 2023-02-15 2023-05-05 山东优嘉环境科技有限公司 Smart city optimization management method and system based on multi-source data fusion
CN116522270A (en) * 2023-07-04 2023-08-01 西安启迪能源技术有限公司 Data processing system for smart sponge city
CN116522270B (en) * 2023-07-04 2023-09-15 西安启迪能源技术有限公司 Data processing system for smart sponge city
CN116732926A (en) * 2023-08-14 2023-09-12 中科三清科技有限公司 Method, apparatus and readable storage medium for improving air quality
CN117473398A (en) * 2023-12-26 2024-01-30 四川国蓝中天环境科技集团有限公司 Urban dust pollution source classification method based on slag transport vehicle activity
CN117473398B (en) * 2023-12-26 2024-03-19 四川国蓝中天环境科技集团有限公司 Urban dust pollution source classification method based on slag transport vehicle activity
CN118013769A (en) * 2024-04-10 2024-05-10 南京气象科技创新研究院 Atmospheric pollutant concentration prediction method based on WRF-Chem

Similar Documents

Publication Publication Date Title
CN114936957A (en) Urban PM25 concentration distribution simulation and scene analysis model based on mobile monitoring data
CN110334864B (en) GIS-based multi-specification-in-one urban and rural space partition method
Bishop et al. Prediction of scenic beauty using mapped data and geographic information systems
Yang et al. Impact of accessibility on housing prices in Dalian city of China based on a geographically weighted regression model
Quan et al. GIS-based landslide susceptibility mapping using analytic hierarchy process and artificial neural network in Jeju (Korea)
CN110458048A (en) Take population distribution Spatio-temporal Evolution and the cognition of town pattern feature into account
CN108701274A (en) A kind of small scale air quality index prediction technique in city and system
De Ridder et al. Simulating the impact of urban sprawl on air quality and population exposure in the German Ruhr area. Part I: Reproducing the base state
CN109359166B (en) Space growth dynamic simulation and driving force factor contribution degree synchronous calculation method
CN107480808B (en) Method for planning diversion project line in high-altitude mountain area
CN114861277A (en) Long-time-sequence national soil space function and structure simulation method
Lamichhane et al. Land use land cover (LULC) change projection in Kathmandu valley using the clue-s model
Ren et al. Analysis of the spatial characteristics of inhalable particulate matter concentrations under the influence of a three-dimensional landscape pattern in Xi'an, China
CN115018268A (en) Forest ecological service value evaluation method based on space measurement and calculation relative quantity
CN116110210B (en) Data-driven landslide hazard auxiliary decision-making method in complex environment
Sangawongse et al. Urban growth and land cover change in Chiang Mai and Taipei: results from the SLEUTH model
Zhang et al. The CA model based on data assimilation
Zubair Prediction of land use and land cover (LULC) changes using CA-Markov model in Mamuju Subdistrict
CN117219183A (en) High coverage near ground NO in cloudy rain areas 2 Concentration estimation method and system
Chen et al. A Spatiotemporal Interpolation Graph Convolutional Network for Estimating PM₂.₅ Concentrations Based on Urban Functional Zones
Dehingia et al. Decadal Transformation of Land Use-Land Cover and Future Spatial Expansion in Bangalore Metropolitan Region, India: Open-Source Geospatial Machine Learning Approach
CN113554221B (en) Method for simulating and predicting town development boundary under view angle of 'flow space'
CN114881309A (en) Method for measuring characteristic correlation between urban vitality and carbon emptying
CN114818310A (en) Forest landscape simulation method and device, electronic equipment and storage medium
Sangawongse Land-use/land-cover dynamics in Chiang Mai: appraisal from remote sensing, GIS and modeling approaches

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination