CN114936957A

CN114936957A - Urban PM2.5 concentration distribution simulation and scene analysis model based on mobile monitoring data

Info

Publication number: CN114936957A
Application number: CN202210561494.5A
Authority: CN
Inventors: 李代超; 谢晓苇; 吴升; 赵志远
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2022-05-23
Filing date: 2022-05-23
Publication date: 2022-08-23

Abstract

The invention provides an urban PM2.5 concentration distribution simulation and scene analysis model based on mobile monitoring data, comprising: step S1, performing space-time correction of the mobile PM2.5 concentration monitoring data based on fixed monitoring stations, and constructing a spatio-temporally consistent PM2.5 concentration training data set; step S2, analyzing the correlation between pollution source related factors and PM2.5 concentration, and constructing a PM2.5 concentration spatial differentiation simulation model based on the geographically weighted regression method; step S3, based on the gradient boosting tree method, combining pollution diffusion related factors and scene factors to further fit the residuals of the PM2.5 concentration spatial differentiation simulation model, and constructing a PM2.5 concentration simulation and scene analysis model; and step S4, analyzing the response characteristics of PM2.5 concentration to the scene factors using the partial dependence plot method. With this technical scheme, both the spatial heterogeneity of PM2.5 concentration and the nonlinear influence of meteorological and urban scene factors on PM2.5 concentration are taken into account, and the spatial resolution of PM2.5 concentration distribution simulation within the city is improved.

Description

Urban PM2.5 concentration distribution simulation and scene analysis model based on mobile monitoring data

Technical Field

The invention relates to the technical field of spatial information, in particular to an urban PM2.5 concentration distribution simulation and scene analysis model based on mobile monitoring data.

Background

At present, simulation of the spatial distribution of PM2.5 relies mainly on satellite remote sensing imagery and ground monitoring station data. However, satellite imagery suffers from long revisit periods and from data loss caused by cloud and rain cover, while ground monitoring stations are sparsely distributed and air pollution differs greatly between local areas, so fine-scale urban simulation is difficult to perform with these two types of data.

Mobile monitoring is a flexible, accurate and high-precision method of spatial data acquisition that can reach deep into different urban scenes. It provides a new technical means for monitoring urban PM2.5 concentration, enables refined simulation of the spatial distribution of PM2.5 concentration within a city, and supports PM2.5 pollution control and urban planning in different scenes as well as the prevention of PM2.5 exposure risk for high-risk groups such as the elderly and children.

At present, research based on mobile air pollution monitoring data mainly explores the relationship between monitored air pollution concentrations within cities and influencing factors such as the built environment and weather, or analyzes the PM2.5 concentration distribution within cities using simple interpolation methods [1-2]. Regarding the factors influencing PM2.5 concentration, scholars in China and abroad have studied the influence of meteorological factors, urban landscape patterns, land cover types and other factors, but the complex nonlinear influence of scene factors in different time periods on PM2.5 concentration has not been explored.

In constructing PM2.5 concentration distribution simulation models, existing research mainly adopts spatial interpolation [3], statistical regression [4], machine learning [5] and hybrid models. Spatial interpolation considers only the spatial correlation of the PM2.5 concentration distribution; the model is simple but its precision is low. Statistical regression models include the Land Use Regression model (LUR), the Geographically Weighted Regression model (GWR), the Geographically and Temporally Weighted Regression model (GTWR) and others, and can incorporate the effects of multiple influencing factors on PM2.5 concentration. The GWR model accounts for the spatial heterogeneity of the PM2.5 distribution, and the GTWR model further considers the temporal correlation of PM2.5 concentration changes, but these models can only fit linear relationships between the influencing factors and PM2.5 concentration. Machine learning models [6-8] fit the nonlinear relationship between influencing factors and PM2.5 concentration better, and improve simulation precision compared with interpolation and statistical regression methods. Because statistical regression models fit nonlinear relationships poorly and machine learning models ignore the spatial correlation and heterogeneity of PM2.5 concentration, scholars have fused the two kinds of model into hybrid models [9-10], which account for both the spatial heterogeneity of the PM2.5 concentration distribution and the nonlinear influence of the influencing factors, improving fitting precision; however, most such models are purely quantitative and lack analysis of the nonlinear relationships between variables. In machine learning, the Gradient Boosting Decision Tree (GBDT) model, combined with partial dependence plots [11], offers good interpretability of the relationship between independent and dependent variables and can show the degree of nonlinear influence between them.

Summarizing the above studies: (1) research based on mobile air pollution monitoring data mainly explores the relationship between monitored air pollution concentrations within cities and factors such as the built environment and weather, and research on PM2.5 concentration distribution simulation relies only on relatively simple interpolation methods; (2) regarding the factors influencing PM2.5 concentration, scholars in China and abroad have studied the influence of meteorological factors, urban landscape patterns, land cover types and other factors, but have not studied the complex nonlinear influence of scene factors in different time periods; (3) in constructing PM2.5 concentration distribution simulation models, existing research fails to account simultaneously for the spatial heterogeneity of the PM2.5 concentration distribution, the nonlinear relationship between PM2.5 concentration and related factors, and the interpretability of the model.

Disclosure of Invention

In view of this, the present invention provides an urban PM2.5 concentration distribution simulation and scene analysis model based on mobile monitoring data, which takes into account both the spatial heterogeneity of PM2.5 concentration and the nonlinear influence of meteorological and urban scene factors on PM2.5 concentration, and improves the spatial resolution of PM2.5 concentration distribution simulation within the city.

In order to achieve the purpose, the invention adopts the following technical scheme: the model for simulating urban PM2.5 concentration distribution and analyzing scenes based on mobile monitoring data comprises the following steps:

step S1, performing space-time correction of the mobile PM2.5 concentration monitoring data based on fixed monitoring stations, and constructing a spatio-temporally consistent PM2.5 concentration training data set;

step S2, analyzing the correlation between pollution source related factors and PM2.5 concentration, and constructing a PM2.5 concentration spatial differentiation simulation model based on the geographically weighted regression method;

step S3, based on the gradient boosting tree method, combining pollution diffusion related factors and scene factors to further fit the residuals of the PM2.5 concentration spatial differentiation simulation model, and constructing a PM2.5 concentration distribution simulation and scene analysis model;

and step S4, analyzing the response characteristics of PM2.5 concentration to the scene factors using the partial dependence plot method.

In a preferred embodiment, step S1 specifically includes:

step S11, preprocessing the mobile PM2.5 monitoring data; processing abnormal values and missing values in the mobile PM2.5 concentration monitoring data;

step S12, dividing the study area into grids; setting the sizes of the grid cells in the vertical and horizontal directions, dividing the study area into grids rightward and upward starting from its left and lower boundaries respectively, and coding the grids;

step S13, time-correcting the mobile monitoring data; correcting the mobile-monitored PM2.5 concentration based on the PM2.5 concentration variation trend of the fixed monitoring station closest to the mobile monitoring position;

step S14, averaging the PM2.5 concentration in each grid for each time period; calculating the average of the PM2.5 concentrations monitored in each grid in each time period, taking the average as the PM2.5 concentration of that grid in that time period, and constructing a multi-period, spatio-temporally consistent PM2.5 concentration training data set.

In a preferred embodiment, step S2 specifically includes:

step S21, preprocessing the pollution source related data; constructing buffer zones of different widths, such as 100, 200, 300, 500, 1000 and 1500 m, around the central point of each grid, reclassifying the land use/cover data, calculating the area ratio of each land use/cover type, the lengths of primary and secondary roads and the number of catering stores within each buffer zone, and constructing a pollution source related factor data set;

step S22, testing the spatial autocorrelation of the PM2.5 concentration distribution; checking whether the PM2.5 concentration distribution is spatially autocorrelated using the global Moran index, and using the geographically weighted regression method if spatial autocorrelation exists;

step S23, screening the pollution source related factors; screening the factors in the pollution source related factor data set using the stepwise method, and selecting the influencing factors highly correlated with PM2.5 concentration; considering the practical meaning of the variables, when variables of the same type with different buffer widths are all significant for PM2.5 concentration, deleting the variables with lower correlation and screening the remaining variables again using the stepwise method, until no two influencing factors of the same type remain, so as to obtain the optimal combination of pollution source related influencing factors;

step S24, constructing the PM2.5 concentration spatial differentiation simulation model; modeling the spatial autocorrelation of PM2.5 concentration using the geographically weighted regression method, incorporating the screened optimal combination of pollution source related influencing factors, and constructing the PM2.5 concentration spatial differentiation simulation model.

In a preferred embodiment, step S3 specifically includes:

step S31, preprocessing the pollution diffusion factors and urban scene factors; obtaining the spatial distribution of wind speed, average temperature and humidity in the study area by empirical kriging interpolation of meteorological station monitoring data, calculating the pollution source wind direction index of each grid from the prevailing urban wind direction and the position of the grid relative to the nearest pollution source, calculating the area ratio of each scene type in each grid through overlay analysis, and constructing a data set of pollution diffusion related factors and scene factors;

step S32, constructing the PM2.5 concentration simulation and scene analysis model; incorporating the pollution diffusion related factors and urban scene factors using the gradient boosting tree method, further fitting the residuals of the PM2.5 concentration spatial differentiation simulation model, and constructing the PM2.5 concentration simulation and scene analysis model.

In a preferred embodiment, step S4 specifically includes: nonlinear quantitative calculation of the influence of urban scenes on PM2.5 concentration; calculating the nonlinear influence of urban scenes on PM2.5 concentration in different time periods based on the PM2.5 concentration simulation and scene analysis model combined with partial dependence plots; and visually displaying the nonlinear influence of urban scenes in different time periods on PM2.5 concentration and analyzing the results in light of actual conditions.

In a preferred embodiment, the numbers of rows and columns in step S12 are calculated according to formulas (1) to (3);

$$N = count_{lng} \times count_{lat} \tag{3}$$

wherein maxlng and minlng are the maximum and minimum longitude coordinates of the study area, maxlat and minlat are the maximum and minimum latitude coordinates of the study area, $count_{lng}$ and $count_{lat}$ are the total numbers of grid cells along the longitude and latitude directions, d is the size of the regular grid cell, and N is the total number of grids;

step S13 specifically comprises: performing time consistency correction on the mobile PM2.5 monitoring data using the hourly PM2.5 concentration variation trend of the fixed atmospheric environment monitoring station, so that the data are corrected to the same time; the correction method for the mobile PM2.5 monitoring data is shown in formula (4);

in the formula, the quantities are, in order: the corrected PM2.5 concentration value of the mobile monitoring data at position i at time $t_3$; the PM2.5 concentration value observed by mobile monitoring at position i at time $t_1$; the PM2.5 concentration value observed at time $t_3$ by the atmospheric environment monitoring station at position l; and the PM2.5 concentration value observed at time $t_2$ by the atmospheric environment monitoring station at position l; $t_1$ is the mobile PM2.5 monitoring time; $t_2$ is the whole hour immediately preceding $t_1$, and $t_3$ is the whole hour immediately following $t_1$;

step S14 specifically comprises: substituting the longitude and latitude of each mobile PM2.5 concentration monitoring point for maxlng and maxlat in formulas (1) and (2) respectively to determine the grid to which each monitored value belongs, calculating the average PM2.5 concentration of each grid in each time period and taking it as the PM2.5 concentration of that grid in that time period, and constructing a multi-period, spatio-temporally consistent PM2.5 concentration training data set.
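The grid division of step S12 and the cell lookup of step S14 can be illustrated with a short sketch. Formulas (1) and (2) are not reproduced in this text, so the rounding rule (rounding up so that the grid covers the whole study area) and the row-major grid code are assumptions, and the function names `grid_counts` and `grid_code` are hypothetical.

```python
import math

def grid_counts(minlng, maxlng, minlat, maxlat, d):
    """Return (count_lng, count_lat, N) for a regular grid of cell size d."""
    # Assumption: round up so that the grid fully covers the study area.
    count_lng = math.ceil((maxlng - minlng) / d)
    count_lat = math.ceil((maxlat - minlat) / d)
    # N = count_lng * count_lat, as in formula (3).
    return count_lng, count_lat, count_lng * count_lat

def grid_code(lng, lat, minlng, minlat, d, count_lng):
    """Assumed row-major code: columns run rightward from the left boundary,
    rows run upward from the lower boundary, both starting at 0."""
    col = int((lng - minlng) // d)
    row = int((lat - minlat) // d)
    return row * count_lng + col

# Illustrative extent and cell size (made-up values, in degrees).
count_lng, count_lat, n_cells = grid_counts(0.0, 1.0, 0.0, 0.5, 0.25)
code = grid_code(0.3, 0.3, 0.0, 0.0, 0.25, count_lng)
```

With these made-up values the grid has 4 columns and 2 rows, and the point (0.3, 0.3) falls in column 1 of row 1.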

In a preferred embodiment, step S21 is specifically: according to research requirements and with reference to high-resolution remote sensing images, the land use/cover data are classified into 9 types: cultivated land, high-density forest areas, low-density forest areas, high-density residential areas, low-density residential areas, water areas, urban green land, dust-raising ground surfaces and other building areas; primary and secondary roads are extracted from the road vector data according to road grade; catering data are crawled from an open data platform, and POI data for three types of catering, namely Chinese restaurants, Western-style restaurants and snack or fast food restaurants, are selected according to the influence of catering source emissions on PM2.5 concentration reported in existing research; buffer zones of 100 m, 200 m, 300 m, 500 m, 1000 m and 1500 m are constructed around the central point of each grid, the area ratio of each land use/cover type, the lengths of primary and secondary roads and the number of catering stores within each buffer zone are calculated, and a pollution source related factor data set is constructed;

step S22 specifically includes: verifying the autocorrelation of the spatial distribution of the concentration of PM2.5 by using a global Moran index, wherein the principle is shown in formulas (5) to (9);

wherein $w_{i,j}$ is the spatial weight between grids i and j, $S_0$ denotes the sum of all spatial weights, $z_i$ and $z_j$ respectively represent the deviations of the PM2.5 concentration values of grids i and j from the global mean PM2.5 concentration, n represents the total number of elements, and I represents the global Moran index; the global Moran index describes the average degree of association between each spatial unit and its surrounding area over the whole region, and its value lies between -1.0 and 1.0; I > 0 indicates that the attribute values of the regions are positively correlated in space, that is, similar attribute values tend to cluster together; I = 0 indicates that the regions are randomly distributed with no spatial correlation; I < 0 indicates that the attribute values of the regions are negatively correlated in space, that is, dissimilar attribute values tend to be located together;

step S23 specifically includes: introducing the land use/cover type area ratios, road lengths and catering store counts of the different buffer zones into the regression model one by one; a candidate variable is introduced when the P value of the largest F value among the candidate variables is less than or equal to 0.05; when a previously introduced variable is no longer significant because of variables introduced later, namely when the P value of the smallest F value is greater than or equal to 0.1, that variable is removed; this process is repeated until no significant variable remains to be entered into the equation and no insignificant independent variable remains to be removed from the regression equation; when variables of the same type with different buffer widths are all significant for PM2.5 concentration, the variables with lower significance are deleted, giving the optimal combination of related factors, which is used to construct the GWR model;

step S24 specifically includes: the influence of the geographic space on the dependent variable is comprehensively considered, and a geographic weighted regression model is established based on the optimal correlation factor combination, wherein the principle of the geographic weighted regression model is shown as a formula (7);

$$y_i = \beta_0(U_i, V_i) + \beta_1(U_i, V_i) X_{i1} + \beta_2(U_i, V_i) X_{i2} + \dots + \beta_p(U_i, V_i) X_{ip} + \varepsilon_i, \quad i = 1, 2, \dots, n \tag{7}$$

wherein $(U_i, V_i)$ is the position of observation point i, $\beta_0(U_i, V_i), \beta_1(U_i, V_i), \dots, \beta_p(U_i, V_i)$ are the regression coefficients at the i-th geospatial location, and $X_{ip}$ is the observed value of the p-th explanatory variable at position i.
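A minimal sketch of how location-specific coefficients of the form in formula (7) can be estimated: at each observation point a weighted least-squares regression is solved, with weights that decay with distance from that point. The Gaussian kernel, the fixed `bandwidth` and the function name `gwr_fit_predict` are assumptions made for illustration; the text does not state which kernel or bandwidth selection the inventors use, and the sample values are synthetic.

```python
import numpy as np

def gwr_fit_predict(coords, X, y, bandwidth):
    """Fit one weighted least-squares regression per location.

    Returns local coefficients (n x (p+1)) and fitted values (n,).
    Assumed: Gaussian kernel with a fixed bandwidth in coordinate units.
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])           # intercept column for beta_0
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)  # Gaussian distance decay
        XtW = Xd.T * w                              # X^T W with W = diag(w)
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    fitted = np.sum(Xd * betas, axis=1)
    return betas, fitted

# Synthetic 3 x 3 grid of cell centres with one explanatory variable.
coords = [(i % 3, i // 3) for i in range(9)]
road_length = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
pm25 = [2.0 + 3.0 * v for v in road_length]
betas, fitted = gwr_fit_predict(coords, road_length, pm25, bandwidth=1.5)
residuals = np.asarray(pm25) - fitted   # what the later GBDT stage would fit
```

Because the synthetic response is exactly linear, every local regression recovers the same intercept and slope; with real data the coefficients would vary from place to place.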

In a preferred embodiment, step S31 is specifically: the PM2.5 pollution diffusion related factors comprise average wind speed, temperature, humidity and the wind direction index, and the influence of wind direction on PM2.5 concentration is expressed by the pollution source wind direction index, as shown in formula (8):

wherein $Wind_{index}$ is the pollution source wind direction index and represents the intensity of the influence of the pollution source, θ represents the Euclidean direction from the road or dust-raising surface nearest to the monitoring point to the PM2.5 concentration monitoring station, and β is the 2-minute average wind direction at that moment at the meteorological station nearest to the monitoring point; the pollution source wind direction index ranges from 0 to 1; when the PM2.5 concentration monitoring station is located downwind of the nearest pollution source or inside the pollution source, the wind direction index is 1; when the PM2.5 concentration monitoring station is located upwind of the nearest pollution source, the wind direction index is 0.

In a preferred embodiment, step S32 specifically includes the following steps:

step S321, extracting M samples from the N data sets;

step S322, calculating the residual error of each sample;

step S323, taking the residuals as training data, and selecting the optimal split node from the m-dimensional features by minimizing the loss function;

step S324, re-dividing the sample according to the optimal dividing node to obtain a new leaf node and update the model;

step S325, iterating steps S322 to S324 until the mean square error is minimized;

the meteorological and scene factors are taken as independent variables and the PM2.5 concentration residuals calculated by the PM2.5 concentration spatial differentiation simulation model are taken as the dependent variable; the hyper-parameters of the GBDT model are solved using a Bayesian optimization algorithm, with the mean of ten-fold cross-validation of the GBDT model as the objective function; the posterior distribution of the objective function is updated by continually adding sample points until the optimal hyper-parameter combination is obtained, thereby constructing the PM2.5 concentration simulation and scene analysis model.
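The residual-fitting loop of steps S322 to S325 can be sketched in a few lines. This is a simplified illustration rather than the inventors' implementation: each tree is a single-split stump, every round uses all samples (the sampling of step S321 is omitted), and the number of rounds and learning rate are fixed by hand instead of being tuned by Bayesian optimization. All function names and the sample values are hypothetical.

```python
def fit_stump(X, r):
    """Best single-split regression stump on residuals r under squared loss."""
    best = None
    n, m = len(X), len(X[0])
    for j in range(m):
        # Candidate thresholds: every distinct value except the largest,
        # so that both sides of the split are non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    _, j, t, lm, rm = best
    return j, t, lm, rm

def stump_predict(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm

def fit_gbdt(X, y, n_rounds=50, lr=0.1):
    base = sum(y) / len(y)                 # initial constant prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the squared loss.
        r = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, r)
        stumps.append(s)
        pred = [p + lr * stump_predict(s, row) for p, row in zip(pred, X)]
    return base, lr, stumps

def gbdt_predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)

# Synthetic example: columns are wind speed and a scene area ratio,
# targets are residuals left by the spatial regression stage.
X = [[1.0, 0.1], [2.0, 0.2], [3.0, 0.8], [4.0, 0.9]]
gwr_residuals = [-1.0, -1.0, 1.0, 1.0]
model = fit_gbdt(X, gwr_residuals)
mse = sum((gbdt_predict(model, row) - t) ** 2
          for row, t in zip(X, gwr_residuals)) / len(X)
```

In the combined model, the final PM2.5 estimate for a grid would be the spatial regression prediction plus the residual predicted by the boosted trees.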

In a preferred embodiment, step S4 is specifically: the nonlinear influence of scene factors on PM2.5 concentration in the PM2.5 concentration simulation and scene analysis model is analyzed using partial dependence plots; with the other variables held constant, the area ratio of a scene factor is varied and the overall influence of that change on the resulting PM2.5 concentration is calculated, the principle being shown in formula (9):

in the formula, $x_S$ is the scene variable and $x_C$ denotes the variables in the PM2.5 concentration simulation and scene analysis model other than $x_S$; the remaining symbols denote, in order, the trained PM2.5 concentration simulation and scene analysis model, the expectation of the trained model over the variables $x_C$, and the PM2.5 concentration change corresponding to different values of $x_S$.
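The partial dependence calculation described here can be sketched as an empirical average: for each candidate value of the scene variable, that value is written into every sample, the model is evaluated, and the predictions are averaged over the remaining variables. The function name `partial_dependence`, the toy model and the sample rows are hypothetical; any trained model exposing a prediction function could be substituted.

```python
def partial_dependence(predict, rows, feature, grid):
    """For each value v in grid, set column `feature` to v in every row and
    average the predictions (empirical expectation over the other variables)."""
    curve = []
    for v in grid:
        total = 0.0
        for row in rows:
            modified = list(row)      # copy so the input rows stay unchanged
            modified[feature] = v
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

# Toy stand-in for a trained model: column 0 is a scene area ratio,
# column 1 is some other variable.
def toy_model(row):
    return 10.0 * row[0] ** 2 + row[1]

rows = [[0.1, 1.0], [0.5, 3.0]]
curve = partial_dependence(toy_model, rows, feature=0, grid=[0.0, 0.5, 1.0])
```

Plotting `curve` against the grid values gives the response curve of the predicted concentration to the scene area ratio, which is the kind of curve the figures for each scene type display.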

Compared with the prior art, the invention has the following beneficial effects:

(1) Based on mobile PM2.5 concentration monitoring data, and combining pollution source related factors with pollution diffusion related factors, the method comprehensively considers the differences in PM2.5 concentration between urban scenes, takes into account both the spatial heterogeneity of PM2.5 concentration and the nonlinear influence of meteorological and urban scene factors on PM2.5 concentration, and improves the spatial resolution of PM2.5 concentration distribution simulation within cities.

(2) The method takes the interpretability of the PM2.5 concentration distribution model into account: combined with partial dependence plots, the degree of influence of urban scenes on PM2.5 concentration in different time periods can be analyzed quantitatively, providing support for fine-grained PM2.5 pollution control in different scenes, for urban planning, and for preventing PM2.5 exposure risk for key groups such as the elderly and children in scenes such as hospitals and schools.

Drawings

FIG. 1 is a flow chart of PM2.5 concentration distribution simulation and scene analysis in accordance with a preferred embodiment of the present invention;

FIG. 2 is a fine simulation distribution diagram of PM2.5 concentration according to a preferred embodiment of the present invention;

FIG. 3 is a response characteristic of PM2.5 concentration to road scene factors in a preferred embodiment of the present invention;

FIG. 4 is a response characteristic of PM2.5 concentration to industrial area scene factors in accordance with a preferred embodiment of the present invention;

FIG. 5 is response characteristics of PM2.5 concentrations to park genre service area scene factors in accordance with a preferred embodiment of the present invention;

FIG. 6 is a response characteristic of PM2.5 concentration to construction site situational factors in accordance with a preferred embodiment of the present invention;

FIG. 7 is a graph showing the response characteristics of PM2.5 concentration to scene factors of educational medical units in accordance with a preferred embodiment of the present invention;

FIG. 8 is a response characteristic of PM2.5 concentration to business circle scenario factors for a preferred embodiment of the present invention;

FIG. 9 is a response characteristic of PM2.5 concentration to residential scene factors in a preferred embodiment of the present invention;

FIG. 10 is a schematic diagram distinguishing the line segments for each time period in the response characteristics of PM2.5 concentration to scene factors according to the preferred embodiment of the present invention.

Detailed Description

The invention is further explained below with reference to the drawings and the embodiments.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present application; as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

Referring to fig. 1 to 10, the urban PM2.5 concentration distribution simulation and scene analysis model based on mobile monitoring data uses mobile PM2.5 concentration monitoring data from different types of urban scenes. First, to address the inconsistent monitoring times of the mobile PM2.5 monitoring data, space-time correction is performed on the mobile data using the PM2.5 monitoring data of the urban atmospheric environment stations. Then, taking pollution source related data such as land use/cover types, traffic networks and catering POIs as independent variables, and exploiting the spatial differentiation characteristics of the PM2.5 concentration distribution, a GWR-based PM2.5 concentration spatial differentiation simulation model is constructed. On this basis, the nonlinear response of PM2.5 concentration to meteorological factors and scene factors is incorporated: wind speed, temperature, humidity, the wind direction index and scene factors are taken as independent variables, the residuals of the results calculated by the PM2.5 concentration spatial differentiation simulation model are taken as the dependent variable, and the GBDT method, which has strong explanatory power for nonlinear relationships, is combined with the GWR simulation results to construct a GWR-GBDT-based PM2.5 simulation and scene analysis model and obtain hourly PM2.5 concentration distribution maps. Finally, the complex nonlinear relationship between scene factors and PM2.5 concentration is analyzed quantitatively using partial dependence plots, and the response characteristics of PM2.5 concentration to scene factors are explored.

The main content of the method comprises:

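step S1, performing space-time correction of the mobile PM2.5 concentration monitoring data based on fixed monitoring stations, and constructing a spatio-temporally consistent PM2.5 concentration training data set;

step S2, analyzing the correlation between pollution source related factors and PM2.5 concentration, and constructing a PM2.5 concentration spatial differentiation simulation model based on the geographically weighted regression method;

step S3, based on the gradient boosting tree method, combining pollution diffusion related factors and scene factors to further fit the residuals of the PM2.5 concentration spatial differentiation simulation model, and constructing a PM2.5 concentration simulation and scene analysis model;

step S4, analyzing the response characteristics of PM2.5 concentration to the scene factors using the partial dependence plot method.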

The specific flow chart is shown in fig. 1, and the specific steps are introduced as follows:

Step S1, space-time correction of mobile PM2.5 concentration monitoring data

Step S11, preprocessing of the mobile PM2.5 monitoring data. Abnormal values and missing values in the mobile PM2.5 concentration monitoring data are processed.

Step S12, grid division of the study area. The sizes of the grid cells in the vertical and horizontal directions are set, the study area is divided into grids rightward and upward starting from its left and lower boundaries respectively, and the grids are coded.

Step S13, time correction of the mobile monitoring data. The mobile-monitored PM2.5 concentration is corrected based on the PM2.5 concentration variation trend of the fixed monitoring station closest to the mobile monitoring position.

Step S14, averaging of the PM2.5 concentration in each grid for each time period. The average of the PM2.5 concentrations monitored in each grid in each time period is calculated and taken as the PM2.5 concentration of that grid in that time period, and a multi-period, spatio-temporally consistent PM2.5 concentration training data set is constructed.
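The per-grid, per-period averaging of step S14 amounts to a grouped mean. A minimal sketch follows, assuming each time-corrected observation has already been tagged with a grid code and a time period label; the record layout and the function name `grid_period_means` are hypothetical, and the sample values are made up.

```python
from collections import defaultdict

def grid_period_means(records):
    """records: iterable of (grid_code, period, pm25) tuples.
    Returns {(grid_code, period): mean PM2.5 concentration}."""
    sums = defaultdict(lambda: [0.0, 0])   # running [total, count] per key
    for code, period, pm25 in records:
        acc = sums[(code, period)]
        acc[0] += pm25
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}

records = [
    (5, "08:00", 30.0),
    (5, "08:00", 34.0),
    (5, "09:00", 40.0),
    (6, "08:00", 20.0),
]
means = grid_period_means(records)
```

Each key of `means` identifies one training sample of the space-time consistent data set: one grid in one time period.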

Step S2, construction of the PM2.5 concentration spatial differentiation simulation model

Step S21, preprocessing of pollution source related data. Buffer zones of different widths, such as 100 m, 200 m, 300 m, 500 m, 1000 m and 1500 m, are constructed around the central point of each grid, the land use/cover data are reclassified, the area ratio of each land use/cover type, the lengths of primary and secondary roads and the number of catering stores within each buffer zone are calculated, and a pollution source related factor data set is constructed.

Step S22, spatial autocorrelation test of the PM2.5 concentration distribution. The global Moran index is used to check whether the PM2.5 concentration distribution is spatially autocorrelated; if spatial autocorrelation exists, the geographically weighted regression method is used.

Step S23, screening of pollution source related factors. The factors in the pollution source related factor data set are screened using the stepwise method, and the influencing factors highly correlated with PM2.5 concentration are selected. Considering the practical meaning of the variables, when variables of the same type with different buffer widths are all significant for PM2.5 concentration, the variables with lower correlation are deleted and the remaining variables are screened again using the stepwise method, until no two influencing factors of the same type remain, so that the optimal combination of pollution source related influencing factors is obtained.

Step S24, construction of the PM2.5 concentration spatial differentiation simulation model. The spatial autocorrelation of PM2.5 concentration is modeled using the geographically weighted regression method, the screened optimal combination of pollution source related influencing factors is incorporated, and the PM2.5 concentration spatial differentiation simulation model is constructed.
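The global Moran index used in step S22 can be computed directly from its standard definition, I = (n / S0) * (sum over i, j of w_ij * z_i * z_j) / (sum over i of z_i squared). The sketch below covers only the statistic itself, not its significance test; the binary neighbour weights in the example are an assumption, since the text does not state which weighting scheme is used, and the function name `global_morans_i` is hypothetical.

```python
def global_morans_i(values, weights):
    """values: list of n attribute values (e.g. grid PM2.5 concentrations);
    weights: n x n nested list of spatial weights w[i][j]."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                     # deviations from the mean
    s0 = sum(weights[i][j] for i in range(n) for j in range(n))
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)

# Four grid cells in a row; adjacent cells are neighbours (assumed weights).
w = [[1 if abs(i - j) == 1 else 0 for j in range(4)] for i in range(4)]
i_gradient = global_morans_i([1.0, 2.0, 3.0, 4.0], w)     # smooth gradient
i_alternating = global_morans_i([1.0, 4.0, 1.0, 4.0], w)  # checkerboard pattern
```

The smooth gradient gives a positive index (similar values sit next to each other), while the alternating pattern gives a negative one.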

Step S3, constructing PM2.5 concentration simulation and scene analysis model

Step S31, preprocessing of pollution diffusion factors and urban scene factors. Based on meteorological station monitoring data, the spatial distributions of wind speed, average temperature and humidity in the study area are obtained by empirical Kriging interpolation; the pollution source wind direction index of each grid is calculated from the prevailing wind direction of the city and the position of the grid relative to the nearest pollution source; the area ratio of each scene type in each grid is calculated by overlay analysis; and a data set of pollution diffusion related factors and scene factors is constructed.

Step S32, construction of the PM2.5 concentration simulation and scene analysis model. The gradient boosting tree method is used to incorporate the pollution diffusion related factors and urban scene factors and to further fit the residuals of the PM2.5 concentration spatial differentiation simulation model, so as to construct the PM2.5 concentration simulation and scene analysis model.

Step S4, response characteristic analysis of PM2.5 concentration to city scene factor

Nonlinear quantitative calculation of the influence of urban scenes on the PM2.5 concentration. Based on the PM2.5 concentration simulation and scene analysis model combined with partial dependence plots, the nonlinear influence of urban scenes on the PM2.5 concentration in different periods is calculated. This influence is displayed visually for the different periods, and the results are analyzed in light of actual conditions.

Among the above steps, the space-time correction of the mobile PM2.5 monitoring data, the construction of the PM2.5 concentration spatial differentiation simulation model, the construction of the PM2.5 concentration simulation and scene analysis model, and the analysis of the response characteristics of the PM2.5 concentration to urban scene factors are the key points of the present invention. These four steps are described in detail in the following subsections.

Space-time correction of mobile PM2.5 concentration monitoring data

Step S11, preprocessing of the mobile PM2.5 monitoring data. Owing to objective constraints, some mobile PM2.5 concentration monitoring records are incomplete and some monitored values are abnormal. Missing values are found by screening for null values, abnormal values are found by drawing line graphs and box plots, and both are deleted to ensure the integrity of the data.
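The screening in step S11 can be sketched as follows. This is a minimal illustration, not the procedure of the invention: the record layout (a dictionary with a "pm25" key) is hypothetical, and the 1.5 × IQR fences are the usual box-plot convention, which the text does not specify.

```python
import statistics

def clean_pm25(records, k=1.5):
    """Drop records with missing PM2.5 values, then drop box-plot outliers.

    records: list of dicts with a hypothetical "pm25" key (None = missing).
    k: whisker multiplier; 1.5 is the common box-plot convention (assumption).
    """
    # Null-value screening: keep only records that carry a PM2.5 reading.
    complete = [r for r in records if r.get("pm25") is not None]
    values = [r["pm25"] for r in complete]
    # Quartiles define the box; points beyond the fences count as abnormal.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in complete if low <= r["pm25"] <= high]
```

In practice the fences would probably be computed per period or per sensor rather than over the whole data set, since PM2.5 levels differ between periods.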

Step S12, gridding of the study area. The sizes of the grid cells in the vertical and horizontal directions are set first. Starting from the left and lower boundaries of the study area, the area is divided into grids rightward and upward, and each grid is coded by its row and column numbers, which are calculated with reference to formulas (1) to (3).

$$N = count_{lng} \times count_{lat} \tag{3}$$

where maxlng and minlng are the maximum and minimum longitude coordinates of the study area, maxlat and minlat are the maximum and minimum latitude coordinates of the study area, count_lng and count_lat are the total row and column counts, d is the regular grid size, and N is the total number of grids.
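The grid coding of step S12 can be sketched as follows. Formulas (1) and (2) are not reproduced in this text, so the sketch assumes that they divide the extent of the study area by the grid size d and round up to whole cells; that rounding rule and the function name grid_shape are assumptions. Only formula (3) is taken directly from the text.

```python
import math

def grid_shape(minlng, maxlng, minlat, maxlat, d):
    """Return (count_lng, count_lat, N) for a regular grid of cell size d.

    Assumption: the extent is rounded up to whole cells; the rounding rule
    of formulas (1) and (2) is not given in the text.
    d must be expressed in the same units as the coordinates.
    """
    count_lng = math.ceil((maxlng - minlng) / d)
    count_lat = math.ceil((maxlat - minlat) / d)
    # Formula (3): total number of grids.
    return count_lng, count_lat, count_lng * count_lat
```

For example, an extent of 1050 by 420 units with d = 100 gives 11 by 5 cells under this assumption, so N = 55.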

Step S13, time correction of the mobile monitoring data. The mobile PM2.5 monitoring data are collected at inconsistent times, and the PM2.5 concentration fluctuates slightly over short periods, so simulating the PM2.5 concentration distribution directly from mobile monitoring data collected at different moments introduces errors. The method adopts the basic assumption of classical mobile measurement of meteorological parameters in existing research, namely that the general trend of PM2.5 concentration change within a certain range is consistent over a short time. The hourly PM2.5 concentration trend of the fixed atmospheric environment monitoring stations is used to correct the mobile PM2.5 monitoring data for time consistency, bringing the data to the same moment. The correction method is shown in formula (4).

In the formula:

the PM2.5 concentration observed at time t₃ by the atmospheric environment monitoring station at position l;

the PM2.5 concentration observed at time t₂ by the atmospheric environment monitoring station at position l; t₁ is the mobile PM2.5 monitoring time; t₂ is the whole hour immediately before t₁, and t₃ is the whole hour immediately after t₁.

Step S14, averaging of the PM2.5 concentration in each grid for each period. The coordinates (lng, lat) of each mobile PM2.5 concentration monitoring point are substituted into formulas (1) and (2) in place of maxlng and maxlat to determine the grid to which the monitored value belongs. The mean of the monitored PM2.5 concentrations in each grid in each period is then calculated and taken as the PM2.5 concentration of that grid in that period, and a multi-period, space-time consistent PM2.5 concentration training data set is constructed.
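The assignment and averaging of step S14 can be sketched as follows. The tuple layout of the input points is hypothetical, and because formulas (1) and (2) are not reproduced in this text, the rounding used to obtain the column and row indices is an assumption.

```python
import math
from collections import defaultdict

def grid_hourly_mean(points, minlng, minlat, d):
    """Average time-corrected PM2.5 readings per (hour, column, row) cell.

    points: iterable of (lng, lat, hour, pm25) tuples (hypothetical layout).
    Indices count cells from the left and lower boundaries, starting at 1;
    this rounding is an assumption, since formulas (1) and (2) are missing.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for lng, lat, hour, pm25 in points:
        # Substitute the point coordinates for maxlng / maxlat.
        col = max(1, math.ceil((lng - minlng) / d))
        row = max(1, math.ceil((lat - minlat) / d))
        cell = sums[(hour, col, row)]
        cell[0] += pm25
        cell[1] += 1
    # Mean of the monitored values in each grid in each period.
    return {key: total / n for key, (total, n) in sums.items()}
```

Grids with no readings in a period do not appear in the result, so they contribute no training sample for that period.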

PM2.5 concentration space differentiation simulation model construction

The PM2.5 concentration distribution is spatially autocorrelated, and the urban PM2.5 concentration distribution can be simulated better by jointly considering this autocorrelation and the relationship between pollution source related factors and the PM2.5 concentration. The invention uses the geographically weighted regression method to construct a PM2.5 concentration spatial differentiation simulation model based on 12 indicators in 3 categories (land use/cover type, road length and number of catering establishments). The specific steps are as follows:

Step S21, preprocessing of pollution source related data. With reference to high-resolution remote sensing images, the land use/cover data are reclassified according to research requirements into 9 types: cultivated land, high-density forest, low-density forest, high-density residential areas, low-density residential areas, water areas, urban green land, dust-raising surfaces and other built-up areas. Primary and secondary roads are extracted from the road vector data according to road grade. The catering data are crawled from the open data platform of Baidu Map, and POI data for three types of catering establishments (Chinese restaurants, western restaurants and snack/fast-food restaurants) are selected in view of the influence of catering emissions on the PM2.5 concentration reported in existing research. Buffers of 100 m, 200 m, 300 m, 500 m, 1000 m and 1500 m are constructed around the center point of each grid; the area ratio of each land use/cover type, the lengths of primary and secondary roads and the number of restaurants within each buffer are calculated; and a pollution source related factor data set is constructed.
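As an illustration of the buffer statistics in step S21, the restaurant count within each buffer can be sketched as follows. The sketch assumes that grid centers and POIs are given as (x, y) pairs in a projected coordinate system with metre units; the text does not name the projection, and the function name is hypothetical. Road lengths and land use/cover area ratios would need a GIS overlay and are not covered here.

```python
import math

# Buffer radii in metres, as listed in step S21.
BUFFER_RADII_M = (100, 200, 300, 500, 1000, 1500)

def restaurant_counts(center, restaurants, radii=BUFFER_RADII_M):
    """Count restaurant POIs within each circular buffer around a grid center.

    center, restaurants: (x, y) pairs in projected metre coordinates
    (an assumption; geographic coordinates would need reprojection first).
    """
    distances = [math.dist(center, p) for p in restaurants]
    return {r: sum(1 for dist in distances if dist <= r) for r in radii}
```

Each grid thus contributes six candidate variables of this type, one per buffer width, from which the screening in step S23 retains at most one.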

Step S22, spatial autocorrelation test of the PM2.5 concentration distribution. PM2.5 concentration distributions are often spatially autocorrelated, and the geographically weighted regression method is appropriate when such autocorrelation exists. The global Moran index is therefore used to test the spatial autocorrelation of the PM2.5 concentration distribution. The principle is shown in formulas (5) and (6).

where w_ij is the spatial weight between grids i and j, S_0 is the aggregate of all spatial weights, z_i and z_j are the deviations of the PM2.5 concentration values of grids i and j from the global mean PM2.5 concentration, n is the total number of elements, and I is the global Moran index. The global Moran index describes the average degree of association between all spatial units and their surrounding areas over the whole region, and its value lies between -1.0 and 1.0. I > 0 means that the attribute values of the regions are positively correlated in space, that is, large (or small) attribute values tend to cluster together; I = 0 means that the regions are randomly distributed with no spatial correlation; I < 0 means that the attribute values of the regions are negatively correlated in space, that is, large (or small) attribute values tend not to cluster together.
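Formulas (5) and (6) are not reproduced in this text. The following sketch uses the standard definition of the global Moran index, I = (n / S_0) · Σ_i Σ_j w_ij z_i z_j / Σ_i z_i², which matches the symbols defined above; the binary adjacency weights in the usage example are an assumption, since the text does not state how the spatial weights are built.

```python
def global_morans_i(values, weights):
    """Global Moran's I for n grid values and an n x n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]           # deviations from the global mean
    s0 = sum(sum(row) for row in weights)    # aggregate of all spatial weights
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den

# Four grids in a row; neighbouring grids get weight 1 (assumed scheme).
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
trend = global_morans_i([1, 2, 3, 4], W)        # similar values adjacent
alternating = global_morans_i([1, 4, 1, 4], W)  # dissimilar values adjacent
```

The smoothly increasing series gives a positive index and the alternating series a negative one, in line with the interpretation of I given above.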

Step S23, screening of pollution source related factors. The invention screens the pollution source related variables by stepwise regression. The basic idea is as follows: the land use/cover type area ratios, road lengths and catering counts for the different buffers are introduced into the regression model one by one, and a candidate variable is introduced when the P value of the largest F value among the candidates is less than or equal to 0.05. When a previously introduced variable is no longer significant after later variables are introduced, that is, when the P value of the smallest F value is greater than or equal to 0.1, it is removed. This process is repeated until no significant variable remains to be entered into the equation and no insignificant independent variable remains to be removed from it. In view of the practical meaning of the variables, when variables of the same type but different buffer widths are all significant for the PM2.5 concentration, the less significant ones are deleted. The resulting optimal combination of related factors is used to construct the GWR model.

Step S24, construction of the PM2.5 concentration spatial differentiation simulation model. Geographically weighted regression (GWR) accounts for geospatial heterogeneity by letting the regression coefficients vary with location, so that the influence of geographic position on the dependent variable is taken into account. A GWR model is established on the optimal combination of related factors. The principle is shown in formula (7).

$$y_i = \beta_0(U_i, V_i) + \beta_1(U_i, V_i)X_{i1} + \beta_2(U_i, V_i)X_{i2} + \dots + \beta_p(U_i, V_i)X_{ip} + \varepsilon_i, \quad i = 1, 2, \dots, n \tag{7}$$
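The local estimation behind formula (7) can be sketched as a weighted least-squares fit at one location. The Gaussian distance-decay kernel and the fixed bandwidth below are assumptions, since the text states neither the kernel nor the bandwidth selection, and the function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def gwr_coefficients(coords, X, y, target, bandwidth):
    """Local coefficients [beta_0, ..., beta_p] at target = (U, V).

    Assumption: Gaussian kernel with a fixed bandwidth; observations close
    to the target receive larger weights than distant ones.
    """
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    rows = [[1.0] + list(x) for x in X]      # prepend the intercept column
    k = len(rows[0])
    n = len(y)
    # Weighted normal equations: (X'WX) beta = X'Wy.
    xtwx = [[sum(w[i] * rows[i][a] * rows[i][b] for i in range(n))
             for b in range(k)] for a in range(k)]
    xtwy = [sum(w[i] * rows[i][a] * y[i] for i in range(n)) for a in range(k)]
    return solve(xtwx, xtwy)
```

Repeating the fit at every grid center gives one coefficient set per location. The difference between the observed concentration and the local prediction is the residual passed on to step S3.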

PM2.5 concentration simulation and scene analysis model construction

Step S31, preprocessing of pollution diffusion factors and urban scene factors. The PM2.5 pollution diffusion related factors comprise average wind speed, temperature, humidity and a wind direction index. Because hourly observations from monitoring stations at or above the county level cannot represent local meteorological conditions, the spatial distributions of average wind speed, temperature and humidity are calculated by empirical Kriging interpolation. Because wind direction affects each area differently, it must be quantified in combination with the relative position of the pollution source and the fitted area. The influence of wind direction on the PM2.5 concentration is therefore expressed by a pollution source wind direction index, as shown in formula (8):

where Wind_index is the pollution source wind direction index, which represents the intensity of the influence of the pollution source; θ is the Euclidean direction from the road or dust-raising surface nearest to the monitoring point to the PM2.5 concentration monitoring station; and β is the 2-minute average wind direction at that moment at the meteorological station nearest to the monitoring point. The pollution source wind direction index ranges from 0 to 1. It is 1 when the PM2.5 concentration monitoring station is downwind of the nearest pollution source or inside the pollution source, and 0 when the PM2.5 concentration monitoring station is upwind of the nearest pollution source.

Urban scenes are divided into types according to land parcel function and geographic conditions, such as patterns of human activity, social function and land surface cover composition. The invention divides urban scenes into 8 types: roads, industrial areas, residential areas, educational and medical units, park and cultural service areas, business districts, construction sites and other areas. The specific definitions are shown in Table 1.

Table 1. Specification of urban scene types

Step S32, construction of the PM2.5 concentration simulation and scene analysis model. Meteorological factors and urban scene factors influence the PM2.5 concentration nonlinearly. The gradient boosting tree (GBDT) method is an improved ensemble model based on decision trees; it is a flexible and efficient machine learning algorithm that can simulate the nonlinear relationship between independent and dependent variables well. Its core is to use the negative gradient of the loss function in the boosting tree algorithm as an approximation of the residual and to minimize the loss function by gradually reducing the residual. The specific steps are as follows: (1) M samples are extracted from the N data sets; (2) the residual of each sample is calculated; (3) the optimal split node is selected from the m-dimensional features by minimizing the loss function, with the residuals used as training data; (4) the samples are re-partitioned according to the optimal split node to obtain new leaf nodes, and the model is updated; (5) steps (2) to (4) are iterated until the mean square error is minimized. In the invention, the meteorological and scene factors are taken as independent variables and the PM2.5 concentration residuals calculated by the PM2.5 concentration spatial differentiation simulation model are taken as the dependent variable. The hyperparameters of the GBDT model are tuned by a Bayesian optimization algorithm: the mean ten-fold cross-validation score of the GBDT is taken as the objective function, the posterior distribution of the objective function is updated by successively adding sample points, and the optimal hyperparameter combination is finally obtained, so that the PM2.5 concentration simulation and scene analysis model is constructed.
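The boosting procedure of step S32 can be sketched as follows. To stay short, the sketch uses single-split regression trees (stumps), squared loss, a fixed number of rounds and a fixed learning rate. These choices and all function and parameter names are assumptions, and the Bayesian hyperparameter optimization is omitted.

```python
import random

def fit_stump(X, r):
    """Best single-split regression stump for targets r under squared loss."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X})[:-1]:
            left = [ri for x, ri in zip(X, r) if x[f] <= t]
            right = [ri for x, ri in zip(X, r) if x[f] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:                      # no valid split: predict the mean
        mean = sum(r) / len(r)
        return lambda x: mean
    _, f, t, lm, rm = best
    return lambda x: lm if x[f] <= t else rm

def fit_gbdt(X, residuals, rounds=50, lr=0.1, sample_size=None, seed=0):
    """Boost stumps on the residuals of the spatial (GWR) model."""
    rng = random.Random(seed)
    n = len(X)
    base = sum(residuals) / n
    pred = [base] * n
    stumps = []
    for _ in range(rounds):
        # Step (1): optionally draw M = sample_size samples from the data.
        idx = range(n) if sample_size is None else rng.sample(range(n), sample_size)
        # Step (2): negative gradient of squared loss = current residual.
        grad = [residuals[i] - pred[i] for i in idx]
        # Steps (3) and (4): fit the best split to the residuals, update model.
        stump = fit_stump([X[i] for i in idx], grad)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, X)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

# Toy data: the GWR residual switches sign where the feature exceeds 2.
model = fit_gbdt([[0], [1], [2], [3], [4], [5]], [-1, -1, -1, 1, 1, 1])
```

Under this reading, the final simulated concentration for a grid would be the GWR prediction plus the residual predicted by the boosted model from the meteorological and scene factors.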

Response characteristic analysis of PM2.5 concentration to urban scene factors

The invention analyzes, by means of partial dependence plots, the nonlinear influence of the scene factors in the PM2.5 concentration simulation and scene analysis model on the PM2.5 concentration: the area ratio of a scene factor is varied while the other variables are held fixed, and the overall influence of the change in the scene factor on the simulated PM2.5 concentration is calculated. The principle is shown in formula (9):

In the formula: x_S is the scene variable; x_C denotes the variables in the PM2.5 concentration simulation and scene analysis model other than x_S; the model term is the trained PM2.5 concentration simulation and scene analysis model; and the result represents the PM2.5 concentration change corresponding to different values of x_S.
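Formula (9) is not reproduced in this text. The sketch below uses the standard partial dependence estimate: the scene variable x_S is forced to each value of a grid in turn, the other variables x_C keep their observed values, and the model predictions are averaged over all samples. The function signature is hypothetical, and the model is any callable that maps one feature row to a prediction.

```python
def partial_dependence(model, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value.

    model: callable mapping one feature row (list) to a predicted value.
    X: sample rows supplying the observed values of the other variables x_C.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[feature] = value     # vary x_S, keep x_C as observed
            total += model(modified)
        curve.append(total / len(X))
    return curve

# Toy additive model: column 0 plays the role of a scene area ratio.
toy_model = lambda row: 2 * row[0] + row[1]
pd_curve = partial_dependence(toy_model, [[0, 1], [0, 3], [0, 5]], 0, [0, 0.5, 1])
```

For the toy additive model the curve is a straight line with slope 2, which is the isolated effect of the first variable. For the trained model the curve may be nonlinear, and plotting it per period gives the line graphs described below.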

By this method, the nonlinear influence of urban scenes on the PM2.5 concentration in different periods is calculated and displayed as line graphs; the characteristic periods in which the PM2.5 concentration responds strongly are extracted; and the reasons for the response of the PM2.5 concentration to urban scene factors are analyzed in light of actual conditions, which can provide suggestions for urban air pollution prevention and control.

The method is based on mobile monitoring data and integrates multi-source influence factors. It jointly considers the spatial heterogeneity of the PM2.5 concentration distribution and the nonlinear influence of pollution diffusion and scene factors on the PM2.5 concentration, and fuses the geographically weighted regression and gradient boosting tree methods to construct the PM2.5 concentration simulation and scene analysis model, which improves the spatial resolution of urban PM2.5 concentration distribution simulation. The invention also provides a method for quantitatively analyzing the nonlinear influence of urban scenes on the PM2.5 concentration. Based on the PM2.5 concentration simulation and scene analysis model combined with the partial dependence plot method, the degree of nonlinear influence of urban scenes on the PM2.5 concentration in different periods is quantified, the periods in which the PM2.5 concentration of each scene responds more strongly are extracted, and the causes are analyzed in light of actual conditions. This can support refined PM2.5 pollution control in different scenes, urban planning, and the prevention of PM2.5 exposure risk for key groups such as the elderly and children in scenes such as hospitals and schools.

For the main urban area of a large city in China, a plurality of mobile sensors were used to collect 10 hours of PM2.5 concentration data continuously while moving. The data were used to test the effectiveness of the method in simulating the spatial distribution of the PM2.5 concentration and in analyzing urban scenes in different periods. The PM2.5 concentration distribution simulated by the method is shown in FIG. 2, with a spatial resolution of 100 m × 100 m and a temporal resolution of 1 h. The results show that the model fitting R² ranges from 0.73 to 0.99 across the periods. The model not only improves the accuracy of the PM2.5 concentration spatial distribution simulation, but is also highly interpretable with respect to the nonlinear relationship between the scene factors and the PM2.5 concentration.

Combined with the partial dependence plot method, the PM2.5 concentration simulation and scene analysis model can quantitatively mine the response characteristics of the PM2.5 concentration to different urban scenes in different periods. The periods in which each scene has an obvious influence on the PM2.5 concentration are extracted by this method and then visualized and analyzed; the results are shown in FIGS. 3-9. The results show that the response of the PM2.5 concentration to scene factors is heterogeneous in time, and that the space-time differences in human activity and the intervention measures applied to PM2.5 influence the PM2.5 concentration distribution to a certain extent. Elderly people should avoid activities in residential communities during commuting peak hours to reduce the risk of additional PM2.5 exposure, and can instead go to parks for leisure activities. In urban planning and management, road grades should be planned reasonably to disperse traffic flow and reduce the local increase in PM2.5 concentration caused by concentrated vehicle emissions, and PM2.5 pollution control at traffic intersections and on main urban roads should be strengthened. Traffic dispersion around schools during arrival and dismissal periods should be further strengthened to reduce the harm of PM2.5 pollution to children in those periods. Small parks and similar scenes can be planned within industrial parks, and measures such as watering for dust suppression can be adopted to reduce the PM2.5 concentration in such scenes.

In the method, the mobile PM2.5 concentration monitoring data are corrected for time consistency on the basis of the monitoring data of fixed atmospheric environment stations, which reduces the errors caused by inconsistent mobile monitoring times. Based on the time-consistent mobile PM2.5 concentration monitoring data, the spatially heterogeneous influence of urban pollution source factors on the PM2.5 concentration distribution and the complex nonlinear relationship of pollution diffusion factors and urban scenes with the PM2.5 concentration are jointly considered, and the geographically weighted regression method and the gradient boosting tree method are fused to construct a PM2.5 concentration simulation and scene analysis model based on mobile monitoring data, which simulates the spatial distribution of the urban PM2.5 concentration well. The method fits the nonlinear relationship among meteorological factors, scene factors and the PM2.5 concentration well, simulates the spatial differences of the PM2.5 concentration within the city effectively and at fine scale, and improves the spatial resolution of the simulation of the PM2.5 concentration distribution within the city. Combined with partial dependence plots, the influence of urban scene factors on the PM2.5 concentration can be analyzed quantitatively, which supports refined PM2.5 pollution control in different scenes, urban planning, and the prevention of PM2.5 exposure risk for high-risk groups such as the elderly and children.

Claims

1. The model for simulating urban PM2.5 concentration distribution and analyzing scenes based on mobile monitoring data is characterized by comprising the following steps:

2. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scene analysis model according to claim 1, wherein step S1 specifically comprises:

step S14, averaging the PM2.5 concentration in each grid in each period; the mean of the monitored PM2.5 concentrations in each grid in each period is calculated and taken as the PM2.5 concentration of that grid in that period, and a multi-period, space-time consistent PM2.5 concentration training data set is constructed.

3. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scene analysis model according to claim 1, wherein step S2 specifically comprises:

step S21, preprocessing the pollution source related data; buffers of different widths, such as 100 m, 200 m, 300 m, 500 m, 1000 m and 1500 m, are constructed around the center point of each grid, the land use/cover data are reclassified, the area ratio of each land use/cover type, the lengths of primary and secondary roads and the number of restaurants within each buffer are calculated, and a pollution source related factor data set is constructed;

step S24, constructing a PM2.5 concentration spatial differentiation simulation model; the spatial autocorrelation of the PM2.5 concentration is modeled by the geographically weighted regression method, the screened optimal combination of pollution source related influence factors is incorporated, and the PM2.5 concentration spatial differentiation simulation model is constructed.

4. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scene analysis model according to claim 1, wherein step S3 specifically comprises:

5. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scene analysis model according to claim 1, wherein step S4 specifically comprises: nonlinear quantitative calculation of the influence of urban scenes on the PM2.5 concentration; calculating the nonlinear influence of urban scenes on the PM2.5 concentration in different periods based on the PM2.5 concentration simulation and scene analysis model combined with partial dependence plots; and visually displaying the nonlinear influence of urban scenes on the PM2.5 concentration in the different periods and analyzing the results in light of actual conditions.

6. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scene analysis model according to claim 2, wherein the row and column numbers in step S12 are calculated with reference to formulas (1) to (3);

$$N = count_{lng} \times count_{lat} \tag{3}$$

wherein maxlng and minlng are the maximum and minimum longitude coordinates of the study area, maxlat and minlat are the maximum and minimum latitude coordinates of the study area, count_lng and count_lat are the total row and column counts, d is the regular grid size, and N is the total number of grids;

step S13 specifically comprises: using the hourly PM2.5 concentration trend of the fixed atmospheric environment monitoring stations to correct the mobile PM2.5 monitoring data for time consistency, so that the mobile PM2.5 monitoring data are corrected to the same moment, the correction method being shown in formula (4);

in the formula:

7. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scenario analysis model of claim 3,

step S21 specifically includes: the land utilization/coverage data is classified into 9 types, namely cultivated land, high-density forest regions, low-density forest regions, high-density residential regions, low-density residential regions, water areas, urban green lands, dust-raising ground surfaces and other building regions according to research requirements by combining high-resolution remote sensing images; extracting a main road and a secondary road from the road vector data according to the road grade; the catering data is crawled from an open data platform, and POI data of three types of catering, namely a Chinese restaurant, a western restaurant and a snack fast food restaurant, are selected according to the influence of the emission of catering sources on PM2.5 concentration in the existing research; respectively constructing buffer areas of 100m, 200 m, 300 m, 500m, 1000 m and 1500m on the basis of the central points of the grids, calculating the area ratio of various types of utilization/coverage types in different buffer areas, the lengths of primary roads and secondary roads and the number of restaurant stores, and constructing a pollution source related factor data set;

wherein w_ij is the spatial weight between grids i and j, S_0 is the aggregate of all spatial weights, z_i and z_j are the deviations of the PM2.5 concentration values of grids i and j from the global mean PM2.5 concentration, n is the total number of elements, and I is the global Moran index; the global Moran index describes the average degree of association between all spatial units and their surrounding areas over the whole region, and its value lies between -1.0 and 1.0; I > 0 means that the attribute values of the regions are positively correlated in space, that is, the closer the attribute values are, the more readily they cluster together; I = 0 means that the regions are randomly distributed with no spatial correlation; I < 0 means that the attribute values of the regions are negatively correlated in space, that is, the more distinct the attribute values are, the more readily they cluster together;

step S23 specifically includes: introducing results of land utilization/coverage type ratio, road length and catering quantity of different buffer areas into a regression model one by one, and introducing relevant variables when the P value of the maximum F value in the candidate variables is less than or equal to 0.05; when the originally introduced variable becomes no longer significant due to the introduction of the following variable, namely the P value of the minimum F value is greater than or equal to 0.1, the originally introduced variable is removed; the process is repeated until no significant variable is selected into the equation, and no insignificant independent variable is removed from the regression equation; when the variables of the same type and different buffer area widths are significant to the PM2.5 concentration, deleting the variables with low significance to obtain the optimal relevant factor combination, and participating in the construction of a GWR model;

$$y_i = \beta_0(U_i, V_i) + \beta_1(U_i, V_i)X_{i1} + \beta_2(U_i, V_i)X_{i2} + \dots + \beta_p(U_i, V_i)X_{ip} + \varepsilon_i, \quad i = 1, 2, \dots, n \tag{7}$$

wherein (U_i, V_i) is the position of observation point i; β_0(U_i, V_i), β_1(U_i, V_i), ..., β_p(U_i, V_i) are the regression coefficients at the i-th geospatial location; and X_ip is the observed value of the p-th variable at position i.

8. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scenario analysis model of claim 4,

step S31 specifically includes: the PM2.5 pollution diffusion related factors comprise average wind speed, temperature, humidity and wind direction index, and the influence of wind direction on the PM2.5 concentration is expressed by adopting the pollutant wind direction index as shown in a formula (8):

wherein Wind_index is the pollution source wind direction index, which represents the intensity of the influence of the pollution source; θ is the Euclidean direction from the road or dust-raising surface nearest to the monitoring point to the PM2.5 concentration monitoring station; and β is the 2-minute average wind direction at that moment at the meteorological station nearest to the monitoring point; the pollution source wind direction index ranges from 0 to 1; it is 1 when the PM2.5 concentration monitoring station is downwind of the nearest pollution source or inside the pollution source, and 0 when the PM2.5 concentration monitoring station is upwind of the nearest pollution source.

9. The model for urban PM2.5 concentration distribution simulation and scene analysis based on mobile monitoring data according to claim 4, wherein step S32 specifically comprises the following steps:

step S321, extracting M samples from the N data sets;

step S322, calculating the residual error of each sample;

step S323, selecting an optimal division node from the m-dimensional features through a minimized loss function, and using a residual error as training data;

step S325, iterating steps S322 to S324 until the mean square error is minimized;

10. The mobile monitoring data-based urban PM2.5 concentration distribution simulation and scenario analysis model of claim 5, wherein:

step S4 specifically includes: analyzing, by means of partial dependence plots, the nonlinear influence of the scene factors in the PM2.5 concentration simulation and scene analysis model on the PM2.5 concentration, wherein the area ratio of a scene factor is varied while the other variables are held fixed, and the overall influence of the change in the scene factor on the simulated PM2.5 concentration is calculated, the principle being shown in formula (9):

in the formula, the model term is the trained PM2.5 concentration simulation and scene analysis model, and the result denotes the PM2.5 concentration change corresponding to different values of x_S.