CN115950832A - Method for inverting near-surface nitrogen dioxide concentration based on satellite data - Google Patents
- Publication number
- CN115950832A (application CN202310008458.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- satellite
- nitrogen dioxide
- dioxide concentration
- ground
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to the field of satellite remote sensing inversion, and in particular to a method for inverting near-surface nitrogen dioxide concentration based on satellite data, comprising the following steps: acquiring spectral radiation value data of satellite Level 1 channels 1 to 6 and preprocessing the data to obtain satellite primary (Level 1) data; acquiring and preprocessing ground-monitored nitrogen dioxide concentration sampling data; acquiring meteorological data from ground stations; matching the satellite primary data, the preprocessed nitrogen dioxide concentration sampling data, and the meteorological data to form a new data set; establishing a random forest model and training it with the new data set; analyzing the random forest model, calculating the regression tree weights, and screening out the channels whose satellite primary data have the larger regression tree weight values; and establishing an inversion model and feeding the screened channel data into it as input to obtain the near-surface nitrogen dioxide concentration. The beneficial technical effect of the invention is that the timeliness and accuracy of near-surface nitrogen dioxide concentration estimation are improved.
Description
Technical Field
The invention relates to the field of satellite remote sensing inversion, in particular to a method for inverting near-ground nitrogen dioxide concentration based on satellite data.
Background
Nitrogen dioxide is an important trace gas in the atmosphere and an important precursor pollutant of atmospheric particulate matter (PM), ozone (O₃), and acid rain. Prolonged exposure to high concentrations of nitrogen dioxide increases the morbidity and mortality of respiratory diseases such as lung cancer and asthma. Monitoring the near-surface nitrogen dioxide concentration is therefore particularly important.
At present, near-surface nitrogen dioxide is monitored mainly with ground instruments. This approach yields accurate real-time concentrations of nitrogen dioxide and other pollutants near the surface or in the troposphere around a station, but only over a small area. Satellite monitoring can be carried out all day in all weather and covers a much wider area, which addresses the small spatial range of ground-station monitoring.
However, the formation mechanism of near-surface nitrogen dioxide is complex, and its influencing factors are numerous and highly variable. Traditional linear models cannot fully explain the complex nonlinear, high-order interactions between nitrogen dioxide and these factors, so other algorithms are needed to express and explain the relationship. Machine learning is currently the most popular approach: it can combine the various factors that affect near-surface nitrogen dioxide concentration into a model that inverts the concentration nonlinearly.
At present, the input data most often used by machine-learning inversion models are satellite Level 2 tropospheric nitrogen dioxide column concentration data, which can generally be downloaded only 1-2 days after the satellite overpass. This input delay propagates to the near-surface nitrogen dioxide concentration output by the inversion model, so the concentration distribution cannot be obtained on the same day, which is a drawback for operational use. A satellite data inversion method that improves both the timeliness and the accuracy of near-surface nitrogen dioxide concentration estimation is therefore needed.
For example, CN110389103A, published on October 29, 2019, discloses an inversion method for the nitrogen dioxide concentration at the bottom layer of the atmosphere. The method comprises: based on an atmospheric radiative transfer model and a satellite sensor, applying a differential absorption spectroscopy algorithm to the solar spectrum and the Earth-observed radiation flux to obtain a differential absorption spectrum; calculating the atmospheric rotational Raman scattering cross section of the atmospheric radiative transfer model; obtaining the Ring-effect differential pseudo-absorption cross section from the solar spectrum and the atmospheric rotational Raman scattering cross section; obtaining the nitrogen dioxide differential absorption cross section from the differential absorption spectrum, and obtaining the whole-layer nitrogen dioxide slant column concentration using the Ring-effect differential pseudo-absorption cross section; and subtracting the top-to-bottom-layer slant column concentration from the whole-layer slant column concentration to obtain the bottom-layer nitrogen dioxide slant column concentration, thereby calculating the bottom-layer atmospheric nitrogen dioxide concentration accurately and quickly.
However, many factors influence the relationship between satellite data and near-surface nitrogen dioxide concentration data, so the relationship is not simply linear. The simple mathematical statistical model in that technical scheme cannot fully capture it, and the scheme does not solve the lack of timeliness in satellite-based inversion of near-surface nitrogen dioxide concentration.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the serious delay and poor accuracy of current near-surface nitrogen dioxide concentration estimation, a method for inverting near-surface nitrogen dioxide concentration based on satellite data is provided, which improves the timeliness of the estimation and further improves the accuracy of the inversion result.
The technical scheme adopted by the invention is as follows: a method for inverting the concentration of near-surface nitrogen dioxide based on satellite data comprises the following steps:
acquiring spectral radiation value data of satellite Level 1 channels 1 to 6, and preprocessing the spectral radiation value data to obtain satellite primary data;
acquiring ground monitored nitrogen dioxide concentration sampling data, and preprocessing the nitrogen dioxide concentration sampling data;
acquiring meteorological data of a ground station;
matching the satellite primary data, the preprocessed nitrogen dioxide concentration sampling data and meteorological data to form a new data set;
establishing a random forest model, and training a regression tree in the random forest model by using the new data set;
analyzing the random forest model formed by the trained regression trees, calculating the regression tree weights, and screening out the channels whose satellite primary data have the larger regression tree weight values;
and establishing an inversion model, taking the spectral radiation value data in the screened channels as input data, and inputting them into the inversion model to obtain the near-surface nitrogen dioxide concentration data.
Preferably, the method of preprocessing the spectral radiation value data comprises:
dividing the spectral radiation value data into a plurality of 3 × 3 pixel areas and calculating the standard deviation σ of each pixel area,
wherein i is the i-th pixel in the current pixel area, $\rho_{0.51}$ is the spectral observation value of the pixel in the 0.51 μm band, and $\bar{\rho}_{0.51}$ is the average of $\rho_{0.51}$ over the 9 pixels in the current pixel area;
judging whether the standard deviation σ is greater than 0.06; if so, the pixel area is determined to be cloud, and if not, the pixel area is determined not to be cloud;
and eliminating the pixel areas judged to be cloud from the spectral radiation value data.
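Under the symbol definitions above, the standard deviation of a 3 × 3 pixel area can be written, for example, in the population form. The 1/9 normalization is an assumption; a sample standard deviation would use 1/8 instead.

```latex
\sigma = \sqrt{\frac{1}{9}\sum_{i=1}^{9}\left(\rho_{0.51,i} - \bar{\rho}_{0.51}\right)^{2}},
\qquad
\bar{\rho}_{0.51} = \frac{1}{9}\sum_{i=1}^{9}\rho_{0.51,i}
```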
Preferably, the method of preprocessing the spectral radiation value data further comprises:
calculating the normalized difference vegetation index NDVI of each pixel in the spectral radiation value data,
$$\mathrm{NDVI} = \frac{\rho_{0.86} - \rho_{0.64}}{\rho_{0.86} + \rho_{0.64}}$$
wherein $\rho_{0.86}$ is the spectral observation value of the pixel in the 0.86 μm band and $\rho_{0.64}$ is the spectral observation value of the pixel in the 0.64 μm band;
judging whether the NDVI is less than 0; if so, the pixel is determined to be a water body, and if not, the pixel is determined not to be a water body;
and eliminating the pixels judged to be water bodies from the spectral radiation value data.
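The water-body screening can be illustrated with a minimal sketch. It assumes the standard NDVI definition from the 0.86 μm and 0.64 μm observations; the function names and the handling of a zero denominator are illustrative choices, not part of the described method.

```python
def ndvi(rho_nir, rho_red):
    """NDVI from the 0.86 um (rho_nir) and 0.64 um (rho_red) observations."""
    total = rho_nir + rho_red
    if total == 0:
        # Undefined ratio; returning None is an assumption of this sketch.
        return None
    return (rho_nir - rho_red) / total


def is_water(rho_nir, rho_red):
    """A pixel is treated as water when its NDVI is below 0."""
    value = ndvi(rho_nir, rho_red)
    return value is not None and value < 0


# Hypothetical (rho_0.86, rho_0.64) pairs for three pixels.
pixels = [(0.40, 0.10), (0.02, 0.05), (0.30, 0.30)]
land_pixels = [p for p in pixels if not is_water(*p)]
```

Here the second pixel has a negative NDVI and is dropped, while a pixel with NDVI exactly 0 is kept, matching the strict "less than 0" test.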
Preferably, the method for preprocessing the nitrogen dioxide concentration sampling data comprises the following steps:
eliminating negative values, zero values, and missing values from the ground-monitored nitrogen dioxide concentration sampling data.
Preferably, after the nitrogen dioxide concentration sampling data are preprocessed, ground stations are retained according to the preprocessing result; a retained ground station has at least 20 hourly average concentration values per day, at least 27 daily average concentration values per month, and at least 324 daily average concentration values per year.
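As a minimal sketch of this cleaning step, the code below drops invalid samples and applies the daily validity check (at least 20 hourly values). Representing missing values as `None` or NaN, and the function names, are assumptions of the sketch; the monthly and yearly checks would follow the same pattern.

```python
import math


def clean_samples(values):
    """Drop negative, zero, and missing NO2 concentration samples."""
    return [v for v in values
            if v is not None and not math.isnan(v) and v > 0]


def day_is_valid(hourly_values, min_hours=20):
    """A day counts only if at least `min_hours` hourly averages survive cleaning."""
    return len(clean_samples(hourly_values)) >= min_hours


# 24 hypothetical hourly values: 20 usable, 4 invalid (zero, negative, None, NaN).
hourly = [35.0] * 19 + [0.0, -1.0, None, float("nan"), 41.0]
cleaned = clean_samples(hourly)
```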
Preferably, the method for matching the satellite primary data, the preprocessed nitrogen dioxide concentration sampling data and the meteorological data comprises the following steps:
calculating the distance $h_i$ from each discrete point in the satellite primary data to the interpolation point,
wherein (x, y) is the longitude and latitude of the ground station, and $(x_i, y_i)$ is the longitude and latitude of the corresponding satellite primary data point;
calculating the distance weight $W_i$,
wherein p is the distance weight parameter and n is the number of discrete points in the satellite primary data;
by inverse distance weighted interpolation, distributing values according to the distance weights $W_i$ to obtain the spatially matched satellite primary data, preprocessed nitrogen dioxide concentration sampling data, and meteorological data.
Preferably, the distance weight parameter p has a value of 2.
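Inverse distance weighting is commonly written with $h_i = \sqrt{(x - x_i)^2 + (y - y_i)^2}$ and $W_i = h_i^{-p} / \sum_{j=1}^{n} h_j^{-p}$. The sketch below assumes that standard form and a planar distance in degrees; the function names and the sample coordinates are hypothetical.

```python
import math


def idw_weights(station, points, p=2):
    """Normalized inverse-distance weights W_i of `points` relative to `station`."""
    x, y = station
    dists = [math.hypot(x - xi, y - yi) for xi, yi in points]
    for i, h in enumerate(dists):
        if h == 0:
            # A point coinciding with the station receives all the weight.
            return [1.0 if j == i else 0.0 for j in range(len(points))]
    inverse = [h ** -p for h in dists]
    total = sum(inverse)
    return [w / total for w in inverse]


def idw_value(station, points, values, p=2):
    """Weighted sum of `values` observed at `points`, evaluated at `station`."""
    weights = idw_weights(station, points, p)
    return sum(w * v for w, v in zip(weights, values))


station = (0.0, 0.0)                      # (longitude, latitude) of a ground station
pixel_centres = [(1.0, 0.0), (0.0, 2.0)]  # satellite data points
radiances = [10.0, 20.0]
estimate = idw_value(station, pixel_centres, radiances)
```

With p = 2, the point at distance 1 receives weight 0.8 and the point at distance 2 receives weight 0.2, so nearer satellite pixels dominate the value matched to the station.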
Preferably, the method for matching the satellite primary data, the preprocessed nitrogen dioxide concentration sampling data and the meteorological data further comprises the following steps:
adding a time weighting coefficient to the satellite primary data by calculating the proportion A of the satellite transit time,
wherein the quantities in the formula are the time at which the satellite passes over the region, the ground monitoring time at which the satellite passes over the region, and the ground monitoring time at the next moment after the satellite passes over the region;
and obtaining, according to the proportion A of the satellite transit time, the temporally matched satellite primary data, preprocessed nitrogen dioxide concentration sampling data, and meteorological data.
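One way to read the proportion A is as the fraction of the interval between two consecutive ground observations that has elapsed at the satellite overpass. The sketch below assumes that linear form and uses it to interpolate the ground value to the overpass time; both the formula and the function names are assumptions and are not taken from the text.

```python
def overpass_fraction(t_sat, t_obs, t_next):
    """Assumed form of A: elapsed share of the interval [t_obs, t_next] at t_sat."""
    return (t_sat - t_obs) / (t_next - t_obs)


def ground_value_at_overpass(t_sat, t_obs, c_obs, t_next, c_next):
    """Linearly weight the two ground observations that bracket the overpass."""
    a = overpass_fraction(t_sat, t_obs, t_next)
    return (1 - a) * c_obs + a * c_next


# Hypothetical case: overpass at 13:30 between hourly readings at 13:00 and 14:00.
value = ground_value_at_overpass(13.5, 13.0, 30.0, 14.0, 40.0)
```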
Preferably, the method for establishing the random forest model comprises the following steps:
drawing, by sampling with replacement, a specified proportion of the new data set containing p characteristic variables, and randomly generating k training sets $\theta_1, \theta_2, \ldots, \theta_k$;
randomly selecting a fixed number n of variables from the p characteristic variables as the branch nodes of the tree to construct a regression tree, wherein n is less than p;
each training set generates a corresponding regression tree $\{H(X, \theta_1)\}, \{H(X, \theta_2)\}, \ldots, \{H(X, \theta_k)\}$, wherein X is the predictor variable, and the k regression trees form the random forest model.
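The two sources of randomness in these steps, bootstrap sampling of the rows and random selection of n out of p variables, can be sketched as follows. Tree growing itself is omitted, and the function names, the `fraction` parameter, and the seed are illustrative assumptions.

```python
import random


def bootstrap_sets(n_samples, k, rng, fraction=1.0):
    """Draw k training sets of row indices by sampling with replacement."""
    size = round(fraction * n_samples)
    return [[rng.randrange(n_samples) for _ in range(size)] for _ in range(k)]


def candidate_features(p, n, rng):
    """Pick n distinct variable indices out of p to consider at a split (n < p)."""
    if not n < p:
        raise ValueError("n must be smaller than p")
    return sorted(rng.sample(range(p), n))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
training_sets = bootstrap_sets(n_samples=100, k=5, rng=rng)
features = candidate_features(p=8, n=3, rng=rng)
```

Each index list in `training_sets` plays the role of one θ; a regression tree fitted on it would be one member H(X, θ) of the forest.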
Preferably, the method for calculating the regression tree weights with the random forest model comprises:
wherein $Imp_i$ is the regression tree weight value of a predictor variable in the random forest model, v is a split node, $S_X$ is the set of nodes split on the predictor variable X in the random forest composed of k regression trees, and Gain(X, v) is the Gini information gain of the predictor variable X at the split node v.
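A common form consistent with these definitions is $Imp = \frac{1}{k}\sum_{v \in S_X} \mathrm{Gain}(X, v)$, that is, the gains at all nodes split on X, summed over the forest and averaged over the k trees. The sketch below assumes that form; the split records, variable names, and the `top_channels` helper are hypothetical.

```python
from collections import defaultdict


def variable_importance(forest_splits, k):
    """Average over k trees of the summed gain at every node split on each variable."""
    totals = defaultdict(float)
    for tree in forest_splits:
        for variable, gain in tree:
            totals[variable] += gain
    return {name: total / k for name, total in totals.items()}


def top_channels(importance, count):
    """Names of the `count` variables with the largest importance."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:count]


# Each tree is a list of (split variable, gain at that node) records.
splits = [
    [("channel_4", 0.5), ("channel_1", 0.25)],
    [("channel_4", 0.25), ("wind_speed", 0.125)],
]
importance = variable_importance(splits, k=len(splits))
selected = top_channels(importance, 2)
```

Ranking the variables this way is what allows the channels with the larger weight values to be screened out as inputs to the inversion model.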
The beneficial technical effects of the invention include the following. By inverting the near-surface nitrogen dioxide concentration from satellite data and exploiting the short acquisition delay of satellite Level 1 data, near-real-time satellite data are used for the inversion; accuracy is maintained while the timeliness of the inversion result is greatly improved, so the result can be displayed on the same day, which better suits practical platform applications. A random forest model is established to screen out the channels whose satellite primary data correlate strongly with the ground-monitored nitrogen dioxide concentration, and the spectral radiation value data in those channels are used as input to invert the near-surface nitrogen dioxide concentration, which further improves the accuracy of the inversion result. With an improved geographically weighted regression method based on the satellite transit orbit time, the distance weight $W_i$ and the proportion A of the satellite transit time are calculated, the satellite primary data, the preprocessed nitrogen dioxide concentration sampling data, and the meteorological data are matched into a new data set, and the random forest model is trained on this data set, which greatly reduces the error that pollutant gas diffusion introduces into the results.
Other features and advantages of the present invention will be disclosed in more detail in the following detailed description of the invention and the accompanying drawings.
Drawings
The invention is further described below with reference to the accompanying drawings:
Fig. 1 is a flowchart of a method for inverting the near-surface nitrogen dioxide concentration based on satellite data according to an embodiment of the invention.
Detailed Description
The technical solutions of the embodiments of the present invention are explained and illustrated below with reference to the drawings, but the following embodiments are only preferred embodiments of the present invention, not all of them. Other embodiments obtained by persons skilled in the art from these embodiments without creative effort fall within the protection scope of the invention.
In the following description, terms indicating orientation or positional relationship, such as "inner", "outer", "upper", "lower", "left", and "right", are used only for convenience and simplicity in describing the embodiments. They do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and they are not to be construed as limiting the present invention.
Before explaining the technical solution of the present embodiment in detail, a background applied to the present embodiment will be described first.
Nitrogen dioxide is an important trace gas in the atmosphere and an important precursor pollutant of atmospheric particulate matter (PM), ozone (O₃), and acid rain. Prolonged exposure to high concentrations of nitrogen dioxide increases the morbidity and mortality of respiratory diseases such as lung cancer and asthma. Atmospheric nitrogen dioxide comes mainly from anthropogenic emissions, especially the consumption of fossil fuels in industrial and economic activities. In recent years, with rapid economic development, energy consumption has risen sharply as industrial production and the number of automobiles have grown, and nitrogen dioxide pollution has become an environmental problem of increasing concern to the public and government departments. At present, nitrogen dioxide monitoring relies mainly on ground stations. Although the number of ground air quality monitoring stations is considerable, station spacing varies widely (1-300 km), the stations in some cities are relatively sparse, and the spatial representativeness of some stations is limited. Constrained by station construction standards, human resources, and other factors, a high-density, full-coverage ground air quality monitoring network cannot be built in the short term, and the current sparse ground monitoring cannot directly provide continuous spatiotemporal distribution information on nitrogen dioxide concentration over a large area.
By contrast, satellite remote sensing is not constrained by the siting of ground monitoring stations. It offers wide coverage, high spatiotemporal resolution, and spatial coverage that ground stations cannot match, and thus provides a reliable technical means for studying the spatiotemporal distribution of atmospheric nitrogen dioxide over large areas. Using the inversion products of satellite sensors such as OMI, many researchers in China and abroad have examined the long-term spatiotemporal distribution and trends of the tropospheric nitrogen dioxide column concentration at regional, national, and global scales. However, the near-surface nitrogen dioxide concentration is more closely related to anthropogenic emissions and human health than the tropospheric column concentration is. It differs to some extent from the tropospheric nitrogen dioxide column concentration monitored by satellite, and satellite monitoring is easily affected by adverse weather such as cloud and fog and by sensor faults, which cause missing or erroneous data. "Near surface" is defined as the height above the ground of the sampling port or monitoring beam of an ambient air quality monitoring station; according to the "Ambient Air Quality Monitoring Specification (Trial)", this height is usually 3-15 m.
To obtain the spatiotemporal distribution of near-surface nitrogen dioxide concentration more reliably, researchers in China and abroad have developed a series of satellite-based estimation models, mainly physical mechanism models and empirical statistical models. A physical mechanism model estimates the near-surface nitrogen dioxide concentration by coupling an atmospheric physical and chemical transport model with the satellite-observed nitrogen dioxide column concentration, which to some extent overcomes the limited spatial representativeness, high cost, and low efficiency inherent in ground monitoring. However, such models are structurally complex; their estimation accuracy and temporal resolution are easily affected by the available computing resources, the pollutant emission inventory, and the settings of key parameters of the atmospheric physical and chemical reaction processes, so they remain limited in simulating the actual spatiotemporal distribution of air pollution concentrations. An empirical statistical model is based on the correlation between the satellite nitrogen dioxide column concentration and the near-surface nitrogen dioxide concentration, and performs inversion modeling by incorporating meteorological and other auxiliary factors. Because its data sources are broad and easy to obtain and its modeling methods are flexible and widely applicable, the empirical statistical model has become a common means of simulating and revealing the spatiotemporal distribution of air pollution at the urban scale.
Compared with predicting atmospheric pollutants at a single point along the time dimension, using an empirical statistical model under sparse ground monitoring, combined with satellite observations and the geographic drivers of air pollution, to simulate and estimate the spatial distribution of pollutant concentrations can to some extent address the lack of spatially extended information caused by sparse ground monitoring data.
However, the formation mechanism of near-surface nitrogen dioxide is complex, and its influencing factors are numerous and highly variable. Traditional linear models cannot fully explain the complex nonlinear, high-order interactions between nitrogen dioxide and these factors, so other algorithms are needed to express and explain the relationship. Machine learning is currently the most popular approach: it can combine the various factors that affect near-surface nitrogen dioxide concentration into a model that inverts the concentration nonlinearly. The random forest model in machine learning has very strong fitting capability and a complex model structure, can capture nonlinear and nonparametric relationships between nitrogen dioxide concentration and its influencing factors, trains quickly, and handles large data sets efficiently. Moreover, unlike other "black box" machine learning algorithms, the random forest model can evaluate the importance of predictor variables, which makes it highly interpretable.
At present, the input data most often used by machine-learning inversion models are satellite Level 2 tropospheric nitrogen dioxide column concentration data, which can generally be downloaded only 1-2 days after the satellite overpass. This input delay propagates to the near-surface nitrogen dioxide concentration output by the inversion model, so the concentration distribution cannot be obtained on the same day, which is a drawback for operational use. A satellite data inversion method that improves both the timeliness and the accuracy of near-surface nitrogen dioxide concentration estimation is therefore needed.
Therefore, the embodiment of the present application provides a method for inverting the near-surface nitrogen dioxide concentration based on satellite data, please refer to fig. 1, which includes the following steps:
and A01) acquiring spectral radiation value data of the satellite Level1 channels 1 to 6, and preprocessing the spectral radiation value data to obtain satellite primary data.
The real-time data in the satellite Level1 Level data can be generally obtained four hours after the satellite passes, and the corresponding result can be issued after the satellite Level2 Level tropospheric nitrogen dioxide column concentration data is generally 1-2 days later, so that an obvious time difference exists between the two results. Therefore, by means of the characteristic of short time delay of satellite Level1 Level data acquisition, near-real-time satellite data are obtained to invert near-ground nitrogen dioxide concentration, accuracy is guaranteed, timeliness of an inversion result can be greatly improved, the result is displayed on the same day, and the method is more suitable for practical platform application.
Optionally, this embodiment downloads and acquires spectral radiation value data of Sentinel-5P satellite Level1 channels 1 to 6 carrying a tropimi sensor.
Wherein, sentinel-5P is a global atmospheric pollution monitoring satellite emitted by the European Space Agency (ESA) in 2017, 10 and 13. The satellite carries the tropoMI (Tropospheric Monitoring Instrument), and can effectively observe trace gas components including NO in the atmosphere of all parts of the world 2 、O 3 、SO 2 。HCHO、CH 4 And important indexes such as CO and the like closely related to human activities strengthen the observation of aerosol and cloud. The tropimi sensor is capable of processing satellite Level1 geo-location and radiation corrected spectral radiance value data of all spectral bands at the top of the atmosphere.
Channels 1 to 6 are selected for two reasons. First, when the total nitrogen dioxide column concentration is inverted from TROPOMI data with the DOAS algorithm, data in the ultraviolet and visible bands (medium wave and long wave) at 0.3-0.57 μm are used. Second, the bands of these channels include those used to detect cloud and water pixels. Cloud and water pixels also affect the near-surface nitrogen dioxide concentration inverted from satellite data, so these pixels need to be removed, which is the purpose of preprocessing the spectral radiation value data.
Further, the spectral radiation value data is divided into a plurality of 3 × 3 pixel areas, and the standard deviation σ of the 0.51 μm spectral observation values within each pixel area is calculated,
where i denotes the i-th pixel in the current pixel area, $\rho_{0.51}$ is the spectral observation value of the pixel in the 0.51 μm band, and $\bar{\rho}_{0.51}$ is the mean of $\rho_{0.51}$ over the 9 pixels in the current pixel area;
judging whether the standard deviation σ is greater than 0.06; if so, the pixel area is determined to be cloud, and if not, the pixel area is determined not to be cloud;
and eliminating the pixel areas determined to be cloud from the spectral radiation value data.
Comparison with cloud images acquired at the same time shows that the threshold of 0.06 accurately identifies cloud pixels, so cloud pixels can be removed using this threshold.
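As an illustration of the cloud screening described above, the following is a minimal Python sketch, not the applicant's implementation. It assumes the 0.51 μm observations are held in a 2-D list whose height and width are multiples of 3, and it assumes the population form of the standard deviation (division by 9), which the text does not confirm. The function name `cloud_mask_3x3` and the toy values are hypothetical.

```python
import math

def cloud_mask_3x3(rho, threshold=0.06):
    """Flag every pixel of each 3x3 area whose standard deviation exceeds threshold.

    rho: 2-D list of 0.51 um spectral observation values; height and width are
    assumed to be multiples of 3 to keep the sketch short.
    Returns a 2-D list of booleans, True where the pixel lies in a cloud area.
    """
    rows, cols = len(rho), len(rho[0])
    mask = [[False] * cols for _ in range(rows)]
    for r0 in range(0, rows, 3):
        for c0 in range(0, cols, 3):
            block = [rho[r][c] for r in range(r0, r0 + 3) for c in range(c0, c0 + 3)]
            mean = sum(block) / 9.0
            # Population standard deviation (divide by 9): an assumed form.
            sigma = math.sqrt(sum((v - mean) ** 2 for v in block) / 9.0)
            if sigma > threshold:
                for r in range(r0, r0 + 3):
                    for c in range(c0, c0 + 3):
                        mask[r][c] = True
    return mask

# Toy scene: a nearly uniform clear area next to a bright, textured area.
clear = [[0.10, 0.11, 0.10]] * 3
cloudy = [[0.20, 0.60, 0.30], [0.70, 0.25, 0.65], [0.35, 0.80, 0.20]]
scene = [clear[i] + cloudy[i] for i in range(3)]
mask = cloud_mask_3x3(scene)
```

In this toy scene the left area stays unflagged and the right area is flagged, because only the textured area has a standard deviation above 0.06.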
Further, the method of preprocessing the spectral radiation value data further comprises:
calculating the normalized difference vegetation index NDVI of each pixel in the spectral radiation value data,
$$\mathrm{NDVI} = \frac{\rho_{0.86} - \rho_{0.64}}{\rho_{0.86} + \rho_{0.64}}$$
where $\rho_{0.86}$ is the spectral observation value of the pixel in the 0.86 μm band, and $\rho_{0.64}$ is the spectral observation value of the pixel in the 0.64 μm band;
judging whether the normalized difference vegetation index NDVI is less than 0; if so, the pixel is determined to be water, and if not, the pixel is determined not to be water;
and eliminating the pixels determined to be water from the spectral radiation value data.
The green band (0.52-0.66 μm) reflects many characteristics of water bodies well. Within the red band (0.63-0.69 μm), the range around 0.65-0.7 μm is well suited to monitoring chlorophyll: most phytoplankton show a distinct second absorption band there, so the absorption and reflection characteristics of water must be considered when remotely sensing water properties. The near-infrared band (0.76-0.9 μm) is used to measure biomass and crop growth and to delineate water bodies, and it plays an irreplaceable role in distinguishing water boundaries, mapping crop distribution, growth and classification, estimating crop yield, and monitoring pests and diseases.
The normalized difference vegetation index (NDVI) quantifies vegetation by measuring the difference between near-infrared light, which vegetation strongly reflects, and red light, which vegetation absorbs. Healthy vegetation (chlorophyll) reflects more near-infrared and green light than other wavelengths, but absorbs more red and blue light. The NDVI formula yields a value between -1 and +1: a negative value indicates water, and a value close to +1 indicates dense green foliage.
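The water screening rule can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the sample reflectance values are hypothetical, and returning `None` for a zero denominator is an assumed convention that the text does not specify.

```python
def ndvi(rho_nir, rho_red):
    """NDVI from the 0.86 um (near-infrared) and 0.64 um (red) observations."""
    total = rho_nir + rho_red
    if total == 0:
        return None  # NDVI is undefined here; assumed convention
    return (rho_nir - rho_red) / total

def is_water(rho_nir, rho_red):
    """A pixel is treated as water when its NDVI is negative."""
    value = ndvi(rho_nir, rho_red)
    return value is not None and value < 0

# Hypothetical sample pixels: (0.86 um value, 0.64 um value).
pixels = {"vegetation": (0.45, 0.05), "water": (0.02, 0.04)}
flags = {name: is_water(nir, red) for name, (nir, red) in pixels.items()}
```

Pixels flagged `True` would be removed from the spectral radiation value data before matching.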
Step A02) acquiring the nitrogen dioxide concentration sampling data monitored on the ground, and preprocessing the nitrogen dioxide concentration sampling data.
The ground-monitored hourly nitrogen dioxide concentration data come from the National Urban Air Quality Real-time Publishing Platform of the China National Environmental Monitoring Centre.
Further, the method for preprocessing the nitrogen dioxide concentration sampling data comprises: eliminating negative values, zero values and missing values from the ground-monitored nitrogen dioxide concentration sampling data.
Further, after the nitrogen dioxide concentration sampling data is preprocessed, ground stations are retained according to the preprocessing result, where each retained ground station has at least 20 hourly average concentration values every day, at least 27 daily average concentration values every month, and at least 324 daily average concentration values every year.
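The ground data screening can be illustrated with the Python sketch below. It rests on one reading of the retention rule, namely that a daily average counts only when at least 20 valid hourly values remain, and that a station is kept when every month has at least 27 valid daily averages and the year has at least 324. The function names and the sample values are hypothetical.

```python
def clean_hourly(values):
    """Drop missing (None), zero, and negative hourly concentrations."""
    return [v for v in values if v is not None and v > 0]

def daily_mean(hourly, min_hours=20):
    """Daily average, or None when fewer than min_hours valid values remain."""
    valid = clean_hourly(hourly)
    if len(valid) < min_hours:
        return None
    return sum(valid) / len(valid)

def keep_station(valid_days_per_month, min_days_month=27, min_days_year=324):
    """valid_days_per_month: 12 counts of valid daily averages, one per month."""
    return (all(d >= min_days_month for d in valid_days_per_month)
            and sum(valid_days_per_month) >= min_days_year)

# Hypothetical day with 21 valid hours plus one missing, one zero, one negative.
good_day = [30.0] * 21 + [None, 0.0, -5.0]
# Hypothetical day with only 19 valid hours.
sparse_day = [30.0] * 19 + [None] * 5
```

Note that 27 days in each of 12 months gives exactly 324 days, so the monthly and yearly thresholds are consistent under this reading.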
Step A03) acquiring meteorological data of the ground station.
Optionally, the meteorological data may include air temperature, wind direction, wind speed, humidity and air pressure data, and is collected from the China Meteorological Science Data Sharing Service Network.
Step A04) matching the satellite primary data, the preprocessed nitrogen dioxide concentration sampling data and the meteorological data into a new data set.
Further, the distance $h_i$ from each discrete point in the satellite primary data to the interpolation point is calculated,
$$h_i = \sqrt{(x - x_i)^2 + (y - y_i)^2}$$
where $(x, y)$ is the longitude and latitude of the ground station, and $(x_i, y_i)$ is the longitude and latitude of the corresponding satellite primary data; optionally, in this embodiment, satellite primary data within 0.05° of longitude and latitude around a ground station is selected, according to the spatial resolution of the TROPOMI satellite sensor;
calculating the distance weight $W_i$,
$$W_i = \frac{h_i^{-p}}{\sum_{j=1}^{n} h_j^{-p}}$$
where p is the distance weight parameter, and n is the number of discrete points in the satellite primary data;
according to inverse distance weighted interpolation, the values are distributed according to the distance weight $W_i$ to obtain a data result in which the satellite primary data, the preprocessed nitrogen dioxide concentration sampling data and the meteorological data are matched in space. Illustratively, for point meteorological data, inverse distance weighted interpolation is used to obtain raster data of the spatial distribution of each meteorological element over the national territory, and each meteorological element at each time is then extracted at the corresponding ground monitoring station, which gives the spatially matched meteorological data.
Optionally, in this embodiment, the value of the distance weight parameter p is selected to be 2.
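The spatial matching step can be sketched as follows. This Python sketch assumes planar distances in degrees and a square 0.05° selection window around the station, and it returns a sample's own value when the sample coincides with the station. The function name `idw_value` and the sample coordinates are hypothetical.

```python
import math

def idw_value(station, samples, p=2, radius=0.05):
    """Inverse distance weighted value at a ground station.

    station: (lon, lat) of the station.
    samples: list of (lon, lat, value) satellite pixels.
    Only samples within +/- radius degrees in both longitude and latitude
    are used. Returns None when no sample falls inside the window.
    """
    x, y = station
    near = []
    for xi, yi, value in samples:
        if abs(xi - x) <= radius and abs(yi - y) <= radius:
            h = math.hypot(x - xi, y - yi)  # distance h_i
            if h == 0:
                return value  # sample coincides with the station
            near.append((h, value))
    if not near:
        return None
    inverse = [h ** -p for h, _ in near]
    total = sum(inverse)
    # W_i = h_i^-p / sum_j h_j^-p, applied to each sample value.
    return sum(w / total * v for w, (_, v) in zip(inverse, near))

# Hypothetical station and pixels; the third pixel lies outside the window.
station = (116.40, 39.90)
samples = [(116.41, 39.90, 10.0), (116.40, 39.92, 20.0), (116.60, 39.90, 99.0)]
matched = idw_value(station, samples)
```

With p = 2, the pixel at distance 0.01° receives four times the weight of the pixel at 0.02°, so the weights are 0.8 and 0.2 and the matched value is about 12.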
Further, a time weight coefficient is added to the satellite primary data by calculating the proportion A of the satellite transit time,
where the quantities involved are the time at which the satellite transits the area, the ground monitoring time at the moment the satellite transits the area, and the ground monitoring time at the next moment after the satellite transits the area;
and the values are distributed according to the proportion A of the satellite transit time to obtain a data result in which the satellite primary data, the preprocessed nitrogen dioxide concentration sampling data and the meteorological data are matched in time.
For matching in time, the transit times of the satellite data are continuous, but the concentration data of the ground monitoring stations are discrete, and the near-surface nitrogen dioxide concentration varies strongly with time, so a time weight coefficient needs to be added to the satellite primary data during time matching. When the satellite scanning detector transits the China region, 3-4 orbits cover the region each day, and each orbit scans different areas at different times. Based on this characteristic, an improved geographically weighted regression method using satellite transit orbit time is applied: according to the time at which each orbit transits each area, the distance weight $W_i$ and the proportion A of the satellite transit time are calculated, the satellite primary data, the preprocessed nitrogen dioxide concentration sampling data and the meteorological data are matched into a new data set, and a random forest model is trained with the new data set. This greatly reduces the error that diffusion of the polluting gas introduces into the data result.
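The exact expression for the proportion A is not reproduced in this text, so the Python sketch below rests on an assumption: A is taken to be the fraction of the interval between two consecutive ground monitoring times that has elapsed at the satellite transit time, and it is then used to linearly weight the two ground observations. Both the formula and the function names are hypothetical illustrations, not the applicant's definition.

```python
from datetime import datetime

def transit_proportion(t_transit, t_ground, t_ground_next):
    """Assumed form: A = (t_transit - t_ground) / (t_ground_next - t_ground)."""
    span = (t_ground_next - t_ground).total_seconds()
    if span <= 0:
        raise ValueError("t_ground_next must be later than t_ground")
    return (t_transit - t_ground).total_seconds() / span

def ground_value_at_transit(a, c_ground, c_ground_next):
    """Weight the two ground observations by the proportion A (assumed usage)."""
    return (1.0 - a) * c_ground + a * c_ground_next

# Hypothetical hourly ground records at 13:00 and 14:00, transit at 13:15.
t1 = datetime(2023, 1, 4, 13, 0)
t2 = datetime(2023, 1, 4, 14, 0)
ts = datetime(2023, 1, 4, 13, 15)
a = transit_proportion(ts, t1, t2)
c = ground_value_at_transit(a, 40.0, 48.0)
```

Under this assumption a transit 15 minutes into the hour gives A = 0.25, so the earlier ground record carries three quarters of the weight.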
Step A05), establishing a random forest model, and training a regression tree in the random forest model by using the new data set.
Further, the method for establishing the random forest model comprises the following steps:
drawing samples with replacement, at a specified proportion, from the new data set containing p characteristic variables, to randomly generate k training sets $\theta_1, \theta_2, \ldots, \theta_k$;
randomly selecting a fixed number n of variables from the p characteristic variables as candidate split nodes of a tree to construct a regression tree, where n < p;
each training set generates a corresponding regression tree $\{H(X, \theta_1)\}, \{H(X, \theta_2)\}, \ldots, \{H(X, \theta_k)\}$, where X is the predictor variable, and the k regression trees form the random forest model.
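The two sampling steps that make the forest random can be sketched in plain Python as below. The sketch covers only the sampling with replacement and the random choice of n out of p variables; tree growing itself is omitted. The function names, the seed handling and the toy data set are hypothetical.

```python
import random

def bootstrap_sets(dataset, k, proportion=1.0, seed=0):
    """Draw k training sets from dataset by sampling rows with replacement."""
    rng = random.Random(seed)
    size = max(1, int(len(dataset) * proportion))
    return [[rng.choice(dataset) for _ in range(size)] for _ in range(k)]

def candidate_features(p, n, rng):
    """Pick n distinct feature indices out of p as split candidates (n < p)."""
    if not 0 < n < p:
        raise ValueError("n must satisfy 0 < n < p")
    return sorted(rng.sample(range(p), n))

# Toy data set: 10 rows, each with p = 6 features plus a target value.
rows = [tuple(float(i + j) for j in range(6)) + (float(i),) for i in range(10)]
training_sets = bootstrap_sets(rows, k=5, proportion=0.8)
features = candidate_features(p=6, n=3, rng=random.Random(1))
```

In practice a library implementation would normally be used. For example, scikit-learn's `RandomForestRegressor` exposes `n_estimators` for k and `max_features` for n, although it draws the feature subset at every split rather than once per tree.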
Step A06) analyzing the random forest model formed by the trained regression trees, calculating the regression tree weights, and screening out the channels of the satellite primary data with the larger regression tree weight values.
Further, the method for calculating the weight of the regression tree by using the random forest model comprises the following steps:
where $Imp_i$ is the regression tree weight value of the predictor variable in the random forest model, v is a split node, $S_X$ is the set of nodes split on the predictor variable X in the random forest composed of k regression trees, and Gain(X, v) is the Gini information gain of the predictor variable X at the split node v.
The tropospheric nitrogen dioxide column concentration data correlates with the satellite primary spectral radiation value data, and the radiance data of different channels in the satellite primary data correlates to differing degrees with the near-surface nitrogen dioxide concentration data. The regression tree weights are therefore calculated through the variable importance evaluation of the random forest model to screen out the channels whose satellite primary data correlates strongly with the ground-monitored nitrogen dioxide concentration. The spectral radiation value data in the screened channels is then used as input data to invert the near-surface nitrogen dioxide concentration, which further improves the accuracy of the inversion result.
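The channel screening can be illustrated by aggregating split gains per variable. The weight formula itself is not reproduced in this text, so the Python sketch below assumes a common form in which the gains of all nodes split on a variable are summed over the forest and divided by the number of trees k. The channel names, gain values and function names are hypothetical.

```python
from collections import defaultdict

def variable_importance(splits, k):
    """Assumed form: Imp(X) = (1 / k) * sum of Gain(X, v) over nodes v split on X.

    splits: list of (variable_name, gain) pairs collected from all k trees.
    """
    totals = defaultdict(float)
    for variable, gain in splits:
        totals[variable] += gain
    return {name: total / k for name, total in totals.items()}

def top_channels(importance, count):
    """Return the names of the count variables with the largest weights."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:count]]

# Hypothetical split records gathered from a forest of k = 2 trees.
splits = [
    ("channel_4", 0.5), ("channel_1", 0.25), ("channel_4", 0.25),
    ("channel_6", 0.125), ("channel_1", 0.125),
]
importance = variable_importance(splits, k=2)
selected = top_channels(importance, 2)
```

The spectral radiation value data of the selected channels would then serve as the input of the inversion model in the following step.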
Step A07) establishing an inversion model, taking the spectral radiation value data in the screened channels as input data, and inputting the input data into the inversion model to obtain the near-surface nitrogen dioxide concentration data.
The inversion model is similar to the operation of inverting the total nitrogen dioxide column concentration with the DOAS algorithm in the related art, and its details are not described in this embodiment of the application.
While the invention has been described with reference to specific embodiments thereof, it will be understood by those skilled in the art that the invention is not limited thereto, and may be embodied in many different forms without departing from the spirit and scope of the invention as set forth in the following claims. Any modification which does not depart from the functional and structural principles of the present invention is intended to be included within the scope of the claims.
Claims (10)
1. A method for inverting the concentration of near-surface nitrogen dioxide based on satellite data is characterized by comprising the following steps:
acquiring spectral radiation value data of satellite Level 1 channels 1 to 6, and preprocessing the spectral radiation value data to obtain satellite primary data;
acquiring ground monitored nitrogen dioxide concentration sampling data, and preprocessing the nitrogen dioxide concentration sampling data;
acquiring meteorological data of a ground station;
matching the satellite primary data, the preprocessed nitrogen dioxide concentration sampling data and meteorological data to form a new data set;
establishing a random forest model, and training a regression tree in the random forest model by using the new data set;
resolving a random forest model formed by the trained regression tree, calculating the weight of the regression tree, and screening out a channel where the satellite primary data with the larger weight value of the regression tree is located;
and establishing an inversion model, taking the spectral radiation value data in the screened channel as input data, and inputting the input data into the inversion model to obtain the concentration data of the near-surface nitrogen dioxide.
2. The method for inverting the near-surface nitrogen dioxide concentration based on satellite data of claim 1,
the method for preprocessing the spectral radiance value data comprises the following steps:
dividing the spectral radiation value data into a plurality of 3 × 3 pixel areas, and calculating the standard deviation σ of the 0.51 μm spectral observation values within each pixel area,
where i denotes the i-th pixel in the current pixel area, $\rho_{0.51}$ is the spectral observation value of the pixel in the 0.51 μm band, and $\bar{\rho}_{0.51}$ is the mean of $\rho_{0.51}$ over the 9 pixels in the current pixel area;
judging whether the standard deviation σ is greater than 0.06; if so, the pixel area is determined to be cloud, and if not, the pixel area is determined not to be cloud;
and eliminating the pixel areas determined to be cloud from the spectral radiation value data.
3. The method for inverting the near-surface nitrogen dioxide concentration based on satellite data of claim 2,
the method of pre-processing spectral radiance value data further comprises:
calculating the normalized difference vegetation index NDVI of each pixel in the spectral radiation value data,
$$\mathrm{NDVI} = \frac{\rho_{0.86} - \rho_{0.64}}{\rho_{0.86} + \rho_{0.64}}$$
where $\rho_{0.86}$ is the spectral observation value of the pixel in the 0.86 μm band, and $\rho_{0.64}$ is the spectral observation value of the pixel in the 0.64 μm band;
judging whether the normalized difference vegetation index NDVI is less than 0; if so, the pixel is determined to be water, and if not, the pixel is determined not to be water;
and eliminating the pixels determined to be water from the spectral radiation value data.
4. A method for satellite data based inversion of near-surface nitrogen dioxide concentration according to any one of claims 1 to 3,
the method for preprocessing the nitrogen dioxide concentration sampling data comprises the following steps:
and eliminating the negative value, zero value and missing value of the ground monitored nitrogen dioxide concentration sampling data.
5. The method for inverting near-surface nitrogen dioxide concentration based on satellite data of claim 4,
after the nitrogen dioxide concentration sampling data is preprocessed, ground stations are retained according to the preprocessing result, where each retained ground station has at least 20 hourly average concentration values every day, at least 27 daily average concentration values every month, and at least 324 daily average concentration values every year.
6. The method for inverting near-surface nitrogen dioxide concentration based on satellite data of claim 1,
the method for matching the satellite primary data, the preprocessed nitrogen dioxide concentration sampling data and the meteorological data comprises the following steps:
calculating the distance $h_i$ from each discrete point in the satellite primary data to the interpolation point,
$$h_i = \sqrt{(x - x_i)^2 + (y - y_i)^2}$$
where $(x, y)$ is the longitude and latitude of the ground station, and $(x_i, y_i)$ is the longitude and latitude of the corresponding satellite primary data;
calculating the distance weight $W_i$,
$$W_i = \frac{h_i^{-p}}{\sum_{j=1}^{n} h_j^{-p}}$$
where p is the distance weight parameter, and n is the number of discrete points in the satellite primary data;
according to inverse distance weighted interpolation, distributing the values according to the distance weight $W_i$ to obtain a data result in which the satellite primary data, the preprocessed nitrogen dioxide concentration sampling data and the meteorological data are matched in space.
7. The method for satellite data-based inversion of near-surface nitrogen dioxide concentration of claim 6,
the distance weight parameter p has a value of 2.
8. The method for inverting the near-ground nitrogen dioxide concentration based on satellite data as claimed in claim 6, wherein the method of matching the satellite primary data, the preprocessed nitrogen dioxide concentration sample data and the meteorological data further comprises:
adding a time weight coefficient to the satellite primary data by calculating the proportion A of the satellite transit time,
where the quantities involved are the time at which the satellite transits the area, the ground monitoring time at the moment the satellite transits the area, and the ground monitoring time at the next moment after the satellite transits the area;
and distributing according to the proportion A of the satellite transit time to obtain a data result of matching the satellite primary data, the preprocessed nitrogen dioxide concentration sampling data and the meteorological data in time.
9. The method for inverting near-surface nitrogen dioxide concentration based on satellite data of claim 1,
the method for establishing the random forest model comprises the following steps:
drawing samples with replacement, at a specified proportion, from the new data set containing p characteristic variables, to randomly generate k training sets $\theta_1, \theta_2, \ldots, \theta_k$;
randomly selecting a fixed number n of variables from the p characteristic variables as candidate split nodes of a tree to construct a regression tree, where n < p;
each training set generates a corresponding regression tree $\{H(X, \theta_1)\}, \{H(X, \theta_2)\}, \ldots, \{H(X, \theta_k)\}$, where X is the predictor variable, and the k regression trees form the random forest model.
10. The method for inverting the near-surface nitrogen dioxide concentration based on satellite data of claim 9,
the method for calculating the weight of the regression tree by using the random forest model comprises the following steps:
where $Imp_i$ is the regression tree weight value of the predictor variable in the random forest model, v is a split node, $S_X$ is the set of nodes split on the predictor variable X in the random forest composed of k regression trees, and Gain(X, v) is the Gini information gain of the predictor variable X at the split node v.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310008458.0A CN115950832A (en) | 2023-01-04 | 2023-01-04 | Method for inverting near-surface nitrogen dioxide concentration based on satellite data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115950832A true CN115950832A (en) | 2023-04-11 |
Family
ID=87296580
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310008458.0A Pending CN115950832A (en) | 2023-01-04 | 2023-01-04 | Method for inverting near-surface nitrogen dioxide concentration based on satellite data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115950832A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117952025A (en) * | 2024-03-27 | 2024-04-30 | 北京英视睿达科技股份有限公司 | Method, device, equipment and medium for monitoring atmospheric pollutants for satellite autonomous planning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||