CN116189796A - Machine learning-based satellite-borne shortwave infrared CO2 column concentration estimation method
- Publication number
- CN116189796A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/30—Prediction of properties of chemical compounds, compositions or mixtures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Description
Technical Field
The present invention relates to the technical field of atmospheric satellite remote sensing prediction, and in particular to a machine learning-based method for estimating CO2 column concentration from satellite-borne shortwave infrared observations.
Background Art
CO2 is the main greenhouse gas in the atmosphere and has a major influence on global climate change. Since the industrial era, the CO2 concentration has increased by about 30% and continues to rise. Monitoring CO2 is therefore important for research on global warming, and accurate knowledge of the atmospheric CO2 content and its variation can support climate prediction and environmental decision-making.
Traditional ground-based atmospheric CO2 detection methods are accurate and reliable, but they are single-point measurements and cannot provide real-time detection over large regional or global areas, so developing satellite-based methods and technologies for observing CO2 is imperative. The shortwave infrared band is more sensitive to near-surface CO2 and is therefore better suited to monitoring the dynamics of surface carbon sources and sinks.
At present, shortwave infrared CO2 observations are mostly processed internationally with full-physics retrieval algorithms, which require simulating the entire optical path; solving the radiative transfer equation is complex and time-consuming. Because aerosols, water vapor and surface reflectance affect the shortwave infrared radiation process in complex ways, existing physical retrieval models require many input parameters and are subject to uncertainty.
Summary of the Invention
The purpose of the present invention is to estimate the CO2 column concentration separately with decision tree, XGBoost, ordinary random forest, extreme random forest and gradient boosting regression models, to identify the model with the highest estimation accuracy through comparative analysis, and to use it to estimate the atmospheric CO2 column concentration from satellite remote sensing. The method offers high prediction accuracy and strong interpretability, and greatly improves prediction efficiency.
To achieve the above purpose, the technical solution of the present application is a machine learning-based satellite-borne shortwave infrared CO2 column concentration estimation method, comprising:
S1. Acquiring OCO-2 satellite band data and extracting from it, through a sensitivity analysis of the atmospheric carbon dioxide retrieval parameters, the data of 9 weak_CO2 bands and 6 O2 bands;
S2. Performing feature screening on the data of the 9 weak_CO2 bands and 6 O2 bands, the NDVI (normalized vegetation index), SR surface reflectance data, DEM elevation data, ERA5 atmospheric data, AOD aerosol data and TCCON station observations, and retaining the top 31 features by importance;
S3. Performing correlation analysis on the 31 retained features with a heat map to identify the features that are strongly correlated and the features that are weakly correlated with the CO2 column concentration;
S4. Merging the strongly and weakly correlated features into the input feature data set, then estimating the column-averaged CO2 concentration with the decision tree, XGBoost, ordinary random forest, extreme random forest and gradient boosting regression models respectively; comparing the models by their coefficient of determination R², root mean square error (RMSE), mean absolute error (MAE), mean relative error (MRE) and prediction accuracy within the allowable error range; the model with the highest prediction accuracy is found to be the extreme random forest regression model, which is then used to predict the column-averaged CO2 concentration.
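The correlation analysis in step S3 can be illustrated with a minimal sketch. It assumes the heat map is built from Pearson correlation coefficients and that "strong" and "weak" are separated by a threshold on |r|; the threshold value of 0.5, the function names and the toy feature values are all hypothetical and are not given in the source.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def split_by_correlation(features, target, threshold=0.5):
    """Group feature names by |r| against the target.

    The threshold is an assumption made for this sketch only.
    """
    strong, weak = [], []
    for name, values in features.items():
        r = pearson(values, target)
        (strong if abs(r) >= threshold else weak).append(name)
    return strong, weak

# Toy values, invented for illustration only.
xco2 = [410.0, 411.0, 412.0, 413.0, 414.0]
features = {
    "t2m": [280.0, 282.0, 284.0, 286.0, 288.0],  # rises linearly with xco2
    "ndvi": [0.3, 0.1, 0.4, 0.1, 0.3],           # uncorrelated with xco2
}
strong, weak = split_by_correlation(features, xco2)

# Step S4 merges both groups into the model input feature set.
input_features = strong + weak
```

Because step S4 merges both groups back together, the split serves to describe the features rather than to discard any of them.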
Further, the OCO-2 satellite band data include longitude (lon), latitude (lat), the solar zenith and azimuth angles, and the satellite zenith and azimuth angles; the ERA5 atmospheric data include temperature, humidity, pressure, the U and V wind components, rainfall, boundary layer height (blh), cloud base height (cbh), cloud cover (tcc), total precipitation (tp) and vertical wind velocity.
Further, before the OCO-2 satellite band data are extracted, the extraction range is determined by resampling: a grid is drawn over the latitude and longitude range of the target area with a resampled resolution of 0.5°×0.5°, and the longitude and latitude of each grid cell are used to obtain the Euclidean distance between the center of each grid cell and the center of each corresponding pixel of the original image:
$$ d = \sqrt{(\mathrm{lon}_k - \mathrm{lon}_i)^2 + (\mathrm{lat}_k - \mathrm{lat}_i)^2} $$
where lon_k and lat_k are the longitude and latitude of the fixed station, and lon_i and lat_i are the longitude and latitude of the grid cell.
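A minimal sketch of this resampling step follows, assuming the distance is the plain Euclidean distance in degrees between coordinate pairs and that each grid cell is matched to its nearest pixel. The function names, the region bounds and the pixel coordinates are hypothetical.

```python
import math

def grid_centers(lon_min, lon_max, lat_min, lat_max, res=0.5):
    """Centers of the res x res degree cells covering the target region."""
    n_lon = int(round((lon_max - lon_min) / res))
    n_lat = int(round((lat_max - lat_min) / res))
    return [
        (lon_min + (i + 0.5) * res, lat_min + (j + 0.5) * res)
        for j in range(n_lat)
        for i in range(n_lon)
    ]

def euclidean_deg(lon_a, lat_a, lon_b, lat_b):
    """Euclidean distance between two (lon, lat) points, in degrees."""
    return math.hypot(lon_a - lon_b, lat_a - lat_b)

def nearest_pixel(center, pixels):
    """Index of the pixel center closest to the given grid-cell center."""
    return min(range(len(pixels)),
               key=lambda k: euclidean_deg(*center, *pixels[k]))

# Hypothetical 1 x 1 degree region split into four 0.5 degree cells.
centers = grid_centers(140.0, 141.0, 36.0, 37.0)
# Hypothetical pixel centers (lon, lat) of the original image.
pixels = [(140.12, 36.05), (140.80, 36.90), (140.30, 36.70)]
matches = [nearest_pixel(c, pixels) for c in centers]
```

A distance in degrees ignores the convergence of meridians with latitude; it is used here only because the text specifies a Euclidean distance on longitude and latitude.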
Further, outliers in the data of the 9 weak_CO2 bands and 6 O2 bands are handled as follows:
$$ |x - \bar{x}| > 3\sigma $$
where x is a single observation, \bar{x} is the mean of the data for that day and σ is the standard deviation of the data for that day; that is, all outliers beyond ±3σ are removed, and the band data measured several times per day at each station are then averaged.
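The outlier handling can be sketched as below. The sketch assumes the ±3σ window is centered on the daily mean and uses the population standard deviation; the source states neither explicitly, and the function name is hypothetical.

```python
import statistics

def daily_mean_3sigma(values):
    """Mean of one day's readings after dropping values outside mean +/- 3 sigma."""
    if len(values) < 2:
        return statistics.fmean(values)
    mu = statistics.fmean(values)
    # Population standard deviation is an assumption; the sample
    # standard deviation would be an equally plausible reading.
    sigma = statistics.pstdev(values)
    kept = [v for v in values if abs(v - mu) <= 3 * sigma]
    return statistics.fmean(kept)

# Nineteen consistent readings and one spike, invented for illustration.
readings = [1.0] * 19 + [100.0]
cleaned_mean = daily_mean_3sigma(readings)
```

With few readings per day the rule rarely triggers, because a single value cannot lie more than sqrt(n - 1) population standard deviations from the mean of n values.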
Further, the decision tree uses the Gini index to select the splitting attribute. Assuming that the proportion of samples of class k in the current sample set X is p_k (k = 1, 2, 3, …, y), the Gini value is:
$$ \mathrm{Gini}(X) = \sum_{k=1}^{y} \sum_{k' \neq k} p_k p_{k'} = 1 - \sum_{k=1}^{y} p_k^2 $$
Gini(X) is the probability that two samples drawn at random from X carry different class labels; the Gini impurity is the probability of a sample being selected multiplied by the probability of it being misclassified. The smaller Gini(X) is, the higher the purity of the sample set X; when all samples in a node belong to one class, the Gini impurity is 0.
Assume that a discrete attribute a has V possible values. If a is used to partition the sample set X, V branch nodes are produced; let X^v denote the samples of X in the v-th branch node, i.e. those taking the v-th value of attribute a. The Gini index of attribute a is then defined as:
$$ \mathrm{Gini}(X, a) = \sum_{v=1}^{V} \frac{|X^v|}{|X|}\, \mathrm{Gini}(X^v) $$
The Gini index Gini(X, a) expresses the uncertainty of the sample set X after it has been partitioned on attribute a; the larger the Gini index, the greater the uncertainty of the samples.
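As a small worked illustration, the sketch below computes the Gini value of a label set and the Gini index of a discrete attribute, assuming the standard definitions Gini(X) = 1 - Σ p_k² and a size-weighted sum over branches. The function names and the toy labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini value 1 - sum(p_k^2) of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_index(attribute_values, labels):
    """Size-weighted Gini value of the branches produced by a discrete attribute."""
    n = len(labels)
    branches = {}
    for a, y in zip(attribute_values, labels):
        branches.setdefault(a, []).append(y)
    return sum(len(ys) / n * gini(ys) for ys in branches.values())

labels = ["a", "a", "b", "b"]
# An attribute that separates the classes perfectly leaves no impurity...
perfect = gini_index(["x", "x", "y", "y"], labels)
# ...while one that mixes them in every branch leaves it unchanged.
useless = gini_index(["x", "y", "x", "y"], labels)
```

The attribute with the smallest Gini index is the one preferred for the split.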
Further, XGBoost is assumed to contain K trees in total, with F denoting the tree model; the predicted value \hat{y}_i is expressed as:
$$ \hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F $$
where x_i is the input instance, i.e. the feature vector of the i-th data point; K is the number of CART trees; and f_k denotes the k-th CART tree.
The corresponding objective function L is:
$$ L = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k) $$
where l is the loss function, which measures the error between the predicted value and the true value; y_i is the true value; and Ω is the regularization function, which prevents the model from overfitting.
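The additive prediction and the regularized objective can be made concrete with a toy example. This is a sketch under assumptions, not the XGBoost library: each tree is a hypothetical one-split `Stump`, the loss l is taken to be the squared error, and Ω is taken to be γ·(number of leaves) + ½·λ·Σ w² over the leaf weights, a form commonly associated with XGBoost that the source does not spell out.

```python
from dataclasses import dataclass

@dataclass
class Stump:
    """A one-split regression tree, used here as a stand-in for a CART tree."""
    feature: int
    threshold: float
    left: float    # leaf weight when x[feature] <= threshold
    right: float   # leaf weight otherwise

    def __call__(self, x):
        return self.left if x[self.feature] <= self.threshold else self.right

    @property
    def leaf_weights(self):
        return (self.left, self.right)

def predict(trees, x):
    """Additive prediction: the sum of the outputs of all K trees."""
    return sum(tree(x) for tree in trees)

def omega(tree, gamma=0.0, lam=1.0):
    """Assumed regularizer: gamma * leaves + 0.5 * lam * sum of squared weights."""
    w = tree.leaf_weights
    return gamma * len(w) + 0.5 * lam * sum(v * v for v in w)

def objective(trees, X, y, gamma=0.0, lam=1.0):
    """Squared-error loss over all samples plus the regularizer of every tree."""
    loss = sum((yi - predict(trees, xi)) ** 2 for xi, yi in zip(X, y))
    return loss + sum(omega(t, gamma, lam) for t in trees)

trees = [Stump(0, 0.5, 1.0, 3.0), Stump(0, 0.5, 0.5, -1.0)]
X = [[0.0], [1.0]]
y = [1.5, 2.0]
value = objective(trees, X, y)
```

In this example the two stumps fit the targets exactly, so the objective consists of the regularization term alone.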
Further, in the ordinary random forest, for the feature parameter set X of the data set, models h(X, θ_i), i = 1, 2, …, k, are built; m features are selected at random, and each leaf node is split on the feature with the maximum information gain. The information gain is expressed as:
where i is the regression value, p_i is the probability of the corresponding value, w is the number of partition nodes, and the weight term is the weight value of the m-th partition leaf node.
Further, in the extreme random forest, assuming that the generalization error of an individual learner is E_i, the weighted generalization error of the learners is:
$$ \bar{E} = \sum_{i=1}^{T} w_i E_i $$
Assuming that the ambiguity (disagreement) of an individual learner is A_i, the weighted ambiguity of the learners is:
$$ \bar{A} = \sum_{i=1}^{T} w_i A_i $$
The generalization error of the ensemble is expressed as:
$$ E = \bar{E} - \bar{A} $$
where w_i is the weight and T is the total number of structurally different decision trees.
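The relation between these three quantities can be checked numerically. The sketch assumes the usual error-ambiguity decomposition for a weighted-average ensemble under squared error at a single input, with weights summing to 1: E_i = (h_i - y)², A_i = (h_i - H)² and E = (H - y)², where H is the weighted ensemble prediction. The function name and the numbers are hypothetical.

```python
def ambiguity_decomposition(predictions, weights, target):
    """Return (weighted error, weighted ambiguity, ensemble error) at one input."""
    ensemble = sum(w * h for w, h in zip(weights, predictions))
    e_bar = sum(w * (h - target) ** 2 for w, h in zip(weights, predictions))
    a_bar = sum(w * (h - ensemble) ** 2 for w, h in zip(weights, predictions))
    e_ens = (ensemble - target) ** 2
    return e_bar, a_bar, e_ens

# Three hypothetical tree predictions (ppm) for a true value of 413 ppm.
e_bar, a_bar, e_ens = ambiguity_decomposition(
    predictions=[410.0, 412.0, 417.0],
    weights=[0.25, 0.5, 0.25],
    target=413.0,
)
```

The ensemble error equals the weighted individual error minus the weighted ambiguity, so for a fixed individual accuracy, more diverse trees give a lower ensemble error.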
Further, in gradient boosting, the new learner obtained in each iteration is fitted to the residuals of the previous learner, and the predictions of all trees are finally summed to complete the prediction task. The residual is obtained as:
$$ r_{ni} = y_i - f_{n-1}(x_i) $$
where y_i is the measured value of the i-th sample and f_{n-1}(x_i) is the prediction of the learner from the previous round. The residuals are fitted to obtain a residual model h_n(x), and the regression tree is updated:
$$ f_n(x) = f_{n-1}(x) + h_n(x) $$
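The residual-fitting loop can be sketched in a few lines. This is an illustration under assumptions, not the patent's implementation: the weak learner is a hypothetical least-squares stump on a single feature, the initial model f_0 is the mean of the targets, and no learning rate is applied because the update formula contains none.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature; returns a callable h(x)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # every x is identical, so no split is possible
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm

def gradient_boost(xs, ys, n_rounds=20):
    """Return f_n, built by repeatedly fitting stumps to the current residuals."""
    f0 = sum(ys) / len(ys)
    learners = []

    def predict(x):
        return f0 + sum(h(x) for h in learners)

    for _ in range(n_rounds):
        # r_ni = y_i - f_{n-1}(x_i)
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        # f_n = f_{n-1} + h_n
        learners.append(fit_stump(xs, residuals))
    return predict

def sse(model, xs, ys):
    """Sum of squared errors of a fitted model on (xs, ys)."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [410.0, 411.0, 415.0, 416.0]
model = gradient_boost(xs, ys, n_rounds=20)
```

Each round can only lower the training error, which is why practical implementations usually add a learning rate and limit the number of rounds to control overfitting.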
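Further, the coefficient of determination R², root mean square error (RMSE), mean absolute error (MAE) and mean relative error (MRE) are obtained as:
$$ R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - f_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} $$
$$ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (f_i - y_i)^2} $$
$$ \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |f_i - y_i| $$
$$ \mathrm{MRE} = \frac{1}{N} \sum_{i=1}^{N} \frac{|f_i - y_i|}{y_i} $$
where N is the number of samples, f_i is the predicted value, y_i is the true value, and \bar{y} is the mean of the true values.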
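The four evaluation indicators can be computed as in the sketch below, which assumes their standard definitions and expresses MRE as a fraction rather than a percentage. The function name and the sample values are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """R2, RMSE, MAE and MRE of predictions against true values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    ss_tot = sum((y - mean_true) ** 2 for y in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(f - y) for y, f in zip(y_true, y_pred)) / n,
        # Relative error is taken against the true value (assumed nonzero).
        "MRE": sum(abs(f - y) / abs(y) for y, f in zip(y_true, y_pred)) / n,
    }

# Invented column concentrations in ppm, for illustration only.
metrics = regression_metrics(
    y_true=[400.0, 410.0, 420.0],
    y_pred=[402.0, 410.0, 418.0],
)
```

RMSE and MAE carry the unit of the target (ppm here), whereas R² and MRE are dimensionless, which makes the latter two easier to compare across data sets.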
By adopting the above technical solution, the present invention achieves the following technical effects: the method applies several ensemble learning methods to satellite, vegetation, surface topography, atmospheric and aerosol data to predict the column-averaged atmospheric CO2 concentration, which is of practical significance. Compared with traditional physical retrieval methods, the method considers a sufficient set of features, is easy to modify and interpret, is simple to operate, and greatly improves prediction efficiency. It predicts the CO2 column concentration well, allows ensemble learning models to predict the CO2 column concentration for different satellites more accurately, and provides data support for decision-making by environmental protection departments.
Brief Description of the Drawings
Figure 1 shows spectra observed by the OCO-2 satellite; the observation sample is from the Tsukuba station (36.05°N, 140.12°E) on January 1, 2019, where (a) is the weak CO2 absorption band and (b) is the O2-A absorption band;
Figure 2 is a bar chart of the importance of the top 31 features;
Figure 3 is a correlation diagram of the influencing factors in the satellite retrieval of the column-averaged CO2 concentration;
Figure 4 shows the training results of the five prediction models;
Figure 5 shows the difference between the CO2 column concentration predicted on the test set by the five prediction models and the true value;
Figure 6 shows how the prediction performance of the extreme random forest regression model varies with its own parameters;
Figure 7 is a flow chart of the machine learning-based satellite-borne shortwave infrared CO2 column concentration estimation method.
Detailed Description
The embodiments of the present invention are implemented on the basis of the technical solution of the present invention, and detailed implementations and specific operating procedures are given, but the scope of protection of the present invention is not limited to the following embodiments.
Example 1
This embodiment provides a machine learning-based satellite-borne shortwave infrared CO2 column concentration estimation method, comprising:
S1. Acquiring OCO-2 satellite band data and extracting from it through a sensitivity analysis of the atmospheric carbon dioxide retrieval parameters; because the weak CO2 absorption band is strongly affected by water vapor, the corresponding absorption channels are selected at the strong absorption band (1.61 μm), giving the data of 9 weak_CO2 bands and 6 O2 bands;
The OCO-2 satellite band data include longitude (lon), latitude (lat), the solar zenith and azimuth angles, and the satellite zenith and azimuth angles.
It should be noted that, before the OCO-2 satellite band data are extracted, the extraction range is determined by resampling: a grid is drawn over the latitude and longitude range of the target area with a resampled resolution of 0.5°×0.5°, and the longitude and latitude of each grid cell are used to obtain the Euclidean distance between the center of each grid cell and the center of each corresponding pixel of the original image:
$$ d = \sqrt{(\mathrm{lon}_k - \mathrm{lon}_i)^2 + (\mathrm{lat}_k - \mathrm{lat}_i)^2} $$
where lon_k and lat_k are the longitude and latitude of the fixed station, and lon_i and lat_i are the longitude and latitude of the grid cell.
Preferably, outliers in the data of the 9 weak_CO2 bands and 6 O2 bands are handled as follows:
$$ |x - \bar{x}| > 3\sigma $$
where x is a single observation, \bar{x} is the mean of the data for that day and σ is the standard deviation of the data for that day; that is, all outliers beyond ±3σ are removed, and the band data measured several times per day at each station are then averaged.
S2. Performing feature screening on the data of the 9 weak_CO2 bands and 6 O2 bands, the NDVI (normalized vegetation index), SR surface reflectance data, DEM elevation data, ERA5 atmospheric data, AOD aerosol data and TCCON station observations, and retaining the top 31 features by importance;
Specifically, feature screening is performed using the resampling approach described above.
S3. Performing correlation analysis on the 31 retained features with a heat map to identify the features that are strongly correlated and the features that are weakly correlated with the CO2 column concentration;
S4. Merging the strongly and weakly correlated features, feeding them into the five ensemble learning regression models, and outputting the CO2 column concentration predicted by each model;
Specifically, in the extreme random forest, assuming that the generalization error of an individual learner is E_i, the weighted generalization error of the learners is:
$$ \bar{E} = \sum_{i=1}^{T} w_i E_i $$
Assuming that the ambiguity (disagreement) of an individual learner is A_i, the weighted ambiguity of the learners is:
$$ \bar{A} = \sum_{i=1}^{T} w_i A_i $$
The generalization error of the ensemble can be expressed as:
$$ E = \bar{E} - \bar{A} $$
where w_i is the weight and T is the total number of structurally different decision trees.
A comparative analysis of the prediction accuracy of these models shows that the extreme random forest regression model has the highest coefficient of determination R², the smallest errors and the best prediction performance, clearly outperforming the other models. The data for the four evaluation indicators are shown in the following table:
Table 1. Data for the four evaluation indicators
The embodiments of the present invention are preferred implementations and do not limit the present invention in any form. The technical features, or combinations of technical features, described in the embodiments should not be regarded as isolated; they can be combined with one another to achieve better technical effects. The scope of the preferred embodiments of the present invention may also include other implementations, as will be understood by those skilled in the art to which the embodiments belong.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211594763.4A CN116189796A (en) | 2022-12-13 | 2022-12-13 | Machine learning-based satellite-borne shortwave infrared CO2 column concentration estimation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211594763.4A CN116189796A (en) | 2022-12-13 | 2022-12-13 | Machine learning-based satellite-borne shortwave infrared CO2 column concentration estimation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116189796A (en) | 2023-05-30 |
Family
ID=86451347
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211594763.4A (Pending) CN116189796A (en) | Machine learning-based satellite-borne shortwave infrared CO2 column concentration estimation method | 2022-12-13 | 2022-12-13 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116189796A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117455066A (en) * | 2023-11-13 | 2024-01-26 | 哈尔滨航天恒星数据系统科技有限公司 | Corn planting accurate fertilizer distribution method based on multi-strategy optimization random forest, electronic equipment and storage medium |
-
2022
- 2022-12-13 CN CN202211594763.4A patent/CN116189796A/en active Pending
Non-Patent Citations (1)
Title |
---|
李静波等: "基于机器学习的星载短波红外CO2柱浓度估算研究", 《中国环境科学》, 21 November 2022 (2022-11-21), pages 1 - 14 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||