CN116502050A - Dynamic interpolation method and system for missing evapotranspiration observations at global flux sites - Google Patents
- Publication number
- CN116502050A (application number CN202310750877.1A)
- Authority
- CN
- China
- Prior art keywords
- evapotranspiration
- data
- observation
- site
- interpolation
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention belongs to the technical field of flux observation data processing and relates to a dynamic interpolation method and system for missing evapotranspiration (ET) observations at global flux sites. The method includes: obtaining observation data from global flux sites and MODIS remote sensing data; constructing a dynamic interpolation database of missing ET observations for the global flux sites; obtaining, site by site, the available variables and the number of available variables; ranking the available variables by importance; determining variable combinations containing each possible number of available variables; sorting the variable combinations by interpolation accuracy; using the random forest method to build, site by site, a dynamic interpolation model of missing ET observations for each variable combination; and dynamically interpolating the ET observations at the missing time steps while continuously updating the interpolation rate until it reaches 100%. The invention enables high-accuracy dynamic interpolation of missing flux-tower ET observations and helps improve the practical value of flux observation data.
Description
Technical field
The invention relates to the technical field of flux data processing, and in particular to a dynamic interpolation method and system for missing evapotranspiration observations at global flux sites.
Background
Eddy covariance (EC) is a method, based on micrometeorology and the eddy covariance technique, for measuring in real time the turbulent fluxes exchanged between ecosystems and the atmosphere. Long-term EC flux observations are important for water-cycle and energy-budget analysis, short-term weather forecasting, long-term climate prediction, and agricultural irrigation management. Because of external disturbances such as instrument failure, system failure, poor management or weather, and because of data quality control, the observed time series frequently contain large gaps. High-quality interpolation of the missing data is therefore essential for in-depth research on land-atmosphere exchanges of carbon, water and energy.
Existing interpolation methods usually fill observation gaps using a fixed set of variables. For example, the gap-filling method officially adopted for the FLUXNET2015 flux dataset is marginal distribution sampling (MDS), which fills missing flux data using three variables: incoming shortwave radiation, air temperature, and vapor pressure deficit (VPD).
The drawback of MDS is that its interpolation accuracy is limited by the quantity and quality of those three variables, and the length of gap it can fill is also limited: gaps longer than 60 days are difficult to fill. Other interpolation methods share similar limitations, and the interpolation of missing flux data depends heavily on the researchers' choice of variables. Because all sites use the same limited set of variables, the interpolation accuracy is constrained by the fixed variables chosen, and methods with high interpolation accuracy tend to have low interpolation rates, so it is difficult to strike a good balance between interpolation accuracy and interpolation rate.
In addition, in practice the number of available variables differs from site to site, so there is no guarantee that every site has all the input variables an interpolation method requires. It is therefore difficult to fill missing data at all sites, and valuable observations from some sites cannot be fully exploited.
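The look-up-table idea behind MDS can be sketched as follows. This is a simplified illustration, not the FLUXNET2015 implementation: it shows only the first look-up step (records with similar meteorological conditions within a ±7-day window), and the tolerances (50 W m-2 for shortwave radiation, 2.5 °C for air temperature, 5 hPa for VPD) are the values commonly cited for MDS and should be treated as assumptions. The record field names and the function name `mds_fill` are hypothetical.

```python
from datetime import timedelta
from statistics import mean
from typing import Dict, List, Optional

# Similarity tolerances commonly cited for MDS (assumption): W m-2, degC, hPa.
TOL = {"sw_in": 50.0, "ta": 2.5, "vpd": 5.0}


def mds_fill(records: List[Dict], i: int, window_days: int = 7) -> Optional[float]:
    """Simplified MDS look-up step.

    Returns the mean flux of records within +/- window_days of record i whose
    three drivers all lie within TOL of record i's drivers, or None.
    """
    target = records[i]
    if any(target[k] is None for k in TOL):
        return None  # drivers missing at the gap: this step cannot fill it
    half_window = timedelta(days=window_days)
    matches = [
        r["flux"]
        for j, r in enumerate(records)
        if j != i
        and r["flux"] is not None
        and abs(r["time"] - target["time"]) <= half_window
        and all(r[k] is not None and abs(r[k] - target[k]) <= tol
                for k, tol in TOL.items())
    ]
    return mean(matches) if matches else None
```

The full MDS algorithm widens the window and falls back to fewer drivers when no match is found; this sketch simply returns None in that case, which illustrates why gaps with no similar conditions nearby are hard for a fixed-variable look-up to fill.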
Summary of the invention
To solve the problems in the prior art that the length of gap that can be filled is limited by the quantity and quality of the data, that interpolation accuracy is limited, that the interpolation rate is insufficient, and that interpolation accuracy and interpolation rate are difficult to balance, the present invention provides a dynamic interpolation method for missing evapotranspiration observations at global flux sites.
In a first aspect, the present invention provides a dynamic interpolation method for missing flux observation data, including:
obtaining the observation data of global flux sites and MODIS remote sensing data, where the flux-site observation data include meteorological observation data and flux observation data;
constructing, from the meteorological observation data, the flux observation data and the MODIS remote sensing data, a dynamic interpolation database of missing evapotranspiration observations for the global flux sites;
preprocessing the data in the dynamic interpolation database and constructing an input dataset site by site;
obtaining, site by site from the input dataset, the available variables of each flux site and the number of available variables, where the available variables represent the flux-site observation data and the MODIS remote sensing data;
ranking the importance of each available variable site by site using the random forest method;
determining, from the number of available variables and the importance ranking, variable combinations containing each possible number of available variables;
calculating the interpolation accuracy and interpolation rate of every variable combination, and sorting the combinations by interpolation accuracy from high to low;
building, site by site with the random forest method, a dynamic interpolation model of missing evapotranspiration observations for each variable combination, to obtain a set of such models, where each model represents the functional relationship between the evapotranspiration observations and its variable combination;
interpolating the missing evapotranspiration observations in turn from the meteorological observation data and MODIS remote sensing data at the missing time steps, continuously updating the interpolation rate until it reaches 100%, and labeling each gap with the variable combination used to fill it.
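The interpolation loop in the last step can be sketched in plain Python. The sketch assumes that the variable combinations have already been sorted by interpolation accuracy (high to low) and that each comes with a fitted predictor; the predictors here are stand-in callables (in the method they would be the site's random forest models), and the record layout and the name `dynamic_fill` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[float]]
Combo = Tuple[str, ...]
# (variable combination, fitted predictor taking that combination's values)
Ranked = Sequence[Tuple[Combo, Callable[[List[float]], float]]]


def dynamic_fill(records: List[Record], ranked: Ranked, target: str = "ET"):
    """Fill gaps in `target`, trying combinations in descending accuracy order.

    Each gap is filled by the first (most accurate) combination whose inputs
    are all observed at that time step. Returns the filled values by index,
    the combination used for each filled index (the label), and the
    interpolation rate after each pass over the gaps.
    """
    gaps = [i for i, r in enumerate(records) if r.get(target) is None]
    filled: Dict[int, float] = {}
    labels: Dict[int, Combo] = {}
    rates: List[float] = []
    for combo, predict in ranked:
        for i in gaps:
            if i in filled:
                continue  # already filled by a more accurate combination
            x = [records[i].get(v) for v in combo]
            if all(v is not None for v in x):
                filled[i] = predict(x)
                labels[i] = combo
        rates.append(len(filled) / len(gaps) if gaps else 1.0)
        if rates[-1] == 1.0:
            break  # interpolation rate has reached 100%
    return filled, labels, rates
```

Because each gap takes the most accurate combination whose inputs are all present, high-accuracy models are used wherever possible and lower-accuracy combinations only fill what remains, which is how the method trades interpolation accuracy against interpolation rate. If the combinations run out before every gap is covered, the final rate stays below 1.0.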
In a second aspect, the present invention provides a dynamic interpolation system for missing evapotranspiration observations at global flux sites, comprising a first acquisition unit, a database construction unit, a preprocessing unit, a second acquisition unit, a first sorting unit, a determination unit, a second sorting unit, a model building unit and an interpolation update unit:
the first acquisition unit is configured to obtain the observation data of global flux sites and MODIS remote sensing data, where the flux-site observation data include meteorological observation data and flux observation data;
the database construction unit is configured to construct, from the meteorological observation data, the flux observation data and the MODIS remote sensing data, a dynamic interpolation database of missing evapotranspiration observations for each flux site;
the preprocessing unit is configured to preprocess the data in the dynamic interpolation database and construct an input dataset site by site;
the second acquisition unit is configured to obtain, site by site from the input dataset, the available variables of each flux site and the number of available variables, where the available variables represent the flux-site observation data and the MODIS remote sensing data;
the first sorting unit is configured to rank the importance of each available variable site by site using the random forest method;
the determination unit is configured to determine, from the number of available variables and the importance ranking, variable combinations containing each possible number of available variables;
the second sorting unit is configured to calculate the interpolation accuracy and interpolation rate of every variable combination and sort the combinations by interpolation accuracy from high to low;
the model building unit is configured to build, site by site with the random forest method, a dynamic interpolation model of missing evapotranspiration observations for each variable combination, to obtain a set of such models, where each model represents the functional relationship between the evapotranspiration observations and its variable combination;
the interpolation update unit is configured to interpolate the missing evapotranspiration observations in turn from the meteorological observation data and MODIS remote sensing data at the missing time steps, continuously update the interpolation rate until it reaches 100%, and label each gap with the variable combination used to fill it.
The beneficial effects of the present invention are as follows. The invention uses the random forest algorithm from machine learning. Based on the variables available at each site, it obtains the maximum number of variable combinations, trains robust, highly predictive random forest models on the different combinations, and evaluates the interpolation accuracy of each combination on a test set. Finally, the combinations are sorted by the accuracy of their random forest models and the gaps are filled in that order, achieving high-accuracy, wide-coverage interpolation.
On the basis of the above technical solutions, the present invention can be further improved as follows.
Further, the meteorological observation data include the observation date, observation time, air temperature, wind speed, atmospheric pressure, air humidity, incoming shortwave radiation and net radiation; the flux observation data include the evapotranspiration observation data; and the MODIS remote sensing data include normalized difference vegetation index (NDVI) data and leaf area index (LAI) data.
Further, preprocessing the data in the dynamic interpolation database and constructing the input dataset site by site includes:
obtaining the observation date and time; computing, record by record, the day of the year of each flux-site observation; converting the observation time into the position of its half-hour slot among the 48 half-hour slots of the day, giving the half-hour index; and combining the day of the year, the half-hour index and the meteorological observation data with the MODIS data of the pixel containing the flux site (extracted by latitude and longitude) to construct the input dataset site by site.
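A minimal sketch of this preprocessing step is shown below, assuming half-hourly timestamps that mark the start of each averaging period and a MODIS series already extracted for the site's pixel as a date-sorted list of (composite start date, value) pairs. MODIS vegetation products are multi-day composites, so each half-hourly record here takes the most recent composite on or before its date; that matching rule, the 1-based half-hour index, and all function names are assumptions for illustration.

```python
from bisect import bisect_right
from datetime import date, datetime
from typing import Dict, List, Tuple


def day_of_year(ts: datetime) -> int:
    """Day of the year, 1..366."""
    return ts.timetuple().tm_yday


def half_hour_index(ts: datetime) -> int:
    """Half-hour slot of the day, 1..48 (00:00-00:30 -> 1).

    Assumes ts marks the start of the averaging period.
    """
    return ts.hour * 2 + ts.minute // 30 + 1


def modis_value_at(ts: datetime, composites: List[Tuple[date, float]]) -> float:
    """Value of the latest composite starting on or before ts's date.

    `composites` must be sorted by date (assumed matching rule).
    """
    dates = [d for d, _ in composites]
    k = bisect_right(dates, ts.date()) - 1
    if k < 0:
        raise ValueError("no MODIS composite on or before this date")
    return composites[k][1]


def build_row(ts: datetime, met: Dict[str, float],
              ndvi: List[Tuple[date, float]]) -> Dict[str, float]:
    """One input-dataset record: day of year, half-hour index, met data, NDVI."""
    row = {"DOY": day_of_year(ts), "HH": half_hour_index(ts)}
    row.update(met)
    row["NDVI"] = modis_value_at(ts, ndvi)
    return row
```

For example, a record stamped 2020-03-01 13:30 gets day of year 61 (2020 is a leap year) and half-hour index 28.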
Further, building, site by site with the random forest method, the dynamic interpolation model of missing evapotranspiration observations for each variable combination includes:
Let $ET$ denote the evapotranspiration observations in the input dataset and $f$ the functional relationship; then:
$$ET = f(DOY, HH, T_a, R_s, R_n, WS, NDVI)$$
where $DOY$ is the day of the year of the flux-site observation, $HH$ is the position of the observation's half-hour slot among the 48 half-hour slots of the day, and $T_a$, $R_s$, $R_n$, $WS$ and $NDVI$ are, respectively, the air temperature, incoming shortwave radiation, net radiation, wind speed and normalized difference vegetation index at the time steps in the input dataset where evapotranspiration is not missing.
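The model ET = f(...) for one variable combination could be fitted with scikit-learn as sketched below. The choices here (a 30% hold-out split, 200 trees, R² as the accuracy measure, and the function name `fit_combo_model`) are assumptions, not parameters stated in the method. Only records where ET and every variable in the combination are observed are used for training.

```python
from typing import Dict, List, Optional, Sequence

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_combo_model(rows: List[Dict[str, Optional[float]]], combo: Sequence[str],
                    target: str = "ET", seed: int = 0):
    """Fit ET = f(combo) and return (model, hold-out R^2).

    Rows where ET or any variable in `combo` is missing are skipped.
    """
    usable = [r for r in rows
              if r.get(target) is not None and all(r.get(v) is not None for v in combo)]
    X = np.array([[r[v] for v in combo] for r in usable], dtype=float)
    y = np.array([r[target] for r in usable], dtype=float)
    # Hold out 30% of complete records to estimate interpolation accuracy.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_tr, y_tr)
    return model, r2_score(y_te, model.predict(X_te))
```

The returned score is one possible "interpolation accuracy" for the combination; the fitted model's `predict` can then serve as that combination's predictor at gap time steps.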
Further, interpolating the missing evapotranspiration observations in turn from the meteorological observation data and MODIS remote sensing data at the missing time steps includes:
$$ET^{*} = f(DOY^{*}, HH^{*}, T_a^{*}, R_s^{*}, R_n^{*}, WS^{*}, NDVI^{*})$$
where $ET^{*}$ is the interpolated evapotranspiration; $DOY^{*}$ and $HH^{*}$ are the day of the year and half-hour index of the time step where evapotranspiration is missing; $T_a^{*}$, $R_s^{*}$, $R_n^{*}$ and $WS^{*}$ are the air temperature, incoming shortwave radiation, net radiation and wind speed observations at that time step; and $NDVI^{*}$ is the normalized difference vegetation index remote sensing data at that time step.
Further, among variable combinations with the same number of available variables, combinations containing different variables yield different interpolation rates after random forest interpolation.
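The interpolation rate of a combination can be computed as the share of ET gaps at which all of its variables are observed. The sketch below (function name hypothetical) shows how two combinations of the same size can reach different rates on the same gaps.

```python
from typing import Dict, List, Optional, Sequence


def interpolation_rate(rows: List[Dict[str, Optional[float]]],
                       combo: Sequence[str], target: str = "ET") -> float:
    """Share of target gaps at which every variable in `combo` is observed."""
    gaps = [r for r in rows if r.get(target) is None]
    if not gaps:
        return 1.0  # nothing to fill
    covered = sum(all(r.get(v) is not None for v in combo) for r in gaps)
    return covered / len(gaps)
```

For example, if wind speed is missing at most gap time steps while net radiation is usually present, a three-variable combination containing net radiation covers more gaps than one containing wind speed, even though both have three variables.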
Further, the missing evapotranspiration observations are interpolated in turn, from the meteorological observation data and MODIS remote sensing data at the missing time steps, by traversal interpolation and updating.
Description of drawings
Fig. 1 is a flowchart of the dynamic interpolation method for flux data gaps provided in Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the principle of dynamic interpolation of flux data gaps;
Fig. 3 is a schematic diagram of the dynamic interpolation system for flux data gaps provided in Embodiment 2 of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations.
Embodiment 1
As an embodiment, as shown in Fig. 1, to solve the above technical problems, this embodiment provides a dynamic interpolation method for missing evapotranspiration observations at global flux sites, including:
obtaining the observation data of global flux sites and MODIS remote sensing data, where the flux-site observation data include meteorological observation data and flux observation data;
constructing, from the meteorological observation data, the flux observation data and the MODIS remote sensing data, a dynamic interpolation database of missing evapotranspiration observations for the global flux sites;
preprocessing the data in the dynamic interpolation database and constructing an input dataset site by site;
obtaining, site by site from the input dataset, the available variables of the global flux sites and the number of available variables, where the available variables represent the flux-site observation data and the MODIS remote sensing data;
ranking the importance of each available variable site by site using the random forest method;
determining, from the number of available variables and the importance ranking, variable combinations containing each possible number of available variables;
calculating the interpolation accuracy and interpolation rate of every variable combination, and sorting the combinations by interpolation accuracy from high to low;
building, site by site with the random forest method, a dynamic interpolation model of missing evapotranspiration observations for each variable combination, to obtain a set of such models, where each model represents the functional relationship between the evapotranspiration observations and its variable combination;
interpolating the missing evapotranspiration observations in turn from the meteorological observation data and MODIS remote sensing data at the missing time steps, continuously updating the interpolation rate until it reaches 100%, and labeling each gap with the variable combination used to fill it.
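The importance ranking step could use the impurity-based importances that scikit-learn exposes on a fitted forest (`feature_importances_`), as in the sketch below. Whether the method uses impurity-based or permutation importance is not stated, so this choice is an assumption, as are the tree count and the function name `rank_variables`. The forest is fitted only on records where ET and all candidate variables are observed.

```python
from typing import Dict, List, Optional, Sequence

import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_variables(rows: List[Dict[str, Optional[float]]], variables: Sequence[str],
                   target: str = "ET", seed: int = 0) -> List[str]:
    """Return `variables` sorted by random forest importance, high to low."""
    usable = [r for r in rows
              if r.get(target) is not None and all(r.get(v) is not None for v in variables)]
    X = np.array([[r[v] for v in variables] for r in usable], dtype=float)
    y = np.array([r[target] for r in usable], dtype=float)
    forest = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X, y)
    # Impurity-based importances; argsort ascending, then reverse.
    order = np.argsort(forest.feature_importances_)[::-1]
    return [variables[i] for i in order]
```

At sites where few records have every variable observed, fitting on complete records only may leave a small training set; that is a limitation of this sketch rather than something the method specifies.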
Optionally, the meteorological observation data include, but are not limited to, the observation date, observation time, air temperature, wind speed, atmospheric pressure, air humidity, incoming shortwave radiation and net radiation; the flux observation data include, but are not limited to, the evapotranspiration observation data; and the MODIS remote sensing data include NDVI data and LAI data.
Optionally, preprocessing the data in the dynamic interpolation database and constructing the input dataset site by site includes:
obtaining the observation date and time; computing, record by record, the day of the year of each flux-site observation; converting the observation time into the position of its half-hour slot among the 48 half-hour slots of the day, giving the half-hour index; and combining the day of the year, the half-hour index and the meteorological observation data with the MODIS data of the pixel containing the flux site (extracted by latitude and longitude) to construct the input dataset site by site.
Optionally, building, site by site with the random forest method, the dynamic interpolation model of missing evapotranspiration observations for each variable combination includes:
Let $ET$ denote the evapotranspiration observations in the input dataset and $f$ the functional relationship between the evapotranspiration observations and the variable combination; then:
$$ET = f(DOY, HH, T_a, R_s, R_n, WS, NDVI)$$
where $DOY$ is the day of the year of the flux-site observation, $HH$ is the position of the observation's half-hour slot among the 48 half-hour slots of the day, and $T_a$, $R_s$, $R_n$, $WS$ and $NDVI$ are, respectively, the air temperature, incoming shortwave radiation, net radiation, wind speed and normalized difference vegetation index at the time steps in the input dataset where evapotranspiration is not missing.
Optionally, interpolating the missing evapotranspiration observations in turn from the meteorological observation data and MODIS remote sensing data at the missing time steps includes:
$$ET^{*} = f(DOY^{*}, HH^{*}, T_a^{*}, R_s^{*}, R_n^{*}, WS^{*}, NDVI^{*})$$
where $ET^{*}$ is the interpolated evapotranspiration; $DOY^{*}$ and $HH^{*}$ are the day of the year and half-hour index of the time step where evapotranspiration is missing; $T_a^{*}$, $R_s^{*}$, $R_n^{*}$ and $WS^{*}$ are the air temperature, incoming shortwave radiation, net radiation and wind speed observations at that time step; and $NDVI^{*}$ is the normalized difference vegetation index remote sensing data at that time step.
Optionally, among variable combinations with the same number of available variables, combinations containing different variables yield different interpolation rates after random forest interpolation.
Optionally, the missing evapotranspiration observations are interpolated in turn, from the meteorological observation data and MODIS remote sensing data at the missing time steps, by traversal interpolation and updating.
In practice, as shown in Figure 2, sites in an input dataset such as FLUXNET2015 contain different sets of variables, which complicates feature selection. The maximum number of available variables N is obtained for each of site A, site B, site C and site D. Decreasing from this value, N-3 variable combinations are constructed, with at least 3 variables in each combination.
For a single site, different data gaps have different sets of variables available. If only a few variables are selected for modeling and prediction, only the gaps where all of those variables are available can be predicted; gaps where any of them is missing cannot. Selecting several variable combinations with different numbers of variables covers the variable availability of different gaps. This greatly raises the imputation rate of the site data and allows the dataset to be reconstructed completely.
Among combinations with the same number of variables, different combinations yield different imputation rates after modeling. To maximize the imputation rate of the input dataset, the imputation accuracy of the variable combinations is computed for each number of available variables. The combination with the highest imputation accuracy is selected for each number, and a random forest model is trained on it. The largest number of variables equals the number of available variables at each site.
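The per-size selection above can be sketched as an exhaustive search over subsets of each size. This paragraph names imputation accuracy as the selection criterion, while the summary of the invention says the specific variables are chosen by imputation rate. This sketch scores by rate. It assumes "imputation rate of a combination" means the share of gap time steps at which every variable in the combination is observed. The names `imputation_rate` and `best_combination_per_size` are hypothetical, and the `DOY`/`HH` keys are assumed to be present in every row.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[float]]


def imputation_rate(combo: Sequence[str], gap_rows: List[Row]) -> float:
    """Share of gap rows at which every variable in `combo` is observed."""
    if not gap_rows:
        return 0.0
    usable = sum(all(r.get(v) is not None for v in combo) for r in gap_rows)
    return usable / len(gap_rows)


def best_combination_per_size(candidates: Sequence[str], gap_rows: List[Row],
                              base: Tuple[str, ...] = ("DOY", "HH"),
                              min_size: int = 3) -> Dict[int, Tuple[str, ...]]:
    """For each combination size, keep the subset with the highest rate.

    List `candidates` in importance order so that ties go to the more
    important variables (max() returns the first maximum it meets).
    """
    best: Dict[int, Tuple[str, ...]] = {}
    for k in range(1, len(candidates) + 1):
        size = len(base) + k
        if size < min_size:
            continue
        options = [base + subset for subset in combinations(candidates, k)]
        best[size] = max(options, key=lambda c: imputation_rate(c, gap_rows))
    return best
```

Exhaustive enumeration grows as 2 to the power of the number of candidate variables. It stays cheap for the handful of meteorological and MODIS variables listed in the text. Replacing `imputation_rate` with a validation accuracy score would follow the same pattern.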
Optionally, the evapotranspiration observations at the missing time steps are imputed one after another by traversal imputation and updating. The imputation uses the meteorological observations and MODIS remote sensing data at those time steps.
As shown in Figure 2, the random forest models for each site are built in the following steps:
Identify all available variables of the site, $V = \{DOY, HH, X_1, \dots, X_{n-2}\}$:
- $DOY$ is the day-of-year index of the day on which the flux-site observation was made.
- $HH$ is the index of the observation's half-hour slot among the 48 half-hour slots of the day ($HH \in \{1, 2, \dots, 48\}$).
- $X_1, \dots, X_{n-2}$ are the available variables other than the day-of-year and half-hour indices.

Use a random forest model to rank all available variables other than $DOY$ and $HH$ by importance. This gives the importance-ordered sequence $X_{(1)}, X_{(2)}, \dots, X_{(n-2)}$.

Following the ranking, start from $DOY$ and $HH$ and add available variables to the combination in the order of this sequence. At least one variable from the sequence is added each time; preferably, every random forest model built after adding variables has at least three input variables. In this way, construct all possible variable combinations $C_i$ ($i = 1, 2, \dots, m$). Here $m$ is the number of variable combinations, $n$ is the number of available variables, and $n-2$ is the number of variables excluding the day-of-year and half-hour indices. The combinations are built so that, across all of them, the data used to build the random forest models reach a set proportion of the total data:
$$
\begin{aligned}
C_i &= \{DOY, HH\} \cup S_i, \quad S_i \subseteq \{X_{(1)}, X_{(2)}, \dots, X_{(n-2)}\}, \quad i = 1, 2, \dots, m-1;\\
C_m &= \{DOY, HH\}.
\end{aligned}
$$
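One plausible reading of "add variables in order of the importance sequence" is to grow $\{DOY, HH\}$ along the ranking as nested prefixes. A final $\{DOY, HH\}$-only fallback is then appended. The Python sketch below builds combinations that way under that assumption; the per-size subset search is an equally compatible reading.

The importance scores are hypothetical inputs. In practice they might come from the `feature_importances_` attribute of a fitted scikit-learn `RandomForestRegressor`. The function name `nested_combinations` is illustrative.

```python
from typing import Dict, List, Tuple


def nested_combinations(importance: Dict[str, float],
                        base: Tuple[str, ...] = ("DOY", "HH"),
                        min_inputs: int = 3) -> List[Tuple[str, ...]]:
    """Build C_1..C_m by growing {DOY, HH} along the importance ranking.

    The final entry is the {DOY, HH} fallback combination C_m.
    """
    # Most important first; sorted() is stable, so ties keep input order.
    ranked = sorted(importance, key=lambda v: importance[v], reverse=True)
    combos: List[Tuple[str, ...]] = []
    for k in range(1, len(ranked) + 1):
        combo = base + tuple(ranked[:k])
        if len(combo) >= min_inputs:  # keep at least three model inputs
            combos.append(combo)
    combos.append(base)
    return combos
```

For example, importances of 0.5 for net radiation, 0.2 for air temperature and 0.1 for wind speed produce four combinations. The first three add one variable at a time in that order, and the last is the fallback.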
A random forest model is built separately for the variable combination of each number of available variables:
$$
\begin{aligned}
ET &= f_i(C_i), \quad i = 1, 2, \dots, m-1;\\
ET &= f_m(DOY, HH).
\end{aligned}
$$
where $ET$ is the evapotranspiration observation, $C_i$ is the $i$-th variable combination, and $f_i$ is the functional relationship between the evapotranspiration observations and the imputation input variables. The model $f_m$ uses only the day-of-year index $DOY$ and the half-hour index $HH$. It is a supplement that keeps the imputation rate of all missing data at 100% even when no variable other than these two indices is available.

Optionally, the out-of-bag data not used in building the random forest model are used for parameter tuning. The root mean square error (RMSE) of the imputed out-of-bag values is computed, and tuning is complete once the RMSE falls below a set value.
Test the imputation accuracy of the models. This gives the imputation accuracies $A_1, \dots, A_m$ of the $m$ variable combinations and their imputation rates $R_1, \dots, R_m$.
Sort the $m$ variable combinations by imputation accuracy. The sorted combinations are $C_{(1)}, C_{(2)}, \dots, C_{(m)}$, with corresponding imputation rates $R_{(1)}, R_{(2)}, \dots, R_{(m)}$.
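The sorting step, combined with the tie-breaking rule the method states for models of similar accuracy, can be expressed as one lexicographic sort key: higher accuracy first, then higher imputation rate, then fewer variables. The method does not quantify "similar". This sketch assumes scores are rounded onto grids of width `acc_step` and `rate_step`, which keeps the ordering transitive. Both widths, the class `FittedCombination` and the function `rank_for_imputation` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FittedCombination:
    combo: Tuple[str, ...]  # variables used by this model
    accuracy: float         # any higher-is-better test score (assumed)
    rate: float             # imputation rate achievable with this combination


def rank_for_imputation(cands: List[FittedCombination],
                        acc_step: float = 0.01,
                        rate_step: float = 0.01) -> List[FittedCombination]:
    """Sort by accuracy, then imputation rate, then fewer variables.

    Scores within the same grid cell of width acc_step / rate_step are
    treated as "similar"; the grid widths are hypothetical tuning knobs.
    """
    def key(c: FittedCombination):
        return (-round(c.accuracy / acc_step),
                -round(c.rate / rate_step),
                len(c.combo))
    return sorted(cands, key=key)
```

A tolerance-based pairwise comparator would express "similar" more literally. However, it is not transitive and can make `sorted` produce inconsistent orders, which is why this sketch uses binned keys.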
Starting from the combination with the highest accuracy, $C_{(1)}$, the model of each sorted variable combination is used in turn to predict evapotranspiration. These predictions fill the evapotranspiration gaps at the missing time steps one by one.
Among random forest models with similar imputation accuracy, the one with the higher imputation rate supplies the evapotranspiration values. If the imputation rates are also similar, the model whose variable combination contains fewer available variables is used. During dynamic imputation, each imputed record is labeled with the variable combination used to fill it. The site's current imputation rate is updated after each step until it reaches 100%. The process is as follows:
$$
\begin{aligned}
D_1 &= F_1\!\left(C_{(1)}\right), & P_1 &= P_0 + \frac{n_1}{N};\\
D_2 &= F_2\!\left(C_{(2)}\right), & P_2 &= P_1 + \frac{n_2}{N};\\
&\;\;\vdots & &\;\;\vdots\\
D_k &= F_k\!\left(C_{(k)}\right), & P_k &= P_{k-1} + \frac{n_k}{N};
\end{aligned}
$$
where:
- $D_k$ is the $k$-th group of imputed data.
- $n_k$ is the number of values imputed in the $k$-th step.
- $N$ is the total number of records at the site.
- $F_k$ denotes the dynamic imputation models for missing evapotranspiration observations, ordered by accuracy from high to low.
- $C_{(k)}$ is the variable combination of the $k$-th model in that order.
- $P_k$ is the latest imputation rate after the $k$-th group of imputation is complete, and $P_0$ is the rate before imputation starts.

If the imputation rate reaches 100%, imputation ends. If it is below 100%, the remaining gaps are filled with the fallback model $f_m(DOY, HH)$, which uses only the day-of-year and half-hour indices, until the imputation rate reaches 100%.
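The ranked, labeled imputation loop can be sketched as follows. The models are visited from most to least accurate, and each fills the gaps it can evaluate that earlier models did not. Every filled record is labeled with the combination used, and the running rate is recorded after each group. The sketch assumes the imputation rate is the share of records holding an ET value (observed or imputed), so $P_0$ is the observed share. That reading, and the name `dynamic_impute`, are assumptions. Models are plain callables, so any trained regressor can be wrapped.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[float]]
Model = Callable[[List[float]], float]


def dynamic_impute(rows: List[Row],
                   ranked_models: Sequence[Tuple[Tuple[str, ...], Model]],
                   target: str = "ET",
                   ) -> Tuple[Dict[int, Tuple[str, ...]], List[float]]:
    """Fill gaps in place, best model first, labelling each filled record.

    `ranked_models` holds (combination, model) pairs sorted from highest to
    lowest accuracy; the last pair should be the (DOY, HH) fallback, which
    can always be evaluated. Returns the label of every imputed row and the
    running imputation rate P_k after each group.
    """
    if not rows:
        return {}, []
    n_total = len(rows)
    filled = sum(r.get(target) is not None for r in rows)  # observed records
    labels: Dict[int, Tuple[str, ...]] = {}
    rates: List[float] = []
    for combo, model in ranked_models:
        n_k = 0
        for i, r in enumerate(rows):
            if r.get(target) is not None:
                continue  # observed, or already filled by a better model
            if all(r.get(v) is not None for v in combo):
                r[target] = model([r[v] for v in combo])
                labels[i] = combo
                n_k += 1
        filled += n_k
        rates.append(filled / n_total)
        if filled == n_total:
            break  # imputation rate has reached 100%
    return labels, rates
```

Because a record is skipped once it holds a value, a gap that several combinations could fill always receives the prediction of the most accurate one. That matches the "sort first, then impute" order described above.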
Conventional methods fill data gaps by machine learning with a fixed variable combination. This method, by contrast, is not limited by the types and number of variables available at the gap time steps. It can select different variable combinations for different gaps, maximizing the data imputation rate while maintaining imputation accuracy.

FLUXNET2015 fills gaps with the Marginal Distribution Sampling (MDS) method, whereas the present invention uses the random forest algorithm from machine learning. It works in four stages:
1. It obtains the maximum number of variables in a combination from the variables available at each site.
2. It trains robust, highly predictive random forest models on different variable combinations.
3. It obtains the accuracy of each combination on a test set.
4. It ranks the combinations by modeling accuracy and then imputes in that order.

The result is high-accuracy, large-scale imputation.
The present invention:
- determines the maximum number of variables for each site, which defines the variable combinations of each size;
- determines the specific variables of each combination from the data-gap imputation rate;
- fills data gaps in descending order of the combinations' modeling accuracy.

This enables dynamic data imputation that balances imputation rate and imputation accuracy and brings the imputation rate to 100%.
Example 2
Based on the same principle as the method of Example 1, and as shown in Figure 3, an embodiment of the present invention also provides a dynamic imputation system for missing evapotranspiration observations at global flux sites. The system comprises a first acquisition unit, a database construction unit, a preprocessing unit, a second acquisition unit, a first sorting unit, a determination unit, a second sorting unit, a model building unit and an imputation update unit:
The first acquisition unit acquires the observation data of global flux sites and MODIS remote sensing data. The flux-site observation data include meteorological observations and flux observations.
The database construction unit builds a dynamic imputation database for missing evapotranspiration observations at global flux sites from the meteorological observations, the flux observations and the MODIS remote sensing data.
The preprocessing unit preprocesses the data in the dynamic imputation database and builds an input dataset for each site.
The second acquisition unit obtains, site by site from the input dataset, the available variables of each global flux site and the number of available variables. The available variables include the day-of-year index and the half-hour index of the flux-site observations.
The first sorting unit ranks the available variables by importance, site by site, using the random forest method.
The determination unit determines variable combinations containing each number of available variables, based on the number of available variables and the importance ranking.
The second sorting unit computes the imputation accuracy and imputation rate of all variable combinations and sorts the combinations by imputation accuracy from high to low.
The model building unit uses the random forest method to build, site by site, a dynamic imputation model for missing evapotranspiration observations for each variable combination, yielding a set of such models. Each model represents the functional relationship between the evapotranspiration observations and its variable combination.
The imputation update unit imputes the evapotranspiration observations at the missing time steps one after another from the meteorological observations and MODIS remote sensing data at those time steps. It continuously updates the imputation rate until it reaches 100% and labels the variable combination used to fill each gap.
Optionally, the meteorological observations include:
- observation date and observation time;
- air temperature, wind speed, atmospheric pressure and air humidity;
- incident shortwave radiation and net radiation.

The flux observations include evapotranspiration data. The MODIS remote sensing data include normalized difference vegetation index (NDVI) data and leaf area index (LAI) data.
Optionally, preprocessing the data in the dynamic imputation database for missing evapotranspiration observations and building the input dataset site by site comprises:
1. Obtain the observation date and observation time.
2. Compute, record by record, the day-of-year index of each flux-site observation date.
3. Convert the observation time into the index of its half-hour slot among the 48 half-hour slots of the day, giving the time index.
4. Combine the day-of-year index, the time index and the meteorological observations with the MODIS data of the pixel corresponding to each flux site, extracted by latitude and longitude, to build the input dataset site by site.
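The two time indices can be derived from a timestamp with the Python standard library. This sketch assumes each timestamp marks the start of a half-hour averaging period, so 00:00 maps to slot 1 and 23:30 to slot 48. It also assumes timestamps arrive as YYYYMMDDHHMM strings, the format FLUXNET2015 uses for `TIMESTAMP_START`. The function names are hypothetical.

```python
from datetime import datetime
from typing import Tuple


def time_indices(ts: datetime) -> Tuple[int, int]:
    """Return (day-of-year index, half-hour index 1..48) for a timestamp.

    Assumes `ts` marks the START of a half-hour averaging period.
    """
    doy = ts.timetuple().tm_yday                   # 1..365, or 1..366 in leap years
    half_hour = ts.hour * 2 + ts.minute // 30 + 1  # 00:00 -> 1, 23:30 -> 48
    return doy, half_hour


def parse_flux_timestamp(stamp: str) -> datetime:
    """Parse a YYYYMMDDHHMM string such as a TIMESTAMP_START value."""
    return datetime.strptime(stamp, "%Y%m%d%H%M")
```

If a dataset labels half-hours by their end time instead, the half-hour formula would need to shift by one slot, with midnight wrapping to slot 48 of the previous day.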
Optionally, building a dynamic imputation model for missing evapotranspiration observations for each variable combination comprises the following. The models are built site by site with the random forest method, from variable combinations containing each number of available variables:
Let $ET$ denote the evapotranspiration observations in the input dataset and $f$ the functional relationship; then:
$$ET = f\left(DOY, HH, T_a, R_s, R_n, u, NDVI\right)$$
where $DOY$ is the day-of-year index of the flux-site observation, and $HH$ is the index of its half-hour slot among the 48 half-hour slots of the day. $T_a$, $R_s$, $R_n$, $u$ and $NDVI$ are the air temperature, incident shortwave radiation, net radiation, wind speed and normalized difference vegetation index, respectively. All five are taken at time steps in the input dataset where evapotranspiration is not missing.
Optionally, the evapotranspiration observations at the missing time steps are imputed one after another from the meteorological observations and the MODIS remote sensing data at those time steps. This comprises:
$$ET^{*} = f\left(DOY, HH, T_a^{*}, R_s^{*}, R_n^{*}, u^{*}, NDVI^{*}\right)$$
where $ET^{*}$ is the imputed evapotranspiration value, and $DOY$ and $HH$ are the day-of-year and half-hour indices of the missing time step. $T_a^{*}$, $R_s^{*}$, $R_n^{*}$ and $u^{*}$ are the observed air temperature, incident shortwave radiation, net radiation and wind speed at the time step where evapotranspiration is missing. $NDVI^{*}$ is the remotely sensed normalized difference vegetation index at that time step.
Optionally, variable combinations with the same number of available variables but different member variables yield different data imputation rates when used for random forest imputation.
Optionally, the evapotranspiration observations at the missing time steps are imputed one after another by traversal imputation and updating. The imputation uses the meteorological observations and the MODIS remote sensing data at those time steps.
The above are only preferred embodiments of the present invention and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310750877.1A CN116502050B (en) | 2023-06-25 | 2023-06-25 | Dynamic interpolation method and system for global flux site evapotranspiration observation loss |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116502050A true CN116502050A (en) | 2023-07-28 |
CN116502050B CN116502050B (en) | 2023-09-15 |
Family
ID=87318696
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310750877.1A Active CN116502050B (en) | 2023-06-25 | 2023-06-25 | Dynamic interpolation method and system for global flux site evapotranspiration observation loss |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116502050B (en) |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |