CN116502050A

CN116502050A - Dynamic interpolation method and system for missing evapotranspiration observations at global flux sites

Info

Publication number: CN116502050A
Application number: CN202310750877.1A
Authority: CN
Inventors: 刘萌; 唐荣林; 李召良; 段四波; 姚娜; 黄凌霄
Original assignee: Institute of Geographic Sciences and Natural Resources of CAS; Institute of Agricultural Resources and Regional Planning of CAAS; Academy of Agricultural Planning and Engineering MARA
Current assignee: Institute of Geographic Sciences and Natural Resources of CAS; Institute of Agricultural Resources and Regional Planning of CAAS; Academy of Agricultural Planning and Engineering MARA
Priority date: 2023-06-25
Filing date: 2023-06-25
Publication date: 2023-07-28
Anticipated expiration: 2043-06-25
Also published as: CN116502050B

Abstract

The invention belongs to the technical field of flux observation data processing and relates to a dynamic interpolation method and system for missing evapotranspiration observations at global flux sites. The method includes: obtaining observation data from global flux sites together with MODIS remote sensing data; constructing a dynamic interpolation database for missing evapotranspiration observations at the global flux sites; obtaining, site by site, the available variables and their number; ranking the available variables by importance; determining variable combinations covering each possible number of available variables; sorting the variable combinations by interpolation accuracy; using the random forest method to build, site by site, a dynamic interpolation model of missing evapotranspiration observations for each variable combination; and dynamically interpolating the evapotranspiration observations at the missing times while continuously updating the interpolation rate until it reaches 100%. The invention achieves high-precision dynamic interpolation of missing evapotranspiration in flux observations and helps improve the practical value of flux observation data.

Description

Dynamic interpolation method and system for missing evapotranspiration observations at global flux sites

Technical field

The invention relates to the technical field of flux data processing, and in particular to a dynamic interpolation method and system for missing evapotranspiration observations at global flux sites.

Background art

Eddy covariance (EC) is a method for real-time measurement of turbulent fluxes between ecosystems and the atmosphere, based on micrometeorological methods and the eddy covariance technique. Long time series of EC flux observations are of great significance for water cycle and energy budget analysis, short-term weather forecasting, long-term climate prediction, and agricultural irrigation management. Because of external disturbances such as instrument failure, system failure, poor management, or weather, as well as data quality control, the observed time series often contain large numbers of gaps, and high-quality interpolation of the missing data is essential for in-depth study of carbon, water, and energy fluxes between land and atmosphere.

Existing data interpolation methods usually fill observation gaps with a fixed set of variables. For example, the gap-filling method officially adopted by the FLUXNET2015 flux observation data set is the marginal distribution sampling (MDS) method, which uses three variables (incident shortwave radiation, air temperature, and vapor pressure deficit) to interpolate missing flux observations.

The disadvantage of the MDS method is that its interpolation accuracy is limited by the quantity and quality of the three variables, and the length of the gaps it can fill is likewise limited, making it difficult to fill gaps longer than 60 days. Other interpolation methods have similar limitations, and the interpolation of missing flux data depends heavily on the researchers' choice of variables. Because all sites use the same limited set of variables, interpolation accuracy is constrained by the selected fixed variables, and methods with high interpolation accuracy have low interpolation rates, so it is difficult to reach an ideal balance between interpolation accuracy and interpolation rate.

In addition, in practice the number of variables available differs from site to site, and not every site is guaranteed to have the input variables an interpolation method requires. It is therefore difficult to fill the missing data at all sites, and the valuable observations from some sites cannot be fully exploited.

Summary of the invention

To solve the problems in the prior art that the length of the gaps that can be filled is limited by the quantity and quality of the data, that interpolation accuracy is limited, that the interpolation rate is insufficient, and that interpolation accuracy and interpolation rate are difficult to balance, the present invention provides a dynamic interpolation method for missing evapotranspiration observations at global flux sites.

In a first aspect, the present invention provides a dynamic interpolation method for missing flux observation data, including:

obtaining observation data of global flux sites and MODIS remote sensing data, the observation data of the flux sites including meteorological observation data and flux observation data;

constructing, from the meteorological observation data, the flux observation data, and the MODIS remote sensing data, a dynamic interpolation database for missing evapotranspiration observations at the global flux sites;

preprocessing the data in the dynamic interpolation database for missing evapotranspiration observations, and constructing input data sets site by site;

obtaining, site by site and based on the input data sets, the available variables of each flux site and the number of available variables, the available variables characterizing the observation data of the flux site and the MODIS remote sensing data;

ranking the available variables by importance, site by site, using the random forest method;

determining, from the number of available variables and the result of the importance ranking, variable combinations covering each possible number of available variables;

calculating the interpolation accuracy and interpolation rate of all variable combinations, and sorting the variable combinations from highest to lowest interpolation accuracy;

building, site by site and using the random forest method, a dynamic interpolation model of missing evapotranspiration observations for each of the variable combinations, to obtain a set of such models, each model characterizing the functional relationship between the evapotranspiration observation data and the variable combination;

interpolating, in turn, the evapotranspiration observation data at the missing times according to the meteorological observation data and the MODIS remote sensing data at those times, continuously updating the interpolation rate until it reaches 100%, and labeling the variable combination used to fill each gap.

In a second aspect, the present invention provides a dynamic interpolation system for missing evapotranspiration observations at global flux sites, including a first acquisition unit, a database construction unit, a preprocessing unit, a second acquisition unit, a first sorting unit, a determination unit, a second sorting unit, a model building unit, and an interpolation update unit:

the first acquisition unit is configured to obtain observation data of global flux sites and MODIS remote sensing data, the observation data of the global flux sites including meteorological observation data and flux observation data;

the database construction unit is configured to construct, from the meteorological observation data, the flux observation data, and the MODIS remote sensing data, a dynamic interpolation database for missing evapotranspiration observations at each flux site;

the preprocessing unit is configured to preprocess the data in the dynamic interpolation database for missing evapotranspiration observations and to construct input data sets site by site;

the second acquisition unit is configured to obtain, site by site and based on the input data sets, the available variables of each flux site and the number of available variables, the available variables characterizing the observation data of the flux site and the MODIS remote sensing data;

the first sorting unit is configured to rank the available variables by importance, site by site, using the random forest method;

the determination unit is configured to determine, from the number of available variables and the result of the importance ranking, variable combinations covering each possible number of available variables;

the second sorting unit is configured to calculate the interpolation accuracy and interpolation rate of all variable combinations and to sort the variable combinations from highest to lowest interpolation accuracy;

the model building unit is configured to build, site by site and using the random forest method, a dynamic interpolation model of missing evapotranspiration observations for each of the variable combinations, to obtain a set of such models, each model characterizing the functional relationship between the evapotranspiration observation data and the variable combination;

the interpolation update unit is configured to interpolate, in turn, the evapotranspiration observation data at the missing times according to the meteorological observation data and the MODIS remote sensing data at those times, to continuously update the interpolation rate until it reaches 100%, and to label the variable combination used to fill each gap.

The beneficial effects of the present invention are as follows. The invention uses the random forest algorithm from machine learning. Based on the variables available at each site, it obtains the maximum number of variable combinations, trains robust and highly predictive random forest models on the different variable combinations, and obtains the interpolation accuracy of each combination on a test set. Finally, the combinations are sorted by the accuracy of the random forest models and interpolation is carried out in that order, achieving high-precision interpolation over a wide range of gaps.

On the basis of the above technical solution, the present invention can also be improved as follows.

Further, the meteorological observation data include observation date, observation time, air temperature, wind speed, atmospheric pressure, air humidity, incident shortwave radiation, and net radiation; the flux observation data include the evapotranspiration observation data; and the MODIS remote sensing data include normalized difference vegetation index data and leaf area index data.

Further, preprocessing the data in the dynamic interpolation database for missing evapotranspiration observations and constructing input data sets site by site includes:

obtaining the observation date and observation time; calculating, record by record, the order of the observation date within the year (the day order); converting the observation time into the order of the half-hour of the observation among the 48 half-hours of the day (the time order); and combining the day order, the time order, and the meteorological observation data with the MODIS data of the pixel corresponding to the flux site, extracted by latitude and longitude, to construct the input data set site by site.

Further, building, site by site and using the random forest method, a dynamic interpolation model of missing evapotranspiration observations for each of the variable combinations includes:

letting $ET$ represent the evapotranspiration observation data in the input data set and $f$ the functional relationship, then:

$ET = f(DOY, HH, T_a, R_g, R_n, WS, NDVI)$;

where $DOY$ represents the order of the observation date within the year, $HH$ represents the order of the half-hour of the observation among the 48 half-hours of the day, $T_a$ represents the air temperature at times in the input data set when evapotranspiration is not missing, $R_g$ represents the incident shortwave radiation at those times, $R_n$ represents the net radiation at those times, $WS$ represents the wind speed at those times, and $NDVI$ represents the normalized difference vegetation index at those times.

Further, interpolating, in turn, the evapotranspiration observation data at the missing times according to the meteorological observation data and the MODIS remote sensing data at those times includes:

$ET_{\mathrm{fill}} = f(DOY, HH, T_{a,\mathrm{gap}}, R_{g,\mathrm{gap}}, R_{n,\mathrm{gap}}, WS_{\mathrm{gap}}, NDVI_{\mathrm{gap}})$;

where $ET_{\mathrm{fill}}$ represents the interpolation result for the evapotranspiration observation data, $DOY$ and $HH$ are the day order and half-hour order of the missing time, $T_{a,\mathrm{gap}}$ represents the air temperature observation at the time evapotranspiration is missing, $R_{g,\mathrm{gap}}$ represents the incident shortwave radiation observation at that time, $R_{n,\mathrm{gap}}$ represents the net radiation observation at that time, $WS_{\mathrm{gap}}$ represents the wind speed observation at that time, and $NDVI_{\mathrm{gap}}$ represents the normalized difference vegetation index remote sensing data at that time.

Further, among variable combinations with the same number of available variables, combinations that contain different available variables yield different data interpolation rates after interpolation with the random forest method.

Further, according to the meteorological observation data and the MODIS remote sensing data at the times when the evapotranspiration observation data are missing, the evapotranspiration observation data at the missing times are interpolated in turn by traversal interpolation and updating.

Description of drawings

Fig. 1 is a flow chart of the dynamic interpolation method for flux data gaps provided by Embodiment 1 of the present invention;

Fig. 2 is a schematic diagram of the principle of dynamic interpolation of flux data gaps;

Fig. 3 is a schematic diagram of the dynamic interpolation system for flux data gaps provided by Embodiment 2 of the present invention.

Detailed description

To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings may be arranged and designed in a variety of different configurations.

Embodiment 1

As an embodiment, as shown in Fig. 1, to solve the above technical problems, this embodiment provides a dynamic interpolation method for missing evapotranspiration observations at global flux sites, including:

obtaining observation data of global flux sites and MODIS remote sensing data, the observation data of the flux sites including meteorological observation data and flux observation data;

constructing, from the meteorological observation data, the flux observation data, and the MODIS remote sensing data, a dynamic interpolation database for missing evapotranspiration observations at the global flux sites;

preprocessing the data in the dynamic interpolation database for missing evapotranspiration observations, and constructing input data sets site by site;

obtaining, site by site and based on the input data sets, the available variables of the global flux sites and the number of available variables, the available variables characterizing the observation data of the flux site and the MODIS remote sensing data;

ranking the available variables by importance, site by site, using the random forest method;

determining, from the number of available variables and the result of the importance ranking, variable combinations covering each possible number of available variables;

calculating the interpolation accuracy and interpolation rate of all variable combinations, and sorting the variable combinations from highest to lowest interpolation accuracy;

building, site by site and using the random forest method, a dynamic interpolation model of missing evapotranspiration observations for each of the variable combinations, to obtain a set of such models, each model characterizing the functional relationship between the evapotranspiration observation data and the variable combination;

interpolating, in turn, the evapotranspiration observation data at the missing times according to the meteorological observation data and the MODIS remote sensing data at those times, continuously updating the interpolation rate until it reaches 100%, and labeling the variable combination used to fill each gap.

Optionally, the meteorological observation data include, but are not limited to, observation date, observation time, air temperature, wind speed, atmospheric pressure, air humidity, incident shortwave radiation, and net radiation; the flux observation data include, but are not limited to, evapotranspiration observation data; and the MODIS remote sensing data include normalized difference vegetation index data and leaf area index data.

Optionally, preprocessing the data in the dynamic interpolation database for missing evapotranspiration observations and constructing input data sets site by site includes:

obtaining the observation date and observation time; calculating, record by record, the order of the observation date within the year (the day order); converting the observation time into the order of the half-hour of the observation among the 48 half-hours of the day (the time order); and combining the day order, the time order, and the meteorological observation data with the MODIS data of the pixel corresponding to the flux site, extracted by latitude and longitude, to construct the input data set site by site.
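The conversion of observation date and time into a day order and a half-hour order can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `time_orders`, the `YYYYMMDDHHMM` timestamp layout, and the 1-based numbering of both orders are assumptions introduced here.

```python
from __future__ import annotations

from datetime import datetime


def time_orders(timestamp: str) -> tuple[int, int]:
    """Return (day order in the year, half-hour order in the day).

    `timestamp` is assumed to be the start of a half-hourly record,
    formatted as YYYYMMDDHHMM (an assumption, not stated in the source).
    """
    t = datetime.strptime(timestamp, "%Y%m%d%H%M")
    day_order = t.timetuple().tm_yday                   # 1..365, or 366 in leap years
    half_hour_order = t.hour * 2 + t.minute // 30 + 1   # 1..48
    return day_order, half_hour_order


if __name__ == "__main__":
    print(time_orders("201503010030"))  # 1 March of a non-leap year, second half-hour
```

Numbering both orders from 1 keeps the half-hour order in the range 1 to 48, matching the "48 half-hours of the day" in the text; a 0-based convention would work equally well as long as it is applied consistently to training and gap records.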

Optionally, building, site by site and using the random forest method, a dynamic interpolation model of missing evapotranspiration observations for each of the variable combinations includes:

letting $ET$ represent the evapotranspiration observation data in the input data set and $f$ the functional relationship between the evapotranspiration observation data and the variable combination, then:

$ET = f(DOY, HH, T_a, R_g, R_n, WS, NDVI)$;

where $DOY$ represents the order of the observation date within the year, $HH$ represents the order of the half-hour of the observation among the 48 half-hours of the day, $T_a$ represents the air temperature at times in the input data set when the evapotranspiration observation is not missing, $R_g$ represents the incident shortwave radiation at those times, $R_n$ represents the net radiation at those times, $WS$ represents the wind speed at those times, and $NDVI$ represents the normalized difference vegetation index at those times.

Optionally, interpolating, in turn, the evapotranspiration observation data at the missing times according to the meteorological observation data and the MODIS remote sensing data at those times includes:

$ET_{\mathrm{fill}} = f(DOY, HH, T_{a,\mathrm{gap}}, R_{g,\mathrm{gap}}, R_{n,\mathrm{gap}}, WS_{\mathrm{gap}}, NDVI_{\mathrm{gap}})$;

where $ET_{\mathrm{fill}}$ represents the interpolation result for the evapotranspiration observation data, $DOY$ and $HH$ are the day order and half-hour order of the missing time, $T_{a,\mathrm{gap}}$ represents the air temperature observation at the time evapotranspiration is missing, $R_{g,\mathrm{gap}}$ represents the incident shortwave radiation observation at that time, $R_{n,\mathrm{gap}}$ represents the net radiation observation at that time, $WS_{\mathrm{gap}}$ represents the wind speed observation at that time, and $NDVI_{\mathrm{gap}}$ represents the normalized difference vegetation index remote sensing data at that time.

Optionally, among variable combinations with the same number of available variables, combinations that contain different available variables yield different data interpolation rates after interpolation with the random forest method.

Optionally, according to the meteorological observation data and the MODIS remote sensing data at the times when the evapotranspiration observation data are missing, the evapotranspiration observation data at the missing times are interpolated in turn by traversal interpolation and updating.

In practice, as shown in Fig. 2, the sites in an input data set such as FLUXNET2015 contain different kinds of variables, which complicates feature selection. The maximum number of available variables N is obtained for each of site A, site B, site C, and site D, and the number of variables is then decreased step by step from N to construct N-3 variable combinations (each combination contains at least three variables).

For a single site, different data gaps have different variables available. If only a few variables are selected for modeling and prediction, only the gaps that satisfy the variable requirements can be predicted, and gaps where those variables are missing cannot. Selecting several variable combinations with different numbers of variables covers the variables available at different gaps, greatly increases the interpolation rate of the site data, and allows the data set to be reconstructed completely.
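The effect described above, that a combination can only fill the gaps where all of its variables are present, can be made concrete with a small sketch. The function name `fill_rate`, the record layout (one dict per gap record, with `None` marking a missing value), and the variable names are hypothetical and serve only as an illustration.

```python
def fill_rate(gap_records, combo):
    """Fraction of gap records in which every variable of `combo` is observed.

    gap_records: list of dicts, one per time step with missing evapotranspiration;
                 a value of None (or an absent key) marks a missing predictor.
    combo:       iterable of variable names used by one model.
    """
    if not gap_records:
        return 0.0
    usable = sum(
        all(rec.get(var) is not None for var in combo)
        for rec in gap_records
    )
    return usable / len(gap_records)


if __name__ == "__main__":
    gaps = [
        {"TA": 1.0, "SW_IN": None, "WS": 2.0},
        {"TA": 1.0, "SW_IN": 5.0, "WS": 2.0},
        {"TA": None, "SW_IN": None, "WS": None},
        {"TA": 3.0, "SW_IN": None, "WS": None},
    ]
    print(fill_rate(gaps, ["TA"]), fill_rate(gaps, ["TA", "SW_IN"]))
```

Larger combinations tend to reach fewer gaps, which is why the method keeps several combinations of different sizes instead of a single fixed one.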

Among combinations with the same number of variables, different variable combinations yield different data interpolation rates after modeling. To maximize the interpolation rate of the input data set, the interpolation accuracy of the variable combinations is calculated for each number of available variables, the combination with the highest interpolation accuracy is selected, and a random forest model is trained for each; the largest number of variables is the number of variables available at the site.

Optionally, according to the meteorological observation data and the MODIS remote sensing data at the times when the evapotranspiration observation data are missing, the evapotranspiration observation data at the missing times are interpolated in turn by traversal interpolation and updating.

As shown in Fig. 2, for each site, the random forest models are built through the following steps:

Determine all available variables of the site, $\{DOY, HH, x_1, x_2, \ldots, x_n\}$, where $DOY$ is the order of the observation day within the year, $HH$ is the order of the half-hour of the observation among the 48 half-hours of the day, and $x_1, x_2, \ldots, x_n$ are the available variables other than the day order and the half-hour order.

Use the random forest model to rank all available variables other than $DOY$ and $HH$ by importance, giving the variable sequence ordered by decreasing importance: $x_1, x_2, \ldots, x_n$;

According to the importance ranking, starting from $DOY$ and $HH$, add the available variables to the variable combination in the order of this sequence (adding at least one variable from $x_1, x_2, \ldots, x_n$ each time; preferably, the random forest model built after adding variables takes at least three variables as input) to construct all possible variable combinations ($m$ is the number of variable combinations and $n$ is the number of variables, not counting the day order and the half-hour order), such that the amount of data used to build the random forest models across all variable combinations reaches a set proportion of the total amount of data.

$C_1 = \{DOY, HH, x_1\}$;

……

$C_i = \{DOY, HH, x_1, \ldots, x_i\}$;

……

$C_n = \{DOY, HH, x_1, \ldots, x_n\}$;

$C_{n+1} = \{DOY, HH\}$.
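The construction of variable combinations from the importance ranking can be sketched as below. This is one reading of the step, under stated assumptions: combinations are taken to be nested prefixes of the importance-ordered variables added to the two time orders, with one extra combination holding only the time orders. The function name `build_combinations` and the variable names are hypothetical.

```python
def build_combinations(ranked_vars, base=("DOY", "HH")):
    """Build nested variable combinations from an importance-ordered list.

    ranked_vars: available variables other than the time orders,
                 most important first (assumed to come from a random forest ranking).
    base:        the day order and half-hour order, present in every combination.
    """
    combos = [
        list(base) + list(ranked_vars[:i])
        for i in range(1, len(ranked_vars) + 1)
    ]
    # Supplementary combination: time orders only, usable at every gap.
    combos.append(list(base))
    return combos


if __name__ == "__main__":
    for combo in build_combinations(["Rn", "SW_IN", "TA"]):
        print(combo)
```

With three ranked variables this yields four combinations; the final time-only combination is what guarantees that every gap can be reached by at least one model.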

A random forest model is built for the variable combination of each number of available variables, as follows:

$ET = f_1(DOY, HH, x_1)$;

……

$ET = f_i(DOY, HH, x_1, \ldots, x_i)$;

……

$ET = f_n(DOY, HH, x_1, \ldots, x_n)$;

$ET = f_{n+1}(DOY, HH)$.

In the formulas, $ET$ represents the evapotranspiration observation data, $f_i$ represents the functional relationship between the evapotranspiration observation data and the interpolation input variables, $DOY$ and $HH$ are the day order and the half-hour order, and $x_1, \ldots, x_n$ are the other available variables in order of importance. The model $ET = f_{n+1}(DOY, HH)$ serves as a supplement, ensuring that the interpolation rate of all missing data can still reach 100% when no variables other than the day order $DOY$ and the half-hour order $HH$ are available. Optionally, the out-of-bag data not used in building the random forest model are used for parameter tuning: the root mean square error of the interpolation on the out-of-bag data is calculated, and parameter tuning is complete when the root mean square error is below a set value.

The interpolation accuracy of the models is tested, yielding the interpolation accuracy of the m variable combinations and their respective interpolation rates.
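
One way to read the interpolation rate of a variable combination is as the share of gap records for which every variable in the combination is available, so that the corresponding model can be applied. The sketch below rests on that assumption, which the text does not state explicitly. The record layout and the name `interpolation_rate` are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def interpolation_rate(gap_records: List[Record], combo: List[str]) -> float:
    """Fraction of gap records in which every variable of `combo` is present."""
    if not gap_records:
        return 0.0
    usable = sum(
        all(rec.get(var) is not None for var in combo) for rec in gap_records
    )
    return usable / len(gap_records)


# Hypothetical records at times when evapotranspiration is missing.
gaps = [
    {"DOY": 10, "HH": 5, "Ta": 3.2, "Rn": None},
    {"DOY": 10, "HH": 6, "Ta": 3.0, "Rn": 120.0},
    {"DOY": 11, "HH": 1, "Ta": None, "Rn": None},
]
rate_full = interpolation_rate(gaps, ["DOY", "HH", "Ta", "Rn"])
rate_base = interpolation_rate(gaps, ["DOY", "HH"])
print(rate_full, rate_base)
```

Under this reading, a combination with more variables tends to be more accurate but applies to fewer gaps, which is the trade-off the method handles by trying the combinations in order of accuracy.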

The m variable combinations are sorted by interpolation accuracy, giving the sorted variable combinations and their corresponding interpolation rates in the same order.

Starting from the variable combination with the highest accuracy, the evapotranspiration observation data corresponding to each sorted variable combination are used to obtain evapotranspiration data, and these evapotranspiration data are used to interpolate the evapotranspiration observations at the missing times one by one.

For random forest models with similar interpolation accuracy, the evapotranspiration data with the higher interpolation rate are selected as the evapotranspiration observation data; if the interpolation rates are also similar, the evapotranspiration data are obtained from the random forest model whose variable combination contains fewer available variables. During dynamic interpolation, each interpolated record is marked with the variable combination used to fill the gap, and the current interpolation rate of the flux site is updated until the interpolation rate of the site reaches 100%. The specific process is as follows:

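[The formulas for the successive interpolation groups and the updated interpolation rate are missing from the extracted text.]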

In the formulas, the symbols denote, respectively: each group of interpolated data; the number of variables interpolated each time; the total number of data records at the site; the dynamic interpolation models for missing evapotranspiration observations, ordered from high to low accuracy; and the latest interpolation rate after that group of interpolation has been completed.

If the interpolation rate reaches 100%, interpolation ends; if the interpolation rate is less than 100%, the remaining gaps are interpolated with the subsequent model until the interpolation rate reaches 100%.
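
The dynamic interpolation loop can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: each candidate is a hypothetical tuple of variable combination, test-set RMSE and prediction function; simple lambdas stand in for trained random forest models; and the tie-breaking rule for models of similar accuracy is omitted.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]
# (variable combination, RMSE on the test set, prediction function)
Candidate = Tuple[List[str], float, Callable[[Record], float]]


def dynamic_interpolate(records: List[Record],
                        candidates: List[Candidate],
                        target: str = "ET") -> Tuple[float, List[Optional[int]]]:
    """Fill missing target values, most accurate model first.

    Returns the final interpolation rate and, for each record, the index of
    the candidate that filled it (None if the record had no gap).
    """
    gaps = [i for i, rec in enumerate(records) if rec.get(target) is None]
    labels: List[Optional[int]] = [None] * len(records)
    if not gaps:
        return 1.0, labels
    # Lower RMSE means higher accuracy, so those models are tried first.
    order = sorted(range(len(candidates)), key=lambda j: candidates[j][1])
    filled = 0
    for j in order:
        combo, _, predict = candidates[j]
        for i in gaps:
            rec = records[i]
            still_missing = rec.get(target) is None
            inputs_present = all(rec.get(v) is not None for v in combo)
            if still_missing and inputs_present:
                rec[target] = predict(rec)
                labels[i] = j  # mark the combination used for this gap
                filled += 1
        if filled == len(gaps):  # interpolation rate has reached 100%
            break
    return filled / len(gaps), labels


records = [
    {"DOY": 1, "HH": 1, "Ta": 2.0, "ET": 0.10},
    {"DOY": 1, "HH": 2, "Ta": 2.5, "ET": None},
    {"DOY": 1, "HH": 3, "Ta": None, "ET": None},
]
candidates = [
    (["DOY", "HH"], 0.08, lambda r: 0.05),                  # fallback: time orders only
    (["DOY", "HH", "Ta"], 0.03, lambda r: 0.04 * r["Ta"]),  # more accurate, needs Ta
]
rate, labels = dynamic_interpolate(records, candidates)
print(rate, labels)
```

In this example the more accurate model fills the gap where air temperature is available, and the fallback model that uses only the two time-order variables fills the remaining gap, so the interpolation rate reaches 100%.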

Compared with conventional machine-learning gap filling that uses a fixed variable combination, this method is not limited by the types and number of variables available at the time of a data gap: different variable combinations can be selected for different gaps, maximizing the data interpolation rate while maintaining interpolation accuracy. Compared with the Marginal Distribution Sampling (MDS) method used in the FLUXNET2015 data set, the present invention adopts the random forest algorithm from machine learning. It can obtain the maximum number of variable combinations from the variables available at each site, train robust and highly predictive random forest models on the different variable combinations, and obtain the accuracy of each variable combination on a test set. Finally, in order of modeling accuracy, it sorts first and interpolates afterwards, achieving high-accuracy, wide-coverage interpolation.

The present invention determines the maximum number of variables for each site, which is then used to determine the variable combinations for each number of variables. It determines the specific variables of each combination according to the data gap interpolation rate, and fills the data gaps in order of the modeling accuracy of the variable combinations, from high to low. This enables dynamic data interpolation that balances interpolation rate and interpolation accuracy, so that the interpolation rate reaches 100%.

Embodiment 2

Based on the same principle as the method shown in Embodiment 1 of the present invention, and as shown in Figure 3, an embodiment of the present invention further provides a dynamic interpolation system for missing evapotranspiration observations at global flux sites, comprising a first acquisition unit, a database construction unit, a preprocessing unit, a second acquisition unit, a first sorting unit, a determination unit, a second sorting unit, a model building unit and an interpolation updating unit:

The first acquisition unit is used to acquire the observation data of global flux sites and MODIS remote sensing data; the observation data of the flux sites include meteorological observation data and flux observation data.

The database construction unit is used to construct a dynamic interpolation database for missing evapotranspiration observations at global flux sites from the meteorological observation data, the flux observation data and the MODIS remote sensing data.

The preprocessing unit is used to preprocess the data in the dynamic interpolation database for missing evapotranspiration observations and to construct input data sets site by site.

The second acquisition unit is used to obtain, site by site and based on the input data set, the available variables of the global flux sites and the number of available variables; the available variables include the day-of-year order and the half-hour order within the day of the observation data of the flux site.

The first sorting unit is used to rank, site by site, the importance of each available variable by the random forest method according to the available variables.

The determination unit is used to determine variable combinations containing various numbers of available variables according to the number of available variables and the result of the importance ranking.

The second sorting unit is used to calculate the interpolation accuracy and interpolation rate of all variable combinations and to sort the variable combinations by interpolation accuracy from high to low.

The model building unit is used to establish, site by site and by the random forest method, a dynamic interpolation model for missing evapotranspiration observations for each variable combination, based on the variable combinations containing various numbers of available variables, so as to obtain a set of such models; the dynamic interpolation model for missing evapotranspiration observations characterizes the functional relationship between the evapotranspiration observation data and the variable combination.

The interpolation updating unit is used to interpolate, in sequence, the evapotranspiration observation data at the missing times according to the meteorological observation data and the MODIS remote sensing data at the times when the evapotranspiration observation data are missing, to update the interpolation rate continuously until it reaches 100%, and to mark the variable combination used to interpolate each gap.

Optionally, the meteorological observation data include observation date, observation time, air temperature, wind speed, atmospheric pressure, air humidity, incident shortwave radiation and net radiation; the flux observation data include evapotranspiration data; the MODIS remote sensing data include normalized difference vegetation index data and leaf area index data.

Let $\mathrm{ET}$ denote the evapotranspiration observation data in the input data set and let $f$ denote the functional relationship; then:

$$\mathrm{ET} = f(\mathrm{DOY}, \mathrm{HH}, T_a, R_s, R_n, WS, \mathrm{NDVI})$$

where $\mathrm{DOY}$ is the order of the date of the flux site observation within the year; $\mathrm{HH}$ is the order of the half-hour time of the observation among the 48 half-hour times of the day; and $T_a$, $R_s$, $R_n$, $WS$ and $\mathrm{NDVI}$ are, respectively, the air temperature, the incident shortwave radiation, the net radiation, the wind speed and the normalized difference vegetation index at the times when evapotranspiration is not missing in the input data set.
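
The two time-order variables can be derived from a timestamp as in the following sketch. It assumes that the timestamp marks the start of the half-hour interval and that both orders are counted from 1; the text specifies neither convention, and the function name `time_orders` is hypothetical.

```python
from datetime import datetime
from typing import Tuple


def time_orders(timestamp: datetime) -> Tuple[int, int]:
    """Return (day-of-year order, half-hour order within the day, 1 to 48)."""
    day_of_year = timestamp.timetuple().tm_yday
    # Two half-hour slots per hour; minutes 30 to 59 fall in the second slot.
    half_hour = timestamp.hour * 2 + (1 if timestamp.minute >= 30 else 0) + 1
    return day_of_year, half_hour


# 1 March 2020 (a leap year) at 13:30.
print(time_orders(datetime(2020, 3, 1, 13, 30)))
```

Because these two values can be computed for every record, a model that uses only them can always be applied, whatever other variables are missing.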

[Formula missing from the extracted text.]

The above are merely preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A dynamic interpolation method for missing evapotranspiration observations at global flux sites, characterized by comprising the following steps:

obtaining observation data of global flux sites and MODIS remote sensing data; the observation data of the flux sites comprise meteorological observation data and flux observation data;

constructing a dynamic interpolation database for missing evapotranspiration observations at the global flux sites according to the meteorological observation data, the flux observation data and the MODIS remote sensing data;

preprocessing the data in the dynamic interpolation database for missing evapotranspiration observations, and constructing an input data set site by site;

acquiring, site by site and based on the input data set, the available variables of each flux site and the number of the available variables; the available variables are used for representing the observation data of the flux site and the MODIS remote sensing data;

ranking, site by site, the importance of each available variable by a random forest method according to the available variables;

determining variable combinations containing various numbers of the available variables according to the number of the available variables and the result of the importance ranking;

calculating the interpolation accuracy and the interpolation rate of all the variable combinations, and sorting the variable combinations by interpolation accuracy from high to low;

based on the variable combinations containing various numbers of the available variables, establishing, site by site and by a random forest method, a dynamic interpolation model for missing evapotranspiration observations for each variable combination, to obtain a set of the dynamic interpolation models for missing evapotranspiration observations; the dynamic interpolation model for missing evapotranspiration observations is used for representing the functional relationship between the evapotranspiration observation data and the variable combination;

and sequentially interpolating the evapotranspiration observation data at the missing times according to the meteorological observation data and the MODIS remote sensing data at the times when the evapotranspiration observation data are missing, continuously updating the interpolation rate until the interpolation rate reaches 100%, and marking the variable combination used to interpolate each gap.

2. The dynamic interpolation method for missing evapotranspiration observations at global flux sites according to claim 1, wherein the meteorological observation data comprise observation date, observation time, air temperature, wind speed, atmospheric pressure, air humidity, incident shortwave radiation and net radiation; the flux observation data comprise the evapotranspiration observation data; the MODIS remote sensing data comprise normalized difference vegetation index data and leaf area index data.

3. The dynamic interpolation method for missing evapotranspiration observations at global flux sites according to claim 1, wherein preprocessing the data in the dynamic interpolation database for missing evapotranspiration observations and constructing an input data set site by site comprises:

acquiring the observation dates and observation times; calculating, one by one, the day-of-year order of the date of each observation of the flux site; converting each observation time into the order of its half-hour time among the 48 half-hour times of the day, to obtain the time order; and combining the day-of-year order, the time order and the meteorological observation data with the MODIS data of the pixel corresponding to the flux site, extracted according to longitude and latitude, to construct the input data set site by site.

4. The dynamic interpolation method for missing evapotranspiration observations at global flux sites according to claim 1, wherein establishing, site by site and by a random forest method, a dynamic interpolation model for missing evapotranspiration observations for each of the variable combinations, based on the variable combinations containing various numbers of the available variables, comprises:

letting $\mathrm{ET}$ denote the evapotranspiration observation data in the input data set and $f$ denote the functional relationship, then:

$$\mathrm{ET} = f(\mathrm{DOY}, \mathrm{HH}, T_a, R_s, R_n, WS, \mathrm{NDVI})$$

wherein $\mathrm{DOY}$ represents the order of the date of the flux site observation within the year; $\mathrm{HH}$ represents the order of the half-hour time of the flux site observation among the 48 half-hour times of the day; $T_a$ represents the air temperature at times when evapotranspiration is not missing in the input data set; $R_s$ represents the incident shortwave radiation at those times; $R_n$ represents the net radiation at those times; $WS$ represents the wind speed at those times; and $\mathrm{NDVI}$ represents the normalized difference vegetation index at those times.

5. The dynamic interpolation method for missing evapotranspiration observations at global flux sites according to claim 1, wherein sequentially interpolating the evapotranspiration observation data at the missing times according to the meteorological observation data and the MODIS remote sensing data at the times when the evapotranspiration observation data are missing comprises:

[The interpolation formula is missing from the extracted text.]

in the formula, the left-hand side represents the interpolation result of the evapotranspiration observation data, and the inputs represent, respectively, the air temperature observation data, the incident shortwave radiation observation data, the net radiation observation data, the wind speed observation data and the normalized difference vegetation index remote sensing data corresponding to the times at which evapotranspiration is missing.

6. The dynamic interpolation method for missing evapotranspiration observations at global flux sites according to claim 1, wherein, among variable combinations containing the same number of the available variables, the data interpolation rate obtained after interpolation by the random forest method differs if the variables contained in the combinations differ.

7. The dynamic interpolation method for missing evapotranspiration observations at global flux sites according to claim 1, wherein the evapotranspiration observation data at the missing times are sequentially interpolated, using a traversal interpolation and updating approach, according to the meteorological observation data and the MODIS remote sensing data at the times when the evapotranspiration observation data are missing.

8. A dynamic interpolation system for missing evapotranspiration observations at global flux sites, characterized by comprising a first acquisition unit, a database construction unit, a preprocessing unit, a second acquisition unit, a first sorting unit, a determination unit, a second sorting unit, a model building unit and an interpolation updating unit:

the first acquisition unit is used for acquiring the observation data of the global flux sites and the MODIS remote sensing data; the observation data of the flux sites comprise meteorological observation data and flux observation data;

the database construction unit is used for constructing a dynamic interpolation database for missing evapotranspiration observations at the global flux sites according to the meteorological observation data, the flux observation data and the MODIS remote sensing data;

the preprocessing unit is used for preprocessing the data in the dynamic interpolation database for missing evapotranspiration observations, and constructing an input data set site by site;

the second acquisition unit is used for acquiring, site by site and based on the input data set, the available variables of each flux site and the number of the available variables; the available variables are used for representing the observation data of the flux site and the MODIS remote sensing data;

the first sorting unit is used for ranking, site by site, the importance of each available variable by a random forest method according to the available variables;

the determination unit is used for determining variable combinations containing various numbers of the available variables according to the number of the available variables and the result of the importance ranking;

the second sorting unit is used for calculating the interpolation accuracy and the interpolation rate of all the variable combinations and sorting the variable combinations by interpolation accuracy from high to low;

the model building unit is used for establishing, site by site and by a random forest method, a dynamic interpolation model for missing evapotranspiration observations for each variable combination, based on the variable combinations containing various numbers of the available variables, to obtain a set of the dynamic interpolation models for missing evapotranspiration observations; the dynamic interpolation model for missing evapotranspiration observations is used for representing the functional relationship between the evapotranspiration observation data and the variable combination;

and the interpolation updating unit is used for sequentially interpolating the evapotranspiration observation data at the missing times according to the meteorological observation data and the MODIS remote sensing data at the times when the evapotranspiration observation data are missing, continuously updating the interpolation rate until the interpolation rate reaches 100%, and marking the variable combination used to interpolate each gap.