CN116502050A - Dynamic interpolation method and system for global flux site evapotranspiration observation loss - Google Patents
- Publication number
- CN116502050A (application CN202310750877.1A)
- Authority
- CN
- China
- Prior art keywords
- observation
- evapotranspiration
- data
- interpolation
- site
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention belongs to the technical field of flux observation data processing and relates to a dynamic interpolation method and system for missing evapotranspiration observations at global flux sites. The method comprises: obtaining observation data and MODIS remote sensing data of global flux sites; constructing a dynamic interpolation database for the missing evapotranspiration observations of the global flux sites; obtaining, site by site, the available variables and the number of available variables; ranking the available variables by importance; determining variable combinations containing different numbers of available variables; sorting the variable combinations by interpolation precision; establishing, site by site, a dynamic interpolation model for each variable combination using a random forest method; and dynamically interpolating the evapotranspiration observation data at the missing moments while continuously updating the interpolation rate until it reaches 100%. The invention achieves high-precision dynamic interpolation of missing flux-observed evapotranspiration and helps improve the practical value of flux observation data.
Description
Technical Field
The invention relates to the technical field of flux data processing, and in particular to a dynamic interpolation method and system for missing evapotranspiration observations at global flux sites.
Background
Eddy covariance (EC for short) is a micrometeorological method for measuring, in real time, the turbulent fluxes between an ecosystem and the atmosphere. Long time series of EC flux observations are important for water cycle and energy balance analysis, short-term and long-term weather forecasting, agricultural irrigation management and similar applications. Because of external interference such as instrument faults, system faults, poor management and weather, as well as data quality control, the observed time series often contain a large number of gaps. High-quality interpolation of the missing data is therefore important for in-depth research on carbon, water and energy fluxes between the land and the atmosphere.
Conventional data interpolation methods generally fill observation gaps with a fixed number of variables. For example, the gap-filling method officially adopted for the FLUXNET2015 flux observation data set is Marginal Distribution Sampling (MDS), which uses three variables, namely incident shortwave radiation, air temperature and vapor pressure deficit, to interpolate missing flux observations.
The disadvantage of the MDS method is that its interpolation precision is limited by the availability and quality of these three variables. The length of the period that can be interpolated is also limited, and gaps of 60 days or longer are difficult to fill. Other interpolation methods have similar limitations, and the interpolation of missing flux data depends largely on the variables chosen by the researchers. Because all sites use the same limited set of variables, the interpolation precision is bounded by the selected fixed variables, and methods with high interpolation precision tend to have a low interpolation rate, so that precision and rate are difficult to balance.
In addition, in practice the number of variables recorded differs from site to site, and it cannot be guaranteed that every site has the input variables required by a given interpolation method. Missing data therefore cannot be interpolated at all sites, and the valuable observations of some sites cannot be fully exploited.
Disclosure of Invention
To solve the problems in the prior art that the length of the period that can be interpolated is limited by the quantity and quality of the data, that the interpolation precision is limited, that the interpolation rate is insufficient, and that interpolation precision and interpolation rate are difficult to balance, the invention provides a dynamic interpolation method for missing evapotranspiration observations at global flux sites.
In a first aspect, the present invention provides a dynamic interpolation method for missing flux observation data, including:
obtaining observation data and MODIS remote sensing data of global flux sites; the observation data of the flux sites comprise meteorological observation data and flux observation data;
constructing a dynamic interpolation database for the missing evapotranspiration observations of the global flux sites according to the meteorological observation data, the flux observation data and the MODIS remote sensing data;
preprocessing the data in the dynamic interpolation database, and constructing an input data set site by site;
obtaining, site by site and based on the input data set, the available variables of each flux site and the number of the available variables; the available variables represent the observation data of the flux site and the MODIS remote sensing data;
ranking the available variables by importance, site by site, by using a random forest method;
determining variable combinations containing different numbers of available variables according to the number of the available variables and the result of the importance ranking;
calculating the interpolation precision and interpolation rate of all the variable combinations, and sorting the variable combinations by interpolation precision from high to low;
based on the variable combinations containing different numbers of available variables, establishing, site by site, a dynamic interpolation model for missing evapotranspiration observations for each variable combination by using a random forest method, to obtain a set of dynamic interpolation models; each dynamic interpolation model represents the functional relationship between the evapotranspiration observation data and the variable combination;
and sequentially interpolating the evapotranspiration observation data at the missing moments according to the meteorological observation data and the MODIS remote sensing data at those moments, continuously updating the interpolation rate until it reaches 100%, and marking the variable combination used to interpolate each gap.
In a second aspect, the invention provides a dynamic interpolation system for missing evapotranspiration observations at global flux sites, which comprises a first acquisition unit, a database construction unit, a preprocessing unit, a second acquisition unit, a first sorting unit, a determination unit, a second sorting unit, a model construction unit and an interpolation updating unit:
the first acquisition unit is used for obtaining the observation data of the global flux sites and the MODIS remote sensing data; the observation data of the global flux sites comprise meteorological observation data and flux observation data;
the database construction unit is used for constructing a dynamic interpolation database for the missing evapotranspiration observations of each flux site according to the meteorological observation data, the flux observation data and the MODIS remote sensing data;
the preprocessing unit is used for preprocessing the data in the dynamic interpolation database and constructing an input data set site by site;
the second acquisition unit is used for obtaining, site by site and based on the input data set, the available variables of each flux site and the number of the available variables; the available variables represent the observation data of the flux site and the MODIS remote sensing data;
the first sorting unit is used for ranking the available variables by importance, site by site, by using a random forest method;
the determination unit is used for determining variable combinations containing different numbers of available variables according to the number of the available variables and the result of the importance ranking;
the second sorting unit is used for calculating the interpolation precision and the interpolation rate of all the variable combinations and sorting the variable combinations by interpolation precision from high to low;
the model construction unit is used for establishing, site by site and based on the variable combinations containing different numbers of available variables, a dynamic interpolation model for missing evapotranspiration observations for each variable combination by using a random forest method, to obtain a set of dynamic interpolation models; each dynamic interpolation model represents the functional relationship between the evapotranspiration observation data and the variable combination;
and the interpolation updating unit is used for sequentially interpolating the evapotranspiration observation data at the missing moments according to the meteorological observation data and the MODIS remote sensing data at those moments, continuously updating the interpolation rate until it reaches 100%, and marking the variable combination used to interpolate each gap.
The beneficial effects of the invention are as follows: the invention adopts the random forest algorithm from machine learning, obtains the largest possible number of variable combinations from the variables available at each site, trains robust and strongly predictive random forest models on the different variable combinations, obtains the interpolation precision of each variable combination on a test set, and finally sorts the models by precision and interpolates in that order, thereby achieving high-precision, large-scale interpolation.
On the basis of the technical scheme, the invention can be improved as follows.
Further, the meteorological observation data comprise observation date, observation time, air temperature, wind speed, atmospheric pressure, air humidity, incident shortwave radiation and net radiation; the flux observation data include the evapotranspiration observation data; the MODIS remote sensing data comprise normalized vegetation index data and leaf area index data.
Further, preprocessing the data in the dynamic interpolation database and constructing an input data set site by site comprises:
obtaining the observation dates and observation times; calculating, site by site, the order within the year of the day on which each observation of the flux site falls, to obtain a day order; converting each observation time into its order among the 48 half-hour moments of the day, to obtain a half-hour order; and combining the day order, the half-hour order and the meteorological observation data with the MODIS data of the pixel corresponding to the flux site, extracted by longitude and latitude, to construct the input data set site by site.
Further, establishing, site by site and based on the variable combinations containing different numbers of available variables, a dynamic interpolation model for missing evapotranspiration observations for each variable combination by using a random forest method comprises:
letting $ET$ denote the evapotranspiration observation data in the input data set and $f$ denote the functional relationship, then:
$$ET = f(D, H, T_a, R_s, R_n, W_s, NDVI)$$
where $D$ is the order within the year of the day on which the observation data of the flux site fall, $H$ is the order of the half-hour moment of the observation data among the 48 half-hour moments of the day, $T_a$ is the air temperature at moments in the input data set when evapotranspiration is not missing, $R_s$ is the incident shortwave radiation at those moments, $R_n$ is the net radiation at those moments, $W_s$ is the wind speed at those moments, and $NDVI$ is the normalized vegetation index at those moments.
Further, sequentially interpolating the evapotranspiration observation data at the missing moments according to the meteorological observation data and the MODIS remote sensing data at those moments comprises:
$$\widehat{ET} = f(D, H, T_a^{*}, R_s^{*}, R_n^{*}, W_s^{*}, NDVI^{*})$$
where $\widehat{ET}$ is the interpolation result for the evapotranspiration observation data, $D$ and $H$ are the day order and half-hour order of the missing moment, $T_a^{*}$ is the air temperature observation at the moment when evapotranspiration is missing, $R_s^{*}$ is the incident shortwave radiation observation at that moment, $R_n^{*}$ is the net radiation observation at that moment, $W_s^{*}$ is the wind speed observation at that moment, and $NDVI^{*}$ is the normalized vegetation index remote sensing data at that moment.
Further, if variable combinations with the same number of available variables contain different available variables, the data interpolation rates obtained after interpolation with the random forest method differ.
Further, the evapotranspiration observation data at the missing moments are interpolated sequentially, by traversing the models and updating after each interpolation, according to the meteorological observation data and the MODIS remote sensing data at those moments.
Drawings
FIG. 1 is a flowchart of the dynamic interpolation method for flux data gaps according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the dynamic interpolation of flux data gaps;
FIG. 3 is a schematic diagram of the dynamic interpolation system for flux data gaps according to embodiment 2 of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Example 1
As an embodiment, as shown in FIG. 1, to solve the above technical problem, the present embodiment provides a dynamic interpolation method for missing evapotranspiration observations at global flux sites, including:
obtaining observation data and MODIS remote sensing data of global flux sites; the observation data of the flux sites comprise meteorological observation data and flux observation data;
constructing a dynamic interpolation database for the missing evapotranspiration observations of the global flux sites according to the meteorological observation data, the flux observation data and the MODIS remote sensing data;
preprocessing the data in the dynamic interpolation database, and constructing an input data set site by site;
obtaining, site by site and based on the input data set, the available variables of the global flux sites and the number of the available variables; the available variables represent the observation data of the flux site and the MODIS remote sensing data;
ranking the available variables by importance, site by site, by using a random forest method;
determining variable combinations containing different numbers of available variables according to the number of the available variables and the importance ranking result;
calculating the interpolation precision and interpolation rate of all variable combinations, and sorting all variable combinations by interpolation precision from high to low;
based on the variable combinations containing different numbers of available variables, establishing, site by site, a dynamic interpolation model for missing evapotranspiration observations for each variable combination by using a random forest method, to obtain a set of dynamic interpolation models; each dynamic interpolation model represents the functional relationship between the evapotranspiration observation data and the variable combination;
and sequentially interpolating the evapotranspiration observation data at the missing moments according to the meteorological observation data and the MODIS remote sensing data at those moments, continuously updating the interpolation rate until it reaches 100%, and marking the variable combination used to interpolate each gap.
Optionally, the meteorological observation data include, but are not limited to, observation date, observation time, air temperature, wind speed, atmospheric pressure, air humidity, incident shortwave radiation and net radiation; the flux observation data include, but are not limited to, evapotranspiration observation data; the MODIS remote sensing data comprise normalized vegetation index data and leaf area index data.
Optionally, preprocessing the data in the dynamic interpolation database and constructing an input data set site by site includes:
obtaining the observation dates and observation times; calculating, site by site, the order within the year of the day on which each observation of the flux site falls, to obtain a day order; converting each observation time into its order among the 48 half-hour moments of the day, to obtain a half-hour order; and combining the day order, the half-hour order and the meteorological observation data with the MODIS data of the pixel corresponding to the flux site, extracted by longitude and latitude, to construct the input data set site by site.
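The conversion of observation date and time into the two time indices can be illustrated with a minimal Python sketch. It assumes 1-based indices and that each timestamp marks the start of its half-hour interval; neither convention is stated in the text, and the function name `day_and_half_hour_order` is hypothetical.

```python
from datetime import datetime


def day_and_half_hour_order(ts: datetime) -> tuple[int, int]:
    """Return (day order within the year, half-hour order within the day).

    Assumptions (not stated in the source): both indices are 1-based and
    the timestamp marks the start of its half-hour interval.
    """
    day_order = ts.timetuple().tm_yday                   # 1..365 (366 in leap years)
    half_hour_order = ts.hour * 2 + ts.minute // 30 + 1  # 1..48
    return day_order, half_hour_order


print(day_and_half_hour_order(datetime(2015, 7, 1, 13, 30)))
```

The two integers can then be stored as ordinary input columns next to the meteorological and MODIS variables.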
Optionally, establishing, site by site and based on the variable combinations containing different numbers of available variables, a dynamic interpolation model for missing evapotranspiration observations for each variable combination by using a random forest method includes:
letting $ET$ denote the evapotranspiration observation data and $f$ denote the functional relationship between the evapotranspiration observation data and the variable combination, then:
$$ET = f(D, H, T_a, R_s, R_n, W_s, NDVI)$$
where $D$ is the order within the year of the day on which the observation data of the flux site fall, $H$ is the order of the half-hour moment of the observation data among the 48 half-hour moments of the day, $T_a$ is the air temperature at moments in the input data set when the evapotranspiration observation is not missing, $R_s$ is the incident shortwave radiation at those moments, $R_n$ is the net radiation at those moments, $W_s$ is the wind speed at those moments, and $NDVI$ is the normalized vegetation index at those moments.
Optionally, sequentially interpolating the evapotranspiration observation data at the missing moments according to the meteorological observation data and the MODIS remote sensing data at those moments includes:
$$\widehat{ET} = f(D, H, T_a^{*}, R_s^{*}, R_n^{*}, W_s^{*}, NDVI^{*})$$
where $\widehat{ET}$ is the interpolation result for the evapotranspiration observation data, $D$ and $H$ are the day order and half-hour order of the missing moment, $T_a^{*}$ is the air temperature observation at the moment when evapotranspiration is missing, $R_s^{*}$ is the incident shortwave radiation observation at that moment, $R_n^{*}$ is the net radiation observation at that moment, $W_s^{*}$ is the wind speed observation at that moment, and $NDVI^{*}$ is the normalized vegetation index remote sensing data at that moment.
Optionally, if variable combinations with the same number of available variables contain different available variables, the data interpolation rates obtained after interpolation with the random forest method differ.
Optionally, the evapotranspiration observation data at the missing moments are interpolated sequentially, by traversing the models and updating after each interpolation, according to the meteorological observation data and the MODIS remote sensing data at those moments.
In practical application, as shown in FIG. 2, the types of variables contained in an input data set such as FLUXNET2015 differ from site to site, which increases the complexity of feature selection. For each of the sites A, B, C and D, the maximum number of available variables N is obtained, and N-3 variable combinations (each containing at least 3 variables) are constructed by successively decreasing the number of variables from N.
For a single site, the variables available differ from one data gap to another. If a single fixed set of variables is selected for modeling and prediction, only the gaps that satisfy its variable requirements can be predicted. Selecting several variable combinations with different numbers of variables satisfies the variable requirements of different gaps, which greatly improves the interpolation rate of the site data and allows the data set to be reconstructed completely.
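The dependence of the interpolation rate on the chosen variable combination can be shown with a small Python sketch. It assumes that the interpolation rate of one combination is the share of gap records for which every variable in the combination is present; this definition, the column names `TA`, `SW_IN` and `WS`, and the sample values are illustrative assumptions, not taken from the patent.

```python
from typing import Dict, List, Optional, Sequence

Record = Dict[str, Optional[float]]


def fillable(record: Record, combo: Sequence[str]) -> bool:
    """A gap can be predicted only if every input in the combination is present."""
    return all(record.get(name) is not None for name in combo)


def interpolation_rate(gaps: List[Record], combo: Sequence[str]) -> float:
    """Share of gap records that the given variable combination can fill."""
    if not gaps:
        return 1.0  # assumed convention: nothing to fill counts as fully filled
    return sum(fillable(r, combo) for r in gaps) / len(gaps)


# Three hypothetical gap records; None marks a missing driver variable.
gaps = [
    {"TA": 12.1, "SW_IN": 310.0, "WS": 2.4},
    {"TA": 11.8, "SW_IN": None, "WS": 2.0},
    {"TA": None, "SW_IN": None, "WS": 1.7},
]
for combo in (["TA", "SW_IN", "WS"], ["TA", "WS"], ["WS"]):
    print(combo, round(interpolation_rate(gaps, combo), 2))
```

In this toy case the three-variable combination covers one gap in three, while smaller combinations cover more, which is the trade-off that motivates keeping several combinations per site.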
Among combinations with the same number of variables, different variable combinations yield different data interpolation rates after modeling. To maximize the interpolation rate of the input data set, the interpolation precision of the variable combinations is calculated for each number of available variables, the combination with the highest interpolation precision is selected, and a random forest model is trained for each selected combination; the maximum number of variables is the number of available variables at each site.
As shown in fig. 2, for each site, the following steps are implemented to construct a random forest model:
Determining all available variables of the site, $X = \{x_1, x_2, \dots, x_n\}$, where $x_1 = D$ is the order within the year of the day on which the observation data of the flux site fall, $x_2 = H$ is the order of the half-hour moment of the observation data among the 48 half-hour moments of the day, and $x_3, \dots, x_n$ are the available variables other than the day order and the half-hour order.
Using a random forest model, ranking the importance of all available variables other than $D$ and $H$ to obtain a variable sequence in descending order of importance: $v_1, v_2, \dots, v_k$.
According to the importance ranking result, on the basis of $D$ and $H$, adding available variables to the variable combination in the order of this variable sequence (at least one variable is added each time, and the random forest model built after the addition preferably has at least three input variables), and constructing all possible variable combinations $C_i$ ($i = 1, 2, \dots, m$, where $m$ is the number of variable combinations and $k$ is the number of variables excluding the day order and the half-hour order), such that the amount of data used to construct the random forest models over all variable combinations reaches a set proportion of the total amount of data. Each combination consists of $D$, $H$ and a subset of the ranked variables:
$$C_i = \{D, H\} \cup S_i, \quad S_i \subseteq \{v_1, v_2, \dots, v_k\}, \quad i = 1, 2, \dots, m.$$
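One way to enumerate candidate variable combinations is sketched below in Python. The sketch assumes exhaustive enumeration of subsets of the importance-ranked variables, from the largest size down to a minimum size, with the two time indices always included; the patent's exact construction rule may be narrower. The names `build_combinations`, `DOY`, `HH` and the driver names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def build_combinations(ranked: List[str], min_size: int = 1,
                       base: Tuple[str, ...] = ("DOY", "HH")) -> List[Tuple[str, ...]]:
    """Enumerate candidate input sets, largest first.

    `ranked` holds the site's variables in descending importance; every
    candidate also contains the two time indices in `base`.
    """
    combos = []
    for size in range(len(ranked), min_size - 1, -1):
        # itertools.combinations keeps the input order, so each subset
        # stays sorted by importance.
        for subset in combinations(ranked, size):
            combos.append(base + subset)
    return combos


combos = build_combinations(["SW_IN", "TA", "WS"], min_size=2)
for c in combos:
    print(c)
```

Because the number of subsets grows quickly with the number of variables, a real implementation would probably prune candidates, for example by keeping only the most precise combination per size as the description suggests.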
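A random forest model is constructed for each variable combination, together with a model that uses only the day order and the half-hour order:
$$ET = f(C_i), \quad i = 1, 2, \dots, m;$$
$$ET = f(D, H),$$
where $C_i$ is the $i$-th variable combination, $D$ is the day order and $H$ is the half-hour order.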
In the formulas, $ET$ denotes the evapotranspiration observation data and $f$ denotes the functional relationship between the evapotranspiration observations and the interpolation input variables. The model $ET = f(D, H)$, whose only inputs are the day order $D$ and the half-hour order $H$, is added to ensure that the interpolation rate of all missing data can still reach 100% when no other variables are available. Optionally, parameter tuning is performed using the out-of-bag data that did not participate in building the random forest model: the root mean square error of the interpolation on the out-of-bag data is calculated, and tuning is complete when the root mean square error is smaller than a set value.
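The out-of-bag idea behind this tuning step can be sketched in plain Python. The example shows the bootstrap draw for a single tree and a root mean square error function; a full random forest aggregates out-of-bag predictions over many trees, which is omitted here. The function names and the seed are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


def bootstrap_split(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n indices with replacement; indices never drawn are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between predictions and observations."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


in_bag, oob = bootstrap_split(100, random.Random(0))
print(len(set(in_bag)), len(oob))             # distinct in-bag vs. out-of-bag counts
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

A tuning loop would compare the out-of-bag error of each parameter setting against the chosen threshold and stop once the error falls below it.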
Testing the interpolation precision of the models to obtain the interpolation precision of the $m$ variable combinations, $P_1, P_2, \dots, P_m$, and their respective interpolation rates, $R_1, R_2, \dots, R_m$.
Sorting the interpolation precision of the $m$ variable combinations from high to low; the sorted variable combinations are, in turn, $C_{(1)}, C_{(2)}, \dots, C_{(m)}$, and the corresponding interpolation rates are $R_{(1)}, R_{(2)}, \dots, R_{(m)}$.
The variable combination with the highest precision, $C_{(1)}$, is used first. The models of the sorted variable combinations are then applied in turn to obtain evapotranspiration estimates, and these estimates are used to interpolate the evapotranspiration observation data at the missing moments one by one.
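The sorting step can be sketched in Python as follows. The precision metric is assumed to be one where higher is better (for example a coefficient of determination on a test set); the class name `ComboScore` and all numbers are hypothetical and serve only to show the ordering.

```python
from typing import List, NamedTuple, Tuple


class ComboScore(NamedTuple):
    combo: Tuple[str, ...]
    precision: float  # assumed metric: higher is better
    rate: float       # interpolation rate achieved by this combination alone


def rank_by_precision(scores: List[ComboScore]) -> List[ComboScore]:
    """Sort from highest to lowest precision; ties keep their input order."""
    return sorted(scores, key=lambda s: s.precision, reverse=True)


# Hypothetical scores for three combinations at one site.
scores = [
    ComboScore(("DOY", "HH"), 0.55, 1.0),
    ComboScore(("DOY", "HH", "TA", "SW_IN"), 0.82, 0.6),
    ComboScore(("DOY", "HH", "TA"), 0.70, 0.9),
]
for s in rank_by_precision(scores):
    print(s.combo, s.precision, s.rate)
```

The interpolation rates travel with their combinations through the sort, so the ordered list can be consumed directly by the filling step.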
Selecting the evapotranspiration data with high interpolation rate as evapotranspiration observation data for random forest models with similar interpolation precision, and selecting random forest models with few usable variables in variable combinations to obtain evapotranspiration data if the interpolation rates are similar; in the process of dynamic interpolation, for each piece of data to be interpolated, marking variable combinations used for interpolation of empty data, and updating the current interpolation rate of the flux station until the interpolation rate of the flux station reaches 100%, wherein the specific process is as follows:
[Stepwise formulas lost in extraction; each step gives the interpolated data of one group and the interpolation rate after that group.]
In the formulas, the symbols denote, in turn: the interpolated data of each group; the number of values interpolated in each pass; the total number of records at the site; the dynamic interpolation models for missing evapotranspiration observations, ordered from high to low precision; and the latest interpolation rate after the interpolation of each group is complete.
If the interpolation rate reaches 100%, interpolation ends; if it is below 100%, the models of the next-ranked variable combinations are used in turn to interpolate the remaining gaps until the interpolation rate reaches 100%.
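The traversal described above can be sketched as a loop over models ranked from high to low precision. This is a hedged illustration: the record layout, the `filled_by` marker and the lambda predictors are hypothetical stand-ins for trained random forest models, and the interpolation rate is assumed here to be the share of gaps that have been filled.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Record = Dict[str, Any]
# (label, variables the model needs, predictor) -- all names are illustrative
Model = Tuple[str, Sequence[str], Callable[[Record], float]]


def dynamic_fill(records: List[Record], ranked_models: Sequence[Model]) -> float:
    """Fill missing 'ET' values in place, trying models from high to low precision.

    Returns the final interpolation rate (filled gaps / initial gaps).
    """
    gaps = [rec for rec in records if rec.get("ET") is None]
    total = len(gaps)
    if total == 0:
        return 1.0
    filled = 0
    for label, variables, predict in ranked_models:
        for rec in gaps:
            usable = all(rec.get(var) is not None for var in variables)
            if rec.get("ET") is None and usable:
                rec["ET"] = predict(rec)
                rec["filled_by"] = label  # mark the combination used for this gap
                filled += 1
        if filled == total:  # interpolation rate has reached 100%
            break
    return filled / total


records = [
    {"DOY": 60, "HH": 28, "Ta": 12.0, "ET": 0.10},
    {"DOY": 60, "HH": 29, "Ta": 13.0, "ET": None},
    {"DOY": 60, "HH": 30, "Ta": None, "ET": None},
]
models = [
    ("DOY+HH+Ta", ("DOY", "HH", "Ta"), lambda rec: 0.01 * rec["Ta"]),
    ("DOY+HH", ("DOY", "HH"), lambda rec: 0.05),
]
rate = dynamic_fill(records, models)
print(rate, [rec.get("filled_by") for rec in records])
```

The second gap lacks air temperature, so it falls through to the model that needs only the day order and half-hour order, which is why that model guarantees a 100% rate.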
Compared with traditional machine learning gap filling, which interpolates data gaps with a fixed variable combination, the present method is not limited by the types and number of variables available at the time of a gap: different variable combinations can be selected for different gaps, maximizing the data interpolation rate while ensuring interpolation precision. Compared with the Marginal Distribution Sampling (MDS) method adopted in the FLUXNET2015 data set, the present method uses the random forest algorithm from machine learning: it obtains the largest possible set of variable combinations from the variables available at each site, trains robust and strongly predictive random forest models on the different variable combinations, obtains the precision of each combination on a test set, and finally achieves high-precision, large-scale interpolation by ranking the combinations by modeling precision and interpolating in that order.
The method and system of the invention determine the maximum number of variables for each site and, from it, the variable combinations for every number of variables. The specific variables of each combination are determined according to the interpolation rate of the data gaps, and the gaps are interpolated with the variable combinations in order of modeling precision from high to low. This achieves dynamic data interpolation that balances interpolation rate and interpolation precision, with the interpolation rate reaching 100%.
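The ordering rule (precision first, then interpolation rate, then fewer variables) can be sketched as a sort key. The tolerances that decide when two values count as "similar" are assumptions, as is treating precision as a score where higher is better; the text does not specify either.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    variables: Tuple[str, ...]
    precision: float  # assumed: higher is better (for example R^2 on a test set)
    rate: float       # interpolation rate between 0 and 1


def rank_candidates(
    candidates: List[Candidate],
    precision_tol: float = 0.01,
    rate_tol: float = 0.05,
) -> List[Candidate]:
    """Sort by precision, then interpolation rate, then fewer variables.

    Values are bucketed by the tolerances so that "similar" scores tie;
    bucketing is a simple approximation that can split values near a boundary.
    """
    def key(c: Candidate) -> Tuple[int, int, int]:
        return (
            -round(c.precision / precision_tol),
            -round(c.rate / rate_tol),
            len(c.variables),
        )
    return sorted(candidates, key=key)


candidates = [
    Candidate(("DOY", "HH", "Ta", "Rn", "WS"), 0.900, 0.60),
    Candidate(("DOY", "HH", "Ta", "Rn"), 0.902, 0.80),
    Candidate(("DOY", "HH"), 0.850, 1.00),
    Candidate(("DOY", "HH", "Ta"), 0.898, 0.80),
]
ranked = rank_candidates(candidates)
for c in ranked:
    print(c.variables, c.precision, c.rate)
```

In this toy data the three-variable model ranks first: its precision is similar to the four- and five-variable models, its rate matches the best of them, and it needs the fewest variables.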
Example 2
Based on the same principle as the method of embodiment 1 of the present invention, and as shown in fig. 3, an embodiment of the present invention further provides a dynamic interpolation system for global flux site evapotranspiration observation loss, comprising a first acquisition unit, a database construction unit, a preprocessing unit, a second acquisition unit, a first sorting unit, a determination unit, a second sorting unit, a model establishment unit and an interpolation updating unit:
the first acquisition unit is used for acquiring the observation data of the global flux sites and the MODIS remote sensing data; the observation data of the flux sites comprise meteorological observation data and flux observation data;
the database construction unit is used for constructing a dynamic interpolation database for missing evapotranspiration observations of the global flux sites from the meteorological observation data, the flux observation data and the MODIS remote sensing data;
the preprocessing unit is used for preprocessing the data in the dynamic interpolation database for missing evapotranspiration observations, and constructing an input data set site by site;
the second acquisition unit is used for acquiring, site by site and based on the input data set, the available variables of each flux site and the number of the available variables; the available variables include the day order within the year and the half-hour order within the day of the flux site observation data;
the first sorting unit is used for ranking the importance of the available variables site by site using a random forest method;
the determination unit is used for determining, according to the number of the available variables and the result of the importance ranking, variable combinations covering each possible number of the available variables;
the second sorting unit is used for calculating the interpolation precision and the interpolation rate of all the variable combinations, and sorting the variable combinations by interpolation precision from high to low;
the model establishment unit is used for establishing, site by site and using a random forest method, a dynamic interpolation model for missing evapotranspiration observations for each variable combination, to obtain a set of such dynamic interpolation models; each dynamic interpolation model represents the functional relationship between the evapotranspiration observation data and the variable combination;
and the interpolation updating unit is used for sequentially interpolating the evapotranspiration observation data at the missing moments according to the meteorological observation data and the MODIS remote sensing data at those moments, continuously updating the interpolation rate until it reaches 100%, and marking the variable combination used to interpolate each gap.
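One plausible reading of the determination unit's task is sketched below: enumerate combinations for every possible number of variables, always keeping the day order and half-hour order. The enumeration strategy, the variable names and the importance scores are all assumptions, since the text does not spell out how the combinations are drawn from the importance ranking.

```python
from itertools import combinations
from typing import Dict, List, Tuple

ALWAYS_AVAILABLE = ("DOY", "HH")  # day order and half-hour order


def build_combinations(importance: Dict[str, float]) -> List[Tuple[str, ...]]:
    """Enumerate variable combinations for every possible number of variables.

    Optional variables are ordered by importance (highest first), and
    combinations are listed from the largest size down to DOY and HH alone.
    """
    optional = sorted(importance, key=lambda name: importance[name], reverse=True)
    result: List[Tuple[str, ...]] = []
    for size in range(len(optional), -1, -1):
        for subset in combinations(optional, size):
            result.append(ALWAYS_AVAILABLE + subset)
    return result


# Hypothetical importance scores for one site
combos = build_combinations({"Rn": 0.5, "Ta": 0.3, "WS": 0.2})
for combo in combos:
    print(combo)
```

With three optional variables this yields eight combinations, from the full five-variable set down to the two-variable fallback.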
Optionally, the meteorological observation data comprises observation date, observation time, air temperature, wind speed, atmospheric pressure, air humidity, incident short wave radiation and net radiation; flux observation data includes evapotranspiration data; the MODIS remote sensing data comprises normalized vegetation index data and leaf area index data.
Optionally, preprocessing data in the dynamic interpolation database of the evapotranspiration observation missing, constructing an input data set by site, including:
acquiring the observation dates and observation times; for each record, calculating the day order within the year of its observation date, and converting its observation time to its position among the 48 half-hour moments of the day to obtain the time order; and combining the day order, the time order and the meteorological observation data with the MODIS data of the pixel corresponding to each flux site, extracted by longitude and latitude, to construct the input data set site by site.
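The conversion of a timestamp to the day order and time order can be sketched as below. It assumes 1-based indices and that each timestamp marks the start of its half-hour interval; neither convention is stated in the text, and the function name is illustrative.

```python
from datetime import datetime
from typing import Tuple


def day_and_half_hour_order(timestamp: datetime) -> Tuple[int, int]:
    """Return (day order within the year, half-hour order within the day), both 1-based."""
    day_order = timestamp.timetuple().tm_yday  # 1..365, or 366 in leap years
    half_hour_order = timestamp.hour * 2 + timestamp.minute // 30 + 1  # 1..48
    return day_order, half_hour_order


print(day_and_half_hour_order(datetime(2015, 3, 1, 13, 30)))  # (60, 28)
```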
Optionally, based on variable combinations including various available variable numbers, building a dynamic interpolation model of the evapotranspiration observation deficiency of each variable combination site by using a random forest method, including:
Let $ET$ denote the evapotranspiration observation data in the input data set and $f$ the functional relationship; then:

$$ET = f(DOY, HH, T_a, R_s, R_n, WS, NDVI)$$

where $DOY$ is the day order within the year of the date of the flux site observation, $HH$ is the order of the half-hour moment of the flux site observation among the 48 half-hour moments of the day, $T_a$ is the air temperature, $R_s$ is the incident shortwave radiation, $R_n$ is the net radiation, $WS$ is the wind speed and $NDVI$ is the normalized vegetation index, each taken at moments in the input data set when evapotranspiration is not missing.
Optionally, according to the meteorological observation data and the MODIS remote sensing data at the time of missing the evapotranspiration observation data, sequentially interpolating the evapotranspiration observation data at the time of missing, including:
$$ET_{gap} = f(DOY_{gap}, HH_{gap}, T_{a,gap}, R_{s,gap}, R_{n,gap}, WS_{gap}, NDVI_{gap})$$

where $ET_{gap}$ is the interpolation result for the evapotranspiration observation data, $DOY_{gap}$ and $HH_{gap}$ are the day order and half-hour order of the missing moment, $T_{a,gap}$ is the air temperature observation, $R_{s,gap}$ the incident shortwave radiation observation, $R_{n,gap}$ the net radiation observation and $WS_{gap}$ the wind speed observation at the moment when evapotranspiration is missing, and $NDVI_{gap}$ is the normalized vegetation index remote sensing data at that moment.
Optionally, among variable combinations with the same number of available variables, combinations that contain different available variables yield different data interpolation rates after interpolation by the random forest method.
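Why two combinations of the same size can reach different interpolation rates is easiest to see on a toy example. The sketch assumes the interpolation rate of a combination is the share of gap records in which every variable of that combination is present; the variable names and values are illustrative only.

```python
from typing import Dict, List, Optional, Sequence

Record = Dict[str, Optional[float]]


def interpolation_rate(gap_records: List[Record], combination: Sequence[str]) -> float:
    """Fraction of gap records for which every variable in the combination is present."""
    if not gap_records:
        return 0.0
    fillable = sum(
        all(rec.get(var) is not None for var in combination) for rec in gap_records
    )
    return fillable / len(gap_records)


gaps = [
    {"DOY": 60, "HH": 28, "Ta": 12.1, "Rn": None},
    {"DOY": 60, "HH": 29, "Ta": None, "Rn": 310.0},
    {"DOY": 60, "HH": 30, "Ta": 12.4, "Rn": 295.0},
    {"DOY": 60, "HH": 31, "Ta": 12.0, "Rn": None},
]
print(interpolation_rate(gaps, ["DOY", "HH", "Ta"]))  # 3 of 4 gaps usable
print(interpolation_rate(gaps, ["DOY", "HH", "Rn"]))  # 2 of 4 gaps usable
```

Both combinations have three variables, but air temperature is available more often than net radiation in these gaps, so the first combination can fill more of them.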
Optionally, sequentially interpolating the evapotranspiration observation data at the missing moments according to the meteorological observation data and the MODIS remote sensing data at those moments comprises: interpolating the evapotranspiration observation data at the missing moments by traversing the gaps, interpolating and updating in turn.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. A dynamic interpolation method for global flux site evapotranspiration observation loss, characterized by comprising the following steps:
obtaining observation data of global flux sites and MODIS remote sensing data; the observation data of the flux sites comprise meteorological observation data and flux observation data;
constructing a dynamic interpolation database for missing evapotranspiration observations of the global flux sites from the meteorological observation data, the flux observation data and the MODIS remote sensing data;
preprocessing the data in the dynamic interpolation database for missing evapotranspiration observations, and constructing an input data set site by site;
acquiring, site by site and based on the input data set, the available variables of each flux site and the number of the available variables; the available variables represent the observation data of the flux site and the MODIS remote sensing data;
ranking the importance of the available variables site by site using a random forest method;
determining, according to the number of the available variables and the result of the importance ranking, variable combinations covering each possible number of the available variables;
calculating the interpolation precision and the interpolation rate of all the variable combinations, and sorting the variable combinations by interpolation precision from high to low;
establishing, site by site and using a random forest method, a dynamic interpolation model for missing evapotranspiration observations for each of the variable combinations, to obtain a set of such dynamic interpolation models; each dynamic interpolation model represents the functional relationship between the evapotranspiration observation data and the variable combination;
and sequentially interpolating the evapotranspiration observation data at the missing moments according to the meteorological observation data and the MODIS remote sensing data at those moments, continuously updating the interpolation rate until it reaches 100%, and marking the variable combination used to interpolate each gap.
2. The dynamic interpolation method of global flux site evapotranspiration observation loss according to claim 1, wherein the meteorological observation data comprises observation date, observation time, air temperature, wind speed, atmospheric pressure, air humidity, incident short wave radiation and net radiation; the flux observation data includes the evapotranspiration observation data; the MODIS remote sensing data comprises normalized vegetation index data and leaf area index data.
3. The dynamic interpolation method for global flux site evapotranspiration observation loss according to claim 1, wherein preprocessing the data in the dynamic interpolation database for missing evapotranspiration observations and constructing an input data set site by site comprises:
acquiring the observation dates and observation times; for each record, calculating the day order within the year of its observation date, and converting its observation time to its position among the 48 half-hour moments of the day to obtain the time order; and combining the day order, the time order and the meteorological observation data with the MODIS data of the pixel corresponding to each flux site, extracted by longitude and latitude, to construct the input data set site by site.
4. A method of dynamic interpolation of global flux site evapotranspiration observation loss according to claim 1, wherein establishing a dynamic interpolation model of evapotranspiration observation loss for each of said variable combinations site by site using a random forest method based on said variable combinations including the various amounts of said variables available comprises:
letting $ET$ denote said evapotranspiration observation data in said input data set and $f$ denote said functional relationship, then:

$$ET = f(DOY, HH, T_a, R_s, R_n, WS, NDVI)$$

wherein $DOY$ is the day order within the year of the date of the flux site observation, $HH$ is the order of the half-hour moment of the flux site observation among the 48 half-hour moments of the day, $T_a$ is the air temperature, $R_s$ is the incident shortwave radiation, $R_n$ is the net radiation, $WS$ is the wind speed and $NDVI$ is the normalized vegetation index, each taken at moments in said input data set when evapotranspiration is not missing.
5. The dynamic interpolation method for global flux site evapotranspiration observation loss according to claim 1, wherein sequentially interpolating the evapotranspiration observation data at a loss moment from the meteorological observation data and the MODIS remote sensing data at the loss moment of the evapotranspiration observation data comprises:
$$ET_{gap} = f(DOY_{gap}, HH_{gap}, T_{a,gap}, R_{s,gap}, R_{n,gap}, WS_{gap}, NDVI_{gap})$$

wherein $ET_{gap}$ is the interpolation result for said evapotranspiration observation data, $DOY_{gap}$ and $HH_{gap}$ are the day order and half-hour order of the missing moment, $T_{a,gap}$ is the air temperature observation, $R_{s,gap}$ the incident shortwave radiation observation, $R_{n,gap}$ the net radiation observation and $WS_{gap}$ the wind speed observation at the moment when evapotranspiration is missing, and $NDVI_{gap}$ is the normalized vegetation index remote sensing data at that moment.
6. The dynamic interpolation method for global flux site evapotranspiration observation loss according to claim 1, wherein, among variable combinations with the same number of the available variables, combinations that contain different available variables have different data interpolation rates after interpolation by the random forest method.
7. The dynamic interpolation method for the missing of the evapotranspiration observation of the global flux site according to claim 1, wherein the evapotranspiration observation data at the missing moment is sequentially interpolated by adopting a traversing interpolation and updating mode according to the meteorological observation data and the MODIS remote sensing data at the missing moment of the evapotranspiration observation data.
8. A dynamic interpolation system for global flux site evapotranspiration observation loss, characterized by comprising a first acquisition unit, a database construction unit, a preprocessing unit, a second acquisition unit, a first sorting unit, a determination unit, a second sorting unit, a model establishment unit and an interpolation updating unit:
the first acquisition unit is used for acquiring the observation data of the global flux sites and the MODIS remote sensing data; the observation data of the flux sites comprise meteorological observation data and flux observation data;
the database construction unit is used for constructing a dynamic interpolation database for missing evapotranspiration observations of the global flux sites from the meteorological observation data, the flux observation data and the MODIS remote sensing data;
the preprocessing unit is used for preprocessing the data in the dynamic interpolation database for missing evapotranspiration observations, and constructing an input data set site by site;
the second acquisition unit is used for acquiring, site by site and based on the input data set, the available variables of each flux site and the number of the available variables; the available variables represent the observation data of the flux site and the MODIS remote sensing data;
the first sorting unit is used for ranking the importance of the available variables site by site using a random forest method;
the determination unit is used for determining, according to the number of the available variables and the result of the importance ranking, variable combinations covering each possible number of the available variables;
the second sorting unit is used for calculating the interpolation precision and the interpolation rate of all the variable combinations, and sorting the variable combinations by interpolation precision from high to low;
the model establishment unit is used for establishing, site by site and using a random forest method, a dynamic interpolation model for missing evapotranspiration observations for each of the variable combinations, to obtain a set of such dynamic interpolation models; each dynamic interpolation model represents the functional relationship between the evapotranspiration observation data and the variable combination;
and the interpolation updating unit is used for sequentially interpolating the evapotranspiration observation data at the missing moments according to the meteorological observation data and the MODIS remote sensing data at those moments, continuously updating the interpolation rate until it reaches 100%, and marking the variable combination used to interpolate each gap.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310750877.1A CN116502050B (en) | 2023-06-25 | 2023-06-25 | Dynamic interpolation method and system for global flux site evapotranspiration observation loss |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310750877.1A CN116502050B (en) | 2023-06-25 | 2023-06-25 | Dynamic interpolation method and system for global flux site evapotranspiration observation loss |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116502050A (en) | 2023-07-28 |
CN116502050B (en) | 2023-09-15 |
Family
ID=87318696
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310750877.1A Active CN116502050B (en) | 2023-06-25 | 2023-06-25 | Dynamic interpolation method and system for global flux site evapotranspiration observation loss |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116502050B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107577649A (en) * | 2017-09-26 | 2018-01-12 | 广州供电局有限公司 | The interpolation processing method and device of missing data |
CN108984792A (en) * | 2018-08-02 | 2018-12-11 | 中国科学院地理科学与资源研究所 | Utilize the method for the eddy flux observation data of the not political reform interpolation ground ALPHA missing |
CN109840260A (en) * | 2019-02-02 | 2019-06-04 | 中国水利水电科学研究院 | A kind of extensive real-time rainfall automatic Observation station ranked data processing method based on dynamic interpolation |
CN112991247A (en) * | 2021-03-04 | 2021-06-18 | 河南省气象科学研究所 | Winter wheat evapotranspiration remote sensing inversion and crop model assimilation method |
CN115423163A (en) * | 2022-08-24 | 2022-12-02 | 中国地质大学(武汉) | Method and device for predicting short-term flood events of drainage basin and terminal equipment |
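Non-Patent Citations (4)
Title |
---|
MENG LIU et al.: "Global Land Surface Evapotranspiration Estimation From Meteorological and Satellite Data Using the Support Vector Machine and Semiempirical Algorithm", IEEE * |
刘?; 何祺胜; 荆琛琳; 李金阳; 陈丽: "Evapotranspiration gap-filling method based on machine learning", Journal of Hohai University (Natural Sciences), no. 02 * |
刘萌 et al.: "Research progress on data-driven remote sensing retrieval methods and products for evapotranspiration", Journal of Remote Sensing (遥感学报) * |
白洁; 刘绍民; 丁晓萍; 卢俐: "Study on processing methods for large aperture scintillometer observation data", Advances in Earth Science, no. 11 * |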
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117609706A (en) * | 2023-10-20 | 2024-02-27 | 北京师范大学 | Method for interpolating data of carbon water flux |
CN117609706B (en) * | 2023-10-20 | 2024-06-04 | 北京师范大学 | Method for interpolating data of carbon water flux |
Also Published As
Publication number | Publication date |
---|---|
CN116502050B (en) | 2023-09-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||