CN117973237A - Runoff simulation method for ungauged (data-free) watersheds based on domain adaptation and machine learning - Google Patents

Publication number
CN117973237A
CN117973237A CN202410387927.9A CN202410387927A CN117973237A CN 117973237 A CN117973237 A CN 117973237A CN 202410387927 A CN202410387927 A CN 202410387927A CN 117973237 A CN117973237 A CN 117973237A
Authority
CN
China
Prior art keywords
data
input
domain
runoff
watershed
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410387927.9A
Other languages
Chinese (zh)
Other versions
CN117973237B (en)
Inventor
陈能汪
余镒琦
梁中耀
李少斌
赵昕叶
李冰菲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2024-04-01
Filing date
2024-04-01
Publication date
2024-05-03
Application filed by Xiamen University
Priority to CN202410387927.9A
Publication of CN117973237A
Application granted
Publication of CN117973237B

Abstract

The invention discloses a runoff simulation method for ungauged (data-free) watersheds based on domain adaptation and machine learning, comprising the following steps: S1, collecting meteorological data and geographic data for both the gauged watershed (with runoff data) and the ungauged watershed (without runoff data), collecting runoff data for the gauged watershed, and preprocessing the data; S2, selecting the model input parameters using random forest feature importance and a recursive feature elimination algorithm; S3, using the kernel mean matching algorithm to assign a weight to each sample in the input dataset of the gauged watershed, thereby reducing the difference in mean distribution between the input datasets of the gauged and ungauged watersheds; S4, constructing a random forest model from the gauged-watershed input dataset processed by the kernel mean matching algorithm, and feeding the ungauged-watershed input dataset into the model for runoff simulation. The method is flexible and easy to use, depends little on data volume, and can effectively improve the accuracy of runoff simulation in ungauged watersheds.

Description

Runoff simulation method for ungauged (data-free) watersheds based on domain adaptation and machine learning
Technical Field
The invention relates to the technical field of runoff simulation, and in particular to a runoff simulation method for ungauged watersheds based on domain adaptation and machine learning.
Background
Runoff simulation is of great importance in water resource management and environmental protection. By simulating river runoff, the influence of rainfall events on the hydrological processes of a watershed can be better understood, which provides technical support for water resource utilization planning and environmental capacity calculation.
Common runoff simulation models include process-based mechanistic models (e.g., SWAT and HEC-HMS) and machine learning models (e.g., support vector machines and random forests). In recent years, machine learning models have been increasingly applied to runoff simulation and have shown unique advantages in many scenarios. One advantage of machine learning methods is their flexibility and generalization ability. Process-based models are built on complex surface and subsurface hydrological processes and channel routing processes and require a large number of input parameters, whereas machine learning models can learn the interactions among parameters from historical runoff observations, which reduces the dependence on prior knowledge. In addition, machine learning models can handle nonlinear and complex hydrological response relationships, adapt better to diverse watershed characteristics (meteorological and geographic characteristics such as precipitation, topography, soil, and vegetation), and improve simulation accuracy.
However, machine learning models depend on data. In a watershed without runoff monitoring data (an ungauged watershed), the response relationship between watershed characteristics and runoff cannot be determined directly. How to construct a machine learning model for an ungauged watershed is therefore a major technical problem in the field of runoff simulation.
Disclosure of Invention
The invention aims to provide a runoff simulation method for ungauged watersheds based on domain adaptation and machine learning that is flexible and easy to use, depends little on data volume, and can effectively improve the accuracy of runoff simulation in ungauged watersheds.
To achieve the above purpose, the invention adopts the following technical scheme:
A runoff simulation method for ungauged watersheds based on domain adaptation and machine learning comprises the following steps:
S1, collecting meteorological data and geographic data for both the gauged watershed (with runoff data) and the ungauged watershed (without runoff data), collecting runoff data for the gauged watershed, and preprocessing the data;
S2, selecting the model input parameters using random forest feature importance and a recursive feature elimination algorithm;
the specific process of step S2 is as follows:
S21, constructing a random forest model with the meteorological data and geographic data of the gauged watershed as input data and the runoff data as output data;
S22, ranking all input parameters by the feature importance obtained from the random forest model, and removing the input parameter with the lowest feature importance;
S23, setting a target number of input parameters, and repeating steps S21-S22 to remove input parameters one at a time until the number of input parameters equals the target number;
S3, using the kernel mean matching algorithm to assign a weight to each sample in the input dataset of the gauged watershed, thereby reducing the difference in mean distribution between the input datasets of the gauged and ungauged watersheds;
S4, constructing a random forest model from the gauged-watershed input dataset processed by the kernel mean matching algorithm, and feeding the ungauged-watershed input dataset into the model for runoff simulation.
Preferably, the meteorological data in step S1 comprise precipitation, 2-minute average wind speed, daily average air temperature, daily maximum air temperature, daily minimum air temperature, and daily average relative humidity; the geographic data comprise soil moisture content, normalized difference vegetation index, vegetation coverage, slope, river network density, and catchment area; the runoff data are the historical runoff monitoring data of the gauged watershed.
Preferably, the preprocessing in step S1 comprises removing non-numerical data, removing abnormal data, and aligning the temporal resolution of the data; removing non-numerical data means removing string and null values; removing abnormal data means removing values that exceed a set threshold; aligning the temporal resolution means unifying the data to a monthly resolution by averaging.
Preferably, the specific process of step S3 is:
S31, taking the radial basis function as the kernel function, calculating the kernel matrix $K$ and the kernel matrix $\kappa$ by:

$$k(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right); \quad K = k(X_s, X_s); \quad \kappa = k(X_s, X_t)$$

where $k(\cdot,\cdot)$ is the radial basis function; $x_i$ is the i-th sample in the input dataset; $x_j$ is the j-th sample in the input dataset; $\sigma$ is the bandwidth parameter of the radial basis function; $X_s$ is the input dataset of the gauged watershed; $X_t$ is the input dataset of the ungauged watershed; the kernel matrix $K$ is the result of evaluating the radial basis function on every pair of samples in the gauged-watershed input dataset; the kernel matrix $\kappa$ is the result of evaluating the radial basis function on every pair formed by one sample from the gauged-watershed input dataset and one sample from the ungauged-watershed input dataset;
S32, calculating the optimal weight vector $\beta$ by:

$$\min_{\beta}\ \frac{1}{2}\beta^{T}K\beta - \frac{n_s}{n_t}\left(\kappa\mathbf{1}\right)^{T}\beta \quad \text{s.t.} \quad \beta_i \in [0, B],\ \left|\sum_{i=1}^{n_s}\beta_i - n_s\right| \le n_s\epsilon$$

where $\beta$ is the weight vector; $\beta_i$ is the weight assigned to the i-th sample; $T$ is the matrix transpose symbol; $n_s$ is the number of samples from the gauged watershed; $n_t$ is the number of samples from the ungauged watershed; $\mathbf{1}$ is the all-ones vector of length $n_t$; $B$ is the weight bound coefficient; $\epsilon$ is the constraint coefficient;
S33, calculating the weighted gauged-watershed input dataset $X_s'$ by:

$$x'_{s,i} = \beta_i\, x_{s,i}$$

where $x'_{s,i}$ is the i-th element of $X_s'$; $\beta_i$ is the i-th element of $\beta$; $x_{s,i}$ is the i-th element of $X_s$.
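A short derivation may help explain why a weighting step of this kind leads to a quadratic objective. The sketch below assumes the standard kernel mean matching formulation, in which the weights $\beta$ are chosen so that the weighted mean of the gauged-watershed samples $x_i^{s}$ matches the mean of the ungauged-watershed samples $x_j^{t}$ in the feature space induced by the kernel $k$. The feature map $\phi$, the superscripts $s$ and $t$, and the constant $c$ are notation introduced here for illustration only.

```latex
\begin{aligned}
\Bigl\lVert \frac{1}{n_s}\sum_{i=1}^{n_s}\beta_i\,\phi(x_i^{s})
          - \frac{1}{n_t}\sum_{j=1}^{n_t}\phi(x_j^{t}) \Bigr\rVert^{2}
  &= \frac{1}{n_s^{2}}\,\beta^{T}K\beta
   - \frac{2}{n_s n_t}\,\beta^{T}\kappa\mathbf{1} + c, \\
K_{ij} = k(x_i^{s}, x_j^{s}), \qquad
\kappa_{ij} &= k(x_i^{s}, x_j^{t}), \qquad
c = \frac{1}{n_t^{2}}\sum_{j=1}^{n_t}\sum_{l=1}^{n_t} k(x_j^{t}, x_l^{t}).
\end{aligned}
```

Multiplying by $n_s^{2}/2$ and dropping $c$, which does not depend on $\beta$, leaves $\frac{1}{2}\beta^{T}K\beta - \frac{n_s}{n_t}\beta^{T}\kappa\mathbf{1}$. Under this reading, minimizing the quadratic form is the same as minimizing the distance between the two kernel means, while the bound $B$ and the coefficient $\epsilon$ keep the weights from concentrating on a few samples.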
Preferably, the specific process of step S4 is:
S41, dividing the weighted gauged-watershed input dataset $X_s'$ into a training set and a test set in an 80%:20% ratio;
S42, training the random forest model on the training set under different hyperparameter combinations, and calculating the Nash-Sutcliffe efficiency (NSE) between the measured values and the model predictions on the test set, the hyperparameter combination with the highest NSE being taken as the final hyperparameters of the random forest model; NSE is calculated as:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}$$

where $n$ is the number of samples in the test set; $y_i$ is the measured value; $\hat{y}_i$ is the model prediction; $\bar{y}$ is the mean of the measured values in the test set;
S43, feeding the input dataset of the ungauged watershed into the random forest model to obtain the corresponding simulated runoff values.
With the above technical scheme, the invention has the following beneficial effects: first, the runoff simulation method for ungauged watersheds based on domain adaptation and machine learning is flexible and easy to use, can simulate runoff even in a watershed that lacks runoff monitoring data, and provides strong technical support for water resource utilization planning and environmental capacity calculation; second, the invention adopts the kernel mean matching algorithm, which effectively measures the difference in data distribution between the gauged and ungauged watersheds, and thereby not only reduces the dependence of the runoff simulation model on data but also improves the generalization ability of the model and the accuracy and reliability of the simulation results. The invention thus provides an innovative and feasible solution to the problem of runoff simulation in ungauged watersheds.
Drawings
FIG. 1 is a flow chart of the present invention;
Fig. 2 is a flow chart of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in FIG. 1 and FIG. 2, a runoff simulation method for ungauged watersheds based on domain adaptation and machine learning comprises the following steps:
S1, collecting meteorological data and geographic data for both the gauged watershed (with runoff data) and the ungauged watershed (without runoff data), collecting runoff data for the gauged watershed, and preprocessing the data;
The meteorological data in step S1 comprise precipitation, 2-minute average wind speed, daily average air temperature, daily maximum air temperature, daily minimum air temperature, and daily average relative humidity; the geographic data comprise soil moisture content, normalized difference vegetation index, vegetation coverage, slope, river network density, and catchment area; the runoff data are the historical runoff monitoring data of the gauged watershed;
The preprocessing in step S1 comprises removing non-numerical data, removing abnormal data, and aligning the temporal resolution of the data; removing non-numerical data means removing string and null values; removing abnormal data means removing values that exceed a set threshold; aligning the temporal resolution means unifying the data to a monthly resolution by averaging;
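The preprocessing of step S1 could be sketched as follows. This is a minimal illustration using pandas, not the patented implementation: the column names, the threshold values, and the helper name `preprocess` are hypothetical, and the input is assumed to be a daily table indexed by date.

```python
import pandas as pd


def preprocess(df, limits):
    """Clean a date-indexed table and aggregate it to monthly means.

    `limits` maps a column name to its (low, high) plausibility threshold.
    Both the thresholds and the column names are illustrative assumptions.
    """
    # Remove non-numerical data: strings become NaN, then rows holding
    # any NaN (string or null) are dropped.
    numeric = df.apply(pd.to_numeric, errors="coerce").dropna()

    # Remove abnormal data: keep only rows inside every set threshold.
    keep = pd.Series(True, index=numeric.index)
    for column, (low, high) in limits.items():
        keep &= numeric[column].between(low, high)
    cleaned = numeric[keep]

    # Align the temporal resolution: average the values within each month.
    return cleaned.groupby(cleaned.index.to_period("M")).mean()


# Synthetic example with one string, one null, and one out-of-range value.
dates = pd.to_datetime(
    ["2020-01-01", "2020-01-02", "2020-01-03", "2020-02-01", "2020-02-02"]
)
raw = pd.DataFrame(
    {
        "precip_mm": [1.0, "n/a", 3.0, 5.0, 9999.0],
        "temp_c": [10.0, 11.0, None, 14.0, 15.0],
    },
    index=dates,
)
monthly = preprocess(raw, {"precip_mm": (0.0, 500.0), "temp_c": (-60.0, 60.0)})
```

In this toy table only the first and fourth rows survive the cleaning, so each month is represented by a single record. With real data the same function would be applied separately to the gauged and ungauged watershed tables.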
S2, selecting the model input parameters using random forest feature importance and a recursive feature elimination algorithm;
the specific process of step S2 is as follows:
S21, constructing a random forest model with the meteorological data and geographic data of the gauged watershed as input data and the runoff data as output data;
S22, ranking all input parameters by the feature importance obtained from the random forest model, and removing the input parameter with the lowest feature importance;
S23, setting a target number of input parameters, and repeating steps S21-S22 to remove input parameters one at a time until the number of input parameters equals the target number;
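Steps S21 to S23 can be illustrated with a short loop around scikit-learn's `RandomForestRegressor`. This is a sketch under stated assumptions: the function name `select_inputs`, the feature names, the number of trees, and the synthetic data are all hypothetical and are not taken from the patent.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def select_inputs(X, y, names, n_target, seed=0):
    """Drop the least important input one at a time until n_target remain."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while len(names) > n_target:
        # S21: fit a random forest on the current set of inputs.
        model = RandomForestRegressor(n_estimators=100, random_state=seed)
        model.fit(X, y)
        # S22: locate and remove the input with the lowest importance.
        worst = int(np.argmin(model.feature_importances_))
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
        # S23: the loop repeats until the target number is reached.
    return names


# Synthetic data: runoff depends only on the first two inputs.
rng = np.random.default_rng(0)
names = ["precipitation", "air_temperature", "wind_speed", "slope", "ndvi"]
X = rng.normal(size=(300, 5))
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(scale=0.1, size=300)
selected = select_inputs(X, y, names, n_target=2)
```

scikit-learn also ships `sklearn.feature_selection.RFE`, which wraps the same idea; the explicit loop is used here only because it mirrors the wording of the steps.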
S3, using the kernel mean matching algorithm to assign a weight to each sample in the input dataset of the gauged watershed, thereby reducing the difference in mean distribution between the input datasets of the gauged and ungauged watersheds;
the specific process of step S3 is as follows:
S31, taking the radial basis function as the kernel function, calculating the kernel matrix $K$ and the kernel matrix $\kappa$ by:

$$k(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right); \quad K = k(X_s, X_s); \quad \kappa = k(X_s, X_t)$$

where $k(\cdot,\cdot)$ is the radial basis function; $x_i$ is the i-th sample in the input dataset; $x_j$ is the j-th sample in the input dataset; $\sigma$ is the bandwidth parameter of the radial basis function; $X_s$ is the input dataset of the gauged watershed; $X_t$ is the input dataset of the ungauged watershed; the kernel matrix $K$ is the result of evaluating the radial basis function on every pair of samples in the gauged-watershed input dataset; the kernel matrix $\kappa$ is the result of evaluating the radial basis function on every pair formed by one sample from the gauged-watershed input dataset and one sample from the ungauged-watershed input dataset;
S32, calculating the optimal weight vector $\beta$ by:

$$\min_{\beta}\ \frac{1}{2}\beta^{T}K\beta - \frac{n_s}{n_t}\left(\kappa\mathbf{1}\right)^{T}\beta \quad \text{s.t.} \quad \beta_i \in [0, B],\ \left|\sum_{i=1}^{n_s}\beta_i - n_s\right| \le n_s\epsilon$$

where $\beta$ is the weight vector; $\beta_i$ is the weight assigned to the i-th sample; $T$ is the matrix transpose symbol; $n_s$ is the number of samples from the gauged watershed; $n_t$ is the number of samples from the ungauged watershed; $\mathbf{1}$ is the all-ones vector of length $n_t$; $B$ is the weight bound coefficient; $\epsilon$ is the constraint coefficient;
S33, calculating the weighted gauged-watershed input dataset $X_s'$ by:

$$x'_{s,i} = \beta_i\, x_{s,i}$$

where $x'_{s,i}$ is the i-th element of $X_s'$; $\beta_i$ is the i-th element of $\beta$; $x_{s,i}$ is the i-th element of $X_s$;
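Step S3 could be prototyped as below. The sketch assumes the standard kernel mean matching quadratic program (a quadratic objective in the weights with box bounds and a constraint on their sum) and solves it with SciPy's general-purpose SLSQP solver rather than a dedicated QP solver. The function names, the default values of `sigma`, `B`, and `eps`, the heuristic used when `eps` is omitted, and the synthetic data are assumptions made for illustration. The last line assumes that S33 scales each sample by its weight.

```python
import numpy as np
from scipy.optimize import minimize


def rbf_kernel(A, B, sigma):
    """Radial basis function evaluated on every pair (row of A, row of B)."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def kmm_weights(Xs, Xt, sigma=1.0, B=10.0, eps=None):
    """Weights for gauged samples Xs that pull their kernel mean toward Xt."""
    ns, nt = len(Xs), len(Xt)
    if eps is None:
        eps = (np.sqrt(ns) - 1.0) / np.sqrt(ns)  # assumed heuristic, not from the source
    K = rbf_kernel(Xs, Xs, sigma)                 # S31: gauged vs gauged
    kappa = rbf_kernel(Xs, Xt, sigma)             # S31: gauged vs ungauged
    linear = (ns / nt) * kappa @ np.ones(nt)

    def objective(b):
        return 0.5 * b @ K @ b - linear @ b       # S32: quadratic objective

    def gradient(b):
        return K @ b - linear

    constraints = [                               # |sum(b) - ns| <= ns * eps
        {"type": "ineq", "fun": lambda b: ns * (1.0 + eps) - b.sum()},
        {"type": "ineq", "fun": lambda b: b.sum() - ns * (1.0 - eps)},
    ]
    result = minimize(objective, np.ones(ns), jac=gradient, method="SLSQP",
                      bounds=[(0.0, B)] * ns, constraints=constraints)
    return result.x


# Synthetic one-dimensional inputs whose means differ between watersheds.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(60, 1))   # gauged watershed
Xt = rng.normal(1.0, 1.0, size=(40, 1))   # ungauged watershed
beta = kmm_weights(Xs, Xt)
Xs_weighted = beta[:, None] * Xs          # S33, assuming per-sample scaling
```

Samples from the gauged watershed that resemble the ungauged inputs receive larger weights, so the weighted gauged mean moves toward the ungauged mean. For larger datasets a dedicated quadratic programming solver would likely be a better fit than SLSQP.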
S4, constructing a random forest model from the gauged-watershed input dataset processed by the kernel mean matching algorithm, and feeding the ungauged-watershed input dataset into the model for runoff simulation;
The specific process of step S4 is:
S41, dividing the weighted gauged-watershed input dataset $X_s'$ into a training set and a test set in an 80%:20% ratio;
S42, training the random forest model on the training set under different hyperparameter combinations, and calculating the Nash-Sutcliffe efficiency (NSE) between the measured values and the model predictions on the test set, the hyperparameter combination with the highest NSE being taken as the final hyperparameters of the random forest model; NSE is calculated as:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}$$

where $n$ is the number of samples in the test set; $y_i$ is the measured value; $\hat{y}_i$ is the model prediction; $\bar{y}$ is the mean of the measured values in the test set;
S43, feeding the input dataset of the ungauged watershed into the random forest model to obtain the corresponding simulated runoff values.
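Steps S41 to S43 can be sketched with scikit-learn as follows. The hyperparameter grid, the function names `nse` and `fit_best_forest`, and the synthetic arrays standing in for the weighted gauged-watershed dataset and the ungauged-watershed inputs are illustrative assumptions; the text does not state which hyperparameters are searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ParameterGrid, train_test_split


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = np.sum((observed - predicted) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / spread


def fit_best_forest(X, y, grid, seed=0):
    """Return the forest (and its NSE) that scores highest on the test split."""
    # S41: 80% training set, 20% test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    best_model, best_score = None, -np.inf
    # S42: try every hyperparameter combination and keep the highest NSE.
    for params in ParameterGrid(grid):
        model = RandomForestRegressor(random_state=seed, **params)
        model.fit(X_train, y_train)
        score = nse(y_test, model.predict(X_test))
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score


# Synthetic stand-ins for the weighted gauged data and the ungauged inputs.
rng = np.random.default_rng(0)
X_gauged = rng.normal(size=(200, 3))
y_gauged = 2.0 * X_gauged[:, 0] + X_gauged[:, 1] + rng.normal(scale=0.1, size=200)
X_ungauged = rng.normal(size=(24, 3))

grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
model, best_nse = fit_best_forest(X_gauged, y_gauged, grid)
simulated_runoff = model.predict(X_ungauged)  # S43
```

An NSE of 1 indicates a perfect match, and an NSE of 0 means the model is no better than predicting the mean of the measurements. Because the test split is used here to pick the hyperparameters, as the steps describe, the reported score is an optimistic estimate of accuracy in the ungauged watershed.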
The present invention is not limited to the above embodiments; any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the protection scope of the invention. The protection scope of the invention shall therefore be determined by the protection scope of the claims.

Claims (5)

1. A runoff simulation method for ungauged watersheds based on domain adaptation and machine learning, characterized by comprising the following steps:
S1, collecting meteorological data and geographic data for both the gauged watershed (with runoff data) and the ungauged watershed (without runoff data), collecting runoff data for the gauged watershed, and preprocessing the data;
S2, selecting the model input parameters using random forest feature importance and a recursive feature elimination algorithm;
the specific process of step S2 is as follows:
S21, constructing a random forest model with the meteorological data and geographic data of the gauged watershed as input data and the runoff data as output data;
S22, ranking all input parameters by the feature importance obtained from the random forest model, and removing the input parameter with the lowest feature importance;
S23, setting a target number of input parameters, and repeating steps S21-S22 to remove input parameters one at a time until the number of input parameters equals the target number;
S3, using the kernel mean matching algorithm to assign a weight to each sample in the input dataset of the gauged watershed, thereby reducing the difference in mean distribution between the input datasets of the gauged and ungauged watersheds;
S4, constructing a random forest model from the gauged-watershed input dataset processed by the kernel mean matching algorithm, and feeding the ungauged-watershed input dataset into the model for runoff simulation.
2. The runoff simulation method for ungauged watersheds based on domain adaptation and machine learning according to claim 1, characterized in that: the meteorological data in step S1 comprise precipitation, 2-minute average wind speed, daily average air temperature, daily maximum air temperature, daily minimum air temperature, and daily average relative humidity; the geographic data comprise soil moisture content, normalized difference vegetation index, vegetation coverage, slope, river network density, and catchment area; the runoff data are the historical runoff monitoring data of the gauged watershed.
3. The runoff simulation method for ungauged watersheds based on domain adaptation and machine learning according to claim 1, characterized in that: the preprocessing in step S1 comprises removing non-numerical data, removing abnormal data, and aligning the temporal resolution of the data; removing non-numerical data means removing string and null values; removing abnormal data means removing values that exceed a set threshold; aligning the temporal resolution means unifying the data to a monthly resolution by averaging.
4. The runoff simulation method for ungauged watersheds based on domain adaptation and machine learning according to claim 1, characterized in that the specific process of step S3 is as follows:
S31, taking the radial basis function as the kernel function, calculating the kernel matrix $K$ and the kernel matrix $\kappa$ by:

$$k(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right); \quad K = k(X_s, X_s); \quad \kappa = k(X_s, X_t)$$

where $k(\cdot,\cdot)$ is the radial basis function; $x_i$ is the i-th sample in the input dataset; $x_j$ is the j-th sample in the input dataset; $\sigma$ is the bandwidth parameter of the radial basis function; $X_s$ is the input dataset of the gauged watershed; $X_t$ is the input dataset of the ungauged watershed; the kernel matrix $K$ is the result of evaluating the radial basis function on every pair of samples in the gauged-watershed input dataset; the kernel matrix $\kappa$ is the result of evaluating the radial basis function on every pair formed by one sample from the gauged-watershed input dataset and one sample from the ungauged-watershed input dataset;
S32, calculating the optimal weight vector $\beta$ by:

$$\min_{\beta}\ \frac{1}{2}\beta^{T}K\beta - \frac{n_s}{n_t}\left(\kappa\mathbf{1}\right)^{T}\beta \quad \text{s.t.} \quad \beta_i \in [0, B],\ \left|\sum_{i=1}^{n_s}\beta_i - n_s\right| \le n_s\epsilon$$

where $\beta$ is the weight vector; $\beta_i$ is the weight assigned to the i-th sample; $T$ is the matrix transpose symbol; $n_s$ is the number of samples from the gauged watershed; $n_t$ is the number of samples from the ungauged watershed; $\mathbf{1}$ is the all-ones vector of length $n_t$; $B$ is the weight bound coefficient; $\epsilon$ is the constraint coefficient;
S33, calculating the weighted gauged-watershed input dataset $X_s'$ by:

$$x'_{s,i} = \beta_i\, x_{s,i}$$

where $x'_{s,i}$ is the i-th element of $X_s'$; $\beta_i$ is the i-th element of $\beta$; $x_{s,i}$ is the i-th element of $X_s$.
5. The runoff simulation method for ungauged watersheds based on domain adaptation and machine learning according to claim 1, characterized in that the specific process of step S4 is as follows:
S41, dividing the weighted gauged-watershed input dataset $X_s'$ into a training set and a test set in an 80%:20% ratio;
S42, training the random forest model on the training set under different hyperparameter combinations, and calculating the Nash-Sutcliffe efficiency (NSE) between the measured values and the model predictions on the test set, the hyperparameter combination with the highest NSE being taken as the final hyperparameters of the random forest model; NSE is calculated as:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}$$

where $n$ is the number of samples in the test set; $y_i$ is the measured value; $\hat{y}_i$ is the model prediction; $\bar{y}$ is the mean of the measured values in the test set;
S43, feeding the input dataset of the ungauged watershed into the random forest model to obtain the corresponding simulated runoff values.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410387927.9A 2024-04-01 2024-04-01 Runoff simulation method for ungauged (data-free) watersheds based on domain adaptation and machine learning (Active, granted as CN117973237B)

Publications (2)

Publication Number Publication Date
CN117973237A (en) 2024-05-03
CN117973237B (en) 2024-06-25

Family

ID=90859974



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant