CN117973237A - Runoff simulation method for data-free watersheds based on domain adaptation and machine learning

Info

Publication number: CN117973237A
Application number: CN202410387927.9A
Authority: CN
Inventors: 陈能汪; 余镒琦; 梁中耀; 李少斌; 赵昕叶; 李冰菲
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2024-04-01
Filing date: 2024-04-01
Publication date: 2024-05-03
Anticipated expiration: 2044-04-01
Also published as: CN117973237B

Abstract

The invention discloses a runoff simulation method for data-free watersheds based on domain adaptation and machine learning, comprising the following steps: S1, collecting meteorological data and geographic data for both the watershed with data and the data-free watershed, collecting runoff data for the watershed with data, and preprocessing the data; S2, selecting the model input parameters using random forest feature importance and a recursive feature elimination algorithm; S3, using a kernel mean matching algorithm to assign a weight to each sample in the input dataset of the watershed with data, thereby reducing the mean distribution difference between the input datasets of the watershed with data and the data-free watershed; S4, constructing a random forest model from the input dataset of the watershed with data as processed by the kernel mean matching algorithm, and feeding the input dataset of the data-free watershed into the model for runoff simulation. The method is flexible and easy to use, has low dependence on data volume, and can effectively improve the accuracy of runoff simulation in data-free watersheds.

Description

Runoff simulation method for data-free watersheds based on domain adaptation and machine learning

Technical Field

The invention relates to the technical field of runoff simulation, and in particular to a runoff simulation method for data-free watersheds based on domain adaptation and machine learning.

Background

Runoff simulation is of great importance in water resource management and environmental protection. Simulating river runoff makes it possible to better understand how rainfall events affect the hydrological processes of a watershed, and provides technical support for water resource utilization planning and environmental capacity calculation.

Common runoff simulation models include process-based mechanistic models (e.g., SWAT and HEC-HMS) and machine learning models (e.g., support vector machines and random forests). In recent years, machine learning models have been increasingly applied to runoff simulation and have shown distinct advantages in many scenarios. One advantage of machine learning runoff simulation is its flexibility and generalization capability. Process-based models are usually built on complex surface and subsurface hydrological processes of the watershed and on channel routing processes, and therefore require a large number of input parameters, whereas machine learning models can learn the interactions among parameters from historical runoff observations, which reduces the dependence on prior knowledge. In addition, machine learning models can handle nonlinear and complex hydrological response relationships, adapt better to varied watershed characteristics (meteorological and geographic characteristics such as precipitation, topography, soil and vegetation), and improve simulation accuracy.

However, machine learning models are also data-dependent. In a watershed without runoff monitoring data, the relationship between watershed characteristics and runoff response cannot be determined directly. How to construct a machine learning model for a watershed without runoff data is therefore a major technical problem in the field of runoff simulation.

Disclosure of Invention

The invention aims to provide a runoff simulation method for data-free watersheds based on domain adaptation and machine learning, which is flexible and easy to use, has low dependence on data volume, and can effectively improve the accuracy of runoff simulation in data-free watersheds.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

A runoff simulation method for data-free watersheds based on domain adaptation and machine learning comprises the following steps:

S1, collecting meteorological data and geographic data for both the watershed with data and the data-free watershed, collecting runoff data for the watershed with data, and preprocessing the data;

S2, selecting the model input parameters using random forest feature importance and a recursive feature elimination algorithm;

the specific process of step S2 is as follows:

S21, constructing a random forest model with the meteorological data and geographic data of the watershed with data as input data and the runoff data as output data;

S22, ranking all input parameters by the feature importance obtained from the random forest model, and removing the input parameter with the lowest feature importance;

S23, setting a target number of input parameters, and repeating steps S21-S22 to remove input parameters one at a time until the number of input parameters equals the target number;

S3, using a kernel mean matching algorithm to assign a weight to each sample in the input dataset of the watershed with data, thereby reducing the mean distribution difference between the input datasets of the watershed with data and the data-free watershed;

S4, constructing a random forest model from the input dataset of the watershed with data as processed by the kernel mean matching algorithm, and feeding the input dataset of the data-free watershed into the model for runoff simulation.
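Steps S21-S23 can be sketched in Python as follows. This is a minimal illustration, assuming scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_` as the importance measure; the function name `select_inputs`, the tree count, the column names and the synthetic data are hypothetical and are not taken from the patent.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def select_inputs(X, y, names, n_target, seed=0):
    """Drop the least important input one at a time until n_target remain."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_target:
        # S21: fit a random forest on the inputs that are still kept
        rf = RandomForestRegressor(n_estimators=100, random_state=seed)
        rf.fit(X[:, keep], y)
        # S22: remove the input with the lowest feature importance
        keep.pop(int(np.argmin(rf.feature_importances_)))
    # S23: the loop stops once the target number of inputs is reached
    return [names[i] for i in keep]


# Synthetic toy data (not from the patent): only the first two columns drive y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] + 2.0 * X_demo[:, 1] + 0.1 * rng.normal(size=200)
names_demo = ["precip", "soil_moisture", "wind", "slope", "ndvi"]
selected = select_inputs(X_demo, y_demo, names_demo, n_target=2)
```

The model is refitted after every removal, as S23 requires, so the importances are always computed on the inputs that remain. scikit-learn's `RFE` class wraps a comparable loop and could be used instead.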

Preferably, the meteorological data in step S1 include precipitation, 2-minute average wind speed, daily average air temperature, daily maximum air temperature, daily minimum air temperature and daily average relative humidity; the geographic data include soil moisture content, normalized vegetation index, vegetation coverage, slope, river network density and catchment area; the runoff data refer to the historical runoff monitoring data of the watershed with data.

Preferably, the preprocessing in step S1 includes removing non-numerical data, removing abnormal data, and aligning the time resolution of the data; removing non-numerical data means removing data of string and null types; removing abnormal data means removing data that exceed a set threshold; aligning the time resolution means unifying the time resolution of the data to monthly frequency by calculating average values.
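The three preprocessing operations can be sketched with pandas as follows. This is an illustrative sketch only: the function name `preprocess`, the `limits` dictionary of per-variable thresholds and the toy values are assumptions, and removed values are represented as NaN so that the monthly mean skips them.

```python
import pandas as pd


def preprocess(df, limits):
    """Clean a daily table and aggregate it to monthly means.

    df     : DataFrame with a DatetimeIndex, one column per variable.
    limits : hypothetical dict {column: (low, high)} of allowed value ranges.
    """
    # Remove non-numerical data: strings and nulls become NaN
    num = df.apply(pd.to_numeric, errors="coerce")
    # Remove abnormal data: values outside the set thresholds become NaN
    for col, (low, high) in limits.items():
        num[col] = num[col].where(num[col].between(low, high))
    # Align time resolution: monthly mean (NaN values are skipped)
    return num.resample("MS").mean()


# Synthetic toy data (not from the patent): one string and one outlier.
idx = pd.date_range("2020-01-01", periods=60, freq="D")
raw = pd.DataFrame({"precip": ["bad", 9999.0] + [1.0] * 58}, index=idx)
monthly = preprocess(raw, {"precip": (0.0, 500.0)})
```

The 60 daily records span January and February 2020, so `monthly` holds two rows, and the two invalid January records do not affect the January mean.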

Preferably, the specific process of step S3 is:

S31, calculating a kernel matrix $K$ and a kernel matrix $\kappa$ with the radial basis function as the kernel function, according to:

$$K_{ij} = k(x_i, x_j), \quad x_i, x_j \in X_S$$

$$\kappa_{ij} = k(x_i, x_j), \quad x_i \in X_S,\ x_j \in X_T$$

$$k(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right)$$

where $k$ is the radial basis function; $x_i$ is the i-th sample in the input dataset; $x_j$ is the j-th sample in the input dataset; $\sigma$ is the bandwidth parameter of the radial basis function; $X_S$ is the input dataset of the watershed with data; $X_T$ is the input dataset of the data-free watershed; the kernel matrix $K$ is the result matrix obtained by evaluating the radial basis function pairwise over all samples in the input dataset of the watershed with data; the kernel matrix $\kappa$ is the result matrix obtained by evaluating the radial basis function pairwise between all samples in the input dataset of the watershed with data and all samples in the input dataset of the data-free watershed;

S32, calculating the optimal weight vector $\beta$ according to:

$$\beta = \arg\min_{\beta}\ \frac{1}{2}\beta^{T} K \beta - \frac{n_S}{n_T}\left(\kappa \mathbf{1}\right)^{T}\beta \quad \text{s.t.}\quad \beta_i \in [0, B],\ \left|\sum_{i=1}^{n_S}\beta_i - n_S\right| \le n_S\,\varepsilon$$

where $\beta$ is the weight vector; $\beta_i$ is the weight corresponding to the i-th sample; $T$ is the matrix transpose symbol; $n_S$ is the number of samples of the watershed with data; $n_T$ is the number of samples of the data-free watershed; $\mathbf{1}$ is the all-ones vector of length $n_T$; $B$ is the weight bound coefficient; $\varepsilon$ is the constraint coefficient;

S33, calculating the weighted input dataset $X_S'$ of the watershed with data according to:

$$x_i' = \beta_i\, x_i$$

where $x_i'$ is the i-th element of $X_S'$; $\beta_i$ is the i-th element of $\beta$; $x_i$ is the i-th element of $X_S$.
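Steps S31-S33 can be sketched in Python as follows, assuming the standard kernel mean matching quadratic program with a box constraint on each weight and a constraint on the sum of the weights. The solver choice (SciPy's SLSQP), the function names `rbf` and `kmm_weights`, the default values of `sigma`, `B` and `eps`, and the synthetic data are all assumptions made for illustration; the patent does not name a solver or parameter values.

```python
import numpy as np
from scipy.optimize import minimize


def rbf(A, B, sigma):
    """Radial basis function evaluated pairwise between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def kmm_weights(Xs, Xt, sigma=1.0, B=10.0, eps=0.1):
    """Kernel mean matching weights for the samples of the source set Xs."""
    ns, nt = len(Xs), len(Xt)
    # S31: kernel matrix on the source set, and source-versus-target kernel
    K = rbf(Xs, Xs, sigma)
    kappa = (ns / nt) * rbf(Xs, Xt, sigma) @ np.ones(nt)
    # S32: minimize 0.5 * b^T K b - kappa^T b under the two constraints
    objective = lambda b: 0.5 * b @ K @ b - kappa @ b
    gradient = lambda b: K @ b - kappa
    constraints = [
        {"type": "ineq", "fun": lambda b: ns * (1.0 + eps) - b.sum()},
        {"type": "ineq", "fun": lambda b: b.sum() - ns * (1.0 - eps)},
    ]
    res = minimize(objective, np.ones(ns), jac=gradient, method="SLSQP",
                   bounds=[(0.0, B)] * ns, constraints=constraints)
    # Clip to remove tiny numerical violations of the bounds
    return np.clip(res.x, 0.0, B)


# Synthetic toy data (not from the patent): the target set is shifted by +1.
rng = np.random.default_rng(0)
Xs_demo = rng.normal(0.0, 1.0, size=(40, 1))
Xt_demo = rng.normal(1.0, 1.0, size=(40, 1))
beta = kmm_weights(Xs_demo, Xt_demo)
# S33: multiply each source sample by its weight
Xs_weighted = beta[:, None] * Xs_demo
```

In this toy case the source samples that lie closer to the target set receive larger weights, so the weighted mean of the source samples moves toward the target mean. For large datasets a dedicated quadratic programming solver would likely be preferable to SLSQP.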

Preferably, the specific process of step S4 is:

S41, dividing the weighted input dataset $X_S'$ of the watershed with data into a training set and a test set at a ratio of 80%:20%;

S42, under different hyper-parameter combinations, training a random forest model with the training set data and calculating the Nash-Sutcliffe efficiency (NSE) between the measured values and the model predicted values with the test set data, the hyper-parameter combination with the highest NSE being taken as the final hyper-parameters of the random forest model; wherein NSE is calculated as:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}$$

where $n$ is the number of samples in the test set; $y_i$ is the measured value; $\hat{y}_i$ is the model predicted value; $\bar{y}$ is the mean of the measured values in the test set;

S43, feeding the input dataset of the data-free watershed into the random forest model to obtain the corresponding runoff simulation values.
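Steps S41-S43 can be sketched in Python as follows. This is a hedged illustration assuming scikit-learn; the function names `nse` and `fit_and_simulate`, the hyper-parameter grid and the synthetic data are hypothetical, since the patent does not list which hyper-parameters are searched.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of simulated versus measured values."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def fit_and_simulate(Xs_w, ys, Xt, grid, seed=0):
    """Pick hyper-parameters by test-set NSE, then simulate runoff for Xt."""
    # S41: 80% training set, 20% test set
    X_tr, X_te, y_tr, y_te = train_test_split(
        Xs_w, ys, test_size=0.2, random_state=seed)
    best_score, best_params, best_model = -np.inf, None, None
    keys = list(grid)
    # S42: try every hyper-parameter combination and keep the highest NSE
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        model = RandomForestRegressor(random_state=seed, **params)
        model.fit(X_tr, y_tr)
        score = nse(y_te, model.predict(X_te))
        if score > best_score:
            best_score, best_params, best_model = score, params, model
    # S43: simulate runoff for the data-free watershed
    return best_model.predict(Xt), best_params, best_score


# Synthetic toy data (not from the patent).
rng = np.random.default_rng(0)
Xs_demo = rng.normal(size=(200, 2))
ys_demo = 2.0 * Xs_demo[:, 0] + Xs_demo[:, 1] + 0.1 * rng.normal(size=200)
Xt_demo = rng.normal(size=(30, 2))
grid_demo = {"n_estimators": [50, 100], "max_depth": [3, None]}
runoff_sim, best_params, best_nse = fit_and_simulate(
    Xs_demo, ys_demo, Xt_demo, grid_demo)
```

An NSE of 1 indicates a perfect match between simulated and measured values, and an NSE of 0 means the model is no better than predicting the mean of the measured values.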

With the above technical scheme, the invention has the following beneficial effects: first, the runoff simulation method for data-free watersheds based on domain adaptation and machine learning is flexible and easy to use, can perform runoff simulation even in a watershed lacking runoff monitoring data, and provides strong technical support for water resource utilization planning and environmental capacity calculation; second, the invention adopts a kernel mean matching algorithm, which can effectively measure the difference in data distribution between the watershed with data and the data-free watershed; this not only reduces the dependence of the runoff simulation model on data but also improves the generalization capability of the model and the accuracy and reliability of the simulation results. The invention thus provides an innovative and feasible solution to the problem of runoff simulation in data-free watersheds.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a flow chart of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

As shown in FIG. 1 and FIG. 2, a runoff simulation method for data-free watersheds based on domain adaptation and machine learning includes the following steps:

The meteorological data in step S1 include precipitation, 2-minute average wind speed, daily average air temperature, daily maximum air temperature, daily minimum air temperature and daily average relative humidity; the geographic data include soil moisture content, normalized vegetation index, vegetation coverage, slope, river network density and catchment area; the runoff data refer to the historical runoff monitoring data of the watershed with data;

The preprocessing in step S1 includes removing non-numerical data, removing abnormal data, and aligning the time resolution of the data; removing non-numerical data means removing data of string and null types; removing abnormal data means removing data that exceed a set threshold; aligning the time resolution means unifying the time resolution of the data to monthly frequency by calculating average values;

the specific process of step S2 is as follows:

the specific process of step S3 is as follows:

S31, calculating a kernel matrix $K$ and a kernel matrix $\kappa$ with the radial basis function as the kernel function, according to:

$$K_{ij} = k(x_i, x_j), \quad x_i, x_j \in X_S$$

$$\kappa_{ij} = k(x_i, x_j), \quad x_i \in X_S,\ x_j \in X_T$$

$$k(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right)$$

where $k$ is the radial basis function; $x_i$ is the i-th sample in the input dataset; $x_j$ is the j-th sample in the input dataset; $\sigma$ is the bandwidth parameter of the radial basis function; $X_S$ is the input dataset of the watershed with data; $X_T$ is the input dataset of the data-free watershed; the kernel matrix $K$ is the result matrix obtained by evaluating the radial basis function pairwise over all samples in the input dataset of the watershed with data; the kernel matrix $\kappa$ is the result matrix obtained by evaluating the radial basis function pairwise between all samples in the input dataset of the watershed with data and all samples in the input dataset of the data-free watershed;

S33, calculating the weighted input dataset $X_S'$ of the watershed with data according to:

$$x_i' = \beta_i\, x_i$$

where $x_i'$ is the i-th element of $X_S'$; $\beta_i$ is the i-th element of the weight vector $\beta$; $x_i$ is the i-th element of $X_S$;

S4, constructing a random forest model from the input dataset of the watershed with data as processed by the kernel mean matching algorithm, and feeding the input dataset of the data-free watershed into the model for runoff simulation;

The specific process of step S4 is:

The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims

1. A runoff simulation method for data-free watersheds based on domain adaptation and machine learning, characterized by comprising the following steps:

the specific process of step S2 is as follows:

2. The runoff simulation method for data-free watersheds based on domain adaptation and machine learning according to claim 1, wherein: the meteorological data in step S1 include precipitation, 2-minute average wind speed, daily average air temperature, daily maximum air temperature, daily minimum air temperature and daily average relative humidity; the geographic data include soil moisture content, normalized vegetation index, vegetation coverage, slope, river network density and catchment area; the runoff data refer to the historical runoff monitoring data of the watershed with data.

3. The runoff simulation method for data-free watersheds based on domain adaptation and machine learning according to claim 1, wherein: the preprocessing in step S1 includes removing non-numerical data, removing abnormal data, and aligning the time resolution of the data; removing non-numerical data means removing data of string and null types; removing abnormal data means removing data that exceed a set threshold; aligning the time resolution means unifying the time resolution of the data to monthly frequency by calculating average values.

4. The runoff simulation method for data-free watersheds based on domain adaptation and machine learning according to claim 1, wherein the specific process of step S3 is as follows:

S33, calculating the weighted input dataset $X_S'$ of the watershed with data according to:

$$x_i' = \beta_i\, x_i$$

where $x_i'$ is the i-th element of $X_S'$; $\beta_i$ is the i-th element of $\beta$; $x_i$ is the i-th element of $X_S$.

5. The runoff simulation method for data-free watersheds based on domain adaptation and machine learning according to claim 1, wherein the specific process of step S4 is as follows: