CN114580171B

CN114580171B - A method for identification of watershed flood types and analysis of their influencing factors

Info

Publication number: CN114580171B
Application number: CN202210208546.0A
Authority: CN
Inventors: 邹磊; 于家瑞; 张永勇; 左凌峰
Original assignee: Institute of Geographic Sciences and Natural Resources of CAS
Current assignee: Institute of Geographic Sciences and Natural Resources of CAS
Priority date: 2022-03-03
Filing date: 2022-03-03
Publication date: 2022-09-30
Anticipated expiration: 2042-03-03
Also published as: CN114580171A

Abstract

The invention discloses a method for identifying flood types in a river basin and analyzing their influencing factors. The method comprises the following steps:

- Step one: collect meteorological and geographical factors, human activity factors, socio-economic indicators, and runoff data from the hydrological stations in the study area.
- Step two: preprocess the data.
- Step three: construct a flood event delineation method that couples peaks-over-threshold (POT) sampling with the moving variance of the flow series, and use it to delineate flood events from the stations' daily runoff data.
- Step four: construct a flood behavior indicator system that describes the complete flood process.
- Step five: cluster the delineated flood events and identify the main flood types of the basin and their characteristics.
- Step six: statistically analyze the dominant flood type at each hydrological station and identify the regional characteristics of the basin's flood processes.
- Step seven: use the geographical detector (Geodetector) to analyze how the spatial heterogeneity of the influencing factors affects the basin's flood processes.

The method can accurately delineate flood events from long runoff series and identify a basin's main flood types and dominant influencing factors. It thereby supports the mining of flood-process information, the utilization of stormwater resources, and the scientific management of water resources in the basin.

Description

A method for identification of watershed flood types and analysis of their influencing factors

Technical Field

The invention relates to the technical field of flood type identification. In particular, it relates to a method for identifying flood types in a river basin and analyzing their influencing factors.

Background

Floods are among the most frequent and most damaging natural disasters in the world and seriously threaten human life and property. Regions differ in climate, topography and socio-economic development, so water resources are unevenly distributed in space and time, and flood processes differ markedly between regions. Against the background of global change and rapid socio-economic development, flood damage is becoming increasingly severe. Rapid and effective identification of a basin's main flood types and their influencing factors can provide scientific and technical support for water resources management, flood control and disaster reduction in the basin.

Flood type identification requires a large sample of flood events, so accurately delineating flood processes from continuous runoff series is crucial. Most current methods delineate flood processes manually, combining rainfall data with expert judgment. Because they rely on expert experience to locate the rising and recession points, they are highly subjective and inefficient when flood processes must be screened from large samples. A sound method that delineates flood events quickly and accurately from continuous runoff series is therefore important for flood type identification.

Identifying a basin's main flood types and their regional characteristics is also important for guiding flood management. The core of this task is to build a comprehensive indicator system that characterizes the basin's floods. The flood processes are then clustered so that flood types with similar processes can be identified.

Current flood behavior indicator systems fall into three categories:

- Hydrometeorological indicators describe the atmospheric drivers of floods, such as cyclone tracks, atmospheric circulation patterns and the dynamics of weather systems.
- Hydrological indicators describe the land-surface drivers, such as precipitation, temperature, evaporation, snow depth and soil moisture.
- Flood-process-based indicators are computed from the behavior of observed flood events. Unlike the other two categories, they ensure that floods with the same behavior are hydrologically similar, and they are increasingly used worldwide.

However, the process-based indicators used in current research focus mainly on flood magnitude, such as peak discharge, flood volume and runoff depth. They ignore how the hydrograph evolves, so information about the basin's floods is compressed or lost to varying degrees. Flood behavior indicator systems therefore need further development, with emphasis on indicators of flood magnitude, timing, rate of change and hydrograph shape.

Finally, many factors affect the flood process, and each explains it to a different degree. Accurately identifying the dominant factors among them is therefore also crucial.

Summary of the Invention

The purpose of the present invention is to provide a method for identifying flood types in a river basin and analyzing their influencing factors. The method constructs a flood event delineation method that couples peaks-over-threshold (POT) sampling with the moving variance of the flow series, so that flood events can be delineated quickly and accurately from the daily runoff data of hydrological stations. It then identifies the main flood types of the basin and their representative characteristics, and analyzes how the spatial heterogeneity of the influencing factors affects the flood process. This supports the mining of flood-process information, the utilization of stormwater resources, and the scientific management of water resources in the basin.

To achieve the above purpose, the present invention adopts the following technical solution:

Step 1) Collect continuous daily runoff data from the hydrological stations in the study area. Also collect the factors that affect the basin's flood processes: meteorological and geographical factors, human activity factors and socio-economic factors. These include precipitation, temperature, wind speed, DEM, NDVI, slope, urbanization rate, population density, GDP and carbon emissions.

Step 2) Preprocess the data using one or more of the following operations:

- Spatial interpolation, using inverse distance weighting (IDW).
- Sampling, using the Fishnet function of GIS software to generate spatially uniform sampling points at 1 km resolution and extract each variable at those points.
- Reclassification, using the natural breaks method.
- Data format conversion, mainly converting between vector and raster data as the study requires.
- Zonal statistics, using the zonal statistics function of GIS software to compute the mean, maximum, minimum and other statistics of each influencing factor in each zone.

Step 3) Construct a flood event delineation method that couples peaks-over-threshold (POT) sampling with the moving variance of the flow series, and use it to delineate flood events from daily runoff data:

- 3-1. Determine the flood peaks with the POT sampling method.
- 3-2. Determine the rising point and recession point of each flood from the moving variance of the flow series. The interval between a rising point and the following recession point is one flood process.

Step 4) Construct a flood behavior indicator system to describe the complete flood process:

- 4-1. Construct an indicator system that fully describes the flood process, including indicators of flood magnitude, timing, rate of change and hydrograph shape.
- 4-2. Calculate these indicators for the flood events delineated in step 3).

Step 5) Cluster the delineated flood events to identify the main flood types of the basin and their characteristics. Apply the k-means clustering algorithm to the flood events of the study area, determine the number of flood type clusters, and identify the main flood types and their flood-process characteristics.

Step 6) Statistically analyze the main flood types at each hydrological station in the study area and identify the regional characteristics of the basin's flood processes:

1. Calculate the proportion of each flood type at each station.
2. Characterize each station by the type with the largest proportion.
3. Group stations with the same type into one region.
4. Identify the regional characteristics of the basin's flood processes from these regions.

Step 7) Use the geographical detector to analyze how the spatial heterogeneity of the influencing factors affects the basin's flood processes. The factor detector module probes the regional distribution of flood processes in the basin and shows how the spatial heterogeneity of each influencing factor affects the regional flood characteristics.
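Step 2) relies on two standard GIS operations: inverse distance weighting and a Fishnet grid of sampling points. The Python sketch below only shows what they compute; in practice the text performs them in GIS software.

The following are assumptions, not details given in the text:

- the power exponent of 2;
- the use of cell centers as sampling points;
- the helper names `idw` and `fishnet_centers`.

```python
import math
from typing import List, Sequence, Tuple

def idw(stations: Sequence[Tuple[float, float, float]],
        x: float, y: float, power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate at (x, y).

    stations holds (x, y, value) triples, e.g. gauge coordinates (km) and
    annual precipitation (mm). power=2 is a common default, assumed here.
    """
    num = den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # target point coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def fishnet_centers(xmin: float, ymin: float, xmax: float, ymax: float,
                    cell: float = 1.0) -> List[Tuple[float, float]]:
    """Centers of a regular grid of square cells (a stand-in for GIS Fishnet)."""
    nx = int((xmax - xmin) // cell)
    ny = int((ymax - ymin) // cell)
    return [(xmin + (i + 0.5) * cell, ymin + (j + 0.5) * cell)
            for j in range(ny) for i in range(nx)]

if __name__ == "__main__":
    gauges = [(0.0, 0.0, 800.0), (4.0, 0.0, 1200.0), (0.0, 4.0, 1000.0)]
    for px, py in fishnet_centers(0.0, 0.0, 2.0, 2.0):
        print(px, py, round(idw(gauges, px, py), 1))
```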

Further, in 3-1 of step 3), the criterion for identifying independent flood peaks in the POT sampling method is:

where θ is the time interval between two adjacent flood peaks (days), A is the basin area (km²), and Q_min is the minimum flow between the two adjacent flood peaks Q₁ and Q₂ (m³/s).

Further, the adaptive threshold of the POT sampling method in step 3) is determined in the following six steps:

I. Take the median of the annual maximum series as the initial threshold u₀ and form the threshold space u_i = u₀ + (i − 1)σ_x/5. Here i = 1, 2, … indexes the candidate thresholds and σ_x is the standard deviation of the daily mean flow. Using 5 + ln(A) days as the sampling block time, where A is the basin area (km²), form one over-threshold flood peak series for each candidate threshold. The number of over-threshold flood peaks should follow a Poisson distribution:

$$P(X=k)=\frac{e^{-\lambda}\lambda^{k}}{k!}$$ (2)

where k = 0, 1, 2, … and λ is the mean annual number of threshold exceedances.
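Step I can be sketched in a few lines of Python. This is a minimal illustration that assumes σ_x is the sample standard deviation of the daily flow series. The names `threshold_space` and `block_days` are hypothetical helpers, not part of the described method.

```python
import math
import statistics
from typing import List, Sequence

def threshold_space(annual_maxima: Sequence[float], daily_flow: Sequence[float],
                    count: int) -> List[float]:
    """Candidate thresholds u_i = u0 + (i - 1) / 5 * sigma_x, i = 1..count."""
    u0 = statistics.median(annual_maxima)   # initial threshold u0
    sigma_x = statistics.stdev(daily_flow)  # sample std of daily flow (assumed)
    return [u0 + (i - 1) / 5 * sigma_x for i in range(1, count + 1)]

def block_days(area_km2: float) -> float:
    """Sampling block (declustering) time 5 + ln(A), in days."""
    return 5 + math.log(area_km2)
```

Given a station's own annual maxima and daily flows, `threshold_space(annual_maxima, daily_flow, 15)` would yield 15 candidate thresholds analogous to those listed in Table 2 of the example.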

II. Apply a chi-square test at the 0.05 significance level to the number of exceedances in each over-threshold flood peak series.
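A minimal sketch of the Poisson check in step II, under three assumptions:

- The test is a chi-square goodness-of-fit test on the annual exceedance counts.
- λ is estimated from the data, which costs one degree of freedom.
- All counts at or above the largest observed count are pooled into one tail bin.

The critical values are the standard 5% points of the chi-square distribution. A statistics library would normally supply exact p-values instead.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

# Upper 5% points of the chi-square distribution for df = 1..10.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}

def poisson_pmf(k: int, lam: float) -> float:
    """Formula (2): P(X = k) = exp(-lam) * lam**k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_chi2(annual_counts: Sequence[int]) -> Tuple[float, int]:
    """Chi-square statistic and degrees of freedom for a Poisson fit.

    Bins are k = 0, 1, ..., K-1 plus a pooled tail bin k >= K, where K is
    the largest observed annual count; lam is estimated from the data.
    """
    n_years = len(annual_counts)
    lam = sum(annual_counts) / n_years
    top = max(annual_counts)
    observed = Counter(annual_counts)
    stat = 0.0
    for k in range(top + 1):
        if k < top:
            p = poisson_pmf(k, lam)
        else:  # tail bin collects the probability of every count >= top
            p = 1.0 - sum(poisson_pmf(j, lam) for j in range(top))
        expected = n_years * p
        stat += (observed.get(k, 0) - expected) ** 2 / expected
    df = (top + 1) - 1 - 1  # bins - 1 - one estimated parameter
    return stat, df

def follows_poisson(annual_counts: Sequence[int]) -> bool:
    """True if the Poisson hypothesis is not rejected at the 0.05 level."""
    stat, df = poisson_chi2(annual_counts)
    if df not in CHI2_CRIT_05:
        raise ValueError(f"no tabulated critical value for df={df}")
    return stat <= CHI2_CRIT_05[df]
```

Bins with expected counts below about 5 are usually merged before testing. The sketch does not do this, so it is only indicative for short records.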

III. Plot the mean excess function of each over-threshold flood peak series to determine a reasonable range for the threshold u. For a generalized Pareto distribution, the mean excess function is a linear function of the threshold u and can be expressed as:

where X is a random variable, u is the threshold, ξ is the shape parameter, and σ is the scale parameter.

The empirical mean excess function e_n(u) of the sample can be estimated as:

$$e_n(u)=\frac{\sum_{i=1}^{n}(X_i-u)\,\mathbb{1}(X_i>u)}{\sum_{i=1}^{n}\mathbb{1}(X_i>u)}$$ (4)

where n is the number of samples and 𝟙(·) is the indicator function.
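The empirical mean excess function in formula (4) is easy to compute. Step III's visual check consists of plotting e_n(u) against u for each candidate threshold and looking for the range where the curve becomes roughly linear. The function names below are illustrative.

```python
from typing import List, Sequence, Tuple

def mean_excess(sample: Sequence[float], u: float) -> float:
    """Formula (4): average of (X_i - u) over the observations with X_i > u."""
    excesses = [x - u for x in sample if x > u]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    return sum(excesses) / len(excesses)

def mean_excess_curve(sample: Sequence[float],
                      thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """(u, e_n(u)) pairs for every threshold that still has exceedances."""
    return [(u, mean_excess(sample, u)) for u in thresholds
            if any(x > u for x in sample)]
```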

IV. Apply the Anderson-Darling test to each over-threshold flood peak series. Combine the results with the mean excess plots above to select a suitable threshold u. The k-sample Anderson-Darling statistic can be expressed as:

$$A^2_{kN}=\sum_{i=1}^{k} n_i\int_{B_N}\frac{\left[F_{i}(x)-H_N(x)\right]^2}{H_N(x)\left[1-H_N(x)\right]}\,dH_N(x)$$ (5)

where:

- n_i is the size of sample i;
- F_i(x) is the empirical cumulative distribution function of sample i;
- N = Σn_i is the total number of observations;
- H_N(x) is the empirical distribution function of all N pooled observations;
- B_N = {x ∈ R : H_N(x) < 1}.

In practice, the p-value P_AD of the Anderson-Darling statistic is obtained by interpolation and extrapolation. If P_AD > α, the null hypothesis is accepted; otherwise it is rejected. In this calculation, the empirical distribution and the theoretical cumulative distribution of each over-threshold flood peak series are treated as two independent samples, from which the statistic and then P_AD are computed.
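Scholz and Stephens (1987) give a computing form for the k-sample statistic in formula (5), in which the integral reduces to a sum over the pooled order statistics. The sketch below implements that form for data without tied values and returns only the statistic.

Obtaining P_AD requires the interpolation tables mentioned above. SciPy's `scipy.stats.anderson_ksamp` is one existing implementation; it reports a standardized version of the statistic together with an interpolated significance level. The function name `ad_k_sample` is illustrative.

```python
from typing import Sequence

def ad_k_sample(samples: Sequence[Sequence[float]]) -> float:
    """k-sample Anderson-Darling statistic A2_kN (no-ties computing form).

        A2 = 1/N * sum_i 1/n_i * sum_{j=1}^{N-1} (N*M_ij - j*n_i)^2 / (j*(N - j))

    where M_ij counts observations of sample i that are <= the j-th smallest
    pooled value.
    """
    pooled = sorted(x for s in samples for x in s)
    big_n = len(pooled)
    total = 0.0
    for s in samples:
        n_i = len(s)
        ordered = sorted(s)
        m = 0          # running value of M_ij
        inner = 0.0
        for j in range(1, big_n):
            z = pooled[j - 1]  # j-th smallest pooled value
            while m < n_i and ordered[m] <= z:
                m += 1
            inner += (big_n * m - j * n_i) ** 2 / (j * (big_n - j))
        total += inner / n_i
    return total / big_n
```

Interleaved samples give a small statistic and fully separated samples a large one. This is the behavior the threshold check relies on when the observed peaks are compared with a sample from the fitted distribution.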

V. For the selected threshold u and its over-threshold flood peak series, estimate the parameters of the generalized Pareto distribution (GPD) by maximum likelihood. The GPD can be expressed as:

where ξ is the shape parameter, σ is the scale parameter, and u is the location parameter, which in this method is the threshold. The value of ξ determines the special case:

- ξ = 0: the GPD reduces to the exponential distribution.
- ξ < 0: it is the ordinary Pareto distribution.
- ξ > 0: it is the Pareto type II distribution.

VI. Take the flow corresponding to a return period suited to the study's needs as the threshold of the POT sampling method.
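Steps V and VI can be sketched as follows. The sketch assumes the common parameterization F(x) = 1 − (1 + ξ(x − u)/σ)^(−1/ξ). The text's own formula is not reproduced here, and its sign convention for ξ may differ.

- `gpd_loglik` is the quantity that maximum likelihood maximizes over (σ, ξ).
- `return_level` gives the flow for a T-year return period when exceedances occur λ times per year on average. A 5-year threshold as in 3-1-6 could be obtained this way.

In practice, an optimizer such as `scipy.stats.genpareto.fit` would do the fitting; its shape `c` corresponds to ξ in this parameterization.

```python
import math
from typing import Sequence

EPS = 1e-12  # |xi| below this is treated as the exponential case

def gpd_cdf(x: float, u: float, sigma: float, xi: float) -> float:
    """GPD CDF, F(x) = 1 - (1 + xi*(x - u)/sigma)**(-1/xi); xi = 0: exponential."""
    y = (x - u) / sigma
    if y <= 0:
        return 0.0
    if abs(xi) < EPS:
        return 1.0 - math.exp(-y)
    base = 1.0 + xi * y
    if base <= 0:  # beyond the upper end point when xi < 0
        return 1.0
    return 1.0 - base ** (-1.0 / xi)

def gpd_loglik(excesses: Sequence[float], sigma: float, xi: float) -> float:
    """Log-likelihood of excesses y = x - u; maximize over (sigma, xi)."""
    if sigma <= 0:
        return -math.inf
    n = len(excesses)
    if abs(xi) < EPS:
        return -n * math.log(sigma) - sum(excesses) / sigma
    total = -n * math.log(sigma)
    for y in excesses:
        base = 1.0 + xi * y / sigma
        if base <= 0:
            return -math.inf  # parameters incompatible with this excess
        total -= (1.0 + 1.0 / xi) * math.log(base)
    return total

def return_level(years: float, u: float, sigma: float, xi: float,
                 lam: float) -> float:
    """T-year flow of a POT/GPD model with lam exceedances per year."""
    m = lam * years
    if abs(xi) < EPS:
        return u + sigma * math.log(m)
    return u + sigma / xi * (m ** xi - 1.0)
```

By construction, the return level x_T satisfies F(x_T) = 1 − 1/(λT). This is a convenient consistency check on any fitted parameter set.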

Further, in 3-2 of step 3), determining the rising and recession points of the flood process from the moving variance of the flow series comprises three steps:

- A. Choose an appropriate moving-window length (days) and compute the moving variance Var of the flow series.
- B. Determine the threshold TH_var of the moving variance.
- C. Compare the moving variance with the threshold to determine the rising and recession points of the flood.

Further, in 3-2 of step 3), the moving variance of the flow series and its threshold are calculated as follows:

$$\mathrm{Var}(i)=\mathrm{Var}(Q_i,Q_{i+1},\ldots,Q_{i+n})$$ (7)

$$TH_{var}=\overline{\mathrm{Var}}+\theta\,\sigma(\mathrm{Var})$$ (8)

where:

- Q is the flow series;
- n is the moving-window length in days;
- $\overline{\mathrm{Var}}$ is the mean of the moving variance of the flow series;
- σ(Var) is the standard deviation of the moving variance of the flow series;
- θ is a coefficient that controls the sensitivity of the algorithm. The smaller θ is, the more flood processes are identified.
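Steps A to C and formulas (7) and (8) can be sketched as follows. The text does not spell out how the comparison in step C is made, so the rule used here is an assumption. Starting from a POT peak:

- the rising point is the first window of the contiguous run of windows whose moving variance exceeds TH_var;
- the recession point is the end of the last window in that run.

σ(Var) is taken as the population standard deviation of the moving-variance series. The function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

def moving_variance(q: Sequence[float], n: int) -> List[float]:
    """Formula (7): Var(i) of Q_i..Q_{i+n}, i.e. n + 1 values per window."""
    return [statistics.pvariance(q[i:i + n + 1]) for i in range(len(q) - n)]

def variance_threshold(var: Sequence[float], theta: float) -> float:
    """Formula (8): mean(Var) + theta * sigma(Var), sigma taken as the std."""
    return statistics.mean(var) + theta * statistics.pstdev(var)

def flood_bounds(q: Sequence[float], peak: int, n: int,
                 theta: float) -> Tuple[int, int]:
    """Rising and recession points around one POT peak (assumed rule).

    Walk backward and forward from the peak while the moving variance stays
    above the threshold; the end of the last such window is the recession point.
    """
    var = moving_variance(q, n)
    th = variance_threshold(var, theta)
    start = end = min(peak, len(var) - 1)
    while start > 0 and var[start - 1] > th:
        start -= 1
    while end + 1 < len(var) and var[end + 1] > th:
        end += 1
    return start, min(end + n, len(q) - 1)
```

Lowering `theta` lowers TH_var, which makes the comparison more sensitive. This is consistent with the statement that a smaller θ identifies more flood processes.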

Further, the flood behavior indicators in 4-1 of step 4) comprise fifteen indicators:

- coefficient of variation;
- skewness coefficient;
- kurtosis;
- rising-time ratio;
- high-pulse time ratio;
- standardized flood peak;
- rising rate;
- falling rate;
- flood peak;
- total flood volume;
- peak time;
- flood occurrence time;
- duration;
- number of flood peaks;
- flood peak modulus.

Further, in 4-2 of step 4), the flood behavior indicators calculated for each flood event delineated in step 3) can be expressed as:

$$X=[x_1,x_2,\ldots,x_n]$$ (9)

where X is the behavior indicator matrix (a 1 × n row vector) of one flood event, x is a flood behavior indicator, and n is the number of indicators.

Further, the k-means clustering algorithm in step 5) uses the Euclidean distance:

$$d(X,Y)=\sqrt{\sum_{i=1}^{n}(x_i-y_i)^2}$$ (10)

where d is the Euclidean distance between two points X and Y in n-dimensional Euclidean space, with coordinates X = (x₁, x₂, …, x_n) and Y = (y₁, y₂, …, y_n).

Further, in step 5) the clustering is judged optimal when the Davies-Bouldin Index (DBI) reaches its minimum. The index is calculated as:

$$DBI=\frac{1}{n}\sum_{i=1}^{n}\max_{j\neq i}\frac{\sigma_i+\sigma_j}{d(c_i,c_j)}$$ (11)

where:

- n is the number of clusters produced by the algorithm;
- c_i and c_j are the centers of clusters i and j;
- σ_i and σ_j are the average distances from the flood events in clusters i and j to their respective centers;
- d(c_i, c_j) is the distance between the two cluster centers.
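The clustering in step 5) can be sketched with a plain implementation of Lloyd's k-means. It uses the Euclidean distance of formula (10) and scores each candidate k with the Davies-Bouldin index of formula (11).

The text does not specify how k-means is initialized. The farthest-first initialization here is an assumption made to keep the sketch deterministic, and the function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

def euclid(a: Point, b: Point) -> float:
    """Formula (10): Euclidean distance in n-dimensional space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points: Sequence[Point]) -> Point:
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(data: Sequence[Point], k: int, iters: int = 100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centers: List[Point] = [data[0]]
    while len(centers) < k:
        centers.append(max(data, key=lambda p: min(euclid(p, c) for c in centers)))
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: euclid(p, centers[j])) for p in data]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            new.append(centroid(members) if members else centers[j])
        if new == centers:  # assignments have stabilized
            break
        centers = new
    return labels, centers

def davies_bouldin(data: Sequence[Point], labels: Sequence[int],
                   centers: Sequence[Point]) -> float:
    """Formula (11), with sigma_i = mean distance of cluster i to its center."""
    k = len(centers)
    spread = []
    for j in range(k):
        members = [p for p, lab in zip(data, labels) if lab == j]
        spread.append(sum(euclid(p, centers[j]) for p in members) / len(members)
                      if members else 0.0)
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if i == j:
                continue
            d = euclid(centers[i], centers[j])
            if d == 0:
                return math.inf  # coincident centers: degenerate clustering
            worst = max(worst, (spread[i] + spread[j]) / d)
        total += worst
    return total / k

def best_k(data: Sequence[Point], k_values) -> Tuple[int, Dict[int, float]]:
    """Run k-means for each candidate k and keep the k with the lowest DBI."""
    scores: Dict[int, float] = {}
    for k in k_values:
        labels, centers = kmeans(data, k)
        scores[k] = davies_bouldin(data, labels, centers)
    return min(scores, key=scores.get), scores
```

The indicators in X have very different units (m³/s, days, dimensionless ratios), so standardizing them before clustering is usually advisable. The text does not say whether this is done.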

Further, the input to the k-means algorithm in step 5) is the set of flood behavior indicators of the flood events at all stations in the basin, organized as follows.

For a basin with m gauging stations, the matrix W formed by the flood behavior indicators of the flood events at all stations can be expressed as:

$$W=[Y_1,Y_2,\ldots,Y_m]$$ (12)

where Y_m is the matrix of flood behavior indicators for all flood events at the m-th station in the basin, and the subscripts 1, 2, …, m denote the 1st to m-th stations.

For a station Y with k flood events, the matrix formed by the behavior indicators of all its flood events can be expressed as:

$$Y=[X_1,X_2,\ldots,X_k]$$ (13)

where X is the behavior indicator matrix of one flood event from formula (9). Formula (13) can be further expressed as:

According to formula (14), W can be further expressed as:

where the first subscript 1, 2, …, m denotes the 1st to m-th stations, the second subscript 1, 2, …, k denotes the 1st to k-th flood events, and the third subscript 1, 2, …, n denotes the 1st to n-th flood behavior indicators.
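Formulas (12) and (13) amount to stacking every station's event vectors into one matrix while recording which station each row came from. Step 6) then reads each station's dominant type off the cluster labels. Below is a minimal sketch with hypothetical function names. Ties between equally frequent types are broken in favor of the type seen first, an arbitrary choice the text does not address.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

def stack_stations(stations: Dict[Hashable, Sequence[Sequence[float]]]):
    """Formulas (12)-(13): stack every station's event vectors X into W.

    Returns the rows of W and, for each row, the id of its station.
    """
    rows: List[List[float]] = []
    owners: List[Hashable] = []
    for sid, events in stations.items():
        for x in events:
            rows.append(list(x))
            owners.append(sid)
    return rows, owners

def dominant_types(owners: Sequence[Hashable],
                   labels: Sequence[int]) -> Dict[Hashable, Tuple[int, float]]:
    """Step 6): each station's most frequent flood type and its share."""
    per_station: Dict[Hashable, Counter] = {}
    for sid, lab in zip(owners, labels):
        per_station.setdefault(sid, Counter())[lab] += 1
    result = {}
    for sid, counts in per_station.items():
        label, hits = counts.most_common(1)[0]  # ties: first type seen wins
        result[sid] = (label, hits / sum(counts.values()))
    return result
```

The rows of W would be passed to the clustering step. The returned labels, paired with `owners`, then give each station's type shares.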

Further, in the factor detector and interaction detector modules of the geographical detector in step 7), the degree to which an influencing factor explains the regional flood characteristics can be expressed as:

$$q=1-\frac{\sum_{h=1}^{L}N_h\sigma_h^2}{N\sigma^2}$$ (16)

where:

- h = 1, …, L indexes the strata (zones) of influencing factor X;
- N_h and N are the numbers of units in stratum h and in the whole region, respectively;
- σ_h² and σ² are the variances of the regional flood characteristic Y in stratum h and in the whole region, respectively.
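The q statistic above can be computed directly from two inputs: a numeric flood characteristic Y at each sampling unit, and the stratum of one influencing factor at that unit (for example, a natural-breaks class). Below is a minimal sketch that assumes Y is numeric; the function name is illustrative. q runs from 0, when the factor's strata explain none of the variance of Y, to 1, when they explain all of it.

```python
import statistics
from collections import defaultdict
from typing import Hashable, Sequence

def q_statistic(y: Sequence[float], strata: Sequence[Hashable]) -> float:
    """Geodetector q = 1 - sum_h N_h * var_h / (N * var), population variances."""
    if len(y) != len(strata):
        raise ValueError("y and strata must have the same length")
    total_var = statistics.pvariance(y)
    if total_var == 0:
        raise ValueError("Y has no variance to explain")
    groups = defaultdict(list)
    for value, h in zip(y, strata):
        groups[h].append(value)
    within = sum(len(g) * statistics.pvariance(g) for g in groups.values())
    return 1.0 - within / (len(y) * total_var)
```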

Beneficial effects of the present invention:

The method of the present invention constructs a flood event delineation method that couples peaks-over-threshold (POT) sampling with the moving variance of the flow series. With it, the method can:

- quickly and accurately delineate flood events from long runoff series, such as the daily runoff data of hydrological stations;
- identify the main flood types of a basin, their representative characteristics and their dominant influencing factors;
- analyze how the spatial heterogeneity of the influencing factors affects the flood process.

It thus supports the mining of flood-process information, the utilization of stormwater resources and the scientific management of water resources in the basin.

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Description of Drawings

Fig. 1 is a flow chart of the method of the present invention for identifying basin flood types and analyzing their influencing factors;

Fig. 2 is a schematic flow chart of the flood event delineation method coupling POT sampling with the moving variance of the flow series;

Fig. 3 is a schematic diagram of the mean excess function of the over-threshold samples;

Fig. 4 is a schematic diagram of the method for determining the rising and recession points of a flood process from the moving variance of the flow series;

Fig. 5 is a schematic diagram of the behavior indicators of a flood event;

Fig. 6 is a schematic diagram of the procedure for identifying basin flood types and their regional distribution characteristics.

Detailed Description of the Embodiments

Example 1

The present invention proposes a method for identifying basin flood types and analyzing their influencing factors. The application of the technical solution is illustrated below, using a river basin in China as the case study area. It comprises the following steps:

Step 1) Collect meteorological and geographical factors, human activity factors, socio-economic indicators, and runoff data from the hydrological stations in the study area:

The data sources are as follows:

- Daily runoff data for the hydrological stations in the study area come from the Hydrological Yearbook of the People's Republic of China. These stations are evenly distributed across the basin.
- Daily precipitation data for the meteorological stations come from the National Meteorological Science Data Center (http://data.cma.cn).
- DEM (90 m) and NDVI (1 km) data were obtained from the Resource and Environment Science and Data Center, Chinese Academy of Sciences (https://www.resdc.cn/).
- Socio-economic data (urbanization rate, population density, GDP and carbon emissions) were taken from the China City Statistical Yearbook and the China Carbon Accounting Database (CEADs).

The influencing-factor data are listed in Table 1:

Table 1 Influencing factor data

| No. | Influencing factor | Abbreviation | Unit | Data type |
|---|---|---|---|---|
| 1 | Annual cumulative precipitation | Ptotal | mm | Station |
| 2 | Annual maximum daily precipitation | Pmax | mm | Station |
| 3 | Annual maximum 3-day precipitation | Pmax3 | mm | Station |
| 4 | Slope | Slope | - | Raster |
| 5 | Elevation | Elev | m | Raster |
| 6 | NDVI | NDVI | - | Raster |
| 7 | Urbanization rate | CR | % | Raster |
| 8 | Population density | HD | people/km² | Raster |
| 9 | GDP | GDP | billion yuan | Raster |
| 10 | Intensity of human activity | IHA | million tons | Raster |

Step 2) Apply one or more of spatial interpolation, sampling, reclassification, data format conversion and zonal statistics to the precipitation, DEM, NDVI and socio-economic data above.

Step 3) Delineate flood events. The corresponding programs were written in MATLAB. The algorithm is as follows:

3-1. Determine the flood peaks of flood events with the POT sampling method (Fig. 2):

3-1-1 Determine the flood threshold as follows, taking a hydrological station in the lower reaches of the basin as an example. The median of the annual maximum series, 32800 m³/s, is used as the initial threshold u₀, giving a threshold space of 32800–51556 m³/s. With a sampling block time of 24 days, 15 over-threshold flood peak series are formed. The threshold and number of exceedances of each series are given in Table 2:

Table 2 Thresholds and parameter estimates of the generalized Pareto distribution

3-1-2 Apply a chi-square test at the 0.05 significance level to the number of exceedances in each over-threshold flood peak series. In 13 of the 15 series, the exceedance counts follow a Poisson distribution. See column H₀ in Table 2: H₀ = 0 means the null hypothesis is accepted (the sample follows a Poisson distribution), and H₀ = 1 means it is rejected.

3-1-3 Plot the mean excess function of each over-threshold flood peak series to determine a reasonable range for the threshold u. Fig. 3 shows that the mean excess plot becomes linear near 43500 m³/s. Considering the amount of exceedance data and the convergence of the exceedance distribution, 43500 m³/s is taken as the reference for determining the threshold.

3-1-4 Apply the Anderson-Darling test to each over-threshold flood peak series; the results are shown in Table 2. Combining these results with the mean excess plot above, the threshold is set to 43518 m³/s.

3-1-5 For the selected threshold of 43518 m³/s and its over-threshold flood peak series, estimate the parameters of the generalized Pareto distribution by maximum likelihood. The results are shown in Table 2.

3-1-6 According to the study's needs, the flow corresponding to a 5-year return period is taken as the threshold of the POT sampling method.

3-1-7 Use the findpeaks function to locate the local maxima of the runoff series. Then write selection conditions based on formula (1) and the threshold, and screen out the flood peaks T_pk of the qualifying flood events.
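Step 3-1-7 can be sketched in Python as below, with three assumptions:

- Formula (1) itself is not reproduced in this text. The sketch assumes the widely used independence criterion of the US Water Resources Council, θ > 5 + ln(A) and Q_min < ¾·min(Q₁, Q₂), which uses exactly the symbols defined under formula (1).
- `local_maxima` is a simplified stand-in for MATLAB's `findpeaks`.
- When two peaks are not independent, the larger one is kept.

```python
import math
from typing import List, Sequence

def local_maxima(q: Sequence[float]) -> List[int]:
    """Interior local maxima (a simplified stand-in for MATLAB findpeaks)."""
    return [i for i in range(1, len(q) - 1) if q[i] > q[i - 1] and q[i] >= q[i + 1]]

def independent(q: Sequence[float], i: int, j: int, area_km2: float) -> bool:
    """Assumed form of formula (1): theta > 5 + ln(A), Q_min < 3/4 min(Q1, Q2)."""
    theta = j - i                # days between the two peaks
    q_min = min(q[i:j + 1])      # lowest flow between them
    return theta > 5 + math.log(area_km2) and q_min < 0.75 * min(q[i], q[j])

def pot_peaks(q: Sequence[float], threshold: float, area_km2: float) -> List[int]:
    """Independent peaks above the threshold; of two dependent peaks keep the larger."""
    kept: List[int] = []
    for i in local_maxima(q):
        if q[i] <= threshold:
            continue
        if kept and not independent(q, kept[-1], i, area_km2):
            if q[i] > q[kept[-1]]:
                kept[-1] = i
                # the promoted peak may now be dependent on the one before it
                while len(kept) >= 2 and not independent(q, kept[-2], kept[-1], area_km2):
                    del kept[-2 if q[kept[-2]] < q[kept[-1]] else -1]
            continue
        kept.append(i)
    return kept
```

Each returned index is a candidate T_pk, around which step 3-2 searches for the rising and recession points.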

3-2. Determine the rising and recession points of each flood process from the moving variance of the flow series. The implementation is shown in Fig. 4. The moving variance of the flow series and its threshold are calculated with formulas (7) and (8), and comparing the two gives the rising and recession points. The interval between a rising point and its recession point is one flood process. In this delineation, 30 flood events were identified at this hydrological station.

Step 4) Calculate the flood behavior characteristic indices:

4-1. Select the indices used to describe flood behavior. They cover flood magnitude, timing, rate of change and hydrograph shape; 15 indices are used for the case-study area, as listed in Table 3:

Table 3 Flood behavior characteristic indices

Note: Q_t is the flow (m³/s) in the t-th period; A is the basin area (km²); t_start and t_end are the rising and recession points of the flood event; σ, μ₃ and μ₄ are the variance, third-order central moment and fourth-order central moment of the flood event, whose mean flow (m³/s) is also used; t_0.75pk is the length of time during the flood process for which the flow exceeds 0.75 times the flood peak.

4-2. Calculate the flood behavior characteristic indices of the flood events divided in step 3). Figure 5 shows how the indices are computed from the flow hydrograph of a flood event.
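
Because Table 3 is not reproduced here, the following Python sketch uses assumed, conventional definitions for a subset of the indices listed in claim 3: for example, skewness μ₃/σ³ and kurtosis μ₄/σ⁴ with σ taken as the standard deviation, and the high-pulse ratio as the share of days with flow above 0.75 times the peak. The function name and dictionary keys are hypothetical; calendar-based indices such as peak occurrence time are omitted.

```python
import numpy as np

def flood_indices(q, area_km2):
    """ASSUMED definitions of a subset of the flood behavior indices.

    q: daily flows (m^3/s) from the rising point to the recession point.
    """
    q = np.asarray(q, dtype=float)
    n = q.size                               # duration in days (inclusive)
    mean = q.mean()
    dev = q - mean
    var = np.mean(dev ** 2)                  # population variance
    mu3, mu4 = np.mean(dev ** 3), np.mean(dev ** 4)
    k_pk = int(np.argmax(q))                 # day of the peak within the event
    q_pk = q[k_pk]
    interior = (q[1:-1] > q[:-2]) & (q[1:-1] >= q[2:])  # local maxima
    return {
        "duration_d": n,
        "peak": q_pk,
        "volume_m3": q.sum() * 86400.0,      # daily flows -> volume
        "cv": np.sqrt(var) / mean,
        "skewness": mu3 / var ** 1.5,
        "kurtosis": mu4 / var ** 2,
        "rise_time_ratio": k_pk / n,
        "high_pulse_ratio": np.count_nonzero(q > 0.75 * q_pk) / n,
        "rise_rate": (q_pk - q[0]) / k_pk if k_pk > 0 else np.nan,
        "fall_rate": (q_pk - q[-1]) / (n - 1 - k_pk) if k_pk < n - 1 else np.nan,
        "standardized_peak": q_pk / mean,
        "peak_modulus": q_pk / area_km2,     # peak per unit basin area
        "n_peaks": int(np.count_nonzero(interior)),
    }
```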

Step 5) Cluster the divided flood events and identify the main flood types of the basin and their characteristics. The procedure is shown in the upper part of Figure 6: the k-means algorithm clusters the flood events of the study area, which determines the number of flood types, and the main flood types and their flood-process characteristics are then identified:

The matrix of flood-event behavior indices is loaded into MATLAB, and the kmeans function clusters the station's flood events. The Davies-Bouldin index (DBI), calculated with formula (10), is the criterion of clustering quality: a smaller DBI indicates more compact, better-separated clusters and therefore a better clustering. In this case, the station's flood events fall into 3 classes.
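
The clustering and model selection in step 5) could be reproduced outside MATLAB with a plain k-means and the Davies-Bouldin index as defined in claim 4. In this Python sketch, the k-means++ seeding, the z-score scaling of the index matrix and the candidate range k = 2..6 are assumptions, and all function names are hypothetical.

```python
import numpy as np

def _assign(X, centers):
    """Label of the nearest center (Euclidean distance) for every row of X."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; best of n_init restarts."""
    rng = np.random.default_rng(seed)
    best = None
    for _init in range(n_init):
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):                 # k-means++: sample by squared distance
            d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(max_iter):
            labels = _assign(X, centers)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = _assign(X, centers)
        inertia = np.sum((X - centers[labels]) ** 2)
        if best is None or inertia < best[0]:
            best = (inertia, labels, centers)
    return best[1], best[2]

def davies_bouldin(X, labels, centers):
    """DBI = (1/n) sum_i max_{j != i} (s_i + s_j) / d(c_i, c_j)."""
    k = len(centers)
    # s_i: mean distance of the members of class i to its center
    s = [np.linalg.norm(X[labels == i] - centers[i], axis=1).mean() for i in range(k)]
    return sum(max((s[i] + s[j]) / np.linalg.norm(centers[i] - centers[j])
                   for j in range(k) if j != i) for i in range(k)) / k

def choose_k(features, k_range=range(2, 7), seed=0):
    """Return (dbi, k, labels) for the k with the smallest DBI."""
    X = np.asarray(features, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    X = (X - X.mean(axis=0)) / std            # ASSUMED z-score scaling
    best = None
    for k in k_range:
        labels, centers = kmeans(X, k, seed=seed)
        if len(np.unique(labels)) < k:        # skip solutions with empty clusters
            continue
        dbi = davies_bouldin(X, labels, centers)
        if best is None or dbi < best[0]:
            best = (dbi, k, labels)
    return best
```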

Step 6) Statistically analyze the main flood types of the hydrological stations in the study area and identify the regional characteristics of the basin's flood processes. The procedure is shown in the lower part of Figure 6: the proportion of each flood type is calculated for every hydrological station, the type with the largest proportion represents the station's flood type, and stations with the same flood type are grouped into the same region, which reveals the regional distribution of flood processes in the basin.
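
The station-level aggregation in step 6) amounts to a majority vote over each station's event classes. A short Python sketch follows; station names are hypothetical, and ties are broken by the class seen first, a rule the method does not specify.

```python
from collections import Counter, defaultdict

def station_regions(event_types):
    """Dominant flood type per station and the stations grouped by that type.

    event_types maps a station ID to the cluster labels of its flood events.
    """
    dominant = {}
    for station, labels in event_types.items():
        counts = Counter(labels)
        total = sum(counts.values())
        shares = {t: c / total for t, c in counts.items()}  # proportion of each type
        dominant[station] = max(shares, key=shares.get)      # ties: first label seen
    regions = defaultdict(list)
    for station, t in dominant.items():
        regions[t].append(station)
    return dominant, dict(regions)
```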

Step 7) Use the geographic detector method to analyze how the spatial heterogeneity of the influencing factors affects the regional distribution of flood types in the basin:

The influencing factors are classified with the natural breaks method: meteorological factors such as annual cumulative precipitation, annual maximum daily precipitation and annual maximum 3-day precipitation are divided into 10 classes; geographic factors such as slope and elevation into 12 classes; NDVI into 6 classes; and urbanization rate, population density, GDP and human activity intensity into 8 classes.
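
The natural breaks (Jenks) classification can be computed exactly with a dynamic program that minimizes the within-class sum of squared deviations. The Python sketch below is a generic implementation rather than any specific GIS tool; its O(k·n²) cost means large rasters would typically be sampled first. `jenks_breaks` and `classify` are hypothetical names.

```python
from bisect import bisect_left

def jenks_breaks(values, n_classes):
    """Upper bound of each class of an optimal 1-D natural-breaks split."""
    x = sorted(values)
    n = len(x)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= number of values")
    s1, s2 = [0.0], [0.0]                  # prefix sums for O(1) class SSE
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(i, j):                         # squared deviations of x[i:j]
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    inf = float("inf")
    # cost[k][j]: best SSE for x[:j] in k classes; cut[k][j]: start of last class
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j], cut[k][j] = c, i
    bounds, j = [], n                      # backtrack the class boundaries
    for k in range(n_classes, 0, -1):
        bounds.append(x[j - 1])
        j = cut[k][j]
    return bounds[::-1]

def classify(values, bounds):
    """Class number (1-based) of each value given the class upper bounds."""
    return [min(bisect_left(bounds, v), len(bounds) - 1) + 1 for v in values]
```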

The factor detector module of the geographic detector is applied to the factors influencing the regional flood characteristics of the basin to determine how strongly each factor affects them. The results are shown in Table 4:
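
The factor detector's q statistic, q = 1 - Σ_h N_h σ_h² / (N σ²), measures the share of the variance of a response y explained by a stratification of the study area, and the interaction detector (claim 5) applies the same statistic to the overlay of two stratifications. A minimal Python sketch follows. It assumes y is a numeric response per spatial unit and that the strata are the factor classes from the natural breaks step; the function names are hypothetical, and the significance test and interaction-type classification of the full method are omitted.

```python
from collections import defaultdict

def q_statistic(y, strata):
    """Factor detector: q = 1 - sum_h(N_h * var_h) / (N * var)."""
    n = len(y)
    mean = sum(y) / n
    sst = sum((v - mean) ** 2 for v in y)        # equals N * population variance
    if sst == 0:
        raise ValueError("y has zero variance")
    groups = defaultdict(list)
    for v, h in zip(y, strata):
        groups[h].append(v)
    ssw = 0.0                                    # sum over strata of N_h * var_h
    for vals in groups.values():
        m = sum(vals) / len(vals)
        ssw += sum((v - m) ** 2 for v in vals)
    return 1.0 - ssw / sst

def interaction_q(y, strata_a, strata_b):
    """Interaction detector: q of the overlay (intersection) of two stratifications."""
    return q_statistic(y, list(zip(strata_a, strata_b)))
```

For two factors classified into 10 and 12 classes, a call such as `interaction_q(y, p_max3_class, elev_class)` (hypothetical variable names) would give a q value of the kind reported for P_max3 and Elev in Table 4.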

Table 4 Geographic detector results

The geographic detector results show that meteorological conditions, geographic factors, land use and human activities in the basin, together with their interactions, are important causes of the regional flood characteristics. Among single factors, meteorological conditions (q = 0.40-0.57), geographic factors (q = 0.08-0.54) and human activities (q = 0.24-0.48) are the main drivers of the regional flood characteristics, while vegetation cover (q = 0.15) has a smaller effect. Among factor interactions, P_max3 and Elev together have the greatest influence on the regional flood characteristics of the basin (q = 0.79), followed by P_total and P_max (q = 0.72).

The above describes only an example implementation of the invention and does not limit it; the data transformations and the choice of clustering algorithm can be set according to the requirements and the specific study area. Any modification, equivalent replacement, improvement, etc. made within the scope of the claims of the invention shall fall within its protection scope.

Claims

1. A method for identifying the flood types of a drainage basin and analyzing their influencing factors, characterized in that the method comprises the following steps:

step 1) collecting and downloading continuous daily runoff data of the hydrological stations in the study area, and collecting and downloading the meteorological, geographic, human-activity and socioeconomic factors that influence the flood process of the basin;

step 2) data preprocessing, including one or more of spatial interpolation, sampling, reclassification, data format conversion and zonal statistics;

step 3) constructing a flood event division method that couples peaks-over-threshold sampling with the moving variance of the flow series, and dividing flood events from the daily runoff data: 3-1, identifying flood peaks with the peaks-over-threshold sampling method, whose threshold is determined in the following six steps: I, taking the median of the annual maximum series as the initial threshold $u_0$ and forming the threshold space $u_i = u_0 + \frac{i-1}{5}\sigma_x$, where i is the threshold sample number and $\sigma_x$ is the standard deviation of the daily mean flow; using 5 + ln(A) days as the sampling block length, where A is the basin area in km², i over-threshold peak sequences are formed; II, applying a chi-square test, at a significance level of 0.05, to the number of samples in each over-threshold peak sequence; III, plotting the mean excess function of each over-threshold peak sequence to determine the range of the threshold u; IV, applying the Anderson-Darling test to the i over-threshold peak sequences and selecting the threshold u in combination with the mean excess function plots; V, estimating by maximum likelihood the parameters of the generalized Pareto distribution for the selected threshold u and its corresponding over-threshold peak sequence; VI, according to the research requirements, taking the flow corresponding to a suitable return period as the threshold of the peaks-over-threshold sampling method; 3-2, determining the rising point and the recession point of each flood process from the moving variance of the flow series, the period between a rising point and a recession point being one flood process, in the following three steps: A. selecting the number of days of the moving window and calculating the moving variance Var of the flow series; B. determining the threshold TH_var of the moving variance of the flow series; C. comparing the moving variance of the flow series with the threshold to determine the rising and recession points of the flood; the threshold of the moving variance of the flow series is calculated as follows:

where $\overline{Var}$ is the mean of the moving variance of the flow series, σ(Var) is the variance of the moving variance of the flow series, and θ is a coefficient controlling the sensitivity of flood-process identification: the smaller θ is, the more flood processes are identified;

step 4) constructing a flood behavior characteristic index system to describe the complete flood process: 4-1, constructing an index system capable of fully describing a flood process, the flood behavior characteristic indices comprising indices of flood magnitude, timing, rate of change and hydrograph shape; 4-2, calculating the flood behavior characteristic indices of the flood events divided in step 3);

step 5) clustering the divided flood events and identifying the main flood types of the basin and their characteristics: clustering the flood events of the study area with the k-means clustering algorithm to obtain the number of flood types, and identifying the main flood types of the basin and their process characteristics;

step 6) statistically analyzing the main flood types of all hydrological stations in the study area and identifying the regional characteristics of the basin's flood processes: calculating the proportion of each flood type at every hydrological station, representing the station's flood type by the type with the largest proportion, grouping stations with the same flood type into the same region, and identifying the regional characteristics of the basin's flood processes;

step 7) analyzing, with the geographic detector, how the spatial heterogeneity of the influencing factors affects the basin's flood processes: using the factor detector module of the geographic detector to detect the factors influencing the regional distribution of the basin's flood processes, and analyzing the effect of their spatial heterogeneity on the regional flood characteristics.

2. The method according to claim 1, characterized in that an independent flood peak in the peaks-over-threshold sampling method of step 3) should satisfy the following conditions:

where θ is the time interval between two adjacent flood peaks (days); A is the basin area (km²); and Q_min is the minimum flow between the two adjacent peaks Q₁ and Q₂ (m³/s).

3. The method according to claim 1, characterized in that the flood behavior characteristic indices of step 4) comprise indices of flood magnitude, timing, rate of change and hydrograph shape, specifically the coefficient of variation, skewness coefficient, kurtosis, rise time ratio, high-pulse time ratio, standardized flood peak, rate of rise, rate of fall, flood peak, total flood volume, peak occurrence time, flood occurrence time, duration, number of flood peaks and flood peak modulus.

4. The method according to claim 1, characterized in that the k-means clustering algorithm of step 5) uses the Euclidean distance, and the clustering is judged optimal when the Davies-Bouldin index DBI reaches its minimum, the index being calculated as:

$$\mathrm{DBI} = \frac{1}{n}\sum_{i=1}^{n}\max_{j \neq i}\frac{\sigma_i + \sigma_j}{d(c_i, c_j)}$$

where n is the number of classes produced by the clustering algorithm, c_i is the cluster center of the i-th class, σ_i is the average distance of all flood events in the i-th class from its cluster center, d(c_i, c_j) is the distance between the cluster centers of the i-th and j-th classes, c_j is the cluster center of the j-th class, and σ_j is the average distance of all flood events in the j-th class from its cluster center.

5. The method according to claim 1, characterized in that, in step 7), the factor detector module and the interaction detector module of the geographic detector are used to detect the degree to which the influencing factors explain the regional flood characteristics.