CN117058433A - Ecological hydrologic partition method based on Gaussian mixture clustering algorithm

Info

Publication number: CN117058433A
Application number: CN202311062177.XA
Authority: CN
Inventors: 白梦婷; 李发文; 李旻昊; 寇瑞荣
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2023-08-22
Filing date: 2023-08-22
Publication date: 2023-11-14

Abstract

The invention discloses an ecological hydrologic partition method based on a Gaussian mixture clustering algorithm. An ecological hydrologic partition index system is constructed, and the indexes are weighted by the entropy weight method. The feature vector of each sample is taken as the input of the Gaussian mixture model clustering algorithm, the model parameters are estimated with the expectation-maximization algorithm, and the optimal number of clusters is determined according to the Bayesian information criterion. The river basin is divided into a plurality of ecological hydrologic partitions according to the output of the Gaussian mixture model clustering algorithm, and the probability of each sample belonging to each partition is output. The partitioning effect is quantitatively evaluated using indexes such as the silhouette coefficient, the Calinski-Harabasz index and the Davies-Bouldin index, and a partition result map is drawn. The method improves partition precision and flexibility and comprehensively reflects the hydrologic, meteorological, ecological, land-use and socioeconomic characteristics of the river basin.

Description

Ecological hydrologic partition method based on Gaussian mixture clustering algorithm

Technical Field

The invention relates to the technical field of hydrology, in particular to an ecological hydrologic partition method based on a Gaussian mixture clustering algorithm.

Background

Ecological hydrologic partitioning divides the areas of a river basin that are similar in hydrology, meteorology, ecology, land use, socioeconomic conditions and other aspects into a number of relatively independent units, so as to facilitate the management and protection of water resources. It is a complex multi-objective, multi-constraint and multi-level spatial optimization problem in which the influences of, and relations among, the various factors in the river basin must be considered comprehensively. Traditional ecological hydrologic partition methods mainly include the following:

(1) Qualitative partitioning methods based on expert knowledge or experience, such as partitioning according to basin characteristics like landform, climate and vegetation. These methods are simple and intuitive, but they are highly subjective and lack an objective basis and quantitative evaluation;

(2) Quantitative partitioning methods based on an index system, such as clustering or classifying the partition units of a river basin according to their index data. These methods are objective and scientific, but they require a suitable index system and a suitable clustering or classification algorithm, and they handle fuzzy or overlapping clusters poorly;

(3) Dynamic partitioning methods based on model simulation, such as simulating the hydrological or ecological processes of each partition unit in the river basin. These methods can reflect the dynamic change of the various factors in the river basin, but they require large amounts of data and many parameters, are computationally expensive, and have low efficiency.

Therefore, we propose an ecological hydrologic partition method based on a Gaussian mixture clustering algorithm.

Disclosure of Invention

The invention aims to provide an ecological hydrologic partitioning method based on a Gaussian mixture clustering algorithm, which has the advantages of high precision, high flexibility, high efficiency and the like, and solves the problems of strong subjectivity, difficulty in processing fuzzy or overlapped clusters, large calculation amount and the like in the traditional method.

In order to achieve the above purpose, the present invention provides the following technical solution: an ecological hydrologic partition method based on a Gaussian mixture clustering algorithm, comprising the following steps:

(a) Selecting a research area and a dividing unit, dividing the research area into a plurality of small areas according to rules, wherein each small area is used as one sample, and each sample has the same area;

(b) Constructing an ecological hydrologic partition index system, selecting indexes capable of reflecting characteristics in aspects of hydrologic, meteorological, ecological, land utilization and socioeconomic aspects in a research area, and constructing an index system;

(c) Weighting indexes in an index system, determining the weight of each index by using an entropy weight method, wherein the entropy weight method is an objective weighting method based on an information theory, and can determine the weight according to the information quantity of index data;

(d) Taking the index data of each sample as the input of a Gaussian mixture model clustering algorithm, wherein the Gaussian mixture model clustering algorithm is a clustering method based on probability density which assumes that the data obey a linear combination of a plurality of Gaussian distribution functions, estimates the model parameters using the expectation-maximization algorithm, and determines the optimal number of clusters according to the Bayesian information criterion;

(e) Dividing a research area into a plurality of ecological hydrologic subareas according to the output result of the Gaussian mixture model clustering algorithm, and outputting probability values of each sample belonging to each subarea;

(f) Evaluating the partition effect and outputting a partition result map: the partition effect is quantitatively evaluated using the silhouette coefficient, the Calinski-Harabasz index and the Davies-Bouldin index, and the partition result map is drawn.

Preferably, the Gaussian mixture model clustering algorithm assumes that the data obey a linear combination of a plurality of Gaussian distribution functions, described by the following formula:

$$p(x) = \sum_{i=1}^{K} \phi_i \, \mathcal{N}(x \mid \mu_i, \sigma_i)$$

where $K$ is the number of Gaussian distribution functions, i.e., the number of partitions; $\phi_i$ is the weight coefficient of the $i$-th Gaussian distribution function, satisfying $\sum_{i=1}^{K} \phi_i = 1$; and $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the $i$-th Gaussian distribution function.
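As a rough illustration of the mixture density described above, the following sketch evaluates a weighted sum of univariate Gaussian densities. The function names `gaussian_pdf` and `mixture_pdf` and all numeric values are hypothetical, and a single index (one dimension) is assumed for brevity.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, phis, mus, sigmas):
    """Weighted sum of K Gaussian densities; the weights phis must sum to 1."""
    if abs(sum(phis) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(p * gaussian_pdf(x, m, s) for p, m, s in zip(phis, mus, sigmas))

# Hypothetical two-component mixture (illustrative values only).
density = mixture_pdf(0.0, [0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
```

Because the weights sum to 1, the mixture is itself a valid probability density, which is what allows membership probabilities to be derived from it.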

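Preferably, the expectation-maximization algorithm comprises the following two steps:

E step: calculating the posterior probability that each sample belongs to each partition, namely:

$$\gamma_{ji} = \frac{\phi_i \, \mathcal{N}(x_j \mid \mu_i, \sigma_i)}{\sum_{k=1}^{K} \phi_k \, \mathcal{N}(x_j \mid \mu_k, \sigma_k)}$$

M step: updating the parameters according to the posterior probabilities, namely:

$$\phi_i = \frac{1}{N}\sum_{j=1}^{N} \gamma_{ji}, \qquad \mu_i = \frac{\sum_{j=1}^{N} \gamma_{ji}\, x_j}{\sum_{j=1}^{N} \gamma_{ji}}, \qquad \sigma_i^2 = \frac{\sum_{j=1}^{N} \gamma_{ji}\,(x_j - \mu_i)^2}{\sum_{j=1}^{N} \gamma_{ji}}$$

where $x_j$ is the $j$-th sample, $N$ is the number of samples, and $\gamma_{ji}$ is the posterior probability that sample $j$ belongs to partition $i$.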
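The two steps can be sketched as follows for a single index (univariate data). This is a minimal illustration under stated assumptions, not the implementation used in the invention: the function names, the toy data, the convergence test on the means, and the floor `min_sigma` are all hypothetical, and the initial means are fixed to two sample values for reproducibility where the method itself selects them at random.

```python
import math

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def e_step(xs, phis, mus, sigmas):
    """Posterior probability gamma[j][i] that sample j belongs to component i."""
    gammas = []
    for x in xs:
        joint = [p * gaussian_pdf(x, m, s) for p, m, s in zip(phis, mus, sigmas)]
        total = sum(joint)
        gammas.append([v / total for v in joint])
    return gammas

def m_step(xs, gammas, min_sigma=1e-6):
    """Re-estimate weights, means and standard deviations from the posteriors."""
    n, k = len(xs), len(gammas[0])
    phis, mus, sigmas = [], [], []
    for i in range(k):
        n_i = sum(g[i] for g in gammas)
        mu = sum(g[i] * x for g, x in zip(gammas, xs)) / n_i
        var = sum(g[i] * (x - mu) ** 2 for g, x in zip(gammas, xs)) / n_i
        phis.append(n_i / n)
        mus.append(mu)
        sigmas.append(max(math.sqrt(var), min_sigma))  # floor avoids a zero-width component
    return phis, mus, sigmas

def fit_gmm(xs, mus, sigma0=1.0, max_iter=100, tol=1e-9):
    """Alternate E and M steps until the means stop moving or max_iter is reached."""
    k = len(mus)
    phis, sigmas = [1.0 / k] * k, [sigma0] * k
    for _ in range(max_iter):
        gammas = e_step(xs, phis, mus, sigmas)
        phis, new_mus, sigmas = m_step(xs, gammas)
        shift = max(abs(a - b) for a, b in zip(mus, new_mus))
        mus = new_mus
        if shift < tol:
            break
    return phis, mus, sigmas, e_step(xs, phis, mus, sigmas)

# Hypothetical toy data with two well-separated groups.
xs = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]
phis, mus, sigmas, gammas = fit_gmm(xs, mus=[xs[0], xs[4]])
```

Each row of `gammas` sums to 1 and gives the membership probabilities of one sample, which is the quantity the method outputs alongside the partition itself.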

Preferably, the Gaussian distribution function is isotropic, i.e., the covariance matrix is a diagonal matrix, which simplifies the calculation and fitting of the model while adapting to partitions of different shapes and sizes.

Preferably, the dividing unit is a 1 km × 1 km grid, which balances partition accuracy and efficiency while covering the whole research area.

Preferably, the index system comprises 8 indexes: DEM (digital elevation model) elevation, slope, NDVI (normalized difference vegetation index), annual average precipitation, annual average air temperature, annual average relative humidity, annual average runoff and annual average GDP (gross domestic product).

Preferably, the number of partitions is 5, the maximum number of iterations is 100, and the initial parameters are randomly selected sample values; the parameters are determined based on the Bayesian information criterion so that the log-likelihood function of the data is maximized while over-fitting and under-fitting are avoided.

Compared with the prior art, the invention has the following beneficial effects:

1. The invention partitions the river basin using the Gaussian mixture model clustering algorithm, which improves partition precision and flexibility. The Gaussian mixture model clustering algorithm is a probability-density-based clustering method that assumes the data obey a linear combination of a plurality of Gaussian distribution functions, estimates the model parameters using the expectation-maximization algorithm, and determines the optimal number of clusters according to the Bayesian information criterion. Because it can handle fuzzy or overlapping clusters and outputs the probability of each partition unit belonging to each partition, partition precision and flexibility are improved;

2. By constructing an ecological hydrologic partition index system and weighting the indexes with the entropy weight method, the invention comprehensively reflects the hydrologic, meteorological, ecological, land-use and socioeconomic characteristics of the river basin. The index system is the basis of the partition, and selecting appropriate indexes is the key to it. The invention selects 8 indexes, namely DEM (digital elevation model) elevation, slope, NDVI (normalized difference vegetation index), annual average precipitation, annual average air temperature, annual average relative humidity, annual average runoff and annual average GDP (gross domestic product), which together reflect these characteristics and are complete, available and representative. The entropy weight method is an objective weighting method based on information theory that determines each weight from the amount of information carried by the index data, thereby avoiding subjective interference;

3. By setting the dividing unit to a 1 km × 1 km grid, the invention ensures both partition precision and efficiency. The dividing unit is the smallest unit of the partition, and selecting an appropriate dividing unit is an important part of partitioning. The river basin is divided into a number of small areas according to a fixed rule; each small area serves as one sample, and all samples have the same area. A 1 km × 1 km grid ensures the accuracy and efficiency of the partition while covering the whole river basin;

4. The invention quantitatively evaluates the partition effect using indexes such as the silhouette coefficient, the Calinski-Harabasz index and the Davies-Bouldin index, and draws a partition result map, so that the partition result can be displayed and evaluated intuitively. Evaluating the partition effect is the final step of partitioning, and selecting appropriate evaluation indexes is a necessary part of it. These indexes reflect the similarity between a sample and the other samples in its own cluster, the difference between a sample and the samples in other clusters, and the intra-cluster and inter-cluster dispersion, so that the partition effect is evaluated comprehensively.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a graph showing the characteristic average values of the DEM elevation indicators in different partitions according to the present invention;

FIG. 3 is a graph showing characteristic averages of runoff indicators in different partitions according to the present invention;

FIG. 4 is a graph showing the characteristic average values of the NDVI index in different partitions according to the present invention;

FIG. 5 is a graph showing characteristic averages of the Slope index of the present invention in different partitions;

FIG. 6 is a graph showing characteristic averages of the humidity Hum indicator of the present invention in different partitions;

FIG. 7 is a graph showing characteristic averages of the precipitation Pre index of the present invention in different zones;

FIG. 8 is a graph showing the characteristic average values of the temperature TEM index of the present invention in different zones;

FIG. 9 is a graph showing the characteristic average values of the GDP indicator of the present invention in different partitions.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to fig. 1 to 9, the invention discloses an ecological hydrologic partitioning method based on a gaussian mixture clustering algorithm, which can comprehensively consider characteristics of hydrologic, meteorological, ecological, land utilization, socioeconomic and the like in a river basin, partition the river basin by using a gaussian mixture model clustering algorithm, improve the precision and flexibility of the partition, output probability values of each partition unit belonging to each partition, and facilitate water resource management and protection. The invention will now be described in detail with reference to the drawings and examples.

FIG. 1 is a flow chart of the present invention, comprising the steps of:

step 101: selecting a research area and a dividing unit, dividing the research area into a plurality of small areas according to a certain rule, wherein each small area is used as one sample, and each sample has the same area;

step 102: constructing an ecological hydrologic partition index system, selecting indexes capable of reflecting characteristics of hydrologic, meteorological, ecological, land utilization, socioeconomic and the like in a research area, and constructing the index system;

step 103: weighting indexes in an index system, determining the weight of each index by using an entropy weight method, wherein the entropy weight method is an objective weighting method based on an information theory, and can determine the weight according to the information quantity of index data;

step 104: taking the index data of each sample as the input of a Gaussian mixture model clustering algorithm, wherein the Gaussian mixture model clustering algorithm is a clustering method based on probability density which assumes that the data obey a linear combination of a plurality of Gaussian distribution functions, estimates the model parameters using the expectation-maximization algorithm, and determines the optimal number of clusters according to the Bayesian information criterion;

step 105: dividing a research area into a plurality of ecological hydrologic subareas according to the output result of the Gaussian mixture model clustering algorithm, and outputting probability values of each sample belonging to each subarea;

step 106: evaluating the partition effect and outputting a partition result map: the partition effect is quantitatively evaluated using indexes such as the silhouette coefficient, the Calinski-Harabasz index and the Davies-Bouldin index, and the partition result map is drawn.

The present invention is described below by taking the Ziya River basin as an example.

Example 1:

The Ziya River basin is a major tributary basin of the Haihe River basin. It consists of two tributaries, the Hutuo River and the Fuyang River, and has a drainage area of about 4.68 × 10^4 km². The basin has a complex and diverse ecological hydrologic system, making it a typical object requiring ecological hydrologic partitioning.

First, the boundary data of the Ziya River basin were imported into ArcGIS and divided into 1 km × 1 km grids, yielding a total of 47,889 grid cells, each serving as one sample.
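The gridding itself is carried out in ArcGIS in this embodiment. Purely to illustrate the idea, the sketch below covers a rectangular bounding box with square cells and keeps the cells whose center passes a caller-supplied test. The function name `make_grid`, the `inside` predicate and the coordinates are hypothetical, and projected coordinates in meters are assumed.

```python
import math

def make_grid(xmin, ymin, xmax, ymax, cell=1000.0, inside=None):
    """Return center coordinates of square cells covering a bounding box.

    cell is the edge length in meters (1000 m = 1 km). inside is an optional
    predicate (x, y) -> bool, e.g. a point-in-basin test; cells whose center
    fails the test are dropped.
    """
    nx = math.ceil((xmax - xmin) / cell)
    ny = math.ceil((ymax - ymin) / cell)
    centers = []
    for j in range(ny):
        for i in range(nx):
            cx = xmin + (i + 0.5) * cell
            cy = ymin + (j + 0.5) * cell
            if inside is None or inside(cx, cy):
                centers.append((cx, cy))
    return centers

# Hypothetical 3 km x 2 km box: every cell becomes one sample of equal area.
samples = make_grid(0.0, 0.0, 3000.0, 2000.0)
```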

Secondly, an ecological hydrologic partition index system is constructed. Ten indexes are selected, including DEM (digital elevation model) elevation, slope, NDVI (normalized difference vegetation index), annual average precipitation, annual average air temperature, annual average relative humidity, annual average runoff, annual average GDP (gross domestic product) and total population. The corresponding data are imported into ArcGIS, and the index data of each grid cell are extracted to form a 10-dimensional feature vector.

Then, the indexes in the index system are weighted, and the weight of each index is determined by using an entropy weight method. The specific calculation steps of the entropy weight method are as follows:

(1) Normalizing each index so that its values lie in [0, 1]. The normalization formula is:

$$x'_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$$

where $x_{ij}$ is the original value of the $j$-th index of the $i$-th sample;

(2) Calculating the information entropy of each index. The information entropy reflects the amount of information carried by the index: the smaller the information entropy, the larger the amount of information. The information entropy formula is:

$$e_j = -\frac{1}{\ln N} \sum_{i=1}^{N} p_{ij} \ln p_{ij}, \qquad p_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{N} x'_{ij}}$$

where $e_j$ is the information entropy of the $j$-th index and $N$ is the number of samples, with $p_{ij} \ln p_{ij}$ taken as 0 when $p_{ij} = 0$;

(3) Calculating the difference coefficient of each index. The difference coefficient reflects the distinguishing capability of the index: the larger the difference coefficient, the stronger the distinguishing capability. The difference coefficient formula is:

$$d_j = 1 - e_j$$

where $d_j$ is the difference coefficient of the $j$-th index;

(4) Calculating the weight of each index. The weight reflects the importance of the index: the larger the weight, the higher the importance. The weight formula is:

$$w_j = \frac{d_j}{\sum_{j=1}^{m} d_j}$$

where $w_j$ is the weight of the $j$-th index and $m$ is the number of indexes.
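The four steps above can be sketched as below. This assumes min-max normalization and the usual entropy weight formulas; the function name `entropy_weights` and the sample values are hypothetical, and a constant index is given a difference coefficient of 0 as a convention of this sketch.

```python
import math

def entropy_weights(rows):
    """rows: list of samples, each a list of m index values. Returns m weights summing to 1."""
    n, m = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("at least two samples are required")
    diffs = []
    for j in range(m):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        if hi == lo:
            diffs.append(0.0)  # a constant index carries no information
            continue
        norm = [(v - lo) / (hi - lo) for v in col]          # step (1): scale to [0, 1]
        total = sum(norm)
        ps = [v / total for v in norm]
        entropy = -sum(p * math.log(p) for p in ps if p > 0) / math.log(n)  # step (2)
        diffs.append(1.0 - entropy)                          # step (3)
    s = sum(diffs)
    if s == 0:
        raise ValueError("all indexes are constant")
    return [d / s for d in diffs]                            # step (4)

# Hypothetical data: index 0 varies evenly, index 1 is concentrated in one sample.
rows = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 9.0]]
weights = entropy_weights(rows)
# Weighted feature vectors are formed by scaling each index by its weight.
weighted = [[w * v for w, v in zip(weights, r)] for r in rows]
```

In this toy case the second index receives the larger weight, because its values are concentrated and its entropy is therefore lower.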

According to the above steps, the weight of each index is obtained as shown in the following table:

Then, the feature vector of each sample is multiplied by the corresponding weights to obtain a weighted feature vector, which is used as the input of the Gaussian mixture model clustering algorithm. The Gaussian mixture model clustering algorithm is a probability-density-based clustering method that assumes the data obey a linear combination of a plurality of Gaussian distribution functions, estimates the model parameters using the expectation-maximization algorithm, and determines the optimal number of clusters according to the Bayesian information criterion. The specific steps of the Gaussian mixture model clustering algorithm are as follows:

(1) Initializing parameters, setting the number K of Gaussian distribution functions as 5, setting the maximum iteration number as 100, and setting the initial parameters as randomly selected sample values;

(2) Executing the E step: calculating the posterior probability of each sample belonging to each partition;

(3) Executing the M step: updating the parameters (weight coefficients, means and standard deviations) according to the posterior probabilities;

(4) Judging whether the maximum iteration times or convergence conditions are reached, if so, stopping iteration, otherwise, returning to the step (2);

(5) Determining the optimal number of clusters according to the Bayesian information criterion, a model selection criterion that weighs model complexity against goodness of fit and thus balances the simplicity and accuracy of the model. The Bayesian information criterion formula is:

$$\mathrm{BIC} = -2\ln L + K\ln N$$

where $L$ is the maximized likelihood of the data (so that $\ln L$ is the log-likelihood), $K$ here is the number of parameters of the model, and $N$ is the number of samples. The smaller the Bayesian information criterion, the better the model.
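Model selection with this criterion can be sketched as follows. The log-likelihood values, the sample count and the function names are hypothetical; the parameter counts assume a univariate mixture, which has 3k - 1 free parameters for k components (k means, k standard deviations and k - 1 independent weights).

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """BIC = -2 ln L + K ln N, with ln L passed in directly as log_likelihood."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

def select_by_bic(candidates, n_samples):
    """candidates maps cluster count -> (log_likelihood, n_params).

    Returns the cluster count with the smallest BIC and all scores.
    """
    scores = {k: bic(ll, p, n_samples) for k, (ll, p) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical fits for 2, 3 and 4 components on 50 samples.
candidates = {2: (-120.0, 5), 3: (-100.0, 8), 4: (-99.0, 11)}
best_k, scores = select_by_bic(candidates, n_samples=50)
```

Here the fourth component improves the log-likelihood only slightly, so the penalty term outweighs the gain and the three-component model is preferred.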

According to the steps, the optimal clustering number is 5, and the probability value of each sample belonging to each partition is obtained.
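Turning the membership probabilities into a partition can be illustrated with the short sketch below, which assigns each sample to its most probable partition and keeps that probability as a measure of how clear-cut the assignment is. The function name and the probability values are hypothetical.

```python
def assign_partitions(probabilities):
    """Map each sample's membership probabilities to (partition number, probability).

    Partition numbers start at 1; ties go to the lowest-numbered partition.
    """
    result = []
    for row in probabilities:
        best = max(range(len(row)), key=lambda i: row[i])
        result.append((best + 1, row[best]))
    return result

# Hypothetical membership probabilities for three grid cells and three partitions.
probs = [[0.90, 0.08, 0.02], [0.40, 0.55, 0.05], [0.10, 0.10, 0.80]]
assignment = assign_partitions(probs)
```

A cell such as the second one, with probabilities 0.40 and 0.55 for two partitions, lies in a transition zone; a hard clustering method would not expose this.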

Finally, the Ziya River basin is divided into 5 ecological hydrologic partitions according to the output of the Gaussian mixture model clustering algorithm, and the probability of each grid cell belonging to each partition is output. The partitioning effect is then quantitatively evaluated using the silhouette coefficient, the Calinski-Harabasz index, the Davies-Bouldin index and similar indexes, and a partition result map is drawn. The evaluation indexes are calculated as follows:

The silhouette coefficient is a cluster evaluation index based on the distances between samples. It reflects the similarity between a sample and the other samples in its own cluster and the difference between the sample and the samples in other clusters. The formula is:

$$s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

where $a_i$ is the average distance between the $i$-th sample and the other samples in its own cluster, and $b_i$ is the minimum average distance between the $i$-th sample and the samples in any other cluster. The closer the silhouette coefficient is to 1, the better the clustering effect.
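The coefficient built from a_i and b_i can be sketched as below, averaged over all samples. Euclidean distance is assumed, the function name and toy points are hypothetical, and samples in single-member clusters are scored 0 as a convention of this sketch.

```python
import math

def silhouette_coefficient(points, labels):
    """Mean of s_i = (b_i - a_i) / max(a_i, b_i) over all samples."""
    if len(set(labels)) < 2:
        raise ValueError("at least two clusters are required")
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own:  # single-member cluster
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                 # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical points forming two tight, well-separated pairs.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_coefficient(pts, [0, 0, 1, 1])
bad = silhouette_coefficient(pts, [0, 1, 0, 1])
```

The natural grouping scores close to 1, while the deliberately mismatched labeling scores below 0.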

The Calinski-Harabasz index is a cluster evaluation index based on intra-cluster and inter-cluster dispersion. It reflects how compact the samples are within clusters and how well separated the clusters are from each other. The Calinski-Harabasz index formula is:

$$CH = \frac{SS_B / (K - 1)}{SS_W / (N - K)}$$

where $SS_B$ is the inter-cluster dispersion, $SS_W$ is the intra-cluster dispersion, $K$ is the number of clusters, and $N$ is the number of samples. The larger the Calinski-Harabasz index, the better the clustering effect.
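A minimal sketch of this index follows, assuming the usual definitions: SS_B is the size-weighted squared distance of each cluster center to the overall center, and SS_W is the sum of squared distances of samples to their own cluster center. The function names and toy points are hypothetical.

```python
import math

def centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def calinski_harabasz(points, labels):
    """(SS_B / (K - 1)) / (SS_W / (N - K)) with Euclidean distance."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least two clusters and more samples than clusters")
    overall = centroid(points)
    ss_b = ss_w = 0.0
    for c in clusters:
        members = [p for p, l in zip(points, labels) if l == c]
        center = centroid(members)
        ss_b += len(members) * math.dist(center, overall) ** 2
        ss_w += sum(math.dist(p, center) ** 2 for p in members)
    if ss_w == 0:
        raise ValueError("intra-cluster dispersion is zero")
    return (ss_b / (k - 1)) / (ss_w / (n - k))

# Hypothetical points forming two tight, well-separated pairs.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
ch = calinski_harabasz(pts, [0, 0, 1, 1])
```

For these points SS_B is 100 and SS_W is 1, so the index is (100 / 1) / (1 / 2) = 200.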

The Davies-Bouldin index is a cluster evaluation index based on the within-cluster scatter and the distance between clusters. It reflects the uniformity of the samples within clusters and the separation between clusters. The Davies-Bouldin index formula is:

$$DB = \frac{1}{K} \sum_{k=1}^{K} \max_{k' \neq k} \frac{d_k + d_{k'}}{D_{kk'}}$$

where $d_k$ is the average distance of the samples in the $k$-th cluster to the cluster center, $d_{k'}$ is the same quantity for the $k'$-th cluster, and $D_{kk'}$ is the distance between the center points of the $k$-th and $k'$-th clusters. The smaller the Davies-Bouldin index, the better the clustering effect.
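This index can be sketched as follows, assuming its standard form: for each cluster, take the largest ratio (d_k + d_k') / D_kk' over all other clusters, then average over clusters. The function name and toy points are hypothetical, and Euclidean distance is assumed.

```python
import math

def davies_bouldin(points, labels):
    """Average over clusters of the worst-case (d_k + d_k') / D_kk' ratio."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    centers, spreads = {}, {}
    for c in clusters:
        members = [p for p, l in zip(points, labels) if l == c]
        center = tuple(sum(v) / len(members) for v in zip(*members))
        centers[c] = center
        # d_k: mean distance of the cluster's samples to its center
        spreads[c] = sum(math.dist(p, center) for p in members) / len(members)
    total = 0.0
    for c in clusters:
        total += max(
            (spreads[c] + spreads[o]) / math.dist(centers[c], centers[o])
            for o in clusters if o != c
        )
    return total / len(clusters)

# Hypothetical points forming two tight, well-separated pairs.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
db = davies_bouldin(pts, [0, 0, 1, 1])
```

Each cluster here has d_k = 0.5 and the centers are 10 apart, so the index is (0.5 + 0.5) / 10 = 0.1.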

According to the above method, the evaluation index of the obtained partitioning effect is shown in the following table:

The partition results are shown in the table below, with the number on each grid cell representing the probability of the cell belonging to that partition. The partition result map shows that the ecological hydrologic characteristics of the Ziya River basin differ markedly in space, and that different partitions have different hydrologic, meteorological, ecological, land-use and socioeconomic characteristics, which can provide a scientific basis for water resource management and protection.

Example 2:

In order to verify the effectiveness and superiority of the method, it is compared with the traditional K-means clustering algorithm using the same data set and the same evaluation indexes, and the partitioning effects of the two methods are compared.

The K-means clustering algorithm is a distance-based clustering method that divides the data into K clusters so that the distance between each sample and the center of its own cluster is as small as possible and the distance between each sample and the centers of the other clusters is as large as possible. The specific steps of the K-means clustering algorithm are as follows:

(1) Initializing parameters, setting the number K of clusters as 5, setting the maximum iteration number as 100, and setting the center of the initial cluster as a randomly selected sample value;

(2) Calculating the distance between each sample and the center of each cluster, and dividing each sample into clusters closest to each sample;

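(3) Updating the center of each cluster as the average value of the samples in the cluster;

(4) Judging whether the maximum number of iterations or the convergence condition is reached; if so, stopping the iteration, otherwise returning to step (2);

(5) Outputting the cluster to which each sample belongs.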
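The K-means steps can be sketched as below. This is an illustrative version only: the function name and toy points are hypothetical, Euclidean distance is assumed, the loop stops when assignments no longer change, and the initial centers are fixed to two sample values for reproducibility where the comparison itself selects them at random.

```python
import math

def kmeans(points, initial_centers, max_iter=100):
    """Alternate nearest-center assignment and center updates; returns (labels, centers)."""
    centers = list(initial_centers)
    labels = None
    for _ in range(max_iter):
        # Assign each sample to its closest cluster center.
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Hypothetical points forming two tight, well-separated pairs.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels, centers = kmeans(pts, [pts[0], pts[2]])
```

Unlike the Gaussian mixture model, the output is a single label per sample with no membership probability attached.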

Following these steps, the partitioning result of the K-means clustering algorithm is obtained, and the partitioning effect is quantitatively evaluated using indexes such as the silhouette coefficient, the Calinski-Harabasz index and the Davies-Bouldin index. The evaluation indexes are shown in the following table:

The partitioning results are shown in the following table. As can be seen from them, the partitions produced by the K-means clustering algorithm are coarse and irregular: some partitions are too large or too small, some overlap or leave gaps, and the ecological hydrologic characteristics of the Ziya River basin are not reflected well.

Grid cell	Partitioning result of Gaussian mixture model clustering algorithm
A1	2
A2	2
A3	2
A4	2
A5	2
A6	2
A7	2
A8	2
A9	2
B1	2
B2	3
B3	3
B4	3
B5	3
B6	4
B7	4
B8	4
B9	4
…	…

Compared with the K-means clustering algorithm, the method shows clear improvement in indexes such as the silhouette coefficient, the Calinski-Harabasz index and the Davies-Bouldin index. It improves the accuracy and flexibility of the partition, handles fuzzy or overlapping clusters, outputs the probability of each partition unit belonging to each partition, and better matches the ecological hydrologic characteristics of the Ziya River basin.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. An ecological hydrologic partition method based on a Gaussian mixture clustering algorithm, characterized in that the method comprises the following steps:

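2. The ecological hydrologic partition method based on a Gaussian mixture clustering algorithm according to claim 1, characterized in that the Gaussian mixture model clustering algorithm assumes that the data obey a linear combination of a plurality of Gaussian distribution functions, described by the following formula:

$$p(x) = \sum_{i=1}^{K} \phi_i \, \mathcal{N}(x \mid \mu_i, \sigma_i)$$

where $K$ is the number of Gaussian distribution functions, i.e., the number of partitions; $\phi_i$ is the weight coefficient of the $i$-th Gaussian distribution function, satisfying $\sum_{i=1}^{K} \phi_i = 1$; and $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the $i$-th Gaussian distribution function.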

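3. The ecological hydrologic partition method based on a Gaussian mixture clustering algorithm according to claim 2, characterized in that the expectation-maximization algorithm comprises the following two steps:

E step: calculating the posterior probability that each sample belongs to each partition, namely:

$$\gamma_{ji} = \frac{\phi_i \, \mathcal{N}(x_j \mid \mu_i, \sigma_i)}{\sum_{k=1}^{K} \phi_k \, \mathcal{N}(x_j \mid \mu_k, \sigma_k)}$$

M step: updating the parameters according to the posterior probabilities, namely:

$$\phi_i = \frac{1}{N}\sum_{j=1}^{N} \gamma_{ji}, \qquad \mu_i = \frac{\sum_{j=1}^{N} \gamma_{ji}\, x_j}{\sum_{j=1}^{N} \gamma_{ji}}, \qquad \sigma_i^2 = \frac{\sum_{j=1}^{N} \gamma_{ji}\,(x_j - \mu_i)^2}{\sum_{j=1}^{N} \gamma_{ji}}$$

where $x_j$ is the $j$-th sample, $N$ is the number of samples, and $\gamma_{ji}$ is the posterior probability that sample $j$ belongs to partition $i$.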

4. The ecological hydrologic partition method based on a Gaussian mixture clustering algorithm according to claim 3, characterized in that the Gaussian distribution function is isotropic, i.e., the covariance matrix is a diagonal matrix, which simplifies the calculation and fitting of the model while adapting to partitions of different shapes and sizes.

5. The ecological hydrologic partition method based on a Gaussian mixture clustering algorithm according to claim 4, characterized in that the dividing unit is a 1 km × 1 km grid, which balances partition accuracy and efficiency while covering the whole research area.

6. The ecological hydrologic partition method based on a Gaussian mixture clustering algorithm according to claim 5, characterized in that the index system comprises 8 indexes: DEM (digital elevation model) elevation, slope, NDVI (normalized difference vegetation index), annual average precipitation, annual average air temperature, annual average relative humidity, annual average runoff and annual average GDP (gross domestic product).

7. The ecological hydrologic partition method based on a Gaussian mixture clustering algorithm according to claim 6, characterized in that the number of partitions is 5, the maximum number of iterations is 100, and the initial parameters are randomly selected sample values; the parameters are determined based on the Bayesian information criterion so that the log-likelihood function of the data is maximized while over-fitting and under-fitting are avoided.