CN108615059B

CN108615059B - Lake automatic selection method and device based on dynamic multi-scale clustering

Info

Publication number: CN108615059B
Application number: CN201810443646.5A
Authority: CN
Inventors: 段佩祥; 钱海忠; 何海威; 郭敏; 王骁; 刘闯; 谢丽敏; 罗登瀚
Original assignee: Information Engineering University of PLA Strategic Support Force
Current assignee: Information Engineering University of PLA Strategic Support Force
Priority date: 2018-05-10
Filing date: 2018-05-10
Publication date: 2020-08-07
Anticipated expiration: 2038-05-10
Also published as: CN108615059A

Abstract

The invention relates to the field of map generalization, and in particular to an automatic lake selection method and device based on dynamic multi-scale clustering. First, an area threshold is set according to the target scale, and all lakes whose area exceeds the threshold are selected. Next, buffers are constructed for all lakes, and lakes whose buffers intersect no other buffer are selected as key points for preserving the distribution characteristics. The lake group is then partitioned into regions according to differences in distribution density by dynamic multi-scale clustering. Finally, a different selection strategy is applied to each region (that is, each small lake group). The invention effectively preserves the morphological structure and density contrast of the lake group before and after selection at different target scales, produces reasonable selection results, and can be applied to the automatic selection and generalization of lake features on topographic maps at different target scales.

Description

Lake automatic selection method and device based on dynamic multi-scale clustering

Technical Field

The invention relates to the field of map generalization, and in particular to an automatic lake selection method and device based on dynamic multi-scale clustering.

Background

There are many thousands of lakes on land, and they differ greatly in area and capacity. Overall, lakes are unevenly distributed: some form clusters of lakes gathered together and connected to one another by rivers, while others are scattered. For cartographers, selecting lake groups during map compilation is difficult; it requires a large amount of accumulated knowledge and experience and is time-consuming. At present there is relatively little research on the automatic selection of lake features, and existing approaches select over the lake group as a whole, which makes it difficult to take the attribute, distribution and topological characteristics of lakes into account together. A selection algorithm that can reasonably and automatically select lake groups at different target scales is therefore urgently needed.

Disclosure of Invention

The invention aims to provide an automatic lake selection method and device based on dynamic multi-scale clustering, to solve the problem that existing lake selection methods cannot take the attribute, distribution and topological characteristics of lakes into account at the same time, which makes the automatic selection of lakes in map generalization inaccurate.

To achieve this aim, the invention provides an automatic lake selection method based on dynamic multi-scale clustering, comprising the following steps:

for the lake group to be processed, determining a clustering starting point for lake clustering and establishing a corresponding clustering layer;

clustering according to the distance between each lake and the clustering starting point, and moving the lakes belonging to the same cluster into the clustering layer;

after the current round of clustering is finished, determining the next clustering starting point among the remaining lakes, establishing a corresponding clustering layer, and starting the next round of clustering;

finally, obtaining a clustering layer for each cluster and, for each clustering layer, selecting one or more lakes with a selection method chosen according to the number of lakes the layer contains, to obtain a first lake image layer.

Further, an area threshold is set for the lake group according to the target scale, and lakes whose area exceeds the threshold are selected to obtain a second lake image layer; buffers are established for the lakes in the lake group according to a set distance, and lakes whose buffers intersect no other buffer are selected to obtain a third lake image layer, while the lakes whose buffers do intersect form the lake group to be processed; the first, second and third lake image layers are fused to obtain the final lake selection result.

Further, the process of determining a cluster starting point of a lake cluster includes:

extracting the center point of each lake, and establishing a corresponding lake buffer with the radius set for the target scale;

counting the center points that fall within each lake buffer, and taking the lake whose buffer contains the most center points as the clustering starting point.
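The choice of a clustering starting point can be illustrated with a minimal sketch. It assumes that each lake buffer is approximated by a circle around the lake's own center point and that ties are resolved in favor of the first lake; the function name `pick_cluster_start` and the sample coordinates are hypothetical and do not come from the patent.

```python
import math


def pick_cluster_start(centers, radius):
    """Return (index, count) for the lake whose buffer holds the most centers.

    centers: list of (x, y) lake center points in ground units.
    radius:  buffer radius in the same units.
    Assumption: a buffer is a circle around the lake's own center, and the
    lowest index wins when several lakes have the same count.
    """
    best_index, best_count = -1, -1
    for i, c in enumerate(centers):
        # Count every center (including the lake's own) inside this buffer.
        count = sum(1 for other in centers if math.dist(c, other) <= radius)
        if count > best_count:
            best_index, best_count = i, count
    return best_index, best_count


# Hypothetical sample: three lakes close together and two farther away.
centers = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
start, density = pick_cluster_start(centers, radius=2.0)
print(start, density)
```

A real implementation would presumably buffer the lake polygons themselves; the circular approximation only keeps the sketch free of geometry libraries.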

Further, the process of clustering according to the distance between the lake and the clustering starting point comprises the following steps:

setting a distance threshold according to the target scale; determining the nearest lake to the clustering starting point and calculating its distance to the clustering starting point; if the distance is smaller than the distance threshold and smaller than a set multiple of the class average distance between the lakes in the current clustering layer, moving the nearest lake into the clustering layer, updating the class average distance of the current clustering layer, and then determining the next nearest lake; otherwise, ending the clustering.

Further, for each clustering layer, the process of selecting one or more lakes by adopting a corresponding selection method according to the number of lakes included in the clustering layer includes:

if the clustering layer contains only one lake, selecting that lake;

if the clustering layer contains two lakes, determining the shortest distance between the two lake polygons; if it is larger than the ground distance set for the target scale, selecting both lakes; otherwise, calculating a weight for each lake from its area and its number of river connections, and selecting the lake with the larger weight;

if the clustering layer contains at least three lakes, two cases are distinguished: first, if the number of lakes in the clustering layer is larger than a set number, the lake weights are calculated iteratively by principal component analysis from importance factors comprising lake area, number of river connections, area of the influence domain, density, centrality and average adjacency distance; in each iteration, a set number of the lowest-ranked lakes that do not interfere with one another are deleted, until the selection quantity index and the deletion quantity index are met, where lakes do not interfere with one another when they share no neighboring lake; second, if the number of lakes in the clustering layer is less than the set number, a weight is calculated for each lake from its area and its number of river connections, the lakes are sorted by weight from largest to smallest, and the set number of top-ranked lakes is selected.
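The iterative deletion for large classes can be sketched as follows. This is one possible reading rather than the patented procedure: it assumes that the candidates in each round are the lowest-ranked fraction of the remaining lakes, that two lakes "interfere" when their neighbor sets share a lake, and that neighbor relations are given as a fixed dictionary. The names `iterative_delete`, `neighbors` and `score_fn` are hypothetical.

```python
import math


def iterative_delete(lake_ids, neighbors, score_fn, keep_count, fraction=0.3):
    """Delete low-ranked, mutually non-interfering lakes until keep_count remain.

    neighbors: dict lake_id -> set of adjacent lake ids (hypothetical input).
    score_fn:  callable(list of remaining ids) -> dict id -> importance weight;
               it is re-evaluated at the start of every iteration.
    Assumption: two lakes interfere when their neighbor sets share a lake.
    """
    remaining = list(lake_ids)
    while len(remaining) > keep_count:
        scores = score_fn(remaining)
        ranked = sorted(remaining, key=lambda k: scores[k])  # least important first
        budget = min(math.ceil(fraction * len(remaining)),
                     len(remaining) - keep_count)
        alive = set(remaining)
        blocked = set()  # neighbors of lakes already deleted in this round
        deleted = []
        for lake in ranked[:budget]:
            hood = neighbors.get(lake, set()) & alive
            if hood & blocked:
                continue  # shares a neighbor with a lake deleted this round
            deleted.append(lake)
            blocked |= hood
        # The first candidate is never blocked, so each round removes a lake.
        remaining = [k for k in remaining if k not in deleted]
    return remaining


# Hypothetical chain of six lakes scored by a fixed weight per lake.
neighbors = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
weights = {1: 1, 2: 9, 3: 2, 4: 5, 5: 3, 6: 8}
kept = iterative_delete([1, 2, 3, 4, 5, 6], neighbors,
                        lambda ids: {k: weights[k] for k in ids}, keep_count=3)
print(kept)
```

In the first round lakes 1 and 3 are both candidates, but lake 3 is skipped because it shares neighbor 2 with lake 1; it is deleted in the following round instead.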

The invention also provides a lake automatic selection device based on dynamic multi-scale clustering, which comprises a processor and a memory, wherein the processor stores instructions for realizing the following method:

selecting the clustering layers only comprising one lake;

The beneficial effects of the invention are as follows. First, an area threshold is set according to the target scale, and all lakes whose area exceeds the threshold are selected. Next, buffers are constructed for all lakes, and lakes whose buffers intersect no other buffer are selected as key points for preserving the distribution characteristics. The lake group is then partitioned into regions according to differences in distribution density by dynamic multi-scale clustering. Finally, a different selection strategy is applied to each region (that is, each small lake group). The invention effectively preserves the morphological structure and density contrast of the lake group before and after selection at different target scales, produces reasonable selection results, and can be applied to the automatic selection and generalization of lake features on topographic maps at different target scales.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a flow chart of a dynamic multi-scale clustering method in the method of the present invention;

FIG. 3 shows the clustering test results of the present invention under different parameters;

FIG. 4 is an image of an original lake group in an embodiment;

FIG. 5 is an image of an original lake group after lake extraction;

FIG. 6 is a schematic diagram of an area selection result;

FIG. 7 is a diagram illustrating buffer establishment;

FIG. 8 is a schematic diagram of buffer selection;

FIG. 9 is a graph of clustering effects;

FIG. 10 is an image of a class 4 lake in a cluster before selection;

FIG. 11 is an image of a selected lake of class 4 in a cluster;

FIG. 12 is an image of a category 5 lake in cluster before selection;

FIG. 13 is an image of a selected lake of category 5 in a cluster;

FIG. 14 is a graph of the results of an area-based selection method;

FIG. 15 is an enlarged view of area A of FIG. 14;

FIG. 16 is an enlarged view of area B of FIG. 14;

FIG. 17 is a graph of selected results of the method of the present invention;

FIG. 18 is an enlarged view of area C of FIG. 17;

FIG. 19 is an enlarged view of area D of FIG. 17;

FIG. 20 is an enlarged view of area E of FIG. 17.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings.

The invention selects large-area lakes through an area selection step; selects isolated lakes that help preserve the morphological structure through a buffer selection step; partitions regions of different lake distribution density through dynamic multi-scale clustering; computes a quantitative comprehensive evaluation of lake importance by principal component analysis; and selects within each lake group by combining area with principal component analysis, so as to solve the problem of automatic lake group selection. The method comprises the following steps:

area selection, that is, selecting large-area lakes;

buffer selection, that is, selecting "isolated" lakes;

partitioning by dynamic multi-scale clustering;

applying different selection strategies to different small lake groups.

The specific steps are shown in FIG. 1 and comprise the following processing steps.

(1) Area selection. The area threshold is set to the ground area represented by 1 mm² on the map at the target scale; all lakes are traversed, and those whose area exceeds the threshold are selected.

(2) Buffer selection. Buffers are established for all lakes with the radius set to the ground distance represented by 3 mm at the target scale, and lakes whose buffers intersect no other buffer are selected.

(3) All lakes except those selected by the buffer step are divided into several small lake groups by the dynamic multi-scale clustering method.

(4) Different selection strategies are applied to different small lake groups; for small lake groups containing many lakes, the comprehensive importance evaluation is calculated iteratively by principal component analysis, and after ranking the lowest-ranked lakes are deleted until the selection quantity index is met.
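Steps (1) and (2) can be sketched as two small filters. The sketch assumes that lake areas and the shortest distance between any two lake outlines are already available; `boundary_distance` is a hypothetical helper standing in for a geometry library, and the sample values are invented. It relies on the fact that two buffers of equal radius r intersect exactly when the distance between the buffered shapes is at most 2r.

```python
def select_by_area(areas, area_threshold):
    """Return the ids of lakes whose ground area exceeds the threshold."""
    return {lake for lake, area in areas.items() if area > area_threshold}


def select_isolated(lake_ids, boundary_distance, buffer_radius):
    """Return the ids of lakes whose buffer intersects no other lake's buffer.

    boundary_distance: callable(a, b) -> shortest distance between the
    outlines of lakes a and b (hypothetical helper).
    """
    isolated = set()
    for a in lake_ids:
        # Buffers of radius r intersect when the outlines are within 2 * r.
        if all(boundary_distance(a, b) > 2 * buffer_radius
               for b in lake_ids if b != a):
            isolated.add(a)
    return isolated


# Hypothetical data: areas in km^2 and pairwise outline distances in km.
areas_km2 = {"A": 20.0, "B": 3.0, "C": 1.5, "D": 17.0}
distances_km = {
    frozenset("AB"): 5.0, frozenset("AC"): 30.0, frozenset("AD"): 10.0,
    frozenset("BC"): 28.0, frozenset("BD"): 42.0, frozenset("CD"): 50.0,
}


def boundary_distance(a, b):
    return distances_km[frozenset((a, b))]


large = select_by_area(areas_km2, area_threshold=16.0)
isolated = select_isolated(list(areas_km2), boundary_distance, buffer_radius=12.0)
print(sorted(large), sorted(isolated))
```

With a 12 km buffer radius, only lake C lies more than 24 km from every other lake, so it is the only lake selected by the buffer step in this invented data set.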

Lakes selected by the buffer step are excluded from clustering because they are relatively isolated and can be regarded as having no influence on other lakes. Lakes selected by the area step, by contrast, have a significant influence on surrounding lakes, especially small ones, and must therefore be included in both buffer selection and clustering.

Finally, the lake layers obtained by the three selection modes are fused, with lakes that were selected more than once kept only once, to obtain the selection result for the original lake group.
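The fusion step amounts to a union that keeps each lake once. A minimal sketch, assuming each selection result is a list of lake identifiers (the function name `fuse_layers` and the sample ids are hypothetical):

```python
def fuse_layers(*layers):
    """Merge several selection results, keeping each lake id only once.

    Each layer is an iterable of lake ids; order of first appearance is kept.
    """
    seen = set()
    fused = []
    for layer in layers:
        for lake in layer:
            if lake not in seen:
                seen.add(lake)
                fused.append(lake)
    return fused


# Hypothetical ids: lakes 4 and 7 were chosen by both area and clustering.
area_layer = [1, 4, 7]
buffer_layer = [9]
cluster_layer = [4, 2, 7, 5]
final_selection = fuse_layers(area_layer, buffer_layer, cluster_layer)
print(final_selection)
```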

Using the dynamic multi-scale clustering method, the invention divides the whole lake group into several small lake groups according to differences in lake distribution density and then applies different selection strategies class by class, thereby preserving the consistency of the morphological structure of the lake group before and after selection step by step from the whole to the local.

Based on the above analysis, the basic idea of the dynamic multi-scale clustering algorithm of step (3) is as follows. For each target scale, a ground distance threshold D is derived from a map distance threshold d; this meets the longitudinal multi-scale clustering requirement. The average distance between the lakes of a class (hereinafter, the class average distance) is used to identify differences in distribution density; this meets the transverse multi-scale clustering requirement. Each round of clustering starts from the lake with the largest distribution density and expands dynamically from the center outward, testing in turn whether the distance to the lake nearest to the class is smaller than the distance threshold D and smaller than n times the class average distance. If so, the nearest lake is merged into the class and the class average distance is recalculated; if not, the round ends and the next round of clustering begins. The optimal values of d and n are determined from dynamic multi-scale clustering experiments. When D is larger than n times the class average distance, whether a lake joins the class is decided preliminarily by whether its distance is smaller than D, and finally by whether its distance is smaller than n times the class average distance.

The steps are shown in FIG. 2 and comprise:

(1) extracting the lake center points and establishing buffers;

(2) counting the center points within each buffer as a quantitative index of distribution density, taking the lake with the largest distribution density as the clustering starting point, and creating a new clustering layer;

(3) finding the lake nearest to the class by calculating the distances between all lakes in the clustering layer and all lakes in the original lake group layer; if the distance between the class and this nearest lake is smaller than the distance threshold D set according to the target scale and smaller than n times the class average distance (the latter condition is not tested when clustering has just started), moving the nearest lake from the original lake group layer into the clustering layer and recalculating the class average distance; otherwise, ending this round of clustering and returning to step (2).
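One round of step (3) can be sketched as a growth loop. Two points are assumptions, since the text does not fix them: distances between lakes are approximated by center-to-center distances, and the class average distance is taken as the mean of the distances at which lakes joined the class. The function name `dynamic_cluster` and the sample coordinates are hypothetical.

```python
import math


def dynamic_cluster(centers, start, D, n, available=None):
    """Grow one cluster from `start`; return (member indices, class average).

    centers: list of (x, y) lake center points (center-to-center distances
    stand in for distances between lake polygons, which is an assumption).
    D: ground distance threshold; n: multiple of the class average distance.
    Assumption: the class average distance is the mean of the distances at
    which lakes joined the cluster.
    """
    pool = set(range(len(centers))) if available is None else set(available)
    pool.discard(start)
    members, joins = [start], []
    while pool:
        # Nearest remaining lake to any lake already in the cluster.
        dist, nearest = min(
            (min(math.dist(centers[c], centers[m]) for m in members), c)
            for c in pool
        )
        if dist >= D:
            break
        # The n-times test is skipped until a class average distance exists.
        if joins and dist >= n * (sum(joins) / len(joins)):
            break
        members.append(nearest)
        joins.append(dist)
        pool.remove(nearest)
    average = sum(joins) / len(joins) if joins else None
    return members, average


# Hypothetical centers: a tight group of four lakes and a distant pair.
centers = [(0, 0), (1, 0), (2, 0), (3.2, 0), (9, 0), (10, 0)]
members, average = dynamic_cluster(centers, start=1, D=8.0, n=2.5)
print(members, average)
```

In this sample the lake at x = 9 is 5.8 units from the class, which is inside D = 8 but more than 2.5 times the class average distance, so the round ends there. Calling the function again with `available` set to the remaining lakes and a new starting point would correspond to returning to step (2).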

The invention applies different selection strategies to different classes and allocates the selection quantity index among them in proportion according to the square root law. After clustering by the dynamic multi-scale method, the small lake groups fall into three types according to the number of lakes they contain: the "single-point class", with only one lake; the "double-point class", with two lakes; and the "multi-point class", with three or more lakes. The selection strategy for each class is as follows:

First, for the single-point class, the lake is always selected as a key point for preserving the distribution characteristics of the lake group.

Second, for the double-point class, the shortest distance between the two lake polygons is determined. If it is larger than the ground distance represented by 3 mm at the target scale, both lakes are selected; otherwise, importance is calculated with the lake area and the number of river connections each given a weight of 0.5, and the relatively unimportant lake is deleted. For example, $I = 0.5\,S_{Area} + 0.5\,N_{River}$, where $I$ denotes lake importance and $S_{Area}$ and $N_{River}$ denote the normalized lake area and the normalized number of river connections, respectively.

Third, for the multi-point class, two cases are distinguished. (a) If the class contains many lakes (for example, more than 7), six indexes (lake area, number of river connections, area of the influence domain, density, centrality and average adjacency distance) are taken as importance factors, a comprehensive evaluation of lake importance is calculated iteratively by principal component analysis, and in each iteration the mutually non-interfering lakes among the lowest-ranked 30% (that is, lakes that share no neighboring lake) are deleted, until the selection quantity index and the deletion quantity index are reached. (b) If the class contains relatively few lakes (for example, 7 or fewer), the lakes are selected in order of importance, calculated with the lake area and the number of river connections each given a weight of 0.5.
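The strategies for classes that are too small for principal component analysis can be sketched as below. The text does not say how area and river connections are normalized, so division by the class maximum is an assumption, as are the function names and the parameters `gap` and `min_gap` (the shortest distance between the two lakes of a double-point class and the ground distance threshold).

```python
def normalize(values):
    """Scale raw values by their maximum (an assumed normalization)."""
    top = max(values.values())
    return {k: (v / top if top else 0.0) for k, v in values.items()}


def importance(areas, river_links):
    """I = 0.5 * S_Area + 0.5 * N_River on normalized inputs."""
    s, r = normalize(areas), normalize(river_links)
    return {k: 0.5 * s[k] + 0.5 * r[k] for k in areas}


def select_small_class(areas, river_links, quota, gap=None, min_gap=None):
    """Select lakes in a single-point, double-point or small multi-point class.

    quota: number of lakes to keep in a small multi-point class.
    gap / min_gap: hypothetical parameters for the double-point distance test.
    """
    lakes = list(areas)
    if len(lakes) == 1:
        return lakes                      # single-point class: always kept
    if len(lakes) == 2:
        if gap is not None and gap > min_gap:
            return lakes                  # far apart: keep both lakes
        quota = 1                         # close together: keep the better one
    weights = importance(areas, river_links)
    ranked = sorted(lakes, key=lambda k: weights[k], reverse=True)
    return ranked[:quota]


# Hypothetical double-point class: lake "b" is smaller but has more rivers.
pair = select_small_class({"a": 3.0, "b": 2.0}, {"a": 0, "b": 3},
                          quota=1, gap=4.5, min_gap=12.0)
# Hypothetical multi-point class of four lakes with a quota of two.
small = select_small_class({"p": 8, "q": 2, "r": 5, "s": 1},
                           {"p": 2, "q": 4, "r": 0, "s": 0}, quota=2)
print(pair, small)
```

In the double-point sample, lake "a" scores 0.5 and lake "b" about 0.83, so the smaller but better connected lake is kept.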

A specific example is given below. To determine the optimal values of the clustering parameters d and n and to verify the effectiveness of dynamic multi-scale clustering, a clustering test was performed on a lake group. FIG. 3 shows the test results under different clustering parameters, where m is the number of clusters obtained and small lake groups clustered into one class are drawn in the same color. In FIG. 3(a), d = 4 mm, n = 2 and m = 13; in FIG. 3(b), d = 4 mm, n = 2.5 and m = 11; in FIG. 3(c), d = 5 mm, n = 2 and m = 8; in FIG. 3(d), d = 5 mm, n = 2.5 and m = 6; in FIG. 3(e), d = 5 mm, n = 1.9 and m = 15; in FIG. 3(f), d = 6 mm, n = 2.5 and m = 4. The results support the following conclusions: when n is less than 2, the clustering is too scattered, as shown in FIG. 3(e); when d is greater than 5 mm, the clustering is too concentrated, as shown in FIG. 3(f). The experiments show that values of d between 4 and 5 mm and of n between 2 and 2.5 agree best with human visual judgment of clusters; d = 5 mm and n = 2.5 are therefore adopted.

Taking a region of 1: as an example, FIG. 4 shows the original experimental data and FIG. 5 shows the extracted distribution of lake features. The generalization target scale is set to 1:4,000,000, and the selection method of the invention is tested.

The area threshold is set to 16 km² (the ground area represented by 1 mm² at 1:4,000,000) and area selection is performed over all lakes; a total of 29 lakes larger than the threshold are selected, see FIG. 6 (black indicates lakes larger than the area threshold, gray indicates lakes smaller than the threshold).

Buffers are built for the lakes with the radius set to 12 km (the ground distance represented by 3 mm at 1:4,000,000), as shown in FIG. 7. One lake (circled) has a buffer that intersects no other buffer, so it is selected, as shown in FIG. 8; this preserves the distinctive distribution feature in the southeast of the lake group.

Clustering is then performed with the distance threshold set to 20 km (the ground distance represented by 5 mm at 1:4,000,000). The lakes are clustered into 5 classes, of which 2 are double-point classes and 3 are multi-point classes; the classes contain 2, 2, 4, 31 and 86 lakes respectively, and their class average distances are 4514.4, 19582.2, 10456.5, 4736.4 and 10412.4 respectively. The clustering effect is shown in FIG. 9. The clustering result agrees with human visual perception of clusters, and the differences among the class average distances effectively reflect the differences in lake distribution density among the small lake groups.
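The three thresholds used in this example follow directly from the target scale. The arithmetic can be checked with a short sketch; the function names are hypothetical.

```python
def ground_distance_km(map_mm, scale_denominator):
    """Ground distance in km represented by a map distance in mm."""
    return map_mm * scale_denominator / 1_000_000  # 1 km = 1,000,000 mm


def ground_area_km2(map_mm2, scale_denominator):
    """Ground area in km^2 represented by a map area in mm^2."""
    return map_mm2 * (scale_denominator / 1_000_000) ** 2


scale = 4_000_000  # target scale 1:4,000,000
area_threshold = ground_area_km2(1, scale)        # 16.0 km^2
buffer_radius = ground_distance_km(3, scale)      # 12.0 km
cluster_threshold = ground_distance_km(5, scale)  # 20.0 km
print(area_threshold, buffer_radius, cluster_threshold)
```

At 1:4,000,000, 1 mm on the map corresponds to 4 km on the ground, which gives the 16 km², 12 km and 20 km values stated above.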

The selection quantity index of each small lake group is then determined, and the selection strategies proposed by the invention are applied to each.

According to the square root law model, the number of lakes to be selected in each class at the target scale of 1:4,000,000 can be calculated, as shown in Table 1.

Table 1. Selection quantity index of each class
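The formula used by the patent is not reproduced in this text. As an illustration only, the sketch below uses the basic form of the square root law (Töpfer's radical law), n_T = n_S * sqrt(M_S / M_T), where M_S and M_T are the source and target scale denominators, and then splits the total among classes in proportion to their lake counts. The basic form, the largest-remainder rounding, the function names and all sample numbers are assumptions and do not reproduce Table 1.

```python
import math


def radical_law_total(n_source, source_denominator, target_denominator):
    """Basic square root law: n_T = n_S * sqrt(M_S / M_T)."""
    return n_source * math.sqrt(source_denominator / target_denominator)


def allocate_quota(class_sizes, total):
    """Split a total quota across classes in proportion to their lake counts.

    Largest-remainder rounding keeps the parts summing to round(total);
    this rounding rule is an assumption.
    """
    total = round(total)
    n = sum(class_sizes)
    exact = [total * size / n for size in class_sizes]
    quota = [math.floor(x) for x in exact]
    # Hand the remaining units to the classes with the largest fractions.
    order = sorted(range(len(exact)), key=lambda i: exact[i] - quota[i],
                   reverse=True)
    for i in order[: total - sum(quota)]:
        quota[i] += 1
    return quota


# Hypothetical numbers: 100 lakes at 1:1,000,000 generalized to 1:4,000,000.
total = radical_law_total(100, 1_000_000, 4_000_000)
quotas = allocate_quota([7, 33, 60], total)
print(total, quotas)
```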

For double-point classes 1 and 2, the distance between the two lakes is calculated and compared with the threshold (the ground distance represented by 3 mm at 1:4,000,000). In class 1 the distance between the two lakes is less than the threshold, so importance is calculated with the area and the number of river connections each weighted 0.5, and one lake is deleted; in class 2 the distance between the two lakes is greater than the threshold, so both lakes are selected.

For multi-point class 3, the number of lakes is 4, which is less than 7, so principal component analysis cannot be performed; importance is calculated with the area and the number of river connections each weighted 0.5, and lakes are selected in order of importance until the selection quantity index is reached.

For multi-point classes 4 and 5, skeleton lines of the blank areas are extracted and a mesh is constructed to obtain six attributes: lake area, number of river connections, area of the influence domain, density, centrality and average adjacency distance. Principal component analysis is applied iteratively to obtain the comprehensive evaluation of lake importance, as shown in Table 2; in each iteration only the mutually non-interfering lakes among the lowest-ranked 30% are deleted, until the number of lakes reaches the selection quantity index.

Table 2. Example of principal component analysis of lake importance
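How a principal component analysis can turn several importance factors into one score per lake is sketched below. This is a simplification and not the patent's exact evaluation: only the first principal component is used, it is found by power iteration rather than a full eigen-decomposition, and every factor is assumed to be oriented so that larger values mean "more important". The function name and the sample rows are hypothetical.

```python
import math


def first_component_scores(rows, iterations=200):
    """Score each row by its projection on the first principal component.

    rows: one list of importance factors per lake, all of equal length.
    Simplification: only the first component is used, found by power iteration.
    """
    n, m = len(rows), len(rows[0])
    # Standardize each factor to zero mean and unit variance.
    cols = []
    for j in range(m):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        cols.append([(v - mean) / std for v in col])
    # Correlation matrix of the standardized factors.
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / n
             for j in range(m)] for i in range(m)]
    # Power iteration converges to the dominant eigenvector.
    vec = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(v * v for v in nxt)) or 1.0
        vec = [v / norm for v in nxt]
    # Resolve the sign ambiguity so that the loadings sum to a positive value.
    if sum(vec) < 0:
        vec = [-v for v in vec]
    return [sum(cols[j][i] * vec[j] for j in range(m)) for i in range(n)]


# Hypothetical factors per lake: [area, number of river connections].
rows = [[20, 3], [5, 1], [12, 2], [1, 0]]
scores = first_component_scores(rows)
ranking = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

A fuller evaluation would typically combine several components weighted by their share of the variance; the single-component version is used here only to keep the sketch short and dependency-free.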

After 7 iterations, the selection for class 4 is complete; after 21 iterations, the selection for class 5 is complete. The selection effect for classes 4 and 5 is shown in FIGS. 10 to 13: FIG. 10 shows class 4 before selection, FIG. 11 class 4 after selection, FIG. 12 class 5 before selection, and FIG. 13 class 5 after selection.

Combining the selection results of all classes gives the final selection result, shown in FIG. 17. To demonstrate the rationality of the method of the invention, an area-based selection method (FIG. 14) is compared with the method of the invention (FIG. 17). The circles in the two figures mark differences between the two selection results, and the dashed boxes A, B, C, D and E mark areas that are enlarged in FIGS. 15, 16, 18, 19 and 20 respectively; in the enlarged views, selected lakes are black and deleted lakes are gray. The comparison shows the following:

In the selection method of the invention, when areas differ little, the more river connections a lake has, the higher its comprehensive importance evaluation (see Table 1), so the selection result better retains lakes that occupy important positions (more connected rivers) but have relatively small areas, and deletes lakes that have relatively large areas but unimportant positions, as shown in FIG. 20. In contrast, the area-based selection method considers only the area attribute and ignores the association with river features. The method of the invention calculates a comprehensive evaluation of lake importance by principal component analysis and takes into account both the area and the connection relationship with rivers.

The selection method of the invention better preserves the distribution characteristics of the lake group before and after selection, including the similarity of the overall morphological structure and the differences in distribution density among local areas. In areas where lakes are densely distributed, relatively many lakes are eliminated, but the area still appears relatively dense, as shown in FIG. 19; in areas where lakes are sparsely distributed, relatively few lakes are eliminated, and the area still appears sparse, as shown in FIG. 18. By contrast, the area-based selection result is unevenly distributed: lakes are too dense in some areas, as shown in FIG. 16, and too sparse in others, as shown in FIG. 15, so the distribution characteristics of the lake group are preserved less well. The selection method of the invention selects isolated lakes through buffers to preserve distinctive distribution features at specific positions in the lake group; it identifies areas of different distribution density through dynamic multi-scale clustering and divides them into small lake groups, preserving the overall distribution characteristics; and within each small lake group it selects iteratively, taking into account importance factors that reflect distribution and topological characteristics, preserving the distribution characteristics within local areas.

The above analysis shows that the lake selection method has the following characteristics:

1. Because the dynamic multi-scale clustering method is adopted, the whole lake group is divided into several small lake groups according to differences in lake distribution density, and different selection strategies are then applied class by class, so that the consistency of the morphological structure of the lake group before and after selection is preserved step by step from the whole to the local. With the additional selection of isolated lakes, the distribution characteristics of the whole lake group are well preserved.

2. Because principal component analysis is used to evaluate lake importance comprehensively, and importance factors that reflect attribute, topological and distribution characteristics are all considered, the comprehensive evaluation of importance is more complete and scientific, and the selection result reflects the real situation well.

Owing to these characteristics, the selection method of the invention follows the principles of lake selection, evaluates lakes by comprehensively considering their importance factors, and effectively preserves the morphological structure and density contrast of the lake group before and after selection. The experimental results show that the method can effectively select lakes automatically.

The embodiments of the present invention have been described above, but the present invention is not limited to them. For example, technical solutions formed by fine-tuning the above embodiments, such as changing the specific values used in the implementation or adjusting the order of the selection steps, still fall within the protection scope of the present invention.

Claims

1. A lake automatic selection method based on dynamic multi-scale clustering is characterized by comprising the following steps:

finally, obtaining a cluster image layer corresponding to each cluster, and selecting one or more lakes according to the number of lakes contained in each cluster image layer by adopting a corresponding selection method to obtain a first lake image layer;

for each clustering layer, the process of selecting one or more lakes by adopting a corresponding selection method according to the number of lakes contained in the clustering layer comprises the following steps:

selecting the clustering layers only comprising one lake;

2. The method according to claim 1, wherein the method comprises the following steps:

setting an area threshold value for the lake group according to a target scale, and selecting lakes with lake areas larger than the area threshold value to obtain a second lake image layer; establishing corresponding buffer areas for the lakes in the lake group according to a set distance, and selecting the lakes with the buffer areas without intersection relation to obtain a third lake image layer, wherein the lakes with the buffer areas with the intersection relation form the lake group to be processed; and fusing the first lake image layer, the second lake image layer and the third lake image layer to obtain a final lake selection result.

3. The method for automatically selecting lakes according to claim 1 or 2, which is based on dynamic multi-scale clustering, and is characterized in that: the process for determining the cluster starting point of the lake cluster comprises the following steps:

4. The method according to claim 3, wherein the method comprises the following steps:

the process of clustering according to the distance between the lake and the clustering starting point comprises the following steps:

5. A lake automatic selection device based on dynamic multi-scale clustering comprises a processor and a memory, and is characterized in that,

the processor stores instructions that implement a method comprising:

selecting the clustering layers only comprising one lake;

6. The automatic lake selection device based on dynamic multi-scale clustering according to claim 5, wherein:

7. The automatic lake selection device based on dynamic multi-scale clustering according to claim 5 or 6, wherein: the process for determining the cluster starting point of the lake cluster comprises the following steps:

8. The lake automatic selection device based on dynamic multi-scale clustering according to claim 7, wherein: