CN115563493A - Method for dividing rural landscape ecological units based on clustering algorithm - Google Patents

Info

Publication number
CN115563493A
Authority
CN
China
Prior art keywords
grids
data
clustering
type
grid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211410012.2A
Other languages
Chinese (zh)
Inventor
徐宁
张超
樊梦楚
王伟
成玉宁
宋义智
伊丹阳
刘琦琳
王思宇
徐小东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202211410012.2A
Publication of CN115563493A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for dividing rural landscape ecological units based on a clustering algorithm, belonging to the field of landscape architecture research. The method comprises the following steps: S1, dividing the research area into grids to obtain A-type and B-type grids; S2, collecting basic information of the research area and establishing a basic information base for each area; S3, sorting and preprocessing the collected basic information data in preparation for the secondary analysis of the grids; S4, performing multi-dimensional clustering analysis on the A-type grids; S5, naming and distinguishing the clustering results of the A-type grids; S6, completing the data of the B-type grids with the classification result of the A-type grids as reference; S7, performing multi-dimensional clustering analysis on the B-type grids, and naming and distinguishing the results; S8, visually outputting the analysis results of the A-type and B-type grids, distinguishing the classification results with color blocks of different gray levels; S9, merging adjacent grids with the same gray level and generating ecological units according to the analysis result.

Description

Method for dividing rural landscape ecological units based on clustering algorithm
Technical Field
The invention belongs to the field of landscape architecture research, and particularly relates to a method for dividing rural landscape ecological units based on a clustering algorithm.
Background
At present, there are few methods for dividing landscape ecological units, and most of them focus on purely natural landscapes and urban landscapes. The commonly adopted division methods include the natural division method and the kilometer grid method, but both have problems: irregular region shapes, inconsistent areas, and unclear boundary lines lead to a heavy workload in later stages.
Different land use types have different ecological functions, and even the same land use type can have different ecological functions because of differences in area, hydrology, climate, elevation, vegetation coverage, and the like.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a method for dividing rural landscape ecological units based on a clustering algorithm, which considers the current situations of natural geographic conditions, administrative divisions and vegetation conditions of different areas, increases the research and drawing precision, and enables the division of the landscape ecological units to be suitable for various scales and better accord with the dynamic change characteristics of the ecological units in time and space.
The purpose of the invention can be realized by the following technical scheme:
A method for dividing rural landscape ecological units based on a clustering algorithm comprises the following steps:
S1, dividing the research area into grids to obtain A-type and B-type grids;
S2, collecting basic information of the research area and establishing a basic information base for each area;
S3, sorting and preprocessing the collected basic information data in preparation for the secondary analysis of the grids;
S4, performing multi-dimensional clustering analysis on the A-type grids;
S5, naming and distinguishing the clustering results of the A-type grids;
S6, completing the data of the B-type grids with the classification result of the A-type grids as reference;
S7, performing multi-dimensional clustering analysis on the B-type grids, and naming and distinguishing the results;
S8, visually outputting the analysis results of the A-type and B-type grids, distinguishing the classification results with color blocks of different gray levels;
S9, merging adjacent grids with the same gray level and generating ecological units according to the analysis result.
Further, in S1, the step of dividing the grid includes:
S11, calculating the regional scale of the research object:
with H the maximum length of the research area in the X direction and L the maximum length in the Y direction, the grid counts are calculated according to the formulas:
number of grids in the Y direction
Figure BDA0003937281210000021
total number of grids
Figure BDA0003937281210000022
where x is the number of grids in the Y direction and is a positive integer, and n is the side length of a unit grid; grids of the corresponding scale are overlaid on a satellite map of the research object, grids with empty content are removed, and the remaining grids are the effective grids;
S12, dividing the effective grids into A-type grids and B-type grids according to the differences in function, composition, and form within a single grid.
Further, the basic information collected in S2 includes: land properties, hydrology and climate, elevation and grade, soil sensitivity, and vegetation coverage.
Further, in S3, the data preprocessing includes:
S31, cleaning and transforming the data to obtain data that can be processed effectively;
S32, converting continuous data into discrete data;
S33, preprocessing the dimensionality of the data by principal component analysis (PCA), then selecting the dimensions that retain 95% of the energy for clustering, so that high-dimensional data samples are reduced in dimension with little information loss.
Further, in S4, performing multi-dimensional clustering analysis on the A-type grids includes:
S41, initializing a matrix to store the data of each grid;
S42, clustering the divided grids with the K-means algorithm provided by the scikit-learn library in Python, where K is the number of initially selected sample centers; through repeated iterative computation of the centroids until convergence, the sum-of-squared-errors function of the overall classification reaches a minimum, giving the K sample centers;
S43, trying multiple values of k and multiple positions of the initial centroids, and selecting the group of results with the best clustering effect as the final clustering result.
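As an illustration of S42 and S43, the sketch below tries several values of k and, for each k, several random initial centroid positions through the `n_init` parameter of scikit-learn's `KMeans`. The text does not say how "the best clustering effect" is judged, so the silhouette score used here is an assumption, and the function name `best_kmeans` and the synthetic feature matrix are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_kmeans(features, k_values=range(2, 7), n_init=10, seed=0):
    """Return (score, k, labels) for the k with the highest silhouette score.

    For each k, KMeans itself restarts from n_init different initial
    centroid sets and keeps the run with the lowest sum of squared errors.
    """
    best = None
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=n_init, random_state=seed).fit(features)
        score = silhouette_score(features, model.labels_)  # assumed quality criterion
        if best is None or score > best[0]:
            best = (score, k, model.labels_)
    return best


# Hypothetical data: 90 grids with 4 features, drawn from 3 well-separated groups.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(c, 0.2, size=(30, 4)) for c in (0.0, 3.0, 6.0)])
score, k, labels = best_kmeans(features)
```

Other criteria, such as an elbow in the sum of squared errors, could be substituted for the silhouette score without changing the structure of the loop.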
Further, in S5, naming and distinguishing the clustering results of the A-type grids includes:
S51, obtaining the clustering results of the multi-dimensional clustering analysis, in which the grid units of different clusters are marked with different colors;
S52, partitioning the colors into regions by the adjacency of the grids to determine plots of the same kind, namely ecological units, and naming each ecological unit.
Further, in S6, when completing the data of the B-type grids, one condition needs to be added: the adjacency of each B-type grid to the different ecological units obtained from the A-type grids.
Further, in S7, the step of performing multidimensional clustering analysis on the B-type grid includes:
1) Initializing a matrix to store data for each grid;
2) Creating centroids, and randomly generating k centroids;
3) Calculating distance, using the Euclidean distance: each point X of the n-dimensional Euclidean space may be represented as (x[1], x[2], …, x[n]), where x[i] (i = 1, 2, …, n) is a real number called the i-th coordinate of X; the distance d(A, B) between two points A = (a[1], a[2], …, a[n]) and B = (b[1], b[2], …, b[n]) is defined as d(A, B) = sqrt[ ∑ (a[i] - b[i])² ] (i = 1, 2, …, n);
4) Judging the k value, and iteratively calculating the distance;
5) And obtaining a B-type grid clustering result.
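Steps 1) to 5) can be sketched directly in NumPy. This is a minimal illustration rather than the program of the invention: the initial centroids are drawn from the samples themselves (one possible reading of "randomly generating k centroids"), and the function name `kmeans` and the toy points are hypothetical.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means: returns (labels, centroids) for an (m, n) array of points."""
    rng = np.random.default_rng(seed)
    # Step 2: choose k distinct samples as the initial centroids (assumption).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 3: Euclidean distance from every point to every centroid.
        diff = points[:, None, :] - centroids[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=2))
        labels = dist.argmin(axis=1)
        # Step 4: recompute each centroid; keep the old one if its cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments no longer change the centroids
        centroids = new_centroids
    # Step 5: the labels are the clustering result.
    return labels, centroids


# Hypothetical B-type grid features: two obvious groups in two dimensions.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(points, k=2)
```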
Further, in S33, the step of performing PCA dimension reduction includes:
1) Centering the sample set X = [X1, X2, X3, X4, ...], i.e. subtracting from each attribute of each sample the mean value of that attribute over the sample set;
2) Calculating the covariance matrix D = XX^T;
3) Computing the eigenvalues and eigenvectors of D, sorting the eigenvalues from largest to smallest, selecting the k mutually uncorrelated projection directions with the largest eigenvalues, and taking the corresponding k eigenvectors as row vectors to form the eigenvector matrix P;
4) Converting the data into the new space spanned by the k eigenvectors, i.e., Y = PX.
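The four PCA steps can be followed line by line in NumPy. The sketch assumes, consistently with Y = PX, that X stores one attribute per row and one sample per column; the function name `pca_project` and the random data are hypothetical.

```python
import numpy as np


def pca_project(X, k):
    """Project X (attributes x samples) onto its k leading principal directions."""
    Xc = X - X.mean(axis=1, keepdims=True)   # step 1: centre every attribute
    D = Xc @ Xc.T                            # step 2: D = X X^T (unnormalised covariance)
    eigvals, eigvecs = np.linalg.eigh(D)     # D is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1][:k]    # step 3: indices of the k largest eigenvalues
    P = eigvecs[:, order].T                  # eigenvectors as row vectors
    Y = P @ Xc                               # step 4: Y = P X
    return Y, P, eigvals[order]


# Hypothetical sample set: 4 attributes observed on 50 grids.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 50))
Y, P, leading = pca_project(X, k=2)
```

The rows of P are orthonormal, so the projected attributes in Y are uncorrelated with one another.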
Further, the specific step of S42 is:
first, k objects are randomly selected as initial centers, denoted μ1, μ2, …, μk; the mean value of each cluster is then substituted into the formula group:
Figure BDA0003937281210000041
D = min Dis_j
which computes the Euclidean distance Dis between each object and these center objects; each object is reassigned to the cluster at the minimum distance D, and the cluster center of each new cluster is then calculated:
Figure BDA0003937281210000042
This process is repeated until the squared-error criterion function converges; the calculation formula is:
Figure BDA0003937281210000051
where E is the sum of the squared errors of all objects in the database; p is a point in space representing an object; μi is the mean value of cluster xi.
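The three formula figures are not reproduced in this text. Assuming they carry the standard K-means expressions that match the surrounding definitions (an assumption, not a transcription of the figures), the formula group may be written as follows, where C_i is the set of objects in cluster i and d is the number of attributes; both symbols are introduced here for illustration:

```latex
\mathrm{Dis}_j(p) = \sqrt{\sum_{m=1}^{d} \left(p_m - \mu_{j,m}\right)^2},
\qquad D = \min_{j} \mathrm{Dis}_j

\mu_i = \frac{1}{|C_i|} \sum_{p \in C_i} p

E = \sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - \mu_i \rVert^2
```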
The invention has the beneficial effects that:
1. the invention takes into account the natural geographic conditions, administrative divisions, and vegetation conditions of different areas, increases the precision of research and mapping, and enables the division of landscape ecological units to adapt to various scales and to better match the dynamic change of ecological units in time and space;
2. the invention uses an artificial intelligence algorithm to divide rural landscape ecological units, which makes up for the shortcomings of existing division methods based on subjective image reading and perceptual judgment; it provides a quantitative analysis method that takes rural landscape ecological units as the main research object, introduces a computer programming algorithm, and combines it with the existing grid division method to divide rural landscape ecological units.
Drawings
In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present invention, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
FIG. 1 is an overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of meshing according to an embodiment of the present invention;
FIG. 3 is a schematic view of a study range grid in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of the nature of the plot in accordance with an embodiment of the present invention;
FIG. 5 is a schematic illustration of a grade profile of an embodiment of the invention;
FIG. 6 is a land sensitivity schematic of an embodiment of the present invention;
FIG. 7 is a schematic representation of vegetation coverage of an embodiment of the invention;
FIG. 8 is a schematic diagram of a class A mesh clustering process according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a final clustering result according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The clustering algorithm used is a method that takes distance as the similarity index and the sum of squared errors from the sample points to their cluster centers as the criterion of clustering quality; through continuous iteration, the sum of squared errors of the overall classification finally reaches a minimum.
As shown in fig. 1, a method for dividing rural landscape ecological units based on a clustering algorithm includes the following steps:
S1, dividing the research area into grids according to a standard to obtain A-type and B-type grids; the specific steps are as follows:
S11, calculating the regional scale of the research object;
with H the maximum length of the research area in the X direction and L the maximum length in the Y direction, the grid counts are calculated according to the formulas:
number of grids in the Y direction
Figure BDA0003937281210000061
total number of grids
Figure BDA0003937281210000062
where x is the number of grids in the Y direction and is a positive integer; n is the side length of a unit grid, and its size depends on the scale of the research object: when the area of the research object is within 10 km², n = 100 m is suggested; when the area is above 10 km², n generally lies in the range [100, 500] m. Grids of the corresponding scale are overlaid on the satellite map of the research object, grids with empty content are removed, and the remaining grids are the effective grids.
S12, dividing the grids into types A and B, and naming them;
among the effective grids, those whose function, composition, form, and the like are relatively consistent within a single grid are called A-type grids, and those with large internal differences in function, composition, form, and the like are called B-type grids; the two types are named separately: the A-type grids are numbered A1, A2, A3, … in sequence from left to right and top to bottom, starting from the northwest corner, and the B-type grids are similarly named B1, B2, B3, ….
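The grid-count formulas of S11 appear only as figures, so the sketch below rests on an assumption: each direction is covered by the smallest whole number of cells of side n, i.e. the length divided by n and rounded up. The function name `grid_counts` and the sample dimensions are hypothetical.

```python
import math


def grid_counts(h_metres, l_metres, n_metres):
    """Return (cells along X, cells along Y, total cells) for side length n.

    Assumption: partial cells at the edge still count, hence the ceiling.
    """
    along_x = math.ceil(h_metres / n_metres)
    along_y = math.ceil(l_metres / n_metres)
    return along_x, along_y, along_x * along_y


# Hypothetical area: 2550 m in X, 3300 m in Y, with n = 100 m.
counts = grid_counts(2550, 3300, 100)
```

The total is the count before empty grids are removed, so the number of effective grids is at most this value.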
S2, surveying and collecting basic information of the research area, and establishing a basic information base for each area;
the basic information of the research area is surveyed and collected, covering 5 types of factors: land property, hydrology and climate, elevation and gradient, soil sensitivity, and vegetation coverage; a basic information base is established for each area;
land property: according to the Current Land Use Classification (GB/T 21010-2017), land use types in China are divided into 8 primary types, namely cultivated land, garden land, forest land, pasture land, residential and industrial-mining land, transportation land, water area, and unused land; land use information of the research area is collected by these types.
Hydrology and climate: comprising hydrological elements and climatic elements; for the hydrological elements, the data to collect, including water level, flow rate, and the like, are determined by whether surface runoff exists; the climatic elements include air temperature characteristics, precipitation characteristics, wind direction and wind force, and precipitation amount;
elevation and gradient: slope aspect and gradient are recorded while elevation data are collected;
soil sensitivity: expressed by the magnitude of the soil erosion modulus in the research area, and generally classified into six grades according to the soil erosion modulus: micro erosion, mild erosion, moderate erosion, strong erosion, extremely strong erosion, and severe erosion;
vegetation coverage: the percentage of the total statistical area taken up by the vertical projection of vegetation on the ground; vegetation types are recorded while the data are collected.
S3, sorting and preprocessing the collected basic information data in preparation for the secondary analysis of the grids; specifically:
S31, data preprocessing, including data cleaning and data transformation, to obtain data that can be processed effectively; cleaning targets complex objects with large data volumes, and the data are normalized;
S32, data conversion, namely converting continuous data into discrete data;
S33, PCA (principal component analysis) dimensionality reduction: the dimensionality of the data is preprocessed by PCA, and the dimensions that retain 95% of the energy are then selected for clustering, which reduces the dimension of high-dimensional data samples with little information loss and improves the running performance and processing effect of the algorithm.
S4, carrying out multi-dimensional clustering analysis on the A-type grids;
the A-type data are imported into Python for multi-dimensional clustering; after the data are loaded, the Euclidean distances of the A-type data are calculated; the multi-dimensional clustering analysis includes:
S41, initializing a matrix to store the data of each grid, comprising the grid name and the corresponding item data: land property, water level, flow rate, runoff, average air temperature, annual precipitation, elevation, gradient, soil sensitivity, and vegetation coverage;
S42, clustering the divided grids with the K-means algorithm provided by the scikit-learn library in Python, where K is the number of initially selected sample centers; through repeated iterative computation of the centroids until convergence, the sum of squared errors of the overall classification reaches a minimum, giving the K sample centers;
S43, trying multiple values of k and multiple positions of the initial centroids, and finally selecting the group of results with the best clustering effect as the final clustering result;
the program code of S4 is:
Figure BDA0003937281210000081
Figure BDA0003937281210000091
Figure BDA0003937281210000101
Figure BDA0003937281210000111
S5, naming and distinguishing the clustering results of the A-type grids; specifically:
S51, obtaining the clustering results of the multi-dimensional clustering analysis, in which the grid units of different clusters are marked with different colors;
S52, partitioning the colors into regions by the adjacency of the grids to determine plots of the same kind, namely ecological units, which are named L1, L2, … respectively.
The program code of S5 is:
Figure BDA0003937281210000112
Figure BDA0003937281210000121
Figure BDA0003937281210000131
Figure BDA0003937281210000141
S6, completing the data of the B-type grids with the classification result of the A-type grids as reference;
because a B-type grid is a composite plot whose areas with different geographic features are related in geography and character to the adjacent ecological units, one condition is added to the data of the B-type grids: adjacency to the different ecological units obtained from the A-type grids; for example, a grid lying between L1 and L2 is labeled as adjacent to the L1 and L2 plots.
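One way to record the adjacency condition of S6 is sketched below. It assumes a hypothetical data layout in which every grid is keyed by its (row, column) position and mapped to the name of its ecological unit for A-type grids or to `None` for B-type grids; only the four edge-sharing neighbours are checked, which is also an assumption.

```python
def adjacent_units(cell, unit_of):
    """Names of the A-type ecological units that share an edge with `cell`."""
    row, col = cell
    neighbours = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return sorted({unit_of[n] for n in neighbours if unit_of.get(n) is not None})


def encode_adjacency(units, all_units):
    """Turn the adjacency list into 0/1 columns usable as clustering features."""
    return [1 if u in units else 0 for u in all_units]


# Hypothetical layout: the B-type grid at (0, 1) lies between units L1 and L2.
unit_of = {(0, 0): "L1", (0, 1): None, (0, 2): "L2", (1, 1): "L1"}
near = adjacent_units((0, 1), unit_of)
row = encode_adjacency(near, ["L1", "L2", "L3"])
```

Encoding the adjacency as indicator columns lets it sit alongside the numeric attributes in the matrix that is clustered in S7.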
S7, performing multi-dimensional clustering analysis on the B-type grids, and naming and distinguishing the results; specifically:
S71, loading the data;
S72, running the multi-dimensional clustering algorithm (k-means clustering):
(1) initializing a matrix to store the data of each grid, comprising the grid name and the corresponding items: number of land types, land property, water level, flow rate, runoff, average air temperature, annual precipitation, elevation, gradient, soil sensitivity, vegetation coverage, adjacency to the surrounding A-type plots, and the like;
(2) creating centroids: randomly generating k centroids;
(3) calculating distance, using the Euclidean distance: each point X of the n-dimensional Euclidean space may be represented as (x[1], x[2], …, x[n]), where x[i] (i = 1, 2, …, n) is a real number called the i-th coordinate of X; the distance d(A, B) between two points A = (a[1], a[2], …, a[n]) and B = (b[1], b[2], …, b[n]) is defined as d(A, B) = sqrt[ ∑ (a[i] - b[i])² ] (i = 1, 2, …, n);
(4) judging the k value, and iteratively calculating the distance;
(5) obtaining the B-type grid clustering result;
S73, partitioning the colors into regions by the adjacency of the grids to determine plots of the same kind, namely ecological units, which are named P1, P2, … respectively.
S8, visually outputting the analysis results of the A-type and B-type grids, distinguishing the classification results with color blocks of different gray levels; the specific steps are as follows:
S81, marking the rural landscape ecological units obtained from the different clusters with different colors;
S82, placing the visualized grid data back on the map at their original positions, in sequence.
S9, merging adjacent grids with the same gray level and generating ecological units according to the analysis result; the specific steps are as follows:
S91, analyzing the grid colors with reference to administrative divisions and geographic positions, and merging adjacent grids with the same gray level to obtain ecological units;
S92, naming each unit systematically according to its characteristics, for example: XX village XX mountain land, XX reservoir, XX boundary wetland.
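Merging adjacent grids that carry the same gray level amounts to labelling connected regions. The sketch below is one possible implementation, assuming the cluster labels are held in a row-by-column list, removed grids are `None`, and two grids are adjacent when they share an edge; the function name `merge_adjacent` is hypothetical, and the administrative-division check of S91 is not modelled.

```python
from collections import deque


def merge_adjacent(labels):
    """Give every connected region of equal labels its own unit id (1, 2, ...)."""
    rows, cols = len(labels), len(labels[0])
    unit = [[None] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is None or unit[r][c] is not None:
                continue  # removed grid, or already merged into a unit
            next_id += 1
            unit[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over edge-sharing neighbours
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and unit[ny][nx] is None
                            and labels[ny][nx] == labels[y][x]):
                        unit[ny][nx] = next_id
                        queue.append((ny, nx))
    return unit, next_id


# Hypothetical 3 x 3 result: cluster labels 0, 1, 2 and one removed grid.
labels = [[0, 0, 1],
          [2, 0, 1],
          [2, None, 0]]
units, count = merge_adjacent(labels)
```

In this example the lower-right grid has the same label as the upper-left region but does not touch it, so it becomes a separate ecological unit.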
Example:
taking one site as an example, rural landscape ecological units are divided with the clustering algorithm, achieving a scientific division of rural landscape ecological units; by classifying and studying single and composite ecological regions, landscape ecological units with clear characteristics are delineated. The specific implementation steps are as follows:
s1, reasonably dividing the grids according to the standard to obtain a grid division result A and a grid division result B.
S11, the total area of the region is 8.2 square kilometers; according to the regional scale of the research object, the research precision of 100 m × 100 m grids is suitable, so n = 100 m is taken; following the grid standard, the Fishnet tool in ArcGIS is used to divide the bounding rectangle of the housing community into grids, forming 353 × 564 grids in total and establishing a system in which a single grid is 100 m × 100 m, as shown in FIG. 2;
the Fishnet label point at the geometric center of each grid is created and taken as the sampling point of that grid; finally, the grids with empty content within the bounding rectangle are removed, as shown in FIG. 3;
S12, among the effective grids, those whose function, composition, form, and the like are relatively consistent within a single grid are called A-type grids, and those with large internal differences in function, composition, form, and the like are called B-type grids; the two types are named separately: the A-type grids are numbered A1, A2, A3, … in sequence from left to right and top to bottom, starting from the northwest corner, and the B-type grids are similarly named B1, B2, B3, ….
S2, surveying and collecting basic information of the selected area, including land use property, hydrology and climate, elevation and gradient, soil sensitivity, and vegetation coverage, and establishing a basic information base for each area;
land type: according to the Current Land Use Classification (GB/T 21010-2017), land use types in China are divided into 8 primary types, namely cultivated land, garden land, forest land, pasture land, residential and industrial-mining land, transportation land, water area, and unused land; land information of the research area is collected by these primary types, as shown in FIG. 4;
hydrology and climate: comprising hydrological elements and climatic elements; for the hydrological elements, the data to collect, including water level, flow rate, and the like, are determined by whether surface runoff exists; the climatic elements include air temperature characteristics, precipitation characteristics, wind direction, wind force, precipitation amount, and the like;
elevation: slope aspect and gradient are recorded while elevation data are collected, as shown in FIG. 5;
soil sensitivity: expressed by the magnitude of the soil erosion modulus in the research area, and generally classified into six grades according to the soil erosion modulus: micro erosion, mild erosion, moderate erosion, strong erosion, extremely strong erosion, and severe erosion, as shown in FIG. 6;
vegetation coverage: the percentage of the total statistical area taken up by the vertical projection of vegetation on the ground; vegetation types are recorded while the data are collected, as shown in FIG. 7;
the following matters need to be noted when collecting data:
(1) Collecting data by taking a grid as a unit;
(2) For A-type grids, average values are taken: with the grid as the unit, each grid collects one set of data that is representative of the grid. Specifically: land property is recorded according to the locally provided land property map; hydrology and climate comprise hydrological elements and climatic elements; the hydrological elements record whether surface runoff exists in the grid, and runoff data are collected at the same time when it does; for the climatic elements, the average air temperature and annual precipitation at the position of each grid are obtained from online data; elevation and gradient use average sampling: three rows and three columns of sampling points are selected at equal intervals in each grid, and the mean and variance of their elevations are calculated; vegetation coverage is calculated as the share of the grid area occupied by vegetation.
(3) For B-type grids, attention is paid to extracting the elements of each type of natural geographic property separately; for example, when one grid unit contains a water area, cultivated land, and residential land, data are collected for each of them. Specifically: land property is recorded according to the locally provided land property map; for vegetation coverage, the non-water area within the grid is used as the denominator, and the water area is marked;
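Two of the per-grid calculations in notes (2) and (3) can be sketched as follows. The placement of the 3 × 3 sampling points (equally spaced interior points), the use of the population variance, the callable `elevation_at`, and all sample values are assumptions made for illustration.

```python
from statistics import mean, pvariance


def elevation_stats(elevation_at, x0, y0, side=100.0, per_axis=3):
    """Mean and variance of elevation at per_axis x per_axis equally spaced points.

    elevation_at(x, y) is a hypothetical lookup, e.g. a DEM query.
    """
    step = side / (per_axis + 1)  # interior points at 1/4, 2/4, 3/4 of the side
    samples = [elevation_at(x0 + i * step, y0 + j * step)
               for i in range(1, per_axis + 1)
               for j in range(1, per_axis + 1)]
    return mean(samples), pvariance(samples)


def vegetation_coverage(vegetation_area, grid_area, water_area=0.0):
    """Vegetation share of the grid; B-type grids exclude water from the denominator."""
    land_area = grid_area - water_area
    if land_area <= 0:
        return None  # grid is entirely water, coverage undefined
    return vegetation_area / land_area


# Hypothetical terrain rising 0.1 m per metre eastwards, grid origin at (0, 0).
avg, var = elevation_stats(lambda x, y: 0.1 * x, 0.0, 0.0)
# A-type grid: 3000 m² of vegetation in a 10000 m² grid.
coverage_a = vegetation_coverage(3000.0, 10000.0)
# B-type grid: same vegetation, but 4000 m² of the grid is water.
coverage_b = vegetation_coverage(3000.0, 10000.0, water_area=4000.0)
```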
S3, sorting and preprocessing the collected basic information data in preparation for the secondary analysis of the grids;
S31, data preprocessing, including data cleaning and data transformation, to obtain data that can be processed effectively;
data cleaning refers to deleting irrelevant and duplicate data from the original data and handling abnormal values and missing values; it mainly targets large and complex data objects and removes abnormal values according to rules. Missing values are handled by interpolation; for example, for missing "elevation" values: if data are lost during equidistant sampling, they can be filled in with the mean, median, or mode;
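Filling missing elevation samples with the mean, median, or mode can be sketched with the standard library alone. The function name `impute` and the convention that `None` marks a missing sample are assumptions.

```python
from statistics import mean, median, mode


def impute(values, method="mean"):
    """Replace None entries with the mean, median, or mode of the known values."""
    known = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[method](known)
    return [fill if v is None else v for v in values]


# Hypothetical elevation samples (metres) with one lost reading.
elevations = [10, None, 20, 30]
filled_mean = impute(elevations, "mean")
filled_median = impute(elevations, "median")
```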
data transformation refers to the normalization of data; because the dimensions of different features in the collected original data may be inconsistent, their numerical ranges may differ greatly; for example, the value range of the "elevation average value" often differs greatly from that of "vegetation coverage", in which case the "elevation average value" can be scaled proportionally through data normalization. By specifying a maximum max = 1 and a minimum min = -1 for elevation, the intermediate values are scaled into the range [-1, 1], which increases the comparability of different types of data;
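The scaling to [-1, 1] described above is a min-max normalization. A minimal sketch, with the function name `scale_to_range` and the sample elevations chosen for illustration:

```python
def scale_to_range(values, new_min=-1.0, new_max=1.0):
    """Linearly map values so that min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: place them at the middle of the target range.
        return [(new_min + new_max) / 2.0] * len(values)
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]


# Hypothetical elevation averages in metres.
scaled = scale_to_range([100, 150, 200])
```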
S32, data conversion
The collected data are presented in an Excel table, in which one grid corresponds to one record; the A-type data comprise 11 items of land property, water level, flow rate, runoff, average air temperature, annual precipitation, elevation, gradient, soil sensitivity and vegetation coverage; when a grid contains no surface runoff, it has no runoff data item; therefore, for A-type data clustering, the number of items actually considered is 10 or 11; because of the complexity of its land parcels, a B-type grid contains more than two land types, so that, in addition to the conventional data, the number of land types contained within a single grid must be counted during data collection; for example, a parcel containing cultivated land, a water area and a forest land has a land-type number of 3.
Data conversion: continuous data are converted into discrete data by assigning identifiers to segments of the continuous data; for example, elevation data are processed with the equal-width method, which divides the value range of the elevation data into intervals of the same width, the number of intervals being determined by the characteristics of the data.
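The equal-width discretization of S32 can be illustrated with a short sketch. The function name `equal_width_bins` and the choice of integer bin indices as identifiers are assumptions made for illustration only; the source does not specify how the segments are labeled.

```python
def equal_width_bins(values, n_bins):
    """Assign each value the index (0 .. n_bins-1) of its equal-width interval."""
    vmin, vmax = min(values), max(values)
    width = (vmax - vmin) / n_bins
    if width == 0:
        # Degenerate case: every value is identical, so use a single bin.
        return [0 for _ in values]
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - vmin) / width), n_bins - 1) for v in values]


elevation = [0, 10, 25, 49, 50, 100]
bins = equal_width_bins(elevation, 4)
```

Here the elevation range 0 to 100 is cut into four intervals of width 25, and each grid's elevation is replaced by the index of the interval it falls in.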
S33, PCA (principal component analysis) dimension reduction processing
The dimensionality of the data is preprocessed by PCA (principal component analysis), and the principal-component dimensions that retain 95% of the energy are then selected for clustering, so that high-dimensional data samples are reduced in dimension with low information loss and the operating performance and processing effect of the algorithm are improved; the main steps are as follows:
1) The sample set X = [x1, x2, x3, x4, ...] is centered, i.e. the mean value of the corresponding attribute in the sample set is subtracted from each attribute of each sample; for example, a sample set in the A-type grids is X = [land property, water level, flow rate, runoff rate, average air temperature, annual precipitation amount, elevation, gradient, soil sensitivity, vegetation coverage];
2) Calculating the covariance matrix $D = XX^{T}$ (reflecting the degree of correlation between different attributes);
3) Computing the eigenvalues of D and sorting them from large to small, selecting the k projection directions with the lowest mutual correlation for linear combination, and then taking the corresponding k eigenvectors as row vectors to form an eigenvector matrix P;
4) The data is transformed into a new space constructed of k eigenvectors, i.e., Y = PX.
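Steps 1) to 4) can be sketched with NumPy as follows. This is an illustrative reading of the text rather than the inventors' code: the function name `pca_reduce` is hypothetical, "95% of the energy" is interpreted as the cumulative share of the eigenvalue sum, and X is taken to have one column per sample so that D = XX^T matches the formula above.

```python
import numpy as np


def pca_reduce(samples, energy=0.95):
    """samples: array of shape (n_grids, n_attributes). Returns (Y, P, k)."""
    # 1) Center each attribute; transpose so that columns are samples.
    X = (samples - samples.mean(axis=0)).T
    # 2) D = X X^T (unnormalized; scaling by 1/(n-1) would not change the
    #    eigenvectors or the energy ratios).
    D = X @ X.T
    # 3) Eigen-decomposition of the symmetric matrix, sorted large to small.
    eigvals, eigvecs = np.linalg.eigh(D)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard tiny negative round-off
    eigvecs = eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(ratio, energy) + 1)   # smallest k reaching the target
    P = eigvecs[:, :k].T                          # eigenvectors as row vectors
    # 4) Project into the new k-dimensional space: Y = P X.
    Y = P @ X
    return Y.T, P, k


samples = np.array([[1.0, 2.0, 0.0], [2.0, 4.0, 0.0],
                    [3.0, 6.0, 0.0], [4.0, 8.0, 0.0]])
Y, P, k = pca_reduce(samples)
```

In this toy input the second attribute is exactly twice the first and the third is constant, so a single component carries all of the energy.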
S4, carrying out multi-dimensional clustering analysis on the A-type grids;
importing the A-type data into Python for multi-dimensional clustering, and calculating the Euclidean distance of the A-type data after loading the data; wherein the multidimensional clustering algorithm analysis comprises:
initializing a matrix to store data of each grid, wherein the data comprises grid names and corresponding land property, water level, flow rate, runoff rate, average air temperature, annual precipitation amount, elevation, gradient, soil sensitivity and vegetation coverage item data;
The multi-dimensional clustering process in the Python language comprises the following steps: first, an arbitrary number k is selected (representing the final number of rural landscape unit types of the study object), and the k centers are recorded as $\mu_1, \mu_2, \ldots, \mu_k$; the mean value of each clustering object is then substituted into the formula group:
Figure BDA0003937281210000191
$D = \min_{j} \mathrm{Dis}_j$
The Euclidean distance Dis between each object and each central object is calculated, the objects are re-assigned according to the minimum distance D, and the clustering center of each resulting new cluster is then calculated
Figure BDA0003937281210000192
(the mean of all objects in the cluster); this process is repeated until the squared-error criterion function converges; the calculation formula is as follows:
Figure BDA0003937281210000193
E is the sum of the squared errors of all objects in the database; p is a point in the object space; $\mu_i$ is the mean of cluster $x_i$ (p and $\mu_i$ are both multi-dimensional); as shown in FIG. 8.
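The iteration described in S4 (assign each object to its nearest center, recompute each center as the cluster mean, repeat until the centers stop moving) can be sketched with NumPy. The function names `assign` and `kmeans`, the `seed` parameter and the iteration cap are assumptions for illustration; the claims state that the Scikit-learn K-means routine is used in practice, which this hand-written sketch does not reproduce.

```python
import numpy as np


def assign(points, centers):
    """Index of the nearest center (minimum Euclidean distance) for each point."""
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Pick k distinct objects at random as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        # New center = mean of the objects in the cluster (kept if cluster is empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = assign(points, centers)
    # Squared-error criterion: sum of squared distances to the assigned center.
    sse = float(((points - centers[labels]) ** 2).sum())
    return labels, centers, sse


points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers, sse = kmeans(points, k=2)
```

On this toy data the two well-separated groups of three points end up in separate clusters regardless of which two points are drawn as initial centers.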
S5, naming and distinguishing the clustering results of the A-type grids;
S51, obtaining a clustering result after the multi-dimensional clustering analysis; the grid cells of different clustering groups are marked with different colors;
S52, determining similar plots, namely ecological units, from the color partitions formed by adjacent grids; the units are named L1, L2, … respectively.
S6, perfecting the data of the B-type grids by taking the classification result of the A-type grids as a reference;
Since a B-type grid, as a complex parcel, contains areas with different geographic features that are related in location and character to the adjacent ecological units, one condition is added to the data of the B-type grids: the adjacency to the different A-type ecological units; for example, a B-type grid lying between L1 and L2 is labeled as adjacent to the L1 and L2 plots.
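The adjacency condition of S6 can be illustrated with a small sketch. The function name `adjacent_a_units`, the (row, column) indexing of grids and the use of the four edge-sharing neighbours are all assumptions; the source does not state which neighbourhood is used.

```python
def adjacent_a_units(unit_of, b_cell):
    """Names of the A-type ecological units that border a B-type grid.

    unit_of: dict mapping (row, col) of A-type grids to unit names such as "L1".
    b_cell:  (row, col) of the B-type grid.
    """
    r, c = b_cell
    neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return sorted({unit_of[n] for n in neighbours if n in unit_of})


unit_of = {(0, 0): "L1", (0, 2): "L2", (1, 1): "L1"}
adjacency = adjacent_a_units(unit_of, (0, 1))
```

The B-type grid at (0, 1) touches grids belonging to L1 and L2, so it would be labeled as adjacent to both.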
And S7, carrying out multi-dimensional clustering analysis on the B-type grids, and naming and distinguishing the results. The method comprises the following steps:
S71, loading data;
S72, carrying out the multi-dimensional clustering algorithm (k-means clustering);
(1) initializing a matrix to store the data of each grid, wherein the data comprise the grid name and the corresponding number of land types, land property, water level, flow rate, runoff, average air temperature, annual precipitation amount, elevation, gradient, soil sensitivity, vegetation coverage, adjacency to the surrounding A-type plots, and the like;
(2) creating centroids: randomly generating k centroids;
(3) calculating the Euclidean distance: each point X of the n-dimensional Euclidean space may be represented as $(x[1], x[2], \ldots, x[n])$, where $x[i]$ ($i = 1, 2, \ldots, n$) is a real number called the i-th coordinate of X; the distance between two points $A = (a[1], a[2], \ldots, a[n])$ and $B = (b[1], b[2], \ldots, b[n])$ is defined as $d(A, B) = \sqrt{\sum_{i=1}^{n} (a[i] - b[i])^2}$;
(4) judging the k value, and iteratively calculating the distance;
(5) obtaining a B-type grid clustering result;
S73, determining similar plots, namely ecological units, from the color partitions formed by adjacent grids; the units are named P1, P2, … respectively.
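The Euclidean distance defined in step (3) of S72 can be written directly from the formula. The function name `euclidean_distance` is hypothetical, and the two points are assumed to be given as equal-length sequences of numbers.

```python
import math


def euclidean_distance(a, b):
    """d(A, B) = sqrt(sum_i (a[i] - b[i])^2) for two n-dimensional points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


d = euclidean_distance([0, 0], [3, 4])
```

For grid data, each coordinate would be one preprocessed attribute of a grid, so features should be normalized first or the attribute with the largest range dominates the distance.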
S8, visually outputting the data of the A-type and B-type grid analysis results, and distinguishing classification results by using color patches with different gray levels;
S81, marking the rural landscape ecological units obtained from different clustering groups with different colors;
S82, placing the visualized grid data array back onto the map in its original position and order.
S9, combining adjacent grids with the same gray colors, and generating ecological units according to analysis results;
S91, analyzing the grid colors according to administrative divisions and geographic positions, and combining adjacent grids with the same gray color to obtain ecological units (as shown in FIG. 9);
S92, systematically naming each unit according to its characteristics, such as XX village XX mountain land, XX reservoir, XX boundary wetland and the like.
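The merging of adjacent same-colored grids in S9 amounts to finding connected regions of equal cluster label, which can be sketched with a breadth-first flood fill. The function name `merge_adjacent`, the use of `None` for removed (empty) grids and the four-neighbour adjacency are assumptions for illustration; the administrative-division check mentioned in S91 is not modeled here.

```python
from collections import deque


def merge_adjacent(labels):
    """labels: 2-D list of cluster labels (None = no grid).

    Returns a 2-D list of ecological-unit ids, one id per connected region
    of edge-adjacent grids that share the same cluster label.
    """
    rows, cols = len(labels), len(labels[0])
    unit = [[None] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is None or unit[r][c] is not None:
                continue
            # Start a new unit and flood-fill all same-label neighbours.
            unit[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and unit[nr][nc] is None
                            and labels[nr][nc] == labels[r][c]):
                        unit[nr][nc] = next_id
                        queue.append((nr, nc))
            next_id += 1
    return unit


labels = [[0, 0, 1],
          [1, 0, 1],
          [1, None, 1]]
units = merge_adjacent(labels)
```

Note that two regions with the same cluster label but no shared edge (the left and right columns of label 1 above) become separate ecological units.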
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are given by way of illustration of the principles of the present invention, but that various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes and modifications are within the scope of the invention as claimed.

Claims (10)

1. A method for dividing rural landscape ecological units based on a clustering algorithm is characterized by comprising the following steps:
S1, performing grid division on a research area to obtain A-type and B-type grids;
S2, collecting basic information of the research area, and establishing a basic information base for each area;
S3, sorting and preprocessing the acquired basic information data in preparation for secondary analysis of the grids;
S4, carrying out multi-dimensional clustering analysis on the A-type grids;
S5, naming and distinguishing the clustering results of the A-type grids;
S6, perfecting the data of the B-type grids by taking the classification result of the A-type grids as a reference;
S7, carrying out multi-dimensional clustering analysis on the B-type grids, and naming and distinguishing the results;
S8, visually outputting the data of the A-type and B-type grid analysis results, and distinguishing classification results by using color blocks with different gray levels;
S9, combining adjacent grids with the same gray color, and generating ecological units according to the analysis result.
2. The method for partitioning rural landscape ecological units based on clustering algorithm as claimed in claim 1, wherein in S1, the step of partitioning grids comprises:
S11, calculating the regional scale of the research object:
the maximum length H in the X direction and the maximum length L in the Y direction are calculated according to the formula:
number of meshes in Y direction
Figure FDA0003937281200000011
Total number of cells
Figure FDA0003937281200000012
wherein x is the grid number in the Y direction, and x is a positive integer; n is the unit side length; grids of the corresponding scale are overlaid on a satellite map of the research object, grids with empty content are removed, and the remaining grids are the effective grids;
S12, dividing the effective grids into A-type grids and B-type grids according to differences in the internal function, structure and form of each single grid.
3. The method for partitioning rural landscape ecological units based on clustering algorithm as claimed in claim 1, wherein the basic information collected in S2 comprises: land properties, hydrology and climate, elevation and grade, soil sensitivity, and vegetation coverage.
4. The method for partitioning rural landscape ecological units based on clustering algorithm as claimed in claim 1, wherein in S3, the data preprocessing steps are:
S31, cleaning and transforming the data to obtain data that can be processed effectively;
S32, converting the continuous data into discrete data;
S33, preprocessing the dimensionality of the data by PCA (principal component analysis), then selecting the principal-component dimensions that retain 95% of the energy for clustering, and reducing the dimensionality of high-dimensional data samples with low information loss.
5. The method for partitioning ecological units of the rural landscape based on the clustering algorithm as claimed in claim 4, wherein the step of performing the multidimensional clustering analysis on the A-type grids in S4 comprises:
S41, initializing a matrix to store the data of each grid;
S42, clustering the divided grids by using the K-means algorithm of the Scikit-learn library in the Python language, wherein k is the number of initially selected sample centers, and the centroids are iterated repeatedly until the sum-of-squared-errors function of the overall classification converges to a minimum value, yielding the centers of the k samples;
S43, trying the k value and the positions of the initial centroids multiple times, and selecting the group of results with the best clustering effect as the final clustering result.
6. The method for partitioning rural landscape ecological units based on clustering algorithm as claimed in claim 5, wherein in S5, the step of naming and distinguishing the clustering results of the A-type grids comprises:
S51, obtaining clustering results after the multi-dimensional clustering analysis, wherein the grid cells of different clustering groups are marked with different colors;
S52, determining similar plots, namely ecological units, from the color partitions formed by adjacent grids, and naming the ecological units respectively.
7. The method for partitioning rural landscape ecological units based on clustering algorithm as claimed in claim 6, wherein in S6, when the data of the B-type grids are perfected, the following condition is added: the adjacency to the different A-type ecological units.
8. The method for partitioning rural landscape ecological units based on clustering algorithm as claimed in claim 7, wherein in S7, the step of performing multidimensional clustering analysis on the B-type grids comprises:
1) Initializing a matrix to store data for each grid;
2) Creating centroids, and randomly generating k centroids;
3) Calculating the Euclidean distance: each point X of the n-dimensional Euclidean space may be represented as $(x[1], x[2], \ldots, x[n])$, where $x[i]$ ($i = 1, 2, \ldots, n$) is a real number called the i-th coordinate of X; the distance between two points $A = (a[1], a[2], \ldots, a[n])$ and $B = (b[1], b[2], \ldots, b[n])$ is defined as $d(A, B) = \sqrt{\sum_{i=1}^{n} (a[i] - b[i])^2}$;
4) Judging the k value, and iteratively calculating the distance;
5) Obtaining a B-type grid clustering result.
9. The method for partitioning rural landscape ecological units based on clustering algorithm as claimed in claim 4, wherein in S33, the PCA dimension reduction processing comprises:
1) Centering the sample set X = [x1, x2, x3, x4, ...], i.e. subtracting from each attribute of each sample the mean value of the corresponding attribute in the sample set;
2) Calculating a covariance matrix $D = XX^{T}$;
3) Computing the eigenvalues of D and sorting them from large to small, selecting the k projection directions with the lowest mutual correlation for linear combination, and then taking the corresponding k eigenvectors as row vectors to form an eigenvector matrix P;
4) The data is transformed into a new space constructed of k eigenvectors, i.e., Y = PX.
10. The method for partitioning rural landscape ecological units based on clustering algorithm as claimed in claim 5, wherein S42 comprises the following specific steps:
firstly, selecting an arbitrary number k, and recording the k centers as $\mu_1, \mu_2, \ldots, \mu_k$; and substituting the mean value of each clustering object into the formula group:
Figure FDA0003937281200000041
calculating the Euclidean distance Dis between each object and each central object, re-assigning the objects according to the minimum distance D, and then calculating the clustering center of each resulting new cluster
Figure FDA0003937281200000042
this process is repeated until the squared-error criterion function converges; the calculation formula is as follows:
Figure FDA0003937281200000043
E is the sum of the squared errors of all objects in the database; p is a point in the object space; $\mu_i$ is the mean of cluster $x_i$.
CN202211410012.2A 2022-11-10 2022-11-10 Method for dividing rural landscape ecological units based on clustering algorithm Pending CN115563493A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211410012.2A CN115563493A (en) 2022-11-10 2022-11-10 Method for dividing rural landscape ecological units based on clustering algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211410012.2A CN115563493A (en) 2022-11-10 2022-11-10 Method for dividing rural landscape ecological units based on clustering algorithm

Publications (1)

Publication Number Publication Date
CN115563493A true CN115563493A (en) 2023-01-03

Family

ID=84770537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211410012.2A Pending CN115563493A (en) 2022-11-10 2022-11-10 Method for dividing rural landscape ecological units based on clustering algorithm

Country Status (1)

Country Link
CN (1) CN115563493A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination