CN116644809A

CN116644809A - Urban development boundary demarcation method integrating geographic big data and machine learning

Info

Publication number: CN116644809A
Application number: CN202310597567.0A
Authority: CN
Inventors: 夏南; 高醒; 赵鑫; 庄苏丹; 王梓宇; 梁加乐; 李满春
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2023-05-25
Filing date: 2023-05-25
Publication date: 2023-08-25
Anticipated expiration: 2043-05-25
Also published as: CN116644809B

Abstract

The invention discloses a town development boundary demarcation method integrating geographic big data and machine learning, comprising the following steps: S1, preprocessing (e.g., data cleaning) the large volume of collected multi-source geographic big data; S2, constructing index factors from three dimensions (natural, ecological and human factors), analyzing town development boundary suitability factor by factor, and obtaining the scoring standard of each evaluation index; S3, constructing a random forest model to determine the weight of each index. Building on basic geographic data such as spatial survey data, the invention fuses geographic big data such as nighttime light, Weibo check-in and house/land price data into multi-source geographic big data. It selects comprehensive and reasonable factors from the natural, human and ecological dimensions to construct a town development boundary suitability index system, takes a manually drawn boundary as the reference, determines the weights with a machine learning method, and delineates the town development boundary.

Description

Urban development boundary demarcation method integrating geographic big data and machine learning

Technical Field

The invention relates to the technical field of town development, in particular to a town development boundary demarcation method integrating geographic big data and machine learning.

Background

Existing research on index selection focuses mainly on natural and ecological factors, and its treatment of human factors is limited to population, transport infrastructure and the like. It therefore fails to capture the macroscopic socio-economic driving forces of urban expansion. It also cannot fully account for how human activity and the level of economic development in different parts of a study area affect the expansion of urban land. Numerous studies show that residents' daily dynamic activities are highly effective for modeling urban spatial structure, and that residents' consumption activities and movement trajectories drive demand for construction land. Weibo check-in data and Dianping user review data, which characterize the features and spatial distribution of population activity, have therefore proved valuable in research on urban spatial expansion. In addition, the level of economic development is a key driver of urban development and directly promotes urban construction and expansion. Nighttime light remote sensing data and house/land price data capture the economic condition of a study area well and give a comprehensive picture of its development pattern and overall level. Applying them to UGB (urban growth boundary) demarcation gives the results greater practical value. Fusing geographic big data to mine human factors in greater depth is therefore significant for the all-round evaluation of cities and the accurate delineation of urban boundaries, and merits in-depth study.

In UGB demarcation, the way the weights of the influencing factors are determined affects the accuracy of the result. Current weighting methods are mainly based on Delphi expert consultation, the analytic hierarchy process (AHP), the entropy weight method and the like. However, such subjective weighting struggles to cope with large data volumes, multi-dimensional index factors and the complex relationships between the influencing factors and the UGB. With the rapid development of artificial intelligence, machine learning has attracted extensive attention from scholars. It has strong learning ability, offers high computational speed and accuracy, and avoids, to some extent, errors introduced by subjective experience. It has already been applied in UGB-related research such as land use classification and urban expansion simulation coupled with cellular automata (CA) models. Given this good performance, machine learning offers a way to determine weights in the comprehensive evaluation of development boundaries. As for training data, existing manually delineated UGBs lack quantitative analysis. However, they fully consider natural landforms, feature boundaries and the current land-use base, and combine the 'double evaluation' results with resource-constraint baselines. Given the requirements of strict land-use policy, they are reasonably rational and have reference value. Using such a manually drawn version as the basis for applying machine learning to the setting of UGB factor weights opens the way to a more scientific basis and to more accurate and objective UGB delineation, and deserves further exploration.

The present invention therefore addresses the problem of determining the index weights of comprehensive UGB evaluation objectively and scientifically. It fully explores the relationship between human factors and the UGB so that boundary delineation meets its requirements more comprehensively, and it exploits the potential of machine learning for weight setting. It fuses multi-source geographic big data such as Weibo check-ins, house/land price economic data and nighttime light, and evaluates UGB suitability for all kinds of urban land across the natural, ecological and human dimensions.

Disclosure of Invention

The invention aims to provide a town development boundary demarcation method integrating geographic big data and machine learning, so as to solve the problems set out in the background above.

A town development boundary demarcation method integrating geographic big data and machine learning comprises the following steps:

S1, preprocessing, such as data cleaning, the large volume of collected multi-source geographic big data;

S2, constructing index factors from three dimensions (natural, ecological and human factors), analyzing town development boundary suitability factor by factor, and obtaining the scoring standard of each evaluation index;

S3, constructing a random forest model to determine the weight of each index;

S4, after the evaluation index system and the 1-4 suitability grading result of each index have been obtained, performing multi-factor overlay analysis with the index weights determined by the random forest model. Using an ArcGIS tool, the suitability scores of all indexes in each grid cell are weighted by the index weights and summed to obtain the final town development boundary suitability score of each grid cell in the study area, which comprehensively evaluates how suitable each cell is for inclusion within the town development boundary;

S5, taking different suitability scores as candidate thresholds for delineating the town development boundary and comparing the overall accuracy and Kappa coefficient (a classification agreement index; the higher its value, the better the classification) of the boundaries delineated under each threshold. The score giving the highest accuracy is selected as the delineation threshold for the study. Grid cells whose suitability score is greater than this threshold are delineated as within the town development boundary, and cells with a lower score are deemed unsuitable, which yields the final town development boundary delineation result of the experiment.

As a further improvement of the present invention, the natural factors comprise four factors: terrain, water area, landscape pattern and geological hazard;

the terrain comprises two indexes: slope and elevation;

the water area comprises three indexes: distance to rivers, water area density within the grid cell, and distance to lakes and reservoirs;

the landscape pattern comprises one index: a comprehensive landscape index;

the geological hazard comprises one index: distance to geological hazard points;

the comprehensive landscape index is built from landscape metrics such as patch area (AREA), radius of gyration (GYRATE) and shape index (SHAPE), computed for the current construction land on the Fragstats platform.
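The patent computes these metrics in Fragstats and does not say how they are combined into the comprehensive index. As a rough illustration only, the Python sketch below computes patch-level AREA, GYRATE and SHAPE for patches of a binary construction-land raster. It follows the FRAGSTATS definitions as we understand them: SHAPE is the perimeter in cell edges divided by the minimum perimeter of a maximally compact raster patch of the same area, and GYRATE is the mean distance from each cell to the patch centroid. The function name `patch_metrics`, the 8-cell neighbour rule and the 300 m default cell size are assumptions.

```python
import math

import numpy as np
from scipy import ndimage


def patch_metrics(land, cell_size_m=300.0):
    """Patch-level AREA (ha), GYRATE (m) and SHAPE for a boolean raster.

    Patches are 8-connected groups of True cells (assumed neighbour rule).
    """
    labels, n = ndimage.label(land, structure=np.ones((3, 3), dtype=int))
    results = []
    for pid in range(1, n + 1):
        mask = labels == pid
        a = int(mask.sum())  # patch area in cells

        # Perimeter in cell edges: patch cells facing a non-patch neighbour
        # (raster edges count too, thanks to the False padding).
        m = np.pad(mask, 1)
        core = m[1:-1, 1:-1]
        neighbours = (m[:-2, 1:-1], m[2:, 1:-1], m[1:-1, :-2], m[1:-1, 2:])
        p = sum(int((core & ~nb).sum()) for nb in neighbours)

        # Minimum perimeter of a maximally compact patch of a cells.
        k = math.isqrt(a)
        if a == k * k:
            p_min = 4 * k
        elif a <= k * (k + 1):
            p_min = 4 * k + 2
        else:
            p_min = 4 * k + 4

        # Radius of gyration: mean cell-to-centroid distance in metres.
        rows, cols = np.nonzero(mask)
        y, x = rows * cell_size_m, cols * cell_size_m
        gyrate = float(np.mean(np.hypot(y - y.mean(), x - x.mean())))

        results.append({
            "AREA": a * cell_size_m ** 2 / 10_000.0,  # hectares
            "GYRATE": gyrate,
            "SHAPE": p / p_min,  # 1.0 for a maximally compact patch
        })
    return results
```

A compact 3 x 3 block gives SHAPE = 1.0, while a 1 x 4 strip gives 10 / 8 = 1.25. How these patch values are aggregated and weighted into the single comprehensive landscape index is not stated in the text and is left open here.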

As a further improvement of the invention, the ecological factors comprise five factors: ecological land condition, cultivated land condition, soil, vegetation abundance and atmospheric environment;

the ecological land condition comprises three indexes: ecological land (forest, garden, grass and wetland) coverage, distance to forest parks, scenic spots and nature reserves, and distance to ecological land (forest, garden, grass and wetland);

the cultivated land condition comprises two indexes: distance to cultivated land and cultivated land proportion;

the soil comprises two indexes: soil organic matter content and soil erosion sensitivity;

the vegetation abundance comprises one index: the normalized difference vegetation index (NDVI);

the atmospheric environment comprises two indexes: PM2.5 concentration and PM10 concentration.

As a further improvement of the present invention, the human factors comprise seven factors: traffic conditions, location, public/commercial facilities, population, economy (land price), industrial prospects and land use;

the traffic conditions comprise four indexes built from the distances to the nearest expressway interchange, airport and train station, the density of the railway and road network, and the density of subway and bus stops;

the location comprises one index: distance to the city/county center;

the public/commercial facilities comprise two indexes: science and technology facility density and commercial service facility density;

the population comprises two indexes: population density and population activity distribution;

the economy (land price) comprises four indexes: secondary industry value added, house rent, shop rent and economic vitality;

the industrial prospects comprise one index: development zone/industrial park density;

the land use condition comprises one index: urban land proportion.

As a further improvement of the present invention, the step S3 specifically includes the steps of:

first, training the model with town development boundary reference data, assigning a land state value of 1 to areas within the town development boundary in the training data and 0 to all other areas;

using stratified random sampling to extract about 10% of the cells as sample points from the town development boundary and from the other areas of the training data, respectively, and obtaining the spatial coordinates of the sample points;

using the ArcGIS Sample function to read the land state value and the spatial variable values (i.e., the scores of all indexes) at the sample points to obtain the original training set X;

converting the grid data of every index's evaluation grading result into a unified tif format and inputting them into the random forest model to obtain the spatial variable data set.

As a further improvement of the invention, the random forest model is implemented in Python with the open-source machine learning toolkit scikit-learn, and is generated by training on the original training set X.

As a further improvement of the present invention, in step S5, the accuracy of the delineation result is evaluated using user's accuracy, producer's accuracy, overall accuracy and the Kappa coefficient (K), to verify the accuracy and reliability of the town development boundary delineation.

As a further improvement of the present invention, user's accuracy (UA), producer's accuracy (PA) and overall accuracy (OA) are calculated from the statistical concepts of true positive (TP), true negative (TN), false positive (FP) and false negative (FN), as follows:

$UA = \frac{TP}{TP + FP}$, $PA = \frac{TP}{TP + FN}$, $OA = \frac{TP + TN}{TP + TN + FP + FN}$

where TP is the number of grid cells that belong to the town development boundary in the reference data and are classified into it by the model; TN is the number of cells that belong to other areas in the reference data and are classified as other areas by the model; FP is the number of cells that belong to other areas in the reference data but are classified into the town development boundary by the model; and FN is the number of cells that belong to the town development boundary in the reference data but are classified as other areas by the model.
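The counts defined above translate directly into code. The sketch below is a minimal Python version that compares a reference map with a model map, with True meaning "inside the town development boundary". The patent names the Kappa coefficient without giving its formula, so the standard Cohen's kappa for a 2 x 2 confusion matrix is assumed. The function name `accuracy_metrics` is hypothetical.

```python
import numpy as np


def accuracy_metrics(reference, predicted):
    """UA, PA, OA and Cohen's kappa for a binary boundary map (True = inside)."""
    ref = np.asarray(reference, dtype=bool).ravel()
    pred = np.asarray(predicted, dtype=bool).ravel()
    tp = int(np.sum(ref & pred))    # boundary in reference, boundary in model
    tn = int(np.sum(~ref & ~pred))  # other in reference, other in model
    fp = int(np.sum(~ref & pred))   # other in reference, boundary in model
    fn = int(np.sum(ref & ~pred))   # boundary in reference, other in model
    n = tp + tn + fp + fn

    ua = tp / (tp + fp)             # user's accuracy
    pa = tp / (tp + fn)             # producer's accuracy
    oa = (tp + tn) / n              # overall accuracy
    # Chance agreement from the row and column totals of the confusion matrix.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (oa - pe) / (1 - pe)
    return {"UA": ua, "PA": pa, "OA": oa, "Kappa": kappa}
```

For example, a reference of `[1, 1, 1, 0, 0, 0, 0, 0]` against a prediction of `[1, 1, 0, 1, 0, 0, 0, 0]` gives TP = 2, FN = 1, FP = 1 and TN = 4. That yields UA = PA = 2/3, OA = 0.75 and K = 7/15. The function divides by zero when the model assigns no cell to the boundary, so such degenerate thresholds should be skipped.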

As a further improvement of the invention, the big data in step S1 comprise basic geographic data and earth observation big data, wherein

the basic geographic data include spatial survey data, demographic census data, industry survey data, geological hazard data, meteorological data and soil data;

the earth observation big data include digital elevation model (DEM) data, nighttime light image data and vegetation coverage data.

As a further improvement of the present invention, the big data further include human behavior big data, comprising POI/AOI data, Weibo (microblog) check-in data and house price data.

Compared with the prior art, the invention has the beneficial effects that:

According to the invention, geographic big data such as nighttime light, Weibo check-in and house/land price data are fused with basic geographic data such as spatial survey data into multi-source geographic big data. A UGB suitability index system is constructed by selecting comprehensive and reasonable factors from the natural, human and ecological dimensions. A manually drawn boundary serves as the reference, and machine learning determines the weights. This yields a scientific UGB delineation method, verifies the importance of geographic big data in UGB research, and offers reference value for related UGB studies.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a schematic diagram of suitability grading results for selected indexes, where panels a, b and c show natural, human and ecological indexes, respectively;

FIG. 3 is a chart of index contributions (weights);

FIG. 4 is a schematic diagram of grading and threshold determination;

FIG. 5 is a schematic illustration of town development boundary delineation results;

FIG. 6 is a diagram comparing UGB reference data with experimental demarcation results.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.

Referring to fig. 1-6, the present invention provides the following technical solutions: a town development boundary demarcation method integrating geographic big data and machine learning comprises the following steps:

S3, constructing a random forest model to determine the weight of each index;

S5, taking different suitability scores as candidate thresholds for delineating the town development boundary, comparing the overall accuracy and Kappa coefficient of the boundaries delineated under each threshold, and selecting the score with the highest accuracy as the delineation threshold for the study. Grid cells whose suitability score is above the threshold are delineated within the town development boundary and those below it are deemed unsuitable, which yields the final town development boundary delineation result of the experiment.

The invention takes Changsha as the study area and uses 300 m × 300 m grid cells as the basic evaluation units for delineating the town development boundary (UGB). Changsha lies in north-eastern Hunan Province, in the transition zone from the hills of central Hunan to the Dongting lakeside plain. The city administers nine county-level divisions: six districts (Furong, Tianxin, Yuelu, Kaifu, Yuhua and Wangcheng), Changsha County, and the county-level cities of Liuyang and Ningxiang, with a total area of 11,816.14 km². At the end of 2021 the resident population was 10.2393 million and the urbanization rate reached 83.16%. Natural conditions are favorable. The city has a humid subtropical monsoon climate with four distinct seasons and abundant rainfall, with annual precipitation of 1300-1600 mm. Most of its rivers belong to the Xiangjiang River basin, water resources are rich, and total water storage is 3 billion m³. Changsha lies in the Xiangliu Basin and has considerable relief; mountains, hills, uplands and plains account for 30.7%, 19.3%, 28.6% and 21.4% of its area, respectively. Yuelu Mountain screens the city on the west, the Liuyang River winds in from the east, and the Xiang River flows through the city. Orange Isle (Juzizhou) lies quietly at its heart, so that mountains, river, isle and city form an integrated whole. As the provincial capital of Hunan, Changsha's GDP for 2022 was expected to grow by 4.8%, ranking first in central China. It is a central city of central China and of the urban agglomeration in the middle reaches of the Yangtze River, and an important node city of the Yangtze River Economic Belt. The city is now in a period of rapid urbanization and economic growth, and its built-up area expanded from 272.39 km² in 2010 to 560.8 km² in 2020.

Firstly, big data are acquired, specifically:

basic geographic data

(1) Spatial survey data: declassified vector data (base-level summary) from the 2019 Third National Land Survey.

(2) Demographic data: population data for each district, county and city of Changsha, obtained from the Seventh National Population Census.

(3) Industry survey data: secondary industry value-added data for each district, county and city in 2020, compiled from statistical yearbooks.

(4) Geological hazard data: 2018 point location data of geological hazards such as landslides, collapses and debris flows.

(5) Meteorological data: 2020 annual mean PM2.5 and PM10 concentrations monitored at ten stations in Changsha.

(6) Soil data: derived from the China soil dataset (v1.1) of the Harmonized World Soil Database (HWSD). Soil erosion sensitivity (erodibility) and soil organic matter content are estimated from attribute fields such as soil name and soil organic carbon content;

earth observation big data

(1) Digital elevation DEM data, ASTER GDEM V global digital elevation data at 30m resolution.

(2) Nighttime light image data: 2020 NPP-VIIRS nighttime light data at 500 m resolution.

(3) Vegetation coverage data: 2020 Landsat remote sensing data, from which a 30 m normalized difference vegetation index (NDVI) is calculated;

Human behavior big data. The crawled human behavior big data undergo data cleaning, coordinate correction, manual spot checks and similar processing to ensure their reliability and accuracy.

(1) POI/AOI data: crawled from Baidu Maps and Dianping. The data include infrastructure POIs such as expressway interchanges, railway stations, passenger (coach) stations, subway stations and bus stops; POIs of development zones and industrial parks, representing the industrial prospect factor; AOIs of forest parks, geoparks, scenic spots and nature reserves; and commercial service POIs such as catering, life services, leisure and entertainment, sports and fitness, shopping, and beauty.

(2) Weibo check-in data: check-in point data for Changsha in July 2020, obtained with web crawler techniques.

(3) House price data: house rent listings for September 2021 were crawled from an online rental platform and shop rent listings from 58.com (58 Tongcheng). Spatial coordinates were determined from the listing names and similar information, and the data are used to represent the economic conditions within the study area.
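The text does not say what "coordinate correction" involves. One common step for points crawled from Baidu Maps is converting Baidu's BD-09 coordinates to the GCJ-02 system used by most other Chinese web maps. The sketch below uses the widely circulated approximate conversion formulas, on the assumption that this is the correction needed. A further GCJ-02 to WGS-84 step, if required, is not shown. Both function names are hypothetical.

```python
import math

# Constant used by the widely circulated BD-09 offset formulas.
X_PI = math.pi * 3000.0 / 180.0


def bd09_to_gcj02(lon, lat):
    """Approximate BD-09 -> GCJ-02 conversion (decimal degrees)."""
    x, y = lon - 0.0065, lat - 0.006
    z = math.sqrt(x * x + y * y) - 0.00002 * math.sin(y * X_PI)
    theta = math.atan2(y, x) - 0.000003 * math.cos(x * X_PI)
    return z * math.cos(theta), z * math.sin(theta)


def gcj02_to_bd09(lon, lat):
    """Forward GCJ-02 -> BD-09 conversion, useful for round-trip checks."""
    z = math.sqrt(lon * lon + lat * lat) + 0.00002 * math.sin(lat * X_PI)
    theta = math.atan2(lat, lon) + 0.000003 * math.cos(lon * X_PI)
    return z * math.cos(theta) + 0.0065, z * math.sin(theta) + 0.006
```

The two functions are not exact inverses, but a round trip near Changsha (about 112.9° E, 28.2° N) returns to within a few millionths of a degree. That is far below the 300 m evaluation grid.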

The training and validation sample data are derived from UGB reference data (hereinafter UGB_r). The study uses, as UGB reference data, the town development boundary drawn manually in 2021 by the Changsha Municipal Bureau of Natural Resources and Planning. That boundary was based on land demand, policy requirements, planning targets, the current state of ecology and agriculture, natural site conditions and the like. Covering about 1722 km², it serves as the training sample data for determining the weights of the UGB influencing factors. Because 300 m × 300 m grid cells are used as the basic evaluation units, patches in UGB_r with an area of less than 0.09 km² (the area of one cell, 0.3 km × 0.3 km) are removed with the ArcGIS cartographic generalization Eliminate tool, and the result is converted to a raster representation.

After the large volume of collected multi-source geographic big data has been preprocessed (e.g., cleaned), a multi-element natural, ecological and human index system is constructed. Each index factor is spatially quantified using GIS spatial analysis, landscape pattern analysis and other methods. The index scoring results and UGB_r are then used to determine the weight of each index with a random forest model. Finally, multi-factor overlay analysis produces a comprehensive UGB suitability result from which the UGB is delineated. The technical flow is shown in FIG. 1;

Index factors are constructed from the natural, ecological and human dimensions, UGB suitability is analyzed factor by factor, and the scoring standard of each evaluation index is obtained. The final index system, shown in Table 1, builds on existing research and literature, takes the actual conditions of the study area into account, and reflects consultation with relevant experts.
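The patent removes small polygons with the ArcGIS Eliminate tool before rasterizing. A raster analogue is sketched below in Python, assuming UGB_r has first been rasterized at a finer resolution (for example 30 m) so that 0.09 km² spans many cells. Connected patches smaller than the area limit are dropped. The function name, the 30 m example resolution and the 8-connectivity rule are assumptions, not the patent's procedure.

```python
import numpy as np
from scipy import ndimage


def remove_small_patches(mask, cell_size_m, min_area_km2=0.09):
    """Drop 8-connected patches of True cells with area below min_area_km2."""
    cell_area_km2 = (cell_size_m / 1000.0) ** 2
    # Smallest patch size (in cells) that is kept; the tolerance absorbs
    # floating-point error in the area ratio.
    min_cells = int(np.ceil(min_area_km2 / cell_area_km2 - 1e-9))
    labels, n = ndimage.label(mask, structure=np.ones((3, 3), dtype=int))
    if n == 0:
        return np.zeros_like(mask, dtype=bool)
    sizes = np.bincount(labels.ravel())  # sizes[0] counts background cells
    keep = sizes >= min_cells
    keep[0] = False                      # background never becomes boundary
    return keep[labels]
```

At 30 m resolution the limit is 100 cells, so a 10 x 10 patch survives and a 5 x 5 patch is removed. At the 300 m evaluation resolution the limit is a single cell, which is why the filtering is best done before resampling to the evaluation grid.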

(1) The natural dimension comprises 4 factors (terrain, water area, landscape pattern and geological hazard) and 7 indexes such as slope and elevation. The landscape pattern index draws on landscape metrics such as patch area (AREA), radius of gyration (GYRATE) and shape index (SHAPE), with the landscape pattern of the current construction land calculated on the Fragstats platform;

(2) The human dimension comprises 7 factors (traffic conditions, location, public/commercial facilities, population, economy (land price), industrial prospects and land use) and 15 indexes such as distance to the nearest airport, subway/bus stop density, shop rent and economic vitality (nighttime light index). Weibo check-in data indicate the intensity of human activity, nighttime light intensity indicates economic vitality, and house and shop rents indicate economic capacity;

(3) The ecological dimension comprises 5 factors (ecological land condition, cultivated land condition, soil, vegetation abundance and atmospheric environment) and 10 indexes such as cultivated land proportion, soil organic matter content, NDVI and PM2.5 concentration. Air quality grid data covering the study area are obtained from the air quality monitoring station data by Kriging interpolation.
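The patent does not specify the kriging variant or variogram used to grid the ten-station PM data. As an illustration only, the Python sketch below implements ordinary kriging with a spherical semivariogram. It solves the usual system in which the weights sum to one via a Lagrange multiplier. The nugget, partial sill and range are placeholders that would normally be fitted to the empirical variogram of the station data, and station coordinates are assumed to be in projected metres. Both function names are hypothetical.

```python
import numpy as np


def spherical(h, nugget, psill, range_m):
    """Spherical semivariogram; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / range_m, 1.0)  # flat (sill) beyond the range
    g = nugget + psill * (1.5 * r - 0.5 * r ** 3)
    return np.where(h == 0, 0.0, g)


def ordinary_kriging(xy, z, targets, nugget=0.0, psill=1.0, range_m=20000.0):
    """Ordinary kriging predictions at target points (projected metres)."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    targets = np.atleast_2d(np.asarray(targets, dtype=float))
    n = z.size
    # Kriging system: [Gamma 1; 1^T 0] [lambda; mu] = [gamma_0; 1]
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = spherical(d, nugget, psill, range_m)
    a[n, n] = 0.0
    d0 = np.linalg.norm(targets[:, None, :] - xy[None, :, :], axis=-1)
    b = np.ones((n + 1, targets.shape[0]))
    b[:n, :] = spherical(d0, nugget, psill, range_m).T
    weights = np.linalg.solve(a, b)[:n, :]  # drop the Lagrange multiplier row
    return weights.T @ z
```

With a zero nugget the predictor honours the data exactly at each station. At a point equidistant from four stations on a square, symmetry gives equal weights, so the prediction is their mean. For a full raster, the grid-cell centres would be passed as `targets`, in chunks if memory is tight.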

The study area is divided into 133,533 grid cells of 300 m × 300 m for evaluation. Each index factor is spatially analyzed on the ArcGIS platform and divided into four score grades, assigned 1, 2, 3 and 4 in order of increasing UGB suitability. The grades are set with the natural breaks method combined with actual conditions, giving the grade of each index in every grid cell (Table 1) for the weighted overlay analysis performed once the index weights are determined. Town development boundary suitability scores for selected indexes are shown in FIG. 2.
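The natural breaks grading can be sketched in Python as an exact Fisher-Jenks optimisation. Dynamic programming over the sorted values finds the class breaks that minimise the total within-class sum of squared deviations, and each value is then mapped to a 1-4 score. This is an illustrative sketch, not the ArcGIS implementation. The function names and the `higher_is_better` flag (for indexes such as slope, where larger values mean lower suitability) are assumptions. The patent also adjusts the breaks "combined with practical conditions", which is not modelled here. The exact algorithm costs O(k n²), so for 133,533 cells it would be run on a sample of values.

```python
import numpy as np


def jenks_breaks(values, n_classes):
    """Exact Fisher-Jenks natural breaks: the upper bound of each class."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    # Prefix sums give the within-class SSD of x[i:j] in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    cost = np.full((n_classes + 1, n + 1), np.inf)
    start = np.zeros((n_classes + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):        # first j values split into k classes
            for i in range(k - 1, j):    # the last class is x[i:j]
                c = cost[k - 1, i] + ssd(i, j)
                if c < cost[k, j]:
                    cost[k, j], start[k, j] = c, i
    # Walk back through the table to recover where each class ends.
    ends, j = [], n
    for k in range(n_classes, 0, -1):
        ends.append(j)
        j = start[k, j]
    return [float(x[e - 1]) for e in reversed(ends)]


def score_index(values, n_classes=4, higher_is_better=True):
    """Map raw index values to 1..n_classes suitability scores."""
    v = np.asarray(values, dtype=float)
    breaks = jenks_breaks(v, n_classes)
    # Values up to and including breaks[0] get class 1, and so on.
    cls = np.searchsorted(breaks[:-1], v, side="left") + 1
    return cls if higher_is_better else n_classes + 1 - cls
```

For four well-separated clusters of values (1, 5, 9 and 20), the breaks fall at 1, 5, 9 and 20, and the clusters receive scores 1 to 4, or 4 to 1 when lower values are more suitable.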

To construct the random forest model that determines the index weights, the model is first trained with UGB_r. Areas with the UGB attribute in the training data are assigned a land state value of 1 and all other areas 0. Sample points are drawn separately from the UGB and from the other areas of the training data by stratified random sampling, and their spatial coordinates are obtained. The ArcGIS Sample function then reads the land state value and the spatial variable values (i.e., the scores of all indexes) at the sample points to obtain the original training set X. The grid data of the evaluation grading results of the 31 indexes in the index system are converted into a unified tif format and input into the random forest model to obtain the spatial variable data set. Random forest is an ensemble algorithm that uses decision trees as base classifiers; a decision tree is a tree-structured prediction model composed of nodes and directed edges. At each internal node, mtry predictor variables are randomly drawn from the original variables as split candidates, and the node is split in the best way among them. This is repeated until ntree decision trees have been generated.
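The sampling step can be sketched with NumPy in place of the ArcGIS Sample tool. The sketch assumes the land state raster and all index score rasters are already aligned on the same 300 m grid with no nodata cells. It draws the same fraction of cells from each class, then reads the label and every index score at the sampled cells. The function name and array layout are assumptions.

```python
import numpy as np


def stratified_sample(labels, score_stack, fraction=0.10, seed=0):
    """Stratified random sample of grid cells for training.

    labels: (rows, cols) land state raster, 1 = inside UGB_r, 0 = other.
    score_stack: (n_indexes, rows, cols) aligned index score rasters.
    Returns X (n_samples, n_indexes), y (n_samples,), and (row, col) pairs.
    """
    rng = np.random.default_rng(seed)
    flat_labels = np.asarray(labels).ravel()
    flat_scores = score_stack.reshape(score_stack.shape[0], -1).T  # cells x indexes
    picked = []
    for cls in (0, 1):  # sample each stratum separately
        idx = np.flatnonzero(flat_labels == cls)
        n = int(round(fraction * idx.size))
        picked.append(rng.choice(idx, size=n, replace=False))
    picked = np.concatenate(picked)
    rows, cols = np.unravel_index(picked, np.shape(labels))
    return flat_scores[picked], flat_labels[picked], np.column_stack([rows, cols])
```

Sampling each stratum separately keeps the roughly 1:5 ratio of boundary to non-boundary cells reported later (about 2000 and 10000 points).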

The model is implemented in Python with the open-source machine learning toolkit scikit-learn, and the random forest is generated by training on the original training set X. Two parameters, ntree and mtry, must be set during training. mtry is the number of predictor variables randomly drawn from the original variables at each node, among which the variable with the greatest discriminating power is selected for the split. Because the random forest combines the predictions of many decision trees and obtains the final prediction by voting, both mtry and ntree affect prediction accuracy. The out-of-bag (OOB) samples of a random forest can serve as an internal test set, so no cross-validation or independent test set is needed, and the OOB error rate is an unbiased estimate of the forest's generalization error. This study therefore uses the OOB error rate to estimate model performance as the measure of classification accuracy. Suitable parameters are found by iterative tuning to improve prediction accuracy; taking into account the available computing resources and model run time, ntree is set to 100 and mtry to 16.

After the evaluation index system and the 1-4 suitability grading result of each index are obtained, multi-factor overlay analysis is performed with the index weights determined by the random forest model. Using an ArcGIS tool, the suitability scores of all indexes in each grid cell are weighted by the index weights and summed to obtain the final UGB suitability score of each cell in the study area, which comprehensively evaluates how suitable each cell is for inclusion in the UGB.
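In scikit-learn's `RandomForestClassifier`, ntree corresponds to `n_estimators` and mtry to an integer `max_features`. Setting `oob_score=True` exposes the out-of-bag accuracy as `oob_score_`. The patent does not state which importance measure yields its index weights. The minimal sketch below assumes the impurity-based `feature_importances_`, which sum to 1; permutation importance would be an alternative. The wrapper function name is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def fit_index_weights(X, y, index_names, ntree=100, mtry=16, seed=0):
    """Train a random forest and return it, per-index weights and OOB error.

    ntree -> n_estimators, mtry -> max_features (candidate variables per
    split). Using impurity-based importances as weights is an assumption.
    """
    rf = RandomForestClassifier(
        n_estimators=ntree,
        max_features=mtry,
        oob_score=True,   # score each tree on the samples it did not see
        random_state=seed,
        n_jobs=-1,
    )
    rf.fit(X, y)
    weights = dict(zip(index_names, rf.feature_importances_))
    oob_error = 1.0 - rf.oob_score_  # oob_score_ is the OOB accuracy
    return rf, weights, oob_error
```

Multiplying the returned weights by 100 and summing them by dimension would give totals comparable to the natural, human and ecological percentages reported below. `max_features` must not exceed the number of index columns (31 or 32 here).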
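The weighted overlay that ArcGIS performs here amounts to a per-cell weighted sum. A NumPy sketch follows, assuming all index score rasters are aligned and free of nodata. The weights are rescaled to sum to 1 so the result stays on the 1-4 score scale, consistent with the 1.63-2.68 grade thresholds reported later. The function name is hypothetical.

```python
import numpy as np


def weighted_overlay(score_stack, weights):
    """Weighted sum of per-index 1-4 score rasters.

    score_stack: (n_indexes, rows, cols) aligned score rasters.
    weights: one non-negative weight per index (e.g. RF importances).
    """
    stack = np.asarray(score_stack, dtype=float)
    w = np.asarray(weights, dtype=float)
    if stack.shape[0] != w.size:
        raise ValueError("one weight is needed per index layer")
    w = w / w.sum()                        # keep the result on the 1-4 scale
    return np.tensordot(w, stack, axes=1)  # sum_k w[k] * stack[k]
```

For two layers scored 1 and 4 with weights 3 and 1, every cell receives 0.75 × 1 + 0.25 × 4 = 1.75.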

User's accuracy, producer's accuracy, overall accuracy and the Kappa coefficient (K) are used to assess the delineation result and to verify the accuracy and reliability of the UGB delineation. Based on the statistical concepts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), user's accuracy (UA), producer's accuracy (PA) and overall accuracy (OA) are calculated as follows:

$UA = \frac{TP}{TP + FP}$, $PA = \frac{TP}{TP + FN}$, $OA = \frac{TP + TN}{TP + TN + FP + FN}$

where TP is the number of cells that belong to the UGB in UGB_r and are classified into the UGB by the model; TN is the number of cells that belong to other areas in UGB_r and are classified as other areas by the model; FP is the number of cells that belong to other areas in UGB_r but are classified into the UGB by the model; and FN is the number of cells that belong to the UGB in UGB_r but are classified as other areas by the model.

Different suitability scores are taken as candidate thresholds for delineating the UGB, and the overall accuracy and Kappa coefficient of the UGB delineated under each threshold are compared. The score with the highest accuracy is selected as the UGB delineation threshold for the study. Grid cells whose suitability score is greater than the threshold are delineated as UGB and those with a lower score are not, yielding the final UGB delineation result of the experiment; user's and producer's accuracy are also calculated as auxiliary verification.

About 10% of the cells are drawn by stratified random sampling, namely 2000 development boundary sample points and 10000 non-boundary sample points, to form the training set, and the spatial variable data set is obtained from the evaluated data of each index. The index weights are then measured with the trained random forest model, with the results shown in FIG. 3. The natural dimension accounts for a total weight of 9.13%, the human dimension 68.03% and the ecological dimension 22.84%.

The analysis shows that human factors play a larger role in UGB delineation than natural and ecological factors and carry a higher share of the weight. Among the 32 indexes determined in the experiments, urban land density and economic vitality have the highest weights, each exceeding 10%. The indexes for distance to expressway interchanges/passenger stations, urban and rural road land density, distance to the city/county center, commercial service facility density, resident population density, house rent, shop rent, soil organic matter content and PM10 concentration each exceed 3%, indicating that these factors strongly influence urban construction. At the level of secondary indexes, economic (land price) level, construction land condition, traffic, location, public/commercial facilities, population distribution and soil condition are important for urban development. The index weights thus show that geographic big data make a meaningful contribution to UGB delineation. The level of economic development is the engine of urban construction, so nighttime light data and house/land price data, which represent economic activity, have a marked advantage in estimating the extent of the UGB. People are the root of urban development, and Weibo check-in data, which capture the distribution of population activity, give direction to UGB delineation. In addition, the better developed public facilities such as transport and commercial services are, the more likely an area is to be converted into urban construction land, so POI data on infrastructure and the like play a key role in UGB delineation.

When grading the final UGB suitability scores, both the number of classes and the class thresholds affect the accuracy of the town development boundary. Classification uses the natural breaks method. This statistical method places the breaks so that within-class variance is minimized and between-class variance is maximized. The number of classes was set to 4, 5 and 6 to find the optimum (Fig. 4a-c), and in each case the highest class was taken as the town development boundary. With 4 classes, the producer's accuracy (PA) is 85.16% and the user's accuracy (UA) is 70.08%. With 5 classes they are 92.07% and 93.32%, and with 6 classes they are 78.2% and 82.8%. Based on these accuracies, the suitability score is divided into 5 classes.
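The natural breaks idea can be shown with a small exact implementation. It uses Fisher-Jenks dynamic programming, which minimizes the total within-class sum of squared deviations. This is a sketch for illustration, not the tool the study used. Its cost is O(k·n²), so a full-resolution grid would need sampling or an optimized library such as mapclassify's `FisherJenks`. The function name is hypothetical.

```python
def jenks_breaks(values, n_classes):
    """Return the upper bound of each class (Fisher-Jenks optimal 1-D split)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= len(values)")

    # Prefix sums give the within-class squared deviation of data[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: smallest total deviation when data[:j] forms k classes
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last class is data[i:j]
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i

    # Walk back through the stored split points to recover class bounds.
    bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return sorted(bounds)


print(jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))  # [3, 12, 22]
```

In the study, the breaks found this way were only a starting point. The 4-, 5- and 6-class results were compared by PA and UA before settling on 5 classes.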

With 5 classes fixed, the thresholds between classes were examined further. Starting from the natural-breaks values, the thresholds were adjusted against the UGB reference data to improve delineation accuracy. The adjustment balanced the extent of the delineated boundary against the number of grid cells in each class. The final thresholds, from the highest class to the lowest, are 2.68, 2.17, 1.76 and 1.63. After classification, the classes from most suitable to most unsuitable contain 5.19%, 7.48%, 44.81%, 25.71% and 16.80% of the grid cells, respectively. The resulting producer's accuracy is 93.89% and the user's accuracy is 95.03%. Both are higher than the accuracies obtained directly with the natural breaks method, so the adjustment improves the accuracy of the town development boundary.
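The sketch below applies the four adjusted thresholds from the text to assign grades and report the share of cells per grade. The text does not state which class a score exactly equal to a threshold falls into. This sketch assumes it goes to the higher class. The function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Thresholds from the text, listed from low to high.
# Assumption: a score equal to a threshold is placed in the higher grade.
THRESHOLDS = [1.63, 1.76, 2.17, 2.68]


def grade(score):
    """Return grade 1 (most suitable) to 5 (most unsuitable)."""
    return 5 - bisect_right(THRESHOLDS, score)


def grade_shares(scores):
    """Percentage of grid cells falling in each grade."""
    counts = Counter(grade(s) for s in scores)
    n = len(scores)
    return {g: 100.0 * counts.get(g, 0) / n for g in range(1, 6)}


print(grade_shares([3.0, 2.5, 2.0, 2.0]))
# {1: 25.0, 2: 25.0, 3: 50.0, 4: 0.0, 5: 0.0}
```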

The constructed index system and index weights were used to evaluate the overall suitability of the study area, giving each grid cell a UGB suitability score. The cells were divided into five classes with the natural breaks method and mapped as a graded suitability score map. This map displays the evaluation results more intuitively and shows the UGB suitability of Changsha City more clearly. Each candidate suitability score was then tested as the threshold for delineating the UGB. The overall accuracy (OA) and Kappa coefficient (K) of the results were compared, and the score giving the highest delineation accuracy was adopted as the threshold in this experiment. The threshold lies roughly within the range of the most suitable and more suitable scores, so scores between 2 and 3 were examined in finer detail. Comparing the accuracy at 2, 2.5 and 3 narrowed the range to 2-2.5. A final statistical analysis at 0.01 intervals between 2.1 and 2.2 showed that OA reaches its maximum of 93.71% and K its maximum of 0.7575 at a threshold of 2.16. Taking all the statistical results together, at a suitability score of 2.15 the OA and K values are closest to these maxima (OA 93.69%, K 0.7573). A suitability score of 2.15 is therefore taken as the threshold for delineating the UGB. The UGB delineated on this basis (Fig. 5) covers a total area of about 1436 km². For convenience, the UGB delineated in this study from multi-source geographic big data is called the "UGB with geographic big data", hereinafter UGB_g.
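The fine-step threshold search can be sketched as below. For each candidate threshold, it builds a 2×2 confusion matrix against the reference labels and computes OA and Cohen's kappa. The assumption that a cell is inside the UGB when its score is at least the threshold is ours, as are the toy data and function names. The sketch ranks candidates by K and then by OA. The study's final choice of 2.15 over 2.16 was a judgement on the combined statistics that this code does not reproduce.

```python
def confusion(scores, reference, threshold):
    """Count TP, TN, FP, FN for 'score >= threshold' against reference labels."""
    tp = tn = fp = fn = 0
    for score, inside in zip(scores, reference):
        predicted = score >= threshold
        if predicted and inside:
            tp += 1
        elif predicted:
            fp += 1
        elif inside:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def oa_and_kappa(tp, tn, fp, fn):
    """Overall accuracy and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + tn + fp + fn
    po = (tp + tn) / n
    # chance agreement from the row and column marginals
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (po - pe) / (1 - pe) if pe < 1 else 0.0
    return po, kappa


def sweep(scores, reference, thresholds):
    """Return (threshold, OA, K) for each candidate threshold."""
    return [(t, *oa_and_kappa(*confusion(scores, reference, t)))
            for t in thresholds]


# Fine step: 2.10 to 2.20 in 0.01 increments (integer steps avoid float drift).
candidates = [round(2.10 + 0.01 * i, 2) for i in range(11)]

# Toy data, not the study's grid: suitability scores and reference labels.
scores = [2.5, 2.3, 2.14, 2.12, 1.9, 1.5]
reference = [1, 1, 1, 0, 0, 0]
table = sweep(scores, reference, candidates)
best = max(table, key=lambda row: (row[2], row[1]))  # highest K, then OA
print(best)  # (2.13, 1.0, 1.0)
```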
As an auxiliary accuracy check, the producer's and user's accuracies were also calculated. PA is 90.07% and UA is 92.59%, so both commission and omission errors are small. More than 93% of the UGB is correctly identified, which indicates that the UGB delineation index system is reasonable and the model is accurate.

Comparing the experimental result UGB_g with the reference data UGB_r (Fig. 6) shows that their overall shapes are consistent. The eastern part of Liuyang City belongs to the Daguanshan-Jinling mountain region, the central part of Liuyang City to the cloud mountain region, and the western part of Ningxiang City to a mountainous region. All of these areas lie in ecological space unsuitable for urban development, so UGB_g is concentrated mainly in the six districts and one county of Changsha's urban area. The boundary delineated in this study is smaller than UGB_r. This is mainly because parts of the eastern, south-central and central foothill areas of the urban district are not included in the development boundary. In addition, small, scattered areas in Ningxiang and Liuyang cities are not captured by the experimental demarcation.

Table 1 Town development boundary delineation indexes and their scoring criteria

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A town development boundary demarcation method integrating geographic big data and machine learning is characterized in that: the method comprises the following steps:

S3: constructing a random forest model to determine the weight of each index;

2. The town development boundary delineating method integrating geographic big data and machine learning as claimed in claim 1, wherein: the natural factors comprise four factors including topography, water area, landscape pattern and geological disasters;

the topography comprises two indexes: slope and elevation;

the water area comprises three indexes: distance to rivers, water-area density within the grid cell, and distance to lakes and reservoirs;

the landscape pattern comprises one index: the composite landscape index;

the geological disasters comprise one index: distance to geological disaster points;

and, for the landscape pattern index, patch area, radius of gyration and shape index are selected, and the landscape pattern of the current construction land is calculated on the Fragstats platform.
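This passage names three patch-level metrics computed in Fragstats. It does not say how they are combined into the composite landscape index, so the sketch below computes only the per-patch metrics on a hypothetical binary construction-land raster (1 = construction land). It assumes 8-connected patches. The shape index uses the square-cell form 0.25·p/√a. FRAGSTATS may instead normalize perimeter by the minimum possible raster perimeter, and the two agree when the cell count is a perfect square. The radius of gyration is the mean distance of the patch's cells from the patch centroid. The function names are hypothetical.

```python
import math


def label_patches(grid):
    """Group 8-connected cells equal to 1 into patches (lists of (row, col))."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    patches = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            stack, cells = [(r, c)], []
            while stack:  # depth-first flood fill
                y, x = stack.pop()
                cells.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            patches.append(cells)
    return patches


def patch_metrics(cells, cell_size=1.0):
    """Return (area, shape index, radius of gyration) for one patch."""
    members = set(cells)
    # Perimeter in cell edges: edges shared with a cell outside the patch.
    edges = sum(1 for y, x in cells
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if (y + dy, x + dx) not in members)
    n = len(cells)
    area = n * cell_size ** 2
    shape = 0.25 * edges / math.sqrt(n)  # equals 1.0 for a square patch
    cy = sum(y for y, _ in cells) / n
    cx = sum(x for _, x in cells) / n
    gyrate = cell_size * sum(math.hypot(y - cy, x - cx) for y, x in cells) / n
    return area, shape, gyrate


grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(patch_metrics(label_patches(grid)[0]))  # area 4.0, shape 1.0, gyrate ~0.707
```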

3. The town development boundary delineating method integrating geographic big data and machine learning as claimed in claim 1, wherein: the ecological elements comprise five factors: ecological land conditions, cultivated land conditions, soil, vegetation abundance and atmospheric environment;

the ecological land conditions comprise three indexes: distance to forest parks, scenic spots and nature reserves; distance to ecological land; and ecological land coverage rate, where ecological land comprises forest, garden land, grassland and wetland;
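The claim does not spell out how the ecological land coverage rate is computed. A plausible minimal reading is the share of a grid cell's area occupied by the four ecological land types, sketched below under that assumption. The land-use class names and the input format are hypothetical.

```python
# Hypothetical land-use class names standing in for forest, garden land,
# grassland and wetland.
ECOLOGICAL_CLASSES = {"forest", "garden", "grass", "wetland"}


def ecological_coverage(parcels):
    """Percentage of a grid cell's area covered by ecological land.

    `parcels` is a list of (land_use_class, area) pairs inside one grid cell.
    """
    total = sum(area for _, area in parcels)
    if total <= 0:
        raise ValueError("grid cell has no land-use area")
    eco = sum(area for cls, area in parcels if cls in ECOLOGICAL_CLASSES)
    return 100.0 * eco / total


cell = [("forest", 30.0), ("cropland", 50.0), ("wetland", 20.0)]
print(ecological_coverage(cell))  # 50.0
```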

4. The town development boundary delineating method integrating geographic big data and machine learning as claimed in claim 1, wherein: the human factors include seven factors of traffic conditions, location, public/commercial facilities, population, economy, industry prospects and land use conditions;

the location comprises one index: distance to the city or county center;

the economy comprises four indexes: secondary-industry value added, house rent, shop rent and economic activity;

the land use condition comprises one index: the proportion of urban land.

5. The town development boundary delineating method integrating geographic big data and machine learning as claimed in any one of claims 1 to 4, wherein step S3 specifically comprises the following steps:

reading, with the Sample tool of ArcGIS, the state value of each sample point (development boundary or non-boundary) and its corresponding spatial variable values, i.e., the scores of all indexes, to obtain an original training set X;
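The ArcGIS Sample tool writes the raster values found at each point location to a table. The sketch below mimics that step on in-memory grids so that the structure of the training set X is visible. It assumes every index raster shares one grid and that sample points are given as (row, col, label) with label 1 for the development boundary. It also assumes a nodata marker whose points are dropped. The function and variable names are hypothetical.

```python
def build_training_set(index_rasters, sample_points, nodata=None):
    """Read index scores at sample points, like a point-sampling step.

    index_rasters: {index_name: 2-D list of scores on a shared grid}
    sample_points: [(row, col, label), ...], label 1 = boundary, 0 = not
    Points where any index equals `nodata` are skipped.
    """
    names = sorted(index_rasters)  # fixed column order for X
    X, y = [], []
    for row, col, label in sample_points:
        values = [index_rasters[name][row][col] for name in names]
        if any(v == nodata for v in values):
            continue  # drop incomplete samples
        X.append(values)
        y.append(label)
    return names, X, y


rasters = {"slope": [[1, 2], [3, 4]], "elevation": [[5, 6], [7, None]]}
names, X, y = build_training_set(rasters, [(0, 1, 1), (1, 1, 0)])
print(names, X, y)  # ['elevation', 'slope'] [[6, 2]] [1]
```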

6. The town development boundary delineating method integrating geographic big data and machine learning as claimed in claim 1, wherein: the random forest model is implemented in Python with the open-source machine learning toolkit Scikit-learn, and the original training set X is input to train and generate the random forest model.

7. The town development boundary delineating method integrating geographic big data and machine learning as claimed in claim 1, wherein: in step S5, the accuracy of the demarcation result is evaluated with the user's accuracy, the producer's accuracy, the overall accuracy and the Kappa coefficient (K), to verify the accuracy and reliability of the town development boundary demarcation.

8. The town development boundary delineating method integrating geographic big data and machine learning as claimed in claim 7, wherein: the user's accuracy UA, the producer's accuracy PA and the overall accuracy OA are calculated from the statistical concepts of true positives TP, true negatives TN, false positives FP and false negatives FN, using the following formulas:
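The formulas themselves are missing from this text. The block below gives the standard confusion-matrix definitions, with the UGB as the positive class. This is an assumption about what the claim intends, not its own wording. Cohen's kappa from claim 7 is included in its usual form, where $p_e$ is the chance agreement computed from the marginals.

```latex
\mathrm{UA} = \frac{TP}{TP + FP}, \qquad
\mathrm{PA} = \frac{TP}{TP + FN}, \qquad
\mathrm{OA} = \frac{TP + TN}{N}, \qquad N = TP + TN + FP + FN

K = \frac{\mathrm{OA} - p_e}{1 - p_e}, \qquad
p_e = \frac{(TP + FP)(TP + FN) + (FN + TN)(FP + TN)}{N^{2}}
```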

9. The town development boundary delineating method integrating geographic big data and machine learning as claimed in claim 1, wherein: the geographic big data include basic geographic data and Earth observation big data, wherein,

10. The town development boundary delineating method integrating geographic big data and machine learning as claimed in claim 1, wherein: the big data also includes human behavioral big data including POI/AOI data, microblog check-in data, and house price data.