CN113191553A - Population space distribution estimation method and system based on building scale - Google Patents
- Publication number: CN113191553A (application CN202110491470.2A)
- Authority: CN (China)
- Prior art keywords: building, data, scale, population, density
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/08—Construction
Abstract
The invention provides a method and a system for estimating the spatial distribution of population at the building scale, in the technical field of population spatial distribution. The method comprises the following steps:
- obtaining classification data of a building, the classification data describing the category of the building;
- calculating the total area of the building from the classification data;
- constructing a random forest algorithm model and estimating the population of each building according to its total area.

By constructing a random forest model on the classification data of buildings, the invention estimates population at the building scale. It can be applied to social resource analysis, emergency evacuation, business decision-making, urban planning and other fields, and has considerable social value.
Description
Technical Field
The invention relates to the technical field of population spatial distribution, in particular to a method and system for estimating population spatial distribution at the building scale.
Background
Population distribution data reflect the natural conditions and the level of economic development of a region. Fine-scale population spatial distribution data can be applied in many fields, such as disaster management, resource allocation and smart city construction.

Traditional demographic data are usually obtained by census. Although very accurate, a census has significant drawbacks. First, it is costly and time-consuming. Second, census results are usually reported per administrative district. This gives low spatial resolution and cannot represent demographic differences within a district.

To integrate population data better with other spatial data and truly reflect the spatial distribution of the population, a highly automated method is needed to estimate population distribution at a fine scale.
Disclosure of Invention
The invention provides a method and a system for estimating population spatial distribution at the building scale. They solve two problems of prior-art population estimation: it is costly and time-consuming, and it cannot resolve population differences within an administrative district. The invention thereby achieves fine-grained estimation of the population spatial distribution.
The building-scale-based population spatial distribution estimation method provided by the invention comprises the following steps:
obtaining classification data of a building, the classification data describing a category of the building;
calculating a total area of the building from the classification data;
and constructing a random forest algorithm model, and estimating the population number on the building scale according to the total area of the building.
According to the building scale-based population space distribution estimation method, the acquiring classification data of the building comprises the following steps:
superposing the acquired city planning data and building vector data to obtain building data, wherein the building data comprises city functional area data;
classifying the building data to obtain urban residence data and rural residence data;
and classifying the urban residential data based on the urban functional area data to obtain single residential building data, common residential building data and dense residential building data.
According to the building scale-based population space distribution estimation method, the urban functional area data comprise one or more combinations of urban planning data, building vector data, university campus area data and urban functional area product data.
According to the building scale-based population space distribution estimation method, the obtaining of classification data of the building comprises the following steps:
obtaining remote sensing data corresponding to a building through a satellite;
and obtaining classification data of the building based on the remote sensing data and according to the classification characteristics of the building.
According to the building scale-based population space distribution estimation method, the calculating the total area of the building according to the classification data comprises the following steps:
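calculating the POI density on the grid of the classification data, wherein the calculation formula is as follows:
$$P_i = \frac{\sum_{j=1}^{n} p_j c_j}{\sum_{j=1}^{n} c_j}$$
where $P_i$ is the POI density of the $i$-th cell, $n$ is the number of grid cells intersecting the $i$-th cell, $p_j$ is the POI density of the $j$-th grid cell intersecting cell $i$, and $c_j$ is the area of the intersection between cell $i$ and the $j$-th grid cell;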
calculating a vegetation coverage area ratio, wherein the calculation formula is as follows:
$$G_i = \frac{a_i}{b_i}$$
wherein $G_i$ represents the vegetation coverage area ratio of the $i$-th cell, $a_i$ represents the vegetation area of the $i$-th cell, and $b_i$ represents the total area of the $i$-th cell;
calculating the total area of the building, wherein the calculation formula is as follows:
$$A_i = a_i f_i$$
wherein $A_i$ is the total area of the building, $a_i$ is the floor area (footprint) of the building, and $f_i$ is the number of floors of the building.
According to the building-scale-based population spatial distribution estimation method, calculating the POI density on the grid of the classification data comprises:
calculating the density of each point-of-interest (POI) category using kernel density analysis on a grid with a resolution of 30 m, the POI density of each cell being the average POI density of the grid cells it intersects.
According to the building scale-based population space distribution estimation method, the calculating the total area of the building according to the classification data further comprises:
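and calculating a Pearson correlation coefficient between the POI density and the population density, wherein the calculation formula is as follows:
$$\rho_{X,Y} = \frac{\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{N}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{N}(Y_i-\bar{Y})^2}}$$
wherein $X_i$ represents the POI density of the $i$-th cell, $Y_i$ represents the population density of the $i$-th cell, $\bar{X}$ and $\bar{Y}$ are their means, and $N$ is the number of cells;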
calculating a vegetation coverage area ratio based on the Pearson correlation coefficient;
wherein the Pearson correlation coefficient represents the degree of correlation between the POI density and the population density.
The invention also provides a system for estimating the population space distribution based on the building scale, which comprises the following components:
the classification module is used for acquiring classification data of the building, and the classification data is used for describing the category of the building;
a calculation module for calculating the total area of the building according to the classification data;
and the estimation module is used for constructing a random forest algorithm model and estimating the population number on the building scale according to the total area of the building.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the steps of the building-scale-based population spatial distribution estimation method described above.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the building-scale based population space distribution estimation method as described in any of the above.
With the method and system for estimating population spatial distribution at the building scale, a random forest model constructed on the classification data of buildings estimates the population of each building. The method and system can be applied to social resource analysis, emergency evacuation, business decision-making, urban planning and other fields, and have considerable social value.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for estimating a spatial distribution of a population based on a building scale according to the present invention;
FIG. 2 is a schematic diagram of a process for obtaining classification data according to the present invention;
FIG. 3 is a schematic flow chart of the present invention for calculating the total area of a building;
FIG. 4 is a schematic diagram of classification data of a building provided by an embodiment of the present invention;
FIG. 5 is a diagram illustrating population extraction results provided by an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a building-scale-based population space distribution estimation system provided by the present invention;
FIG. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," and the like in the description and in the claims, and in the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein.
Urbanization in China is developing rapidly, drawing resources and production factors into cities. Urbanization and human socioeconomic activities alter the geographic landscape of cities. They create different types of urban functional areas, such as commercial, residential, industrial, green space and water areas. Urban functional area data are important for:
- analyzing urban spatial patterns;
- revealing the urbanization process;
- evaluating the urban ecological environment;
- supporting urban land planning and sustainable development.

Existing urban land cover/land use data cannot fully reflect urban landscape functions. Their spatial resolution is low and they cannot provide local detail, which limits fine-grained urban research. A highly automated method is therefore needed to estimate population distribution at a fine scale, so that population data integrate well with other spatial data and truly reflect the spatial distribution of the population.
The building-scale-based population space distribution estimation method and system of the present invention are described below with reference to fig. 1-7.
Fig. 1 is a schematic flow chart of the building-scale-based population spatial distribution estimation method provided by the invention. As shown in Fig. 1, the method comprises:
Step 101, obtaining classification data of a building, the classification data describing the category of the building.
Methods commonly used in population estimation studies include dasymetric mapping, multivariate regression and multi-factor fusion (Batista, Gallego and Lavalle, 2013; Steven et al., 2015; Zeng et al., 2011). These studies generally consider the effects of natural environmental and socioeconomic factors (Yang et al., 2019). They estimate population either at the grid scale (Balakrinhan, 2019) or at the building scale (Han et al., 2019).
Estimating population distribution at the grid scale requires handling the scale effect: different research goals and regions call for different grid sizes, and the grid itself cuts across real geographic boundaries. One solution is to estimate population at the building scale. However, most current building-scale methods do not account for the spatial heterogeneity caused by different building types.
Building categories vary across space, and different types of buildings have different population densities. For example, rural residential buildings typically have a lower population density than most urban residential buildings. The invention therefore estimates population at the building scale while accounting for the spatial heterogeneity introduced by building category.
Step 102, calculating the total area of the building according to the classification data.
The influence of natural environmental and socioeconomic factors on the spatial distribution of the population is also considered. Natural environmental factors mainly include landform and vegetation, and the total area of the building can be calculated with reference to such factors. Socioeconomic factors mainly include transportation, infrastructure and service industries.
Preferably, point of interest (POI) data and vegetation coverage vector data are used to describe the socio-geographic environment of residential buildings. The extracted socio-geographic features include two kinds of quantity:
- the correlation coefficients (such as Pearson correlation coefficients) between population density and the densities of different POI categories;
- the vegetation coverage area ratio.

A POI is a geographically identified spatial feature carrying information such as name, category, longitude and latitude. POIs directly reflect the spatial distribution of urban elements. They are closely related to human activities, and therefore to population distribution.
Step 103, constructing a random forest algorithm model, and estimating the population on the building scale according to the total area of the building.
The random forest model is a machine learning model consisting of a forest of decision trees built in a randomized way. Each decision tree has a tree structure, binary or non-binary, and is trained independently of the other trees. When a new sample is input, each tree makes its own prediction. For classification, the class receiving the most votes is output. For regression, such as predicting population counts, the predictions of the trees are averaged.
The invention trains the constructed random forest model at a preset areal scale, such as streets, and uses it to estimate population at the building scale.
Steps 101 to 103 are described in detail below.
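The two aggregation rules can be illustrated with a minimal Python sketch. The per-tree outputs are hypothetical numbers, not results from the invention, and `aggregate_votes` / `aggregate_regression` are illustrative helper names. Majority voting applies to classification. Averaging is the rule that applies when the forest predicts population counts.

```python
from collections import Counter
from statistics import mean


def aggregate_votes(tree_labels):
    """Classification: return the class predicted by the most trees."""
    # most_common(1) -> [(label, count)]; ties resolve to the label seen first
    return Counter(tree_labels).most_common(1)[0][0]


def aggregate_regression(tree_values):
    """Regression: return the mean of the individual tree predictions."""
    return mean(tree_values)


# Hypothetical outputs of five trees for one input sample
print(aggregate_votes(["urban", "rural", "urban", "urban", "rural"]))  # urban
print(aggregate_regression([120.0, 135.0, 110.0, 128.0, 132.0]))  # 125.0
```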
Fig. 2 is a schematic flow chart of obtaining classification data provided by the invention. As shown in Fig. 2, obtaining the classification data of the building in step 101 comprises:
Step 201, superposing the acquired city planning data and building vector data to obtain building data, wherein the building data comprises city functional area data.
The city planning data are divided by administrative units, such as provinces, cities, districts and streets.
Buildings are an important feature of urban areas. High-resolution satellite images contain rich information on the shape, structure and texture of surface objects, which makes them an important data source for urban research. The building vector data can be obtained using existing techniques.
Preferably, the remote sensing data corresponding to the building can be obtained through a satellite, and the remote sensing data comprises city planning data and building vector data.
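Step 202, classifying the building data to obtain urban residential data and rural residential data.
Step 203, classifying the urban residential data based on the urban functional area data to obtain single residential building data, common residential building data and dense residential building data.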
The city functional area data comprises one or more combinations of city planning data, building vector data, university campus area data and city functional area product data.
The building data are thus classified into four types: single, common, dense and rural residential buildings. This makes it possible to estimate a different population density for each type of residential building.
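Fig. 3 is a schematic flow chart of calculating the total area of the building provided by the invention. As shown in Fig. 3, calculating the total area of the building according to the classification data in step 102 comprises the following.
First, the POI density on the grid of the classification data is calculated:
$$P_i = \frac{\sum_{j=1}^{n} p_j c_j}{\sum_{j=1}^{n} c_j}$$
where $P_i$ is the POI density of the $i$-th cell, $n$ is the number of grid cells intersecting the $i$-th cell, $p_j$ is the POI density of the $j$-th grid cell intersecting cell $i$, and $c_j$ is the area of the intersection between cell $i$ and the $j$-th grid cell.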
Preferably, the density of each POI category is calculated using kernel density analysis on a grid with a resolution of 30 m. The POI density of each cell is the average POI density of the grid cells it intersects, where each cell is either a building or a street.
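The kernel density step can be sketched in plain Python. This is a minimal sketch under stated assumptions: the text fixes only the 30 m resolution, so the quartic kernel, the 500 m bandwidth, projected coordinates in metres and evaluation at cell centres are all illustrative choices. The function name `quartic_kde_grid` is hypothetical.

```python
import math


def quartic_kde_grid(points, xmin, ymin, ncols, nrows, cell=30.0, bandwidth=500.0):
    """Return an nrows x ncols grid of POI densities (POIs per square metre).

    Assumptions (not specified in the text): quartic kernel, 500 m bandwidth,
    projected coordinates in metres, density evaluated at each cell centre.
    """
    h2 = bandwidth ** 2
    norm = 3.0 / (math.pi * h2)  # scales the quartic kernel to integrate to 1
    grid = [[0.0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        cy = ymin + (r + 0.5) * cell  # centre of row r
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell  # centre of column c
            total = 0.0
            for px, py in points:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                if d2 < h2:  # points beyond the bandwidth contribute nothing
                    total += norm * (1.0 - d2 / h2) ** 2
            grid[r][c] = total
    return grid


# One hypothetical POI at (600 m, 600 m) on a 40 x 40 grid of 30 m cells
grid = quartic_kde_grid([(600.0, 600.0)], xmin=0.0, ymin=0.0, ncols=40, nrows=40)
print(sum(map(sum, grid)) * 30.0 * 30.0)  # close to 1: the mass of one POI
```

Because the kernel integrates to 1, summing density times cell area over a grid that covers the whole kernel recovers the number of POIs. This gives a quick sanity check on the bandwidth and units.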
The Pearson correlation coefficient between the POI density and the population density is then calculated as follows:
$$\rho_{X,Y} = \frac{\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{N}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{N}(Y_i-\bar{Y})^2}}$$
wherein $X_i$ represents the POI density of the $i$-th cell, $Y_i$ represents the population density of the $i$-th cell, $\bar{X}$ and $\bar{Y}$ are their means, and $N$ is the number of cells.
The Pearson correlation coefficient reflects the degree of linear correlation between two random variables. The value of $\rho_{X,Y}$ lies between -1 and 1:
- 1 indicates a perfect positive linear correlation between the two variables ($X_i$ and $Y_i$);
- -1 indicates a perfect negative linear correlation;
- 0 indicates no linear correlation.

Using this formula, the correlation between the density of each POI category and the population density of each unit can be calculated. The POI densities with the highest Pearson correlation coefficients can then be selected for calculating the vegetation coverage area ratio.
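The correlation-and-selection step can be illustrated with a minimal sketch. The category names and density values in the example are hypothetical. Ranking by the signed coefficient follows the text's "highest correlation". The helper `rank_poi_categories` is an illustrative name, not part of the invention.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # Undefined (division by zero) if either sequence is constant
    return sxy / math.sqrt(sxx * syy)


def rank_poi_categories(poi_density_by_category, population_density):
    """Sort POI categories by correlation with population density, highest first."""
    scores = {name: pearson(dens, population_density)
              for name, dens in poi_density_by_category.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical densities for four units
pop = [1200.0, 3400.0, 800.0, 5100.0]
poi = {
    "catering": [0.8, 2.9, 0.5, 4.2],
    "park": [1.5, 0.4, 2.0, 0.3],
}
print(rank_poi_categories(poi, pop)[0][0])  # catering
```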
The vegetation coverage area ratio is calculated as follows:
$$G_i = \frac{a_i}{b_i}$$
wherein $G_i$ represents the vegetation coverage area ratio of the $i$-th cell, $a_i$ represents the vegetation area of the $i$-th cell, and $b_i$ represents the total area of the $i$-th cell.
The vegetation coverage ratio is generally the ratio of vegetated area to total land area, usually expressed as a percentage. In the invention, POI density data and vegetation coverage vector data describe the socio-geographic environment of residential buildings. From them the invention obtains the correlation coefficients between population density and the densities of different POI categories, together with the vegetation coverage area ratio.
The total area of the building is calculated as follows:
$$A_i = a_i f_i$$
wherein $A_i$ is the total area of the building, $a_i$ is the floor area (footprint) of the building, and $f_i$ is the number of floors of the building.
From the above calculations, each unit has the following 13 features:
(1) park density;
(2) research institution density;
(3) catering density;
(4) tourist attraction density;
(5) recreation density;
(6) landmark density;
(7) market density;
(8) traffic service quality;
(9) public service quality;
(10) vegetation coverage ratio;
(11) single residential building area;
(12) common residential building area;
(13) dense residential building area and rural residential building area.
Specifically, the 13 street-scale features obtained above are used as the independent variables of the random forest model, and the population of each street is used as its dependent variable, for training and testing. The random forest algorithm is a typical ensemble learning method that improves the fitting ability of a model by combining multiple decision trees.
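The residential-area features can be produced from building records with a short sketch. The record keys (`street`, `category`, `footprint_m2`, `floors`) and the four category labels are hypothetical names. The text also does not say how building-level areas become street-level features, so summing the total area $A_i = a_i f_i$ per street and category is an assumption made here for illustration.

```python
from collections import defaultdict

# Residential categories from the text; the string labels are hypothetical
CATEGORIES = ("single", "common", "dense", "rural")


def total_floor_area(footprint_m2, floors):
    """A_i = a_i * f_i: building footprint area times number of floors."""
    return footprint_m2 * floors


def street_residential_areas(buildings):
    """Sum A_i per street and residential category (aggregation is an assumption).

    buildings: iterable of dicts with hypothetical keys
    'street', 'category', 'footprint_m2', 'floors'.
    Raises KeyError for a category outside CATEGORIES.
    """
    areas = defaultdict(lambda: dict.fromkeys(CATEGORIES, 0.0))
    for b in buildings:
        areas[b["street"]][b["category"]] += total_floor_area(b["footprint_m2"], b["floors"])
    return dict(areas)


# Hypothetical buildings on one street
sample = [
    {"street": "S1", "category": "common", "footprint_m2": 500, "floors": 6},
    {"street": "S1", "category": "common", "footprint_m2": 400, "floors": 10},
    {"street": "S1", "category": "rural", "footprint_m2": 150, "floors": 2},
]
print(street_residential_areas(sample)["S1"])
```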
A decision tree is a machine learning method for classification and regression. Using the training data, it splits the samples at each node on the feature that maximizes the information gain at that node. Information gain is commonly computed from the information entropy:
$$H = -\sum_{i=1}^{k} p_i \log_2 p_i$$
where $p_i$ is the probability of the $i$-th class occurring and $k$ is the number of classes. Suppose the samples entering the current node have features $[f_1, f_2, \ldots, f_{n-1}, f_n]$. The information gain of each feature is calculated, and the feature that maximizes the information gain at the current node is selected to split the samples. Splitting continues until a stopping threshold set by the algorithm is reached.
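The entropy and information-gain computation can be made concrete with a minimal sketch, assuming a binary threshold split on a single numeric feature. The helper names are illustrative. Regression trees that predict counts more commonly split on variance reduction, but the entropy criterion shown is the one the text describes.

```python
import math
from collections import Counter


def entropy(labels):
    """H = -sum(p_i * log2 p_i) over the classes present in labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(labels, feature_values, threshold):
    """Entropy reduction from splitting samples on feature_value <= threshold."""
    left = [y for y, x in zip(labels, feature_values) if x <= threshold]
    right = [y for y, x in zip(labels, feature_values) if x > threshold]
    n = len(labels)
    # Weighted entropy of the two children; empty children are skipped
    weighted = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - weighted


labels = ["low", "low", "high", "high"]
feature = [1, 2, 3, 4]
print(entropy(labels))                      # 1.0
print(information_gain(labels, feature, 2))  # 1.0: the split separates the classes
```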
The random forest algorithm is an ensemble of decision trees. Common ensemble methods include Bagging (bootstrap aggregating) and Boosting, and the invention uses Bagging:
- a sample is drawn at random from a data set of m samples and placed in the sampling set, then returned to the original data set so that it may be drawn again;
- after m rounds of random sampling with replacement, a sampling set of m samples is obtained;
- repeating this procedure T times yields T sampling sets, and one decision tree is trained on each;
- the predictions of the T trees are averaged to give the final prediction of the random forest.

In the invention, features are calculated for units at two scales, streets and buildings. Population data for individual buildings are hard to obtain, while street-scale population data are relatively easy to obtain. The random forest model is therefore trained and tested on street-scale unit features and then used to predict population at the building scale.
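The bagging procedure can be sketched in plain Python. The sketch assumes one numeric feature and uses one-split regression trees (stumps) as the base learners to keep it short. A full random forest also samples a random subset of features at each split, which is omitted here. The names `fit_stump`, `bagging_fit` and `bagging_predict` are illustrative.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on one feature by minimising squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # excluding the max keeps both sides non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x values identical: predict the overall mean
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def bagging_fit(xs, ys, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample of size m (with replacement)."""
    rng = random.Random(seed)
    m = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(m) for _ in range(m)]  # m draws with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def bagging_predict(trees, x):
    """Average the predictions of all trees (regression)."""
    return mean(tree(x) for tree in trees)


# Hypothetical one-feature data set
xs = list(range(10))
ys = [0] * 5 + [100] * 5
trees = bagging_fit(xs, ys)
print(bagging_predict(trees, 1), bagging_predict(trees, 8))
```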
In the training stage, 13 features of each street under the street scale are used as independent variables to be input into a random forest model, the population number of each street is used as a value to be predicted and output by the random forest, and the model is trained. And in the prediction stage, taking 13 features of each building under the building scale as input of a trained random forest model, and predicting the population number of each building.
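The train-at-street, predict-at-building workflow can be sketched with scikit-learn's `RandomForestRegressor`. The library choice, `n_estimators=200` and the synthetic data are assumptions for illustration, not taken from the invention. The wrapper `train_and_predict` is a hypothetical name, and `N_FEATURES = 13` mirrors the feature list in the text.

```python
import random

from sklearn.ensemble import RandomForestRegressor

N_FEATURES = 13  # the 13 features listed in the text


def train_and_predict(street_X, street_pop, building_X, n_estimators=200, seed=0):
    """Fit a random forest on street-scale rows, then predict building-scale rows.

    Both feature matrices must list the same 13 features in the same order.
    """
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    model.fit(street_X, street_pop)   # training stage: one row per street
    return model.predict(building_X)  # prediction stage: one row per building


# Synthetic, hypothetical data purely to exercise the function
rng = random.Random(42)
streets = [[rng.random() for _ in range(N_FEATURES)] for _ in range(30)]
street_pop = [1000 + 5000 * row[11] + 2000 * row[12] for row in streets]
buildings = [[rng.random() for _ in range(N_FEATURES)] for _ in range(5)]
building_pop = train_and_predict(streets, street_pop, buildings)
print(building_pop.round(1))
```

A regression forest averages leaf means of training targets. Its predictions therefore stay within the range of street populations seen in training.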
The method and system for estimating the spatial distribution of the population based on the building scale according to the present invention are described in an embodiment below.
Taking the estimation of population distribution in a certain city as an example, the population quantity of residential buildings in the certain city is estimated by using multi-source spatial data such as building vector data, functional area dividing data, university campus boundary data, POI data and population distribution data in the certain city.
For example, the functional area classification data of a city may be divided into 12 types according to actual needs, as shown in the following table:
| ID | Class | Definition |
| --- | --- | --- |
| 1 | Woodland | Forest land, grassland, etc. |
| 2 | Water | Natural and artificial water bodies |
| 3 | Undeveloped | Undeveloped land and bare soil in towns and villages |
| 4 | Transportation | Urban roads, traffic facilities, etc. |
| 5 | Green space | Parks and other public recreation land, protective green land |
| 6 | Industrial | Industrial, mining and storage land |
| 7 | Institutional | Administration, culture, education, sports, health, etc. |
| 8 | Commercial | Business and entertainment, etc. |
| 9 | Residence 1 | Low-rise residences |
| 10 | Residence 2 | Multi-storey, mid-rise and high-rise residences |
| 11 | Residence 3 | Shanty areas, rural homesteads, etc. |
| 12 | Agricultural production | Farmland, paddy fields, orchards, etc. |
Step 1: acquire the building-related data of the city, such as urban functional zone data, building vector data, university campus data, and urban planning data.
Step 2: using the acquired data, classify the city's residential buildings into four categories: single residential buildings, common residential buildings, dense residential buildings, and rural residential buildings (Fig. 4, where panels a to d show remote sensing images of the four residential area types in that order). Fig. 4 shows large differences among residential buildings, and their population distributions exhibit strong spatial heterogeneity, which indicates that classifying residential buildings is necessary.
Step 3: compute the living-environment indicators from the city's POI density data and vegetation coverage data, and compute the total floor area of each building.
The density of each POI type is computed by kernel density analysis on a grid with 30 m resolution. The POI density of each unit (street or building) is the average POI density of the grid cells it intersects, as follows:
where $n$ is the number of grid cells intersecting the $i$-th unit, $p_j$ is the POI density of the $j$-th grid cell intersecting unit $i$, and $c_j$ is the area of intersection between unit $i$ and the $j$-th grid cell.
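A minimal Python sketch of this step follows. It assumes a quartic (biweight) kernel with a 500 m search radius evaluated at the centre of each 30 m grid cell, and it reads "average POI density of the grid cells it intersects" as an intersection-area-weighted mean of $p_j$ with weights $c_j$. The kernel, the bandwidth, and the function names `quartic_kde_grid` and `unit_poi_density` are illustrative assumptions, not the patent's specification.

```python
import math


def quartic_kde_grid(points, xmin, ymin, ncols, nrows, cell=30.0, bandwidth=500.0):
    """Kernel density of POIs (points per square metre) at each grid-cell centre.

    Assumption: quartic kernel K(u) = 3/pi * (1 - |u|^2)^2 for |u| < 1,
    scaled by the bandwidth h so that it integrates to 1 over the plane.
    """
    norm = 3.0 / (math.pi * bandwidth ** 2)
    grid = [[0.0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        cy = ymin + (r + 0.5) * cell  # y coordinate of the cell centre
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell  # x coordinate of the cell centre
            total = 0.0
            for px, py in points:
                d2 = ((px - cx) ** 2 + (py - cy) ** 2) / bandwidth ** 2
                if d2 < 1.0:  # only POIs inside the search radius contribute
                    total += norm * (1.0 - d2) ** 2
            grid[r][c] = total
    return grid


def unit_poi_density(densities, overlap_areas):
    """Area-weighted mean of p_j over the n grid cells intersecting a unit (weights c_j)."""
    if len(densities) != len(overlap_areas):
        raise ValueError("densities and overlap_areas must have the same length")
    total_area = sum(overlap_areas)
    if total_area == 0:
        return 0.0  # unit does not intersect any grid cell
    return sum(p * c for p, c in zip(densities, overlap_areas)) / total_area
```

For example, a unit covering 300 m² of a cell with density 2.0 and 100 m² of a cell with density 4.0 gets `unit_poi_density([2.0, 4.0], [300.0, 100.0])`, i.e. 2.5. Weighting by $c_j$ keeps small slivers of a grid cell from counting as much as cells the unit fully covers.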
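The Pearson correlation coefficient between the density of each POI type and population density is computed as follows:
$$r = \frac{\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{N}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{N}(Y_i-\bar{Y})^2}}$$
where $X_i$ is the POI density of the $i$-th unit, $Y_i$ is the population density of the $i$-th unit, $\bar{X}$ and $\bar{Y}$ are their means, and $N$ is the number of units.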
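The Pearson coefficient can be computed directly from its definition, as in the short sketch below. The function name `pearson` is illustrative; on Python 3.10 and later, `statistics.correlation` returns the same value.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between unit POI densities xs and population densities ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # covariance numerator
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

For instance, `pearson([1, 2, 3, 4], [2, 4, 6, 8])` returns 1.0 and `pearson([1, 2, 3], [3, 2, 1])` returns -1.0. A POI type whose $r$ is close to zero carries little linear information about population density.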
The vegetation coverage ratio of each unit is computed as follows:
$$G_i = \frac{a_i}{b_i}$$
where $G_i$ is the vegetation coverage ratio of the $i$-th unit, $a_i$ is the vegetation area of the $i$-th unit, and $b_i$ is the total area of the $i$-th unit.
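A minimal sketch of $G_i = a_i / b_i$ follows, assuming the vegetation area $a_i$ is derived by counting vegetation pixels of a 30 m land-cover raster inside the unit (900 m² per pixel). The raster resolution, the class names, and the helper `vegetation_area_from_pixels` are hypothetical; the patent does not state how $a_i$ is measured.

```python
def vegetation_area_from_pixels(pixel_classes, vegetation_classes, pixel_area=900.0):
    """a_i: number of vegetation pixels in the unit times the area of one pixel.

    Assumption: 900 m^2 per pixel, i.e. a 30 m x 30 m land-cover raster.
    """
    return sum(1 for c in pixel_classes if c in vegetation_classes) * pixel_area


def vegetation_coverage_ratio(vegetation_area, unit_area):
    """G_i = a_i / b_i, the share of the unit covered by vegetation."""
    if unit_area <= 0:
        raise ValueError("unit area must be positive")
    return vegetation_area / unit_area
```

With four pixels labelled `["forest", "road", "grass", "building"]` and vegetation classes `{"forest", "grass"}`, the vegetation area is 1800 m², and for a 3600 m² unit $G_i$ is 0.5.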
Building attributes are then computed, chiefly the total floor area of each building from the building vector data:
$$A_i = a_i f_i$$
where $A_i$ is the total floor area of the building, $a_i$ is its footprint area, and $f_i$ is its number of floors.
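The sketch below computes $A_i = a_i f_i$ and, assuming that the residential-area features listed further on are sums of $A_i$ over the buildings of each category within a unit, aggregates them per category. That aggregation rule and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def total_floor_area(footprint_area, floors):
    """A_i = a_i * f_i: footprint area times number of floors."""
    if floors < 1:
        raise ValueError("a building must have at least one floor")
    return footprint_area * floors


def floor_area_by_category(buildings):
    """Sum A_i per residential category.

    buildings: iterable of (category, footprint_area, floors) tuples for one unit.
    """
    totals = defaultdict(float)
    for category, footprint, floors in buildings:
        totals[category] += total_floor_area(footprint, floors)
    return dict(totals)
```

For a unit with `[("single", 120.0, 2), ("common", 800.0, 6), ("common", 500.0, 11)]`, this gives 240 m² of single and 10300 m² of common residential floor area.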
Because the random forest model is trained on street-scale data and then used for regression estimation of population at the building scale, every indicator must be computed at both unit scales. Using the formulas above, each unit is described by 13 indicators of the living environment:
calculated from POI data: (1) park density; (2) research institution density; (3) catering density; (4) tourist attraction density; (5) recreation density; (6) landmark density; (7) market density; (8) transportation service quality; (9) public service quality;
calculated from vegetation coverage data: (10) vegetation coverage ratio;
calculated from urban functional zone data and related data: (11) single residential building area; (12) common residential building area; (13) dense residential building area and rural residential building area.
Step 4: build the random forest model, train and test it with street-scale data, and use it to estimate population distribution at the building scale.
Specifically, the random forest model is implemented in Python or R (The R Programming Language). The 13 features and the population of each street in the city are used as the model's independent and dependent variables, respectively; the model parameters are tuned during training, and the parameters of the final trained model are saved.
In the prediction stage, the 13 features of each individual building in the city are input to the trained model, which predicts the population of each building.
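The train-on-streets, predict-on-buildings workflow can be sketched with a small from-scratch random forest regressor: bootstrap samples, a random feature subset at each node, variance-reducing splits, and averaged tree outputs. This is an illustrative stand-in, not the inventors' code; in practice one would more likely use scikit-learn's `RandomForestRegressor` or R's `randomForest` package. The class name `RandomForestSketch` and its default hyperparameters are assumptions.

```python
import random
from statistics import mean


def _best_split(rows, targets, feature_ids, min_leaf):
    """Return (sse, feature, threshold) of the lowest-SSE valid split, or None."""
    best = None
    for f in feature_ids:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # candidate threshold between adjacent values
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    return best


def _build_tree(rows, targets, depth, max_depth, min_leaf, n_sub, rng):
    """Leaves are numbers; internal nodes are (feature, threshold, left, right)."""
    if depth >= max_depth or len(rows) < 2 * min_leaf or len(set(targets)) == 1:
        return mean(targets)
    feature_ids = rng.sample(range(len(rows[0])), n_sub)  # random feature subset per node
    split = _best_split(rows, targets, feature_ids, min_leaf)
    if split is None:
        return mean(targets)
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (
        f,
        t,
        _build_tree([rows[i] for i in left], [targets[i] for i in left],
                    depth + 1, max_depth, min_leaf, n_sub, rng),
        _build_tree([rows[i] for i in right], [targets[i] for i in right],
                    depth + 1, max_depth, min_leaf, n_sub, rng),
    )


def _predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class RandomForestSketch:
    """Bagged regression trees with per-node random feature subsets."""

    def __init__(self, n_trees=50, max_depth=8, min_leaf=2, max_features=None, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.min_leaf = min_leaf
        self.max_features = max_features
        self.seed = seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_features = len(X[0])
        n_sub = min(n_features, self.max_features or max(1, n_features // 3))
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            self.trees.append(_build_tree([X[i] for i in idx], [y[i] for i in idx], 0,
                                          self.max_depth, self.min_leaf, n_sub, rng))
        return self

    def predict(self, X):
        # Forest prediction = mean of the individual tree predictions.
        return [mean(_predict_tree(tree, x) for tree in self.trees) for x in X]
```

Usage would follow the two stages described above, with hypothetical variable names: `model = RandomForestSketch(n_trees=100).fit(street_features, street_population)` on the 13-feature street rows, then `model.predict(building_features)` on the 13-feature building rows.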
Through the above steps, a mean absolute percentage error (MAPE) of 19% was obtained, indicating good estimation performance. Fig. 5 shows the resulting population density estimates for single residential buildings, common residential buildings, dense residential buildings, and rural residential buildings, respectively.
The building-scale population spatial distribution estimation system provided by the present invention is described below; the system described below and the method described above may be read with reference to each other.
Fig. 6 is a schematic structural diagram of the building-scale population spatial distribution estimation system provided by the present invention. As shown in the figure, the building-scale population spatial distribution estimation system 600 includes a classification module 610, a calculation module 620, and an estimation module 630, wherein:
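The reported error metric, mean absolute percentage error, is defined as the average of $|y_k - \hat{y}_k| / |y_k|$ over the evaluated units, times 100. The text does not say on which units (for example held-out streets) the 19% was measured; the sketch below only shows the metric itself, and the function name `mape` is illustrative.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

For two streets with populations 100 and 200 predicted as 110 and 180, each error is 10%, so `mape([100, 200], [110, 180])` is 10.0. Units with zero recorded population must be excluded or handled separately.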
the classification module 610 is configured to obtain classification data of the building, where the classification data is used to describe a category of the building.
A calculating module 620, configured to calculate a total area of the building according to the classification data.
And the estimation module 630 is used for constructing a random forest algorithm model and estimating the population number on the building scale according to the total area of the building.
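The three-module structure can be pictured as a small pipeline in which each module is an object with a `run` method. In the sketch below, the class and field names are hypothetical, the classifier and predictor are injected callables, and the estimation step uses only the total floor area as its feature row, which is a simplification of the 13-feature model described in the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Building:
    building_id: str
    category: str          # e.g. "single", "common", "dense", "rural"
    footprint_area: float  # a_i in square metres
    floors: int            # f_i


class ClassificationModule:  # corresponds to module 610
    def __init__(self, classify: Callable[[dict], str]):
        self.classify = classify  # hypothetical rule or classifier supplied by the caller

    def run(self, records: List[dict]) -> List[Building]:
        return [Building(r["id"], self.classify(r), r["footprint_area"], r["floors"])
                for r in records]


class CalculationModule:  # corresponds to module 620
    def run(self, buildings: List[Building]) -> Dict[str, float]:
        # Total floor area A_i = a_i * f_i for each building.
        return {b.building_id: b.footprint_area * b.floors for b in buildings}


class EstimationModule:  # corresponds to module 630
    def __init__(self, predict: Callable[[List[List[float]]], List[float]]):
        self.predict = predict  # e.g. the predict method of a trained random forest

    def run(self, buildings: List[Building], total_area: Dict[str, float]) -> Dict[str, float]:
        # Simplification: the feature row is [total floor area] only.
        rows = [[total_area[b.building_id]] for b in buildings]
        return dict(zip((b.building_id for b in buildings), self.predict(rows)))


def run_pipeline(records, classifier, calculator, estimator):
    buildings = classifier.run(records)
    return estimator.run(buildings, calculator.run(buildings))
```

Injecting the classifier and predictor keeps the module boundaries of Fig. 6 while letting the trained model, or a remote-sensing classifier, be swapped in without changing the pipeline.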
Preferably, the classification module 610 is further configured to perform the following steps:
superposing the acquired city planning data and building vector data to obtain building data, wherein the building data comprises city functional area data;
classifying the building data to obtain urban residence data and rural residence data;
and classifying the urban residential data based on the urban functional area data to obtain single residential building data, common residential building data and dense residential building data.
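The two-level classification (urban versus rural residence, then urban residence by functional zone) can be sketched as a lookup. The mapping of the functional-zone IDs 9, 10, and 11 from the 12-class table to the single, common, and dense categories is a hypothetical illustration; the patent does not give the exact rules.

```python
from typing import Optional

# Hypothetical mapping from functional-zone ID to urban residential category.
URBAN_ZONE_TO_CATEGORY = {9: "single", 10: "common", 11: "dense"}


def classify_residential(is_rural_residence: bool, zone_id: int) -> Optional[str]:
    """Return a residential category for one building, or None if it is not residential."""
    if is_rural_residence:
        return "rural"  # first level: rural residence
    return URBAN_ZONE_TO_CATEGORY.get(zone_id)  # second level: urban residence by zone
```

A building flagged as rural residence is labelled `"rural"` regardless of zone, while an urban building in a commercial zone (ID 8) returns `None` and is excluded from the residential estimate.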
Preferably, the city functional area data includes one or more combinations of city planning data, building vector data, university campus area data, and city functional area product data.
Preferably, the classification module 610 is further configured to perform the following steps:
obtaining remote sensing data corresponding to a building through a satellite;
and obtaining classification data of the building based on the remote sensing data and according to the classification characteristics of the building.
Preferably, the calculating module 620 is further configured to perform the following steps:
calculating the POI density on the grid of the classified data, wherein the calculation formula is as follows:
where $n$ is the number of grid cells intersecting the $i$-th unit, $p_j$ is the POI density of the $j$-th grid cell intersecting unit $i$, and $c_j$ is the area of intersection between unit $i$ and the $j$-th grid cell;
calculating a vegetation coverage area ratio based on the POI density, wherein the calculation formula is as follows:
where $G_i$ is the point-of-interest (POI) density of the $i$-th unit, $a_i$ is the vegetation area of the $i$-th unit, i.e., the floor area of the building, and $b_i$ is the total area of the $i$-th unit;
calculating the total area of the building according to the vegetation coverage area ratio, wherein the calculation formula is as follows:
$A_i = a_i f_i$;
where $A_i$ is the total area of the building, $a_i$ is the floor area of the building, and $f_i$ is the number of floors of the building.
Preferably, the calculation module 620 computes the density of each POI type by kernel density analysis on a grid with 30 m resolution, the POI density of each unit being the average POI density of the grid cells it intersects.
Preferably, the calculating module 620 is further configured to perform the following steps:
and calculating a Pearson correlation coefficient between the POI density and the population density according to the POI density, wherein the calculation formula is as follows:
where $X_i$ is the point-of-interest (POI) density of the $i$-th unit and $Y_i$ is the population density of the $i$-th unit;
calculating a vegetation coverage area ratio based on the Pearson correlation coefficient;
wherein the Pearson correlation coefficient represents the degree of correlation between the POI density and the population density.
Fig. 7 illustrates the physical structure of an electronic device. As shown in Fig. 7, the electronic device may include a processor 710, a communications interface 720, a memory 730, and a communication bus 740, where the processor 710, the communications interface 720, and the memory 730 communicate with one another via the communication bus 740. The processor 710 may invoke logic instructions in the memory 730 to perform the building-scale population spatial distribution estimation method, comprising:
obtaining classification data of a building, the classification data describing a category of the building;
calculating a total area of the building from the classification data;
and constructing a random forest algorithm model, and estimating the population number on the building scale according to the total area of the building.
In addition, the logic instructions in the memory 730 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium and including instructions that cause a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the building-scale population spatial distribution estimation method provided above, comprising:
obtaining classification data of a building, the classification data describing a category of the building;
calculating a total area of the building from the classification data;
and constructing a random forest algorithm model, and estimating the population number on the building scale according to the total area of the building.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the building-scale population spatial distribution estimation method provided above, comprising:
obtaining classification data of a building, the classification data describing a category of the building;
calculating a total area of the building from the classification data;
and constructing a random forest algorithm model, and estimating the population number on the building scale according to the total area of the building.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for estimating a spatial distribution of a population based on a building scale, comprising:
obtaining classification data of a building, the classification data describing a category of the building;
calculating a total area of the building from the classification data;
and constructing a random forest algorithm model, and estimating the population number on the building scale according to the total area of the building.
2. The building-scale-based population space distribution estimation method according to claim 1, wherein said obtaining classification data of buildings comprises:
superposing the acquired city planning data and building vector data to obtain building data, wherein the building data comprises city functional area data;
classifying the building data to obtain urban residence data and rural residence data;
and classifying the urban residential data based on the urban functional area data to obtain single residential building data, common residential building data and dense residential building data.
3. The building-scale-based population space distribution estimation method according to claim 2, wherein the city functional area data comprises one or more combinations of city planning data, building vector data, university campus area data, and city functional area product data.
4. The building-scale-based population space distribution estimation method according to claim 1, wherein the obtaining classification data of the building comprises:
obtaining remote sensing data corresponding to a building through a satellite;
and obtaining classification data of the building based on the remote sensing data and according to the classification characteristics of the building.
5. The building-scale-based population space distribution estimation method according to claim 1, wherein said calculating a total area of said building from said classification data comprises:
calculating the POI density on the grid of the classified data, wherein the calculation formula is as follows:
where $n$ is the number of grid cells intersecting the $i$-th unit, $p_j$ is the POI density of the $j$-th grid cell intersecting unit $i$, and $c_j$ is the area of intersection between unit $i$ and the $j$-th grid cell;
calculating a vegetation coverage area ratio based on the POI density, wherein the calculation formula is as follows:
where $G_i$ is the point-of-interest (POI) density of the $i$-th unit, $a_i$ is the vegetation area of the $i$-th unit, i.e., the floor area of the building, and $b_i$ is the total area of the $i$-th unit;
calculating the total area of the building according to the vegetation coverage area ratio, wherein the calculation formula is as follows:
$A_i = a_i f_i$;
where $A_i$ is the total area of the building, $a_i$ is the floor area of the building, and $f_i$ is the number of floors of the building.
6. The building-scale-based population space distribution estimation method according to claim 5, wherein calculating the POI density on the grid of the classification data comprises:
respective point-of-interest POI densities are calculated using kernel density analysis on a mesh having a resolution of 30m, each point-of-interest POI density being an average point-of-interest POI density of the mesh with which it intersects.
7. The building-scale-based demographic spatial distribution estimation method of claim 5, wherein the calculating a total area of the building from the classification data further comprises:
and calculating a Pearson correlation coefficient between the POI density and the population density according to the POI density, wherein the calculation formula is as follows:
where $X_i$ is the point-of-interest (POI) density of the $i$-th unit and $Y_i$ is the population density of the $i$-th unit;
calculating a vegetation coverage area ratio based on the Pearson correlation coefficient;
wherein the Pearson correlation coefficient represents the degree of correlation between the POI density and the population density.
8. A building-scale-based demographic spatial distribution estimation system, comprising:
the classification module is used for acquiring classification data of the building, and the classification data is used for describing the category of the building;
a calculation module for calculating the total area of the building according to the classification data;
and the estimation module is used for constructing a random forest algorithm model and estimating the population number on the building scale according to the total area of the building.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the building-scale based demographic spatial distribution estimation method of any of claims 1 to 7.
10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, performs the steps of the building-scale based demographic spatial distribution estimation method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110491470.2A CN113191553A (en) | 2021-05-06 | 2021-05-06 | Population space distribution estimation method and system based on building scale |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110491470.2A CN113191553A (en) | 2021-05-06 | 2021-05-06 | Population space distribution estimation method and system based on building scale |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113191553A true CN113191553A (en) | 2021-07-30 |
Family ID: 76984167
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110491470.2A Pending CN113191553A (en) | 2021-05-06 | 2021-05-06 | Population space distribution estimation method and system based on building scale |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113191553A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115238584A (en) * | 2022-07-29 | 2022-10-25 | 湖南大学 | Population distribution identification method based on multi-source big data |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106708962A (en) * | 2016-11-30 | 2017-05-24 | 中山大学 | Urban population distribution method based on building properties |
JP2019067224A (en) * | 2017-10-03 | 2019-04-25 | 日本電気株式会社 | Human flow pattern estimation system, human flow pattern estimation method, and human flow pattern estimation program |
CN109978249A (en) * | 2019-03-19 | 2019-07-05 | 广州大学 | Population spatial distribution method, system and medium based on two-zone model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210730 |