CN113191553A - Population space distribution estimation method and system based on building scale - Google Patents

Population space distribution estimation method and system based on building scale

Info

Publication number
CN113191553A
Authority
CN
China
Prior art keywords
building
data
scale
population
density
Legal status
Pending
Application number
CN202110491470.2A
Other languages
Chinese (zh)
Inventor
杜世宏
商硕硕
白璐斌
Current Assignee
Shenzhen Research Center Of Digital City Engineering
Peking University
Original Assignee
Shenzhen Research Center Of Digital City Engineering
Peking University
Application filed by Shenzhen Research Center Of Digital City Engineering and Peking University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/08 Construction

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Operations Research (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Educational Administration (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method and a system for estimating population spatial distribution based on building scale, belonging to the technical field of population spatial distribution. The method comprises: obtaining classification data of a building, the classification data describing the category of the building; calculating the total area of the building from the classification data; and constructing a random forest algorithm model and estimating the population at the building scale according to the total area of the building. By constructing a random forest algorithm model based on the classification data of buildings, the invention estimates population at the building scale; it can be applied to social resource analysis, emergency evacuation, business decision-making, urban planning and other fields, and has considerable social value.

Description

Population space distribution estimation method and system based on building scale
Technical Field
The invention relates to the technical field of population spatial distribution, and in particular to a population spatial distribution estimation method and system based on building scale.
Background
Population distribution data reflect the natural conditions and economic development level of a region, and fine-scale population distribution data can be applied in many fields, such as disaster management, resource allocation and smart city construction. Traditional demographic data are usually obtained by census, which, although accurate, has significant drawbacks: first, it is costly and time-consuming; second, census data are usually reported for administrative districts, which have low spatial resolution, so they cannot represent population differences within a district.
To integrate better with other spatial data and faithfully reflect the spatial distribution of the population, a highly automated method is needed to estimate population distribution at a fine scale.
Disclosure of Invention
The invention provides a method and a system for estimating population spatial distribution based on building scale, which solve the problems that prior-art population estimation is costly and time-consuming and cannot capture population differences within an administrative district, and which realize fine-scale population spatial distribution estimation.
The invention provides a method for estimating population spatial distribution based on building scale, comprising the following steps:
obtaining classification data of a building, the classification data describing a category of the building;
calculating a total area of the building from the classification data;
constructing a random forest algorithm model, and estimating the population at the building scale according to the total area of the building.
According to the building scale-based population space distribution estimation method, the acquiring classification data of the building comprises the following steps:
overlaying the acquired city planning data and building vector data to obtain building data, wherein the building data comprise urban functional area data;
classifying the building data to obtain urban residence data and rural residence data;
classifying the urban residence data based on the urban functional area data to obtain single residential building data, common residential building data and dense residential building data.
According to the building scale-based population space distribution estimation method, the urban functional area data comprise one or more combinations of urban planning data, building vector data, university campus area data and urban functional area product data.
According to the building scale-based population space distribution estimation method, the obtaining of classification data of the building comprises the following steps:
obtaining remote sensing data corresponding to a building through a satellite;
obtaining the classification data of the building from the remote sensing data according to the classification characteristics of the building.
According to the building scale-based population space distribution estimation method, the calculating the total area of the building according to the classification data comprises the following steps:
calculating the POI density on the grid of the classified data, wherein the calculation formula is as follows:
$$G_i = \frac{\sum_{j=1}^{n} p_j c_j}{\sum_{j=1}^{n} c_j}$$
where G_i is the POI density of the ith unit, n is the number of grid cells intersecting the ith unit, p_j is the POI density of the jth grid cell intersecting unit i, and c_j is the area of intersection between unit i and the jth grid cell;
calculating a vegetation coverage area ratio based on the POI density, wherein the calculation formula is as follows:
$$V_i = \frac{a_i}{b_i}$$
where V_i is the vegetation coverage area ratio of the ith unit, a_i is the vegetation area of the ith unit, and b_i is the total area of the ith unit;
calculating the total area of the building according to the vegetation coverage area ratio, wherein the calculation formula is as follows:
$$A_i = a_i f_i$$
where A_i is the total area of the ith building, a_i is its footprint (ground floor) area, and f_i is its number of floors.
According to the building scale-based population space distribution estimation method, the POI density is calculated on the grid of the classification data, and the method comprises the following steps:
the density of each POI category is calculated by kernel density analysis on a grid with a resolution of 30 m, and the POI density of each unit is the average POI density of the grid cells it intersects.
According to the building scale-based population space distribution estimation method, the calculating the total area of the building according to the classification data further comprises:
calculating a Pearson correlation coefficient between the POI density and the population density, wherein the calculation formula is as follows:
$$\rho_{X,Y} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}}$$
where X_i is the POI density of the ith unit, Y_i is the population density of the ith unit, \bar{X} and \bar{Y} are their means over all units, and N is the number of units;
calculating a vegetation coverage area ratio based on the Pearson correlation coefficient;
wherein the Pearson correlation coefficient represents the degree of correlation between the POI density and the population density.
The invention also provides a system for estimating the population space distribution based on the building scale, which comprises the following components:
the classification module is used for acquiring classification data of the building, and the classification data is used for describing the category of the building;
a calculation module for calculating the total area of the building according to the classification data;
an estimation module for constructing a random forest algorithm model and estimating the population at the building scale according to the total area of the building.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the building scale-based population space distribution estimation method.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the building-scale based population space distribution estimation method as described in any of the above.
According to the method and the system for estimating the population space distribution based on the building scale, the population number on the building scale is estimated by constructing the random forest algorithm model based on the classification data of the building, and the method and the system can be applied to the fields of social resource analysis, emergency evacuation, business decision, city planning and the like and have higher social value.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for estimating a spatial distribution of a population based on a building scale according to the present invention;
FIG. 2 is a schematic diagram of a process for obtaining classification data according to the present invention;
FIG. 3 is a schematic flow chart of the present invention for calculating the total area of a building;
FIG. 4 is a schematic diagram of classification data of a building provided by an embodiment of the present invention;
FIG. 5 is a diagram illustrating population extraction results provided by an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a building-scale-based population space distribution estimation system provided by the present invention;
FIG. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," and the like in the description and in the claims, and in the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein.
Urbanization in China is progressing rapidly, drawing resources and production factors into cities. Urbanization and human socioeconomic activities affect and change the geographic landscape of cities, creating different types of urban landscape functional areas (e.g., commercial, residential, industrial, green space and water). Data on urban landscape functional areas are important for analyzing urban spatial patterns, revealing the urbanization process, evaluating the urban ecological environment, and promoting urban land planning and sustainable development.
Existing urban land cover/land use data cannot fully reflect urban landscape functions; their spatial resolution is low and they cannot provide local detail, which limits fine-grained urban research. To integrate better with other spatial data and faithfully reflect the spatial distribution of the population, a highly automated method is needed to estimate population distribution at a fine scale.
The building-scale-based population space distribution estimation method and system of the present invention are described below with reference to fig. 1-7.
Fig. 1 is a schematic flow chart of a method for estimating population spatial distribution based on building scale according to the present invention. As shown in Fig. 1, the building-scale-based population spatial distribution estimation method comprises:
Step 101, obtaining classification data of a building, wherein the classification data are used for describing the category of the building.
Methods commonly used in population estimation studies include dasymetric mapping, multivariate regression and multi-factor fusion (Batista, Gallego and Lavalle, 2013; Steven et al., 2015; Zeng et al., 2011). These studies generally consider the effects of natural environmental and socioeconomic factors (Yang et al., 2019) and estimate population at either the grid scale (Balakrinhan, 2019) or the building scale (Han et al., 2019).
Estimating population distribution at the grid scale usually has to account for the scale effect: different research goals and regions require different grid sizes, and the grid itself can cut across actual geographic boundaries. One way around this is to estimate population at the building scale, but most current building-scale methods do not consider the spatial heterogeneity caused by different building types.
Because building types vary spatially, different types of buildings have different population densities; for example, rural residential buildings typically have a lower population density than most urban residential buildings. The invention therefore estimates population at the building scale while accounting for the spatial heterogeneity introduced by building type.
Step 102, calculating the total area of the building according to the classification data.
In addition, the influence of natural environmental factors and socioeconomic factors on the spatial distribution of population is also considered. Natural environmental factors mainly include landform and vegetation, and the total area of the building can be calculated with these factors taken into account; socioeconomic factors mainly include transportation, infrastructure and the service industry.
Preferably, the method uses Point of Interest (POI) data and vegetation coverage vector data to describe the socio-geographic environment of residential buildings; the extracted socio-geographic environment features include the correlation coefficients (e.g., Pearson correlation coefficients) between population density and the densities of different POI categories, and the vegetation coverage area ratio.
Preferably, a POI is a geographically identified spatial feature that includes information such as name, category, longitude and latitude. POIs intuitively and effectively reflect the spatial distribution of urban elements and are closely related to human activities, and hence to population distribution.
Step 103, constructing a random forest algorithm model, and estimating the population at the building scale according to the total area of the building.
The random forest algorithm model is a machine learning model. It is a forest built in a random manner, consisting of many decision trees (each a tree structure, which may be binary or non-binary), and the individual decision trees are uncorrelated with one another. Once the forest is built, each new input sample is evaluated by every decision tree in the forest; for classification, the sample is assigned the class chosen by the most trees, and for regression, as in the population estimation here, the predictions of the trees are averaged.
The invention trains the constructed random forest model at the scale of a preset area (such as a street) and uses it to estimate population at the building scale.
The above steps 101 to 103 are described below.
Fig. 2 is a schematic flow chart of acquiring classification data according to the present invention. As shown in Fig. 2, in step 101, obtaining the classification data of the building includes:
Step 201, overlaying the acquired city planning data and building vector data to obtain building data, wherein the building data comprise urban functional area data.
Wherein the city planning data is divided by administrative regions, such as provinces, cities, districts, streets and the like.
Buildings are an important feature of urban areas, and high-resolution satellite images contain abundant shape structure and texture information of earth surface targets, so that the high-resolution satellite images become important data sources for urban research. The building vector data may be acquired by prior art techniques.
Preferably, the remote sensing data corresponding to the building can be obtained through a satellite, and the remote sensing data comprises city planning data and building vector data.
Step 202, classifying the building data to obtain urban residence data and rural residence data.
Step 203, classifying the urban residence data based on the urban functional area data to obtain single residential building data, common residential building data and dense residential building data.
The city functional area data comprises one or more combinations of city planning data, building vector data, university campus area data and city functional area product data.
Therefore, classifying the building data into the individual residential building data, the general residential building data, the dense residential building data, and the rural residential building data facilitates estimating different population densities based on different types of residential buildings.
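Steps 201 to 203 can be sketched as a small rule set. This is a minimal sketch, not the patented procedure: it assumes the overlay of step 201 has already attached to each building an urban/rural flag and a functional-zone class ID from the 12-class functional-area table given in the embodiment, and the mapping of Residence 1/2/3 zones to single/common/dense residential buildings is a hypothetical assumption made for illustration.

```python
from enum import Enum
from typing import Optional


class ResidentialType(Enum):
    """The four residential building categories named in steps 202 and 203."""
    SINGLE = "single residential building"
    COMMON = "common residential building"
    DENSE = "dense residential building"
    RURAL = "rural residential building"


# HYPOTHETICAL mapping from functional-zone class IDs (9, 10, 11 are
# Residence 1, 2, 3 in the 12-class table) to urban residential types.
ZONE_TO_URBAN_TYPE = {
    9: ResidentialType.SINGLE,   # low-rise residences
    10: ResidentialType.COMMON,  # multi-storey, mid- and high-rise residences
    11: ResidentialType.DENSE,   # shantytown areas
}


def classify_building(is_urban: bool, zone_class_id: int) -> Optional[ResidentialType]:
    """Step 202 splits urban from rural; step 203 subdivides urban residences.

    ASSUMPTION: every rural building is treated as a rural residence, and an
    urban building outside a residential zone returns None (not residential).
    """
    if not is_urban:
        return ResidentialType.RURAL
    return ZONE_TO_URBAN_TYPE.get(zone_class_id)


# Usage: buildings after the overlay, as (building_id, is_urban, zone_class_id).
buildings = [("b1", True, 9), ("b2", True, 10), ("b3", False, 11), ("b4", True, 8)]
labels = {bid: classify_building(urban, zone) for bid, urban, zone in buildings}
```

Keeping the zone-to-type mapping in a dictionary makes it easy to swap in whatever correspondence a given functional-area product actually supports.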
Fig. 3 is a schematic flow chart of calculating the total area of the building provided by the present invention. As shown in Fig. 3, in step 102, calculating the total area of the building according to the classification data includes:
Step 301, calculating the POI density on the grid of the classification data, wherein the calculation formula is as follows:
$$G_i = \frac{\sum_{j=1}^{n} p_j c_j}{\sum_{j=1}^{n} c_j}$$
where G_i is the POI density of the ith unit, n is the number of grid cells intersecting the ith unit, p_j is the POI density of the jth grid cell intersecting unit i, and c_j is the area of intersection between unit i and the jth grid cell.
Preferably, the density of each POI category is calculated by kernel density analysis on a grid with a resolution of 30 m, and the POI density of each unit is the average POI density of the grid cells it intersects; each unit is either a building or a street.
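A minimal NumPy sketch of this two-stage POI density computation: a kernel density surface evaluated on a 30 m grid, then, for one unit, the average of the grid densities p_j weighted by the intersection areas c_j. Assumptions not in the text: a Gaussian kernel with a 300 m bandwidth (the kernel and bandwidth are not specified), projected coordinates in metres, and intersection areas that have already been computed (in practice by a GIS overlay).

```python
import numpy as np


def poi_kernel_density(poi_xy, origin, n_rows, n_cols, cell_size=30.0, bandwidth=300.0):
    """Gaussian kernel density of POIs at grid-cell centres (POIs per square metre).

    ASSUMPTION: Gaussian kernel and 300 m bandwidth; the text only fixes the
    30 m grid resolution. Row index follows y, column index follows x.
    """
    x0, y0 = origin
    xs = x0 + (np.arange(n_cols) + 0.5) * cell_size
    ys = y0 + (np.arange(n_rows) + 0.5) * cell_size
    gx, gy = np.meshgrid(xs, ys)  # both shaped (n_rows, n_cols)
    norm = 1.0 / (2.0 * np.pi * bandwidth ** 2)
    density = np.zeros((n_rows, n_cols))
    for px, py in poi_xy:
        d2 = (gx - px) ** 2 + (gy - py) ** 2
        density += norm * np.exp(-d2 / (2.0 * bandwidth ** 2))
    return density


def unit_poi_density(grid_density, overlaps):
    """Area-weighted mean sum_j p_j c_j / sum_j c_j for one unit.

    overlaps: list of ((row, col), c_j) pairs, one per grid cell j that
    intersects the unit, with c_j the intersection area.
    """
    num = sum(grid_density[r, c] * area for (r, c), area in overlaps)
    den = sum(area for _, area in overlaps)
    return num / den


# One POI at the centre of grid cell (50, 50) on a 100 x 100 grid of 30 m cells.
density = poi_kernel_density([(1515.0, 1515.0)], origin=(0.0, 0.0), n_rows=100, n_cols=100)
# A unit overlapping two grid cells by 100 and 300 square metres.
g = unit_poi_density(np.array([[1.0, 3.0]]), [((0, 0), 100.0), ((0, 1), 300.0)])  # 2.5
```

Each POI category would get its own density surface; the per-unit averaging is the same for buildings and streets, only the overlap lists differ.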
Step 302, calculating a pearson correlation coefficient between the POI density and the population density according to the POI density, wherein the calculation formula is as follows:
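$$\rho_{X,Y} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}}$$
where X_i is the POI density of the ith unit, Y_i is the population density of the ith unit, \bar{X} and \bar{Y} are their means over all units, and N is the number of units.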
The Pearson correlation coefficient reflects the degree of linear correlation between two random variables. The value of ρ_{X,Y} lies between -1 and 1: a value of 1 indicates a perfect positive linear correlation between the two random variables (X_i and Y_i), a value of -1 indicates a perfect negative linear correlation, and a value of 0 indicates no linear correlation.
Using the Pearson correlation coefficient formula, the correlation between each POI density and the population density of the units can be calculated, and the POI density with the highest Pearson correlation coefficient can be selected for calculating the vegetation coverage area ratio.
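The correlation-and-selection step can be sketched as follows. This is a minimal sketch: it computes the Pearson coefficient directly from the formula and, as an assumption, reads "highest correlation" as the largest signed coefficient (ranking by absolute value would be an equally plausible reading). The category names and numbers in the usage lines are illustrative only.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient rho_{X,Y} over N units."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


def rank_poi_categories(poi_density_by_category, population_density):
    """Correlate each POI category's unit densities with unit population density.

    Returns the category with the largest coefficient and all coefficients.
    ASSUMPTION: ranking uses the signed coefficient, not its absolute value.
    """
    scores = {
        name: pearson(dens, population_density)
        for name, dens in poi_density_by_category.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Made-up values for four units (illustrative only, not data from the source).
pop = [120.0, 340.0, 560.0, 800.0]
poi = {"catering": [0.2, 0.5, 0.9, 1.3], "park": [0.9, 0.7, 0.4, 0.1]}
best, scores = rank_poi_categories(poi, pop)
```

`np.corrcoef` returns the same coefficient; the explicit version is kept to mirror the formula term by term.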
Step 303, calculating a vegetation coverage area ratio based on the pearson correlation coefficient, wherein a calculation formula is as follows:
$$V_i = \frac{a_i}{b_i}$$
where V_i is the vegetation coverage area ratio of the ith unit, a_i is the vegetation area of the ith unit, and b_i is the total area of the ith unit.
The vegetation coverage ratio is generally the ratio of the area covered by vegetation (such as forest) to the total land area, usually expressed as a percentage. The invention uses POI density data and vegetation coverage vector data to describe the socio-geographic environment of residential buildings, and extracts as features the correlation coefficients between population density and the densities of different POI categories, together with the vegetation coverage area ratio.
Step 304, calculating the total area of the building according to the vegetation coverage area ratio, wherein the calculation formula is as follows:
$$A_i = a_i f_i$$
where A_i is the total area of the ith building, a_i is its footprint (ground floor) area, and f_i is its number of floors.
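Steps 303 and 304 reduce to simple arithmetic. A minimal sketch, assuming areas in square metres, and additionally assuming (this is not stated in the text) that the street-scale residential area features are obtained by summing the total building areas A_i per residential type over the buildings in each street. The type names in the usage lines are placeholders.

```python
from collections import defaultdict


def vegetation_coverage_ratio(vegetation_area, unit_area):
    """Vegetated area of a unit divided by the unit's total area (a_i / b_i)."""
    if unit_area <= 0:
        raise ValueError("unit area must be positive")
    return vegetation_area / unit_area


def total_building_area(footprint_area, floors):
    """A_i = a_i * f_i: footprint area times number of floors."""
    return footprint_area * floors


def residential_area_by_type(buildings):
    """Sum A_i per residential type for the buildings inside one street.

    ASSUMPTION: street-scale area features are plain sums of A_i.
    buildings: iterable of (residential_type, footprint_area, floors).
    """
    totals = defaultdict(float)
    for rtype, footprint, floors in buildings:
        totals[rtype] += total_building_area(footprint, floors)
    return dict(totals)


street = [("single", 100.0, 2), ("single", 50.0, 1), ("dense", 80.0, 5)]
areas = residential_area_by_type(street)  # {"single": 250.0, "dense": 400.0}
```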
Through the above calculations, each unit has 13 features: (1) park density; (2) research institution density; (3) catering density; (4) tourist attraction density; (5) recreation density; (6) landmark density; (7) market density; (8) traffic service quality; (9) public service quality; (10) vegetation coverage ratio; (11) single residential building area; (12) common residential building area; (13) dense residential building area and rural residential building area.
Specifically, the 13 street scale features obtained by the calculation are used as independent variables of the random forest model, and the population number on the street is used as a dependent variable of the random forest model for training and testing. The random forest algorithm is a typical ensemble learning method, and improves the fitting capability of a model by integrating a plurality of decision trees.
A decision tree is a machine learning method for classification and regression: using the training data, it partitions the samples at each node by the feature that maximizes the gain at that node. A common basis for computing information gain is information entropy, calculated as follows:
$$H = -\sum_{i=1}^{k} p_i \log_2 p_i$$
where p_i is the probability of the ith class occurring (the proportion of samples of class i at the node) and k is the number of classes. If the samples entering the current node have features [f_1, f_2, …, f_(n-1), f_n], the information gain of each feature is calculated, and the feature that maximizes the information gain at the current node is selected to split the samples; splitting continues until a threshold condition set by the algorithm is reached.
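The split-selection rule can be sketched for categorical features as below. This is a minimal illustration of entropy-based information gain (parent entropy minus the size-weighted entropy of the children), not the patent's implementation; continuous features such as densities and areas would normally be split on thresholds instead. Note also that regression forests such as scikit-learn's `RandomForestRegressor` split on squared error by default rather than on entropy.

```python
import math
from collections import Counter


def entropy(labels):
    """H = -sum_i p_i log2 p_i over the class proportions p_i at a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(labels)
    children = {}
    for value, label in zip(feature_values, labels):
        children.setdefault(value, []).append(label)
    remainder = sum(len(ys) / n * entropy(ys) for ys in children.values())
    return entropy(labels) - remainder


def best_split_feature(rows, labels):
    """Index of the feature f_k with the largest information gain."""
    n_features = len(rows[0])
    gains = [information_gain(labels, [row[k] for row in rows]) for k in range(n_features)]
    return max(range(n_features), key=gains.__getitem__)


rows = [(0, 1), (0, 0), (1, 1), (1, 0)]
labels = ["low", "low", "high", "high"]
print(best_split_feature(rows, labels))  # 0: the first feature separates the classes perfectly
```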
A random forest is an ensemble of decision trees; common ensemble methods include Bagging (bootstrap aggregating) and Boosting. The invention uses Bagging: one sample is drawn at random from a data set of m samples, placed into the sampling set and then returned to the original data set, so that it may be drawn again in later rounds; after m rounds of random sampling, a sampling set of m samples is obtained. Repeating this procedure yields multiple sampling sets, a decision tree is trained on each, and the predictions of all decision trees are averaged to give the final prediction of the random forest.
In the invention, the features of units at both the street scale and the building scale are calculated. Because population data for individual buildings are hard to obtain while street-level population data are relatively easy to obtain, the street-scale unit features are used to train and test the random forest model, which is then used to predict population at the building scale.
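A sketch of Bagging as described: each bootstrap set is m draws with replacement, one base model is fitted per set, and the predictions are averaged. To stay short, it uses a hypothetical depth-1 regression tree (a decision stump) as the base learner; a real random forest grows full trees and also samples a random subset of features at each split, which this sketch omits.

```python
import numpy as np


def fit_stump(X, y):
    """HYPOTHETICAL stand-in base learner: a depth-1 regression tree."""
    best = (np.inf, 0, 0.0, y.mean(), y.mean())  # (sse, feature, threshold, left, right)
    for k in range(X.shape[1]):
        for t in np.unique(X[:, k])[:-1]:  # keep both sides non-empty
            left = X[:, k] <= t
            yl, yr = y[left], y[~left]
            sse = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
            if sse < best[0]:
                best = (sse, k, t, yl.mean(), yr.mean())
    _, k, t, left_value, right_value = best
    return lambda X_new: np.where(X_new[:, k] <= t, left_value, right_value)


def bagging_predict(X_train, y_train, X_new, fit, n_models=50, seed=0):
    """Fit one model per bootstrap set of m samples and average the predictions."""
    rng = np.random.default_rng(seed)
    m = len(y_train)
    predictions = []
    for _ in range(n_models):
        idx = rng.integers(0, m, size=m)  # m draws with replacement
        model = fit(X_train[idx], y_train[idx])
        predictions.append(model(X_new))
    return np.mean(predictions, axis=0)


X_train = np.arange(10, dtype=float).reshape(-1, 1)
y_train = np.array([0.0] * 5 + [10.0] * 5)
pred = bagging_predict(X_train, y_train, np.array([[1.0], [8.0]]), fit_stump)
```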
In the training stage, 13 features of each street under the street scale are used as independent variables to be input into a random forest model, the population number of each street is used as a value to be predicted and output by the random forest, and the model is trained. And in the prediction stage, taking 13 features of each building under the building scale as input of a trained random forest model, and predicting the population number of each building.
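The train-at-street-scale, predict-at-building-scale workflow maps onto scikit-learn's `RandomForestRegressor`. This is a sketch under assumptions, not the authors' code: the function name and the 13-column feature layout are hypothetical, and the arrays below are synthetic placeholders rather than data or results from the source.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def train_and_predict(street_X, street_population, building_X, n_estimators=200, seed=0):
    """Train on street-scale features, then predict building-scale population.

    street_X: (n_streets, 13) feature matrix; street_population: (n_streets,)
    building_X: (n_buildings, 13), the same 13 features computed per building.
    """
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    model.fit(street_X, street_population)
    return model.predict(building_X)


# Synthetic placeholder data only; these are NOT results from the source.
rng = np.random.default_rng(42)
street_X = rng.random((60, 13))
street_population = 5000.0 * street_X[:, 10] + 1000.0
building_X = rng.random((5, 13))
building_population = train_and_predict(street_X, street_population, building_X, n_estimators=50)
```

Because a regression forest averages training targets, its raw predictions stay within the range of the street-level populations it was trained on, so the features must be defined identically at both scales for the transfer to be meaningful.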
The method and system for estimating the spatial distribution of the population based on the building scale according to the present invention are described in an embodiment below.
Taking the estimation of population distribution in a certain city as an example, the population quantity of residential buildings in the certain city is estimated by using multi-source spatial data such as building vector data, functional area dividing data, university campus boundary data, POI data and population distribution data in the certain city.
For example, the functional area classification data of a city may be divided into 12 types according to actual needs, as shown in the following table:
ID  Class             Definition
1   Woodland          Forest land, grassland, etc.
2   Water             Natural and artificial water bodies
3   Undeveloped       Undeveloped land and bare soil in towns and villages
4   Transportation    Urban roads, transportation facilities, etc.
5   Green space       Parks and public recreational land, protective green space
6   Industrial        Industrial, mining and storage land
7   Institutional     Administration, culture, education, sports, health, etc.
8   Commercial        Business, entertainment, etc.
9   Residence 1       Low-rise residences
10  Residence 2       Multi-storey, mid-rise and high-rise residences
11  Residence 3       Shantytown areas, rural homesteads, etc.
12  Agricultural      Farmland, paddy fields, orchards, etc.
Step one: obtain the building data of the city, such as its urban functional zone data, building vector data, university campus data and city planning data.
Step two: classify the city's residential buildings into four categories using the acquired building data. The categories are single residential buildings, common residential buildings, dense residential buildings and rural residential buildings (Fig. 4, where panels a to d show remote sensing images of the four residential types in that order). Fig. 4 shows large differences between residential buildings and strong spatial heterogeneity in their population distribution, which indicates that classifying residential buildings is necessary.
Step three: calculate the living-environment indices of each unit from the city's POI density data and vegetation coverage data, and calculate the total floor area of each building.
The density of each POI type is calculated by kernel density analysis on a grid with a resolution of 30 m. The POI density of each unit (a street or a building) is the average POI density of the grid cells it intersects, weighted by the intersection area:
$$P_i = \frac{\sum_{j=1}^{n} p_j c_j}{\sum_{j=1}^{n} c_j}$$
where $P_i$ is the POI density of the i-th unit, $n$ is the number of grid cells intersecting the i-th unit, $p_j$ is the POI density of the j-th grid cell intersecting unit i, and $c_j$ is the area of the intersection between unit i and the j-th grid cell.
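A minimal Python sketch of this averaging step follows. It assumes the average is weighted by the overlap areas $c_j$ defined above, because the original formula is an image. It also assumes that the grid densities $p_j$ and overlaps $c_j$ for a unit have already been obtained, for example from a GIS intersection. The function name `unit_poi_density` is hypothetical.

```python
def unit_poi_density(grid_densities, overlap_areas):
    """Area-weighted mean POI density of one unit (a street or a building).

    grid_densities: p_j, POI density of each 30 m grid cell j intersecting the unit.
    overlap_areas:  c_j, area of the intersection between the unit and cell j.
    """
    if len(grid_densities) != len(overlap_areas):
        raise ValueError("p_j and c_j must have the same length")
    total_overlap = sum(overlap_areas)
    if total_overlap <= 0:
        raise ValueError("the unit does not overlap any grid cell")
    weighted_sum = sum(p * c for p, c in zip(grid_densities, overlap_areas))
    return weighted_sum / total_overlap


# A building footprint straddling three grid cells (overlaps in square metres).
print(unit_poi_density([2.0, 4.0, 6.0], [100.0, 300.0, 100.0]))  # 4.0
```

For a unit that lies wholly inside the grid, the sum of the $c_j$ equals the unit's own area. Dividing by the unit area instead would therefore give the same result.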
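The Pearson correlation coefficient between each POI type and population density is calculated as follows:
$$r = \frac{\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{N}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{N}(Y_i-\bar{Y})^2}}$$
where $X_i$ is the POI density of the i-th unit, $Y_i$ is the population density of the i-th unit, $\bar{X}$ and $\bar{Y}$ are their means over all $N$ units, and $r$ is the correlation coefficient.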
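The correlation can be checked with a few lines of standard-library Python. The sketch below computes r for each POI type against unit population density and ranks the types by |r|. This is one plausible way to use the coefficient when choosing POI indices. The ranking helper, the dictionary layout and the toy values are assumptions, not taken from the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def rank_poi_types(poi_density_by_type, population_density):
    """Return (poi_type, r) pairs sorted by |r|, strongest correlation first."""
    scores = [(name, pearson(values, population_density))
              for name, values in poi_density_by_type.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Toy data for four units (illustrative values only).
population = [10.0, 20.0, 30.0, 40.0]
poi = {"park": [4.0, 1.0, 3.0, 2.0], "catering": [1.0, 2.0, 3.0, 4.0]}
for name, r in rank_poi_types(poi, population):
    print(f"{name}: r = {r:.2f}")
```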
The vegetation coverage area ratio of each unit is calculated as follows:
$$G_i = \frac{a_i}{b_i}$$
where $G_i$ is the vegetation coverage area ratio of the i-th unit, $a_i$ is the vegetation area of the i-th unit, and $b_i$ is the total area of the i-th unit.
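If vegetation is available as a classified raster rather than as polygons, the ratio can be estimated by counting pixels. The sketch below assumes two co-registered boolean grids with the same pixel size. One grid marks the pixels inside the unit and the other marks vegetation pixels. Both the raster representation and the function name are assumptions.

```python
def vegetation_coverage_ratio(unit_mask, vegetation_mask):
    """Ratio a_i / b_i from two co-registered boolean rasters of equal pixel size.

    unit_mask[r][c] is truthy where the pixel lies inside unit i;
    vegetation_mask[r][c] is truthy where the pixel is classified as vegetation.
    The pixel area cancels out, so pixel counts are enough.
    """
    if len(unit_mask) != len(vegetation_mask) or any(
            len(u) != len(v) for u, v in zip(unit_mask, vegetation_mask)):
        raise ValueError("rasters must have the same shape")
    unit_pixels = 0
    vegetation_pixels = 0
    for unit_row, veg_row in zip(unit_mask, vegetation_mask):
        for inside, is_vegetation in zip(unit_row, veg_row):
            if inside:
                unit_pixels += 1
                if is_vegetation:
                    vegetation_pixels += 1
    if unit_pixels == 0:
        raise ValueError("the unit covers no pixels")
    return vegetation_pixels / unit_pixels


unit = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
vegetation = [[1, 0, 1],
              [1, 0, 0],
              [0, 0, 1]]
print(vegetation_coverage_ratio(unit, vegetation))  # 0.5
```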
The building attributes are then calculated. The main attribute is the total floor area of each building, obtained from the building vector data as follows:
$$A_i = a_i f_i$$
where $A_i$ is the total floor area of the building, $a_i$ is its footprint area (the area of one floor), and $f_i$ is its number of floors.
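The floor-area formula and the per-street totals by residential category might be computed as follows. These totals feed indices (11) to (13) listed below. The building record layout (`street`, `category`, `footprint`, `floors`) is hypothetical, and footprint areas are assumed to be in square metres.

```python
from collections import defaultdict


def total_floor_area(footprint_area, floors):
    """A_i = a_i * f_i: footprint area times number of floors."""
    if footprint_area < 0 or floors < 1:
        raise ValueError("need a non-negative footprint and at least one floor")
    return footprint_area * floors


def floor_area_by_street_and_category(buildings):
    """Sum A_i over buildings, keyed by (street, residential category)."""
    totals = defaultdict(float)
    for b in buildings:
        key = (b["street"], b["category"])
        totals[key] += total_floor_area(b["footprint"], b["floors"])
    return dict(totals)


# Hypothetical records: footprint in square metres, floors as an integer.
buildings = [
    {"street": "S1", "category": "common", "footprint": 120.0, "floors": 6},
    {"street": "S1", "category": "common", "footprint": 80.0, "floors": 5},
    {"street": "S1", "category": "single", "footprint": 150.0, "floors": 2},
]
print(floor_area_by_street_and_category(buildings))
```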
Because the random forest model is trained on street-scale data but applied at the building scale, each of its indices must be calculated at both unit scales. Using the formulas above, each unit is described by 13 indices of its living environment:
calculated from POI data: (1) park density; (2) research institution density; (3) catering density; (4) tourist attraction density; (5) recreation density; (6) landmark density; (7) market density; (8) traffic service quality; (9) public service quality;
calculated from vegetation coverage data: (10) vegetation coverage ratio;
calculated from urban functional zone data and related data: (11) single residential building area; (12) common residential building area; (13) dense residential building area and rural residential building area.
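To feed these indices to a model, each unit's values must be laid out in one fixed column order at both scales. The sketch below does this. The feature names are our own labels. Item (13) names dense and rural residential area together, so the sketch sums the two; this is an interpretation rather than something the text states.

```python
FEATURE_NAMES = [
    "park_density", "research_institution_density", "catering_density",
    "tourist_attraction_density", "recreation_density", "landmark_density",
    "market_density", "traffic_service_quality", "public_service_quality",
    "vegetation_coverage_ratio",
    "single_residential_area", "common_residential_area",
    "dense_and_rural_residential_area",
]


def feature_vector(poi_indices, vegetation_ratio, category_areas):
    """Lay out the 13 indices of one unit (street or building) in a fixed order.

    poi_indices: dict keyed by the first nine FEATURE_NAMES.
    category_areas: total floor area per residential category
        ("single", "common", "dense", "rural"); missing categories count as 0.
    """
    vector = [float(poi_indices[name]) for name in FEATURE_NAMES[:9]]
    vector.append(float(vegetation_ratio))
    vector.append(float(category_areas.get("single", 0.0)))
    vector.append(float(category_areas.get("common", 0.0)))
    # Item (13) lists dense and rural areas together; summing them is an assumption.
    vector.append(float(category_areas.get("dense", 0.0))
                  + float(category_areas.get("rural", 0.0)))
    return vector


poi = {name: 0.1 * k for k, name in enumerate(FEATURE_NAMES[:9])}
print(feature_vector(poi, 0.25, {"common": 1120.0, "dense": 400.0, "rural": 60.0}))
```

Building the same vector at both the street scale and the building scale means the trained model sees identical columns in training and prediction.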
Step four: build a random forest model, train and test it on street-scale data, and use it to predict population density at the building scale.
Specifically, the random forest model is implemented in Python or R. For each street in the city, the 13 features computed at the street scale serve as the model's independent variables, and the street's population distribution serves as its dependent variable. The model parameters are tuned during training, and the final trained model is saved.
In the prediction stage, the 13 features of each building in the city, computed at the individual-building scale, are input to the trained model as independent variables, and the model predicts the population of each building.
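The training and prediction stages can be sketched with scikit-learn's `RandomForestRegressor`. The text gives no hyperparameters, so `n_estimators=200` and the fixed seed are placeholders. The sketch also makes two assumptions of its own. First, the street-scale target is population density (persons per square metre of residential floor area). Second, a building's head count is obtained by multiplying its predicted density by its total floor area $A_i$. The synthetic data at the end are for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_street_model(street_features, street_density, n_estimators=200, seed=0):
    """Fit a random forest on street-scale data.

    street_features: array of shape (n_streets, 13), one row per street.
    street_density:  population density of each street (assumed target).
    """
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    model.fit(np.asarray(street_features, dtype=float),
              np.asarray(street_density, dtype=float))
    return model


def estimate_building_population(model, building_features, building_floor_area):
    """Predict each building's density, then scale by its total floor area A_i."""
    density = model.predict(np.asarray(building_features, dtype=float))
    density = np.clip(density, 0.0, None)  # population cannot be negative
    return density * np.asarray(building_floor_area, dtype=float)


# Synthetic illustration: 60 streets and 5 buildings with 13 random features each.
rng = np.random.default_rng(0)
street_X = rng.random((60, 13))
street_y = 0.02 + 0.05 * street_X[:, 2]  # made-up density, persons per m^2
model = fit_street_model(street_X, street_y)
building_X = rng.random((5, 13))
print(estimate_building_population(model, building_X, [500, 800, 1200, 300, 950]))
```

Regressing a density rather than a raw count matters here. A random forest's prediction is an average of training targets, so a model fitted to street totals would return street-sized numbers for individual buildings.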
Through the above steps, a mean absolute percentage error (MAPE) of 19% is finally obtained, indicating good performance. Fig. 5 shows the resulting population density estimates for single residential buildings, common residential buildings, dense residential buildings and rural residential buildings, respectively.
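For reference, MAPE is the mean of |actual - predicted| / actual over the evaluated units, expressed as a percentage. The text does not say which units the 19% was computed on. The sketch below skips units whose actual population is zero, because the ratio is undefined for them; that handling is an assumption.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent; units with zero actual value are skipped."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no units with a non-zero actual value")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


print(mape([100, 200, 0], [110, 180, 5]))
```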
The building-scale-based population spatial distribution estimation system provided by the present invention is described below. The system described below and the method described above correspond to each other and may be cross-referenced.
Fig. 6 is a schematic structural diagram of the building-scale-based population spatial distribution estimation system provided by the present invention. As shown in Fig. 6, the system 600 includes a classification module 610, a calculation module 620 and an estimation module 630, wherein:
the classification module 610 is configured to obtain classification data of a building, the classification data describing the category of the building;
the calculation module 620 is configured to calculate the total area of the building according to the classification data;
and the estimation module 630 is configured to construct a random forest model and estimate the population at the building scale according to the total area of the building.
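One way to wire the three modules together is shown below as a hypothetical Python skeleton. The class name and the callable signatures are assumptions. Each module is injected as a plain function, so a trained regressor can be swapped in for the estimation step. The per-category density table in the usage lines is a toy stand-in for the random forest, not the described model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class PopulationEstimationSystem:
    """Hypothetical wiring of modules 610, 620 and 630."""

    classify: Callable[[dict], Optional[str]]  # classification module 610
    total_area: Callable[[dict], float]        # calculation module 620
    estimate: Callable[[str, float], float]    # estimation module 630

    def run(self, buildings: Sequence[dict]) -> List[Optional[float]]:
        """Estimate population per building; non-residential buildings yield None."""
        results: List[Optional[float]] = []
        for building in buildings:
            category = self.classify(building)
            if category is None:
                results.append(None)
                continue
            area = self.total_area(building)
            results.append(self.estimate(category, area))
        return results


# Toy modules: a given category, A_i = a_i * f_i, and a made-up density per category.
DENSITY = {"single": 0.01, "common": 0.02, "dense": 0.04, "rural": 0.015}
system = PopulationEstimationSystem(
    classify=lambda b: b.get("category"),
    total_area=lambda b: b["footprint"] * b["floors"],
    estimate=lambda category, area: DENSITY[category] * area,
)
print(system.run([{"category": "common", "footprint": 100.0, "floors": 5},
                  {"category": None, "footprint": 300.0, "floors": 1}]))
```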
Preferably, the classification module 610 is further configured to perform the following steps:
overlaying the acquired city planning data and building vector data to obtain building data, wherein the building data includes urban functional zone data;
classifying the building data to obtain urban residential data and rural residential data;
and classifying the urban residential data based on the urban functional zone data to obtain single residential building data, common residential building data and dense residential building data.
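The classification steps above can be sketched as a rule on each building's functional zone class and an urban/rural flag. Two parts of the sketch are assumptions inferred from the class definitions. The first is the mapping of residential classes 9, 10 and 11 from the 12-class table to single, common and dense buildings. The second is that each building has already been tagged with the zone it falls in, for example by a spatial join on its centroid.

```python
RESIDENTIAL_ZONE_TO_CATEGORY = {
    9: "single",   # Residential 1: low-rise residences
    10: "common",  # Residential 2: multi-storey, mid-rise and high-rise residences
    11: "dense",   # Residential 3: shantytown areas, rural homesteads, etc.
}


def classify_building(zone_id, is_urban):
    """Return the residential category of a building, or None if it is not residential.

    zone_id:  functional zone class (1-12) of the zone the building falls in.
    is_urban: True if the city planning data places the building in an urban area.
    """
    if zone_id not in RESIDENTIAL_ZONE_TO_CATEGORY:
        return None
    if not is_urban:
        return "rural"
    return RESIDENTIAL_ZONE_TO_CATEGORY[zone_id]


cases = [(9, True), (10, True), (11, True), (11, False), (8, True)]
print([classify_building(z, u) for z, u in cases])
```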
Preferably, the urban functional zone data includes one or a combination of city planning data, building vector data, university campus area data and urban functional zone product data.
Preferably, the classification module 610 is further configured to perform the following steps:
acquiring satellite remote sensing data corresponding to the building;
and obtaining classification data of the building based on the remote sensing data and according to the classification characteristics of the building.
Preferably, the calculating module 620 is further configured to perform the following steps:
calculating the POI density on the grid of the classified data, wherein the calculation formula is as follows:
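$$P_i = \frac{\sum_{j=1}^{n} p_j c_j}{\sum_{j=1}^{n} c_j}$$
where $P_i$ is the POI density of the i-th unit, $n$ is the number of grid cells intersecting the i-th unit, $p_j$ is the POI density of the j-th grid cell intersecting unit i, and $c_j$ is the area of the intersection between unit i and the j-th grid cell;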
calculating a vegetation coverage area ratio based on the POI density, wherein the calculation formula is as follows:
$$G_i = \frac{a_i}{b_i}$$
where $G_i$ represents the vegetation coverage area ratio of the i-th unit, $a_i$ represents the vegetation area of the i-th unit, i.e. the floor area of the building, and $b_i$ represents the total area of the i-th unit;
calculating the total area of the building according to the vegetation coverage area ratio, wherein the calculation formula is as follows:
$$A_i = a_i f_i$$
where $A_i$ is the total area of the building, $a_i$ is the footprint area of the building (the area of one floor), and $f_i$ is the number of floors in the building.
Preferably, the calculation module 620 calculates the density of each POI type by kernel density analysis on a grid with a resolution of 30 m, the POI density of each unit being the average POI density of the grid cells it intersects.
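A pure-Python sketch of kernel density on a 30 m grid is given below. The text names neither the kernel nor the search radius, so the quartic (biweight) kernel and the 500 m default bandwidth are assumptions. Coordinates are assumed to be in a projected system in metres, and the result is in POIs per square metre.

```python
import math


def kernel_density_grid(points, x0, y0, ncols, nrows, cell=30.0, bandwidth=500.0):
    """Quartic-kernel POI density (POIs per square metre) at each grid-cell centre.

    points:    (x, y) POI coordinates in a projected CRS, in metres.
    x0, y0:    lower-left corner of the grid; row 0 is the bottom row.
    cell:      grid resolution in metres (30 m in the text).
    bandwidth: kernel search radius in metres (assumed value).
    """
    h2 = bandwidth ** 2
    norm = 3.0 / (math.pi * h2)  # makes each POI's kernel integrate to 1
    grid = [[0.0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        cy = y0 + (r + 0.5) * cell
        for c in range(ncols):
            cx = x0 + (c + 0.5) * cell
            total = 0.0
            for px, py in points:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                if d2 < h2:
                    total += (1.0 - d2 / h2) ** 2
            grid[r][c] = norm * total
    return grid


# One POI at the origin, 40 x 40 cells of 30 m centred on it, 300 m bandwidth.
density = kernel_density_grid([(0.0, 0.0)], -600.0, -600.0, 40, 40, bandwidth=300.0)
mass = sum(map(sum, density)) * 30.0 * 30.0
print(round(mass, 3))  # expected to be close to 1, since each POI's kernel integrates to 1
```

The brute-force loop is fine for a toy example. A whole city would need a spatial index or a raster library.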
Preferably, the calculating module 620 is further configured to perform the following steps:
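calculating a Pearson correlation coefficient between the POI density and the population density, wherein the calculation formula is as follows:
$$r = \frac{\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{N}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{N}(Y_i-\bar{Y})^2}}$$
where $X_i$ represents the point-of-interest (POI) density of the i-th unit, $Y_i$ represents the population density of the i-th unit, $\bar{X}$ and $\bar{Y}$ are their means over all $N$ units, and $r$ is the Pearson correlation coefficient;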
calculating a vegetation coverage area ratio based on the Pearson correlation coefficient;
wherein the Pearson correlation coefficient represents the degree of correlation between the POI density and the population density.
Fig. 7 is a schematic diagram of the physical structure of an electronic device. As shown in Fig. 7, the electronic device may include a processor 710, a communications interface 720, a memory 730 and a communication bus 740. The processor 710, the communications interface 720 and the memory 730 communicate with one another through the communication bus 740. The processor 710 may invoke logic instructions in the memory 730 to perform the building-scale-based population spatial distribution estimation method, comprising:
obtaining classification data of a building, the classification data describing a category of the building;
calculating a total area of the building from the classification data;
and constructing a random forest algorithm model, and estimating the population number on the building scale according to the total area of the building.
In addition, when the logic instructions in the memory 730 are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program comprises program instructions which, when executed by a computer, enable the computer to perform the building-scale-based population spatial distribution estimation method described above, comprising:
obtaining classification data of a building, the classification data describing a category of the building;
calculating a total area of the building from the classification data;
and constructing a random forest algorithm model, and estimating the population number on the building scale according to the total area of the building.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the building-scale-based population spatial distribution estimation method provided in the above aspects, comprising:
obtaining classification data of a building, the classification data describing a category of the building;
calculating a total area of the building from the classification data;
and constructing a random forest algorithm model, and estimating the population number on the building scale according to the total area of the building.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for estimating a spatial distribution of a population based on a building scale, comprising:
obtaining classification data of a building, the classification data describing a category of the building;
calculating a total area of the building from the classification data;
and constructing a random forest algorithm model, and estimating the population number on the building scale according to the total area of the building.
2. The building-scale-based population space distribution estimation method according to claim 1, wherein said obtaining classification data of buildings comprises:
superposing the acquired city planning data and building vector data to obtain building data, wherein the building data comprises city functional area data;
classifying the building data to obtain urban residence data and rural residence data;
and classifying the urban residential data based on the urban functional area data to obtain single residential building data, common residential building data and dense residential building data.
3. The building-scale-based demographic spatial distribution estimation method of claim 2, wherein the urban functional zone data comprises one or a combination of city planning data, building vector data, university campus zone data and urban functional zone product data.
4. The building-scale-based population space distribution estimation method according to claim 1, wherein the obtaining classification data of the building comprises:
obtaining remote sensing data corresponding to a building through a satellite;
and obtaining classification data of the building based on the remote sensing data and according to the classification characteristics of the building.
5. The building-scale-based population space distribution estimation method according to claim 1, wherein said calculating a total area of said building from said classification data comprises:
calculating the POI density on the grid of the classified data, wherein the calculation formula is as follows:
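$$P_i = \frac{\sum_{j=1}^{n} p_j c_j}{\sum_{j=1}^{n} c_j}$$
where $P_i$ is the POI density of the i-th unit, $n$ is the number of grid cells intersecting the i-th unit, $p_j$ is the POI density of the j-th grid cell intersecting unit i, and $c_j$ is the area of the intersection between unit i and the j-th grid cell;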
calculating a vegetation coverage area ratio based on the POI density, wherein the calculation formula is as follows:
$$G_i = \frac{a_i}{b_i}$$
where $G_i$ represents the vegetation coverage area ratio of the i-th unit, $a_i$ represents the vegetation area of the i-th unit, i.e. the floor area of the building, and $b_i$ represents the total area of the i-th unit;
calculating the total area of the building according to the vegetation coverage area ratio, wherein the calculation formula is as follows:
$$A_i = a_i f_i$$
where $A_i$ is the total area of the building, $a_i$ is the footprint area of the building (the area of one floor), and $f_i$ is the number of floors in the building.
6. The building-scale-based population spatial distribution estimation method of claim 5, wherein calculating the POI density on the grid of the classification data comprises:
respective point-of-interest POI densities are calculated using kernel density analysis on a mesh having a resolution of 30m, each point-of-interest POI density being an average point-of-interest POI density of the mesh with which it intersects.
7. The building-scale-based demographic spatial distribution estimation method of claim 5, wherein the calculating a total area of the building from the classification data further comprises:
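calculating a Pearson correlation coefficient between the POI density and the population density, wherein the calculation formula is as follows:
$$r = \frac{\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{N}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{N}(Y_i-\bar{Y})^2}}$$
where $X_i$ represents the point-of-interest (POI) density of the i-th unit, $Y_i$ represents the population density of the i-th unit, $\bar{X}$ and $\bar{Y}$ are their means over all $N$ units, and $r$ is the Pearson correlation coefficient;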
calculating a vegetation coverage area ratio based on the Pearson correlation coefficient;
wherein the Pearson correlation coefficient represents the degree of correlation between the POI density and the population density.
8. A building-scale-based demographic spatial distribution estimation system, comprising:
the classification module is used for acquiring classification data of the building, and the classification data is used for describing the category of the building;
a calculation module for calculating the total area of the building according to the classification data;
and the estimation module is used for constructing a random forest algorithm model and estimating the population number on the building scale according to the total area of the building.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the building-scale based demographic spatial distribution estimation method of any of claims 1 to 7.
10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, performs the steps of the building-scale based demographic spatial distribution estimation method of any of claims 1 to 7.
CN202110491470.2A 2021-05-06 2021-05-06 Population space distribution estimation method and system based on building scale Pending CN113191553A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110491470.2A CN113191553A (en) 2021-05-06 2021-05-06 Population space distribution estimation method and system based on building scale

Publications (1)

Publication Number Publication Date
CN113191553A true CN113191553A (en) 2021-07-30

Family

ID=76984167

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115238584A (en) * 2022-07-29 2022-10-25 湖南大学 Population distribution identification method based on multi-source big data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106708962A (en) * 2016-11-30 2017-05-24 中山大学 Urban population distribution method based on building properties
JP2019067224A (en) * 2017-10-03 2019-04-25 日本電気株式会社 Human flow pattern estimation system, human flow pattern estimation method, and human flow pattern estimation program
CN109978249A (en) * 2019-03-19 2019-07-05 广州大学 Population spatial distribution method, system and medium based on two-zone model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210730