CN110689055B - Cross-scale statistical index spatialization method considering grid unit attribute grading - Google Patents
- Publication number
- CN110689055B (application number CN201910854444.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention discloses a cross-scale spatialization method for statistical indicators that takes the grading of grid-cell attributes into account. First, at the coarse-grained administrative-unit scale, the correlation between the statistical indicator to be spatialized and multi-source data is analyzed, and the data that correlate strongly with the indicator are selected as auxiliary modeling data. Next, the grid-cell statistics of each type of auxiliary modeling data are divided into levels with a grading method, and the optimal number of levels is determined for each type. Then, at the administrative-unit scale, level-proportion feature vectors are constructed and fed into a regression model for training. After that, at the fine-grained grid-cell scale, a feature vector is constructed for each grid cell according to the optimal grading of each type of auxiliary data and fed into the regression model to obtain the statistical-indicator weight of each grid cell. Finally, the total value of the statistical indicator within each administrative unit is distributed among its grid cells in proportion to these weights, yielding the final grid-cell values. The method of the invention can greatly improve prediction accuracy.
Description
Technical Field
The invention relates to the field of geographic information science, including economic geography, population geography and environmental geography, and in particular to a cross-scale spatialization method for statistical indicators that takes the grading of grid-cell attributes into account.
Background
Spatialization of statistical indicators aims to reproduce the spatial distribution of a statistical indicator on a geographic grid or another partition (for example hexagons, buildings or communities). It usually converts the spatial representation of the indicator from coarse-grained administrative units to a fine-grained geographic grid. Spatialization has been studied extensively for data such as population, GDP and grain output. It has important scientific significance and broad application prospects for describing the spatial distribution of statistical indicators in detail, supporting the rational allocation of resources and guiding government decision-making.
In the course of implementing the invention, the inventors found that prior-art methods have at least the following technical problem:
Because statistical indicators lack training data at the grid scale, traditional spatialization usually establishes the relationship between modeling factors and the statistical indicator at the administrative-unit level and then transfers this relationship from administrative units to grid cells. The two scales, however, differ by orders of magnitude, which causes a downscaling problem in model transfer and results in low spatialization accuracy.
Prior-art methods therefore suffer from the technical problem of low accuracy.
Summary of the Invention
In view of this, the invention provides a cross-scale spatialization method for statistical indicators that takes the grading of grid-cell attributes into account, in order to solve, or at least partially solve, the low-accuracy problem of prior-art methods.
To solve the above technical problem, the invention provides a cross-scale spatialization method for statistical indicators that takes the grading of grid-cell attributes into account, comprising:
Step S1: at the coarse-grained administrative-unit scale, analyze the correlation between the statistical indicator to be spatialized and multi-source data, and select from the multi-source data those whose correlation with the indicator reaches a preset degree as auxiliary modeling data;
Step S2: divide the grid-cell statistics of each type of auxiliary modeling data into levels with a grading method, and determine the optimal grading result for each type by means of a grading evaluation index;
Step S3: at the coarse-grained administrative-unit scale, count the proportion of grid cells at each level for each type of auxiliary modeling data, construct level-proportion feature vectors, and feed them into a regression model for training to obtain a trained regression model;
Step S4: at the fine-grained grid-cell scale, construct a feature vector for each grid cell according to the optimal grading result of each type of auxiliary data, and feed it into the trained regression model to obtain the statistical-indicator weight of each grid cell;
Step S5: normalize the statistical-indicator weights of the grid cells contained in each administrative unit, and distribute the total value of the statistical indicator of the administrative unit among its grid cells in proportion to the weights to obtain the final grid-cell values.
In one embodiment, step S1 specifically comprises:
Step S1.1: for all n administrative units, compute the attribute values of m types of multi-source data that can be quantified at the grid-cell scale, where n and m are positive integers;
Step S1.2: compute the Pearson correlation coefficient, at the administrative-unit level, between the statistical indicator to be spatialized and the attribute values of each type of multi-source data;
Step S1.3: select the multi-source data whose correlation coefficient is greater than a threshold T as the final M types of auxiliary modeling data, where M is a positive integer.
In one embodiment, step S2 specifically comprises:
Step S2.1: compute the quantified values of the M types of auxiliary modeling data on all N grid cells, so that each grid cell has M statistics;
Step S2.2: for the grid-cell statistics of the t-th type of auxiliary modeling data, divide the grid cells into levels with a grading method, and measure the result of each grading with a preset evaluation index to determine the optimal grading result.
In one embodiment, step S3 specifically comprises:
Step S3.1: at the coarse-grained administrative-unit scale, count the proportion of grid cells at each level for each type of auxiliary modeling data and construct the level-proportion feature vector $\beta_i$:
$$\beta_i=\left(\frac{N_{i,1}^{1}}{N_i},\ \ldots,\ \frac{N_{i,t}^{k}}{N_i},\ \ldots,\ \frac{N_{i,M}^{C_M}}{N_i}\right)$$
where $N_i$ is the total number of grid cells in the i-th administrative unit, $N_{i,t}^{k}$ (written here with this symbol) is the number of those grid cells that belong to the k-th level of the t-th type of auxiliary modeling data, and $C_t$ is the optimal number of levels of the t-th type. Each administrative unit corresponds to one feature vector, which contains the proportions of all levels of all types of auxiliary modeling data;
Step S3.2: treat each administrative unit as one sample, use its level-proportion feature vector as the input and the statistical indicator to be spatialized as the output, and train a random forest regression model to obtain the trained random forest regression model.
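Step S3.1 can be illustrated with a minimal Python sketch. The function name `level_share_vector`, the 0-based level indices and the toy data are assumptions made for illustration and do not come from the patent; the random forest training of step S3.2 is not shown.

```python
def level_share_vector(cell_levels, level_counts):
    """Build the level-proportion vector of one administrative unit.

    cell_levels  : one tuple per grid cell in the unit; entry t is the
                   (0-based, hypothetical) level of the cell for data type t.
    level_counts : number of levels C_t chosen for each data type t.
    """
    n_cells = len(cell_levels)
    vector = []
    for t, c_t in enumerate(level_counts):
        counts = [0] * c_t
        for levels in cell_levels:
            counts[levels[t]] += 1          # cells of the unit at this level
        vector.extend(count / n_cells for count in counts)
    return vector


# Toy unit with 4 grid cells and 2 auxiliary data types (2 and 3 levels).
cells = [(0, 1), (0, 2), (1, 2), (0, 0)]
beta = level_share_vector(cells, [2, 3])
print(beta)
```

Each block of the vector sums to 1 because every grid cell falls into exactly one level per data type, so units with very different cell counts remain comparable.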
In one embodiment, step S4 specifically comprises:
Step S4.1: for each grid cell, determine the level to which the cell belongs in each type of auxiliary modeling data according to the optimal grading result, and construct the feature vector of the grid cell from these levels;
Step S4.2: feed the feature vectors of all grid cells into the trained random forest regression model, which outputs the statistical-indicator weight of each grid cell.
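The text does not spell out the layout of the grid-cell feature vector. One plausible reading, assumed here, is that a grid cell is treated as an administrative unit containing a single cell, so each block of its vector has a 1 at the cell's level and 0 elsewhere. The sketch below shows that reading; `cell_feature_vector` and the 0-based levels are hypothetical names and conventions.

```python
def cell_feature_vector(levels, level_counts):
    """Feature vector of one grid cell under the one-hot assumption.

    levels       : (0-based, hypothetical) level of the cell per data type.
    level_counts : number of levels C_t chosen for each data type.
    """
    vector = []
    for level, c_t in zip(levels, level_counts):
        block = [0.0] * c_t
        block[level] = 1.0      # the cell occupies exactly one level
        vector.extend(block)
    return vector


# A cell at level 1 of a 2-level data type and level 2 of a 3-level type.
features = cell_feature_vector((1, 2), [2, 3])
print(features)
```

Under this assumption the grid-cell vector has the same length and meaning as the administrative-unit vector, which is what allows a model trained on units to be applied to cells.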
In one embodiment, the distribution in step S5 of the total value of the statistical indicator of an administrative unit among its grid cells according to the weights is computed as follows:
$$SI_{ij}=SI_i\times\frac{W_{ij}}{\sum_{u=1}^{N_i}W_{iu}}$$
where i denotes the i-th administrative unit, j denotes the j-th grid cell, $SI_{ij}$ is the final statistical-indicator value of the j-th grid cell of the i-th administrative unit, $SI_i$ is the total value of the statistical indicator to be spatialized in the i-th administrative unit, $W_{ij}$ and $W_{iu}$ are the weights of the j-th and u-th grid cells of the i-th administrative unit, and $N_i$ is the total number of grid cells in the i-th administrative unit.
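The weighted allocation of step S5 can be sketched in a few lines of Python. The function name `allocate` and the example numbers are illustrative assumptions; the weights would come from the regression model in practice.

```python
def allocate(unit_total, weights):
    """Distribute an administrative-unit total among its grid cells.

    Each cell receives unit_total * W_ij / sum_u(W_iu), so the cell
    values add up to the unit total again.
    """
    weight_sum = sum(weights)
    if weight_sum == 0:
        raise ValueError("all weights are zero; the total cannot be allocated")
    return [unit_total * w / weight_sum for w in weights]


# Hypothetical unit with a population of 1000 and four grid cells.
cell_values = allocate(1000, [1, 3, 4, 2])
print(cell_values)
```

Because the weights are normalized within each administrative unit, the grid-cell values always sum to the reported unit total, whatever the absolute scale of the model output.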
In one embodiment, the indicators to be spatialized include, but are not limited to, population and grain output, and distributing the total value of the statistical indicator of an administrative unit among its grid cells according to the weights to obtain the final grid-cell values comprises:
distributing the population and grain statistics of the administrative unit among its grid cells according to the weights to obtain the final population and grain output of each grid cell.
In one embodiment, the method further comprises:
after the grid-cell statistical values are obtained, aggregating the predicted grid-cell values over the administrative units one level below the scale used for model training, and comparing the aggregates with the actual statistical-indicator values of those units to verify the accuracy.
In one embodiment, the accuracy is verified as follows:
The error is measured with the mean absolute error (MAE) and the root mean square error (RMSE):
$$MAE=\frac{1}{n'}\sum_{i=1}^{n'}\left|SI_i^{pred}-SI_i^{true}\right|,\qquad RMSE=\sqrt{\frac{1}{n'}\sum_{i=1}^{n'}\left(SI_i^{pred}-SI_i^{true}\right)^2}$$
where $SI_i^{pred}$ is the predicted statistical-indicator value of the i-th next-level administrative unit, $SI_i^{true}$ is its true statistical-indicator value, and $n'$ is the number of next-level administrative units.
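The validation step can be sketched as follows: grid-cell predictions are summed per next-level administrative unit and compared with the reported values using MAE and RMSE. All names and numbers below are hypothetical and serve only to illustrate the calculation.

```python
from collections import defaultdict
from math import sqrt


def aggregate_by_unit(cell_values, cell_units):
    """Sum predicted grid-cell values per next-level administrative unit."""
    totals = defaultdict(float)
    for value, unit in zip(cell_values, cell_units):
        totals[unit] += value
    return dict(totals)


def mae(pred, true):
    """Mean absolute error between paired predictions and observations."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred, true):
    """Root mean square error between paired predictions and observations."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


# Hypothetical grid-cell predictions and the sub-unit each cell lies in.
cell_values = [60.0, 50.0, 90.0, 120.0, 85.0]
cell_units = ["A", "A", "B", "C", "C"]
predicted = aggregate_by_unit(cell_values, cell_units)

# Hypothetical reported values for the same sub-units.
observed = {"A": 100.0, "B": 100.0, "C": 200.0}
units = sorted(observed)
pred = [predicted[u] for u in units]
true = [observed[u] for u in units]
print(mae(pred, true), rmse(pred, true))
```

RMSE penalizes large errors in individual units more strongly than MAE, so reporting both gives a fuller picture of the error distribution.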
One or more of the above technical solutions in the embodiments of this application have at least one or more of the following technical effects:
In the cross-scale spatialization method provided by the invention, the correlation between the statistical indicator to be spatialized and multi-source data (which can be quantified at the grid-cell scale) is first analyzed at the coarse-grained administrative-unit scale (for example, districts and counties), and the data that correlate strongly with the indicator are selected as auxiliary modeling data. The grid-cell statistics of each type of auxiliary modeling data are then divided into levels with a grading method, and the optimal grading result for each type is determined by a grading evaluation index. Next, at the administrative-unit scale, the proportion of grid cells at each level is counted for each type of auxiliary modeling data, and level-proportion feature vectors are constructed and fed into a regression model for training. After that, at the fine-grained grid-cell scale, a feature vector is constructed for each grid cell according to the optimal grading of each type of auxiliary data and fed into the regression model to obtain the statistical-indicator weight of each grid cell. Finally, the weights of the grid cells contained in each administrative unit are normalized, and the total value of the statistical indicator of the unit is distributed among its grid cells in proportion to the weights to obtain the final grid-cell values.
The method provided by the invention unifies the regression modeling of statistical-indicator spatialization at the grid scale. It thereby avoids the cross-scale problem of existing spatialization methods, which, lacking grid-scale training data, train directly on coarse-grained administrative-unit data and transfer the model to fine-grained grid cells. The method takes the level distribution of grid-cell attributes into account, which makes the model transfer smoother. Its accuracy is higher than that of traditional spatialization methods, and it offers a new solution for spatializing statistical indicators based on multi-source data fusion in the context of big data, in particular indicators such as population, GDP, grain output and meteorological and climatic variables.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the cross-scale spatialization method for statistical indicators that takes the grading of grid-cell attributes into account, as provided by the invention;
FIG. 2 is a technical roadmap of the method provided by the invention;
FIG. 3 is a schematic diagram of the calculation flow of the method of the invention;
FIG. 4 is a flowchart of the cross-scale feature extraction method of the invention;
FIG. 5 compares, for a specific example, the street-level mean absolute error of the Wuhan population spatialization results based on POI data of high, medium and low correlation (grid cell sizes of 1000 m, 500 m and 200 m);
FIG. 6 compares the street-level mean absolute error of the Wuhan population spatialization results obtained with gradings determined by the DBI and with arbitrary gradings (grid cell sizes of 1000 m, 500 m and 200 m, errors sorted from low to high);
FIG. 7 compares the street-level mean absolute error of the Wuhan population spatialization results of the traditional method and of the method of the invention (grid cell sizes of 1000 m, 500 m and 200 m);
FIG. 8 compares the Wuhan population spatialization results and error distributions of the traditional method and of the method of the invention (200 m grid cell size).
Detailed Description
Statistical indicators lack training data at the grid scale, and directly transferring the relationship learned on administrative units to the grid lowers the spatialization accuracy because of downscaling. The purpose of the invention is to address this problem by providing a cross-scale spatialization method for statistical indicators that takes the grading of grid-cell attributes into account.
To achieve this purpose, the main idea of the invention is as follows:
First, at the coarse-grained administrative-unit scale (for example, districts and counties), the correlation between the statistical indicator to be spatialized and multi-source data (which can be quantified at the grid-cell scale) is analyzed, and the data that correlate strongly with the indicator are selected as auxiliary modeling data. Second, the grid-cell statistics of each type of auxiliary modeling data are divided into levels with a grading method, and the optimal number of levels for each type is determined by a grading evaluation index. Third, at the administrative-unit scale, the proportion of grid cells at each level is counted for each type of auxiliary modeling data, and level-proportion feature vectors are constructed and fed into a regression model for training. Fourth, at the fine-grained grid-cell scale, a feature vector is constructed for each grid cell according to the optimal grading of each type of auxiliary data and fed into the regression model to obtain the statistical-indicator weight of each grid cell. Fifth, the weights of the grid cells contained in each administrative unit are normalized, and the total value of the statistical indicator of the unit is distributed among its grid cells in proportion to the weights to obtain the final grid-cell values. For validation, the predicted grid-cell values are aggregated over the administrative units one level below the scale used for model training (for example, streets) and compared with the true statistical-indicator values of those units.
The invention unifies the regression modeling of statistical-indicator spatialization at the grid scale. It thereby avoids the cross-scale problem of existing spatialization methods, which, lacking grid-scale training data, train directly on coarse-grained administrative-unit data and transfer the model to fine-grained grid cells. The method takes the level distribution of grid-cell attributes into account, which makes the model transfer smoother. Its accuracy is higher than that of traditional spatialization methods, and it offers a new solution for spatializing statistical indicators based on multi-source data fusion in the context of big data, in particular indicators such as population, GDP, grain output and meteorological and climatic variables.
To make the purposes, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
This embodiment provides a cross-scale spatialization method for statistical indicators that takes the grading of grid-cell attributes into account. Referring to FIG. 1, the method comprises:
Step S1: at the coarse-grained administrative-unit scale, analyze the correlation between the statistical indicator to be spatialized and multi-source data, and select from the multi-source data those whose correlation with the indicator reaches a preset degree as auxiliary modeling data.
Specifically, the coarse-grained administrative-unit scale may be the district or county level. Taking population spatialization as an example, the multi-source data may include POI data, nighttime-light remote sensing data, land-use type data, DEM data and road-network density data, all of which can be quantified at the grid-cell scale. The correlation between the statistical indicator and the multi-source data can be analyzed with different methods, such as chart-based correlation analysis, covariance and covariance matrices, or correlation coefficients.
In one embodiment, step S1 specifically comprises:
Step S1.1: for all n administrative units, compute the attribute values of m types of multi-source data that can be quantified at the grid-cell scale, where n and m are positive integers;
Step S1.2: compute the Pearson correlation coefficient, at the administrative-unit level, between the statistical indicator to be spatialized and the attribute values of each type of multi-source data;
Step S1.3: select the multi-source data whose correlation coefficient is greater than a threshold T as the final M types of auxiliary modeling data, where M is a positive integer.
This embodiment uses the correlation-coefficient method. Specifically, the Pearson correlation coefficient between the statistical indicator to be spatialized and the attribute values of the multi-source data is computed at the administrative-unit level:
$$\rho_t=\frac{\sum_{i=1}^{n}\left(SI_i-\overline{SI}\right)\left(DV_{i,t}-\overline{DV}_t\right)}{\sqrt{\sum_{i=1}^{n}\left(SI_i-\overline{SI}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(DV_{i,t}-\overline{DV}_t\right)^2}}$$
where $\rho_t$ is the Pearson correlation coefficient, at the administrative-unit level, between the t-th type of multi-source data and the statistical indicator to be spatialized; n is the total number of administrative units; $SI_i$ is the statistical indicator of the i-th administrative unit and $\overline{SI}$ is its mean over all administrative units; $DV_{i,t}$ is the quantified value of the t-th type of multi-source data in the i-th administrative unit and $\overline{DV}_t$ is its mean over all administrative units. The value of $\rho$ lies between -1 and +1. If $\rho>0$ the two variables are positively correlated, and if $\rho<0$ they are negatively correlated; the larger the absolute value of the coefficient, the stronger the correlation between the two variables.
Then the data whose correlation coefficient is greater than a threshold T are selected as the final M types of auxiliary modeling data. Two variables with a Pearson coefficient greater than 0.6 are generally regarded as strongly correlated, so the multi-source data with a Pearson coefficient greater than 0.6 are used as auxiliary modeling data. In this way, data that correlate strongly with the statistical indicator to be spatialized are selected as auxiliary modeling data.
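The selection in step S1 can be sketched in plain Python. The function names, the data-type names `poi` and `dem`, and the toy values are assumptions for illustration; the threshold of 0.6 is the one mentioned in the text, and the comparison uses the signed coefficient as the text states.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0          # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def select_auxiliary(indicator, sources, threshold=0.6):
    """Keep the data types whose coefficient exceeds the threshold T."""
    selected = {}
    for name, values in sources.items():
        rho = pearson(indicator, values)
        if rho > threshold:
            selected[name] = rho
    return selected


# Hypothetical values for four administrative units.
population = [10, 20, 30, 40]
sources = {"poi": [1, 2, 3, 4.5], "dem": [5, 1, 4, 2]}
selected = select_auxiliary(population, sources)
print(selected)
```

With these toy numbers only `poi` passes the threshold, since `dem` is negatively correlated with the indicator.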
Step S2: Grade the grid statistics of each class of modeling auxiliary data using a grading method, and determine the optimal grading result for each class through a grading evaluation index.
Step S2 specifically includes:
Step S2.1: Compute the quantified values of the M classes of modeling auxiliary data on all N grid units, so that each grid unit corresponds to M statistical values;
Step S2.2: For the grid statistics of the t-th class of modeling auxiliary data, assign the grids to different grades using a grading method, and measure each grading result with a preset evaluation index to determine the optimal grading result.
Specifically, different grading methods may be used, for example the natural breaks method, to assign the grids to different grades. The number of grades is set as k ∈ [2, K]. By varying the number of grades, each grading result is measured with a suitable evaluation index to determine the optimal grading result, for example the Davies-Bouldin index (DBI):

$$DB_k^t = \frac{1}{k}\sum_{x=1}^{k}\max_{y \neq x}\frac{\bar{S}_x + \bar{S}_y}{\lVert \sigma_x - \sigma_y \rVert}$$

where $DB_k^t$ is the Davies-Bouldin index when the t-th class of modeling auxiliary data is divided into k grades, $\bar{S}_x$ and $\bar{S}_y$ are the average intra-grade distances of the x-th and y-th grades, and $\lVert \sigma_x - \sigma_y \rVert$ is the distance between the centers $\sigma_x$ and $\sigma_y$ of the x-th and y-th grades. The index takes values in [0, +∞); a smaller value means smaller intra-grade distances and larger inter-grade distances. After grading, the number of grades with the smallest Davies-Bouldin index is selected as the optimal number of grades. Step S2.2 is repeated for all M classes of data to determine the optimal grading scheme for each class of modeling auxiliary data, yielding the optimal number of grades $C_t \in [2, K]$ (t = 1, 2, …, M).
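The search over the number of grades can be sketched as follows. To keep the example short it uses equal-interval grading, which the description lists as one admissible grading method, as a stand-in for the natural breaks method. The Davies-Bouldin index is computed directly for one-dimensional values, with the mean absolute deviation from the grade center as the intra-grade distance. The function names and toy values are assumptions made for illustration.

```python
def equal_interval_grades(values, k):
    """Assign each value a grade in 0..k-1 using k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value falls on the upper edge, so clamp it into the last grade.
    return [min(int((v - lo) / width), k - 1) for v in values]

def davies_bouldin(values, grades):
    """Davies-Bouldin index for 1-D values; grades with no members are ignored."""
    groups = {}
    for v, g in zip(values, grades):
        groups.setdefault(g, []).append(v)
    if len(groups) < 2:
        return float("inf")  # the index is undefined for a single grade
    centers = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    scatter = {g: sum(abs(v - centers[g]) for v in vs) / len(vs)
               for g, vs in groups.items()}
    total = 0.0
    for x in groups:
        total += max((scatter[x] + scatter[y]) / abs(centers[x] - centers[y])
                     for y in groups if y != x)
    return total / len(groups)

def best_grade_count(values, k_max):
    """Try k = 2..k_max and return the k with the smallest index, plus all scores."""
    scores = {k: davies_bouldin(values, equal_interval_grades(values, k))
              for k in range(2, k_max + 1)}
    return min(scores, key=scores.get), scores

# Hypothetical per-grid counts forming three visible groups.
grid_counts = [0, 1, 1, 2, 10, 11, 12, 20, 21, 22]
best_k, scores = best_grade_count(grid_counts, 4)
```

On this toy input the three-grade split scores lowest, which matches the three groups visible in the data. A natural breaks implementation could replace `equal_interval_grades` without changing the rest of the sketch.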
Step S3: At the coarse-grained administrative unit scale, compute the proportion of grid units in each grade for every class of modeling auxiliary data, construct the grade-proportion feature vector, and input the grade-proportion feature vectors into a regression model for training to obtain the trained regression model.
Step S3 specifically includes:
Step S3.1: At the coarse-grained administrative unit scale, compute the proportion of grid units in each grade for every class of modeling auxiliary data, and construct the grade-proportion feature vector $\beta_i$:

$$\beta_i = \left[\frac{N_i^{1,1}}{N_i}, \ldots, \frac{N_i^{1,C_1}}{N_i}, \ldots, \frac{N_i^{M,1}}{N_i}, \ldots, \frac{N_i^{M,C_M}}{N_i}\right]$$

where $N_i$ is the total number of grids in the i-th administrative unit and $N_i^{t,k}$ is the number of those grids that belong to the k-th grade of the t-th class of modeling auxiliary data. Each administrative unit corresponds to one feature vector, which contains the grade proportions of all classes of modeling auxiliary data;
Step S3.2: Take each administrative unit as one sample, with its grade-proportion feature vector as the input and the statistical indicator to be spatialized as the output, and train a random forest regression model to obtain the trained random forest regression model.
Specifically, in the grade-proportion feature vector $\beta_i$, $N_i^{1,1}/N_i$ is the proportion of grids in the 1st grade of the 1st class of modeling auxiliary data, and $N_i^{1,C_1}/N_i$ is the proportion of grids in the $C_1$-th grade of the 1st class of modeling auxiliary data. Step S3.1 yields the feature vectors of all administrative units, which are then used as samples: the constructed grade-proportion feature vectors serve as the input for training the random forest regression model.
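The construction of the grade-proportion feature vector for one administrative unit can be sketched as follows. The data layout (one list of 0-based grade indices per grid) and the function name are assumptions made for illustration. The training step itself is not shown; it would pass these vectors and the unit-level indicator values to a random forest regressor.

```python
def proportion_features(unit_grid_grades, grade_counts):
    """Build the grade-proportion vector for one administrative unit.

    unit_grid_grades: one entry per grid in the unit; each entry lists the
        grid's 0-based grade index for every auxiliary data class.
    grade_counts: optimal number of grades C_t for every auxiliary data class.
    """
    n_grids = len(unit_grid_grades)
    features = []
    for t, c_t in enumerate(grade_counts):
        counts = [0] * c_t
        for grid in unit_grid_grades:
            counts[grid[t]] += 1
        # Proportion of the unit's grids that fall in each grade of class t.
        features.extend(count / n_grids for count in counts)
    return features

# Hypothetical unit with 4 grids and 2 auxiliary classes (C_1 = 2, C_2 = 3).
unit = [[0, 2], [1, 2], [0, 0], [0, 1]]
beta = proportion_features(unit, [2, 3])
```

The proportions for each class sum to 1, so the vector has C_1 + … + C_M entries and its total equals the number of classes M.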
To verify the effectiveness of the method, the method further includes:
After the grid statistical indicator values are obtained, the predicted grid values are aggregated to the administrative units one level below the model training scale and compared with the actual statistical indicator values of those administrative units to verify the accuracy.
The accuracy is verified as follows:
The error is measured by the mean absolute error (MAE) and the root mean square error (RMSE), calculated as follows:

$$MAE = \frac{1}{n'}\sum_{i=1}^{n'}\left|\hat{y}_i - y_i\right|, \qquad RMSE = \sqrt{\frac{1}{n'}\sum_{i=1}^{n'}\left(\hat{y}_i - y_i\right)^2}$$

where $\hat{y}_i$ is the predicted statistical indicator value of the i-th next-level administrative unit, $y_i$ is the true statistical indicator value of the i-th next-level administrative unit, and $n'$ is the number of next-level administrative units.
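The accuracy check can be sketched as follows: grid predictions are summed per next-level administrative unit and compared with the reported values. The unit identifiers and numbers are hypothetical.

```python
import math

def aggregate(grid_values, grid_to_unit):
    """Sum grid predictions per next-level administrative unit."""
    totals = {}
    for value, unit in zip(grid_values, grid_to_unit):
        totals[unit] = totals.get(unit, 0.0) + value
    return totals

def mae(predicted, actual):
    """Mean absolute error over paired values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean square error over paired values."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical grid predictions and the unit each grid belongs to.
grid_values = [10, 20, 30, 40]
grid_to_unit = ["a", "a", "b", "b"]
reported = {"a": 34, "b": 67}

totals = aggregate(grid_values, grid_to_unit)
units = sorted(reported)
predicted = [totals[u] for u in units]
actual = [reported[u] for u in units]
error_mae = mae(predicted, actual)
error_rmse = rmse(predicted, actual)
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a view of the typical error as well as of outlying units.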
Step S4: At the fine-grained grid unit scale, construct a feature vector for each grid unit according to the optimal grading results of the auxiliary data classes, and input the vectors into the trained regression model to obtain the statistical indicator weight of each grid unit.
Specifically, the trained regression model is obtained in step S3. In this step, at the fine-grained grid unit scale, a feature vector is constructed for each grid unit according to the optimal grading results of the auxiliary data, from which the statistical indicator weight of each grid unit is obtained.
In one embodiment, step S4 specifically includes:
Step S4.1: For a grid unit, determine the grade to which the grid belongs in each class of modeling auxiliary data according to the optimal grading results, and construct the feature vector of the grid unit from these grades;
Step S4.2: Input the feature vectors of all grid units into the trained random forest regression model, and output the statistical indicator weights of the grids.
Specifically, if a grid belongs to the k-th grade of the t-th class of modeling auxiliary data, the feature value at the position encoding the k-th grade of the t-th class is set to 1, and the positions encoding the other grades of the t-th class are set to 0. In this way the feature vectors of all grid units can be constructed.
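The grid-level encoding can be sketched as follows. The function name and data layout (one 0-based grade index per auxiliary class) are assumptions made for illustration.

```python
def one_hot_features(grid_grades, grade_counts):
    """Encode one grid: a 1 at the grade it belongs to in each class, 0 elsewhere.

    grid_grades: the grid's 0-based grade index for every auxiliary data class.
    grade_counts: optimal number of grades C_t for every auxiliary data class.
    """
    features = []
    for grade, c_t in zip(grid_grades, grade_counts):
        block = [0.0] * c_t
        block[grade] = 1.0
        features.extend(block)
    return features

# Hypothetical grid: grade 0 in class 1 (C_1 = 2), grade 2 in class 2 (C_2 = 3).
grid_vector = one_hot_features([0, 2], [2, 3])
```

The resulting vector has the same length and layout as the grade-proportion vector of an administrative unit. A single grid can be read as a unit containing exactly one grid, so each proportion is either 0 or 1, which is what allows a model trained on administrative units to be applied to grids.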
When predicting with the regression model, the present invention does not take the predicted value directly as the final statistical value. Instead, it treats the prediction as a weight of the statistical value, which provides the basis for the subsequent allocation of the indicator and thereby improves prediction accuracy. Taking population spatialization as an example, the statistical indicator is the population, and the statistical population of each administrative unit must ultimately be converted into grid populations. The model output can be regarded as the population weight of each grid; within each administrative unit, the unit's statistical population is then allocated to the grids in proportion to these weights. The model output is not used directly as the grid population because allocating by weight within an administrative unit guarantees that the grid populations sum exactly to the unit's total, so there is no error at the administrative unit level and the accuracy is higher.
Step S5: Normalize the statistical indicator weights of the grid units contained in each administrative unit, and allocate the total value of the statistical indicator to be spatialized in the administrative unit to the grid units according to the weights, to obtain the final grid statistical values.
Specifically, after the statistical indicator weights of the grid units are obtained, they are normalized, and the statistical indicator is then allocated to the grid units according to the weights, which realizes the conversion from the coarse-grained level (administrative units) to the fine-grained level (grid units).
In one embodiment, in step S5 the total value of the statistical indicator to be spatialized in the administrative unit is allocated to the grid units according to the weights, calculated as follows:

$$SI_{ij} = SI_i \times \frac{W_{ij}}{\sum_{u=1}^{N_i} W_{iu}}$$

where i denotes the i-th administrative unit, j denotes the j-th grid, $SI_{ij}$ is the final statistical indicator value of the j-th grid in the i-th administrative unit, $SI_i$ is the total value of the statistical indicator to be spatialized in the i-th administrative unit, $W_{ij}$ and $W_{iu}$ are the weights of the j-th and u-th grids in the i-th administrative unit, and $N_i$ is the total number of grids in the i-th administrative unit.
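The weighted allocation can be sketched as follows. The even split used when all weights are zero is an assumed fallback for the example and is not described in the patent.

```python
def allocate(unit_total, weights):
    """Distribute a unit's total over its grids in proportion to the grid weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        # Assumed fallback (not from the source): spread the total evenly.
        return [unit_total / len(weights)] * len(weights)
    return [unit_total * w / total_weight for w in weights]

# Hypothetical unit with a population of 1000 and three grids.
grid_population = allocate(1000, [1, 3, 6])
```

Because the weights are normalized within the unit, the grid values sum back to the unit's total, so only the relative sizes of the model outputs matter.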
The indicators to be spatialized include, but are not limited to, population and grain output. Allocating the total value of the statistical indicator to be spatialized in the administrative unit to the grid units according to the weights, to obtain the final grid statistical values, includes:
allocating the population and grain statistics of the administrative unit to the grid units according to the weights, to obtain the final grid population and grain output.
FIG. 2 shows the technical roadmap of the method provided by the present invention. First, the modeling data are selected (data statistics at the administrative unit scale, correlation analysis, and determination of the modeling auxiliary data). Next, the optimal grid grading is determined (data statistics at the grid unit scale, grid grading, and determination of the optimal grading from the evaluation index). Then comes feature modeling and transfer (model building, model prediction, and grid statistical indicator weights). Finally, grid allocation and accuracy verification are performed (allocation by weight, grid statistical indicator values, and accuracy verification).
FIG. 3 shows the calculation flow of the method of the present invention. The correlation analysis at the administrative unit level can be carried out by calculating the Pearson, Spearman, or Kendall correlation coefficient. For grid grading, equal-interval grading, natural breaks grading, or equal-quantile grading can be used. The grading results can be evaluated with indices such as the silhouette coefficient, the DBI, or the elbow method.
To illustrate the implementation process and the beneficial effects of the method more clearly, a specific example is described in detail below.
Given the street-level administrative division data of Wuhan, the street population data, and 10 classes of POI data, the street population of Wuhan is to be spatialized according to the POI data and allocated to a 200 m grid to obtain a finer spatial distribution of the population. Because real grid-level population training data are lacking, traditional population spatialization modeling methods often apply the rules learned from administrative units directly to the grid, which causes a downscaling problem in the model transfer process.
The present invention adopts a cross-scale population spatialization method that grades the grid units according to statistical information. This overcomes the cross-scale feature problem that arises in traditional population spatialization methods when transferring from training to prediction, and achieves a finer population spatialization.
The algorithm of the present invention is described in detail below with reference to the accompanying drawings. The specific steps are as follows:
1) Count the number of POIs of each of the 10 classes and the population in every street of Wuhan, compute the Pearson coefficient between each POI class and the population at the street level, and select the 4 POI classes whose Pearson coefficient is greater than 0.6 as the final modeling auxiliary data;
2) Map the points of the 4 POI classes to the grid units, and count and record the number of POIs of each class in each grid unit. Use the natural breaks method and the DBI to determine the optimal number of grid grades for each POI class, as follows:
① Organize the per-grid counts of one POI class into a one-dimensional vector, set the range of the number of grades to [2, 10], and grade the vector using the natural breaks method;
② Vary the number of grades and repeat step ①, compute the DBI for each grading result, and select the grading result with the smallest DBI as the optimal grading result;
③ Perform steps ① and ② for all 4 POI classes to determine the optimal number of grades and the grading result for each class;
3) Construct feature vectors at the street level and train the model, as follows:
① Count the total number of grids contained in a street and, using the grading determined in step 2), the number of grids in each grade of each class of auxiliary data;
② Divide the number of grids in each grade by the total number of grids to obtain the proportion of grids in each grade of the 4 POI classes, and construct the grade-proportion feature vector;
③ Perform steps ① and ② for all streets in Wuhan to obtain the feature vectors of all streets;
④ Input the feature vector and the population of each street into the random forest regression model for training.
4) Construct feature vectors at the grid level and make predictions, as follows:
① For a grid unit, determine its grade in each POI class from the number of POIs of that class in the grid, set the feature value to 1 at the position encoding that grade, and set the remaining positions to 0;
② Construct the feature vectors of all grids in Wuhan according to step ①;
③ Input the grid feature vectors into the trained random forest model, and output the population weight of each grid;
The feature vector construction in steps 3) and 4) is shown in FIG. 4;
5) Within every street, normalize the population weights of the grids it contains, and then allocate the street population to the grids according to the weights.
6) At the community level, aggregate the population of the grids contained in each community and compare it with the actual community population statistics to measure the accuracy of the population spatialization.
The beneficial effects of the present invention are as follows. The invention proposes a cross-scale method for spatializing statistical indicators. It grades the grid units according to statistical information, so that the statistical features of the coarse-grained administrative units carry the attributes of the fine-grained grid units, and then obtains the final spatialization result by transferring the rules learned from modeling the administrative units to the grid.
Through correlation analysis, the method can effectively select auxiliary data whose spatial distribution pattern is similar to that of the statistical indicator to be spatialized, which gives a high spatialization accuracy. As shown in FIG. 5, three groups of POIs with different correlations were each used as modeling auxiliary data for population spatialization. Although an anomaly appears at the 1000 m grid unit size, the spatialization accuracy generally improves as the correlation of the modeling data increases.

In addition, determining the optimal grading from the distribution of grid unit attribute grades yields lower errors than methods that ignore these characteristics, such as arbitrary grading. As shown in FIG. 6, where the DBI is used to select the number of grid grades, there is no single optimal grading that holds in all cases, but a suitable evaluation index helps determine a better grading scheme.

Finally, compared with the feature statistics and model transfer of the traditional method, the present method overcomes, to a certain extent, the downscaling problem that arises when the rules learned from modeling administrative units are transferred directly to the grid because grid-scale training data for the statistical indicator are lacking. As shown in FIG. 7, at a grid unit size of 1000 m the traditional method performs better than the present method, whereas at grid sizes of 500 m and 200 m the present method outperforms the traditional method, and its advantage becomes more pronounced as the grid becomes smaller. FIG. 8 compares the population spatialization results for 200 m grid units in Wuhan. The traditional method generally overestimates the population of grids at the edge of Wuhan, and its visualization tends to look patchy, whereas the method of the present invention improves on both. The street population error map shows that the prediction accuracy of the present method is clearly higher than that of the traditional method in the three rectangular regions marked in the figure. The results indicate that the method of the present invention is better suited to grids of smaller size and has a greater advantage over the traditional method when the statistical indicator is spatialized across a larger span of scales.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the embodiments of the present invention without departing from the spirit and scope of the embodiments. Thus, provided that these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910854444.4A CN110689055B (en) | 2019-09-10 | 2019-09-10 | Cross-scale statistical index spatialization method considering grid unit attribute grading |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110689055A CN110689055A (en) | 2020-01-14 |
CN110689055B true CN110689055B (en) | 2022-07-19 |
Family
ID=69107960
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910854444.4A Active CN110689055B (en) | 2019-09-10 | 2019-09-10 | Cross-scale statistical index spatialization method considering grid unit attribute grading |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110689055B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112527867B (en) * | 2020-12-18 | 2023-10-13 | 重庆师范大学 | A method, storage device, and server for identifying non-agricultural job supply capabilities |
CN115272025A (en) * | 2021-04-30 | 2022-11-01 | 华为技术有限公司 | Method, device and storage medium for determining population distribution thermal data |
CN114331790B (en) * | 2022-03-09 | 2022-07-12 | 中国测绘科学研究院 | Grid processing method and system for incomplete edges of population data |
CN114912760B (en) * | 2022-04-14 | 2024-07-05 | 华南理工大学 | Method, system and medium for assigning take-out packaging garbage population weight |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103218517A (en) * | 2013-03-22 | 2013-07-24 | 南京信息工程大学 | GIS (Geographic Information System)-based region-meshed spatial population density computing method |
CN105740325A (en) * | 2016-01-20 | 2016-07-06 | 国家基础地理信息中心 | Trans-scale geographic information linkage updating technical method based on spatial automatic matching |
CN107092680A (en) * | 2017-04-21 | 2017-08-25 | 中国测绘科学研究院 | A kind of government information resources integration method based on geographic grid |
CN107730099A (en) * | 2017-09-30 | 2018-02-23 | 四川师范大学 | A kind of space planning method for establishing model |
CN108154193A (en) * | 2018-01-16 | 2018-06-12 | 黄河水利委员会黄河水利科学研究院 | A kind of long-term sequence precipitation data NO emissions reduction method |
CN109934617A (en) * | 2019-01-28 | 2019-06-25 | 浙江工业大学 | A hierarchical display system for the actual hinterland of a shopping center |
CN109978249A (en) * | 2019-03-19 | 2019-07-05 | 广州大学 | Population spatial distribution method, system and medium based on two-zone model |
Non-Patent Citations (2)
Title |
---|
Mei Yang等.Population Spatialization in Gansu Province Based on RS and GIS.《2009 Joint Urban Remote Sensing Event》.2009, * |
王宇.中国化石能源碳排放统计数据跨尺度空间化方法研究.《中国优秀硕士学位论文全文数据库工程科技I辑》.2018, * |
Also Published As
Publication number | Publication date |
---|---|
CN110689055A (en) | 2020-01-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110689055B (en) | Cross-scale statistical index spatialization method considering grid unit attribute grading | |
CN110059385B (en) | A Grid Dynamics Scenario Simulation Method and Terminal Equipment for Coupled Allometric Growth | |
CN116337146A (en) | Ecological quality evaluation and partitioning method and device based on improved remote sensing ecological index | |
CN110176141B (en) | Traffic cell division method and system based on POI and traffic characteristics | |
CN111665575B (en) | Medium-and-long-term rainfall grading coupling forecasting method and system based on statistical power | |
CN106651036A (en) | Air quality forecasting system | |
CN110471131B (en) | High-spatial-resolution automatic prediction method and system for refined atmospheric horizontal visibility | |
CN108875242A (en) | A kind of urban cellular automata Scene Simulation method, terminal device and storage medium | |
CN112990976A (en) | Commercial network site selection method, system, equipment and medium based on open source data mining | |
CN111401692B (en) | Method for measuring urban space function compactness | |
CN110889196B (en) | Water environment bearing capacity assessment method and device based on water quality model and storage medium | |
CN113112068A (en) | Method and system for addressing and layout of public facilities in villages and small towns | |
CN111008870A (en) | Regional logistics demand prediction method based on PCA-BP neural network model | |
CN114723283A (en) | Ecological bearing capacity remote sensing evaluation method and device for urban group | |
CN114048920A (en) | Site selection layout method, device, equipment and storage medium for charging facility construction | |
CN107169878A (en) | A kind of method based on information independence collection space load basic data of increasing income | |
CN110738232A (en) | A method for diagnosing the causes of grid voltage over-limit based on data mining technology | |
CN107230350A (en) | A kind of urban transportation amount acquisition methods based on bayonet socket Yu mobile phone flow call bill data | |
CN117114176A (en) | Land utilization change prediction method and system based on data analysis and machine learning | |
CN111798032B (en) | Fine grid evaluation method for supporting dual evaluation of homeland space planning | |
Jiang et al. | Short-term pm2. 5 forecasting with a hybrid model based on ensemble gru neural network | |
CN109615119B (en) | Space load prediction method based on rank set pair analysis theory | |
CN110120154B (en) | Traffic road condition prediction method using detector data under large-scale road network | |
CN115187134A (en) | Grid-based power distribution network planning method and device and terminal equipment | |
CN115393148A (en) | Data monitoring system, monitoring method, device, medium and terminal for natural resources |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||