CN110689055B - Cross-scale statistical index spatialization method considering grid unit attribute grading - Google Patents
- Publication number
- CN110689055B (application CN201910854444.4A)
- Authority
- CN
- China
- Prior art keywords
- grid
- unit
- statistical
- auxiliary data
- scale
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Theoretical Computer Science (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Physics & Mathematics (AREA)
- Tourism & Hospitality (AREA)
- General Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Educational Administration (AREA)
- Game Theory and Decision Science (AREA)
- Data Mining & Analysis (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Life Sciences & Earth Sciences (AREA)
- Primary Health Care (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a cross-scale statistical index spatialization method considering grid unit attribute grading. First, at the coarse-grained administrative unit scale, the correlation between the statistical index to be spatialized and multi-source data is analyzed, and the data with higher correlation with the statistical index are selected as modeling auxiliary data. Second, the grid statistical values of each type of modeling auxiliary data are graded using a grading method, and the optimal number of grades is determined for each type. Third, a grade proportion feature vector is constructed at the administrative unit scale and input into a regression model for training. Fourth, at the fine-grained grid unit scale, a feature vector is constructed for each grid unit according to the optimal grading of each type of auxiliary data and input into the regression model to obtain the statistical index weight of each grid unit. Finally, the total value of the statistical index to be spatialized in each administrative unit is distributed to its grid units according to the weights to obtain the final grid statistical values. The method of the invention can greatly improve the prediction precision.
Description
Technical Field
The invention relates to the field of geographic information science, including economic geography, demographic geography, environmental geography and the like, in particular to a cross-scale statistical index spatialization method considering grid unit attribute grading.
Background
Statistical index spatialization aims to reproduce the spatial distribution of a statistical index on a geographic grid or on other divisions (for example, hexagons, buildings or communities), and generally converts the spatial expression of the statistical index from coarse-grained administrative units to a fine-grained geographic grid. The spatialization of statistical indexes has been widely studied for data such as population, GDP (gross domestic product) and grain yield, and it has important scientific significance and broad application prospects for finely depicting the spatial distribution of statistical indexes, supporting the reasonable allocation of resources, guiding government decision-making and the like.
The inventor of the present application finds that the method of the prior art has at least the following technical problems in the process of implementing the present invention:
Because statistical indexes lack training data at the grid scale, conventional spatialization is generally realized by building a relationship between modeling factors and the statistical index at the administrative unit level and then transferring that relationship from the administrative units to the grid units. However, the two scales differ by orders of magnitude, which causes a downscaling problem in model migration and in turn low spatialization precision of the statistical index.
Therefore, the methods in the prior art have the technical problem of low precision.
Disclosure of Invention
In view of the above, the present invention provides a cross-scale statistical index spatialization method considering grid unit attribute grading, so as to solve or at least partially solve the technical problem of low precision in the prior art.
In order to solve the technical problem, the invention provides a cross-scale statistical index spatialization method considering grid unit attribute grading, which comprises the following steps:
step S1: analyzing the correlation between the statistical index to be spatialized and the multi-source data in the coarse-grained administrative unit scale, and selecting data which has correlation meeting a preset degree with the statistical index to be spatialized from the multi-source data as modeling auxiliary data;
step S2: grading the grid statistic values of various modeling auxiliary data by adopting a grading method, and determining the optimal grading result of each type of modeling auxiliary data through grading evaluation indexes;
step S3: counting the number proportion of each grade grid unit in each type of modeling auxiliary data in the coarse-grained administrative unit scale, constructing a grade proportion feature vector, inputting the grade proportion feature vector into a regression model, and training to obtain a trained regression model;
step S4: at the fine-grained grid unit scale, constructing a feature vector for each grid unit according to the optimal grading results of the various types of auxiliary data, and inputting it into the trained regression model to obtain the statistical index weight of each grid unit;
step S5: normalizing the statistical index weights of the grid units contained in each administrative unit, and distributing the total value of the statistical index to be spatialized in the administrative unit to each grid unit according to the weights to obtain the final grid statistical values.
In one embodiment, step S1 specifically includes:
step S1.1: counting, on all n administrative units, the attribute values of m types of multi-source data that can be quantized at the grid unit scale, wherein n and m are positive integers;
step S1.2: calculating the Pearson correlation coefficient between the statistical index to be spatialized and each multi-source data attribute value at the administrative unit level;
step S1.3: selecting the multi-source data whose correlation coefficient is larger than a threshold T as the final M types of modeling auxiliary data, wherein M is a positive integer.
In one embodiment, step S2 specifically includes:
step S2.1: counting the quantitative values of the M-type modeling auxiliary data on all N grid units, wherein each grid unit corresponds to M statistical values;
step S2.2: for the grid statistical values of the t-th type of modeling auxiliary data, dividing the grids into different grades using a grading method, and measuring each grading result with a preset evaluation index to determine the optimal grading result.
In one embodiment, step S3 specifically includes:
step S3.1: at the coarse-grained administrative unit scale, counting the proportion of grid units of each grade in each type of modeling auxiliary data, and constructing a grade proportion feature vector $\beta_i$:
$$\beta_i = \left[\frac{N_{i,1}^{1}}{N_i}, \ldots, \frac{N_{i,1}^{C_1}}{N_i}, \ldots, \frac{N_{i,t}^{k}}{N_i}, \ldots, \frac{N_{i,M}^{C_M}}{N_i}\right]$$
wherein $N_i$ represents the total number of grids in the i-th administrative unit, $N_{i,t}^{k}$ represents the number of grids in that unit belonging to the k-th grade of the t-th type of modeling auxiliary data, and $C_t$ is the optimal number of grades of the t-th type; one administrative unit corresponds to one feature vector, and the feature vector comprises the grade proportions of all types of modeling auxiliary data;
step S3.2: taking each administrative unit as a sample, taking its grade proportion feature vector as input and the statistical index to be spatialized as output, and training a random forest regression model to obtain the trained random forest regression model.
In one embodiment, step S4 specifically includes:
step S4.1: for a grid unit, determining the belonging grade of the grid in various modeling auxiliary data according to the optimal grading result, and constructing the feature vector of the grid unit according to the belonging grade of the grid in various modeling auxiliary data;
step S4.2: and inputting the constructed feature vectors of all grid units into the trained random forest regression model, and outputting to obtain the statistical index weight of the grid.
In one embodiment, in step S5, the total value of the statistical indexes to be spatialized in the administration unit is assigned to each grid cell according to the weight, and the calculation method is as follows:
$$SI_{ij} = SI_i \times \frac{W_{ij}}{\sum_{u=1}^{N_i} W_{iu}}$$
wherein i represents the i-th administrative unit, j represents the j-th grid, $SI_{ij}$ is the final statistical index value of the j-th grid of the i-th administrative unit, $SI_i$ is the total value of the statistical index to be spatialized in the i-th administrative unit, $W_{ij}$ and $W_{iu}$ are the weights of the j-th and u-th grids of the i-th administrative unit, respectively, and $N_i$ is the total number of grids in the i-th administrative unit.
In one embodiment, the index to be spatialized includes, but is not limited to, population and grain, and the total value of the statistical index to be spatialized in the administrative unit is assigned to each grid unit according to the weight to obtain the final grid statistical value, including:
and distributing the population and grain statistics of the administrative unit to each grid unit according to the weight to obtain the final grid population number and grain yield.
In one embodiment, the method further comprises:
after the grid statistical index is obtained, summarizing the obtained grid predicted value in the next-level administrative unit of the model training scale, and comparing the collected predicted value with the actual administrative unit statistical index value to verify the precision.
In one embodiment, the method for verifying the accuracy specifically includes:
the indexes for measuring the error are the mean absolute error (MAE) and the root mean square error (RMSE), calculated as follows:
$$MAE = \frac{1}{n'}\sum_{i=1}^{n'}\left|\hat{y}_i - y_i\right|, \qquad RMSE = \sqrt{\frac{1}{n'}\sum_{i=1}^{n'}\left(\hat{y}_i - y_i\right)^2}$$
wherein $\hat{y}_i$ represents the predicted statistical index value of the i-th next-level administrative unit, $y_i$ represents the real statistical index value of the i-th next-level administrative unit, and $n'$ represents the number of next-level administrative units.
One or more technical solutions in the embodiments of the present application at least have one or more of the following technical effects:
The invention provides a cross-scale statistical index spatialization method considering grid unit attribute grading. First, at the coarse-grained administrative unit scale (such as districts and counties), the correlation between the statistical index to be spatialized and multi-source data (which can be quantized at the grid unit scale) is analyzed, and data with higher correlation with the statistical index are selected as modeling auxiliary data. Second, the grid statistical values of each type of modeling auxiliary data are graded using a grading method, and the optimal grading result of each type is determined through grading evaluation indexes. Third, at the administrative unit scale, the proportion of grid units of each grade in each type of modeling auxiliary data is counted, and a grade proportion feature vector is constructed and input into a regression model for training. Fourth, at the fine-grained grid unit scale, a feature vector is constructed for each grid unit according to the optimal grading of each type of auxiliary data and input into the regression model to obtain the statistical index weight of each grid unit. Finally, within each administrative unit, the statistical index weights of its grid units are normalized, and the total value of the statistical index to be spatialized in the administrative unit is distributed to each grid unit according to the weights to obtain the final grid statistical values.
Because the method provided by the invention unifies the regression modeling process of statistical index spatialization on the grid scale, the problem of cross-scale caused by the existing spatialization method that the statistical index lacks grid scale training data and is directly trained by coarse-grained administrative unit data and transferred to fine-grained grid units is avoided. The method considers the grade distribution characteristic of the grid unit scale attribute, so that the model migration is smooth; compared with the traditional spatialization method, the method has higher precision, and can provide a new solution for spatialization of statistical indexes based on multi-source data fusion under the background of big data, particularly spatialization of statistical indexes such as population, GDP, grain yield, weather and climate factor data and the like.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a cross-scale statistical index spatialization method considering grid element attribute classification according to the present invention;
FIG. 2 is a technical roadmap for the method provided by the present invention;
FIG. 3 is a schematic diagram of the calculation process of the method of the present invention;
FIG. 4 is a flow chart of a cross-scale feature extraction method of the present invention;
FIG. 5 is a graph of the average absolute error contrast at street level for the Wuhan City population spatialization results based on POI data of high, medium and low relevance in a specific example (using three grid cell sizes of 1000 meters, 500 meters and 200 meters);
FIG. 6 is a graph of comparison of average absolute errors at street level for Wuhan city population spatialization results with DBI index determined for ranking and arbitrary ranking (three grid cell sizes of 1000m, 500m and 200m are used, the errors are arranged from low to high);
FIG. 7 is a comparison graph of the average absolute error at street level of the Wuhan city population spatialization results obtained with the conventional method and with the method of the present invention (using three grid cell sizes of 1000m, 500m and 200m);
FIG. 8 is a graph comparing the results of the spatialization of the population in Wuhan city and the error distribution of the conventional method and the method of the present invention (200 m grid cell size).
Detailed Description
The invention aims to provide a cross-scale statistical index spatialization method considering grid unit attribute grading. It addresses the problem that spatialization precision is low because of the downscaling that occurs when statistical indexes lack grid-scale training data and rules learned at the administrative unit level are transferred directly to grids.
In order to achieve the above purpose, the main concept of the invention is as follows:
First, at the coarse-grained administrative unit scale (such as districts and counties), the correlation between the statistical index to be spatialized and multi-source data (which can be quantized at the grid unit scale) is analyzed, and data with higher correlation with the statistical index are selected as modeling auxiliary data. Second, the grid statistical values of each type of modeling auxiliary data are graded using a grading method, and the optimal number of grades of each type is determined through grading evaluation indexes. Third, at the administrative unit scale, the proportion of grid units of each grade in each type of modeling auxiliary data is counted, and a grade proportion feature vector is constructed and input into a regression model for training. Fourth, at the fine-grained grid unit scale, a feature vector is constructed for each grid unit according to the optimal grading of each type of auxiliary data and input into the regression model to obtain the statistical index weight of each grid unit. Fifth, within each administrative unit, the statistical index weights of its grid units are normalized, and the total value of the statistical index to be spatialized in the administrative unit is distributed to each grid unit according to the weights to obtain the final grid statistical values. Finally, the grid predicted values are aggregated over the administrative units one level below the model training scale (such as streets) and compared with the real statistical index values of those units to verify the precision.
The invention unifies the regression modeling process of statistical index spatialization on grid scale, thereby avoiding the cross-scale problem caused by the existing spatialization method that the statistical index lacks grid scale training data and directly adopts coarse-grained administrative unit data to train and migrate to fine-grained grid units. The method considers the grade distribution characteristic of the grid unit scale attribute, so that the model migration is smooth; compared with the traditional spatialization method, the method has higher precision, and can provide a new solution for spatialization of statistical indexes based on multi-source data fusion under the background of big data, particularly spatialization of statistical indexes such as population, GDP, grain yield, weather and climate factor data and the like.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
This embodiment provides a cross-scale statistical index spatialization method considering grid unit attribute grading. Referring to FIG. 1, the method includes:
step S1: analyzing, at the coarse-grained administrative unit scale, the correlation between the statistical index to be spatialized and the multi-source data, and selecting from the multi-source data the data whose correlation with the statistical index meets a preset degree as modeling auxiliary data.
Specifically, the coarse-grained administrative unit scale may be the district level, the county level, and the like. Taking population spatialization as an example, the multi-source data may include POI data, nighttime light remote sensing data, land use type data, DEM data, road network density data, and the like, all of which can be quantized at the grid unit scale. Different methods can be adopted to analyze the correlation between the statistical index to be spatialized and the multi-source data, such as chart-based correlation analysis, covariance and covariance matrices, correlation coefficients and the like.
In one embodiment, step S1 specifically includes:
step S1.1: counting, on all n administrative units, the attribute values of m types of multi-source data that can be quantized at the grid unit scale, wherein n and m are positive integers;
step S1.2: calculating the Pearson correlation coefficient between the statistical index to be spatialized and each multi-source data attribute value at the administrative unit level;
step S1.3: selecting the multi-source data whose correlation coefficient is larger than a threshold T as the final M types of modeling auxiliary data, wherein M is a positive integer.
In the present embodiment, the correlation coefficient method is adopted. Specifically, the Pearson correlation coefficient between the statistical index to be spatialized and each multi-source data attribute value is calculated at the administrative unit level:
$$\rho_t = \frac{\sum_{i=1}^{n}\left(SI_i - \overline{SI}\right)\left(DV_{i,t} - \overline{DV}_t\right)}{\sqrt{\sum_{i=1}^{n}\left(SI_i - \overline{SI}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(DV_{i,t} - \overline{DV}_t\right)^2}}$$
wherein $\rho_t$ is the Pearson correlation coefficient between the t-th type of multi-source data and the statistical index to be spatialized at the administrative unit level, n is the total number of administrative units, $SI_i$ is the statistical index of the i-th administrative unit, and $\overline{SI}$ is the average of the statistical index over all administrative units; $DV_{i,t}$ is the quantized value of the t-th type of multi-source data on the i-th administrative unit, and $\overline{DV}_t$ is the average quantized value of the t-th type of multi-source data over all administrative units. The value of $\rho$ lies between -1 and +1: $\rho > 0$ indicates that the two variables are positively correlated, $\rho < 0$ indicates that they are negatively correlated, and the larger the absolute value of the Pearson correlation coefficient, the stronger the correlation between the two variables.
Then, data with a correlation coefficient larger than a certain threshold value T is selected as final M-type modeling auxiliary data, and it is generally considered that two variables with a Pearson coefficient larger than 0.6 have strong correlation, so that multi-source data with the Pearson coefficient larger than 0.6 is used as modeling auxiliary data. Therefore, data with high correlation with the statistical indexes to be spatialized can be selected as modeling auxiliary data.
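As a non-limiting illustration, steps S1.2 and S1.3 can be sketched in Python as follows. The function names, the data type names and the numeric values are hypothetical and chosen only for illustration; the threshold 0.6 follows the value mentioned above, and the comparison uses the signed coefficient, as the text states "larger than a threshold T".

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def select_auxiliary(si, candidates, threshold=0.6):
    """Keep the data types whose coefficient with `si` exceeds `threshold`."""
    coeffs = {name: pearson(si, values) for name, values in candidates.items()}
    return {name: rho for name, rho in coeffs.items() if rho > threshold}

# Made-up values for five administrative units (illustration only).
population = [120.0, 80.0, 200.0, 50.0, 150.0]
candidates = {
    "poi_count": [1300.0, 900.0, 2100.0, 600.0, 1500.0],
    "mean_elevation": [35.0, 20.0, 22.0, 30.0, 28.0],
}
selected = select_auxiliary(population, candidates)
```

With these made-up values only `poi_count` passes the threshold, since the elevation series is negatively correlated with the population series.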
Step S2: grading the grid statistical values of each type of modeling auxiliary data using a grading method, and determining the optimal grading result of each type of modeling auxiliary data through grading evaluation indexes.
Wherein, step S2 specifically includes:
step S2.1: counting the quantitative values of the M-type modeling auxiliary data on all N grid units, wherein each grid unit corresponds to M statistical values;
step S2.2: for the grid statistical values of the t-th type of modeling auxiliary data, dividing the grids into different grades using a grading method, and measuring each grading result with a preset evaluation index to determine the optimal grading result.
Specifically, different grading methods may be used, for example the natural breaks method, to divide the grids into different grades. The number of grades is set to $k \in [2, K]$; by varying k, each grading result is measured with a suitable evaluation index, such as the Davies-Bouldin index (DBI), to determine the optimal grading result:
$$DB_t^{k} = \frac{1}{k}\sum_{x=1}^{k}\max_{y \neq x}\left(\frac{\bar{d}_x + \bar{d}_y}{\lVert \sigma_x - \sigma_y \rVert}\right)$$
wherein $DB_t^{k}$ is the Davies-Bouldin index of the t-th type of modeling auxiliary data when the number of grades is k, $\bar{d}_x$ and $\bar{d}_y$ are the within-grade average distances of the x-th and y-th grades in the grading result, $\sigma_x$ and $\sigma_y$ are the centers of the x-th and y-th grades, and $\lVert \sigma_x - \sigma_y \rVert$ is the distance between those centers. The value range of the Davies-Bouldin index is $[0, +\infty)$; a smaller value means smaller within-grade distances and larger between-grade distances. After grading, the number of grades with the smallest Davies-Bouldin index is selected as the optimal number of grades. Step S2.2 is repeated for all M types of data to determine the optimal grading scheme for each type of modeling auxiliary data, thereby obtaining the optimal numbers of grades $C_t \in [2, K]$ $(t = 1, 2, \ldots, M)$.
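As a non-limiting sketch, the selection of the number of grades in step S2.2 may look as follows in Python. Two assumptions are made here. First, a basic one-dimensional k-means iteration is swapped in for the natural breaks method, which this sketch does not implement. Second, the evaluation index is taken to be the standard Davies-Bouldin form, i.e. the average over grades of the largest ratio of summed within-grade average distances to the distance between grade centers. All function names and the sample values are hypothetical.

```python
def kmeans_1d(values, k, iterations=50):
    """Split 1-D values into k grades with a basic Lloyd iteration.

    Stand-in for the natural breaks method; returns one grade label per value.
    """
    ordered = sorted(values)
    # Initialise centres at evenly spaced positions of the sorted values.
    centres = [ordered[(2 * g + 1) * len(ordered) // (2 * k)] for g in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda g: abs(v - centres[g])) for v in values]
        new_centres = []
        for g in range(k):
            members = [v for v, lab in zip(values, labels) if lab == g]
            new_centres.append(sum(members) / len(members) if members else centres[g])
        if new_centres == centres:
            break
        centres = new_centres
    return labels

def davies_bouldin(values, labels):
    """Davies-Bouldin index of a 1-D grading (smaller is better)."""
    grades = sorted(set(labels))
    if len(grades) < 2:
        return float("inf")
    centres, scatter = {}, {}
    for g in grades:
        members = [v for v, lab in zip(values, labels) if lab == g]
        centres[g] = sum(members) / len(members)
        scatter[g] = sum(abs(v - centres[g]) for v in members) / len(members)
    total = 0.0
    for x in grades:
        total += max(
            (scatter[x] + scatter[y]) / abs(centres[x] - centres[y])
            for y in grades if y != x
        )
    return total / len(grades)

def best_grade_count(values, max_k):
    """Try k = 2..max_k and return (k, labels) with the smallest index."""
    best = None
    for k in range(2, max_k + 1):
        labels = kmeans_1d(values, k)
        score = davies_bouldin(values, labels)
        if best is None or score < best[0]:
            best = (score, k, labels)
    return best[1], best[2]

# Made-up POI counts per grid cell with three visible groups.
poi_per_cell = [0, 1, 1, 2, 10, 11, 12, 13, 40, 42, 45]
k, labels = best_grade_count(poi_per_cell, max_k=5)
```

For this made-up sample the index is smallest at three grades, which matches the three visible groups of values. In practice the function would be called once per type of modeling auxiliary data to obtain each optimal number of grades.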
Step S3: at the coarse-grained administrative unit scale, counting the proportion of grid units of each grade in each type of modeling auxiliary data, constructing a grade proportion feature vector, inputting the grade proportion feature vector into a regression model, and training to obtain the trained regression model.
Wherein, step S3 specifically includes:
step S3.1: at the coarse-grained administrative unit scale, counting the proportion of grid units of each grade in each type of modeling auxiliary data, and constructing a grade proportion feature vector $\beta_i$:
$$\beta_i = \left[\frac{N_{i,1}^{1}}{N_i}, \ldots, \frac{N_{i,1}^{C_1}}{N_i}, \ldots, \frac{N_{i,t}^{k}}{N_i}, \ldots, \frac{N_{i,M}^{C_M}}{N_i}\right]$$
wherein $N_i$ represents the total number of grids in the i-th administrative unit, $N_{i,t}^{k}$ represents the number of grids in that unit belonging to the k-th grade of the t-th type of modeling auxiliary data, and $C_t$ is the optimal number of grades of the t-th type; one administrative unit corresponds to one feature vector, and the feature vector comprises the grade proportions of all types of modeling auxiliary data;
step S3.2: taking each administrative unit as a sample, taking its grade proportion feature vector as input and the statistical index to be spatialized as output, and training a random forest regression model to obtain the trained random forest regression model.
Specifically, in the grade proportion feature vector $\beta_i$, $N_{i,1}^{1}/N_i$ represents the proportion of grids of the 1st grade of the 1st type of modeling auxiliary data, and $N_{i,1}^{C_1}/N_i$ represents the proportion of grids of the $C_1$-th grade of the 1st type of modeling auxiliary data. The feature vectors of all administrative units are obtained through step S3.1; each administrative unit is then taken as a sample, and the constructed grade proportion feature vectors are used as input to train the random forest regression model.
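As a non-limiting illustration, the construction of the grade proportion feature vector of one administrative unit in step S3.1 can be sketched in Python as follows. The function name, the 0-based grade labels and the sample data are assumptions made only for this sketch.

```python
def grade_proportion_vector(cell_grades, grade_counts):
    """Build the grade proportion feature vector of one administrative unit.

    cell_grades: one list per grid cell, giving the cell's grade (0-based)
                 for each type of modeling auxiliary data.
    grade_counts: optimal number of grades for each type of auxiliary data.
    """
    n_cells = len(cell_grades)
    vector = []
    for t, c_t in enumerate(grade_counts):
        counts = [0] * c_t
        for grades in cell_grades:
            counts[grades[t]] += 1
        # Proportion of the unit's cells that fall in each grade of type t.
        vector.extend(count / n_cells for count in counts)
    return vector

# Four cells in one unit; data type 0 has 3 grades, data type 1 has 2 grades.
cells = [[0, 1], [0, 0], [2, 1], [1, 1]]
beta = grade_proportion_vector(cells, grade_counts=[3, 2])
```

The proportions of each data type sum to 1, so the vector length equals the sum of the numbers of grades over all types. One such vector per administrative unit, paired with that unit's statistical index value, forms one training sample for the regression model.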
To verify the validity of the method, the method further comprises:
after the grid statistical index is obtained, summarizing the obtained grid predicted value in the next-level administrative unit of the model training scale, and comparing the collected grid predicted value with the actual administrative unit statistical index value to verify the precision.
The method for verifying the precision specifically comprises the following steps:
the indexes for measuring the error are the mean absolute error (MAE) and the root mean square error (RMSE), calculated as follows:
$$MAE = \frac{1}{n'}\sum_{i=1}^{n'}\left|\hat{y}_i - y_i\right|, \qquad RMSE = \sqrt{\frac{1}{n'}\sum_{i=1}^{n'}\left(\hat{y}_i - y_i\right)^2}$$
wherein $\hat{y}_i$ represents the predicted statistical index value of the i-th next-level administrative unit, $y_i$ represents the real statistical index value of the i-th next-level administrative unit, and $n'$ represents the number of next-level administrative units.
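As a non-limiting illustration, the precision verification can be sketched in Python as follows: grid predicted values are summed within each next-level administrative unit and compared with the real values using the mean absolute error and the root mean square error. The function names, the unit identifiers and the numbers are hypothetical.

```python
import math

def aggregate_to_units(cell_values, cell_unit_ids):
    """Sum grid-cell predictions within each next-level unit (e.g. street)."""
    totals = {}
    for value, unit in zip(cell_values, cell_unit_ids):
        totals[unit] = totals.get(unit, 0.0) + value
    return totals

def mae_rmse(predicted, actual):
    """Return (MAE, RMSE) over the units present in `actual`."""
    errors = [predicted.get(unit, 0.0) - truth for unit, truth in actual.items()]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

# Made-up grid predictions and real street totals (illustration only).
cell_values = [10.0, 20.0, 5.0, 15.0]
cell_units = ["street_a", "street_a", "street_b", "street_b"]
actual = {"street_a": 33.0, "street_b": 16.0}
mae, rmse = mae_rmse(aggregate_to_units(cell_values, cell_units), actual)
```

Here the aggregated predictions are 30 and 20, so the errors are -3 and 4; the RMSE weights the larger error more heavily than the MAE does.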
Step S4: constructing a feature vector for each grid unit at the fine-grained grid unit scale according to the optimal grading results of the various types of auxiliary data, and inputting the feature vectors into the trained regression model to obtain the statistical index weight of each grid unit.
Specifically, the trained regression model is obtained in step S3. In this step, a feature vector is constructed for each grid unit at the fine-grained grid unit scale according to the optimal grading results of the auxiliary data, so that the statistical index weight of each grid unit can be obtained.
In one embodiment, step S4 specifically includes:
step S4.1: for a grid unit, determining the belonging grade of the grid in various modeling auxiliary data according to the optimal grading result, and constructing the feature vector of the grid unit according to the belonging grade of the grid in various modeling auxiliary data;
step S4.2: inputting the constructed feature vectors of all grid units into the trained random forest regression model, and outputting the statistical index weights of the grids.
Specifically, if a grid belongs to the $k$-th level of the $t$-th type of modeling auxiliary data, the feature value is set to 1 at the feature vector position corresponding to the $k$-th level of the $t$-th type of data and to 0 at the positions of the other levels of that type. The feature vectors of all grid units can be constructed in this way.
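The 0/1 encoding described above can be sketched as follows. The function name `grid_feature_vector` and the example levels are assumptions for illustration only.

```python
def grid_feature_vector(grid_levels, levels_per_type):
    """Encode one grid unit as a 0/1 feature vector.

    grid_levels: the grid's level (1-based) in each auxiliary data type.
    levels_per_type: [C_1, ..., C_M], the number of levels of each data type.
    (Both names are hypothetical.)
    """
    vector = []
    for level, n_levels in zip(grid_levels, levels_per_type):
        if not 1 <= level <= n_levels:
            raise ValueError(f"level {level} outside 1..{n_levels}")
        one_hot = [0] * n_levels
        one_hot[level - 1] = 1  # 1 at the grid's own level, 0 elsewhere
        vector.extend(one_hot)
    return vector


# A grid at level 3 of the first data type and level 1 of the second.
x = grid_feature_vector((3, 1), [3, 2])
print(x)  # [0, 0, 1, 1, 0]
```

This vector has the same layout as the grade-proportion vector of an administrative unit: a single grid can be read as a unit containing exactly one grid, whose level ratios are all 0 or 1. That shared layout is what lets a model trained on administrative units accept grid-level inputs.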
When predicting with the regression model, the predicted value is not used directly as the final statistical value but as the weight of the statistical value, which provides the basis for the index spatial conversion in the next step and can improve the prediction accuracy. Taking population spatialization as an example, the statistical index is the population, and the statistical population of each administrative unit needs to be converted into grid populations. The model output can be regarded as the population weight of each grid, and the statistical population of each administrative unit is finally distributed to its grids according to these weights. The model output is not used directly as the grid population in the prediction stage because distributing the population by weight within each administrative unit ensures that the grid populations sum exactly to the administrative unit total, so there is no error at the administrative unit level and the accuracy is higher.
Step S5: normalizing the statistical index weights of the grid units contained in each administrative unit, and distributing the total value of the statistical index to be spatialized in the administrative unit to each grid unit according to the weights to obtain the final grid statistical values.
Specifically, after the statistical index weights of the grid units are obtained, the weights are normalized, and the statistical index is then distributed to the grid units according to the weights, thereby converting it from the coarse-grained level (administrative units) to the fine-grained level (grid units).
In one embodiment, in step S5, the total value of the statistical indexes to be spatialized in the administrative unit is assigned to each grid unit according to the weight, and the calculation method is as follows:
$$SI_{ij} = SI_i \times \frac{W_{ij}}{\sum_{u=1}^{N_i} W_{iu}}$$
where $i$ denotes the $i$-th administrative unit, $j$ denotes the $j$-th grid, $SI_{ij}$ is the final statistical index value of the $j$-th grid of the $i$-th administrative unit, $SI_i$ is the total value of the statistical index to be spatialized in the $i$-th administrative unit, $W_{ij}$ and $W_{iu}$ are the weights of the $j$-th and $u$-th grids of the $i$-th administrative unit, respectively, and $N_i$ is the total number of grids in the $i$-th administrative unit.
The index to be spatialized includes but is not limited to population and grain. Distributing the total value of the statistical index to be spatialized in the administrative unit to each grid unit according to the weights to obtain the final grid statistical values comprises:
distributing the population and grain statistics of the administrative unit to each grid unit according to the weights to obtain the final grid population and grain yield.
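The weight normalization and allocation of step S5 can be sketched as follows. The function name `allocate_total`, the uniform fallback for an all-zero weight sum, and the sample numbers are assumptions that do not appear in the patent; weights are assumed to be non-negative.

```python
def allocate_total(total, weights):
    """Distribute an administrative unit's total over its grids by weight.

    total: the unit's statistical index value to be spatialized (SI_i).
    weights: non-negative model outputs, one per grid of the unit (W_ij).
    """
    weight_sum = sum(weights)
    if weight_sum <= 0:
        # Assumption: with no usable weights, fall back to a uniform split.
        return [total / len(weights)] * len(weights)
    # Normalizing by the weight sum makes the grid values add up to the total.
    return [total * w / weight_sum for w in weights]


# Hypothetical unit with a total of 1000 and three grids.
values = allocate_total(1000.0, [2, 3, 5])
print(values)  # [200.0, 300.0, 500.0]
```

Because every unit is normalized separately, only the relative size of the weights inside a unit matters, which is why the regression output can be treated as a weight rather than as the grid value itself.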
Fig. 2 is a technical route diagram of the method provided by the present invention. First, the modeling data are selected (administrative-unit-scale data statistics, correlation analysis, and determination of the modeling auxiliary data). Next, the optimal grading of the grids is determined (grid-unit-scale data statistics, grid grading, and selection of the optimal grading according to an evaluation index). Feature modeling and migration follow (model training, model prediction, and grid statistical index weights). Finally, grid allocation and accuracy verification are performed (weight allocation, grid statistical indexes, and accuracy check).
Fig. 3 shows the calculation flow of the method. The administrative-unit-level correlation analysis may use the Pearson, Spearman, or Kendall correlation coefficient. The grids may be graded by equal-interval classification, the natural breaks method, or equal-quantile classification. The grading results may be evaluated with indexes such as the silhouette coefficient, the DBI (Davies-Bouldin index), or the elbow method.
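The correlation-based selection of modeling auxiliary data (step S1) can be sketched with the Pearson coefficient as follows. The helper names, the data type names `restaurants` and `parks`, and all numbers are hypothetical. The threshold of 0.6 matches the value used in the Wuhan example of this description.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / (std_x * std_y)


def select_auxiliary(index_values, candidates, threshold):
    """Keep the candidate data types whose correlation exceeds the threshold."""
    return [name for name, values in candidates.items()
            if pearson(index_values, values) > threshold]


# Hypothetical per-unit population and per-unit counts of two data types.
population = [100, 200, 300, 400]
candidates = {"restaurants": [10, 21, 29, 40], "parks": [5, 1, 4, 2]}
selected = select_auxiliary(population, candidates, 0.6)
print(selected)  # ['restaurants']
```

The Spearman or Kendall coefficient could be swapped in for `pearson` without changing the selection logic.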
In order to more clearly illustrate the implementation and beneficial effects of the method provided by the invention, the following detailed description is given by specific examples.
Given Wuhan street administrative division data, street population data, and 10 types of POI data, the street population of Wuhan needs to be spatialized according to the POI data and distributed to a 200 m grid to obtain a more detailed spatial distribution of the population. Because real grid-level population data are not available for training, traditional population spatialization modeling methods often apply the rules learned from administrative units directly to the grids, so the model migration suffers from a scale reduction problem.
The invention adopts a cross-scale population spatialization method that grades grid units according to statistical information, thereby overcoming the cross-scale feature problem that arises when traditional population spatialization methods migrate from training to prediction, and achieving finer population spatialization.
The algorithm process of the present invention will be described in detail below with reference to the accompanying drawings, and the specific steps are as follows:
1) Count the number of POIs of each of the 10 types and the population of every street in Wuhan, calculate the Pearson coefficient between each POI type and the population at the street level, and select the 4 POI types whose Pearson coefficients are greater than 0.6 as the final modeling auxiliary data;
2) Map the 4 types of POI points to the grid cells, and count and record the number of POI points of each type in each grid cell. Determine the optimal number of grid grades for each POI type using the natural breaks method and the DBI index, as follows:
① organize the per-grid counts of one POI type into a one-dimensional vector, choose a number of grades from the range [2,10], and grade the vector using the natural breaks method;
② vary the number of grades over the range, repeat step ① for each value, calculate the DBI index of each grading result, and select the grading result with the smallest DBI index as the optimal grading result;
③ perform steps ① and ② for each of the 4 POI types to determine the optimal number of grades and the grading result of each type;
3) Construct feature vectors at the street level and train the model, as follows:
① count the total number of grids contained in a street and the number of grids at each level of each type of auxiliary data, as determined in step 2);
② divide the number of grids at each level by the total number of grids to obtain the proportion of grids at each level of the 4 POI types, and construct the grade-proportion feature vector;
③ perform steps ① and ② for all streets in Wuhan to obtain the feature vectors of all streets;
④ input the feature vector and the population of each street into the random forest regression model for training.
4) Construct feature vectors at the grid level and predict, as follows:
① for a grid unit, determine the level to which the grid belongs for each POI type according to the number of POIs of that type in the grid, assign the feature value 1 at the feature vector position corresponding to that level, and assign 0 at the remaining positions;
② construct the feature vectors of all grids in Wuhan according to step ①;
③ input the feature vectors of the grids into the trained random forest model, and output the population weights of the grids;
the feature vector construction methods of steps 3) and 4) are shown in FIG. 4;
5) In every street, normalize the population weights of the grids contained in the street, and then distribute the street population to the grids according to the weights.
6) At the community level, aggregate the population of the grids contained in each community and compare it with the real community statistical population to measure the accuracy of the population spatialization.
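The search for the optimal number of grades in step 2) can be sketched as follows. This sketch swaps in equal-interval classification, one of the grading options named for Fig. 3, in place of the natural breaks method used in the example, because natural breaks needs a longer optimization routine. The function names and the sample counts are hypothetical. The index is computed with the usual Davies-Bouldin definition, taking the mean absolute distance to the class centroid as the scatter.

```python
def equal_interval_labels(values, k):
    """Assign each value to one of k equal-width classes (0-based labels)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    return [min(int((v - lo) / width), k - 1) for v in values]


def davies_bouldin_1d(values, labels):
    """Davies-Bouldin index of a one-dimensional grading (lower is better)."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    if len(groups) < 2:
        raise ValueError("at least two non-empty classes are required")
    centroid = {g: sum(v) / len(v) for g, v in groups.items()}
    scatter = {g: sum(abs(x - centroid[g]) for x in v) / len(v)
               for g, v in groups.items()}
    total = 0.0
    for i in groups:
        # Worst-case similarity of class i to any other class.
        total += max((scatter[i] + scatter[j]) / abs(centroid[i] - centroid[j])
                     for j in groups if j != i)
    return total / len(groups)


def best_grading(values, k_min=2, k_max=10):
    """Try every grade count in [k_min, k_max]; keep the smallest DBI."""
    best = None
    for k in range(k_min, k_max + 1):
        labels = equal_interval_labels(values, k)
        if len(set(labels)) < 2:
            continue  # a single occupied class cannot be scored
        score = davies_bouldin_1d(values, labels)
        if best is None or score < best[0]:
            best = (score, k, labels)
    return best


# Hypothetical per-grid counts of one POI type.
poi_counts = [0, 1, 1, 2, 10, 11, 12, 30, 31]
score, k, labels = best_grading(poi_counts)
print(k, labels, score)
```

Running `best_grading` once per POI type yields one grade count and one set of grid labels per type, which feed the feature vector construction in steps 3) and 4).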
The invention has the following beneficial effects. The invention provides a cross-scale statistical index spatialization method in which grid units are graded according to statistical information, so that the statistical features of the coarse-grained administrative units carry the attributes of the fine-grained grid units. The rules obtained by modeling at the administrative unit level are then migrated to the grids to obtain the final spatialization result.
Through correlation analysis, the method can effectively select auxiliary data whose spatial distribution pattern is similar to that of the statistical index to be spatialized, and the spatialization accuracy is high. As shown in fig. 5, population spatialization was performed using three groups of POIs with different correlations as modeling auxiliary data. The result shows that, although an anomaly occurs at the 1000 m grid cell size, the spatialization accuracy generally increases as the correlation of the modeling data increases.
Meanwhile, the optimal grading is determined from the distribution characteristics of the grid cell attribute grades, which yields lower error than methods such as arbitrary classification that do not take these characteristics into account. As shown in fig. 6, the DBI index was used to select the number of grid grades. The result shows that no single grading is optimal in all cases, and that a better grading scheme can be determined with an appropriate evaluation index.
Finally, compared with the feature statistics and model migration of the traditional method, the method overcomes, to a certain extent, the scale reduction problem that arises when rules learned by modeling at the administrative unit level are migrated directly to the grids because grid-scale training data for the statistical index are lacking. As shown in fig. 7, the conventional method performs better than the present method at a grid cell size of 1000 m, while the present method performs better at grid sizes of 500 m and 200 m, and its advantage becomes more pronounced as the grid becomes smaller.
As shown in fig. 8, which compares the 200 m grid cell population spatialization results for Wuhan, the conventional method generally overestimates the grid population at the edge of Wuhan, and its visualized result tends to look patchy, whereas the present method improves on both points. The street population error map also shows that the prediction accuracy of the present method is significantly higher than that of the conventional method in the three rectangular regions marked in the figure. The results show that the method is better suited to smaller grids and has a greater advantage over the traditional method when the statistical index is spatialized across a larger scale span.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.
Claims (9)
1. A cross-scale statistical index spatialization method considering grid unit attribute classification is characterized by comprising the following steps:
step S1: analyzing the correlation between the statistical index to be spatialized and the multi-source data in the coarse-grained administrative unit scale, and selecting data which has correlation meeting a preset degree with the statistical index to be spatialized from the multi-source data as modeling auxiliary data;
step S2: grading the grid statistic values of various modeling auxiliary data by adopting a grading method, and determining the optimal grading result of each type of modeling auxiliary data through grading evaluation indexes;
step S3: counting the number proportion of each grade grid unit in each type of modeling auxiliary data in the coarse-grained administrative unit scale, constructing a grade proportion feature vector, inputting the grade proportion feature vector into a regression model, and training to obtain a trained regression model;
step S4: constructing a feature vector for each grid unit at the fine-grained grid unit scale according to the optimal grading results of the various types of auxiliary data, and inputting the feature vectors into the trained regression model to obtain the statistical index weight of each grid unit;
step S5: normalizing the statistical index weights of the grid units contained in each administrative unit, and distributing the total value of the statistical index to be spatialized in the administrative unit to each grid unit according to the weights to obtain the final grid statistical values.
2. The method according to claim 1, wherein step S1 specifically includes:
step S1.1: counting m types of multi-source data attribute values which can be quantized in the grid unit scale on all n administrative units, wherein n and m are positive integers larger than 0;
step S1.2: calculating a Pearson correlation coefficient of the statistical index to be spatialized and the multi-source data attribute value at the level of the administrative unit;
step S1.3: and selecting the multi-source data with the correlation coefficient larger than a threshold value T as final M-type modeling auxiliary data, wherein M is a positive integer larger than 0.
3. The method according to claim 1, wherein step S2 specifically includes:
step S2.1: counting the quantitative values of the M-type modeling auxiliary data on all N grid units, wherein each grid unit corresponds to M statistical values;
4. The method according to claim 1, wherein step S3 specifically comprises:
step S3.1: in the coarse-grained administrative unit scale, counting the number proportion of each grade grid unit in each type of modeling auxiliary data, and constructing a grade proportion feature vector $\beta_i$,
wherein $N_i$ represents the total number of grids in the $i$-th administrative unit, and each component of $\beta_i$ is the number of grids at the $k$-th level of the $t$-th type of modeling auxiliary data divided by $N_i$; one administrative unit corresponds to one feature vector, and the feature vector comprises the level ratios of all types of modeling auxiliary data;
step S3.2: and taking an administrative unit as a sample, taking the grade proportion characteristic vector of the administrative unit as input, taking the statistical index to be spatialized as output, and training the random forest regression model to obtain the trained random forest regression model.
5. The method according to claim 1, wherein step S4 specifically comprises:
step S4.1: for a grid unit, determining the belonging grade of the grid in various modeling auxiliary data according to the optimal grading result, and constructing the feature vector of the grid unit according to the belonging grade of the grid in various modeling auxiliary data;
step S4.2: and inputting the constructed feature vectors of all grid units into the trained random forest regression model, and outputting to obtain the statistical index weight of the grid.
6. The method according to claim 1, wherein the total value of the statistical measures to be spatialized in the administration unit is assigned to each grid cell by weight in step S5, and the calculation method is as follows:
$$SI_{ij} = SI_i \times \frac{W_{ij}}{\sum_{u=1}^{N_i} W_{iu}}$$
wherein $i$ represents the $i$-th administrative unit, $j$ represents the $j$-th grid, $SI_{ij}$ represents the final statistical index value of the $j$-th grid of the $i$-th administrative unit, $SI_i$ represents the total value of the statistical index to be spatialized in the $i$-th administrative unit, $W_{ij}$ and $W_{iu}$ are respectively the weights of the $j$-th and $u$-th grids of the $i$-th administrative unit, and $N_i$ represents the total number of grids in the $i$-th administrative unit.
7. The method of claim 1, wherein the index to be spatialized includes but is not limited to population and grain, and the step of assigning the total value of the statistical index to be spatialized in the administrative unit to each grid unit according to the weight to obtain the final grid statistical value comprises:
and distributing the population and grain statistics of the administrative unit to each grid unit according to the weight to obtain the final grid population and grain yield.
8. The method of claim 1, wherein the method further comprises:
after the grid statistical index is obtained, aggregating the obtained grid predicted values within the administrative units one level below the model training scale, and comparing the aggregated values with the actual statistical index values of those administrative units to verify the accuracy.
9. The method according to claim 8, wherein the method of verifying the accuracy is embodied as:
the indexes for measuring the errors are an average absolute error MAE and a root mean square error RMSE, and the calculation formula is as follows:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910854444.4A CN110689055B (en) | 2019-09-10 | 2019-09-10 | Cross-scale statistical index spatialization method considering grid unit attribute grading |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110689055A CN110689055A (en) | 2020-01-14 |
CN110689055B CN110689055B (en) | 2022-07-19 |
Family
ID=69107960
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910854444.4A Active CN110689055B (en) | 2019-09-10 | 2019-09-10 | Cross-scale statistical index spatialization method considering grid unit attribute grading |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110689055B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112527867B (en) * | 2020-12-18 | 2023-10-13 | 重庆师范大学 | Non-agriculture employment post supply capability identification method, storage device and server |
CN115272025A (en) * | 2021-04-30 | 2022-11-01 | 华为技术有限公司 | Method, device and storage medium for determining population distribution thermodynamic data |
CN114331790B (en) * | 2022-03-09 | 2022-07-12 | 中国测绘科学研究院 | Grid processing method and system for incomplete edges of population data |
CN114912760B (en) * | 2022-04-14 | 2024-07-05 | 华南理工大学 | Method, system and medium for assigning take-out packaging garbage population weight |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103218517A (en) * | 2013-03-22 | 2013-07-24 | 南京信息工程大学 | GIS (Geographic Information System)-based region-meshed spatial population density computing method |
CN105740325A (en) * | 2016-01-20 | 2016-07-06 | 国家基础地理信息中心 | Trans-scale geographic information linkage updating technical method based on spatial automatic matching |
CN107092680A (en) * | 2017-04-21 | 2017-08-25 | 中国测绘科学研究院 | A kind of government information resources integration method based on geographic grid |
CN107730099A (en) * | 2017-09-30 | 2018-02-23 | 四川师范大学 | A kind of space planning method for establishing model |
CN108154193A (en) * | 2018-01-16 | 2018-06-12 | 黄河水利委员会黄河水利科学研究院 | A kind of long-term sequence precipitation data NO emissions reduction method |
CN109934617A (en) * | 2019-01-28 | 2019-06-25 | 浙江工业大学 | A kind of classification display system on the practical innerland in shopping center |
CN109978249A (en) * | 2019-03-19 | 2019-07-05 | 广州大学 | Population spatial distribution method, system and medium based on two-zone model |
Non-Patent Citations (2)
Title |
---|
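Mei Yang et al. Population Spatialization in Gansu Province Based on RS and GIS. 2009 Joint Urban Remote Sensing Event. 2009. * |
Wang Yu (王宇). Research on Cross-Scale Spatialization Methods for Statistical Data on China's Fossil Energy Carbon Emissions (中国化石能源碳排放统计数据跨尺度空间化方法研究). China Master's Theses Full-text Database, Engineering Science and Technology I (中国优秀硕士学位论文全文数据库工程科技I辑). 2018. * |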
Also Published As
Publication number | Publication date |
---|---|
CN110689055A (en) | 2020-01-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||