CN110689055B - Cross-scale statistical index spatialization method considering grid unit attribute grading - Google Patents

Cross-scale statistical index spatialization method considering grid unit attribute grading

Info

Publication number
CN110689055B
Authority
CN
China
Prior art keywords
grid
unit
statistical
auxiliary data
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910854444.4A
Other languages
Chinese (zh)
Other versions
CN110689055A (en)
Inventor
桂志鹏
梅宇翱
吴京航
刘正廉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN201910854444.4A priority Critical patent/CN110689055B/en
Publication of CN110689055A publication Critical patent/CN110689055A/en
Application granted granted Critical
Publication of CN110689055B publication Critical patent/CN110689055B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a cross-scale statistical index spatialization method considering grid unit attribute grading. First, the correlation between the statistical index to be spatialized and multi-source data is analyzed at the coarse-grained administrative unit scale, and the data more strongly correlated with the index are selected as modeling auxiliary data. Second, the grid statistical values of each type of modeling auxiliary data are graded with a grading method, and the optimal number of grades is determined for each type. Third, a grade proportion feature vector is constructed at the administrative unit scale and input into a regression model for training. Fourth, at the fine-grained grid unit scale, a feature vector is constructed for each grid unit according to the optimal grading of each type of auxiliary data and input into the regression model to obtain the statistical index weight of each grid unit. Finally, the total value of the statistical index to be spatialized in each administrative unit is distributed to its grid units according to the weights to obtain the final grid statistical values. Compared with conventional spatialization methods, the method improves prediction precision.

Description

Cross-scale statistical index spatialization method considering grid unit attribute grading
Technical Field
The invention relates to the field of geographic information science, including economic geography, demographic geography, environmental geography and the like, in particular to a cross-scale statistical index spatialization method considering grid unit attribute grading.
Background
Statistical index spatialization aims to reproduce the spatial distribution of a statistical index on a geographic grid or on other spatial divisions (for example, hexagons, buildings or communities), and generally converts the spatial expression of the index from coarse-grained administrative units to a fine-grained geographic grid. It has been widely studied for data such as population, GDP (gross domestic product) and grain yield, and has important scientific significance and broad application prospects in finely depicting the spatial distribution of statistical indexes, assisting the rational allocation of resources, and guiding government decision-making.
The inventor of the present application finds that the method of the prior art has at least the following technical problems in the process of implementing the present invention:
Because statistical indexes lack training data at the grid scale, traditional spatialization generally builds an association between modeling factors and the statistical index at the administrative unit level and then transfers the learned rule from administrative units to grid units. However, the two scales differ by orders of magnitude, which causes a downscaling problem in model migration and therefore low spatialization precision of the statistical indexes.
Therefore, the method in the prior art has the technical problem of low precision.
Disclosure of Invention
In view of the above, the present invention provides a cross-scale statistical index spatialization method considering grid unit attribute grading, so as to solve, or at least partially solve, the technical problem of low precision in the prior art.
In order to solve the technical problem, the invention provides a cross-scale statistical index spatialization method considering grid unit attribute grading, which comprises the following steps:
step S1: analyzing the correlation between the statistical index to be spatialized and the multi-source data in the coarse-grained administrative unit scale, and selecting data which has correlation meeting a preset degree with the statistical index to be spatialized from the multi-source data as modeling auxiliary data;
step S2: grading the grid statistic values of various modeling auxiliary data by adopting a grading method, and determining the optimal grading result of each type of modeling auxiliary data through grading evaluation indexes;
step S3: counting the number proportion of each grade grid unit in each type of modeling auxiliary data in the coarse-grained administrative unit scale, constructing a grade proportion feature vector, inputting the grade proportion feature vector into a regression model, and training to obtain a trained regression model;
step S4: constructing a feature vector for each grid unit at the fine-grained grid unit scale according to the optimal grading result of each type of modeling auxiliary data, and inputting it into the trained regression model to obtain the statistical index weight of each grid unit;
step S5: normalizing the statistical index weights of the grid units contained in each administrative unit, and distributing the total value of the statistical index to be spatialized in the administrative unit to each grid unit according to the weights to obtain the final grid statistical values.
In one embodiment, step S1 specifically includes:
step S1.1: counting, on all n administrative units, the attribute values of m types of multi-source data that can be quantified at the grid unit scale, wherein n and m are positive integers;
step S1.2: calculating the Pearson correlation coefficient between the statistical index to be spatialized and each multi-source data attribute value at the administrative unit level;
step S1.3: selecting the multi-source data whose correlation coefficient is larger than a threshold value T as the final M types of modeling auxiliary data, wherein M is a positive integer.
In one embodiment, step S2 specifically includes:
step S2.1: counting the quantitative values of the M-type modeling auxiliary data on all N grid units, wherein each grid unit corresponds to M statistical values;
step S2.2: for the grid statistical values of the t-th type of modeling auxiliary data, dividing the grid units into different grades using a grading method, and measuring each grading result with a preset evaluation index to determine the optimal grading result.
In one embodiment, step S3 specifically includes:
step S3.1: at the coarse-grained administrative unit scale, counting the proportion of grid units of each grade for each type of modeling auxiliary data, and constructing a grade proportion feature vector $\beta_i$:
$$\beta_i = \left( \frac{N_i^{1,1}}{N_i}, \ldots, \frac{N_i^{1,C_1}}{N_i}, \ldots, \frac{N_i^{M,1}}{N_i}, \ldots, \frac{N_i^{M,C_M}}{N_i} \right)$$
wherein $N_i$ represents the total number of grids of the i-th administrative unit, $N_i^{t,k}$ represents the number of grids in the i-th administrative unit that fall in the k-th grade of the t-th type of modeling auxiliary data, and $C_t$ is the optimal number of grades of the t-th type; one administrative unit corresponds to one feature vector, which contains the grade proportions of all types of modeling auxiliary data;
step S3.2: taking each administrative unit as a sample, taking the grade proportion feature vector of the administrative unit as input and the statistical index to be spatialized as output, and training a random forest regression model to obtain the trained random forest regression model.
In one embodiment, step S4 specifically includes:
step S4.1: for a grid unit, determining the belonging grade of the grid in various modeling auxiliary data according to the optimal grading result, and constructing the feature vector of the grid unit according to the belonging grade of the grid in various modeling auxiliary data;
step S4.2: and inputting the constructed feature vectors of all grid units into the trained random forest regression model, and outputting to obtain the statistical index weight of the grid.
In one embodiment, in step S5, the total value of the statistical index to be spatialized in the administrative unit is assigned to each grid unit according to the weight, calculated as follows:
$$SI_{ij} = SI_i \times \frac{W_{ij}}{\sum_{u=1}^{N_i} W_{iu}}$$
wherein i denotes the i-th administrative unit, j denotes the j-th grid, $SI_{ij}$ is the final statistical index value of the j-th grid of the i-th administrative unit, $SI_i$ is the total value of the statistical index to be spatialized in the i-th administrative unit, $W_{ij}$ and $W_{iu}$ are the weights of the j-th and u-th grids of the i-th administrative unit, and $N_i$ is the total number of grids of the i-th administrative unit.
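As a minimal sketch of this allocation rule, the following Python function distributes one administrative unit's total over its grid cells in proportion to their weights. The function name, the clipping of negative weights and the even-split fallback for an all-zero weight vector are illustrative assumptions that the source does not specify.

```python
def allocate_total(total, weights):
    """Distribute an administrative unit's total SI_i over its grids by weight W_ij."""
    # Assumption: negative model outputs are clipped to zero before normalization.
    clipped = [max(w, 0.0) for w in weights]
    weight_sum = sum(clipped)
    if weight_sum == 0:
        # Assumption: fall back to an even split when every weight is zero.
        return [total / len(clipped)] * len(clipped)
    # SI_ij = SI_i * W_ij / sum_u W_iu
    return [total * w / weight_sum for w in clipped]


# Hypothetical unit with a total of 1000 and three grid cells.
grid_values = allocate_total(1000.0, [0.5, 1.5, 2.0])
print(grid_values)
```

Because the weights are normalized inside each administrative unit, the grid values always sum back to the unit's total.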
In one embodiment, the index to be spatialized includes, but is not limited to, population and grain, and the total value of the statistical index to be spatialized in the administrative unit is assigned to each grid unit according to the weight to obtain the final grid statistical value, including:
and distributing the population and grain statistics of the administrative unit to each grid unit according to the weight to obtain the final grid population number and grain yield.
In one embodiment, the method further comprises:
after the grid statistical index is obtained, summarizing the obtained grid predicted value in the next-level administrative unit of the model training scale, and comparing the collected predicted value with the actual administrative unit statistical index value to verify the precision.
In one embodiment, the method for verifying the accuracy specifically includes:
the indexes for measuring the errors are the mean absolute error MAE and the root mean square error RMSE, calculated as follows:
$$MAE = \frac{1}{n'} \sum_{i=1}^{n'} \left| \widehat{SI}_i - SI_i \right|$$
$$RMSE = \sqrt{ \frac{1}{n'} \sum_{i=1}^{n'} \left( \widehat{SI}_i - SI_i \right)^2 }$$
wherein $\widehat{SI}_i$ represents the predicted statistical index value of the i-th next-level administrative unit, $SI_i$ represents the real statistical index value of the i-th next-level administrative unit, and $n'$ represents the number of next-level administrative units.
One or more technical solutions in the embodiments of the present application at least have one or more of the following technical effects:
The invention provides a cross-scale statistical index spatialization method considering grid unit attribute grading. First, the correlation between the statistical index to be spatialized and multi-source data (quantifiable at the grid unit scale) is analyzed at the coarse-grained administrative unit scale (such as districts and counties), and data with higher correlation with the statistical index are selected as modeling auxiliary data. The grid statistical values of each type of modeling auxiliary data are then graded with a grading method, and the optimal grading result of each type is determined through a grading evaluation index. Next, at the administrative unit scale, the proportion of grid units of each grade is counted for each type of modeling auxiliary data, and a grade proportion feature vector is constructed and input into a regression model for training. Then, at the fine-grained grid unit scale, a feature vector is constructed for each grid unit according to the optimal grading of each type of auxiliary data and input into the regression model to obtain the statistical index weight of each grid unit. Finally, within each administrative unit, the statistical index weights of its grid units are normalized, and the total value of the statistical index to be spatialized in the administrative unit is distributed to each grid unit according to the weights to obtain the final grid statistical values.
Because the method provided by the invention unifies the regression modeling process of statistical index spatialization on the grid scale, the problem of cross-scale caused by the existing spatialization method that the statistical index lacks grid scale training data and is directly trained by coarse-grained administrative unit data and transferred to fine-grained grid units is avoided. The method considers the grade distribution characteristic of the grid unit scale attribute, so that the model migration is smooth; compared with the traditional spatialization method, the method has higher precision, and can provide a new solution for spatialization of statistical indexes based on multi-source data fusion under the background of big data, particularly spatialization of statistical indexes such as population, GDP, grain yield, weather and climate factor data and the like.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a cross-scale statistical index spatialization method considering grid element attribute classification according to the present invention;
FIG. 2 is a technical roadmap for the method provided by the present invention;
FIG. 3 is a schematic diagram of the calculation process of the method of the present invention;
FIG. 4 is a flow chart of a cross-scale feature extraction method of the present invention;
FIG. 5 is a graph comparing the mean absolute error at street level of the Wuhan City population spatialization results based on POI data of high, medium and low correlation in a specific example (using three grid cell sizes of 1000 m, 500 m and 200 m);
FIG. 6 is a graph comparing the mean absolute error at street level of the Wuhan City population spatialization results obtained with the grading determined by the DBI and with arbitrary grading (using three grid cell sizes of 1000 m, 500 m and 200 m, with the errors arranged from low to high);
FIG. 7 is a graph comparing the mean absolute error at street level of the Wuhan City population spatialization results obtained with the conventional method and with the method of the present invention (using three grid cell sizes of 1000 m, 500 m and 200 m);
FIG. 8 is a graph comparing the Wuhan City population spatialization results and error distributions of the conventional method and the method of the present invention (200 m grid cell size).
Detailed Description
The invention aims to provide a cross-scale statistical index spatialization method considering grid unit attribute grading, aiming at the problem that the spatialization precision of statistical indexes is low due to scale reduction when statistical indexes lack grid scale training data and rules learned by administrative units are directly transferred to grids.
In order to achieve the above purpose, the main concept of the invention is as follows:
First, the correlation between the statistical index to be spatialized and multi-source data (quantifiable at the grid unit scale) is analyzed at the coarse-grained administrative unit scale (such as districts and counties), and data with higher correlation with the statistical index are selected as modeling auxiliary data. Second, the grid statistical values of each type of modeling auxiliary data are graded with a grading method, and the optimal number of grades of each type is determined through a grading evaluation index. Third, at the administrative unit scale, the proportion of grid units of each grade is counted for each type of modeling auxiliary data, and a grade proportion feature vector is constructed and input into a regression model for training. Fourth, at the fine-grained grid unit scale, a feature vector is constructed for each grid unit according to the optimal grading of each type of auxiliary data and input into the regression model to obtain the statistical index weight of each grid unit. Fifth, within each administrative unit, the statistical index weights of its grid units are normalized, and the total value of the statistical index to be spatialized in the administrative unit is distributed to each grid unit according to the weights to obtain the final grid statistical values. Finally, the grid predicted values are aggregated to the administrative level immediately below the one used for model training (such as streets) and compared with the real statistical index values of those units to verify the precision.
The invention unifies the regression modeling process of statistical index spatialization on grid scale, thereby avoiding the cross-scale problem caused by the existing spatialization method that the statistical index lacks grid scale training data and directly adopts coarse-grained administrative unit data to train and migrate to fine-grained grid units. The method considers the grade distribution characteristic of the grid unit scale attribute, so that the model migration is smooth; compared with the traditional spatialization method, the method has higher precision, and can provide a new solution for spatialization of statistical indexes based on multi-source data fusion under the background of big data, particularly spatialization of statistical indexes such as population, GDP, grain yield, weather and climate factor data and the like.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
This embodiment provides a cross-scale statistical index spatialization method considering grid unit attribute grading. Referring to FIG. 1, the method includes:
Step S1: analyzing the correlation between the statistical index to be spatialized and the multi-source data at the coarse-grained administrative unit scale, and selecting from the multi-source data the data whose correlation with the statistical index to be spatialized meets a preset degree as modeling auxiliary data.
Specifically, the coarse-grained administrative unit scale may be the district level, the county level, and the like. Taking population spatialization as an example, the multi-source data may include POI data, nighttime light remote sensing data, land use type data, DEM data, road network density data, and the like, all of which can be quantified at the grid unit scale. Different methods can be used to analyze the correlation between the statistical index to be spatialized and the multi-source data, such as chart-based correlation analysis, covariance and covariance matrices, and correlation coefficients.
In one embodiment, step S1 specifically includes:
step S1.1: counting, on all n administrative units, the attribute values of m types of multi-source data that can be quantified at the grid unit scale, wherein n and m are positive integers;
step S1.2: calculating the Pearson correlation coefficient between the statistical index to be spatialized and each multi-source data attribute value at the administrative unit level;
step S1.3: selecting the multi-source data whose correlation coefficient is larger than a threshold value T as the final M types of modeling auxiliary data, wherein M is a positive integer.
In the present embodiment, the correlation coefficient method is adopted. Specifically, the Pearson correlation coefficient between the statistical index to be spatialized and each multi-source data attribute value is calculated at the administrative unit level:
$$\rho_t = \frac{\sum_{i=1}^{n} \left( SI_i - \overline{SI} \right) \left( DV_{i,t} - \overline{DV}_t \right)}{\sqrt{\sum_{i=1}^{n} \left( SI_i - \overline{SI} \right)^2} \, \sqrt{\sum_{i=1}^{n} \left( DV_{i,t} - \overline{DV}_t \right)^2}}$$
wherein $\rho_t$ is the Pearson correlation coefficient between the t-th type of multi-source data and the statistical index to be spatialized at the administrative unit level, n is the total number of administrative units, $SI_i$ is the statistical index of the i-th administrative unit, $\overline{SI}$ is the average of the statistical index over all administrative units, $DV_{i,t}$ is the quantified value of the t-th type of multi-source data on the i-th administrative unit, and $\overline{DV}_t$ is the average quantified value of the t-th type of multi-source data over all administrative units. The value of $\rho$ lies between -1 and +1: $\rho > 0$ indicates that the two variables are positively correlated, $\rho < 0$ indicates that they are negatively correlated, and the larger the absolute value of the Pearson correlation coefficient, the stronger the correlation between the two variables.
Then, the data whose correlation coefficient is larger than a threshold value T are selected as the final M types of modeling auxiliary data. Two variables with a Pearson coefficient larger than 0.6 are generally considered strongly correlated, so multi-source data with a Pearson coefficient larger than 0.6 are used as modeling auxiliary data. In this way, data with high correlation with the statistical index to be spatialized are selected as modeling auxiliary data.
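The selection in step S1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the data-type names and the numbers are hypothetical, and the threshold of 0.6 is the value mentioned in this embodiment.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def select_auxiliary(si, candidates, threshold=0.6):
    """Keep the data types whose coefficient with the index exceeds the threshold T."""
    return [name for name, values in candidates.items()
            if pearson(si, values) > threshold]


# Hypothetical values for four administrative units.
si = [10.0, 20.0, 30.0, 40.0]
candidates = {
    "poi_count": [1.0, 2.0, 3.0, 4.0],
    "road_density": [1.0, 3.0, 2.0, 4.0],
    "elevation": [4.0, 3.0, 2.0, 1.0],
}
selected = select_auxiliary(si, candidates)
print(selected)
```

The sketch follows the text literally and compares the signed coefficient with T, so a strongly negative factor such as the hypothetical `elevation` series is dropped; comparing the absolute value instead would be a different design choice.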
Step S2: grading the grid statistical values of each type of modeling auxiliary data with a grading method, and determining the optimal grading result of each type of modeling auxiliary data through a grading evaluation index.
Wherein, step S2 specifically includes:
step S2.1: counting the quantitative values of the M-type modeling auxiliary data on all N grid units, wherein each grid unit corresponds to M statistical values;
step S2.2: for the grid statistical values of the t-th type of modeling auxiliary data, dividing the grid units into different grades using a grading method, and measuring each grading result with a preset evaluation index to determine the optimal grading result.
Specifically, different grading methods may be employed, for example the natural breaks method, to divide the grids into different grades. The number of grades is set as k ∈ [2, K]; by varying k, each grading result is measured with a suitable evaluation index, such as the Davies-Bouldin index (DBI), to determine the optimal grading result:
$$DB_t^k = \frac{1}{k} \sum_{x=1}^{k} \max_{y \neq x} \frac{\bar{d}_x + \bar{d}_y}{\lVert \sigma_x - \sigma_y \rVert}$$
wherein $DB_t^k$ is the Davies-Bouldin index of the t-th type of modeling auxiliary data for a grade number k, $\bar{d}_x$ and $\bar{d}_y$ are the within-grade average distances of the x-th and y-th grades in the grading result, $\sigma_x$ and $\sigma_y$ are the centers of the x-th and y-th grades, and $\lVert \sigma_x - \sigma_y \rVert$ is the distance between these centers. The Davies-Bouldin index takes values in [0, +∞); a smaller value means smaller distances within grades and larger distances between grades. After grading, the grade number with the minimum Davies-Bouldin index is selected as the optimal grade number. Step S2.2 is repeated for all M types of data to determine the optimal grading scheme for each type of modeling auxiliary data, thereby obtaining the optimal grade numbers $C_t \in [2, K]$ (t = 1, 2, …, M).
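The search for the optimal number of grades can be illustrated with the sketch below. It is not the grading method of this embodiment: a simple one-dimensional k-means with quantile initialization stands in for the natural breaks method, and all function names and sample values are hypothetical. The DBI (Davies-Bouldin index) is computed in its standard form for one-dimensional data.

```python
def kmeans_1d(values, k, iters=50):
    """Stand-in grading method (assumption): 1-D k-means with quantile initialization."""
    data = sorted(values)
    centers = [data[int((i + 0.5) * len(data) / k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def davies_bouldin(values, labels, centers):
    """Davies-Bouldin index of a 1-D grading; smaller is better."""
    k = len(centers)
    scatter = []
    for c in range(k):
        members = [v for v, lab in zip(values, labels) if lab == c]
        if not members:
            return float("inf")  # assumption: reject gradings with an empty grade
        scatter.append(sum(abs(v - centers[c]) for v in members) / len(members))
    total = 0.0
    for x in range(k):
        ratios = []
        for y in range(k):
            if y == x:
                continue
            gap = abs(centers[x] - centers[y])
            if gap == 0:
                return float("inf")  # coinciding centers give no valid grading
            ratios.append((scatter[x] + scatter[y]) / gap)
        total += max(ratios)
    return total / k


def best_grade_count(values, max_k):
    """Try k = 2..max_k and return the k with the smallest DBI plus all scores."""
    scores = {}
    for k in range(2, max_k + 1):
        labels, centers = kmeans_1d(values, k)
        scores[k] = davies_bouldin(values, labels, centers)
    return min(scores, key=scores.get), scores


# Hypothetical grid statistics of one auxiliary data type with three obvious groups.
values = [1.0, 2.0, 3.0, 50.0, 51.0, 52.0, 100.0, 101.0, 102.0]
best_k, scores = best_grade_count(values, max_k=4)
print(best_k, scores)
```

On this toy input the three-grade split has the lowest index, which matches the visible grouping of the values.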
Step S3: at the coarse-grained administrative unit scale, counting the proportion of grid units of each grade for each type of modeling auxiliary data, constructing a grade proportion feature vector, inputting the grade proportion feature vector into a regression model, and training to obtain the trained regression model.
Wherein, step S3 specifically includes:
step S3.1: at the coarse-grained administrative unit scale, counting the proportion of grid units of each grade for each type of modeling auxiliary data, and constructing a grade proportion feature vector $\beta_i$:
$$\beta_i = \left( \frac{N_i^{1,1}}{N_i}, \ldots, \frac{N_i^{1,C_1}}{N_i}, \ldots, \frac{N_i^{M,1}}{N_i}, \ldots, \frac{N_i^{M,C_M}}{N_i} \right)$$
wherein $N_i$ represents the total number of grids of the i-th administrative unit, and $N_i^{t,k}$ represents the number of grids in the i-th administrative unit that fall in the k-th grade of the t-th type of modeling auxiliary data; one administrative unit corresponds to one feature vector, which contains the grade proportions of all types of modeling auxiliary data;
step S3.2: taking each administrative unit as a sample, taking the grade proportion feature vector of the administrative unit as input and the statistical index to be spatialized as output, and training a random forest regression model to obtain the trained random forest regression model.
Specifically, in the grade proportion feature vector $\beta_i$, $N_i^{1,1}/N_i$ represents the proportion of grids in the 1st grade of the 1st type of modeling auxiliary data, and $N_i^{1,C_1}/N_i$ represents the proportion of grids in the $C_1$-th grade of the 1st type of modeling auxiliary data. The feature vectors of all administrative units are obtained through step S3.1; the administrative units are then used as samples, and the constructed grade proportion feature vectors are used as input to train the random forest regression model.
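The construction of the grade proportion feature vector in step S3.1 can be sketched as follows. The function name, the 0-based grade indices and the sample data are illustrative assumptions.

```python
def grade_proportion_vector(grid_grades, grade_counts):
    """Build the grade proportion feature vector of one administrative unit.

    grid_grades:  one tuple per grid in the unit, holding the 0-based grade of
                  that grid for each type of modeling auxiliary data.
    grade_counts: optimal number of grades C_t for each type.
    """
    n_grids = len(grid_grades)
    vector = []
    for t, c_t in enumerate(grade_counts):
        counts = [0] * c_t
        for grades in grid_grades:
            counts[grades[t]] += 1
        # Proportion of the unit's grids that fall in each grade of type t.
        vector.extend(count / n_grids for count in counts)
    return vector


# Hypothetical unit with 4 grids and two data types (C_1 = 2, C_2 = 3).
unit_grids = [(0, 2), (1, 2), (1, 0), (1, 1)]
beta = grade_proportion_vector(unit_grids, [2, 3])
print(beta)
```

Each block of the vector sums to 1, so the vector length is the sum of the grade numbers over all data types and does not depend on how many grids the unit contains.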
To verify the validity of the method, the method further comprises:
after the grid statistical index is obtained, summarizing the obtained grid predicted value in the next-level administrative unit of the model training scale, and comparing the collected grid predicted value with the actual administrative unit statistical index value to verify the precision.
The method for verifying the precision specifically comprises the following steps:
the indexes for measuring the errors are the mean absolute error MAE and the root mean square error RMSE, calculated as follows:
$$MAE = \frac{1}{n'} \sum_{i=1}^{n'} \left| \widehat{SI}_i - SI_i \right|$$
$$RMSE = \sqrt{ \frac{1}{n'} \sum_{i=1}^{n'} \left( \widehat{SI}_i - SI_i \right)^2 }$$
wherein $\widehat{SI}_i$ represents the predicted statistical index value of the i-th next-level administrative unit, $SI_i$ represents the real statistical index value of the i-th next-level administrative unit, and $n'$ represents the number of next-level administrative units.
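The verification procedure can be sketched as follows: grid predictions are summed per next-level administrative unit and then compared with the real values using MAE and RMSE. The function names, street identifiers and numbers are hypothetical.

```python
import math


def aggregate_by_unit(grid_values, grid_unit_ids):
    """Sum grid predictions inside each next-level administrative unit."""
    totals = {}
    for value, unit in zip(grid_values, grid_unit_ids):
        totals[unit] = totals.get(unit, 0.0) + value
    return totals


def mae(predicted, actual):
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean square error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical grid predictions, the street each grid belongs to, and real street totals.
grid_pred = [120.0, 80.0, 300.0, 95.0, 105.0]
grid_street = ["A", "A", "B", "C", "C"]
street_true = {"A": 210.0, "B": 290.0, "C": 200.0}

street_pred = aggregate_by_unit(grid_pred, grid_street)
units = sorted(street_true)
pred = [street_pred[u] for u in units]
true = [street_true[u] for u in units]
mae_value = mae(pred, true)
rmse_value = rmse(pred, true)
print(mae_value, rmse_value)
```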
Step S4: constructing a feature vector for each grid unit at the fine-grained grid unit scale according to the optimal grading result of each type of modeling auxiliary data, and inputting it into the trained regression model to obtain the statistical index weight of each grid unit.
Specifically, in step S3, a trained regression model is obtained, and in this step, a feature vector is constructed for each grid unit according to the optimal classification result of the auxiliary data on the scale of the fine-grained grid unit, so that the statistical index weight of the grid unit can be obtained.
In one embodiment, step S4 specifically includes:
step S4.1: for a grid unit, determining the level to which the grid belongs in each type of modeling auxiliary data according to the optimal grading result, and constructing the feature vector of the grid unit from these levels;
step S4.2: inputting the constructed feature vectors of all grid units into the trained random forest regression model to obtain the statistical index weight of each grid.
Specifically, if the grid belongs to the kth level of the t-th type of modeling auxiliary data, the feature value is set to 1 at the feature vector position corresponding to the kth level of the t-th type of data and to 0 at the positions of the other levels of the t-th type of data, so that the feature vectors of all grid units can be constructed.
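The encoding described in this step can be sketched as follows. The function name `one_hot_grid_vector` and the example levels are assumptions for illustration only.

```python
def one_hot_grid_vector(levels, levels_per_type):
    """Encode one grid unit as a 0/1 feature vector.

    levels: 1-based level of the grid for each auxiliary data type.
    levels_per_type: number of levels chosen for each data type.
    """
    vector = []
    for level, n_levels in zip(levels, levels_per_type):
        if not 1 <= level <= n_levels:
            raise ValueError("level outside the grading range")
        # 1 at the position of the grid's own level, 0 elsewhere.
        vector.extend(1 if k == level else 0 for k in range(1, n_levels + 1))
    return vector


# A grid at level 2 of a 3-level data type and level 1 of a 2-level type.
grid_vector = one_hot_grid_vector((2, 1), levels_per_type=[3, 2])
print(grid_vector)
```

This vector has the same layout as the level-proportion vector of an administrative unit: it is what that vector would be for a unit containing exactly one grid, which is why the model trained at the administrative scale can accept it as input.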
When predicting with the regression model, the predicted value is not used directly as the final statistical value but as the weight of the statistical value. This provides the basis for the index allocation in the next step and can improve the prediction accuracy. Taking population spatialization as an example, the statistical index is the population, and the statistical population of each administrative unit needs to be converted into grid populations. The model output is regarded as the population weight of each grid, and the statistical population of each administrative unit is then distributed to its grids according to these weights. The model output is not used directly as the grid population in the prediction stage because allocating by weight within each administrative unit ensures that the grid populations sum exactly to the statistical total of that administrative unit, which gives higher accuracy.
S5: normalizing the statistical index weights of the grid units contained in each administrative unit, and distributing the total value of the statistical index to be spatialized in the administrative unit to each grid unit according to the weights to obtain the final grid statistical values.
Specifically, after the statistical index weights of the grid units are obtained, the weights are normalized, and the statistical index is then distributed to the grid units according to the weights, which converts the index from the coarse-grained level (administrative units) to the fine-grained level (grid units).
In one embodiment, in step S5, the total value of the statistical index to be spatialized in the administrative unit is distributed to each grid unit according to the weights, calculated as:
$$SI_{ij} = SI_i \times \frac{W_{ij}}{\sum_{u=1}^{N_i} W_{iu}}$$
wherein $i$ represents the ith administrative unit, $j$ represents the jth grid, $SI_{ij}$ represents the final statistical index value of the jth grid of the ith administrative unit, $SI_i$ represents the total value of the statistical index to be spatialized in the ith administrative unit, $W_{ij}$ and $W_{iu}$ are respectively the weight values of the jth and uth grids of the ith administrative unit, and $N_i$ represents the total number of grids of the ith administrative unit.
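A minimal sketch of this allocation is given below. The function name `allocate` and the example weights are hypothetical; the weights stand in for the regression model outputs.

```python
def allocate(total, weights):
    """Distribute an administrative unit's total to its grids by weight.

    Implements SI_ij = SI_i * W_ij / sum_u(W_iu) for one unit i.
    """
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must have a positive sum")
    return [total * w / weight_sum for w in weights]


# Hypothetical unit with a population of 1000 and three grid weights.
grid_population = allocate(1000, [1.0, 3.0, 6.0])
print(grid_population)
```

Because the weights are normalized within each administrative unit, the grid values always add up to the unit's reported total regardless of the absolute scale of the model outputs.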
The index to be spatialized includes but is not limited to population and grain; distributing the total value of the statistical index to be spatialized in the administrative unit to each grid unit according to the weights to obtain the final grid statistical values includes:
distributing the population and grain statistics of the administrative unit to each grid unit according to the weights to obtain the final grid population and grain yield.
Referring to fig. 2, which is a technical route diagram of the method provided by the present invention: first, the modeling data are selected (administrative-unit-scale data statistics, correlation analysis, and determination of the modeling auxiliary data); next, the optimal grading of the grids is determined (grid-unit-scale data statistics, grid grading, and determination of the optimal grading according to an evaluation index); then feature modeling and migration are performed (model training, model prediction, and grid statistical index weights); finally, grid allocation and accuracy verification are performed (weight allocation, grid statistical index, and accuracy verification).
Fig. 3 shows the calculation flow of the method of the present invention. The administrative-unit-level correlation analysis may be carried out by calculating the Pearson, Spearman, or Kendall correlation coefficient, and the grids may be graded by equal-interval classification, natural breaks classification, or quantile classification. The grading results may be evaluated with indexes such as the silhouette coefficient, the DBI (Davies-Bouldin index), or the elbow method.
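The correlation-based selection of modeling auxiliary data can be sketched with the Pearson coefficient. The data-type names, the sample values, and the helper `pearson` are hypothetical; the 0.6 threshold is the one used in the worked example of this document.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Hypothetical per-unit statistics for five administrative units.
population = [1000, 2000, 3000, 4000, 5000]
poi_counts = {
    "restaurants": [10, 21, 29, 41, 50],
    "parks": [5, 1, 4, 2, 3],
}

threshold = 0.6
selected = [name for name, counts in poi_counts.items()
            if pearson(counts, population) > threshold]
print(selected)
```

Spearman or Kendall coefficients could be substituted when the relationship is monotonic but not linear; only the scoring function would change.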
In order to more clearly illustrate the implementation and beneficial effects of the method provided by the invention, the following detailed description is given by specific examples.
Given the street-level administrative division data, street population data, and 10 types of POI data of Wuhan, the street population of Wuhan needs to be spatialized according to the POI data and distributed onto a 200m grid to obtain a more detailed spatial distribution of the population. Because real grid-level population training data are lacking, traditional population spatialization modeling methods often apply the rules learned from administrative units directly to the grids, so a scale-reduction problem arises during model migration.
The invention adopts a cross-scale population spatialization method in which grid units are graded according to statistical information, thereby overcoming the cross-scale feature problem that arises when traditional population spatialization methods migrate from training to prediction, and achieving finer population spatialization.
The algorithm process of the present invention will be described in detail below with reference to the accompanying drawings, and the specific steps are as follows:
1) counting the numbers of the 10 types of POI and the population of every street in Wuhan, calculating the Pearson coefficient between each POI type and the population at the street level, and selecting the 4 POI types whose Pearson coefficients are larger than 0.6 as the final modeling auxiliary data;
2) mapping the 4 types of POI points to the grid units, and counting and recording the number of POI points of each type in each grid unit; then determining the optimal number of grid levels for each POI type by using the natural breaks method and the DBI index, as follows:
① organizing the per-grid counts of one POI type into a one-dimensional vector, setting the range of the number of levels to [2,10], and grading the vector with the natural breaks method;
② varying the number of levels, repeating step ①, calculating the DBI index of each grading result, and selecting the grading result with the smallest DBI index as the optimal grading result;
③ performing step ② for each of the 4 POI types to determine the optimal number of levels and the grading result for each type;
3) constructing feature vectors at the street level and training the model, as follows:
① for one street, counting the total number of grids it contains and the number of grids at each level of each type of auxiliary data, as determined in step 2);
② dividing the number of grids at each level by the total number of grids to obtain the proportion of grids at each level of the 4 POI types, and constructing the level-proportion feature vector;
③ performing steps ① and ② for all streets in Wuhan to obtain the feature vectors of all streets;
④ inputting the feature vector and the population of each street into the random forest regression model for training.
4) constructing feature vectors at the grid level and predicting, as follows:
① for one grid unit, determining the level to which the grid belongs for each POI type according to the number of POIs of that type in the grid, assigning the feature value 1 at the feature vector position corresponding to that level, and assigning 0 at the remaining positions;
② constructing the feature vectors of all grids in Wuhan according to step ①;
③ inputting the feature vectors of the grids into the trained random forest model to obtain the population weight of each grid;
the feature vector construction of steps 3) and 4) is shown in FIG. 4;
5) for every street, normalizing the population weights of the grids it contains, and then distributing the population of the street to each grid according to the weights.
6) at the community level, aggregating the populations of the contained grids and comparing them with the real community statistical population to measure the accuracy of the population spatialization.
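Step 2) of the example above, choosing the number of levels by combining natural breaks grading with the DBI index, can be sketched as follows. This is an illustrative pure-Python version under stated assumptions: natural breaks is implemented as the exact one-dimensional minimum-variance partition by dynamic programming, the DBI uses the mean absolute distance to the class mean as scatter, and the function names and toy counts are hypothetical. The toy data has few distinct values, so the sketch tries 2 to 5 levels rather than the [2,10] range used in the example.

```python
def natural_breaks_groups(values, k):
    """Split 1-D values into k contiguous classes that minimize the
    within-class sum of squared deviations (natural breaks criterion)."""
    xs = sorted(values)
    n = len(xs)
    prefix, prefix_sq = [0.0], [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def sse(a, b):  # squared error of the slice xs[a:b]
        s = prefix[b] - prefix[a]
        return (prefix_sq[b] - prefix_sq[a]) - s * s / (b - a)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + sse(i, j)
                if candidate < cost[c][j]:
                    cost[c][j], cut[c][j] = candidate, i
    groups, j = [], n
    for c in range(k, 0, -1):  # walk the cut points backwards
        i = cut[c][j]
        groups.append(xs[i:j])
        j = i
    return groups[::-1]


def davies_bouldin_1d(groups):
    """Davies-Bouldin index of a 1-D grading (lower is better)."""
    centers = [sum(g) / len(g) for g in groups]
    scatter = [sum(abs(x - c) for x in g) / len(g)
               for g, c in zip(groups, centers)]
    total = 0.0
    for i in range(len(groups)):
        worst = 0.0
        for j in range(len(groups)):
            if i != j:
                gap = abs(centers[i] - centers[j])
                ratio = float("inf") if gap == 0 else (scatter[i] + scatter[j]) / gap
                worst = max(worst, ratio)
        total += worst
    return total / len(groups)


def best_grading(values, candidate_counts):
    """Return (number of levels, classes) with the smallest DBI."""
    scored = [(davies_bouldin_1d(natural_breaks_groups(values, k)), k)
              for k in candidate_counts]
    best_k = min(scored)[1]
    return best_k, natural_breaks_groups(values, best_k)


# Hypothetical per-grid counts of one POI type.
poi_per_grid = [0, 0, 1, 1, 2, 8, 9, 10, 30, 31, 33]
best_k, best_groups = best_grading(poi_per_grid, range(2, 6))
print(best_k, best_groups)
```

The dynamic program is cubic in the number of grids for each level count, so a real city-scale grid would call for an optimized natural breaks implementation; the selection logic around it would stay the same.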
The invention has the following beneficial effects: the invention provides a cross-scale statistical index spatialization method in which grid units are graded according to statistical information, so that the statistical features of coarse-grained administrative units carry the attributes of fine-grained grid units; the rules learned by modeling at the administrative unit scale are then transferred to the grids to obtain the final spatialization result.
Through correlation analysis, the method can effectively select auxiliary data whose spatial distribution pattern is similar to that of the statistical index to be spatialized, so the spatialization accuracy is high. As shown in fig. 5, population spatialization was performed with three groups of POIs of different correlation as modeling auxiliary data; the results show that, although an anomaly occurs at the 1000m grid size, the spatialization accuracy generally increases as the correlation of the modeling data increases.
Meanwhile, the optimal grading is determined from the distribution characteristics of the grid unit attribute levels, which gives lower error than methods such as arbitrary grading that do not take these characteristics into account. As shown in fig. 6, where the DBI index is used to select the number of grid levels, the results show that there is no fixed, universally optimal number of levels, and that a better grading scheme can be determined with an appropriate evaluation index.
Finally, compared with the feature statistics and model migration of the traditional method, the method overcomes, to a certain extent, the scale-reduction problem caused by directly migrating the rules learned from administrative unit modeling to the grids when grid-scale training data for the statistical index are lacking. As shown in fig. 7, at a grid size of 1000m the traditional method performs better than the present method, whereas at grid sizes of 500m and 200m the present method is better, and its advantage becomes more pronounced as the grid becomes smaller. As shown in fig. 8, taking the 200m grid population spatialization results for Wuhan as an example, the traditional method generally overestimates the grid population at the edge of Wuhan and its visualization tends to appear patchy, whereas the method of the present invention improves on both; in the street population error map, the prediction accuracy of the method of the present invention is significantly better than that of the traditional method in the three rectangular regions marked in the figure. The results show that the method is better suited to smaller grids and has a greater advantage over the traditional method in statistical index spatialization with a larger scale span.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (9)

1. A cross-scale statistical index spatialization method considering grid unit attribute classification is characterized by comprising the following steps:
step S1: analyzing the correlation between the statistical index to be spatialized and the multi-source data in the coarse-grained administrative unit scale, and selecting data which has correlation meeting a preset degree with the statistical index to be spatialized from the multi-source data as modeling auxiliary data;
step S2: grading the grid statistic values of various modeling auxiliary data by adopting a grading method, and determining the optimal grading result of each type of modeling auxiliary data through grading evaluation indexes;
step S3: counting the number proportion of each grade grid unit in each type of modeling auxiliary data in the coarse-grained administrative unit scale, constructing a grade proportion feature vector, inputting the grade proportion feature vector into a regression model, and training to obtain a trained regression model;
step S4: constructing a feature vector for each grid unit according to the optimal grading result of various auxiliary data in the unit scale of the fine-grained grid, and inputting the trained regression model to obtain the statistical index weight of each grid unit;
step S5: carrying out normalization processing on the weights of the statistical indexes of the grid units contained in each administrative unit, and distributing the total value of the statistical indexes to be spatialized in the administrative unit to each grid unit according to the weights to obtain the final grid statistical value.
2. The method according to claim 1, wherein step S1 specifically includes:
step S1.1: counting m types of multi-source data attribute values which can be quantized in the grid unit scale on all n administrative units, wherein n and m are positive integers larger than 0;
step S1.2: calculating a Pearson correlation coefficient of the statistical index to be spatialized and the multi-source data attribute value at the level of the administrative unit;
step S1.3: and selecting the multi-source data with the correlation coefficient larger than a threshold value T as final M-type modeling auxiliary data, wherein M is a positive integer larger than 0.
3. The method according to claim 1, wherein step S2 specifically includes:
step S2.1: counting the quantitative values of the M-type modeling auxiliary data on all N grid units, wherein each grid unit corresponds to M statistical values;
step S2.2: for the grid statistics of the t-th type of modeling auxiliary data (one statistical value for each of the N grid units), dividing the grids into different levels by using a grading method, and measuring each grading result with a preset evaluation index to determine the optimal grading result.
4. The method according to claim 1, wherein step S3 specifically comprises:
step S3.1: in the coarse-grained administrative unit scale, counting the number proportion of each grade grid unit in each type of modeling auxiliary data, and constructing a grade proportion feature vector $\beta_i$:
$$\beta_i = \left(\frac{n_{i,1}^{1}}{N_i}, \ldots, \frac{n_{i,C_1}^{1}}{N_i}, \ldots, \frac{n_{i,1}^{M}}{N_i}, \ldots, \frac{n_{i,C_M}^{M}}{N_i}\right)$$
wherein $N_i$ represents the total grid number of the ith administrative unit, $n_{i,k}^{t}$ represents the grid quantity of the kth level of the t-th type of modeling auxiliary data in that administrative unit, and $C_t$ is the number of levels of the t-th type of modeling auxiliary data; one administrative unit corresponds to one feature vector, and the feature vector comprises a plurality of level ratios of a plurality of types of modeling auxiliary data;
step S3.2: taking each administrative unit as a sample, taking the grade proportion feature vector of the administrative unit as input and the statistical index to be spatialized as output, and training the random forest regression model to obtain the trained random forest regression model.
5. The method according to claim 1, wherein step S4 specifically comprises:
step S4.1: for a grid unit, determining the level to which the grid belongs in each type of modeling auxiliary data according to the optimal grading result, and constructing the feature vector of the grid unit from these levels;
step S4.2: inputting the constructed feature vectors of all grid units into the trained random forest regression model to obtain the statistical index weight of each grid.
6. The method according to claim 1, wherein the total value of the statistical measures to be spatialized in the administration unit is assigned to each grid cell by weight in step S5, and the calculation method is as follows:
$$SI_{ij} = SI_i \times \frac{W_{ij}}{\sum_{u=1}^{N_i} W_{iu}}$$
wherein $i$ represents the ith administrative unit, $j$ represents the jth grid, $SI_{ij}$ represents the final statistical index value of the jth grid of the ith administrative unit, $SI_i$ represents the total value of the statistical index to be spatialized in the ith administrative unit, $W_{ij}$ and $W_{iu}$ are respectively the weight values of the jth and uth grids of the ith administrative unit, and $N_i$ represents the total number of grids of the ith administrative unit.
7. The method of claim 1, wherein the index to be spatialized includes but is not limited to population and grain, and the step of assigning the total value of the statistical index to be spatialized in the administrative unit to each grid unit according to the weight to obtain the final grid statistical value comprises:
and distributing the population and grain statistics of the administrative unit to each grid unit according to the weight to obtain the final grid population and grain yield.
8. The method of claim 1, wherein the method further comprises:
after the grid statistical index values are obtained, aggregating the predicted grid values within the administrative units one level below the model training scale, and comparing the aggregated values with the actual statistical index values of those administrative units to verify the accuracy.
9. The method according to claim 8, wherein the method of verifying the accuracy is embodied as:
the error is measured by the mean absolute error (MAE) and the root mean square error (RMSE), calculated as:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
wherein $\hat{y}_i$ represents the predicted statistical index value of the ith next-level administrative unit, $y_i$ represents the real statistical index value of the ith next-level administrative unit, and $n$ represents the number of next-level administrative units.
CN201910854444.4A 2019-09-10 2019-09-10 Cross-scale statistical index spatialization method considering grid unit attribute grading Active CN110689055B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910854444.4A CN110689055B (en) 2019-09-10 2019-09-10 Cross-scale statistical index spatialization method considering grid unit attribute grading

Publications (2)

Publication Number Publication Date
CN110689055A CN110689055A (en) 2020-01-14
CN110689055B true CN110689055B (en) 2022-07-19

Family

ID=69107960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910854444.4A Active CN110689055B (en) 2019-09-10 2019-09-10 Cross-scale statistical index spatialization method considering grid unit attribute grading

Country Status (1)

Country Link
CN (1) CN110689055B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant