CN110689055A - Cross-scale statistical index spatialization method considering grid unit attribute grading - Google Patents

Info

Publication number
CN110689055A
Authority
CN
China
Prior art keywords
grid
unit
statistical
auxiliary data
statistical index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910854444.4A
Other languages
Chinese (zh)
Other versions
CN110689055B (en)
Inventor
桂志鹏
梅宇翱
吴京航
刘正廉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN201910854444.4A priority Critical patent/CN110689055B/en
Publication of CN110689055A publication Critical patent/CN110689055A/en
Application granted granted Critical
Publication of CN110689055B publication Critical patent/CN110689055B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a cross-scale statistical index spatialization method considering grid unit attribute grading. First, the correlation between the statistical index to be spatialized and multi-source data is analyzed at the coarse-grained administrative unit scale, and data with higher correlation with the statistical index are selected as modeling auxiliary data. Second, the grid statistical values of each type of modeling auxiliary data are graded by a grading method, and the optimal number of grades is determined for each type. Third, a grade proportion feature vector is constructed at the administrative unit scale and input into a regression model for training. Fourth, at the fine-grained grid unit scale, a feature vector is constructed for each grid unit according to the optimal grading of each type of auxiliary data and input into the regression model to obtain the statistical index weight of each grid unit. Finally, the total value of the statistical index to be spatialized in each administrative unit is distributed to its grid units according to the weights to obtain the final grid statistical values. The method of the invention can greatly improve the prediction precision.

Description

Cross-scale statistical index spatialization method considering grid unit attribute grading
Technical Field
The invention relates to the field of geographic information science, including economic geography, demographic geography, environmental geography and the like, in particular to a cross-scale statistical index spatialization method considering grid unit attribute grading.
Background
Statistical index spatialization aims to reproduce the spatial distribution of a statistical index on a geographic grid or other spatial divisions (e.g., hexagons, buildings, communities), and generally converts the spatial expression of the statistical index from coarse-grained administrative units to a fine-grained geographic grid. Spatialization has been widely studied for data such as population, GDP (gross domestic product) and grain yield, and it has scientific significance and broad application prospects in finely depicting the spatial distribution of statistical indexes, assisting the rational allocation of resources, and guiding government decision-making.
The inventor of the present application finds that the method of the prior art has at least the following technical problems in the process of implementing the present invention:
because statistical indexes lack training data at the grid scale, conventional spatialization generally builds a relationship between modeling factors and the statistical index at the administrative unit level and then transfers the learned rule from administrative units to grid units; however, the two scales differ by orders of magnitude, which causes a downscaling problem when the model is transferred and thus lowers the spatialization precision of the statistical index.
Therefore, the method in the prior art has the technical problem of low precision.
Disclosure of Invention
In view of the above, the present invention provides a cross-scale statistical index spatialization method considering grid unit attribute grading, so as to solve or at least partially solve the technical problem of low precision in the prior art.
In order to solve the technical problem, the invention provides a cross-scale statistical index spatialization method considering grid unit attribute grading, which comprises the following steps:
step S1: analyzing the correlation between the statistical index to be spatialized and the multi-source data in the coarse-grained administrative unit scale, and selecting data which has correlation meeting a preset degree with the statistical index to be spatialized from the multi-source data as modeling auxiliary data;
step S2: grading the grid statistic values of various modeling auxiliary data by adopting a grading method, and determining the optimal grading result of each type of modeling auxiliary data through grading evaluation indexes;
step S3: counting the number proportion of each grade grid unit in each type of modeling auxiliary data on the scale of coarse-grained administrative units, constructing grade proportion characteristic vectors, inputting the grade proportion characteristic vectors into a regression model for training, and obtaining the trained regression model;
step S4: constructing a feature vector for each grid unit at the fine-grained grid unit scale according to the optimal grading result of each type of auxiliary data, and inputting it into the trained regression model to obtain the statistical index weight of each grid unit;
step S5: normalizing the statistical index weights of the grid units contained in each administrative unit, and distributing the total value of the statistical index to be spatialized in the administrative unit to each grid unit according to the weights to obtain the final grid statistical values.
In one embodiment, step S1 specifically includes:
step S1.1: counting m types of multi-source data attribute values which can be quantized in the grid unit scale on all n administrative units, wherein n and m are positive integers larger than 0;
step S1.2: calculating a Pearson correlation coefficient of the statistical index to be spatialized and the multi-source data attribute value at the level of the administrative unit;
step S1.3: and selecting the multi-source data with the correlation coefficient larger than a threshold value T as final M-type modeling auxiliary data, wherein M is a positive integer larger than 0.
In one embodiment, step S2 specifically includes:
step S2.1: counting the quantitative values of the M-type modeling auxiliary data on all N grid units, wherein each grid unit corresponds to M statistical values;
step S2.2: for the grid statistical values of the t-th type of modeling auxiliary data, dividing the grids into different grades by using a grading method, and measuring each grading result with a preset evaluation index to determine the optimal grading result.
In one embodiment, step S3 specifically includes:
step S3.1: at the coarse-grained administrative unit scale, counting the number proportion of grid units of each grade in each type of modeling auxiliary data, and constructing a grade proportion feature vector $\beta_i$:
$$\beta_i = \left( \frac{N_i^{1,1}}{N_i}, \ldots, \frac{N_i^{1,C_1}}{N_i}, \ldots, \frac{N_i^{M,1}}{N_i}, \ldots, \frac{N_i^{M,C_M}}{N_i} \right)$$
wherein $N_i$ represents the total number of grids of the i-th administrative unit, $N_i^{t,k}$ represents the number of grids at the k-th grade of the t-th type of modeling auxiliary data in that unit, and $C_t$ is the optimal number of grades of the t-th type; one administrative unit corresponds to one feature vector, and the feature vector comprises the grade proportions of all types of modeling auxiliary data;
step S3.2: and taking an administrative unit as a sample, taking the grade proportion characteristic vector of the administrative unit as input, taking the spatial statistical index as output, and training the random forest regression model to obtain the trained random forest regression model.
In one embodiment, step S4 specifically includes:
step S4.1: for a grid unit, determining the belonging grade of the grid in various modeling auxiliary data according to the optimal grading result, and constructing the feature vector of the grid unit according to the belonging grade of the grid in various modeling auxiliary data;
step S4.2: and inputting the constructed feature vectors of all grid units into the trained random forest regression model, and outputting to obtain the statistical index weight of the grid.
In one embodiment, in step S5, the total value of the statistical indexes to be spatialized in the administrative unit is assigned to each grid unit according to the weight, and the calculation method is as follows:
$$SI_{ij} = SI_i \times \frac{W_{ij}}{\sum_{u=1}^{N_i} W_{iu}}$$
wherein i denotes the i-th administrative unit, j denotes the j-th grid, $SI_{ij}$ is the final statistical index value of the j-th grid of the i-th administrative unit, $SI_i$ is the total value of the statistical index to be spatialized in the i-th administrative unit, $W_{ij}$ and $W_{iu}$ are the weights of the j-th and u-th grids of the i-th administrative unit, and $N_i$ is the total number of grids of the i-th administrative unit.
In one embodiment, the index to be spatialized includes, but is not limited to, population and grain, and the total value of the statistical index to be spatialized in the administrative unit is assigned to each grid unit according to the weight to obtain the final grid statistical value, including:
and distributing the population and grain statistics of the administrative unit to each grid unit according to the weight to obtain the final grid population number and grain yield.
In one embodiment, the method further comprises:
after the grid statistical index is obtained, summarizing the obtained grid predicted value in the next-level administrative unit of the model training scale, and comparing the collected grid predicted value with the actual administrative unit statistical index value to verify the precision.
In one embodiment, the method for verifying the accuracy specifically includes:
the indexes for measuring the errors are an average absolute error MAE and a root mean square error RMSE, and the calculation formula is as follows:
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
wherein $\hat{y}_i$ represents the predicted statistical index value of the i-th next-level administrative unit, $y_i$ represents the real statistical index value of the i-th next-level administrative unit, and n represents the number of next-level administrative units.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
the invention provides a cross-scale statistical index spatialization method considering grid unit attribute grading. First, the correlation between the statistical index to be spatialized and multi-source data (which can be quantized at the grid unit scale) is analyzed at the coarse-grained administrative unit scale (such as a district or county), and data with higher correlation with the statistical index are selected as modeling auxiliary data. Second, the grid statistical values of each type of modeling auxiliary data are graded by a grading method, and the optimal grading result of each type is determined through a grading evaluation index. Third, at the administrative unit scale, the number proportion of grid units of each grade in each type of modeling auxiliary data is counted, and a grade proportion feature vector is constructed and input into a regression model for training. Fourth, at the fine-grained grid unit scale, a feature vector is constructed for each grid unit according to the optimal grading of each type of auxiliary data and input into the regression model to obtain the statistical index weight of each grid unit. Finally, for each administrative unit, the statistical index weights of the grid units it contains are normalized, and the total value of the statistical index to be spatialized in the administrative unit is distributed to each grid unit according to the weights to obtain the final grid statistical values.
Because the method provided by the invention unifies the regression modeling process of statistical index spatialization at the grid scale, it avoids the cross-scale problem of existing spatialization methods, which, lacking grid-scale training data for the statistical index, train directly on coarse-grained administrative unit data and transfer the model to fine-grained grid units. The method takes the grade distribution characteristics of grid-scale attributes into account, so that the model transfers smoothly. Compared with conventional spatialization methods, the method has higher precision and can provide a new solution for spatializing statistical indexes based on multi-source data fusion in the context of big data, in particular indexes such as population, GDP, grain yield, and weather and climate factor data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a cross-scale statistical index spatialization method considering grid element attribute classification according to the present invention;
FIG. 2 is a technical roadmap for the method provided by the present invention;
FIG. 3 is a schematic diagram of the calculation process of the method of the present invention;
FIG. 4 is a flow chart of a cross-scale feature extraction method of the present invention;
FIG. 5 is a graph of the average absolute error contrast at street level for the Wuhan City population spatialization results based on POI data of high, medium and low relevance in a specific example (using three grid cell sizes of 1000 meters, 500 meters and 200 meters);
FIG. 6 is a graph of comparison of average absolute errors at street level for Wuhan city population spatialization results with DBI index determined for ranking and arbitrary ranking (three grid cell sizes of 1000m, 500m and 200m are used, the errors are arranged from low to high);
FIG. 7 is a graph of the average absolute error of the Wuhan population spatialization results at street level using the conventional method and the method of the present invention (using three grid cell sizes of 1000m, 500m and 200 m);
FIG. 8 is a graph comparing the results of the traditional method and the method of the present invention for the spatialization of the Wuhan population and the error distribution (200 m grid cell size).
Detailed Description
The invention aims to provide a cross-scale statistical index spatialization method considering grid unit attribute grading, aiming at the problem that the spatialization precision of statistical indexes is low due to scale reduction when statistical indexes lack grid scale training data and rules learned by administrative units are directly transferred to grids.
In order to achieve the above purpose, the main concept of the invention is as follows:
First, the correlation between the statistical index to be spatialized and multi-source data (which can be quantized at the grid unit scale) is analyzed at the coarse-grained administrative unit scale (such as a district or county), and data with higher correlation with the statistical index are selected as modeling auxiliary data. Second, the grid statistical values of each type of modeling auxiliary data are graded by a grading method, and the optimal number of grades of each type is determined through a grading evaluation index. Third, at the administrative unit scale, the number proportion of grid units of each grade in each type of modeling auxiliary data is counted, and a grade proportion feature vector is constructed and input into a regression model for training. Fourth, at the fine-grained grid unit scale, a feature vector is constructed for each grid unit according to the optimal grading of each type of auxiliary data and input into the regression model to obtain the statistical index weight of each grid unit. Fifth, for each administrative unit, the statistical index weights of the grid units it contains are normalized, and the total value of the statistical index to be spatialized in the administrative unit is distributed to each grid unit according to the weights to obtain the final grid statistical values. Finally, the grid predicted values are summed over the next-level administrative units (such as streets) below the administrative unit scale used for model training and compared with the real statistical index values of those units to verify the precision.
The invention unifies the regression modeling process of statistical index spatialization at the grid scale, and thereby avoids the cross-scale problem of existing spatialization methods, which, lacking grid-scale training data for the statistical index, train directly on coarse-grained administrative unit data and transfer the model to fine-grained grid units. The method takes the grade distribution characteristics of grid-scale attributes into account, so that the model transfers smoothly. Compared with conventional spatialization methods, the method has higher precision and can provide a new solution for spatializing statistical indexes based on multi-source data fusion in the context of big data, in particular indexes such as population, GDP, grain yield, and weather and climate factor data.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment provides a cross-scale statistical index spatialization method considering grid unit attribute classification, please refer to fig. 1, the method includes:
step S1: and analyzing the correlation between the statistical index to be spatialized and the multi-source data in the coarse-grained administrative unit scale, and selecting the data which has the correlation meeting the preset degree with the statistical index to be spatialized from the multi-source data as modeling auxiliary data.
Specifically, the coarse-grained administrative unit scale may be the district level, the county level, or the like. Taking population spatialization as an example, the multi-source data may include POI data, nighttime light remote sensing data, land use type data, DEM data, road network density data, and the like, all of which can be quantized at the grid unit scale. Different methods can be used to analyze the correlation between the statistical index to be spatialized and the multi-source data, such as chart-based correlation analysis, covariance and covariance matrices, and correlation coefficients.
In one embodiment, step S1 specifically includes:
step S1.1: counting m types of multi-source data attribute values which can be quantized in the grid unit scale on all n administrative units, wherein n and m are positive integers larger than 0;
step S1.2: calculating a Pearson correlation coefficient of the statistical index to be spatialized and the multi-source data attribute value at the level of the administrative unit;
step S1.3: and selecting the multi-source data with the correlation coefficient larger than a threshold value T as final M-type modeling auxiliary data, wherein M is a positive integer larger than 0.
In the present embodiment, the correlation coefficient method is adopted. Specifically, the Pearson correlation coefficient between the statistical index to be spatialized and each multi-source data attribute value is calculated at the administrative unit level:
$$\rho_t = \frac{\sum_{i=1}^{n}\left(SI_i - \overline{SI}\right)\left(DV_{i,t} - \overline{DV_t}\right)}{\sqrt{\sum_{i=1}^{n}\left(SI_i - \overline{SI}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(DV_{i,t} - \overline{DV_t}\right)^2}}$$
where $\rho_t$ is the Pearson correlation coefficient between the t-th type of multi-source data and the statistical index to be spatialized at the administrative unit level, n is the total number of administrative units, $SI_i$ is the statistical index of the i-th administrative unit, $\overline{SI}$ is the average of the statistical index over all administrative units, $DV_{i,t}$ is the quantized value of the t-th type of multi-source data in the i-th administrative unit, and $\overline{DV_t}$ is the average quantized value of the t-th type of multi-source data over all administrative units. The value of $\rho_t$ lies between -1 and +1: $\rho_t > 0$ indicates that the two variables are positively correlated, $\rho_t < 0$ indicates that they are negatively correlated, and the larger the absolute value of the Pearson correlation coefficient, the stronger the correlation between the two variables.
Then, data with a correlation coefficient larger than a certain threshold value T is selected as final M-type modeling auxiliary data, and it is generally considered that two variables with a Pearson coefficient larger than 0.6 have strong correlation, so that multi-source data with the Pearson coefficient larger than 0.6 is used as modeling auxiliary data. Therefore, data with high correlation with the statistical indexes to be spatialized can be selected as modeling auxiliary data.
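Steps S1.2 and S1.3 can be illustrated with a minimal Python sketch. The function names, the toy values and the dictionary layout are assumptions introduced for illustration only; the threshold of 0.6 follows the text above, and treating a constant series as uncorrelated is an added convention, not part of the method.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def select_auxiliary(si, sources, threshold=0.6):
    """Return the names of the data types whose coefficient exceeds the threshold T."""
    return [name for name, values in sources.items()
            if pearson(si, values) > threshold]

# Hypothetical values for four administrative units.
population = [120.0, 80.0, 200.0, 50.0]
sources = {
    "poi_count": [30.0, 22.0, 51.0, 12.0],
    "mean_elevation": [40.0, 10.0, 35.0, 38.0],
}
selected = select_auxiliary(population, sources)
print(selected)
```

With these made-up values only the POI count passes the threshold, so it alone would be kept as modeling auxiliary data.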
Step S2: and classifying the grid statistical values of various types of modeling auxiliary data by adopting a classification method, and determining the optimal classification result of each type of modeling auxiliary data through classification evaluation indexes.
Wherein, step S2 specifically includes:
step S2.1: counting the quantitative values of the M-type modeling auxiliary data on all N grid units, wherein each grid unit corresponds to M statistical values;
step S2.2: for the grid statistical values of the t-th type of modeling auxiliary data, dividing the grids into different grades by using a grading method, and measuring each grading result with a preset evaluation index to determine the optimal grading result.
Specifically, different grading methods may be used; for example, the natural breaks method may be used to divide the grids into different grades. The number of grades k is varied over k ∈ [2, K], and each grading result is measured by a suitable evaluation index, such as the Davies-Bouldin index (DBI), to determine the optimal grading result:
$$DB_t^k = \frac{1}{k}\sum_{x=1}^{k}\max_{y \neq x}\frac{\bar{S}_x + \bar{S}_y}{\lVert \sigma_x - \sigma_y \rVert}$$
wherein $DB_t^k$ is the Davies-Bouldin index of the t-th type of modeling auxiliary data when the number of grades is k, $\bar{S}_x$ and $\bar{S}_y$ are the within-grade average distances of the x-th and y-th grades in the grading result, and $\lVert \sigma_x - \sigma_y \rVert$ is the distance between the centers $\sigma_x$ and $\sigma_y$ of the x-th and y-th grades. The Davies-Bouldin index takes values in [0, +∞); a smaller value means smaller distances within grades and larger distances between grades. After grading, the number of grades with the smallest Davies-Bouldin index is selected as the optimal number of grades. Step S2.2 is repeated for all M types of data to determine the optimal grading scheme of each type of modeling auxiliary data, giving the optimal numbers of grades $C_t \in [2, K]$ (t = 1, 2, …, M).
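The selection of the number of grades in step S2.2 can be sketched as follows. This is a simplified illustration, not the patented implementation: an equal-count (quantile) split stands in for the natural breaks method, the within-grade distance is taken as the mean absolute deviation from the grade center, and all function names and values are hypothetical.

```python
def quantile_grades(values, k):
    """Assign grades 0..k-1 by equal-count breaks (stand-in for natural breaks)."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    grades = [0] * len(values)
    for rank, j in enumerate(order):
        grades[j] = rank * k // len(values)
    return grades

def davies_bouldin(values, grades):
    """Davies-Bouldin index of a one-dimensional grading (smaller is better)."""
    groups = {}
    for v, g in zip(values, grades):
        groups.setdefault(g, []).append(v)
    centers = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    scatter = {g: sum(abs(v - centers[g]) for v in vs) / len(vs)
               for g, vs in groups.items()}
    total = 0.0
    for x in groups:
        worst = 0.0
        for y in groups:
            if y == x:
                continue
            gap = abs(centers[x] - centers[y])
            ratio = float("inf") if gap == 0 else (scatter[x] + scatter[y]) / gap
            worst = max(worst, ratio)
        total += worst
    return total / len(groups)

def best_grade_count(values, max_k):
    """Try k = 2..max_k and return the k with the smallest index, plus all scores."""
    scores = {k: davies_bouldin(values, quantile_grades(values, k))
              for k in range(2, max_k + 1)}
    return min(scores, key=scores.get), scores

# Hypothetical grid statistics with three visible clusters.
grid_values = [1.0, 2.0, 3.0, 50.0, 51.0, 52.0, 100.0, 101.0, 102.0]
best_k, scores = best_grade_count(grid_values, max_k=5)
print(best_k)
```

On this toy input the index is smallest for three grades, which matches the three clusters in the data. A real application would substitute an actual natural breaks implementation for `quantile_grades`.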
Step S3: and on the scale of coarse-grained administrative units, counting the number proportion of each grade grid unit in each type of modeling auxiliary data, constructing a grade proportion feature vector, inputting the grade proportion feature vector into a regression model, and training to obtain the trained regression model.
Wherein, step S3 specifically includes:
step S3.1: at the coarse-grained administrative unit scale, counting the number proportion of grid units of each grade in each type of modeling auxiliary data, and constructing a grade proportion feature vector $\beta_i$:
$$\beta_i = \left( \frac{N_i^{1,1}}{N_i}, \ldots, \frac{N_i^{1,C_1}}{N_i}, \ldots, \frac{N_i^{M,1}}{N_i}, \ldots, \frac{N_i^{M,C_M}}{N_i} \right)$$
wherein $N_i$ represents the total number of grids of the i-th administrative unit, and $N_i^{t,k}$ represents the number of grids at the k-th grade of the t-th type of modeling auxiliary data in that unit; one administrative unit corresponds to one feature vector, and the feature vector comprises the grade proportions of all types of modeling auxiliary data;
step S3.2: and taking an administrative unit as a sample, taking the grade proportion characteristic vector of the administrative unit as input, taking the spatial statistical index as output, and training the random forest regression model to obtain the trained random forest regression model.
Specifically, in the grade proportion feature vector $\beta_i$, the first element is the proportion of grids at the 1st grade of the type 1 modeling auxiliary data, and the $C_1$-th element is the proportion of grids at the $C_1$-th grade of the type 1 modeling auxiliary data. The feature vectors of all administrative units are obtained through step S3.1; the administrative units are then taken as samples, and the constructed grade proportion feature vectors are taken as input to train the random forest regression model.
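The construction of the grade proportion feature vector in step S3.1 can be sketched as follows. The function name, the 0-based grade encoding and the sample grids are assumptions made for illustration.

```python
def grade_proportion_vector(grid_grades, grade_counts):
    """Build the grade proportion vector of one administrative unit.

    grid_grades: one tuple per grid, holding the 0-based grade of that grid
                 in each type of modeling auxiliary data.
    grade_counts: optimal number of grades C_t for each data type.
    """
    n_grids = len(grid_grades)
    vector = []
    for t, c_t in enumerate(grade_counts):
        counts = [0] * c_t
        for grades in grid_grades:
            counts[grades[t]] += 1
        vector.extend(count / n_grids for count in counts)
    return vector

# Hypothetical unit with four grids and two data types (3 and 2 grades).
unit_grids = [(0, 1), (0, 0), (2, 1), (1, 1)]
beta = grade_proportion_vector(unit_grids, [3, 2])
print(beta)  # [0.5, 0.25, 0.25, 0.25, 0.75]
```

The vector has one block per data type, and the proportions within each block sum to 1.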
To verify the validity of the method, the method further comprises:
after the grid statistical index is obtained, summarizing the obtained grid predicted value in the next-level administrative unit of the model training scale, and comparing the collected grid predicted value with the actual administrative unit statistical index value to verify the precision.
The method for verifying the precision specifically comprises the following steps:
the indexes for measuring the errors are an average absolute error MAE and a root mean square error RMSE, and the calculation formula is as follows:
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
wherein $\hat{y}_i$ represents the predicted statistical index value of the i-th next-level administrative unit, $y_i$ represents the real statistical index value of the i-th next-level administrative unit, and n represents the number of next-level administrative units.
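The verification step can be sketched in Python as follows: grid predictions are summed per next-level administrative unit and compared with the reported totals. The unit identifiers and all numbers are hypothetical.

```python
import math

def aggregate_by_unit(grid_values, grid_to_unit):
    """Sum grid predictions within each next-level administrative unit."""
    totals = {}
    for value, unit in zip(grid_values, grid_to_unit):
        totals[unit] = totals.get(unit, 0.0) + value
    return totals

def mae(predicted, actual):
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean square error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical grid predictions and the street each grid belongs to.
grid_values = [10.0, 20.0, 5.0, 15.0]
grid_to_unit = ["s1", "s1", "s2", "s2"]
reported = {"s1": 28.0, "s2": 24.0}

totals = aggregate_by_unit(grid_values, grid_to_unit)
units = sorted(reported)
pred = [totals[u] for u in units]
real = [reported[u] for u in units]
print(mae(pred, real), rmse(pred, real))
```

Here the two streets are off by 2 and 4, so the MAE is 3 and the RMSE is the square root of 10.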
Step S4: constructing a feature vector for each grid unit at the fine-grained grid unit scale according to the optimal grading result of each type of auxiliary data, and inputting it into the trained regression model to obtain the statistical index weight of each grid unit.
Specifically, in step S3, a trained regression model is obtained, and in this step, a feature vector is constructed for each grid unit according to the optimal classification result of the auxiliary data on the scale of the fine-grained grid unit, so that the statistical index weight of the grid unit can be obtained.
In one embodiment, step S4 specifically includes:
step S4.1: for a grid unit, determining the grade to which the grid belongs in each type of modeling auxiliary data according to the optimal grading results, and constructing the feature vector of the grid unit from those grades;
step S4.2: inputting the constructed feature vectors of all grid units into the trained random forest regression model, which outputs the statistical index weight of each grid unit.
Specifically, if the grid belongs to the k-th level of the t-th type of modeling auxiliary data, the feature value is set to 1 at the feature vector position corresponding to the k-th level of the t-th type and to 0 at the positions of the other levels of the t-th type. The feature vectors of all grid units can be constructed in this way.
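The 0/1 encoding of step S4.1 can be sketched as below. The function name `grid_feature_vector` and the sample grades are assumptions for illustration. The layout mirrors the grade-proportion vector of an administrative unit, so a single grid cell looks like a unit in which one grade of each data type has proportion 1.

```python
def grid_feature_vector(grades, levels_per_type):
    """One-hot encode a grid cell's grade in each auxiliary data type.

    grades: the cell's grade (1-based) in each data type.
    levels_per_type: number of grades for each auxiliary data type.
    """
    vector = []
    for grade, n_levels in zip(grades, levels_per_type):
        if not 1 <= grade <= n_levels:
            raise ValueError(f"grade {grade} outside 1..{n_levels}")
        one_hot = [0] * n_levels
        one_hot[grade - 1] = 1  # 1 at the cell's grade, 0 at all other grades
        vector.extend(one_hot)
    return vector


# Hypothetical cell: grade 3 of a 3-grade data type, grade 1 of a 2-grade type.
features = grid_feature_vector((3, 1), [3, 2])
print(features)  # [0, 0, 1, 1, 0]
```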
When predicting with the regression model, the predicted value is not used directly as the final statistical value but as the weight of the statistical value. This provides the basis for the index spatial conversion in the next step and can improve the prediction accuracy. Taking population spatialization as an example, the statistical index is the population, and the statistical population of each administrative unit must ultimately be converted into grid populations. The model output is regarded as the population weight of each grid, and the statistical population of each administrative unit is then distributed to its grids according to these weights. The model output is not used directly as the grid population in the prediction stage because distributing the population by weight within each administrative unit guarantees that the grid populations sum exactly to the unit's statistical population, so there is no error at the administrative unit level and the accuracy is higher.
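The absence of error at the administrative unit level follows from the normalization itself. As a short worked derivation, write $W_{ij}$ for the model output (weight) of grid j in administrative unit i, $SI_i$ for the unit's statistical total, and $N_i$ for its number of grids, and suppose each grid receives $SI_{ij} = SI_i \, W_{ij} / \sum_{u} W_{iu}$. Assuming the weights of the unit do not sum to zero, summing over the unit's grids gives:

```latex
\sum_{j=1}^{N_i} SI_{ij}
  = \sum_{j=1}^{N_i} SI_i \,\frac{W_{ij}}{\sum_{u=1}^{N_i} W_{iu}}
  = SI_i \,\frac{\sum_{j=1}^{N_i} W_{ij}}{\sum_{u=1}^{N_i} W_{iu}}
  = SI_i
```

The regression model therefore only determines how the total is shared among the grids, not the total itself.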
Step S5: normalize the statistical index weights of the grid units contained in each administrative unit, and distribute the total statistical index value to be spatialized in the administrative unit to the grid units according to the weights, to obtain the final grid statistical values.
Specifically, after the statistical index weights of the grid units are obtained, they are normalized, and the statistical index is then distributed to the grid units according to the normalized weights, realizing the conversion from the coarse-grained level (administrative unit) to the fine-grained level (grid unit).
In one embodiment, in step S5, the total value of the statistical index to be spatialized in the administrative unit is distributed to the grid units according to the weights, calculated as:
$$SI_{ij} = SI_i \times \frac{W_{ij}}{\sum_{u=1}^{N_i} W_{iu}}$$
where i denotes the i-th administrative unit, j denotes the j-th grid, $SI_{ij}$ is the final statistical index value of the j-th grid of the i-th administrative unit, $SI_i$ is the total value of the statistical index to be spatialized in the i-th administrative unit, $W_{ij}$ and $W_{iu}$ are the weights of the j-th and u-th grids of the i-th administrative unit, and $N_i$ is the total number of grids in the i-th administrative unit.
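A minimal sketch of this allocation for a single administrative unit is given below. The function name `allocate_by_weight` and the numbers are hypothetical, and the sketch assumes non-negative weights with a positive sum.

```python
def allocate_by_weight(unit_total, weights):
    """Distribute an administrative unit's total to its grids by normalized weight."""
    total_weight = sum(weights)
    if total_weight <= 0:
        # Assumption of this sketch: the model weights sum to a positive value.
        raise ValueError("weights must sum to a positive value")
    return [unit_total * w / total_weight for w in weights]


# Hypothetical unit with a total of 1000 and three grids.
grid_values = allocate_by_weight(1000.0, [2.0, 3.0, 5.0])
print(grid_values)  # [200.0, 300.0, 500.0]
```

How to handle a unit whose weights are all zero (for example, sharing the total equally) is left open here, since the text does not address that case.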
The index to be spatialized includes, but is not limited to, population and grain. Distributing the total value of the statistical index to be spatialized in the administrative unit to the grid units according to the weights to obtain the final grid statistical values comprises:
distributing the population and grain statistics of the administrative unit to the grid units according to the weights to obtain the final grid population and grain yield.
Fig. 2 shows the technical route of the method provided by the present invention. First, the modeling data are selected (administrative-unit-scale data statistics, correlation analysis, and determination of the modeling auxiliary data). Then the optimal grid grading is determined (grid-unit-scale data statistics, grid grading, and selection of the optimal grading according to an evaluation index). Next, feature modeling and migration are performed (model training, model prediction, and grid statistical index weights). Finally, grid allocation and accuracy verification are carried out (weight allocation, grid statistical index values, and accuracy verification).
Fig. 3 shows the calculation flow of the method of the present invention. The administrative-unit-level correlation analysis can be implemented by calculating the Pearson, Spearman, or Kendall correlation coefficient. When grading the grids, equal-interval classification, natural breaks classification, or quantile classification can be adopted. The grading results can be evaluated with indexes such as the silhouette coefficient, the Davies-Bouldin index (DBI), and the elbow method.
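Choosing the number of grades with an evaluation index can be sketched as follows. This sketch uses equal-interval classification, one of the grading options listed above, rather than natural breaks, and a one-dimensional Davies-Bouldin index with mean absolute deviation as the within-grade scatter. All function names and sample counts are assumptions for illustration.

```python
def equal_interval_grades(values, n_grades):
    """Assign each value a grade 1..n_grades using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)
    width = (hi - lo) / n_grades
    # The maximum value would land in grade n_grades + 1, so clamp it.
    return [min(int((v - lo) / width), n_grades - 1) + 1 for v in values]


def davies_bouldin_1d(values, grades):
    """Davies-Bouldin index of a 1-D grading (lower means better separated)."""
    groups = {}
    for v, g in zip(values, grades):
        groups.setdefault(g, []).append(v)
    if len(groups) < 2:
        return float("inf")
    centers = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    scatter = {g: sum(abs(v - centers[g]) for v in vs) / len(vs)
               for g, vs in groups.items()}
    worst = [max((scatter[g] + scatter[h]) / abs(centers[g] - centers[h])
                 for h in groups if h != g)
             for g in groups]
    return sum(worst) / len(worst)


def best_grading(values, candidates=range(2, 11)):
    """Try each candidate grade count and keep the one with the lowest DBI."""
    scored = [(davies_bouldin_1d(values, equal_interval_grades(values, k)), k)
              for k in candidates]
    _, k = min(scored)
    return k, equal_interval_grades(values, k)


# Hypothetical per-grid counts of one auxiliary data type.
values = [0, 1, 2, 10, 11, 12, 30, 31]
k, grades = best_grading(values)
print(k, grades)
```

With equal-width intervals some grades may stay empty. The DBI is computed over non-empty grades only, which is a simplification of this sketch.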
In order to more clearly illustrate the implementation and beneficial effects of the method provided by the invention, the following detailed description is given by specific examples.
Given existing Wuhan street-level administrative division data, street population data, and 10 types of POI data, the Wuhan street population needs to be spatialized according to the POI data and distributed onto a 200 m grid to obtain a more detailed spatial distribution of the population. Because real grid-level population training data are lacking, traditional population spatialization modeling methods often apply the rules learned at the administrative unit level directly to the grids, so the model migration suffers from a downscaling problem.
The invention adopts a cross-scale population spatialization method that grades grid units according to statistical information, thereby overcoming the cross-scale feature problem in the migration from training to prediction in traditional population spatialization methods and achieving finer population spatialization.
The algorithm of the present invention is explained in detail below with reference to the drawings. The specific steps are as follows:
1) Count the number of each of the 10 types of POIs and the population of every street in Wuhan, calculate the Pearson coefficient between each POI type and the population at the street level, and select the 4 POI types whose Pearson coefficients are greater than 0.6 as the final modeling auxiliary data;
2) Map the 4 types of POI points to the grid cells, and count and record the number of POI points of each type in each grid cell. Determine the optimal number of grid grades for each POI type using the natural breaks method and the DBI index, as follows:
① organize the per-grid counts of one POI type into a one-dimensional vector, take the number of grades from the range [2, 10], and grade the vector using the natural breaks method;
② vary the number of grades, repeating step ①, calculate the DBI index for each grading result, and select the grading result with the smallest DBI index as the optimal grading result;
③ perform steps ① and ② for all 4 POI types to determine the optimal number of grades and the grading result for each POI type;
3) Construct the feature vectors at the street level and train the model, as follows:
① count the total number of grids contained in a street and the number of grids in each grade of each type of auxiliary data determined in step 2);
② divide the number of grids in each grade by the total number of grids to obtain the grade proportions of the 4 POI types, and construct the grade-proportion feature vector;
③ perform steps ① and ② for all streets in Wuhan to obtain the feature vectors of all streets;
④ input the feature vector and the population of each street into a random forest regression model for training.
4) Construct the feature vectors at the grid level and predict, as follows:
① for a grid cell, determine which grade the cell belongs to from the number of POIs of each type in the cell, assign the feature value 1 at the feature vector position corresponding to that grade, and assign 0 at the remaining positions;
② obtain the feature vectors of all grids in Wuhan according to step ①;
③ input the feature vectors of the grids into the trained random forest model, which outputs the population weight of each grid;
The feature vector construction of steps 3) and 4) is shown in Fig. 4;
5) In every street, normalize the population weights of the grids contained in the street, and then distribute the street population to the grids according to the normalized weights.
6) At the community level, sum the populations of the grids contained in each community and compare them with the actual community population statistics to measure the accuracy of the population spatialization.
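Step 1) of the example, selecting auxiliary data by Pearson correlation with a 0.6 threshold, can be sketched as follows. The function names, the POI type names, and the sample figures are hypothetical and are not taken from the Wuhan data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (sd_x * sd_y)


def select_auxiliary(index_values, candidates, threshold=0.6):
    """Keep the candidate data types whose correlation exceeds the threshold."""
    return [name for name, counts in candidates.items()
            if pearson(index_values, counts) > threshold]


# Hypothetical street-level data: population and two POI types on four streets.
street_population = [100, 200, 300, 400]
poi_counts = {
    "restaurants": [10, 19, 31, 40],  # rises with population
    "parks": [5, 1, 4, 2],            # no clear relation to population
}
selected = select_auxiliary(street_population, poi_counts)
print(selected)  # ['restaurants']
```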
The invention has the following beneficial effects. The invention provides a cross-scale statistical index spatialization method in which grid units are graded according to statistical information, so that the statistical features of the coarse-grained administrative units carry the attributes of the fine-grained grid units. The rules obtained by modeling at the administrative unit level are then migrated to the grids to obtain the final spatialization result.
Through correlation analysis, the method can effectively select auxiliary data whose spatial distribution pattern is similar to that of the statistical index to be spatialized, giving high spatialization accuracy. As shown in Fig. 5, population spatialization was performed with three groups of POIs of different correlation as modeling auxiliary data. The results show that, although an anomaly occurs at the 1000 m grid cell size, the spatialization accuracy generally increases as the correlation of the modeling data improves. In addition, the optimal grading is determined from the distribution of grid cell attribute grades, which gives lower error than methods that ignore these characteristics, such as arbitrary classification. As shown in Fig. 6, the DBI index is used to select the number of grid grades. The results show that there is no absolutely fixed optimal number of grades, and that a better grading scheme can be determined with an appropriate evaluation index. Finally, compared with the feature statistics and model migration of the traditional method, the method overcomes, to a certain extent, the downscaling problem caused by directly migrating the rules learned at the administrative unit level to the grids when grid-scale training data for the statistical index are lacking. As shown in Fig. 7, at a grid cell size of 1000 m the traditional method performs better than the present method, while at grid sizes of 500 m and 200 m the present method is better, and its advantage becomes more pronounced as the grid becomes smaller. Fig. 8 compares the spatialization results for the 200 m grid cell population in Wuhan. The traditional method generally overestimates the grid population at the edge of Wuhan, and its visualization tends to look patchy, whereas the method of the present invention improves on both. The street population error map also shows that, in the three rectangular regions marked in the figure, the prediction accuracy of the present method is significantly higher than that of the traditional method. These results show that the method is better suited to smaller grids and has a greater advantage over the traditional method when the scale span of the statistical index spatialization is larger.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (9)

1. A cross-scale statistical index spatialization method considering grid unit attribute classification is characterized by comprising the following steps:
step S1: analyzing the correlation between the statistical index to be spatialized and the multi-source data in the coarse-grained administrative unit scale, and selecting data which has correlation meeting a preset degree with the statistical index to be spatialized from the multi-source data as modeling auxiliary data;
step S2: grading the grid statistic values of various modeling auxiliary data by adopting a grading method, and determining the optimal grading result of each type of modeling auxiliary data through grading evaluation indexes;
step S3: counting the number proportion of each grade grid unit in each type of modeling auxiliary data on the scale of coarse-grained administrative units, constructing grade proportion characteristic vectors, inputting the grade proportion characteristic vectors into a regression model for training, and obtaining the trained regression model;
step S4: at the fine-grained grid unit scale, constructing a feature vector for each grid unit according to the optimal grading results of the various types of auxiliary data, and inputting it into the trained regression model to obtain the statistical index weight of each grid unit;
step S5: normalizing the statistical index weights of the grid units contained in each administrative unit, and distributing the total statistical index value to be spatialized in the administrative unit to the grid units according to the weights, to obtain the final grid statistical values.
2. The method according to claim 1, wherein step S1 specifically comprises:
step S1.1: counting m types of multi-source data attribute values which can be quantized in the grid unit scale on all n administrative units, wherein n and m are positive integers larger than 0;
step S1.2: calculating a Pearson correlation coefficient of the statistical index to be spatialized and the multi-source data attribute value at the level of the administrative unit;
step S1.3: selecting the multi-source data whose correlation coefficient is greater than a threshold T as the final M types of modeling auxiliary data, wherein M is a positive integer.
3. The method according to claim 1, wherein step S2 specifically comprises:
step S2.1: counting the quantitative values of the M-type modeling auxiliary data on all N grid units, wherein each grid unit corresponds to M statistical values;
step S2.2: for the grid statistics of the t-th type of modeling auxiliary data, dividing the grids into different grades using a grading method, and evaluating each grading result with a preset evaluation index to determine the optimal grading result.
4. The method according to claim 1, wherein step S3 specifically comprises:
step S3.1: at the coarse-grained administrative unit scale, counting the proportion of grid units of each grade in each type of modeling auxiliary data, and constructing a grade-proportion feature vector $\beta_i$:
$$\beta_i = \left(\frac{N_i^{1,1}}{N_i}, \ldots, \frac{N_i^{1,C_1}}{N_i}, \ldots, \frac{N_i^{t,k}}{N_i}, \ldots, \frac{N_i^{M,C_M}}{N_i}\right)$$
wherein $N_i$ represents the total number of grids of the i-th administrative unit, $N_i^{t,k}$ represents the number of grids at the k-th level of the t-th type of modeling auxiliary data, and $C_t$ represents the number of levels of the t-th type; one administrative unit corresponds to one feature vector, and the feature vector comprises the level proportions of all types of modeling auxiliary data;
step S3.2: taking each administrative unit as a sample, with its grade-proportion feature vector as input and the statistical index to be spatialized as output, training a random forest regression model to obtain the trained random forest regression model.
5. The method according to claim 1, wherein step S4 specifically comprises:
step S4.1: for a grid unit, determining the belonging grade of the grid in various modeling auxiliary data according to the optimal grading result, and constructing the feature vector of the grid unit according to the belonging grade of the grid in various modeling auxiliary data;
step S4.2: inputting the constructed feature vectors of all grid units into the trained random forest regression model, which outputs the statistical index weight of each grid unit.
6. The method according to claim 1, wherein in step S5 the total value of the statistical index to be spatialized in the administrative unit is distributed to the grid units according to the weights, calculated as:
$$SI_{ij} = SI_i \times \frac{W_{ij}}{\sum_{u=1}^{N_i} W_{iu}}$$
wherein i denotes the i-th administrative unit, j denotes the j-th grid, $SI_{ij}$ is the final statistical index value of the j-th grid of the i-th administrative unit, $SI_i$ is the total value of the statistical index to be spatialized in the i-th administrative unit, $W_{ij}$ and $W_{iu}$ are the weights of the j-th and u-th grids of the i-th administrative unit, and $N_i$ is the total number of grids in the i-th administrative unit.
7. The method of claim 1, wherein the index to be spatialized includes but is not limited to population and grain, and distributing the total value of the statistical index to be spatialized in the administrative unit to the grid units according to the weights to obtain the final grid statistical values comprises:
distributing the population and grain statistics of the administrative unit to the grid units according to the weights to obtain the final grid population and grain yield.
8. The method of claim 1, wherein the method further comprises:
after the grid statistical index values are obtained, aggregating the grid predictions within the administrative units one level below the model training scale, and comparing the aggregated values with the actual statistical index values of those units to verify the accuracy.
9. The method of claim 8, wherein the method of verifying the accuracy is specifically:
the errors are measured by the mean absolute error (MAE) and the root mean square error (RMSE), calculated as:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
wherein $\hat{y}_i$ is the predicted statistical index value of the i-th next-level administrative unit, $y_i$ is the actual statistical index value of the i-th next-level administrative unit, and n is the number of next-level administrative units.
CN201910854444.4A 2019-09-10 2019-09-10 Cross-scale statistical index spatialization method considering grid unit attribute grading Active CN110689055B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910854444.4A CN110689055B (en) 2019-09-10 2019-09-10 Cross-scale statistical index spatialization method considering grid unit attribute grading

Publications (2)

Publication Number Publication Date
CN110689055A (en) 2020-01-14
CN110689055B (en) 2022-07-19

Family

ID=69107960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910854444.4A Active CN110689055B (en) 2019-09-10 2019-09-10 Cross-scale statistical index spatialization method considering grid unit attribute grading

Country Status (1)

Country Link
CN (1) CN110689055B (en)

Also Published As

Publication number Publication date
CN110689055B (en) 2022-07-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant