CN118051845A - Geospatial full coverage data generation method and device based on space variable parameter machine learning

Info

Publication number: CN118051845A
Application number: CN202410446330.7A
Authority: CN
Inventors: 高秉博; 王雨雪; 殷悦; 王辰怡; 刘燕青; 谢东凯; 姚晓闯; 杨建宇; 冯权泷
Original assignee: China Agricultural University
Current assignee: China Agricultural University
Priority date: 2024-04-15
Filing date: 2024-04-15
Publication date: 2024-05-17
Anticipated expiration: 2044-04-15
Also published as: CN118051845B

Abstract

The invention provides a method and device for generating full-coverage geospatial data based on spatially varying-parameter machine learning, and relates to the technical field of geographic information science. The method comprises: partitioning a target area step by step and, after each partition, calculating the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state, based on the auxiliary variables and target variables at the observation sites in the target area; determining a target partition state based on the spatial stratified heterogeneity; in the target partition state, constructing a spatially varying-parameter machine learning model for each sub-region of the target area; and performing interpolation prediction of the target variable at each preset point to be predicted in the target area, based on the models corresponding to the sub-regions, to obtain an interpolation prediction result and an uncertainty analysis result. With this method, an accurate spatial distribution map corresponding to full-coverage geospatial data can be interpolated from limited observation-site data.

Description

Method and device for generating geospatial full coverage data based on spatially variable parameter machine learning

Technical Field

The present invention relates to the technical field of geographic information science, and in particular to a method and device for generating full-coverage geospatial data based on spatially varying-parameter machine learning.

Background

Spatial interpolation is commonly used to convert measurements at discrete points into a continuous data surface so that it can be compared with the distribution patterns of other spatial phenomena. Observation sites for meteorology, soil and similar variables are usually costly to establish and limited in number, and they cannot provide continuous surface data over a study area; spatial interpolation is therefore used to extend the data from a limited number of observation sites to the entire study area. As spatial interpolation has become more widely applied, the demands on its accuracy have risen. However, in a complex and changeable geographic environment, when large-area, high-spatial-resolution interpolation is performed, the target variable is affected by multiple auxiliary variables, and the relationship between the target variable and the auxiliary variables is strongly spatially heterogeneous, traditional spatial interpolation methods yield results of low accuracy.

Assuming that the relationship between the target variable and the auxiliary variables is locally stationary, researchers have developed the geographically weighted regression model and the Bayesian spatially varying coefficient model, which use spatial autocorrelation to model spatially non-stationary relationships. However, the former relies heavily on a predefined spatial kernel function and is highly susceptible to collinearity, which degrades the accuracy of the interpolation result. The latter requires a predefined distribution for every model coefficient and a cross-covariance function for the spatially random part of the coefficients; these are difficult to specify correctly, and an incorrect specification degrades the accuracy of the interpolation result.

Therefore, in large-area, high-spatial-resolution interpolation, how to interpolate an accurate spatial distribution map of meteorological data, soil data or the like from limited observation-site data is a problem that urgently needs to be solved.

Summary of the Invention

The present invention provides a method and device for generating full-coverage geospatial data based on spatially varying-parameter machine learning, to overcome the low accuracy of prior-art spatial interpolation of geospatial data such as meteorological or soil data, and to achieve high-accuracy spatial interpolation of full-coverage geospatial data.

The present invention provides a method for generating full-coverage geospatial data based on spatially varying-parameter machine learning, comprising:

partitioning a target area step by step and, after each partition, calculating, based on the various types of auxiliary variables and the target variable at each observation site in the target area, the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state;

determining a target partition state based on the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable after each partition, the target partition state corresponding to a plurality of sub-regions of the target area;

in the target partition state, constructing a spatially varying-parameter machine learning model for each sub-region of the target area, wherein each model is obtained by training on the target variables and auxiliary variables of the observation sites within the corresponding spatial range and on the distance from each of those observation sites to the corresponding model; each model includes position information and spatial range information, the position information being the center coordinates of the sub-region corresponding to the model and the spatial range information being the size of that sub-region;

performing interpolation prediction of the target variable at each preset point to be predicted in the target area, based on the spatially varying-parameter machine learning models corresponding to the sub-regions, to obtain a target interpolation result and an uncertainty analysis result.

According to the method for generating full-coverage geospatial data based on spatially varying-parameter machine learning provided by the present invention, calculating, after each partition and based on the various types of auxiliary variables and the target variable at each observation site in the target area, the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state comprises:

for each observation site in the target area, calculating the bivariate local spatial autocorrelation coefficient between the target variable at the observation site and each type of auxiliary variable;

after each partition of the target area, calculating, based on the bivariate local spatial autocorrelation coefficients, the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state.

According to the method for generating full-coverage geospatial data based on spatially varying-parameter machine learning provided by the present invention, calculating, for each observation site in the target area, the bivariate local spatial autocorrelation coefficient between the target variable at the observation site and each type of auxiliary variable comprises:

for each observation site in the target area, calculating the bivariate local spatial autocorrelation coefficient corresponding to the observation site based on the auxiliary variable at the observation site and the target variables at a first preset number of neighboring observation sites of the observation site.

According to the method for generating full-coverage geospatial data based on spatially varying-parameter machine learning provided by the present invention, calculating, after each partition of the target area and based on the bivariate local spatial autocorrelation coefficients, the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state comprises:

after each partition of the target area, calculating, based on the bivariate local spatial autocorrelation coefficients of all observation sites within each partition space in the current partition state, the local autocorrelation index variance value of the relationship between each type of auxiliary variable and the target variable in each partition space;

calculating, based on the bivariate local spatial autocorrelation coefficients of all observation sites in the target area, the global autocorrelation index variance value of the relationship between each type of auxiliary variable and the target variable in the target area;

calculating, based on the local autocorrelation index variance values and the global autocorrelation index variance value, the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable in the target area in the current partition state.

According to the method for generating full-coverage geospatial data based on spatially varying-parameter machine learning provided by the present invention, determining the target partition state based on the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable after each partition comprises:

calculating, based on the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable after each partition, the average spatial stratified heterogeneity of the target area after each partition;

constructing a first change curve from the average spatial stratified heterogeneity of the target area after each partition, or constructing a second change curve from the difference between the average spatial stratified heterogeneity values after two successive partitions;

determining the partition state corresponding to the inflection point of the first change curve or of the second change curve as the target partition state.
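The selection of the target partition state can be sketched as follows. This is a minimal illustration, not the patented procedure: the text does not say how the inflection point is located, so the sketch assumes it is the point where the slope of the first change curve changes most (largest absolute second difference). The function names `first_differences` and `elbow_index` are hypothetical.

```python
def first_differences(curve):
    """Differences between successive values (the 'second change curve')."""
    return [b - a for a, b in zip(curve, curve[1:])]


def elbow_index(avg_heterogeneity):
    """Index of the partition state at the assumed inflection point.

    avg_heterogeneity[k] is the average spatial stratified heterogeneity
    after partition k. ASSUMPTION: the inflection point is taken as the
    point with the largest absolute second difference.
    """
    if len(avg_heterogeneity) < 3:
        raise ValueError("need at least three partition states")
    second = first_differences(first_differences(avg_heterogeneity))
    k = max(range(len(second)), key=lambda i: abs(second[i]))
    return k + 1  # second difference i is centred on curve point i + 1
```

For a curve that rises quickly and then flattens, such as `[0.10, 0.30, 0.50, 0.52, 0.53]`, the sketch picks the third partition state (index 2), where the gain in heterogeneity levels off.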

According to the method for generating full-coverage geospatial data based on spatially varying-parameter machine learning provided by the present invention, the spatially varying-parameter machine learning model corresponding to each sub-region is obtained by training as follows:

obtaining the target variables and auxiliary variables of all observation sites in the sub-region;

determining the distance from each observation site in the sub-region to the center point of the sub-region;

training a random forest model on the target variables and auxiliary variables of all observation sites in the sub-region and on the distances from those observation sites to the center point of the sub-region, and using the trained random forest model as the spatially varying-parameter machine learning model corresponding to the sub-region.
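The training inputs described above can be assembled as in the following sketch. It only builds the feature rows (auxiliary variables plus distance to the sub-region center) and the target vector; fitting the random forest itself is left to whichever library is used. The dictionary keys `x`, `y`, `aux` and `target`, the function name `build_training_rows`, and the use of Euclidean distance are assumptions for illustration.

```python
import math


def build_training_rows(sites, center):
    """Return (features, targets) for one sub-region.

    sites  : list of dicts with keys "x", "y", "aux" (list of auxiliary
             variable values) and "target" (hypothetical layout).
    center : (x, y) center coordinates of the sub-region.
    ASSUMPTION: distance to the center is Euclidean.
    """
    cx, cy = center
    features, targets = [], []
    for site in sites:
        distance = math.hypot(site["x"] - cx, site["y"] - cy)
        # Each row: auxiliary variables followed by the distance feature.
        features.append(list(site["aux"]) + [distance])
        targets.append(site["target"])
    return features, targets
```

The resulting `features` and `targets` have the row-per-site layout that typical random forest regressors expect, so they could be passed directly to a fit routine.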

According to the method for generating full-coverage geospatial data based on spatially varying-parameter machine learning provided by the present invention, performing interpolation prediction of the target variable at each preset point to be predicted in the target area, based on the spatially varying-parameter machine learning models corresponding to the sub-regions, to obtain a target interpolation result and an uncertainty analysis result comprises:

predicting the target variable of the point to be predicted with each of a plurality of spatially varying-parameter machine learning models to obtain a plurality of interpolation prediction results, wherein the plurality of models comprises a second preset number of spatially varying-parameter machine learning models neighboring the point to be predicted;

determining, by inverse distance weighting of the plurality of interpolation prediction results, the target interpolation result and the uncertainty analysis result of the target variable at the point to be predicted.
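The inverse-distance-weighted combination can be sketched as follows. The text does not specify how the uncertainty is computed, so this sketch assumes it is the weighted standard deviation of the individual model predictions around the weighted mean; the function name `idw_combine`, the `power` exponent and the default `k` are likewise illustrative assumptions.

```python
import math


def idw_combine(point, model_predictions, k=3, power=2.0):
    """Combine predictions from the k models nearest to `point`.

    model_predictions : list of (model_center_xy, predicted_value).
    Returns (weighted_mean, weighted_std).
    ASSUMPTION: uncertainty is the weighted standard deviation of the
    k predictions; the source does not define it.
    """
    nearest = sorted(model_predictions,
                     key=lambda m: math.dist(point, m[0]))[:k]
    # Guard against a zero distance when the point sits on a model center.
    weights = [1.0 / max(math.dist(point, c), 1e-12) ** power
               for c, _ in nearest]
    total = sum(weights)
    mean = sum(w * v for w, (_, v) in zip(weights, nearest)) / total
    variance = sum(w * (v - mean) ** 2
                   for w, (_, v) in zip(weights, nearest)) / total
    return mean, math.sqrt(variance)
```

With two equidistant models predicting 10 and 20, the sketch returns a mean of 15 and an uncertainty of 5; as the point moves toward one model center, that model's prediction dominates.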

The present invention also provides a device for generating full-coverage geospatial data based on spatially varying-parameter machine learning, comprising:

a partitioning module, configured to partition a target area step by step and, after each partition, to calculate, based on the various types of auxiliary variables and the target variable at each observation site in the target area, the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state;

a determination module, configured to determine a target partition state based on the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable after each partition, the target partition state corresponding to a plurality of sub-regions of the target area;

a modeling module, configured to construct, in the target partition state, a spatially varying-parameter machine learning model for each sub-region of the target area, wherein each model is obtained by training on the target variables and auxiliary variables of the observation sites within the corresponding spatial range and on the distance from each of those observation sites to the corresponding model; each model includes position information and spatial range information, the position information being the center coordinates of the sub-region corresponding to the model and the spatial range information being the size of that sub-region;

a prediction module, configured to perform interpolation prediction of the target variable at each preset point to be predicted in the target area, based on the spatially varying-parameter machine learning models corresponding to the sub-regions, to obtain a target interpolation result and an uncertainty analysis result.

The present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements any of the above methods for generating full-coverage geospatial data based on spatially varying-parameter machine learning.

The present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the above methods for generating full-coverage geospatial data based on spatially varying-parameter machine learning.

The present invention also provides a computer program product comprising a computer program which, when executed by a processor, implements any of the above methods for generating full-coverage geospatial data based on spatially varying-parameter machine learning.

The spatial interpolation method and device based on spatially varying-parameter machine learning provided by the present invention address the case in which the target variable and each type of auxiliary variable exhibit both attribute similarity and spatial autocorrelation, and the relationship between the target variable and the auxiliary variables is strongly heterogeneous. The target area is partitioned on the basis of spatial stratified heterogeneity to obtain a plurality of sub-regions, and the points to be predicted in the target area are predicted with the spatially varying-parameter machine learning models corresponding to the sub-regions to obtain the target interpolation result. Because each model constructed for a sub-region has a specific position and range, the spatially non-stationary relationship between the target variable and the auxiliary variables can be modeled effectively. In addition, because each model is trained on the target variables and auxiliary variables of the observation sites and on the distance from each observation site to the position of the model, local spatial variation in the relationship between the target variable and the auxiliary variables is modeled automatically, which makes full use of the spatial correlation and differentiation patterns of multi-dimensional, mixed-type auxiliary variables. In large-area, high-spatial-resolution interpolation, the target variable at each point to be predicted can thus be interpolated accurately from limited observation-site data, yielding an accurate spatial distribution map corresponding to full-coverage geospatial data.

Brief Description of the Drawings

To explain the technical solutions of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a first schematic flow chart of the method for generating full-coverage geospatial data based on spatially varying-parameter machine learning provided by an embodiment of the present invention;

FIG. 2 is a second schematic flow chart of the method for generating full-coverage geospatial data based on spatially varying-parameter machine learning provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of the first curve and the second curve constructed from the average spatial stratified heterogeneity, provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of the sub-regions in the target partition state provided by an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of the device for generating full-coverage geospatial data based on spatially varying-parameter machine learning provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of the electronic device provided by an embodiment of the present invention.

Detailed Description

To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

The present invention addresses the case in which the target variable and the auxiliary variables exhibit both attribute similarity and spatial autocorrelation, and the relationship between the target variable and the auxiliary variables is strongly heterogeneous. It develops a method for generating full-coverage geospatial data based on spatially varying-parameter machine learning that makes full use of the spatial correlation and differentiation patterns of multi-dimensional, mixed-type auxiliary variables, in order to improve the accuracy of large-area, high-spatial-resolution interpolation mapping.

To address the technical problems in the prior art, an embodiment of the present application provides a method for generating full-coverage geospatial data based on spatially varying-parameter machine learning. FIG. 1 is a first schematic flow chart of the method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes:

Step 110: partition the target area step by step and, after each partition, calculate, based on the various types of auxiliary variables and the target variable at each observation site in the target area, the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state.

Specifically, FIG. 2 is a second schematic flow chart of the method provided by an embodiment of the present invention. As shown in FIG. 2, the target area can be recursively divided into a grid following the quadtree principle: the target area is divided into four equal regions, and these regions are then each divided into four blocks in a preset order, for example upper left, upper right, lower left, lower right. Each division of one region into four blocks counts as one partition of the target area, and partitioning proceeds step by step in this way. After each partition, the spatial stratified heterogeneity of the relationship between each auxiliary variable and the target variable in the current partition state is calculated; after each partition, each grid cell of the target area represents one partition space.
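The step-by-step quadtree partitioning can be sketched as follows. The text fixes the split order within a region (upper left, upper right, lower left, lower right) but not the order in which regions are visited, so this sketch assumes a breadth-first (first-in, first-out) traversal and a coordinate system in which y increases upward. The names `quarter` and `partition_states` are hypothetical.

```python
from collections import deque


def quarter(cell):
    """Split (xmin, ymin, xmax, ymax) into four equal cells.

    Order: upper left, upper right, lower left, lower right
    (ASSUMPTION: y increases upward).
    """
    xmin, ymin, xmax, ymax = cell
    xmid, ymid = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    return [
        (xmin, ymid, xmid, ymax),  # upper left
        (xmid, ymid, xmax, ymax),  # upper right
        (xmin, ymin, xmid, ymid),  # lower left
        (xmid, ymin, xmax, ymid),  # lower right
    ]


def partition_states(extent, steps):
    """Return the list of partition states, one per partition step.

    Each step splits exactly one cell into four, so state k holds
    3 * (k + 1) + 1 cells. ASSUMPTION: cells are split in FIFO order.
    """
    queue = deque([extent])
    states = []
    for _ in range(steps):
        queue.extend(quarter(queue.popleft()))
        states.append(list(queue))
    return states
```

Each entry of the returned list is one candidate partition state, for which the spatial stratified heterogeneity would then be evaluated.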

In one embodiment, calculating, after each partition and based on the various types of auxiliary variables and the target variable at each observation site in the target area, the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state includes:

Specifically, as shown in FIG. 2, the bivariate local spatial autocorrelation coefficient is determined from the data on the target variable and its influencing factors, where the influencing-factor data are the auxiliary variables that affect the target variable. The bivariate local spatial autocorrelation coefficient (bivariate LISA), where "bivariate" refers to the target variable and one type of auxiliary variable, measures the local heterogeneity of the correlation between two variables. After the bivariate LISA values between the target variable (Y) and each type of auxiliary variable (X) have been calculated at all observation sites, the spatial pattern of the relationship between the target variable (Y) and each type of auxiliary variable (X) can be derived from the spatial distribution of the bivariate LISA values.

In one embodiment, calculating, for each observation site in the target area, the bivariate local spatial autocorrelation coefficient between the target variable at the observation site and each type of auxiliary variable includes:

Specifically, each observation site can have one target variable and multiple types of auxiliary variables. Before the bivariate local spatial autocorrelation coefficients between the target variable and each type of auxiliary variable are calculated, the target variable and each type of auxiliary variable at each observation site are first standardized. Taking the target variable at one observation site as an example: the mean of the target variable over the target area is subtracted from the target variable at the observation site, and the difference is divided by the standard deviation of the target variable over the target area, giving the standardized target variable.
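The standardization step can be sketched in a few lines. The text does not say whether the population or the sample standard deviation is used; this sketch assumes the population form, and the function name `standardize` is hypothetical.

```python
import statistics


def standardize(values):
    """Z-score standardization over all observation sites in the target area.

    ASSUMPTION: population standard deviation (divide by N).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("variable is constant; cannot standardize")
    return [(v - mean) / std for v in values]
```

The same function would be applied separately to the target variable and to each type of auxiliary variable.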

After each target variable and each auxiliary variable has been standardized, the bivariate local Moran's index is calculated at each observation site and used as the bivariate local spatial autocorrelation coefficient. For each observation site, the bivariate local Moran's index is calculated from the auxiliary variable at that site and the target variables at the first preset number of neighboring observation sites. For a given observation site, each type of auxiliary variable paired with the target variable yields one bivariate local Moran's index; for example, if the site has 5 types of auxiliary variables, 5 bivariate local Moran's indices are determined for it.

Optionally, the bivariate local Moran's index $I_i$ at each observation site in the target area can be calculated by the following formula:

$$I_i = x_i \sum_{j=1}^{n} w_{ij}\, y_j \tag{1}$$

where $I_i$ denotes the bivariate local Moran's index of observation site $i$ in target area $B$, with a value range of [-1, 1]; $I_i < 0$ indicates that the target variable is negatively correlated with the auxiliary variable, and $I_i > 0$ indicates that the target variable is positively correlated with the auxiliary variable; $x_i$ denotes the value of a given type of auxiliary variable at observation site $i$; $y_j$ is the target variable value at observation site $j$; $n$ denotes the number of neighboring observation sites spatially adjacent to observation site $i$ (i.e., the first preset number; for example, one neighboring observation site may be selected on each of the upper, lower, left and right sides of observation site $i$, so that the first preset number $n$ may be 4); and $w_{ij}$ denotes the spatial weight matrix weighted by the distance between observation site $i$ and observation site $j$.
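The calculation can be sketched as follows, using the standard form of the bivariate local Moran's index: the standardized auxiliary value at a site multiplied by the weighted sum of the standardized target values at its neighbors. The text says only that the weights are distance-based, so this sketch assumes inverse-distance weights, row-normalized over the `n_neighbors` nearest sites, and assumes no two sites share the same coordinates. The function name `bivariate_local_moran` is hypothetical.

```python
import math


def bivariate_local_moran(coords, zx, zy, n_neighbors=4):
    """Bivariate local Moran's index for every observation site.

    coords : list of (x, y) site coordinates (assumed distinct).
    zx     : standardized auxiliary variable at each site.
    zy     : standardized target variable at each site.
    ASSUMPTION: weights are inverse distances to the n_neighbors nearest
    sites, normalized to sum to 1 for each site.
    """
    indices = []
    for i, p in enumerate(coords):
        neighbors = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(p, coords[j]),
        )[:n_neighbors]
        raw = [1.0 / math.dist(p, coords[j]) for j in neighbors]
        total = sum(raw)
        # Spatial lag of the target variable around site i.
        lag = sum(w / total * zy[j] for w, j in zip(raw, neighbors))
        indices.append(zx[i] * lag)
    return indices
```

A negative value at a site means its auxiliary value and the surrounding target values lie on opposite sides of their means; a positive value means they lie on the same side.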

In one embodiment, the step of calculating, after each partitioning of the target area and based on the bivariate local autocorrelation coefficients, the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state includes:

Specifically, as shown in FIG. 2, a geographical detector can be used to calculate, from the bivariate local autocorrelation coefficients, the spatial stratified heterogeneity corresponding to each type of auxiliary variable in the current partition state. As the above embodiment shows, each observation site has multiple types of auxiliary variables, and each type has its own bivariate local spatial autocorrelation coefficient. Accordingly, after each partitioning of the target area, for each partition space in the current partition state, the local autocorrelation index variance of the relationship between each type of auxiliary variable and the target variable is calculated from the bivariate local spatial autocorrelation coefficients of all observation sites within that partition space. Likewise, after each partitioning, the global autocorrelation index variance of the relationship between each type of auxiliary variable and the target variable is calculated from the bivariate local spatial autocorrelation coefficients of all observation sites in the target area, so that each type of auxiliary variable has one global autocorrelation index variance.

After the local and global autocorrelation index variances of the relationship between each type of auxiliary variable and the target variable have been obtained, the spatial stratified heterogeneity $q_k$ of that relationship for the target area in the current partition state is calculated from them:

$$q_k = 1 - \frac{\sum_{h=1}^{L} N_h\, \sigma_{h,k}^{2}}{N\, \sigma_{k}^{2}} \qquad (2)$$

where $q_k$ denotes the spatial stratified heterogeneity corresponding to the $k$-th type of auxiliary variable, with a value range of [0, 1]; $q_k = 0$ indicates that the relationship between the target variable and the auxiliary variable has no obvious spatial pattern and is randomly distributed over spatial locations; $q_k = 1$ indicates that the relationship between the target variable and the auxiliary variable is consistent within each partition space and differs between partition spaces; $h$ denotes a partition space; $L$ denotes the number of partition spaces in the target area; $N_h$ denotes the number of observation sites in partition space $h$; $N$ denotes the number of observation sites in the target area; $\sigma_{h,k}^{2}$ denotes the local autocorrelation index variance corresponding to the $k$-th type of auxiliary variable in partition space $h$; and $\sigma_{k}^{2}$ denotes the global autocorrelation index variance corresponding to the $k$-th type of auxiliary variable in the target area.
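As a rough sketch of the spatial stratified heterogeneity calculation, the snippet below computes one minus the ratio of the site-weighted within-partition variance to the global variance, which is the usual geographical detector form and agrees with the symbol definitions above. The function names are hypothetical, and the use of the population variance (rather than the sample variance) is an assumption.

```python
def population_variance(values):
    """Variance with divisor N (assumed; the text does not specify the divisor)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def q_statistic(local_indices, zone_ids):
    """Spatial stratified heterogeneity for one type of auxiliary variable.

    local_indices: bivariate local Moran's index at every observation site
    zone_ids: partition-space label of every site, in the same order
    """
    global_var = population_variance(local_indices)
    if global_var == 0:
        return 0.0  # no variation at all, hence no stratified pattern to detect
    zones = {}
    for value, zone in zip(local_indices, zone_ids):
        zones.setdefault(zone, []).append(value)
    # Sum over partition spaces of (site count) x (within-partition variance).
    within = sum(len(vals) * population_variance(vals) for vals in zones.values())
    return 1.0 - within / (len(local_indices) * global_var)
```

A value near 1 means the partition separates sites with different relationships; a value near 0 means the partition explains none of the variation.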

Step 120: Determine a target partition state based on the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable after each partitioning, the target partition state corresponding to multiple sub-regions of the target area.

Specifically, after each partitioning, the target partition state is determined based on the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable. That is, once it is determined that no further partitioning is needed, the current partition state is taken as the final target partition state, which corresponds to multiple sub-regions of the target area.

In one embodiment, determining the target partition state based on the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable after each partitioning includes:

Specifically, after each partitioning, the average spatial stratified heterogeneity of the target area is calculated from the spatial stratified heterogeneity values computed for each type of auxiliary variable. For example, when the observation sites in the target area have 5 types of auxiliary variables, the spatial stratified heterogeneity values $q_1$, $q_2$, $q_3$, $q_4$ and $q_5$ of the 5 types can be calculated as in the above embodiment, and the average spatial stratified heterogeneity $\bar{q}$ is then determined from them.

FIG. 3 is a schematic diagram of the first curve and the second curve constructed from the average spatial stratified heterogeneity according to an embodiment of the present invention. As shown in FIG. 3, a first change curve can be constructed from the average spatial stratified heterogeneity calculated after each partitioning; it characterizes how the average spatial stratified heterogeneity changes with each partitioning. Alternatively, a second change curve can be constructed from the difference between the average spatial stratified heterogeneity values after two adjacent partitionings; it characterizes that difference. The partition state corresponding to the inflection point of the first change curve or of the second change curve is determined as the target partition state. The inflection point of the first change curve is the point after which the average spatial stratified heterogeneity of further partition states levels off; the inflection point of the second change curve is the point after which the difference in average spatial stratified heterogeneity between adjacent partitionings stabilizes.
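One simple way to operationalize the stopping rule is sketched below. The text does not define how the inflection point is located, so the tolerance-based rule here (stop at the first partitioning after which the gain in average heterogeneity drops below `tol`) is an assumption, as are the function names and the default tolerance.

```python
def average_heterogeneity(q_values):
    """Mean of the per-variable spatial stratified heterogeneity values."""
    return sum(q_values) / len(q_values)


def select_target_partition(avg_q_per_step, tol=0.01):
    """Return the index of the partitioning step chosen as the target state.

    avg_q_per_step: average heterogeneity after partitioning 0, 1, 2, ...
    tol: assumed threshold below which a further gain counts as "levelled off"
    """
    for step in range(len(avg_q_per_step) - 1):
        # Difference between two adjacent partitionings (the second curve).
        gain = avg_q_per_step[step + 1] - avg_q_per_step[step]
        if gain < tol:
            return step
    # The curve never flattened within the observed steps.
    return len(avg_q_per_step) - 1
```

With averages of 0.20, 0.45, 0.60, 0.605 and 0.607 over five successive partitionings, the rule selects the third partitioning (index 2), after which further splitting adds almost nothing.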

In the method for generating geospatial full-coverage data based on spatially varying parameter machine learning of the above embodiment, the target partition state, together with the spatial position and spatial range of each partition space in that state, is determined recursively by the geographical detector model on the quadtree principle, based on the autocorrelation between the target variable and the auxiliary variables. This makes it possible to effectively model the spatially non-stationary relationship between variables.
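The quadtree principle mentioned here can be illustrated with a short sketch. It splits every cell into four equal quadrants at each level, which is a simplifying assumption: the text does not say whether all cells are split uniformly or only some. All names are hypothetical.

```python
def split_quadrants(bounds):
    """Split (xmin, ymin, xmax, ymax) into four equal quadrants."""
    xmin, ymin, xmax, ymax = bounds
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    return [
        (xmin, ymin, xmid, ymid),  # lower left
        (xmid, ymin, xmax, ymid),  # lower right
        (xmin, ymid, xmid, ymax),  # upper left
        (xmid, ymid, xmax, ymax),  # upper right
    ]


def quadtree_partition(bounds, depth):
    """Partition state after `depth` rounds of uniform quadtree splitting."""
    cells = [bounds]
    for _ in range(depth):
        cells = [quad for cell in cells for quad in split_quadrants(cell)]
    return cells


def assign_zone(point, cells):
    """Index of the first cell containing the point, or None if outside all cells."""
    x, y = point
    for idx, (xmin, ymin, xmax, ymax) in enumerate(cells):
        if xmin <= x <= xmax and ymin <= y <= ymax:
            return idx
    return None
```

The zone index returned by `assign_zone` for each observation site is the partition-space label needed when computing the heterogeneity of a given partition state.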

Step 130: In the target partition state, construct a spatially varying parameter machine learning model for each sub-region of the target area. Each spatially varying parameter machine learning model is obtained by training on the target variable and auxiliary variables of the observation sites within its spatial range, together with the distance from each of those observation sites to the model. Each spatially varying parameter machine learning model includes position information and spatial range information; the position information is the center coordinates of the sub-region corresponding to the model, and the spatial range information is the size of that sub-region.

In one embodiment, the spatially varying parameter machine learning model corresponding to each sub-region is trained by the following method:

Specifically, in the target partition state, the target area comprises multiple sub-regions. For each sub-region, a spatially varying parameter machine learning model carrying position information and a spatial range is placed at the center of the sub-region, and the model of each sub-region is trained separately. As shown in FIG. 2, the training process for the model of each sub-region is as follows:

Calculate the coordinates of the center point of the sub-region and determine the size of the sub-region. The center coordinates are taken as the position ($S$) of the spatially varying parameter machine learning model to be trained, and the size of the sub-region is taken as the spatial range ($E$) of that model (here, a random forest model, RF). FIG. 4 is a schematic diagram of the sub-regions in the target partition state according to an embodiment of the present invention. As shown in FIG. 4, each rectangular cell represents the size of a sub-region, that is, the spatial range of the model to be trained; the large dot in each sub-region marks its center point, that is, the position of the model to be trained; and the small dots show the locations of the observation sites.

Obtain the target variable and auxiliary variables of all observation sites in the sub-region, and determine the distance between each of these observation sites and the center point of the sub-region (that is, the position of the spatially varying parameter machine learning model to be trained). Using the data of the observation sites covered by the sub-region as samples, substitute the target variable, the auxiliary variables, and the distance between each observation site and the model to be trained into formula (3) to train the model:

$$Y = RF_{(S,E)}(X, D) \qquad (3)$$

where $Y$ denotes the value of the target variable; $S$ and $E$ denote the position information and the spatial range information of the spatially varying parameter machine learning model to be trained, respectively; $X$ denotes the auxiliary variables that affect $Y$; and $D$ denotes the distance from each observation site covered by the spatial range $E$ of the model to the position $S$ of the model.

The distance $D$ from each observation site to the model position is added as training data in order to model the local spatial variation of the relationship between the target variable and the auxiliary variables. This means that two points with identical values for all auxiliary variables may still have different target variable values if they lie at different distances from the spatially varying parameter machine learning model, and this possible difference is predicted by the model from the observation site data.
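The construction of the training samples for one sub-region can be sketched as follows. The dictionary layout of a site record, the square extent test and the function name are illustrative assumptions; the resulting rows and targets would then be passed to whatever regressor is used as the spatially varying parameter model (the text names a random forest).

```python
import math


def build_training_set(center, half_size, sites):
    """Assemble features [auxiliary variables..., distance to model] and targets.

    center: (x, y) position S of the model (sub-region center)
    half_size: half the side length of the square spatial range E (assumed square)
    sites: list of dicts with keys "xy", "aux" (list of values) and "target"
    """
    cx, cy = center
    rows, targets = [], []
    for site in sites:
        x, y = site["xy"]
        # Keep only the observation sites covered by the model's spatial range.
        if abs(x - cx) <= half_size and abs(y - cy) <= half_size:
            distance = math.hypot(x - cx, y - cy)
            # The distance to the model position is appended as an extra feature.
            rows.append(list(site["aux"]) + [distance])
            targets.append(site["target"])
    return rows, targets
```

Because the distance enters as an ordinary feature, two sites with identical auxiliary values but different distances produce different feature rows, which is what lets the model learn a locally varying relationship.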

During training, to ensure that the spatially varying parameter machine learning model has enough training samples, its spatial range $E$ can be enlarged so that more observation sites take part in its training. The degree of enlargement can be determined from a preset minimum amount of training data; for example, if training requires at least 100 observation sites within the model's range, the spatial range $E$ is expanded until it contains at least 100 observation sites. This enlargement is reasonable because machine learning can build more complex models than linear models and can model local variation in the relationship. In addition, partial overlap of training data is acceptable in spatially varying coefficient models.
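A minimal sketch of this enlargement step is shown below. The multiplicative growth factor and the square extent are assumptions made for illustration; the text only requires that the range be expanded until the preset minimum number of observation sites is covered.

```python
def expand_extent(center, half_size, site_coords, min_sites, growth=1.25):
    """Grow a square extent around `center` until it covers `min_sites` sites.

    growth: assumed multiplicative step applied to the half side length
    """
    if half_size <= 0 or growth <= 1:
        raise ValueError("half_size must be positive and growth greater than 1")
    if min_sites > len(site_coords):
        raise ValueError("not enough observation sites in the whole target area")
    cx, cy = center

    def covered(h):
        # Number of sites inside the square of half side length h.
        return sum(1 for x, y in site_coords if abs(x - cx) <= h and abs(y - cy) <= h)

    while covered(half_size) < min_sites:
        half_size *= growth
    return half_size
```

Neighboring models enlarged in this way may share some observation sites, which matches the remark that partial overlap of training data is acceptable.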

Compared with existing interpolation methods, the spatially varying parameter machine learning model can exploit high-dimensional auxiliary variables and model the spatially non-stationary relationship between the target variable and its auxiliary variables while avoiding collinearity. It therefore has the following beneficial effects:

(1) To address the spatial non-stationarity of the relationship between the target variable and its auxiliary variables caused by complex terrain, human activity and similar conditions, two main parameters, spatial position and spatial range, are set for the spatially varying parameter machine learning model of each sub-region. In addition, the distance between the model and each observation site is added as a key variable during training, so that the local spatial variation of the relationship between the target variable and the auxiliary variables is modeled automatically. The spatial autocorrelation pattern is thereby used correctly, which improves the prediction accuracy of the model.

(2) To address the factor collinearity of multiple linear regression models and the poor interpretability of machine learning, the spatially varying parameter machine learning model of the above embodiment can effectively handle strongly interacting multidimensional auxiliary variables and can better model the nonlinear relationship between the target variable and its auxiliary variables.

Step 140: Based on the spatially varying parameter machine learning models corresponding to the sub-regions, perform interpolation prediction on the target variable of each preset point to be predicted in the target area to obtain a target interpolation result and an uncertainty analysis result.

In one embodiment, performing interpolation prediction on the target variable of each preset point to be predicted in the target area based on the spatially varying parameter machine learning models corresponding to the sub-regions, to obtain the target interpolation result and the uncertainty analysis result, includes:

Specifically, after the target partition state has been determined, interpolation prediction is performed on each point to be predicted in the target area based on the spatially varying parameter machine learning models of the sub-regions in the target partition state. The target interpolation result is obtained as follows:

For each point to be predicted, multiple spatially varying parameter machine learning models are used for prediction, yielding multiple interpolation prediction results. These models are a second preset number of spatially varying parameter machine learning models adjacent to the point to be predicted; for example, the 4 models nearest to the point can be selected.

Further, the target interpolation result of each point to be predicted is determined from the multiple interpolation prediction results by inverse distance weighting. In the method for generating geospatial full-coverage data based on spatially varying parameter machine learning, the target interpolation result of each point to be predicted is a linear combination of the interpolation prediction results of the spatially varying parameter machine learning models spatially adjacent to that point; that is, the target interpolation result $\hat{Y}(s)$ is:

$$\hat{Y}(s) = \sum_{i=1}^{m} w_i(s)\, \hat{Y}_i(s) \qquad (4)$$

where $\hat{Y}(s)$ denotes the target interpolation result at the point to be predicted $s$ (that is, the target variable predicted by the model); $\hat{Y}_i(s)$ denotes the interpolation prediction result of the $i$-th spatially varying parameter machine learning model for the point $s$; $m$ denotes the number of spatially varying parameter machine learning models used to predict the point $s$ (that is, the second preset number); $w_i(s)$ denotes the weight of the $i$-th model, that is, the estimated regression coefficient; the weights of the models used for prediction vary over space and are determined by inverse distance weighting; and $s$ denotes the location coordinates of the point to be predicted.

Optionally, various kernel functions can be used to determine the weight of each spatially varying parameter machine learning model in formula (4). As shown in FIG. 2, the embodiment of the present application ran cross tests with five kernel functions to identify the one with the smaller error for interpolation prediction. The five kernel functions are nearest neighbor, equal weight (EW), inverse distance weighting (IDW), Gaussian weighting (GW) and adaptive Gaussian weighting (GAW), described as follows:

(1) Nearest neighbor: The prediction of the spatially varying parameter machine learning model spatially closest to the point to be predicted is used as the final target interpolation result.

(2) EW: The $m$ spatially varying parameter machine learning models nearest to the point to be predicted are used for prediction (where $m$ is the second preset number), and the average of their interpolation prediction results is taken as the final target interpolation result.

(3) IDW: The $m$ spatially varying parameter machine learning models nearest to the point to be predicted are used for prediction, and their interpolation prediction results are combined by inverse distance weighting to give the final target interpolation result. The weight $w_i$ of each model is given by:

$$w_i = \frac{d_i^{-p}}{\sum_{j=1}^{m} d_j^{-p}} \qquad (5)$$

where $p$ denotes an arbitrary positive real number, usually set to 2; $d_i$ denotes the Euclidean distance from the $i$-th spatially varying parameter machine learning model to the point to be predicted $s$; and $m$ denotes the number of spatially varying parameter machine learning models used to predict the point $s$.
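The inverse distance weighting scheme, combined with the selection of the nearest models, can be sketched as follows. The function names are hypothetical, and the handling of a point that coincides exactly with a model position (all weight goes to the coincident model or models) is an assumption added to avoid division by zero.

```python
import math


def nearest_models(point, model_centers, m=4):
    """Indices of the m models whose positions are closest to the point."""
    order = sorted(
        range(len(model_centers)),
        key=lambda i: math.dist(point, model_centers[i]),
    )
    return order[:m]


def idw_weights(distances, power=2.0):
    """Normalized inverse-distance weights; `power` is usually 2."""
    if any(d == 0 for d in distances):
        # Assumed edge case: the point sits exactly on one or more model positions.
        hits = [1.0 if d == 0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    raw = [d ** -power for d in distances]
    total = sum(raw)
    return [r / total for r in raw]


def combine_predictions(predictions, weights):
    """Weighted linear combination of the per-model interpolation predictions."""
    return sum(w * p for w, p in zip(weights, predictions))
```

For two models at distances 1 and 2 with predictions 10 and 20, the weights are 0.8 and 0.2 and the combined prediction is 12, so the closer model dominates the result.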

(4) GW: The $m$ spatially varying parameter machine learning models nearest to the point to be predicted are used for prediction (where $m$ is the second preset number), and their interpolation prediction results are combined by Gaussian weighting. The weight of each model is given by:

(6)

In the above formula, the weight is determined from the location coordinates of the point to be predicted, the location coordinates of the spatially varying parameter machine learning model, the Euclidean distance between the two, and a constant.

(5) GAW: The $m$ spatially varying parameter machine learning models nearest to the point to be predicted are used for prediction (where $m$ is the second preset number), and their interpolation prediction results are combined by adaptive Gaussian weighting to give the final target interpolation result. The formula is as follows:

(7)

In the above formula, the weight is determined from the location coordinates of the point to be predicted, the Euclidean distance from the spatially varying parameter machine learning model to that point, and an optimization parameter.

In the cross tests, inverse distance weighting (IDW) gave the smaller error; the weight of each spatially varying parameter machine learning model is therefore determined by inverse distance weighting.

It can be understood that a spatially varying parameter machine learning model produces some error when making predictions. The uncertainty of the final target interpolation result can therefore be measured from the uncertainty of the individual models; that is, the error variance of the final target interpolation result (the uncertainty analysis result) can be expressed as a linear combination of the error variances of the individual spatially varying parameter machine learning models. The formula for the error variance is as follows:

$$V = \mathrm{Var}\left(\hat{Y} - Y\right) = \mathrm{Var}\left(\sum_{i=1}^{m} w_i\, \hat{Y}_i - Y\right) \qquad (8)$$

where $V$ denotes the variance statistic of the interpolation prediction result of the spatially varying parameter machine learning models used to predict the point to be predicted; $\hat{Y}$ denotes the interpolation prediction result; $Y$ denotes the true value; $m$ denotes the number of spatially varying parameter machine learning models used to predict the point; $\hat{Y}_i$ denotes the interpolation prediction result of the $i$-th model; and $w_i$ denotes the weight of the $i$-th model.

Since the spatially varying parameter machine learning models are trained independently, the prediction errors of different models can be regarded as independent. On this basis, and because the weights sum to 1, formula (8) can be further derived as formula (9):

$$V = \sum_{i=1}^{m} w_i^{2}\, \sigma_i^{2} \qquad (9)$$

where $\sigma_i^{2}$ is the error variance of the interpolation prediction result of the $i$-th spatially varying parameter machine learning model.

Further, the confidence interval of the spatial interpolation prediction can be determined from the prediction results of the spatially varying parameter machine learning models for each point to be predicted in the target area.

Estimating the uncertainty of the prediction results, reporting their error variance, and deriving from them a confidence interval for the spatial interpolation prediction further improves the interpretability of the spatially varying parameter machine learning model.
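The uncertainty calculation can be illustrated with the short sketch below. The first function follows the weighted sum of per-model error variances that results from treating the model errors as independent. The second function is an assumption: the text does not say how the confidence interval is built, so a symmetric normal-approximation interval is used purely as an example, and both function names are hypothetical.

```python
import math


def combined_error_variance(weights, model_error_variances):
    """Error variance of the combined prediction under independent model errors."""
    return sum(w * w * v for w, v in zip(weights, model_error_variances))


def normal_interval(prediction, variance, z=1.96):
    """Assumed normal-approximation interval; z = 1.96 gives roughly 95% coverage."""
    half_width = z * math.sqrt(variance)
    return prediction - half_width, prediction + half_width
```

With two equally weighted models whose error variances are both 4, the combined variance is 2, which shows how averaging independent models reduces the uncertainty of the final interpolation result.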

The spatial interpolation method based on spatially varying parameter machine learning provided by the present invention addresses the case where attribute similarity and spatial autocorrelation both exist between the target variable and each type of auxiliary variable, and where the relationship between the target variable and the auxiliary variables is strongly heterogeneous. The target area is partitioned according to spatial stratified heterogeneity to obtain multiple sub-regions, and the points to be predicted in the target area are predicted with the spatially varying parameter machine learning models corresponding to the sub-regions to obtain the target interpolation result. Because each model built for a sub-region has a specific position and range, the spatially non-stationary relationship between the target variable and the auxiliary variables can be modeled effectively. In addition, each model is trained on the target variable and auxiliary variables of the observation sites together with the distance from each observation site to the model position, so the local spatial variation in the relationship between the target variable and the auxiliary variables is modeled automatically, and the spatial correlation and heterogeneity of multidimensional, mixed-type auxiliary variables are fully exploited. In large-area, high-spatial-resolution interpolation scenarios, the target variable at each point to be predicted can be interpolated accurately from limited observation site data, yielding an accurate spatial distribution map for the geospatial full-coverage data.

The device for generating geospatial full-coverage data based on spatially varying parameter machine learning provided by the present invention is described below. The device described below and the method described above may be read with reference to each other.

FIG. 5 is a schematic structural diagram of the device for generating geospatial full-coverage data based on spatially varying parameter machine learning according to an embodiment of the present invention. As shown in FIG. 5, the device 500 includes a partitioning module 510, a determination module 520, a modeling module 530 and a prediction module 540;

The partitioning module 510 is configured to partition the target area step by step and, after each partitioning, to calculate the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state, based on the auxiliary variables and the target variable at the observation sites in the target area;

The determination module 520 is configured to determine a target partition state based on the spatial stratified heterogeneity of the relationship between each type of auxiliary variable and the target variable after each partitioning, the target partition state corresponding to multiple sub-regions of the target area;

The modeling module 530 is configured to construct, in the target partition state, a spatially varying parameter machine learning model for each sub-region of the target area. Each spatially varying parameter machine learning model is obtained by training on the target variable and auxiliary variables of the observation sites within its spatial range, together with the distance from each of those observation sites to the model. Each spatially varying parameter machine learning model includes position information and spatial range information; the position information is the center coordinates of the sub-region corresponding to the model, and the spatial range information is the size of that sub-region;

The prediction module 540 is configured to perform interpolation prediction on the target variable at each preset point to be predicted in the target area, based on the spatially varying parameter machine learning models corresponding to the sub-regions, to obtain a target interpolation result and an uncertainty analysis result.

The spatial interpolation device based on spatially varying parameter machine learning provided by the present invention addresses the case where the target variable and the various auxiliary variables exhibit both attribute similarity and spatial autocorrelation, and where the relationship between the target variable and the auxiliary variables is strongly heterogeneous. The device partitions the target area based on spatial hierarchical heterogeneity to obtain a plurality of sub-regions, and predicts the points to be predicted in the target area using the spatially varying parameter machine learning model corresponding to each sub-region to obtain the target interpolation result. Because each model constructed for a sub-region has a specific position and range, the spatially non-stationary relationship between the target variable and the auxiliary variables can be simulated effectively. In addition, each model is trained on the target variable and auxiliary variables of the observation sites and on the distance from each observation site to the position of the model, so local spatial variation in the relationship between the target variable and the auxiliary variables is modeled automatically, and the spatial correlation and heterogeneity of multi-dimensional, mixed-type auxiliary variables are fully exploited. In large-area, high-spatial-resolution interpolation scenarios, the target variable at each point to be predicted can be interpolated accurately from limited observation site data, yielding an accurate spatial distribution map corresponding to the geographic space full coverage data.
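As an illustration of how the distance to the model position can enter training, the following minimal Python sketch assembles the training rows for one sub-region. The function name `build_training_rows` and the dictionary keys `x`, `y`, `aux`, and `target` are hypothetical and chosen only for this example; the sketch stops at feature construction and does not train a model.

```python
import math


def build_training_rows(sites, center):
    """Assemble (features, targets) for one sub-region.

    sites  : list of dicts with keys 'x', 'y', 'aux' (list of floats), 'target'
             (hypothetical layout, assumed for this sketch)
    center : (x, y) center coordinates of the sub-region, i.e. the model position
    """
    cx, cy = center
    features, targets = [], []
    for site in sites:
        # Euclidean distance from the observation site to the model position
        dist = math.hypot(site["x"] - cx, site["y"] - cy)
        # Feature row: all auxiliary variables followed by the distance
        features.append(list(site["aux"]) + [dist])
        targets.append(site["target"])
    return features, targets


if __name__ == "__main__":
    demo_sites = [
        {"x": 3.0, "y": 4.0, "aux": [0.5, 2.0], "target": 7.0},
        {"x": 0.0, "y": 1.0, "aux": [0.1, 1.5], "target": 6.0},
    ]
    X, y = build_training_rows(demo_sites, center=(0.0, 0.0))
    print(X, y)
```

The resulting feature rows and target list could then be passed to a regression learner such as the random forest named in the claims; which library is used for that step is left open here.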

In one embodiment, the partitioning module 510 is specifically configured to:

For each observation site in the target area, calculate the bivariate local spatial autocorrelation coefficient between the target variable at the observation site and each type of auxiliary variable;

After each partitioning of the target area, calculate, based on the bivariate local spatial autocorrelation coefficients, the spatial hierarchical heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state.

After each partitioning of the target area, calculate, based on the bivariate local spatial autocorrelation coefficients corresponding to all observation sites in each partition space in the current partition state, the local autocorrelation index variance value corresponding to each type of auxiliary variable in each partition space;
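The two quantities named above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the formulas of the invention: the coefficient is assumed to take the common bivariate local Moran form with equally weighted k nearest neighbors, and the heterogeneity is assumed to take the form q = 1 - Σ(N_h · var_h) / (N · var), which combines per-partition variances with the global variance. The function names and the parameter `k` are hypothetical, and non-constant input variables are assumed.

```python
import math
from statistics import mean, pstdev, pvariance


def bivariate_local_moran(coords, aux, target, k=4):
    """One coefficient per site: z_aux[i] times the mean z_target of its k nearest neighbors.

    Assumed form for illustration; aux and target must not be constant.
    """
    def zscores(values):
        m, s = mean(values), pstdev(values)
        return [(v - m) / s for v in values]

    z_aux, z_tgt = zscores(aux), zscores(target)
    coefficients = []
    for i, (xi, yi) in enumerate(coords):
        # k nearest neighbors by Euclidean distance, excluding the site itself
        neighbors = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )[:k]
        spatial_lag = mean(z_tgt[j] for j in neighbors)
        coefficients.append(z_aux[i] * spatial_lag)
    return coefficients


def stratified_heterogeneity(values, labels):
    """Assumed form: q = 1 - sum_h(N_h * var_h) / (N * var_global)."""
    global_var = pvariance(values)
    if global_var == 0:
        return 0.0  # no variation at all, so nothing for the partition to explain
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1.0 - within / (len(values) * global_var)


if __name__ == "__main__":
    coords = [(0, 0), (1, 0), (10, 0), (11, 0)]
    aux = [1.0, 1.0, 5.0, 5.0]
    target = [1.0, 1.0, 5.0, 5.0]
    coeffs = bivariate_local_moran(coords, aux, target, k=1)
    print(coeffs, stratified_heterogeneity(coeffs, [0, 0, 1, 1]))
```

Under this assumed form, a value near 1 means the partition separates sites whose coefficients differ strongly between partition spaces, and a value near 0 means the partition explains little of their variance.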

In one embodiment, the determination module 520 is specifically configured to:

In one embodiment, the spatially varying parameter machine learning model corresponding to each target partition space is trained by the following method:

In one embodiment, the prediction module 540 is specifically configured to:

FIG. 6 is a schematic diagram of the physical structure of an electronic device. As shown in FIG. 6, the electronic device may include a processor 610, a communications interface 620, a memory 630, and a communication bus 640, wherein the processor 610, the communications interface 620, and the memory 630 communicate with one another through the communication bus 640. The processor 610 may call logic instructions in the memory 630 to execute the method for generating geographic space full coverage data based on spatially varying parameter machine learning, the method comprising:

Performing interpolation prediction on the target variable at each preset point to be predicted in the target area, based on the spatially varying parameter machine learning models corresponding to the sub-regions, to obtain a target interpolation result and an uncertainty analysis result.
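The claims combine the predictions of several neighboring models by inverse distance weighting. A minimal Python sketch of that combination is given below. The function name `idw_combine`, the parameters `k` and `power`, and the representation of a model as a `(center, predict)` pair are hypothetical. Reporting the weighted standard deviation of the individual predictions as the uncertainty is an assumption made for this example, since the formula is not given in this section.

```python
import math


def idw_combine(point, models, k=3, power=2.0, eps=1e-12):
    """Combine predictions of the k nearest models by inverse distance weighting.

    point  : (x, y) of the point to be predicted
    models : list of ((cx, cy), predict) pairs, where predict(point) -> float
             (hypothetical interface, assumed for this sketch)
    Returns (estimate, spread); spread is the weighted standard deviation of the
    individual predictions, used here as an assumed uncertainty measure.
    """
    px, py = point

    def distance(model):
        (cx, cy), _ = model
        return math.hypot(cx - px, cy - py)

    nearest = sorted(models, key=distance)[:k]
    # Guard against division by zero when the point coincides with a model center
    raw_weights = [1.0 / max(distance(m), eps) ** power for m in nearest]
    total = sum(raw_weights)
    weights = [w / total for w in raw_weights]
    predictions = [predict(point) for _, predict in nearest]

    estimate = sum(w * p for w, p in zip(weights, predictions))
    spread = math.sqrt(sum(w * (p - estimate) ** 2 for w, p in zip(weights, predictions)))
    return estimate, spread


if __name__ == "__main__":
    demo_models = [
        ((0.0, 0.0), lambda p: 10.0),
        ((2.0, 0.0), lambda p: 20.0),
        ((50.0, 0.0), lambda p: 99.0),
    ]
    print(idw_combine((1.0, 0.0), demo_models, k=2))
```

A point lying midway between two model centers receives equal weights from both, so the estimate is the plain average of the two predictions.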

In addition, when the logic instructions in the memory 630 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

In another aspect, the present invention further provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the method for generating geographic space full coverage data based on spatially varying parameter machine learning provided by the above methods, the method comprising:

In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium having a computer program stored thereon. When executed by a processor, the computer program implements the method for generating geographic space full coverage data based on spatially varying parameter machine learning provided by the above methods, the method comprising:

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement the embodiments without creative effort.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or by hardware. Based on this understanding, the above technical solution in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features therein, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for generating geographic space full coverage data based on spatially varying parameter machine learning, characterized by comprising:

Partitioning a target area step by step and, after each partitioning, calculating, based on the auxiliary variables of each type and the target variable at each observation site in the target area, the spatial hierarchical heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state;

Determining a target partition state based on the spatial hierarchical heterogeneity of the relationship between each type of auxiliary variable and the target variable after each partitioning, wherein the target partition state corresponds to a plurality of sub-regions of the target area;

In the target partition state, a spatially varying parameter machine learning model is constructed for each sub-region in the target region, each of which is obtained by training based on the target variables and auxiliary variables of each observation site within the corresponding spatial range, and the distance from each observation site to the corresponding spatially varying parameter machine learning model; each of the spatially varying parameter machine learning models includes position information and spatial range information, the position information is the center coordinates of the sub-region corresponding to the spatially varying parameter machine learning model, and the spatial range information is the size of the sub-region corresponding to the spatially varying parameter machine learning model;

Performing interpolation prediction on the target variable at each preset point to be predicted in the target area, based on the spatially varying parameter machine learning models corresponding to the sub-regions, to obtain a target interpolation result and an uncertainty analysis result.

2. The method for generating geographic space full coverage data based on spatially varying parameter machine learning according to claim 1, characterized in that calculating, after each partitioning and based on the auxiliary variables of each type and the target variable at each observation site in the target area, the spatial hierarchical heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state comprises:

For each of the observation sites in the target area, calculating the bivariate local spatial autocorrelation coefficient between the target variable in the observation site and each type of the auxiliary variables;

After each partitioning of the target area, calculating, based on the bivariate local spatial autocorrelation coefficients, the spatial hierarchical heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state.

3. The method for generating geographic space full coverage data based on spatially varying parameter machine learning according to claim 2, characterized in that calculating, for each observation site in the target area, the bivariate local spatial autocorrelation coefficient between the target variable at the observation site and each type of auxiliary variable comprises:

For each of the observation sites in the target area, based on the auxiliary variables of the observation site and the target variables of a first preset number of neighboring observation sites of the observation site, the bivariate local spatial autocorrelation coefficient corresponding to the observation site is calculated.

4. The method for generating geographic space full coverage data based on spatially varying parameter machine learning according to claim 2, characterized in that calculating, after each partitioning of the target area and based on the bivariate local spatial autocorrelation coefficients, the spatial hierarchical heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state comprises:

After each partitioning of the target area, calculating, based on the bivariate local spatial autocorrelation coefficients corresponding to all the observation sites in each partition space in the current partition state, the local autocorrelation index variance values of the relationships between the various auxiliary variables and the target variable in each partition space;

Based on the bivariate local spatial autocorrelation coefficients corresponding to all the observation sites in the target area, the global autocorrelation index variance values of the relationships between various auxiliary variables and the target variable in the target area are calculated;

The spatial hierarchical heterogeneity of the relationship between each type of auxiliary variable and the target variable in the target area under the current partition state is calculated based on the local autocorrelation index variance value and the global autocorrelation index variance value.

5. The method for generating geographic space full coverage data based on spatially varying parameter machine learning according to any one of claims 1 to 4, characterized in that determining the target partition state based on the spatial hierarchical heterogeneity of the relationship between each type of auxiliary variable and the target variable after each partitioning comprises:

Based on the spatial hierarchical heterogeneity of the relationship between each type of auxiliary variable and the target variable after each partition, the average spatial hierarchical heterogeneity corresponding to the target area after each partition is calculated;

Constructing a first change curve based on the average spatial hierarchical heterogeneity corresponding to the target area after each partition, or constructing a second change curve based on the difference between the average spatial hierarchical heterogeneity values after two adjacent partitions;

A partition state corresponding to an inflection point of the first change curve or an inflection point of the second change curve is determined as a target partition state.

6. The method for generating geographic space full coverage data based on spatially varying parameter machine learning according to any one of claims 1 to 4, characterized in that each of the spatially varying parameter machine learning models corresponding to each of the sub-regions is trained based on the following method:

Obtain target variables and auxiliary variables of all the observation sites in the sub-region;

Determine the distances from each of the observation sites in the sub-area to the center point of the sub-area;

A random forest model is trained based on the target variables and auxiliary variables of all the observation sites in the sub-region, and the distances from each observation site in the sub-region to the center point of the sub-region, and the trained random forest model is used as the spatially varying parameter machine learning model corresponding to the sub-region.

7. The method for generating geographic space full coverage data based on spatially varying parameter machine learning according to any one of claims 1 to 4, characterized in that performing interpolation prediction on the target variable at each preset point to be predicted in the target area, based on the spatially varying parameter machine learning models corresponding to the sub-regions, to obtain a target interpolation result and an uncertainty analysis result comprises:

Using multiple spatially varying parameter machine learning models to respectively predict the target variable of the point to be predicted, and obtaining multiple interpolation prediction results, wherein the multiple spatially varying parameter machine learning models include a second preset number of spatially varying parameter machine learning models adjacent to the point to be predicted;

Based on the inverse distance weighted method, the target interpolation result and the uncertainty analysis result of the target variable of the point to be predicted are determined according to the multiple interpolation prediction results.

8. A device for generating geographic space full coverage data based on spatially varying parameter machine learning, characterized by comprising:

A partitioning module, configured to partition the target area step by step and, after each partitioning, to calculate, based on the auxiliary variables of each type and the target variable at each observation site in the target area, the spatial hierarchical heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state;

A determination module, configured to determine a target partition state based on the spatial hierarchical heterogeneity of the relationship between each type of auxiliary variable and the target variable after each partition, wherein the target partition state corresponds to a plurality of sub-regions of the target region;

A modeling module is used to construct a spatially varying parameter machine learning model for each sub-region in the target region under the target partition state, each of the spatially varying parameter machine learning models is respectively obtained by training based on the target variables and auxiliary variables of each observation site within the corresponding spatial range, and the distance from each observation site to the corresponding spatially varying parameter machine learning model; each of the spatially varying parameter machine learning models includes position information and spatial range information, the position information is the center coordinates of the sub-region corresponding to the spatially varying parameter machine learning model, and the spatial range information is the size of the sub-region corresponding to the spatially varying parameter machine learning model;

A prediction module, configured to perform interpolation prediction on the target variable at each preset point to be predicted in the target area, based on the spatially varying parameter machine learning models corresponding to the sub-regions, to obtain a target interpolation result and an uncertainty analysis result.

9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, the method for generating geographic space full coverage data based on spatially varying parameter machine learning according to any one of claims 1 to 7 is implemented.

10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the method for generating geographic space full coverage data based on spatially varying parameter machine learning according to any one of claims 1 to 7 is implemented.