CN115049026A

CN115049026A - Regression analysis method of space non-stationarity relation based on GSNNR

Info

Publication number: CN115049026A
Application number: CN202210984054.0A
Authority: CN
Inventors: 倪巳涵; 王中一
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2022-08-17
Filing date: 2022-08-17
Publication date: 2022-09-13

Abstract

The invention discloses a regression analysis method for spatial non-stationarity relationships based on GSNNR, belonging to the technical field combining deep learning and spatial analysis. The method comprises the following steps: collecting spatial information data; inputting the spatial features and the attribute-space features into a full-space proximity nonlinear fusion neural network model (SAPDNN), which takes GNNWR as its base model and adds the attribute-space features to the input layer; obtaining a full-space proximity expression matrix from the SAPDNN model; and inputting that matrix into an SWNN module, which outputs a weight matrix. The invention introduces the attribute space as an important feature for analyzing non-stationary processes, proposes a full-space expression that fuses geographic space and attribute space by means of a deep neural network, and thereby further improves the accuracy with which non-stationarity is measured.

Description

Regression Analysis Method of Spatial Nonstationarity Relationship Based on GSNNR

Technical Field

The invention belongs to the technical field combining deep learning and spatial analysis, and in particular relates to a regression analysis method for spatial non-stationarity relationships based on an improved GSNNR.

Background

In spatial analysis, resolving non-stationarity is critical: analysis and prediction generally rely on a mathematical model to resolve the non-stationary relationships of the space under study. How accurately a model measures non-stationarity has become a core criterion for evaluating geospatial analysis models.

Geographic Neural Network Weighted Regression (GNNWR) is a relatively advanced model structure in the field of geospatial non-stationarity analysis. It replaces the kernel function that the classical GWR model uses for nonlinear fitting with a deep neural network, which overcomes the kernel function's inability to fit complex nonlinear mappings. GNNWR exploits the strong nonlinear fitting ability of deep neural networks by constructing a spatially weighted neural network (SWNN) that fits the nonlinear mapping from raw geospatial data to hidden features in a high-dimensional space. First, the geographic distances between the sample points and the points to be estimated are calculated, giving a spatial distance matrix between the unknown points to be estimated and the known sample points. Then the spatial distance matrix is input into the SWNN, which maps the raw data nonlinearly in a high-dimensional space and learns the corresponding spatial weight matrix from the data. Finally, the spatial weight matrix is fed into the linear regression model to obtain the final fitted value.

GNNWR's mathematical modeling of non-stationary processes is confined to geographic space alone and considers only the distance between sample points and estimated points. In reality, spatial non-stationarity is also affected by attributes. Because GNNWR's feature representation of spatially non-stationary data is incomplete, its accuracy is unstable.

Summary of the Invention

The purpose of the present invention is to provide a regression analysis method for spatial non-stationarity relationships based on GSNNR, so as to remedy the deficiencies of the prior art.

To achieve this purpose, the invention adopts the following technical scheme:

A regression analysis method for spatial non-stationarity relationships based on full-space neural network regression (GSNNR), comprising the following steps:

S1: Collect spatial information data and divide it into a training set and a test set. The feature information obtained by preprocessing the data includes spatial features and attribute-space features.

S2: Input the spatial features and attribute-space features obtained in S1 into the full-space proximity nonlinear fusion neural network model (SAPDNN). The SAPDNN model takes GNNWR as its base model and adds the attribute-space features to the input layer. The SAPDNN model outputs a full-space proximity expression matrix.

S3: Input the full-space proximity expression matrix into the SWNN module, which outputs a weight matrix W.

The weight matrix W is then input into the linear regression model OLR, which outputs the final prediction $\hat{y}$.

S4: The GNNWR and SWNN constitute the GSNNR model. Train the GSNNR model on the training set, then input the test data into the trained GSNNR model to obtain the output.

Further, in S1: the spatial features are location information in geographic space, such as longitude and latitude, altitude, and position coordinates; the attribute-space features are the intrinsic attributes of a geographic entity, such as temperature, wind direction, vegetation type, and tree diameter.

Further, in S1, the spatial features are measured by the Euclidean distance. For sample points $i$ and $j$ with planar coordinates $(x_i, y_i)$ and $(x_j, y_j)$:

$$d^{S}_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$

The attribute-space features are measured by the absolute difference of a specified attribute value, or by the weighted difference over multiple attribute values, of the geographic attributes in vector space. The attribute distance is expressed as:

$$d^{A}_{ij} = \sum_{k=1}^{n} w_k \left| a_{ik} - a_{jk} \right|$$

where $d^{A}_{ij}$ is the attribute distance between the $i$-th and $j$-th sample points, the superscript $A$ marks the attribute feature, $n$ is the number of attribute categories of the sample points that take part in the calculation, $a_{ik}$ is the $k$-th attribute value of sample point $i$, and $w_k$ is the weighting coefficient of the $k$-th attribute value, satisfying $\sum_{k=1}^{n} w_k = 1$.

To eliminate the difference in measurement scale between the position distance and the attribute distance in vector space, scale weight parameters are introduced, and the position distance $d^{S}_{ij}$ and the attribute distance $d^{A}_{ij}$ are fused into a unified "position-attribute" distance expression $d^{SA}_{ij}$, expressed as follows:

[Equation not reproduced in the source text.]

where λ and φ are the scale weight parameters of the position distance and the attribute distance, respectively.
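
The distance measures above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the coordinates are treated as planar, and because the fusion equation itself is not reproduced in the text, `unified_distance` assumes a simple additive combination of the two scaled distances.

```python
import math


def spatial_distance(p, q):
    """Euclidean distance between two coordinate tuples of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def attribute_distance(a_i, a_j, weights):
    """Weighted sum of absolute attribute differences; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("attribute weights must sum to 1")
    return sum(w * abs(x - y) for w, x, y in zip(weights, a_i, a_j))


def unified_distance(d_s, d_a, lam, phi):
    """ASSUMPTION: additive fusion lam * d_s + phi * d_a (the source form is unknown)."""
    return lam * d_s + phi * d_a


# Toy values for two sample points i and j.
d_s = spatial_distance((0.0, 0.0), (3.0, 4.0))
d_a = attribute_distance((10.0, 2.0), (14.0, 1.0), weights=(0.5, 0.5))
d_sa = unified_distance(d_s, d_a, lam=0.2, phi=0.4)
```

With a single attribute and a weight of 1, `attribute_distance` reduces to the plain absolute difference mentioned in the text.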

Further, in S2: for two sample points $i$ and $j$ in space, assume there exists a nonlinear fusion function $f$ for the unified distance $d^{SA}_{ij}$ that accounts for both the position distance and the attribute distance:

$$d^{SA}_{ij} = f\left(d^{S}_{ij},\, d^{A}_{ij}\right)$$

A neural network is used to fit this nonlinear fusion function, giving a "position-attribute" fusion neural network between two sample points (Spatial-attribute Proximities Neural Network, SAPNN). Taking the position distance $d^{S}_{ij}$ and the attribute distance $d^{A}_{ij}$ as input and passing them through several fully connected layers yields the unified distance representation between sample points $i$ and $j$:

$$d^{SA}_{ij} = \mathrm{SAPNN}\left(d^{S}_{ij},\, d^{A}_{ij}\right)$$
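
A minimal forward-pass sketch of such a pairwise fusion network is given below. The layer widths (2 → 8 → 4 → 1), the fixed PReLU slope, and all function names are assumptions made for illustration; the text only states that the two distances pass through several fully connected layers. The weights here are random and untrained.

```python
import random


def dense(x, weights, biases):
    """One fully connected layer: W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def prelu(x, a=0.25):
    """PReLU with a fixed slope a for negative inputs (learnable in a real model)."""
    return [v if v > 0 else a * v for v in x]


def init_layer(n_in, n_out, rng):
    """He-style initialization: zero-mean Gaussian weights with variance 2 / n_in."""
    std = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def sapnn(d_s, d_a, layers):
    """Map one (position distance, attribute distance) pair to a scalar proximity."""
    h = [d_s, d_a]
    for weights, biases in layers[:-1]:
        h = prelu(dense(h, weights, biases))
    weights, biases = layers[-1]
    return dense(h, weights, biases)[0]


rng = random.Random(0)
layers = [init_layer(2, 8, rng), init_layer(8, 4, rng), init_layer(4, 1, rng)]
d_sa = sapnn(5.0, 2.5, layers)  # fused distance for one pair of sample points
```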

SAPNN fuses the spatial features and attribute features of two sample points. Because a unified "space-attribute" distance relationship exists between any two sample points in the point set, a "space-attribute" fusion deep neural network (Spatial-attribute Proximities Deep Neural Network, SAPDNN) is constructed.

For any sample point $i$, the position distance vector $[d^{S}_{i1}, d^{S}_{i2}, \dots, d^{S}_{in}]$ and the attribute distance vector $[d^{A}_{i1}, d^{A}_{i2}, \dots, d^{A}_{in}]$ between that point and the other points of the point set in the sample space can be obtained, where $n$ is the total number of sample points. For simplicity, these two vectors are abbreviated as $d^{S}_{i}$ and $d^{A}_{i}$. Taking $d^{S}_{i}$ and $d^{A}_{i}$ as input, the SAPNN network performs the unified "position-attribute" distance fusion on the position distance $d^{S}_{ij}$ and the attribute distance $d^{A}_{ij}$ between sample point $i$ and every sample point $j$, which gives the vector of unified distances $[d^{SA}_{i1}, d^{SA}_{i2}, \dots, d^{SA}_{in}]$ between sample point $i$ and all sample points. Several fully connected layers then fuse this vector nonlinearly to obtain the unified "position-attribute" distance measure $d^{SA}_{i}$ between sample point $i$ and all other sample points:

$$d^{SA}_{i} = \mathrm{SAPDNN}\left(d^{S}_{i},\, d^{A}_{i}\right)$$
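
The two-stage structure described above (pairwise fusion for every point pair, then fully connected mixing) might look as follows. This is a sketch under stated assumptions: `pair_fusion` is a fixed stand-in for a trained SAPNN, a single dense layer stands in for the "several fully connected layers", and the output width is chosen arbitrarily because the text does not specify it.

```python
import random


def prelu(v, a=0.25):
    """Scalar PReLU with a fixed slope for negative inputs."""
    return v if v > 0 else a * v


def pair_fusion(d_s, d_a):
    """ASSUMPTION: stand-in for a trained SAPNN, here a fixed additive fusion."""
    return 0.5 * d_s + 0.5 * d_a


def sapdnn(d_s_row, d_a_row, weights, biases):
    """Fuse each (d_s, d_a) pair, then mix the n fused distances in one dense layer."""
    fused = [pair_fusion(ds, da) for ds, da in zip(d_s_row, d_a_row)]
    return [prelu(sum(w * f for w, f in zip(row, fused)) + b)
            for row, b in zip(weights, biases)]


rng = random.Random(1)
n = 4        # number of sample points (toy size)
out_dim = 4  # assumed output width per sample point
std = (2.0 / n) ** 0.5
weights = [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(out_dim)]
biases = [0.0] * out_dim

# Toy distance matrices: entry [i][j] is the distance between points i and j.
d_s = [[abs(i - j) * 1.0 for j in range(n)] for i in range(n)]
d_a = [[abs(i - j) * 0.5 for j in range(n)] for i in range(n)]

# Stacking the per-point outputs gives the matrix passed on to the next module.
proximity_matrix = [sapdnn(d_s[i], d_a[i], weights, biases) for i in range(n)]
```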

Further, the SAPDNN neural network model adopts a three-layer architecture of input layer, hidden layer, and output layer. During training, He parameter initialization, the PReLU activation function, batch normalization, and a variable learning rate are used to improve the model's generalization.

Further, He parameter initialization, the PReLU activation function, batch normalization, and the variable learning rate are as follows:

He parameter initialization: prevents the signal from being amplified or attenuated exponentially during forward propagation and backpropagation through the network, and thus prevents vanishing or exploding gradients.
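
As an illustration of He initialization, the sketch below draws weights from a zero-mean Gaussian with variance 2 / fan_in, the commonly used form for rectifier-type activations. The function name and layer sizes are hypothetical, and the patent does not state which variant it uses.

```python
import math
import random


def he_normal(fan_in, fan_out, rng):
    """Weight matrix (fan_out rows, fan_in columns) with entries ~ N(0, 2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


rng = random.Random(42)
w = he_normal(fan_in=256, fan_out=64, rng=rng)

# Empirical check: the sample variance should be close to 2 / fan_in.
flat = [v for row in w for v in row]
mean = sum(flat) / len(flat)
variance = sum((v - mean) ** 2 for v in flat) / len(flat)
```

Scaling the variance by the number of inputs keeps the magnitude of activations roughly constant from layer to layer, which is what prevents the exponential growth or decay mentioned above.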

The PReLU activation function is

$$f(y_i) = \begin{cases} y_i, & y_i > 0 \\ a_i y_i, & y_i \le 0 \end{cases}$$

where $a_i$ is a learnable parameter. PReLU improves the model's fitting performance while adding almost no parameters, and reduces the risk of overfitting.
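
A scalar sketch of PReLU and of the gradient with respect to its slope parameter follows; the function names are illustrative only.

```python
def prelu(y, a):
    """PReLU: identity for positive inputs, slope a for non-positive inputs."""
    return y if y > 0 else a * y


def prelu_grad_a(y):
    """Derivative of PReLU with respect to a: 0 for positive inputs, y otherwise."""
    return 0.0 if y > 0 else y


positive = prelu(2.0, a=0.25)
negative = prelu(-2.0, a=0.25)
```

Because the slope receives a gradient only from non-positive inputs, it can be trained alongside the weights at the cost of one extra parameter per channel (or per layer, if shared).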

Batch normalization: the output of each layer of the model is normalized before it passes through the activation function, which keeps values stable as they propagate through the network, makes the network easier to converge, and reduces the risk of overfitting.
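
The normalization step can be sketched for a single feature across one mini-batch as below. The scale `gamma`, shift `beta`, and stabilizer `eps` follow the standard batch-normalization formulation and are not named in the patent text; a full implementation would also track running statistics for inference.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch to zero mean and unit variance."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]


# Pre-activation outputs of one neuron for a batch of four samples.
normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```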

Variable learning rate: in model training, a larger learning rate is usually wanted early on and a smaller one later. A variable learning rate adapts to the progress of training: as the model becomes more accurate, the learning rate becomes smaller.
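
The patent does not name a particular schedule, so the sketch below uses step decay purely as an assumed example: the rate is multiplied by a fixed factor every `decay_every` epochs. All parameter values are illustrative.

```python
def decayed_learning_rate(initial_lr, decay_rate, epoch, decay_every):
    """Step decay: multiply the rate by decay_rate once every decay_every epochs."""
    return initial_lr * decay_rate ** (epoch // decay_every)


schedule = [decayed_learning_rate(0.1, 0.5, epoch, decay_every=10)
            for epoch in (0, 9, 10, 25)]
```

Other schedules, such as reducing the rate when the validation loss stops improving, match the description equally well.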

Further, the SWNN module is a neural network consisting of an input layer, two hidden layers (more than two may be used), and an output layer. It computes weights from the input full-space proximity expression matrix, and it is trained with the same optimization techniques as SAPDNN.
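
A forward pass through such a weighting network might be sketched as follows. Several details are assumptions rather than statements from the patent: the hidden widths (16 and 8), the fixed PReLU slope, and the output size of one weight per regression coefficient, which follows the usual GNNWR formulation. The weights are random and untrained.

```python
import random


def dense(x, weights, biases):
    """One fully connected layer: W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def prelu(x, a=0.25):
    """PReLU with a fixed slope for negative inputs."""
    return [v if v > 0 else a * v for v in x]


def he_layer(n_in, n_out, rng):
    """He-initialized weights and zero biases for one layer."""
    std = (2.0 / n_in) ** 0.5
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


def swnn(proximity_row, layers):
    """Input layer -> two hidden layers -> output layer of regression weights."""
    h = proximity_row
    for weights, biases in layers[:-1]:
        h = prelu(dense(h, weights, biases))
    weights, biases = layers[-1]
    return dense(h, weights, biases)


rng = random.Random(7)
n_points, n_coeffs = 5, 3  # toy sizes: intercept plus two independent variables
layers = [he_layer(n_points, 16, rng), he_layer(16, 8, rng), he_layer(8, n_coeffs, rng)]
proximity = [[rng.random() for _ in range(n_points)] for _ in range(n_points)]

# One row of weights per sample point forms the weight matrix W.
W = [swnn(row, layers) for row in proximity]
```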

Compared with the prior art, the advantages and beneficial effects of the present invention are:

(1) The invention introduces the attribute space as an important feature for resolving non-stationary processes and includes it in the input of the spatial non-stationarity detection model. Attribute space refers to the attributes possessed within a geographic extent. Spatial differences in geographic attributes, combined with the spatiotemporal distribution of geography, are important for revealing complex geographic phenomena.

(2) The invention proposes a full-space expression that fuses geographic space and attribute space by means of a deep neural network. The fused composite feature characterizes the actual spatial non-stationary process more accurately than a geographic-space feature alone, which further improves the accuracy with which non-stationarity is measured.

(3) The invention also proposes the full-space proximity nonlinear fusion neural network (SAPDNN), which takes GNNWR as its base model and adds attribute-space features, improving prediction accuracy. The network fuses geospatial features with geographic attribute features to obtain a full-space expression of the geographic features.

Description of Drawings

Figure 1 is the basic framework diagram of the SAPDNN neural network model.

Figure 2 is a process diagram of the SAPNN neural network model.

Figure 3 is a process diagram of the SAPDNN neural network model.

Figure 4 is the basic framework diagram of the SWNN module.

Figure 5 is a flow chart of how the SWNN module outputs the weight matrix.

Figure 6 is the input and output structure diagram of the GSNNWR model.

Figure 7 is a flow chart of the cross-training and validation of the present invention.

Detailed Description

The technical solutions of the present invention are further described below with reference to the embodiments.

Example 1:

S1: Collect spatial information data and divide it into a training set and a test set. The feature information obtained by preprocessing the data includes spatial features and attribute-space features. The spatial features are location information in geographic space, such as longitude and latitude, altitude, and position coordinates; the attribute-space features are the intrinsic attributes of a geographic entity, such as temperature, wind direction, vegetation type, and tree diameter.

The spatial features are measured by the Euclidean distance between sample points.

The attribute-space features are measured by the attribute distance, in which each attribute difference is multiplied by the weighting coefficient of the k-th attribute value, with the coefficients subject to the constraint given in the Summary of the Invention.

The position distance and the attribute distance are then fused to build the unified "position-attribute" distance expression, as set out in the Summary of the Invention.

S2: Input the spatial features and attribute-space features obtained in S1 into the full-space proximity nonlinear fusion neural network model (SAPDNN), shown in Figure 1. The SAPDNN model takes GNNWR as its base model and adds the attribute-space features to the input layer; it outputs the full-space proximity expression matrix.

For two sample points $i$ and $j$ in space, assume there exists a nonlinear fusion function for the unified distance expression that accounts for both the position distance and the attribute distance.

A neural network is used to fit this nonlinear fusion function, giving the "position-attribute" fusion neural network between two sample points (Spatial-attribute Proximities Neural Network, SAPNN), shown in Figure 2. SAPNN takes the position distance and the attribute distance as input and, through several fully connected layers, outputs the unified distance representation between sample points $i$ and $j$.

SAPNN fuses the spatial features and attribute features of two sample points. Because a unified "space-attribute" distance relationship exists between any two sample points in the point set, a "space-attribute" fusion deep neural network (Spatial-attribute Proximities Deep Neural Network, SAPDNN) is constructed, as shown in Figure 3.

For any sample point $i$, the position distance vector and the attribute distance vector between that point and the other points of the point set in the sample space can be obtained, where $n$ is the total number of sample points. Taking these two vectors as input, the SAPNN network fuses the position distance and the attribute distance between sample point $i$ and each sample point, and several fully connected layers then fuse the results nonlinearly to obtain the unified "position-attribute" distance measure between sample point $i$ and all other sample points.

The SAPDNN neural network model adopts a three-layer architecture of input layer, hidden layer, and output layer. During training, He parameter initialization, the PReLU activation function, batch normalization, and a variable learning rate are used to improve the model's generalization.

He parameter initialization, the PReLU activation function, batch normalization, and the variable learning rate are as described in the Summary of the Invention.

S3: Input the full-space proximity expression matrix into the SWNN module, which outputs a weight matrix W, as shown in Figure 5.

As shown in Figure 4, the SWNN module is a four-layer neural network consisting of an input layer, two hidden layers (more than two may be used), and an output layer. It computes weights from the input full-space proximity expression matrix and is trained with the same optimization techniques as SAPDNN.

S4: The GNNWR and SWNN constitute the GSNNR model. Train the GSNNR model on the training set, then input the test data into the trained GSNNR model to obtain the output, as shown in Figure 6.

The technical features of this embodiment include the following:

(1) "Space-attribute" feature fusion. The spatial features and geographic attribute features of each sample point are input into SAPDNN, which computes the "space-attribute" full-proximity feature expression matrix. The outputs for multiple sample points are combined into one large matrix that serves as the input to the next module.

(2) "Space-attribute" feature weight matrix calculation. A deep neural network with a multi-layer perceptron structure extracts features from the fused feature matrix output by the previous module. Optimization techniques such as Dropout, He parameter initialization, and the PReLU activation function are used in training to strengthen the model's generalization ability.

(3) Prediction calculation. The non-stationary weights are multiplied by the least-squares coefficients to obtain the non-stationary coefficients. The final fitted value $\hat{y}_i$ output by the model is the sum of the products of all non-stationary coefficients and their corresponding independent variables. The least-squares coefficients are obtained from the training set.

(4) Verification test. To verify the effectiveness of the algorithm design, the data set is divided into a training set and a test set at a ratio of 3:1, and 10-fold cross-validation at a ratio of 9:1 is carried out within the training set. The cross-validation process is shown in Figure 7.
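
The prediction calculation in item (3) can be sketched as below. The function and argument names are hypothetical, and treating the first coefficient as an intercept paired with a constant 1 is an assumption borrowed from ordinary linear regression.

```python
def predict(weights_i, ols_coeffs, x_i):
    """Fitted value for one point: sum over k of w_ik * beta_k * x_ik, with x_i0 = 1."""
    features = [1.0] + list(x_i)  # prepend the constant term for the intercept
    return sum(w * b * x for w, b, x in zip(weights_i, ols_coeffs, features))


# Toy numbers: one intercept and two independent variables.
y_hat = predict(weights_i=[1.0, 0.5, 2.0],     # non-stationary weights for point i
                ols_coeffs=[10.0, 2.0, -1.0],  # least-squares coefficients
                x_i=[3.0, 4.0])                # independent variables at point i
# 1*10*1 + 0.5*2*3 + 2*(-1)*4 = 10 + 3 - 8
```

When every weight equals 1, the expression reduces to the ordinary least-squares prediction, so the weights can be read as location-specific corrections to a global model.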

Example 2

This example builds on Example 1. It takes the spatial non-stationarity relationship of atmospheric PM2.5 concentration as the research object and uses the algorithm model to predict actual PM2.5 concentration values.

To ensure that the data are representative, nationwide monitoring-station data from 2018 were selected as the research data, with the focus on how the full-space proximity expression obtained by fusing "space-attribute" features affects the accuracy with which the non-stationarity relationship is resolved. For data processing, wind direction (WD) is selected as the geographic attribute feature related to PM2.5 concentration, and PM2.5 concentration is the prediction target. The model's input features also include elevation (DEM), relative humidity (r), 10 m wind speed (WS), aerosol (AOD), precipitation (TP), and 2 m temperature (TEMP).

All data samples are randomly divided at a ratio of 3:1 into a cross-validation set used for training and a test set. Within the cross-validation set, 10-fold cross-validation at a ratio of 9:1 is used to ensure the model's generalization ability. The data were obtained by random sampling from monitoring stations across the country and are randomly distributed over the national geographic extent, which ensures that the conclusions of this case are broadly representative.
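
The 3:1 split followed by 10-fold cross-validation can be sketched with the standard library alone. The sample count of 400 and the helper names are illustrative; the patent does not give the size of the data set.

```python
import random


def train_test_split(indices, train_ratio, rng):
    """Shuffle the indices and cut them into a training part and a test part."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def k_fold(indices, k):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


rng = random.Random(2018)
cv_set, test_set = train_test_split(range(400), train_ratio=0.75, rng=rng)  # 3:1
splits = list(k_fold(cv_set, k=10))  # each fold trains on 9/10 and validates on 1/10
```

The test set is held out entirely and used only once, after the cross-validated model has been selected.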

Compared with the GWR and GNNWR models, the improvements of the present invention lie mainly in the following two points:

1. Geographic attribute space is introduced as one of the algorithm's input features, and SAPDNN fuses the "space-attribute" features to obtain a full-space expression. At the level of the underlying data, the proposed scheme is more representative than the original scheme, which considers only geospatial position relationships, and the combined "space-attribute" feature processing better characterizes actual geospatial non-stationarity relationships.

2. The resolution accuracy improves after the new feature representation is introduced. Compared with the GNNWR model, which does not consider geographic attribute-space features, and with GWR models using different kernel functions, accuracy improves by about 10% on average.

1. Introducing geographic attribute space

Existing model schemes mine the spatial non-stationarity relationship between samples only from the two-dimensional distance of their geospatial positions. In practice this relationship is affected by many factors. Once geographic attribute features are introduced, the data representation of a sample is closer to the actual situation and carries more semantic information.

2. Fusing the two features with a deep neural network

After the new feature expression is introduced, a deep neural network fuses the two different kinds of features to obtain the "space-attribute" full-space expression matrix, which serves as the input data for the subsequent computation. Because the data themselves carry more semantic information, the model's resolution accuracy improves.

On the basis of the above embodiments, the present invention describes in detail the technical features involved and the functions and roles they play in the invention, to help those skilled in the art fully understand the technical solutions of the invention and reproduce them.

Finally, although this specification is organized by embodiment, not every embodiment contains only one independent technical solution. The specification is written this way only for clarity. Those skilled in the art should treat the specification as a whole, and the technical solutions of the embodiments may be suitably combined to form other implementations that those skilled in the art can understand.

Claims

1. A regression analysis method for spatial non-stationarity relationships based on GSNNR, characterized in that the method comprises the following steps:

S1: collecting spatial information data and dividing it into a training set and a test set, the feature information obtained by preprocessing the data including spatial features and attribute-space features;

S2: inputting the spatial features and attribute-space features obtained in S1 into the full-space proximity nonlinear fusion neural network model SAPDNN, which takes GNNWR as its base model and adds the attribute-space features to the input layer, and obtaining a full-space proximity expression matrix from the SAPDNN;

S3: inputting the full-space proximity expression matrix into the SWNN module for processing and outputting a weight matrix W;

the weight matrix W being input into the linear regression model OLR to output the final prediction $\hat{y}$;

S4: the GNNWR and SWNN constituting a GSNNR model, training the GSNNR model on the training set to obtain a trained GSNNR model, and then inputting the test data into the trained GSNNR model to obtain the output.

2. The regression analysis method according to claim 1, characterized in that, in S1: the spatial features are location information in geographic space, including longitude and latitude, altitude, and position coordinates; the attribute-space features are the intrinsic attributes of geographic entities, including temperature, wind direction, vegetation type, and tree diameter.

3. The regression analysis method according to claim 1, characterized in that, in S1: the spatial features are measured by the Euclidean distance, which for sample points $i$ and $j$ with planar coordinates $(x_i, y_i)$ and $(x_j, y_j)$ is:

$$d^{S}_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$

the attribute-space features are measured by the absolute difference of a specified attribute value, or by the weighted difference over multiple attribute values, of the geographic attributes in vector space; the attribute distance is expressed as:

$$d^{A}_{ij} = \sum_{k=1}^{n} w_k \left| a_{ik} - a_{jk} \right|$$

where $w_k$ is the weighting coefficient of the $k$-th attribute value and satisfies $\sum_{k=1}^{n} w_k = 1$;

scale weight parameters are introduced, and the position distance $d^{S}_{ij}$ and the attribute distance $d^{A}_{ij}$ are fused to build a unified "position-attribute" distance expression $d^{SA}_{ij}$, expressed as follows:

[Equation not reproduced in the source text.]

where λ and φ are the scale weight parameters of the position distance and the attribute distance, respectively.

4. The regression analysis method according to claim 1, characterized in that, in S2: for two sample points $i$ and $j$ in space, it is assumed that there exists a nonlinear fusion function $f$ for the unified distance $d^{SA}_{ij}$ that accounts for both the position distance and the attribute distance:

$$d^{SA}_{ij} = f\left(d^{S}_{ij},\, d^{A}_{ij}\right)$$

a neural network is used to fit this nonlinear fusion function, constructing the "position-attribute" fusion neural network SAPNN between two sample points; taking the position distance $d^{S}_{ij}$ and the attribute distance $d^{A}_{ij}$ as input and passing them through several fully connected layers, the unified distance representation between sample points $i$ and $j$ is obtained:

$$d^{SA}_{ij} = \mathrm{SAPNN}\left(d^{S}_{ij},\, d^{A}_{ij}\right)$$

SAPNN is used to fuse the spatial features and attribute features of two sample points; because a unified "space-attribute" distance relationship exists between any two sample points in the point set, the "space-attribute" fusion deep neural network SAPDNN is constructed;

for any sample point $i$, the position distance vector $d^{S}_{i}$ and the attribute distance vector $d^{A}_{i}$ between that point and the other points of the point set in the sample space are obtained; taking $d^{S}_{i}$ and $d^{A}_{i}$ as input, SAPNN fuses the position distance and the attribute distance between sample point $i$ and each sample point, and several fully connected layers then fuse the results nonlinearly to obtain the unified "position-attribute" distance measure $d^{SA}_{i}$, expressed as:

$$d^{SA}_{i} = \mathrm{SAPDNN}\left(d^{S}_{i},\, d^{A}_{i}\right)$$

5. The regression analysis method according to claim 1, characterized in that, in S2, the SAPDNN adopts a three-layer neural network architecture of input layer, hidden layer, and output layer, and uses He parameter initialization, the PReLU activation function, batch normalization, and a variable learning rate in training to improve the generalization of the model.

6. The regression analysis method according to claim 5, characterized in that He parameter initialization, the PReLU activation function, batch normalization, and the variable learning rate are as follows:

He parameter initialization: prevents the signal from being amplified or attenuated exponentially during forward propagation and backpropagation through the network, and thus prevents vanishing or exploding gradients;

the PReLU activation function is

$$f(y_i) = \begin{cases} y_i, & y_i > 0 \\ a_i y_i, & y_i \le 0 \end{cases}$$

where $a_i$ is a learnable parameter; the PReLU activation function improves the fitting performance of the model while adding almost no parameters;

batch normalization: the output of each layer of the model is normalized before it passes through the activation function;

variable learning rate: the learning rate adapts to the progress of model training, becoming smaller as the model becomes more accurate.

7. The regression analysis method according to claim 1, characterized in that the SWNN module is a neural network architecture consisting of an input layer, two or more hidden layers, and an output layer; it computes weights from the input full-space proximity expression matrix; and it is trained with the same optimization techniques as the SAPDNN.