CN115049026A - Regression analysis method of space non-stationarity relation based on GSNNR - Google Patents

Regression analysis method of space non-stationarity relation based on GSNNR

Info

Publication number
CN115049026A
CN115049026A (application number CN202210984054.0A)
Authority
CN
China
Prior art keywords
attribute
distance
space
model
spatial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210984054.0A
Other languages
Chinese (zh)
Inventor
倪巳涵
王中一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China filed Critical Ocean University of China
Priority to CN202210984054.0A priority Critical patent/CN115049026A/en
Publication of CN115049026A publication Critical patent/CN115049026A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A 90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a regression analysis method of spatial non-stationarity relation based on GSNNR, belonging to the technical field combining deep learning and spatial analysis. The method comprises the following steps: collecting spatial information data; inputting the spatial features and the attribute spatial features into a full-space adjacent nonlinear fusion neural network model (SAPDNN), which takes GNNWR as its basic model and adds the attribute spatial features to the input layer; obtaining a full-space adjacent expression matrix through the SAPDNN neural network model; and inputting the full-space adjacent expression matrix into an SWNN module for processing and outputting a weight matrix. The invention introduces the attribute space as an important feature for analyzing the non-stationary process, proposes a full-space expression fusing the geographic space and the attribute space, fuses the two with a deep neural network, and thereby further improves the accuracy of the non-stationarity measurement.

Description

Regression analysis method of space non-stationarity relation based on GSNNR
Technical Field
The invention belongs to the technical field combining deep learning and spatial analysis, and particularly relates to a regression analysis method of spatial non-stationarity relation based on an improved GSNNR.
Background
In the field of spatial analysis, the analysis of non-stationarity is critical: in analysis and prediction, mathematical models are generally used to analyze the non-stationary relations of the corresponding space. The accuracy of the non-stationarity measure has become a core criterion for evaluating geospatial analysis models.
Geographically neural network weighted regression (GNNWR) is an advanced model structure in the field of geospatial non-stationarity analysis. The model replaces the kernel function of the classical GWR model with a deep neural network for nonlinear fitting, overcoming the inability of kernel functions to fit complex nonlinear mappings. GNNWR exploits the strong nonlinear fitting capability of deep neural networks and, by constructing a spatially weighted neural network (SWNN), fits the nonlinear mapping from the original geographic-space data to high-dimensional hidden features. First, the geographic spatial position distances between the sample points and the point to be estimated are calculated, giving a spatial distance matrix between the unknown point to be estimated and the known sample points. The spatial distance matrix is then input into the SWNN; the deep neural network maps the original data nonlinearly in a high-dimensional space and learns a corresponding spatial weight matrix from the data. Finally, the spatial weight matrix is taken as the input of a linear regression model to obtain the final fitted value.
However, GNNWR's mathematical modeling of the non-stationary process is limited to the single geographic-space domain and considers only the distance feature between sample points and estimation points. Spatial non-stationarity in reality is also influenced by attributes; because GNNWR does not fully represent these spatial non-stationary data features, its accuracy is unstable.
Disclosure of Invention
The invention aims to provide a regression analysis method of spatial non-stationarity relation based on GSNNR, so as to make up for the defects of the prior art.
In order to achieve the purpose, the invention adopts the following specific technical scheme:
A regression analysis method of spatial non-stationarity relation based on full-space neural network regression (GSNNR), comprising the following steps:
s1: collecting spatial information data, dividing the spatial information data into a training set and a test set, and preprocessing the data to obtain characteristic information including spatial characteristics and attribute spatial characteristics;
s2: inputting the spatial features and attribute spatial features obtained in the step S1 into a full-space adjacent nonlinear fusion neural network model (SAPDNN), wherein the SAPDNN neural network model takes GNNWR as a basic model, and attribute spatial features are added into an input layer; obtaining a full-space adjacent expression matrix through the SAPDNN neural network model;
s3: inputting the full-space adjacent expression matrix into an SWNN module for processing, and outputting a weight matrix W;
$$W = \mathrm{SWNN}(D)$$

where $D$ is the full-space adjacent expression matrix; the weight matrix W, after being input into a linear regression model OLR, outputs the final prediction result ŷ;
s4: and the GNNWR and the SWNN form a GSNNR model, the GSNNR model is trained by using the training set to obtain the trained GSNNR model, test data are input into the trained GSNNR model, and a result is output.
Further, in S1: the spatial features refer to position information in geographic space, such as longitude and latitude, altitude, and position coordinates; the attribute spatial features refer to the intrinsic attributes of geographic entities, such as temperature, wind direction, vegetation type, and tree diameter.
Further, in S1: and measuring the spatial features by adopting Euclidean distance:
Figure 710372DEST_PATH_IMAGE002
for the measurement of the attribute space feature, the absolute difference distance of a designated attribute value or the weighted difference distance of a plurality of attribute values of the geographic attribute in the vector space is referred to; the mathematical expression of the Attribute Distance (Attribute Distance) is as follows:
Figure 817000DEST_PATH_IMAGE003
wherein the content of the first and second substances,
Figure 264162DEST_PATH_IMAGE004
indicating the attribute distance between the ith and j sample points, superscriptAIs the mark of the attribute characteristic, n is the number of the attribute categories of the sample points participating in the calculation,
Figure 544182DEST_PATH_IMAGE005
is a weighting coefficient of the k-th attribute value and satisfies
Figure 506321DEST_PATH_IMAGE006
In order to eliminate the difference of the position distance and the attribute distance in the measurement scale in the vector space, a scale weight parameter is introduced, and the position distance is measured
Figure 885481DEST_PATH_IMAGE007
Distance from attribute
Figure 593674DEST_PATH_IMAGE008
Fusing to construct 'position-attribute' unified distance expression
Figure 16565DEST_PATH_IMAGE009
Expressed as follows:
Figure 419996DEST_PATH_IMAGE010
wherein λ and φ are a position distance scale weight parameter and an attribute distance scale weight parameter, respectively.
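To make the three distance definitions concrete, here is a small pure-Python sketch; the linear combination with λ and φ follows the unified-distance expression above, and all coordinates, attribute values, and weights are invented for the example:

```python
import math

def spatial_distance(p, q):
    """Euclidean position distance between two points (e.g. projected x/y)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def attribute_distance(a, b, weights):
    """Weighted absolute-difference distance over n attribute values;
    the weights are assumed to sum to 1."""
    return sum(w * abs(ai - bi) for w, ai, bi in zip(weights, a, b))

def unified_distance(d_s, d_a, lam=1.0, phi=1.0):
    """'Position-attribute' unified distance with scale weights lambda, phi."""
    return lam * d_s + phi * d_a

d_s = spatial_distance((0.0, 0.0), (3.0, 4.0))                 # 5.0
d_a = attribute_distance([10.0, 2.0], [6.0, 4.0], [0.5, 0.5])  # 0.5*4 + 0.5*2 = 3.0
d = unified_distance(d_s, d_a, lam=0.8, phi=0.2)               # 0.8*5 + 0.2*3 = 4.6
```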
Further, in S2: for in space𝑖And𝑗two sample points, assuming a uniform distance representation that takes into account location distance and attribute distance
Figure 117694DEST_PATH_IMAGE009
The non-linear fusion function of (a), which is mathematically expressed as follows:
Figure 821339DEST_PATH_IMAGE011
fitting a "location-attribute" uniform distance representation using a neural network
Figure 759339DEST_PATH_IMAGE009
Constructing a 'position-attribute' fusion Neural Network (SAPNN) between two sample points by using the nonlinear fusion function of the system; by distance of position
Figure 696071DEST_PATH_IMAGE007
Distance from attribute
Figure 151454DEST_PATH_IMAGE008
As input, through several fully connected layers, obtain𝑖And with𝑗Uniform distance between two sample points characterisation: by distance of position
Figure 224452DEST_PATH_IMAGE007
Distance from attribute
Figure 5458DEST_PATH_IMAGE008
As input, through several fully connected layers, obtain𝑖And𝑗the uniform distance between two sample points is characterized by the following formula:
Figure 242535DEST_PATH_IMAGE012
the SAPNN is used for fusing the spatial features and the attribute features of the two sample points; considering the interaction that the uniform distance relation of the space-attribute exists between any two sample points in the point set, a space-attribute fusion Deep Neural Network (SAPDNN) is constructed;
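The SAPNN can be pictured as a small fully connected network taking the pair (position distance, attribute distance) as input and emitting one fused distance. A toy pure-Python forward pass, with randomly initialized placeholder weights standing in for the parameters that the model would learn by backpropagation:

```python
import random

def prelu(x, a=0.25):
    """PReLU activation: x if x > 0, else a * x."""
    return x if x > 0 else a * x

def sapnn_forward(d_s, d_a, layers):
    """Tiny fully connected net fusing one (position, attribute) distance
    pair into a single unified distance; `layers` is a list of
    (weight_matrix, bias_vector) pairs."""
    h = [d_s, d_a]
    for W, b in layers:
        h = [prelu(sum(wij * hj for wij, hj in zip(row, h)) + bi)
             for row, bi in zip(W, b)]
    return h[0]

random.seed(0)
# One hidden layer of width 4, then a scalar output; the random weights
# are placeholders for parameters that training would learn.
hidden = ([[random.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(4)],
          [0.0, 0.0, 0.0, 0.0])
output = ([[random.uniform(-1.0, 1.0) for _ in range(4)]], [0.0])
d_ij = sapnn_forward(5.0, 3.0, [hidden, output])  # fused unified distance
```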
For any sample point $i$, the position distance vector between this point and the other points of the point set in the sample space, $D_i^{S} = \left[d_{i1}^{S}, d_{i2}^{S}, \ldots, d_{in}^{S}\right]$, and the corresponding attribute distance vector $D_i^{A} = \left[d_{i1}^{A}, d_{i2}^{A}, \ldots, d_{in}^{A}\right]$, can be obtained, where $n$ is the total number of sample points. Taking the position distances $D_i^{S}$ and the attribute distances $D_i^{A}$ between sample point $i$ and all other sample points as input, the SAPNN performs the unified "position-attribute" distance fusion on each pairwise position distance $d_{ij}^{S}$ and attribute distance $d_{ij}^{A}$, yielding the vector of fused unified distances between sample point $i$ and all sample points. Nonlinear fusion through several further fully connected layers then produces the representation $D_i$ of the unified "position-attribute" distance measure between sample point $i$ and all other sample points in the space, by the formula:

$$D_i = \mathrm{SAPDNN}\left(D_i^{S}, D_i^{A}\right)$$
Furthermore, the SAPDNN neural network model adopts a three-layer neural network architecture (input layer, hidden layer, output layer), and uses He parameter initialization, the PReLU activation function, batch normalization, and a variable learning rate in training to improve the generalization of the model.
Further, the He parameter initialization, PReLU activation function, batch normalization, and variable learning rate are specified as follows:
He parameter initialization: avoids the exponential amplification or attenuation of signals during forward propagation and backward propagation in the network, and therefore avoids vanishing or exploding gradients.
The PReLU activation function, in which $a_i$ is a learnable parameter:

$$\mathrm{PReLU}(x_i) = \begin{cases} x_i, & x_i > 0 \\ a_i x_i, & x_i \le 0 \end{cases}$$

The PReLU activation function improves the fitting performance of the model while adding almost no parameters, and reduces the risk of overfitting.
Batch normalization: the output of each layer of the model is normalized before passing through the activation function, keeping the values numerically stable as they propagate through the middle of the network, making the network easier to converge and reducing the risk of overfitting.
Variable learning rate: in model training, it is generally desirable for the learning rate to be slightly higher at the start and slightly lower later. A variable learning rate adapts the learning rate to the degree of model training: as the model becomes more and more accurate, the learning rate becomes smaller and smaller.
Further, the SWNN module is a four-layer neural network architecture: an input layer, two hidden layers (more than two may be used), and an output layer. It computes the weights for the input full-space adjacent expression matrix, and the same training optimization techniques as for the SAPDNN are used in training.
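Structurally, the SWNN maps one point's fused full-space proximity vector to a vector of regression weights through its hidden layers. A toy forward pass with hand-picked weights (illustrative only; the real module's parameters are learned by backpropagation):

```python
def swnn_forward(d_fused, layers, activation=lambda x: max(0.0, x)):
    """Sketch of the SWNN: maps one point's fused full-space proximity
    vector to a weight vector (one weight per regression coefficient).
    Hidden layers use the activation; the output layer is linear."""
    h = d_fused
    for i, (W, b) in enumerate(layers):
        h = [sum(wij * hj for wij, hj in zip(row, h)) + bi
             for row, bi in zip(W, b)]
        if i < len(layers) - 1:
            h = [activation(v) for v in h]
    return h

# 4 sample points -> 3 regression weights, with tiny hand-picked parameters:
layers = [
    ([[0.5, 0.5, 0.0, 0.0],
      [0.0, 0.0, 0.5, 0.5]], [0.0, 0.0]),                      # hidden: 4 -> 2
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),   # output: 2 -> 3
]
w_i = swnn_forward([1.0, 2.0, 3.0, 4.0], layers)  # weights for one point
```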
Compared with the prior art, the invention has the advantages and beneficial effects that:
(1) The method introduces the attribute space as an important feature for analyzing the non-stationary process and incorporates it into the input of the spatial non-stationarity detection model. The attribute space refers to the attributes possessed within a geographic spatial scope; combining the spatial variation of geographic attributes with their geographic spatio-temporal distribution is of great significance for revealing complex geographic phenomena.
(2) The invention provides the full-space expression of the fusion space and the attribute space, the deep neural network is used for fusing the geographic space and the attribute space, and the fused composite feature can more accurately represent the actual space non-stationarity process compared with the single geographic space feature, thereby further improving the accuracy of the measurement of the non-stationarity.
(3) The invention further provides a full-space adjacent nonlinear fusion neural network (SAPDNN), which takes GNNWR as its basic model and adds attribute spatial features, improving the prediction accuracy. The network is used to fuse the geographic spatial features and the geographic attribute features to obtain a full-space expression of the geographic features.
Drawings
Fig. 1 is a basic framework diagram of the SAPDNN neural network model.
Fig. 2 is a process diagram of the SAPNN neural network model.
Fig. 3 is a process diagram of the SAPDNN neural network model.
Fig. 4 is a basic framework diagram of the SWNN module.
FIG. 5 is a flow chart of the SWNN module output weight matrix.
Fig. 6 is a diagram of the input and output structure of the GSNNR model.
FIG. 7 is a cross-training validation flow diagram of the present invention.
Detailed Description
The technical solution of the present invention will be further described and illustrated with reference to the following examples.
Example 1:
a regression analysis method of spatial non-stationarity relation based on whole space neural network regression (GSNNR) comprises the following steps:
S1: collecting spatial information data, dividing it into a training set and a test set, and preprocessing the data to obtain feature information comprising spatial features and attribute spatial features. The spatial features refer to position information in geographic space, such as longitude and latitude, altitude, and position coordinates; the attribute spatial features refer to the intrinsic attributes of geographic entities, such as temperature, wind direction, vegetation type, and tree diameter.
The spatial features are measured with the Euclidean distance:

$$d_{ij}^{S} = \sqrt{(u_i - u_j)^2 + (v_i - v_j)^2}$$

The measurement of the attribute spatial features refers to the absolute difference distance of a designated attribute value, or the weighted difference distance of several attribute values, of the geographic attributes in the vector space. The attribute distance is expressed mathematically as:

$$d_{ij}^{A} = \sum_{k=1}^{n} w_k \left| a_{ik} - a_{jk} \right|$$

wherein $d_{ij}^{A}$ denotes the attribute distance between the $i$-th and $j$-th sample points, the superscript $A$ marks the attribute feature, $n$ is the number of attribute categories of the sample points participating in the calculation, and $w_k$ is the weighting coefficient of the $k$-th attribute value, satisfying $\sum_{k=1}^{n} w_k = 1$.

To eliminate the difference in measurement scale between the position distance and the attribute distance in the vector space, scale weight parameters are introduced, and the position distance $d_{ij}^{S}$ and the attribute distance $d_{ij}^{A}$ are fused to construct the unified "position-attribute" distance expression $d_{ij}$:

$$d_{ij} = \lambda \, d_{ij}^{S} + \phi \, d_{ij}^{A}$$

wherein λ and φ are the position distance scale weight parameter and the attribute distance scale weight parameter, respectively.
S2: inputting the spatial features and attribute spatial features obtained in S1 into a full-space proximity nonlinear fusion neural network model (SAPDNN), which takes GNNWR as a basic model and adds attribute spatial features to an input layer, as shown in fig. 1; obtaining a full-space adjacent expression matrix through the SAPDNN neural network model;
for in space𝑖And𝑗two sample points, assuming a uniform distance representation that takes into account location distance and attribute distance
Figure 310811DEST_PATH_IMAGE009
The non-linear fusion function of (a), which is mathematically expressed as follows:
Figure 162092DEST_PATH_IMAGE011
fitting a "location-attribute" uniform distance representation using a neural network
Figure 732882DEST_PATH_IMAGE009
To construct a 'position-attribute' fusion between two sample pointsSynthetic Neural networks (SAPNN), as shown in fig. 2; by distance of position
Figure 787557DEST_PATH_IMAGE007
Distance from attribute
Figure 593839DEST_PATH_IMAGE008
As input, through several fully connected layers, obtain𝑖And𝑗uniform distance between two sample points characterisation: by distance of position
Figure 50359DEST_PATH_IMAGE007
Distance from attribute
Figure 713421DEST_PATH_IMAGE008
As input, through several fully connected layers, obtain𝑖And𝑗the uniform distance between two sample points is characterized by the following formula:
Figure 255392DEST_PATH_IMAGE012
the SAPNN is used for fusing the spatial features and the attribute features of the two sample points; considering the interaction of the uniform distance relationship of the space-attribute between any two sample points in the point set, a Spatial-attribute fused Deep Neural Network (SAPDNN) is constructed, as shown in fig. 3;
For any sample point $i$, the position distance vector between this point and the other points of the point set in the sample space, $D_i^{S} = \left[d_{i1}^{S}, d_{i2}^{S}, \ldots, d_{in}^{S}\right]$, and the corresponding attribute distance vector $D_i^{A} = \left[d_{i1}^{A}, d_{i2}^{A}, \ldots, d_{in}^{A}\right]$, can be obtained, where $n$ is the total number of sample points. Taking the position distances $D_i^{S}$ and the attribute distances $D_i^{A}$ between sample point $i$ and all other sample points as input, the SAPNN performs the unified "position-attribute" distance fusion on each pairwise position distance $d_{ij}^{S}$ and attribute distance $d_{ij}^{A}$, yielding the vector of fused unified distances between sample point $i$ and all sample points. Nonlinear fusion through several further fully connected layers then produces the representation $D_i$ of the unified "position-attribute" distance measure between sample point $i$ and all other sample points in the space, by the formula:

$$D_i = \mathrm{SAPDNN}\left(D_i^{S}, D_i^{A}\right)$$
the SAPDNN neural network model adopts a neural network architecture with three layers of an input layer, a hidden layer and an output layer, and the generalization of the model is improved by using the technologies of He parameter initialization, a PReLU activation function, batch normalization, learning rate variation and the like in training.
The He parameter initialization, the PReLU activation function, the batch normalization and the learning variation rate are specifically as follows:
initializing the He parameter: exponential amplification or reduction of signals during antecedent propagation and backward propagation in the network is avoided, and therefore gradient disappearance or explosion is avoided;
the PReLU activation function, ai, is a learnable parameter,
Figure 115451DEST_PATH_IMAGE021
the PReLU activation function improves the fitting performance of the model under the condition of almost not increasing parameters, and reduces the risk of overfitting;
the batch normalization is as follows: the output of each layer of the model is normalized before passing through the activation function, so that the numerical value is ensured to be stable when being transmitted in the middle of the network, the network is easier to converge, and the overfitting risk is reduced;
the learning rate is as follows: in model training, it is generally desired that the initial learning rate is slightly higher and the late learning rate is slightly lower. The learning rate can be adapted to the degree of model training by using the variable learning rate, and the learning rate is smaller and smaller when the model is more and more accurate.
S3: inputting the full-space adjacent expression matrix into an SWNN module for processing, and outputting a weight matrix W as shown in FIG. 5;
$$W = \mathrm{SWNN}(D)$$

where $D$ is the full-space adjacent expression matrix; the weight matrix W, after being input into a linear regression model OLR, outputs the final prediction result ŷ.
As shown in fig. 4, the SWNN module is a four-layer neural network architecture: an input layer, two hidden layers (more than two may be used), and an output layer. It computes the weights for the input full-space adjacent expression matrix, and the same training optimization techniques as for the SAPDNN are used in training.
S4: the GNNWR and SWNN constitute a GSNNR model, the GSNNR model is trained by using the training set to obtain a trained GSNNR model, and then the test data is input into the trained GSNNR model to output a result, as shown in fig. 6.
The technical features of this embodiment include the following:
(1) "Space-attribute" feature fusion. The spatial features and geographic attribute features corresponding to each sample point are input into the SAPDNN, and a "space-attribute" full-proximity feature expression matrix is obtained through its operation. The outputs of the individual sample points are combined into one large matrix as the input of the next module.
(2) Calculation of the "space-attribute" feature weight matrix. For the fused feature matrix output by the previous module, a deep neural network is adopted to extract features. The neural network adopts a multi-layer perceptron structure, and optimization techniques such as Dropout, He parameter initialization, and the PReLU activation function are adopted in training to enhance the generalization capability of the model.
(3) Calculation of the prediction result. The non-stationary weight values are multiplied by the least squares coefficients to obtain the non-stationary coefficients. The final fitted value ŷ_i output by the model is the sum of the products of all non-stationary coefficients and their corresponding independent variables. The least squares coefficients are derived from the training set.
(4) Validation and testing. To verify the effectiveness of the algorithm design, the data set is divided into a training set and a test set at a ratio of 3:1, and 10-fold cross-validation is performed within the training set at a ratio of 9:1; the cross-validation process is shown in FIG. 7.
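The 3:1 split and the 9:1 ten-fold cross-validation in step (4) can be sketched directly; the helper names and data sizes below are illustrative:

```python
import random

def split_3_to_1(items, seed=0):
    """Shuffle and split a data set 3:1 into training and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = (3 * len(items)) // 4
    return items[:cut], items[cut:]

def ten_fold_indices(n_train):
    """Yield (train_idx, val_idx) pairs for 10-fold CV (9:1 per fold)."""
    idx = list(range(n_train))
    fold = n_train // 10
    for k in range(10):
        val = idx[k * fold:(k + 1) * fold]
        train = idx[:k * fold] + idx[(k + 1) * fold:]
        yield train, val

train, test = split_3_to_1(range(400))     # 300 training, 100 test samples
folds = list(ten_fold_indices(len(train)))  # 10 folds of 270 train / 30 val
```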
Example 2
In this embodiment, building on Embodiment 1, the spatial non-stationarity relationship of PM2.5 concentration in the atmosphere is taken as the research object, and the algorithm model is used to predict actual PM2.5 concentration values.
To ensure that the data are representative, nationwide monitoring-point data from 2018 are selected as the research data, and the comparison emphasizes the influence of the full-space proximity expression fusing "space-attribute" features on the calculation accuracy of the non-stationarity relation. For data processing, wind direction (WD) is selected as the geographic attribute feature related to PM2.5 concentration, PM2.5 concentration is taken as the prediction object, and the input features of the model further include elevation (DEM), relative humidity (r), 10 m wind speed (WS), aerosol optical depth (AOD), precipitation (TP), and 2 m temperature (TEMP).
All data samples are divided at a ratio of 3:1 into a cross-validation set and a test set, and 10-fold cross-validation at a ratio of 9:1 ensures the generalization capability of the model. The data are obtained by randomly sampling monitoring points nationwide and are randomly distributed over the national geographic space, so the conclusions of this case are general and representative.
Compared with the GWR and GNNWR models, the improvements of the invention mainly lie in the following two points:
First, the geographic attribute space is introduced as one of the input features of the algorithm, and the SAPDNN performs fusion processing on the "space-attribute" features to obtain a full-space expression. Compared with the original schemes that only consider the geographic spatial position relation, the new scheme proposed by the invention is more representative at the basic data level, and the feature processing that combines space and attributes can characterize the actual geographic spatial non-stationarity relation.
Second, the new feature representation improves the solution accuracy: compared with the GNNWR model, which does not consider geographic attribute spatial features, and GWR models adopting different kernel functions, the solution accuracy is improved by about 10% on average.
Introduction of geographic attribute space
The existing model schemes mine the spatial non-stationarity relation between samples only from the two-dimensional distance of the geographic spatial position relation. In practice, however, the relation between samples is influenced by many factors; after the geographic attribute features are introduced, the data expression of the samples is closer to the actual situation and contains more semantic information.
Two features are fused by adopting a deep neural network
After the new feature expression is introduced, the two different kinds of features are fused by means of a deep neural network, and a "space-attribute" full-space expression matrix is obtained as the input data of subsequent calculation. Because the data contain more semantic information, the solution accuracy of the model is improved.
On the basis of the above embodiments, the present invention continues to describe the technical features and functions of the technical features in the present invention in detail to help those skilled in the art fully understand the technical solutions of the present invention and reproduce them.
Finally, although the present description refers to embodiments, not every embodiment contains only a single technical solution, and such description of the present description is for clarity reasons only, and those skilled in the art should make the description as a whole, and the technical solutions in the embodiments can be appropriately combined to form other embodiments that can be understood by those skilled in the art.

Claims (7)

1. A regression analysis method of spatial non-stationarity relation based on GSNNR is characterized by comprising the following steps:
s1: collecting spatial information data, dividing the spatial information data into a training set and a test set, and preprocessing the data to obtain characteristic information including spatial characteristics and attribute spatial characteristics;
s2: inputting the spatial features and the attribute spatial features obtained in the step S1 into a full-space adjacent nonlinear fusion neural network model SAPDNN, wherein the SAPDNN takes GNNWR as a basic model, and attribute spatial features are added into an input layer; obtaining a full-space adjacent expression matrix through the SAPDNN;
s3: inputting the full-space adjacent expression matrix into an SWNN module for processing, and outputting a weight matrix W;
$$W = \mathrm{SWNN}(D)$$

where $D$ is the full-space adjacent expression matrix; the weight matrix W, after being input into a linear regression model OLR, outputs the final prediction result ŷ;
s4: and the GNNWR and the SWNN form a GSNNR model, the GSNNR model is trained by using the training set to obtain the trained GSNNR model, test data are input into the trained GSNNR model, and a result is output.
2. The regression analysis method according to claim 1, wherein in S1: the spatial characteristics refer to position information in a geographic space, and comprise longitude and latitude, altitude and position coordinates; the attribute spatial characteristics refer to the attributes of the geographic entity, including temperature, wind direction, vegetation type and tree diameter.
3. The regression analysis method according to claim 1, wherein in S1: the spatial features are measured using the Euclidean distance:

$$d_{ij}^{S} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$

the attribute spatial features are measured by the absolute difference distance of a designated attribute value, or by the weighted difference distance of a plurality of attribute values of the geographic attribute in the vector space; the mathematical expression of the attribute distance is:

$$d_{ij}^{A} = \sum_{k=1}^{n} w_k \left| a_{ik} - a_{jk} \right|$$

wherein $d_{ij}^{A}$ represents the attribute distance between the i-th and j-th sample points, the superscript A marks the attribute feature, n is the number of attribute categories of the sample points participating in the calculation, and $w_k$ is the weighting coefficient of the k-th attribute value, satisfying $\sum_{k=1}^{n} w_k = 1$;
introducing scale weight parameters, the position distance $d_{ij}^{S}$ and the attribute distance $d_{ij}^{A}$ are fused to construct the "position-attribute" unified distance expression $d_{ij}$:

$$d_{ij} = \lambda\, d_{ij}^{S} + \varphi\, d_{ij}^{A}$$

wherein λ and φ are the position distance scale weight parameter and the attribute distance scale weight parameter, respectively.
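The distance measures of this claim can be sketched in plain Python. The Euclidean and weighted attribute distances follow the claim text directly; the linear λ/φ combination is a reconstruction from the surrounding description (the original equation images are not available), so treat the fused form as an assumption.

```python
import math

def spatial_distance(p_i, p_j):
    """Euclidean position distance between two sample points (x, y)."""
    return math.sqrt((p_i[0] - p_j[0]) ** 2 + (p_i[1] - p_j[1]) ** 2)

def attribute_distance(a_i, a_j, weights):
    """Weighted absolute-difference distance over n attribute values;
    the weights are required to sum to 1, as in the claim."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * abs(x - y) for w, x, y in zip(weights, a_i, a_j))

def unified_distance(p_i, p_j, a_i, a_j, weights, lam, phi):
    """'Position-attribute' unified distance with scale weights lam, phi."""
    return lam * spatial_distance(p_i, p_j) + phi * attribute_distance(a_i, a_j, weights)

# One attribute, equal scale weights: 0.5 * 5.0 + 0.5 * 3.0
d = unified_distance((0.0, 0.0), (3.0, 4.0), (10.0,), (7.0,), (1.0,), 0.5, 0.5)
print(d)  # 4.0
```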
4. The regression analysis method according to claim 1, wherein in S2: for two sample points i and j in space, a nonlinear fusion function is assumed for the unified distance representation $d_{ij}$ that takes into account the position distance and the attribute distance, expressed mathematically as:

$$d_{ij} = F\!\left(d_{ij}^{S},\, d_{ij}^{A}\right)$$

a neural network is used to fit the nonlinear fusion function of the "position-attribute" unified distance representation $d_{ij}$, constructing the "position-attribute" fusion neural network SAPNN between two sample points; taking the position distance $d_{ij}^{S}$ and the attribute distance $d_{ij}^{A}$ as input and passing them through several fully connected layers, the unified distance characterization between the two sample points i and j is obtained, with the formula:

$$d_{ij} = \mathrm{SAPNN}\!\left(d_{ij}^{S},\, d_{ij}^{A}\right)$$
the SAPNN fuses the spatial features and the attribute features of the two sample points; considering the interaction of the "space-attribute" unified distance relations between any two sample points in the point set, a "space-attribute" fusion deep neural network SAPDNN is constructed;
for any sample point i, the position distance characterization vector $\left[d_{i1}^{S}, d_{i2}^{S}, \ldots, d_{in}^{S}\right]$ and the attribute distance characterization vector $\left[d_{i1}^{A}, d_{i2}^{A}, \ldots, d_{in}^{A}\right]$ between this point and the other points in the sample space can be obtained, wherein n is the total number of sample points; for simplicity, the two distance characterization vectors are abbreviated as $D_i^{S}$ and $D_i^{A}$; taking the position distances $D_i^{S}$ and the attribute distances $D_i^{A}$ between sample point i and all other sample points as input, the SAPNN network performs the "position-attribute" unified distance fusion calculation on the pairwise position distance $d_{ij}^{S}$ and attribute distance $d_{ij}^{A}$ of sample point i with each sample point, yielding the unified distance characterization vector $D_i = \left[d_{i1}, d_{i2}, \ldots, d_{in}\right]$ of fused position distance and attribute distance between sample point i and all sample points; this vector is then nonlinearly fused through several fully connected layers to obtain the "position-attribute" unified distance measure $H_i$ between sample point i and all other sample points in the representation space, with the formula:

$$H_i = \mathrm{SAPDNN}\!\left(D_i^{S},\, D_i^{A}\right)$$
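The per-point pipeline of claim 4 can be sketched as follows. This is an illustrative NumPy forward pass under assumed layer sizes and random weights: a tiny SAPNN fuses each (position, attribute) distance pair into a scalar unified distance, and the resulting vector for point i is passed through a further fully connected layer standing in for the SAPDNN fusion stage.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5                                   # total number of sample points

def sapnn(ds, da, w, b):
    """Fuse one position distance and one attribute distance (2 -> 1)."""
    h = np.maximum(np.array([ds, da]) @ w + b, 0.0)  # small hidden layer
    return h.sum()                      # scalar unified distance

w, b = rng.normal(size=(2, 4)), np.zeros(4)   # illustrative SAPNN weights
DS_i = rng.random(n)                    # position distances from point i
DA_i = rng.random(n)                    # attribute distances from point i

# Unified distance characterization vector D_i between point i and all points
D_i = np.array([sapnn(ds, da, w, b) for ds, da in zip(DS_i, DA_i)])

w2 = rng.normal(size=(n, n))            # fully connected fusion layer
H_i = np.maximum(D_i @ w2, 0.0)         # point i's unified distance measure
print(D_i.shape, H_i.shape)             # (5,) (5,)
```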
5. The regression analysis method of claim 1, wherein in S2, the SAPDNN adopts a three-layer neural network architecture of input layer, hidden layer and output layer, and He parameter initialization, the PReLU activation function, batch normalization, and a variable learning rate are used in training to improve the generalization of the model.
6. The regression analysis method of claim 5, wherein the He parameter initialization, the PReLU activation function, the batch normalization, and the variable learning rate are as follows:
the He parameter initialization: avoids exponential amplification or attenuation of signals during forward propagation and backward propagation in the network, thereby avoiding gradient vanishing or explosion;
the PReLU activation function, in which $a_i$ is a learnable parameter:

$$\mathrm{PReLU}(x_i) = \begin{cases} x_i, & x_i > 0 \\ a_i x_i, & x_i \le 0 \end{cases}$$

the PReLU activation function improves the fitting performance of the model while adding almost no parameters;
the batch normalization: the output of each layer of the model is normalized before passing through the activation function;
the variable learning rate: the learning rate adapts to the degree of model training; as the model becomes more accurate, the learning rate becomes smaller.
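The PReLU of claim 6 is the standard parametric ReLU: identity for positive inputs, a learnable slope for negative inputs. A minimal sketch (the slope value 0.25 is illustrative, not from the patent):

```python
def prelu(x, a):
    """PReLU(x) = x if x > 0 else a * x, with a a learnable parameter."""
    return x if x > 0 else a * x

print(prelu(2.0, 0.25))   # 2.0  (positive input passes through)
print(prelu(-4.0, 0.25))  # -1.0 (negative input scaled by the slope)
```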
7. The regression analysis method of claim 1, wherein the SWNN module adopts a neural network architecture of an input layer, two or more hidden layers, and an output layer; it computes the weights for the input full-space adjacent expression matrices; the same training optimization techniques as the SAPDNN are used in training.
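How the SWNN's weight matrix combines with the OLR coefficients is not spelled out in the extracted claims; the following sketch assumes the standard GNNWR formulation that claim 1 names as the basic model, $\hat{y}_i = \sum_k w_{ik}\,\hat{\beta}_k\,x_{ik}$. The random inputs stand in for real data and SWNN outputs; this is a hedged illustration, not the patent's exact formula.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 3                               # samples, predictors (incl. intercept)
X = np.hstack([np.ones((n, 1)), rng.random((n, p - 1))])
beta_olr = rng.random(p)                  # global OLR (ordinary linear regression) coefficients
W = rng.random((n, p))                    # per-sample weights, as produced by the SWNN

# GNNWR-style spatially weighted prediction: elementwise weight the global
# coefficients for each sample, then sum over predictors.
y_hat = np.sum(W * beta_olr * X, axis=1)
print(y_hat.shape)  # (6,)
```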
CN202210984054.0A 2022-08-17 2022-08-17 Regression analysis method of space non-stationarity relation based on GSNNR Pending CN115049026A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210984054.0A CN115049026A (en) 2022-08-17 2022-08-17 Regression analysis method of space non-stationarity relation based on GSNNR


Publications (1)

Publication Number Publication Date
CN115049026A true CN115049026A (en) 2022-09-13

Family

ID=83168429


Country Status (1)

Country Link
CN (1) CN115049026A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115759291A (en) * 2022-11-21 2023-03-07 武汉大学 Space nonlinear regression method and system based on ensemble learning
CN117932456A (en) * 2024-03-22 2024-04-26 中国科学院地理科学与资源研究所 Integrated spatial prediction method considering spatial heterogeneity
CN117932456B (en) * 2024-03-22 2024-06-07 中国科学院地理科学与资源研究所 Integrated spatial prediction method considering spatial heterogeneity

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200311848A1 (en) * 2019-04-01 2020-10-01 International Business Machines Corporation Parking continuity with unused duration between automated vehicles
CN113064220A (en) * 2021-06-03 2021-07-02 四川九通智路科技有限公司 Visibility measuring system and measuring method based on nonlinear autoregressive neural network
CN113591685A (en) * 2021-07-29 2021-11-02 武汉理工大学 Geographic object spatial relationship identification method and system based on multi-scale pooling


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MENG YAOWEI et al.: "Multi-level spatial data interpolation and its application methods", Journal of Xuchang University *
WANG ZHONGYI: "A neural network weighted regression method considering global spatio-temporal complex proximity", China Doctoral Dissertations Full-text Database *


Similar Documents

Publication Publication Date Title
CN110533631B (en) SAR image change detection method based on pyramid pooling twin network
CN111126575B (en) Gas sensor array mixed gas detection method and device based on machine learning
CN108399248A (en) A kind of time series data prediction technique, device and equipment
Li et al. Comparative analysis of BPNN, SVR, LSTM, Random Forest, and LSTM-SVR for conditional simulation of non-Gaussian measured fluctuating wind pressures
CN111340132B (en) Machine olfaction mode identification method based on DA-SVM
CN113049500B (en) Water quality detection model training and water quality detection method, electronic equipment and storage medium
CN113901384A (en) Ground PM2.5 concentration modeling method considering global spatial autocorrelation and local heterogeneity
CN108846200B (en) Quasi-static bridge influence line identification method based on iteration method
Coulibaly et al. Rule-based machine learning for knowledge discovering in weather data
CN116610998A (en) Switch cabinet fault diagnosis method and system based on multi-mode data fusion
CN113378473A (en) Underground water arsenic risk prediction method based on machine learning model
Yao et al. Variable selection for nonlinear soft sensor development with enhanced binary differential evolution algorithm
CN115049026A (en) Regression analysis method of space non-stationarity relation based on GSNNR
CN114626304A (en) Soft measurement modeling method for online prediction of copper grade in ore pulp
CN116702005A (en) Neural network-based data anomaly diagnosis method and electronic equipment
Cornford et al. Modelling frontal discontinuities in wind fields
CN115510763A (en) Air pollutant concentration prediction method and system based on data-driven exploration
Roigé et al. Self-organizing maps for analysing pest profiles: Sensitivity analysis of weights and ranks
Zhu et al. Rapid freshness prediction of crab based on a portable electronic nose system
CN115062551A (en) Wet physical process parameterization method based on time sequence neural network
Wang et al. A Research of Neural Network Optimization Technology for Apple Freshness Recognition Based on Gas Sensor Array
CN113553708A (en) Method and device for tracing key influence factors of simulation model
CN113108949A (en) Model fusion-based sonde temperature sensor error prediction method
He et al. A predictive model for the sensory aroma characteristics of flue-cured tobacco based on a back-propagation neural network
CN111382147A (en) Meteorological data missing interpolation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination