CN104764868B - A soil organic matter prediction method based on geographically weighted regression - Google Patents


Publication number
CN104764868B
Authority
CN
China
Legal status
Active
Application number
CN201510154714.2A
Other languages
Chinese (zh)
Other versions
CN104764868A (en)
Inventor
宋效东
刘峰
张甘霖
赵玉国
李德成
杨金玲
吴华勇
Current Assignee
Institute of Soil Science of CAS
Original Assignee
Institute of Soil Science of CAS

Abstract

The present invention relates to a soil organic matter prediction method based on geographically weighted regression (GWR), covering multicollinearity diagnosis in local regression and an integrated treatment method. Its main elements include: a) combining the independent-variable preprocessing techniques of global regression and local regression prediction methods; b) a general, comprehensive diagnosis and treatment mechanism for independent-variable collinearity in geographically weighted regression; c) analysis of the applicability of geographically weighted regression to a specific data set; d) a method for selecting the optimal set of independent variables; e) consideration of the spatial trend of the residuals of different regression methods. By comparing different independent-variable sets and their degree of collinearity in local regression, and by accounting for the spatial trend of the residuals, the method improves the computational efficiency and accuracy of spatial attribute prediction.

Description

A soil organic matter prediction method based on geographically weighted regression
Technical field
The invention belongs to the field of spatial analysis methods for spatial attribute prediction, and specifically relates to a soil organic matter prediction method based on geographically weighted regression.
Background
In spatial analysis, observations of a variable are normally obtained by sampling specific geographic units. The observed values therefore usually change with geographic location, and the relationships between variables can also change markedly. Such location-driven change in the relationship or structure between variables is called spatial non-stationarity. In geostatistics and economic statistics, spatial non-stationarity is attributed mainly to three causes: (1) random sampling error; (2) differences between regions in physical geography, social management systems, human customs, and so on; (3) a model for analyzing the spatial data that does not match reality, or that omits regression variables it should include.
Geographically weighted regression (GWR) is a modeling technique that handles spatial non-stationarity effectively in regression analysis by allowing the local regression parameters to vary with geographic location. By adding parameters that express the spatial dependence and heterogeneity of the spatial objects themselves, the method reflects how the contribution of each sample to the regression equation differs from point to point; that is, the same influencing factor affects the dependent variable differently at different locations. This matches reality and makes the regression results more credible. Because the method is simple, its estimates have an explicit analytical expression, and the estimated parameters can be tested statistically, it has been increasingly studied and applied in fields such as socioeconomics, urban geography, meteorology, forestry, and soil science. However, no unified, widely accepted way of handling multicollinearity in geographically weighted regression has yet emerged, and there is no complete, efficient collinearity diagnosis method or software. How to integrate, in a unified way, the collinearity diagnostics already available for global regression models; how to make reasonable use of the complex spatial correlation between independent variables; how to ensure that the residual sum of squares of the geographically weighted regression model is smaller than that of an ordinary linear regression model; and how to obtain more realistic and reliable predictions are all problems that researchers urgently need to solve.
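The source does not write out the model. For orientation, the standard textbook form of geographically weighted regression at location (u_i, v_i), together with its local weighted least-squares estimator, is given below. The Gaussian distance-decay kernel with bandwidth b is only one common choice and is an assumption here, not something the patent specifies.

```latex
y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i

\hat{\boldsymbol{\beta}}(u_i, v_i) = \left( X^{\top} W_i X \right)^{-1} X^{\top} W_i\, \mathbf{y}

W_i = \operatorname{diag}(w_{i1}, \dots, w_{in}), \qquad
w_{ij} = \exp\!\left( -\tfrac{1}{2} \left( d_{ij} / b \right)^{2} \right)
```

Here d_{ij} is the distance between locations i and j, so nearby samples contribute more to the local fit than distant ones.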
Because geographically weighted regression relies mainly on local geographic regression, and the local regression model does not account for multicollinearity among the local regression parameters at different spatial locations, the method faces several problems when its application is extended. The limitations can be summarized as follows:
(1) The key to a geographically weighted regression model is its weight matrix, and determining the elements of that matrix is central to obtaining accurate predictions. In local regression analysis, outliers can strongly affect parameter estimation, and the pointwise local least-squares estimation used by geographically weighted regression makes outliers inherently hard to detect. Ordinary users therefore find it difficult to decide how to preprocess the input data for a given analysis, which to some extent constrains the development and application of the technique.
(2) Multicollinearity among independent variables in local regression strongly affects the predictions, and a single dimensionality-reduction operation cannot accurately select the optimal set of independent variables of different data types. The accuracy of geographically weighted regression depends on having enough independent variables: a sufficient number of independent variables that are correlated with the target variable can express more of its spatial variation. However, independent variables are usually highly correlated with each other, which can lead to more serious multicollinearity in local regression. If some important variables are excluded from the model, its stability is significantly damaged, which in turn undermines the soundness of the analysis and of any solution derived from it. How to diagnose and handle multicollinearity effectively is therefore especially important in spatial regression prediction.
(3) In spatial regression models, autocorrelation of the random error is ubiquitous. How to eliminate or reduce its influence in geographically weighted regression analysis remains a challenge in current research.
In summary, the deficiencies analyzed above also arise in the practical prediction of soil organic matter and affect the accuracy of the predictions.
Summary of the invention
In view of the above technical problems, the technical problem to be solved by the present invention is to provide a soil organic matter prediction method based on geographically weighted regression that covers two key techniques, multicollinearity diagnosis in local regression and integrated treatment of multicollinearity, and that can solve the independent-variable collinearity problem in existing local regression prediction analysis.
To solve the above technical problem, the present invention adopts the following technical solution: a soil organic matter prediction method based on geographically weighted regression, comprising the following steps:
Step 001. For each sampling point in the soil sampling area, collect the soil organic matter and the various types of soil independent variables corresponding to that sampling point, and preprocess all of these variables to obtain, for each sampling point, the measured soil organic matter value and the various types of soil independent variables to be processed; go to step 002;
Step 002. From all sampling points, select a preset percentage as modeling sampling points and use the remaining sampling points as validation sampling points; go to step 003;
Step 003. Using stepwise regression based on the minimum information criterion, screen all the to-be-processed soil independent variables of the modeling sampling points to obtain the regression-based soil independent variable set, and use the Pearson correlation coefficient to compute the pairwise correlations between the different soil independent variables in that set, forming the correlation matrix of the regression-based soil independent variable set. In parallel, using principal component analysis, process all the to-be-processed soil independent variables of the modeling sampling points to obtain the principal-component-based soil independent variable set, and use the Pearson correlation coefficient to compute the pairwise correlations between the different soil independent variables in that set, forming the correlation matrix of the principal-component-based soil independent variable set; go to step 004;
Step 004. Using geographically weighted regression, with the regression-based soil independent variable set and the principal-component-based soil independent variable set each serving in turn as the independent-variable source data set, predict the soil organic matter at the validation sampling points to obtain, for each of the two source data sets, the predicted soil organic matter value at each validation sampling point under this regression method, and record the geographically weighted regression coefficient set corresponding to each of the two source data sets;
Using global regression, with the regression-based soil independent variable set as the independent-variable source data set, predict the soil organic matter at the validation sampling points to obtain the predicted soil organic matter value at each validation sampling point under this regression method and source data set, and record the global regression coefficients corresponding to this source data set;
Using geographically weighted ridge regression, with the regression-based soil independent variable set as the independent-variable source data set, predict the soil organic matter at the validation sampling points to obtain the predicted soil organic matter value at each validation sampling point under this regression method and source data set, and record the geographically weighted ridge regression coefficient set corresponding to this source data set; go to step 005;
Step 005. Using the root-mean-square error and the mean error as indices, cross-validate each of the above groups of predicted soil organic matter values at the validation sampling points against the measured soil organic matter values at those points, obtain the optimal set of predicted values (the optimal predicted values), and identify the regression method and independent-variable source data set that produced them; go to step 006;
Step 006. If the regression method corresponding to the optimal predicted values is geographically weighted regression, go to step 007; if it is global regression, go to step 011; if it is geographically weighted ridge regression, go to step 014;
Step 007. Obtain the geographically weighted regression coefficient set corresponding to the optimal predicted values, and use the Pearson correlation coefficient to compute the pairwise correlations between the coefficients of the different soil independent variables in that set, forming the correlation matrix of the geographically weighted regression coefficient set. Then analyze, with at least one collinearity diagnostic tool, the correlation matrix of the independent-variable source data set corresponding to the optimal predicted values, the geographically weighted regression coefficient set corresponding to the optimal predicted values, and the correlation matrix of that coefficient set, and determine the local regression collinearity status of the regression method and independent-variable source data set corresponding to the optimal predicted values; go to step 008;
Step 008. From the measured soil organic matter value at each validation sampling point and the value predicted there by geographically weighted regression, obtain the residual set of the predictions, and analyze the spatial trend of this residual set; go to step 009;
Step 009. Perform ordinary kriging interpolation on this residual set, and superimpose the interpolated residuals on the corresponding soil organic matter values predicted by geographically weighted regression at each validation sampling point to generate new predicted soil organic matter values at the validation sampling points, namely the geographically weighted regression kriging predictions; go to step 010;
Step 010. Cross-validate the soil organic matter values predicted by geographically weighted regression at the validation sampling points against the corresponding geographically weighted regression kriging predictions, determine the optimal computation method, and go to step 018;
Step 011. From the measured soil organic matter value at each validation sampling point and the value predicted there by global regression, obtain the residual set of the predictions, and analyze the spatial trend of this residual set; go to step 012;
Step 012. Perform ordinary kriging interpolation on this residual set, and superimpose the interpolated residuals on the corresponding soil organic matter values predicted by global regression at each validation sampling point to generate new predicted soil organic matter values at the validation sampling points, namely the geographically weighted regression kriging predictions; go to step 013;
Step 013. Cross-validate the soil organic matter values predicted by global regression at the validation sampling points against the corresponding geographically weighted regression kriging predictions, determine the optimal computation method, and go to step 018;
Step 014. Obtain the geographically weighted ridge regression coefficient set corresponding to the optimal predicted values, and use the Pearson correlation coefficient to compute the pairwise correlations between the coefficients of the different soil independent variables in that set, forming the correlation matrix of the geographically weighted ridge regression coefficient set. Then analyze, with at least one collinearity diagnostic tool, the correlation matrix of the independent-variable source data set corresponding to the optimal predicted values, the geographically weighted ridge regression coefficient set corresponding to the optimal predicted values, and the correlation matrix of that coefficient set, and determine the local regression collinearity status of the regression method and independent-variable source data set corresponding to the optimal predicted values; go to step 015;
Step 015. From the measured soil organic matter value at each validation sampling point and the value predicted there by geographically weighted ridge regression, obtain the residual set of the predictions, and analyze the spatial trend of this residual set; go to step 016;
Step 016. Perform ordinary kriging interpolation on this residual set, and superimpose the interpolated residuals on the corresponding soil organic matter values predicted by geographically weighted ridge regression at each validation sampling point to generate new predicted soil organic matter values at the validation sampling points, namely the geographically weighted regression kriging predictions; go to step 017;
Step 017. Cross-validate the soil organic matter values predicted by geographically weighted ridge regression at the validation sampling points against the corresponding geographically weighted regression kriging predictions, determine the optimal computation method, and go to step 018;
Step 018. Using the optimal method obtained and the independent-variable source data set corresponding to the optimal predicted values, predict soil organic matter for the study region in which the soil sampling area is located.
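The bookkeeping in steps 002, 005 and 006 can be sketched as follows. This is a minimal illustration and not the patented implementation: the function names, the 80% modeling fraction, the made-up predictions, the sign convention of the mean error (predicted minus observed), and the rule of ranking by RMSE first and absolute mean error second are all assumptions, since the source does not say how the two indices are combined.

```python
import math
import random

def split_points(point_ids, modeling_fraction, seed=0):
    """Step 002: random split into modeling and validation sampling points."""
    ids = list(point_ids)
    random.Random(seed).shuffle(ids)
    n_model = round(len(ids) * modeling_fraction)
    return ids[:n_model], ids[n_model:]

def rmse(observed, predicted):
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def mean_error(observed, predicted):
    """Mean error, taken here as predicted minus observed (assumed convention)."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

def pick_best(observed, candidates):
    """Step 005: rank candidates by RMSE, then by |ME| (assumed ranking rule)."""
    return min(
        candidates,
        key=lambda name: (rmse(observed, candidates[name]),
                          abs(mean_error(observed, candidates[name]))),
    )

# Step 006: which step follows for each winning regression method.
NEXT_STEP = {"GWR": "007", "global": "011", "GW ridge": "014"}

modeling, validation = split_points(range(20), modeling_fraction=0.8)

# Made-up measured values and predictions at four validation points.
observed = [2.0, 3.0, 4.0, 5.0]
candidates = {
    ("GWR", "stepwise"): [2.1, 2.9, 4.2, 4.9],
    ("GWR", "PCA"): [2.5, 3.5, 3.5, 5.5],
    ("global", "stepwise"): [3.0, 3.0, 3.0, 6.0],
    ("GW ridge", "stepwise"): [2.2, 3.1, 4.1, 5.2],
}
best_method, best_variables = pick_best(observed, candidates)
next_step = NEXT_STEP[best_method]
```

Keying the candidates by (method, variable set) keeps the regression method and the independent-variable source data set together, which is what step 005 asks to be recorded.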
As a preferred technical solution of the present invention: in step 001, the various types of soil independent variables include elevation, slope, aspect, topographic wetness index, profile curvature, plan curvature, land use, mean annual temperature, and mean annual precipitation.
As a preferred technical solution of the present invention: in step 001, the preprocessing of all variables specifically comprises the following steps:
Step 00101. Preprocess the continuous variables by variable type. First obtain the mean m and standard deviation s of each variable type. Then, for each value, determine whether it lies within the interval [m-2s, m+2s] of its variable type; if so, treat it as a normal value. Otherwise, determine whether it is less than m-2s; if so, set it to m-2s; if not, set it to m+2s;
Step 00102. For each continuous variable, determine whether it follows a normal distribution; if so, do nothing; otherwise, apply a natural logarithm transformation so that the variable follows a normal distribution;
Step 00103. Standardize the continuous variables using the standard score (z-score) method;
Step 00104. Convert the categorical variables to dummy variables.
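Steps 00101, 00103 and 00104 can be sketched as below. The sketch makes several assumptions that the source leaves open: the sample standard deviation is used for s, a categorical variable with k levels is encoded as k - 1 dummy columns with the first level as the reference, and the elevation and land use values are made up. The normality check and logarithm transformation of step 00102 are not shown here.

```python
import statistics

def clip_to_two_sigma(values):
    """Step 00101: pull values outside [m - 2s, m + 2s] back to the nearest bound."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (assumption)
    lower, upper = m - 2 * s, m + 2 * s
    return [min(max(v, lower), upper) for v in values]

def z_score(values):
    """Step 00103: standard-score standardization, (x - mean) / standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

def to_dummies(labels):
    """Step 00104: k - 1 dummy columns; the first sorted level is the reference."""
    levels = sorted(set(labels))
    return {level: [1 if x == level else 0 for x in labels] for level in levels[1:]}

# Made-up elevation values with one obvious outlier (60.0).
elevation = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 13.0, 12.0, 60.0]
clipped = clip_to_two_sigma(elevation)
standardized = z_score(clipped)

# Made-up land use classes; "crop" becomes the reference level.
land_use = ["forest", "crop", "grass", "crop"]
dummies = to_dummies(land_use)
```

Encoding three land use classes as two dummy columns matches the LandUse1 and LandUse2 variables listed for Fig. 9, and dropping one level avoids introducing perfect collinearity with the intercept.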
As a preferred technical solution of the present invention: in step 00102, whether each continuous variable follows a normal distribution is determined by a one-sample Kolmogorov-Smirnov (K-S) test or by a frequency histogram.
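The normality check of step 00102 can be illustrated with the one-sample K-S distance, that is, the largest gap between the empirical distribution of a variable and a normal distribution. In this sketch the normal distribution is fitted with the sample mean and standard deviation, which is an assumption (strictly, that turns the test into a Lilliefors-type test with its own critical values), and no p-value or decision threshold is computed. The right-skewed sample is synthetic.

```python
import math
import statistics

def ks_statistic_normal(values):
    """K-S distance between the empirical CDF and a normal CDF fitted to the sample."""
    n = len(values)
    fitted = statistics.NormalDist(statistics.mean(values), statistics.stdev(values))
    d = 0.0
    for i, v in enumerate(sorted(values), start=1):
        cdf = fitted.cdf(v)
        # Compare against the empirical CDF just after and just before v.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Synthetic right-skewed sample: exponentiated standard-normal quantiles.
n = 50
standard_normal = statistics.NormalDist()
soc = [math.exp(standard_normal.inv_cdf((i - 0.5) / n)) for i in range(1, n + 1)]
ln_soc = [math.log(v) for v in soc]  # natural logarithm transformation

d_raw = ks_statistic_normal(soc)
d_log = ks_statistic_normal(ln_soc)
print(f"K-S distance before ln: {d_raw:.3f}, after ln: {d_log:.3f}")
```

A smaller distance after the transformation indicates that the logarithm brought the variable closer to a normal distribution, which is the situation Fig. 8(a) and Fig. 8(b) are described as showing for SOC and LnSOC.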
As a preferred technical solution of the present invention: in step 007, analyzing, with at least one collinearity diagnostic tool, the correlation matrix of the independent-variable source data set corresponding to the optimal predicted values, the geographically weighted regression coefficient set corresponding to the optimal predicted values, and the correlation matrix of that coefficient set, and determining the local regression collinearity status of the corresponding regression method and independent-variable source data set, specifically comprises the following process:
First, compare, entry by entry, the correlation matrix of the independent-variable source data set corresponding to the optimal predicted values with the correlation matrix of the corresponding geographically weighted regression coefficient set. If more than 50% of the correlation coefficients in the coefficient-set correlation matrix are smaller than the coefficients at the same positions in the source-data-set correlation matrix, the local regression collinearity problem of the corresponding regression method and independent-variable source data set is not especially severe;
Obtain the variance inflation factor (VIF) matrix of the independent-variable source data set corresponding to the optimal predicted values, which constitutes the local VIF matrix. At the same time, obtain, over all sampling points in the soil sampling area, the VIF of each variable type in that source data set, which constitutes the global VIF set. Then make the following judgments:
If the mean of every column of the local VIF matrix is less than 10, the local regression collinearity problem of the corresponding regression method and independent-variable source data set is not serious;
If the mean of every column of the local VIF matrix is greater than 10, and the VIF of the corresponding variable type in the global VIF set is also greater than 10, the local regression collinearity problem of the corresponding regression method and independent-variable source data set is not serious;
Otherwise, the corresponding regression method and independent-variable source data set have a potential collinearity problem;
For the independent variables whose VIF exceeds 10 in the independent-variable source data set corresponding to the optimal predicted values, and for the independent variables whose VIF exceeds 10 in the corresponding regression coefficient set, perform pairwise scatterplot analysis. If the scatterplots of the regression coefficients are more irregular than the scatterplots of the corresponding independent variables in the source data set, the collinearity problem of the corresponding regression method and independent-variable source data set can be ignored; otherwise, a relatively serious collinearity problem exists.
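The two numerical rules above (the 50% correlation comparison and the VIF thresholds) can be sketched as follows. This is one reading of the text, not the patented implementation: correlations are compared by absolute value, only the pairs above the diagonal are counted, the second VIF rule is applied literally to every column, and all data are made up. The visual scatterplot comparison is not coded.

```python
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def correlation_matrix(columns):
    """Pairwise Pearson correlations between columns (one list per variable)."""
    return [[pearson(a, b) for b in columns] for a in columns]

def share_of_weaker_pairs(coef_corr, var_corr):
    """Share of variable pairs whose |r| is smaller among local coefficients."""
    k = len(var_corr)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    weaker = sum(abs(coef_corr[i][j]) < abs(var_corr[i][j]) for i, j in pairs)
    return weaker / len(pairs)

def vif_verdict(local_vif, global_vif, threshold=10.0):
    """local_vif: rows are sampling points, columns are variables."""
    column_means = [statistics.mean(column) for column in zip(*local_vif)]
    if all(m < threshold for m in column_means):
        return "not serious"
    if all(m > threshold and g > threshold for m, g in zip(column_means, global_vif)):
        return "not serious"
    return "potential collinearity"

# Made-up independent variables and their local regression coefficients.
variables = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 6]]
coefficients = [[0.5, 0.1, 0.4, 0.2, 0.3], [1, 3, 2, 3, 1]]
share = share_of_weaker_pairs(correlation_matrix(coefficients),
                              correlation_matrix(variables))
not_especially_severe = share > 0.5

verdict_low = vif_verdict([[2.0, 3.0], [4.0, 5.0]], [2.0, 3.0])
verdict_mixed = vif_verdict([[12.0, 3.0], [14.0, 5.0]], [4.0, 2.0])
```

Values exactly equal to the threshold fall into the "otherwise" branch, mirroring the strict "less than 10" and "greater than 10" wording of the text.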
Compared with the prior art, the soil organic matter prediction method based on geographically weighted regression of the present invention, by adopting the above technical solution, has the following technical effects:
(1) The method designed in the present invention combines the independent-variable preprocessing techniques of global regression and other spatial attribute prediction methods. It uses an outlier-removal mechanism, a data normalization mechanism, and a dummy-variable conversion mechanism to standardize independent variables of different data types, reducing redundancy among the independent variables as far as possible and ensuring that the local regression method is applied correctly;
(2) The method proposes comparing principal component analysis and stepwise regression as ways of selecting independent variables, so that the local spatial differentiation of the target variable is expressed as fully as possible. At the same time, using the cross-validation results, collinearity can be eliminated more effectively. This remedies the tendency of some collinearity-elimination methods to delete too many independent variables, retains enough important independent variables, and ensures that the independent variables can express the spatial variation of the dependent variable. As environmental data become ever more abundant, the method can be used to select independent variables objectively and efficiently, offering higher computational efficiency and prediction accuracy than traditional methods;
(3) The method puts forward a general, comprehensive mechanism for diagnosing and treating independent-variable collinearity in geographically weighted regression. It combines several existing collinearity diagnostic tools to compare the degree of collinearity of the independent variables in local regression, and finally uses cross-validation to compare local regression with local ridge regression and judge whether geographically weighted regression is applicable to the specific data set, improving the computational efficiency and accuracy of spatial attribute prediction. This dynamic, comprehensive approach to collinearity diagnosis is expected to overcome the limitations of any single diagnostic method, has good generality and stability, and has broad prospects for industrial application;
(4) The method also works in a black-box, comparative manner: two candidate independent-variable sets are selected by different mechanisms, and prediction is carried out with three regression approaches based on different mathematical models together with one composite geostatistical method. This takes full account of the correlation and variability of the target variable and maximizes the accuracy of the results.
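The composite geostatistical step (steps 009, 012 and 016: ordinary kriging of the regression residuals, then adding the kriged residual to the regression prediction) can be sketched as below. This is a minimal illustration under stated assumptions: the exponential semivariogram and its sill and range are hypothetical and would in practice be fitted to the residual semivariogram (Fig. 11), the nugget is zero, and the coordinates, residuals, and regression prediction are made up.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def exp_variogram(h, sill=1.0, practical_range=10.0):
    """Exponential semivariogram with zero nugget (parameters are assumptions)."""
    return sill * (1.0 - math.exp(-3.0 * h / practical_range))

def krige_residual(coords, residuals, target):
    """Ordinary kriging estimate of the residual at `target`."""
    n = len(coords)
    # Kriging system: semivariances between samples, bordered by the
    # unbiasedness constraint that the weights sum to one.
    a = [[exp_variogram(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [exp_variogram(math.dist(c, target)) for c in coords] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    return sum(w * r for w, r in zip(weights, residuals))

# Made-up sampling locations and regression residuals (measured minus predicted).
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
residuals = [0.4, -0.2, 0.1, -0.3]

at_sample = krige_residual(coords, residuals, (0.0, 0.0))
at_center = krige_residual(coords, residuals, (5.0, 5.0))

regression_at_center = 3.2  # hypothetical regression prediction at the target
regression_kriging_at_center = regression_at_center + at_center
```

With a zero nugget, ordinary kriging reproduces the residual exactly at a sampled location, and at the center of this symmetric layout all four weights are equal, so the kriged residual is the plain average.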
Brief description of the drawings
Fig. 1 is the main flow chart of variable preprocessing;
Fig. 2 is the main flow chart of independent variable selection;
Fig. 3 is the flow chart of regression analysis using different independent variable sets;
Fig. 4 is the flow chart of accuracy validation of the regression analysis results;
Fig. 5 is the main flow chart of local regression collinearity diagnosis;
Fig. 6 is the flow chart of geostatistical analysis of the residuals of the geographically weighted regression predictions;
Fig. 7(a) is the spatial distribution map of a continuous independent variable and the soil sampling points: digital elevation model (DEM);
Fig. 7(b) is the spatial distribution map of a continuous independent variable and the soil sampling points: mean annual temperature (MAAT);
Fig. 7(c) is the spatial distribution map of a continuous independent variable and the soil sampling points: mean annual precipitation (MAP);
Fig. 7(d) is the spatial distribution map of a continuous independent variable and the soil sampling points: slope (Slope);
Fig. 8(a) is the frequency histogram of the sampled soil organic carbon (SOC) data;
Fig. 8(b) is the frequency histogram of soil organic matter after natural logarithm transformation (LnSOC);
Fig. 9(a) is the categorical variable, the land use data (LandUse variable);
Fig. 9(b) is the dummy variable conversion result, the LandUse1 variable;
Fig. 9(c) is the dummy variable conversion result, the LandUse2 variable;
Fig. 10(a) is the scatterplot of the independent variables;
Fig. 10(b) is the scatterplot of the local regression coefficients of the independent variables;
Fig. 11 is the semivariogram of the geographically weighted regression prediction residuals;
Fig. 12(a) is the spatial distribution map of soil organic matter predicted with the optimal computation method in the embodiment of the present invention: geographically weighted regression kriging;
Fig. 12(b) is the spatial distribution map of soil organic matter predicted with the worst computation method in the embodiment of the present invention: global regression.
Detailed description of the invention
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The basic idea of the soil organic matter prediction method based on geographically weighted regression designed by the present invention is to diagnose and handle collinearity among the independent variables during independent variable selection, processing and local regression, so that the non-stationarity of the spatial relationships of different types of independent variables is detected more efficiently in local regression and the target variable is predicted accurately. While local regression collinearity is being diagnosed, several local regression techniques are compared and the trend term is analysed with a trend surface equation to remove non-stationarity, thereby greatly improving the spatial prediction accuracy of the target attribute in geocomputation.
Soil organic matter (SOC) is an important component of soil, and maps of its spatial distribution play an extremely important role in soil fertility, environmental protection, sustainable agricultural development and so on. Soil sampling is an important way to estimate regional soil organic carbon (SOC) content and its spatial variation pattern. However, limited by sampling funds and field sampling conditions, the collected samples often cannot fully reflect the spatial distribution of soil attributes in the study area. Especially in highly heterogeneous landscapes, conventional methods have difficulty detecting the non-stationarity of spatial relationships and producing satisfactory spatial predictions. Independent variables commonly used to predict soil organic matter (SOC) include digital elevation models and derived terrain factors, land use, geology and climate data.
Here soil organic matter (SOC) is the dependent variable, an observed numerical series with spatial attributes (longitude and latitude). The present invention designs a soil organic matter prediction method based on geographically weighted regression. In actual application, soil organic matter (SOC) at unknown spatial locations is predicted from the observed soil organic matter (SOC) data set and the independent variable set, specifically through the following steps:
Step 001. For each sampled point in the soil sampling district, collect the soil organic matter (SOC) and the various types of soil independent variables corresponding to that sampled point, and pretreat all of these variables to obtain, for each sampled point, the collected soil organic matter value and the various types of pending soil independent variables; enter step 002.
Wherein, the soil independent variable (also called "covariate") layers are shown in Fig. 7 (a), Fig. 7 (b), Fig. 7 (c) and Fig. 7 (d). The various types of soil independent variables include elevation (DEM), slope (Slope), aspect (Aspect), topographic wetness index (TWI), profile curvature (ProCur), plan curvature (PlaCur), land use (LandUse), average annual temperature (MAAT) and average annual rainfall (MAP). Soil organic matter (SOC), DEM, Slope, Aspect, TWI, ProCur, PlaCur, MAAT and MAP are continuous variables, and LandUse is a categorical variable. The pretreatment applied to all variables, as shown in Fig. 1, specifically includes the following steps:
Step 00101. Each continuous variable is pretreated separately. First, the mean m and standard deviation s of the variable are computed. Then each value of the variable is checked against the interval [m-2s, m+2s] of that variable: a value inside the interval is kept as a normal value; a value below m-2s is replaced with m-2s; a value above m+2s is replaced with m+2s.
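As an illustration only, the clamping in step 00101 can be sketched in Python as follows. The function name and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, pstdev

def clip_to_two_sigma(values):
    """Clamp every value of one variable to [m - 2s, m + 2s]."""
    m = mean(values)
    s = pstdev(values)  # assumption: population standard deviation
    lower, upper = m - 2 * s, m + 2 * s
    # Values inside the interval are returned unchanged.
    return [min(max(v, lower), upper) for v in values]

# Hypothetical slope values with one extreme observation.
slope = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0, 50.0]
clipped = clip_to_two_sigma(slope)
```

Only the extreme value is moved to the upper bound; the remaining values are left as they are.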
Step 00102. For each continuous variable, a one-sample K-S (Kolmogorov-Smirnov) test or a frequency histogram is used to judge whether the variable follows a normal distribution. If it does, nothing is done; otherwise the natural logarithm transformation
Process_X = Ln (X)
is applied so that the variable approximately follows a normal distribution. The frequency histograms before and after transformation are shown in Fig. 8 (a) and Fig. 8 (b), where X is the continuous variable and Process_X is the result of applying the natural logarithm transformation to X.
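A minimal sketch of step 00102 is given below, assuming strictly positive values. The one-sample K-S statistic is computed against a normal distribution whose mean and standard deviation are estimated from the sample, and the rejection threshold 1.36 / sqrt(n) is the usual large-sample 5% approximation; both choices, and the function names, are assumptions rather than details taken from the text.

```python
import math
from statistics import mean, stdev

def ks_statistic_normal(values):
    """One-sample K-S statistic against a normal fitted to the sample."""
    m, s = mean(values), stdev(values)
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
        # Largest gap between the empirical and the theoretical CDF.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

def log_transform_if_not_normal(values, coefficient=1.36):
    """Return (values, False) if normality is not rejected, else (Ln values, True)."""
    threshold = coefficient / math.sqrt(len(values))  # assumed 5% level
    if ks_statistic_normal(values) <= threshold:
        return list(values), False
    return [math.log(v) for v in values], True

# Hypothetical strongly right-skewed variable.
soc = [math.exp(i / 5) for i in range(50)]
ln_soc, transformed = log_transform_if_not_normal(soc)
```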
Step 00103. The continuous variables are standardized with the standard score method, using the following model:
ZY = (ZX - m) / s
where ZX is the value of a continuous variable after step 00101 and step 00102, m and s are the mean and standard deviation of ZX, and ZY is the standardized result. After standardization each continuous variable has a mean of 0 and a standard deviation of 1, which removes the influence of the differing dimensions (units) of the independent variables on the dependent variable.
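The standard score of step 00103 can be sketched as follows. The function name and the sample values are illustrative, and the population standard deviation is assumed so that the result has a standard deviation of exactly 1.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return ZY = (ZX - m) / s for every value ZX of one variable."""
    m = mean(values)
    s = pstdev(values)  # assumption: population standard deviation
    if s == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [(v - m) / s for v in values]

# Hypothetical elevation values in metres.
dem = [120.0, 340.0, 560.0, 410.0, 95.0]
z_dem = standardize(dem)
```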
Step 00104. Because some categorical variables may take many values, Duncan's method from one-way analysis of variance is used to reduce the number of category groups. On this basis, as shown in Fig. 9 (a), Fig. 9 (b) and Fig. 9 (c), the categorical variable land use (LandUse) is converted to dummy variables with the standard procedure: the 6 original category attributes are converted into two layers (LandUse1, LandUse2) with 0/1 attributes.
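The dummy variable conversion of step 00104 can be sketched as below. The text does not say which of the 6 land use classes are merged or which merged group serves as the reference, so the group names and the reference choice here are purely hypothetical.

```python
def to_dummy_layers(groups, reference, prefix="LandUse"):
    """Encode merged category groups as 0/1 layers, omitting the reference group."""
    levels = sorted(set(groups) - {reference})
    return {
        f"{prefix}{k}": [1 if g == level else 0 for g in groups]
        for k, level in enumerate(levels, start=1)
    }

# Hypothetical merged groups for five sampled points (three groups after merging).
merged = ["forest", "cropland", "other", "cropland", "forest"]
layers = to_dummy_layers(merged, reference="other")
```

Three merged groups give two 0/1 layers, which matches the two layers LandUse1 and LandUse2 named in the text.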
Step 002. From all sampled points, select a preset percentage, 80%, as modeling sampled points and use the remaining 20% as checking sampled points; enter step 003.
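A sketch of the 80%/20% partition in step 002 follows. The text does not state how the points are chosen, so the seeded random shuffle used here is an assumption, as are the function and parameter names.

```python
import random

def split_samples(sample_ids, modeling_fraction=0.8, seed=0):
    """Split sampled points into modeling and checking subsets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # assumption: random selection
    cut = round(len(ids) * modeling_fraction)
    return ids[:cut], ids[cut:]

modeling, checking = split_samples(range(100))
```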
Step 003. As shown in Fig. 2, stepwise regression based on the minimum information criterion is used to screen all pending soil independent variables of the modeling sampled points, giving the regression-based soil independent variable set StepVari = {DEM_i, Slope_i, TWI_i, LandUse1_i, MAAT_i}, i = 1, 2, ..., n × 0.8, where n is the number of sampled points in the soil sampling district. Stepwise regression rejects independent variables whose influence is not significant, so that only the significant indices of the original index system are retained. The Pearson correlation coefficient between every pair of soil independent variables in StepVari is then calculated to form the correlation matrix of StepVari. Meanwhile, principal component analysis is applied to all pending soil independent variables of the modeling sampled points, giving the principal-component-based soil independent variable set PcaVari = {DEM_i, TWI_i, LandUse1_i, MAP_i, ProCur_i}, i = 1, 2, ..., n × 0.8. The Pearson correlation coefficient between every pair of soil independent variables in PcaVari is calculated to form the correlation matrix of PcaVari; enter step 004.
Unlike stepwise regression, principal component analysis takes the relationships among the independent variables into account and reduces their dimensionality, which simplifies the problem and yields a smaller principal-component-based soil independent variable set PcaVari. Local regression with PcaVari serves as one check on the correctness of the geographically weighted regression performed on the StepVari data set.
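The pairwise Pearson correlation matrix used in step 003 can be sketched as follows. The variable selection itself (stepwise regression or principal component analysis) is not shown, and the column values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Nested dict holding the correlation of every pair of named columns."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}

# Hypothetical standardized values at four modeling sampled points.
step_vari = {
    "DEM": [1.0, 2.0, 3.0, 4.0],
    "MAAT": [8.0, 6.0, 4.0, 2.0],
    "TWI": [1.0, 3.0, 2.0, 5.0],
}
corr = correlation_matrix(step_vari)
```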
Step 004. As shown in Fig. 3, geographically weighted regression is applied with StepVari and with PcaVari in turn as the independent variable source data set to predict soil organic matter at the checking sampled points. This gives, for each of the two source data sets, the predicted soil organic matter of each checking sampled point, and the corresponding geographically weighted regression coefficient sets are recorded: LoR_GWR_StepVari = {LS_DEM_i, LS_Slope_i, LS_TWI_i, LS_LandUse1_i, LS_MAAT_i} and LoR_GWR_PcaVari = {LP_DEM_i, LP_TWI_i, LP_LandUse1_i, LP_MAP_i, LP_ProCur_i} (note: each set is a vector set, that is, the regression coefficient of each independent variable changes with spatial position).
Global regression is applied with StepVari as the independent variable source data set to predict soil organic matter at the checking sampled points, giving the predicted soil organic matter of each checking sampled point, and the corresponding global regression coefficients are recorded: GoR_StepVari = {GS_DEM, GS_Slope, GS_TWI, GS_LandUse1, GS_MAAT} (note: this set is not a vector set, that is, each independent variable has only one coefficient).
Geographically weighted ridge regression is applied with StepVari as the independent variable source data set to predict soil organic matter at the checking sampled points, giving the predicted soil organic matter of each checking sampled point, and the corresponding geographically weighted ridge regression coefficient set is recorded:
LoR_GWRR_StepVari = {LSR_DEM_i, LSR_Slope_i, LSR_TWI_i, LSR_LandUse1_i, LSR_MAAT_i}; enter step 005.
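To illustrate why the geographically weighted coefficient sets are vector sets while the global set is not, the sketch below fits a weighted least squares line at a target location, with weights that decay with distance. It is restricted to a single independent variable for brevity, and the Gaussian kernel, the fixed bandwidth and all names are assumptions; the text does not specify the kernel or the bandwidth selection.

```python
import math

def gwr_local_fit(target_xy, sample_xy, x, y, bandwidth):
    """Weighted least squares (intercept, slope) fitted at one target location."""
    # Assumed Gaussian distance-decay kernel with a fixed bandwidth.
    w = [math.exp(-0.5 * (math.dist(target_xy, p) / bandwidth) ** 2) for p in sample_xy]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical transect: the relation is y = x in the west and y = 3x in the east.
sample_xy = [(float(i), 0.0) for i in range(10)]
x = [0.0, 1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 2.0, 3.0, 4.0]
y = [xi if i < 5 else 3.0 * xi for i, xi in enumerate(x)]
_, west_slope = gwr_local_fit((0.0, 0.0), sample_xy, x, y, bandwidth=1.5)
_, east_slope = gwr_local_fit((9.0, 0.0), sample_xy, x, y, bandwidth=1.5)
```

The two local slopes differ because each fit is dominated by nearby points, whereas a global regression would return a single slope for the whole transect.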
Step 005. Using the root mean square error (RMSE) and the mean error (ME), the predicted soil organic matter of the checking sampled points from each of the above groups is cross-validated against the collected soil organic matter values of the checking sampled points. The optimal checking-point prediction is obtained, together with the regression method and independent variable source data set that produced it; enter step 006.
Step 006. As shown in Fig. 4, if the regression method corresponding to the optimal prediction is geographically weighted regression, enter step 007; if it is global regression, enter step 011; if it is geographically weighted ridge regression, enter step 014.
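The two indices of step 005 can be sketched as follows. The sign convention of ME (predicted minus observed) and the way the two indices are combined (smallest RMSE first, smallest absolute ME as a tie-break) are assumptions, since the text does not define them; the candidate labels and values are hypothetical.

```python
import math

def mean_error(observed, predicted):
    """ME: average of (predicted - observed)."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """RMSE: square root of the mean squared prediction error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def pick_best(observed, candidates):
    """Label of the candidate with the smallest RMSE, then the smallest |ME|."""
    return min(
        candidates,
        key=lambda k: (rmse(observed, candidates[k]), abs(mean_error(observed, candidates[k]))),
    )

observed = [1.0, 2.0, 3.0]
candidates = {
    "GWR_StepVari": [1.1, 2.0, 2.9],
    "GoR_StepVari": [2.0, 2.0, 4.0],
}
best = pick_best(observed, candidates)
```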
Step 007. From the optimal checking-point prediction, obtain the corresponding geographically weighted regression coefficient set, which is one of LoR_GWR_StepVari and LoR_GWR_PcaVari. Calculate the Pearson correlation coefficient between the coefficients of every pair of soil independent variables in this coefficient set to form its correlation matrix. Then analyse, with at least one collinearity diagnostic tool, the correlation matrix of the corresponding independent variable source data set, the coefficient set, and the correlation matrix of the coefficient set, and judge the local regression collinearity of the regression method and independent variable source data set corresponding to the optimal prediction; enter step 008.
As shown in Fig. 5, the collinearity analysis in step 007 specifically includes the following process:
First, the correlation matrix of the independent variable source data set and the correlation matrix of the geographically weighted regression coefficient set are compared entry by entry. If more than 50% of the correlation coefficients in the coefficient-set matrix are smaller than the coefficients at the same positions in the source-data-set matrix, the local regression collinearity of the regression method and independent variable source data set corresponding to the optimal prediction is not especially severe;
Next, the variance inflation factor (VIF) matrix of the independent variable source data set corresponding to the optimal prediction is obtained to form the local variance inflation factor matrix; at the same time, the variance inflation factor of each variable in the source data set is obtained over all sampled points in the soil sampling district to form the global variance inflation factor set, and the following judgments are made:
If the mean of every column of the local variance inflation factor matrix is less than 10, the local regression collinearity of the regression method and independent variable source data set corresponding to the optimal prediction is not serious;
If the mean of every column of the local variance inflation factor matrix is greater than 10, and the variance inflation factor of the corresponding variable in the global variance inflation factor set is also greater than 10, the local regression collinearity of the regression method and independent variable source data set corresponding to the optimal prediction is likewise not serious;
Otherwise, the regression method and independent variable source data set corresponding to the optimal prediction have a potential collinearity problem;
For the independent variables whose variance inflation factor exceeds 10 in the independent variable source data set, and for those whose variance inflation factor exceeds 10 in the corresponding regression coefficient set, pairwise scatter plot analysis is carried out. If the scatter plot of the regression coefficients is more irregular than the scatter plot of the corresponding independent variables in the source data set, the collinearity problem of the regression method and independent variable source data set corresponding to the optimal prediction can be ignored; otherwise, the regression method and independent variable source data set have a comparatively serious collinearity problem.
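The two numerical checks of the diagnosis (the 50% comparison of correlation matrices and the variance inflation factor rule with threshold 10) can be sketched as below. Several points are assumptions: correlations are compared by absolute value, the second VIF rule is read as "every column whose local mean exceeds 10 also has a global VIF above 10", and all names and numbers are hypothetical. The computation of the VIF values themselves and the scatter plot comparison are not shown.

```python
from statistics import mean

def share_of_weaker_correlations(coef_corr, source_corr):
    """Fraction of variable pairs whose coefficient correlation is weaker."""
    names = list(source_corr)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    weaker = sum(abs(coef_corr[a][b]) < abs(source_corr[a][b]) for a, b in pairs)
    return weaker / len(pairs)

def vif_verdict(local_vif_rows, global_vif, threshold=10.0):
    """Apply the column-mean rule to a local VIF matrix (one dict per point)."""
    column_means = {name: mean(row[name] for row in local_vif_rows) for name in global_vif}
    high = [name for name, value in column_means.items() if value > threshold]
    if not high:
        return "not serious"
    if all(global_vif[name] > threshold for name in high):
        return "not serious"
    return "potential collinearity"

# Hypothetical upper-triangle correlation values.
source_corr = {"DEM": {"MAAT": -0.9, "TWI": 0.2}, "MAAT": {"TWI": -0.3}, "TWI": {}}
coef_corr = {"DEM": {"MAAT": -0.4, "TWI": 0.5}, "MAAT": {"TWI": 0.1}, "TWI": {}}
share = share_of_weaker_correlations(coef_corr, source_corr)

# Hypothetical local VIF rows at two sampled points and global VIF values.
local_rows = [{"DEM": 11.0, "MAAT": 4.0}, {"DEM": 13.0, "MAAT": 6.0}]
verdict = vif_verdict(local_rows, {"DEM": 3.0, "MAAT": 2.0})
```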
Step 008. As shown in Fig. 6, from the collected soil organic matter value of each checking sampled point and the value predicted by geographically weighted regression, obtain the residual set of the predictions, analyse the spatial trend of this residual set with the method of moments (MOM), and enter step 009.
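Assuming that the spatial analysis of the residuals refers to the classical method-of-moments estimator of the semivariogram (Figure 11 shows a semivariogram), a sketch is given below. The lag width, the number of lags, the function name and the sample data are hypothetical.

```python
import math
from itertools import combinations

def empirical_semivariogram(coords, residuals, lag_width, n_lags):
    """Return (lag centre, semivariance, pair count) for every non-empty lag bin."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for (p, zp), (q, zq) in combinations(zip(coords, residuals), 2):
        k = int(math.dist(p, q) // lag_width)  # index of the distance bin
        if k < n_lags:
            sums[k] += (zp - zq) ** 2
            counts[k] += 1
    # gamma(h) = sum of squared differences / (2 * number of pairs)
    return [
        ((k + 0.5) * lag_width, sums[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k]
    ]

# Hypothetical residuals at four points on a line.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
residuals = [0.0, 1.0, 0.0, 1.0]
variogram = empirical_semivariogram(coords, residuals, lag_width=1.0, n_lags=3)
```

A model fitted to these points would then supply the weights for the ordinary kriging of the residuals in the next step.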
Step 009. Apply ordinary kriging interpolation to this residual set and superimpose the interpolation result on the corresponding geographically weighted regression predictions to generate a new predicted soil organic matter value for each checking sampled point, namely the geographically weighted regression kriging prediction; enter step 010.
Step 010. Cross-validate the geographically weighted regression predictions against the geographically weighted regression kriging predictions of the checking sampled points to determine the optimal calculation method; enter step 018.
Step 011. From the collected soil organic matter value of each checking sampled point and the value predicted by global regression, obtain the residual set of the predictions, analyse the spatial trend of this residual set with the method of moments (MOM), and enter step 012.
Step 012. Apply ordinary kriging interpolation to this residual set and superimpose the interpolation result on the corresponding global regression predictions to generate a new predicted soil organic matter value for each checking sampled point, namely the geographically weighted regression kriging prediction; enter step 013.
Step 013. Cross-validate the global regression predictions against the geographically weighted regression kriging predictions of the checking sampled points to determine the optimal calculation method; enter step 018.
Step 014. Obtain the geographically weighted ridge regression coefficient set LoR_GWRR_StepVari corresponding to the optimal checking-point prediction, and calculate the Pearson correlation coefficient between the coefficients of every pair of soil independent variables in LoR_GWRR_StepVari to form its correlation matrix. Then analyse, with at least one collinearity diagnostic tool, the correlation matrix of the corresponding independent variable source data set, the coefficient set LoR_GWRR_StepVari, and the correlation matrix of LoR_GWRR_StepVari, and judge the local regression collinearity of the regression method and independent variable source data set corresponding to the optimal prediction; enter step 015.
Step 015. From the collected soil organic matter value of each checking sampled point and the value predicted by geographically weighted ridge regression, obtain the residual set of the predictions, analyse the spatial trend of this residual set with the method of moments (MOM), and enter step 016.
Step 016. Apply ordinary kriging interpolation to this residual set and superimpose the interpolation result on the corresponding geographically weighted ridge regression predictions to generate a new predicted soil organic matter value for each checking sampled point, namely the geographically weighted regression kriging prediction; enter step 017.
Step 017. Cross-validate the geographically weighted ridge regression predictions against the geographically weighted regression kriging predictions of the checking sampled points to determine the optimal calculation method; enter step 018.
According to steps 006 to 017 above, if the result of step 006 shows that the regression method corresponding to the optimal checking-point prediction is geographically weighted regression, with geographically weighted regression coefficient set LoR_GWR_StepVari and independent variable source data set StepVari, then step 007 is entered, steps 007 to 010 are executed, and step 018 is executed last. The detailed process of steps 007 to 010 is as follows:
Step 007. From the optimal checking-point prediction, obtain the corresponding geographically weighted regression coefficient set LoR_GWR_StepVari, and calculate the Pearson correlation coefficient between the coefficients of every pair of soil independent variables in LoR_GWR_StepVari to form its correlation matrix. Then analyse, with at least one collinearity diagnostic tool, the correlation matrix of the independent variable source data set StepVari, the coefficient set LoR_GWR_StepVari, and the correlation matrix of LoR_GWR_StepVari, and judge the local regression collinearity of geographically weighted regression with the source data set StepVari; enter step 008.
Wherein, the correlation matrix of the independent variable source data set StepVari and the correlation matrix of the geographically weighted regression coefficient set LoR_GWR_StepVari are first compared entry by entry. If more than 50% of the correlation coefficients in the correlation matrix of LoR_GWR_StepVari are smaller than the coefficients at the same positions in the correlation matrix of StepVari, the local regression collinearity of geographically weighted regression with the source data set StepVari is not especially severe;
Obtain the variance inflation factor matrix of the independent variable source data set StepVari, VIF_GWR_StepVari = {VIF_DEM_i, VIF_Slope_i, VIF_TWI_i, VIF_LandUse1_i, VIF_MAAT_i}, to form the local variance inflation factor matrix; at the same time, obtain the variance inflation factor of each variable in the source data set over all sampled points in the soil sampling district to form the global variance inflation factor set VIF_OLS = {VIF_DEM, VIF_Slope, VIF_TWI, VIF_LandUse1, VIF_MAAT}, and make the following judgments:
If the mean of every column of the local variance inflation factor matrix is less than 10, the local regression collinearity of geographically weighted regression with the source data set StepVari is not serious;
If the mean of every column of the local variance inflation factor matrix is greater than 10, and the variance inflation factor of the corresponding variable in the global variance inflation factor set is also greater than 10, the local regression collinearity of geographically weighted regression with the source data set StepVari is likewise not serious;
Otherwise, geographically weighted regression with the source data set StepVari has a potential collinearity problem;
For the independent variables whose variance inflation factor exceeds 10 in the source data set StepVari, and for those whose variance inflation factor exceeds 10 in the coefficient set LoR_GWR_StepVari, pairwise scatter plot analysis is carried out. If the scatter plot of the coefficients in LoR_GWR_StepVari is more irregular than the scatter plot of the corresponding independent variables in StepVari, the collinearity problem of geographically weighted regression with the source data set StepVari can be ignored; otherwise, there is a comparatively serious collinearity problem. For example, Figure 10 (a) and Figure 10 (b) show that the correlation between the local regression coefficients of elevation (DEM) and average annual temperature (MAAT) (Figure 10 (b)) is significantly lower than the correlation between the two independent variables themselves (Figure 10 (a)), which shows that the current collinearity problem is acceptable.
Step 008. From the collected soil organic matter value of each checking sampled point and the value predicted by geographically weighted regression, obtain the residual set of the predictions and analyse the spatial trend of this residual set with the method of moments (MOM); the analysis result is shown in Figure 11; enter step 009.
Step 009. Apply ordinary kriging interpolation to this residual set and superimpose the interpolation result on the corresponding geographically weighted regression predictions to generate a new predicted soil organic matter value for each checking sampled point, namely the geographically weighted regression kriging prediction; enter step 010.
Step 010. Cross-validate the geographically weighted regression predictions against the geographically weighted regression kriging predictions of the checking sampled points to determine the optimal calculation method; enter step 018.
Step 018. Using the optimal method obtained and the independent variable source data set corresponding to the optimal checking-point prediction, predict soil organic matter over the study area in which the soil sampling district lies, and finally output the result file, as shown in Figure 12 (a) and Figure 12 (b).
In summary, the soil organic matter prediction method based on geographically weighted regression designed in the present invention combines the independent-variable preprocessing techniques used in global regression and in other spatial attribute prediction methods. It applies an outlier handling mechanism, a data normalization mechanism, and a dummy-variable conversion mechanism to standardize independent variables of different data types, which minimizes redundancy among the independent variables and ensures that the local regression method is applied correctly. It proposes comparing independent-variable selection by principal component analysis with selection by stepwise regression, so that the local spatial differentiation of the target variable is expressed as fully as possible. At the same time, guided by the cross-validation results, collinearity is eliminated more effectively, and the tendency of some collinearity-elimination methods to delete too many independent variables is remedied: enough important independent variables are retained to express the spatial variability of the dependent variable. As environmental data become increasingly abundant, the method can be used to select independent variables objectively and efficiently, providing higher computational efficiency and prediction accuracy than traditional methods. In particular, the invention proposes a general, comprehensive mechanism for diagnosing and handling independent-variable collinearity in geographically weighted regression. It integrates several existing collinearity diagnostic tools to compare the degree of collinearity among the independent variables in local regression, and then uses cross-validation together with local regression and local ridge regression analysis to judge whether the current geographically weighted regression method is applicable to a specific dataset, improving the computational efficiency and accuracy of spatial attribute prediction. This dynamic, comprehensive collinearity diagnosis is expected to overcome the limitations of any single diagnostic method; it has good generality and stability and broad prospects for industrial application. The invention also involves a black-box comparison workflow that selects the better of the two independent-variable sets obtained by different mechanisms, and carries out prediction with three regression modes based on different mathematical models together with one composite geostatistical method, taking full account of the correlation and variability of the target variable and maximizing the accuracy of the results.
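The preprocessing summarized here (outlier handling, normalization, standardization, and dummy-variable conversion, detailed in claim 3) can be sketched as follows. This is a minimal illustration, not the patented implementation. The function names are hypothetical, the use of the sample standard deviation (`ddof=1`) is an assumption, and the decision to apply the logarithm is left to the caller through the `log_transform` flag, because the normality check (a K-S test or a histogram) is outside this sketch.

```python
import numpy as np


def clip_outliers(values):
    """Pull values outside [m - 2s, m + 2s] back to the nearest bound."""
    x = np.asarray(values, dtype=float)
    m, s = x.mean(), x.std(ddof=1)  # ddof=1 is an assumption
    return np.clip(x, m - 2 * s, m + 2 * s)


def preprocess_continuous(values, log_transform=False):
    """Clip outliers, optionally log-transform, then z-score standardize.

    Set log_transform=True when the caller has judged the variable to be
    non-normal. The natural logarithm requires strictly positive values.
    """
    x = clip_outliers(values)
    if log_transform:
        x = np.log(x)
    return (x - x.mean()) / x.std(ddof=1)


def to_dummies(labels):
    """Convert a categorical variable into 0/1 dummy columns.

    Returns the sorted category names and an (n, k) indicator matrix
    with one column per category.
    """
    labels = np.asarray(labels)
    categories = np.unique(labels)
    indicators = (labels[:, None] == categories[None, :]).astype(int)
    return categories, indicators


if __name__ == "__main__":
    elevation = list(range(1, 20)) + [1000]  # one obvious outlier
    print(clip_outliers(elevation).max())
    print(to_dummies(["forest", "cropland", "forest"]))
```

This sketch keeps one dummy column per category. In a regression with an intercept, one column is usually dropped as the reference level to avoid introducing perfect collinearity.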
The embodiments of the present invention have been explained in detail above with reference to the accompanying drawings, but the present invention is not limited to these embodiments. For example, the same technical framework can also be applied to predicting temperature or per capita income in a study region: with space as the study domain and the predicted item as the target attribute (analogous to soil organic matter (SOC) in the present design), the target attribute of that field can be predicted. Within the knowledge possessed by those of ordinary skill in the art, various changes can also be made without departing from the concept of the present invention.

Claims (5)

1. A soil organic matter prediction method based on geographically weighted regression, characterized by comprising the following steps:
Step 001. For each sampling point in the soil sampling area, collect the soil organic matter and the various types of soil independent variables corresponding to that sampling point, and preprocess all variables to obtain, for each sampling point, the measured soil organic matter value and the various types of soil independent variables to be processed; go to step 002;
Step 002. From all sampling points, select a preset percentage as modeling sampling points and use the remaining sampling points as validation sampling points; go to step 003;
Step 003. Using stepwise regression based on the minimum information criterion, screen all the soil independent variables to be processed that correspond to the modeling sampling points, to obtain the regression-based soil independent variable set, and use the Pearson correlation coefficient to compute the pairwise correlation coefficients between the different types of soil independent variables in that set, forming the correlation matrix of the regression-based soil independent variable set; at the same time, using principal component analysis, process all the soil independent variables to be processed that correspond to the modeling sampling points, to obtain the principal-component-based soil independent variable set, and use the Pearson correlation coefficient to compute the pairwise correlation coefficients between the different types of soil independent variables in that set, forming the correlation matrix of the principal-component-based soil independent variable set; go to step 004;
Step 004. Using geographically weighted regression, take the regression-based soil independent variable set and the principal-component-based soil independent variable set in turn as the independent-variable source dataset and predict soil organic matter at the validation sampling points, obtaining, for this regression method and each of the two independent-variable source datasets, the predicted soil organic matter value at each validation sampling point, and record the geographically weighted regression coefficient set corresponding to each of the two independent-variable source datasets;
Using global regression, take the regression-based soil independent variable set as the independent-variable source dataset and predict soil organic matter at the validation sampling points, obtaining, for this regression method and this independent-variable source dataset, the predicted soil organic matter value at each validation sampling point, and record the global regression coefficients corresponding to this independent-variable source dataset; using geographically weighted ridge regression, take the regression-based soil independent variable set as the independent-variable source dataset and predict soil organic matter at the validation sampling points, obtaining, for this regression method and this independent-variable source dataset, the predicted soil organic matter value at each validation sampling point, and record the geographically weighted ridge regression coefficient set corresponding to this independent-variable source dataset; go to step 005;
Step 005. Using the root-mean-square error and the mean error as indices, cross-validate each of the above groups of predicted soil organic matter values at the validation sampling points against the measured soil organic matter values at the validation sampling points, to obtain the optimal predicted soil organic matter values at the validation sampling points, together with the regression method and the independent-variable source dataset corresponding to these optimal predicted values; go to step 006;
Step 006. If the regression method corresponding to the optimal predicted soil organic matter values at the validation sampling points is geographically weighted regression, go to step 007; if it is global regression, go to step 011; if it is geographically weighted ridge regression, go to step 014;
Step 007. Obtain the geographically weighted regression coefficient set corresponding to the optimal predicted soil organic matter values at the validation sampling points, and use the Pearson correlation coefficient to compute the pairwise correlation coefficients between the different types of soil independent variables in this coefficient set, forming the correlation matrix of the geographically weighted regression coefficient set; then, for the correlation matrix of the independent-variable source dataset corresponding to the optimal predicted values, the geographically weighted regression coefficient set corresponding to the optimal predicted values, and the correlation matrix of this coefficient set, carry out an analysis with at least one collinearity diagnostic tool to determine the local-regression collinearity status of the regression method and independent-variable source dataset corresponding to the optimal predicted values; go to step 008;
Step 008. From the measured soil organic matter value at each validation sampling point and the value predicted at each validation sampling point by geographically weighted regression, obtain the residual set corresponding to the predicted soil organic matter values at the validation sampling points, and analyze the spatial trend of this residual set; go to step 009;
Step 009. Perform ordinary kriging interpolation on this residual set, and add the interpolation result corresponding to this residual set to the values predicted at the validation sampling points by geographically weighted regression, generating new predicted soil organic matter values at the validation sampling points, namely the geographically weighted regression kriging predicted values of soil organic matter at the validation sampling points; go to step 010;
Step 010. Cross-validate the values predicted at the validation sampling points by geographically weighted regression against the geographically weighted regression kriging predicted values at the validation sampling points, determine the optimal calculation method, and go to step 018;
Step 011. From the measured soil organic matter value at each validation sampling point and the value predicted at each validation sampling point by global regression, obtain the residual set corresponding to the predicted soil organic matter values at the validation sampling points, and analyze the spatial trend of this residual set; go to step 012;
Step 012. Perform ordinary kriging interpolation on this residual set, and add the interpolation result corresponding to this residual set to the values predicted at the validation sampling points by global regression, generating new predicted soil organic matter values at the validation sampling points, namely the geographically weighted regression kriging predicted values of soil organic matter at the validation sampling points; go to step 013;
Step 013. Cross-validate the values predicted at the validation sampling points by global regression against the geographically weighted regression kriging predicted values at the validation sampling points, determine the optimal calculation method, and go to step 018;
Step 014. Obtain the geographically weighted ridge regression coefficient set corresponding to the optimal predicted soil organic matter values at the validation sampling points, and use the Pearson correlation coefficient to compute the pairwise correlation coefficients between the different types of soil independent variables in this coefficient set, forming the correlation matrix of the geographically weighted ridge regression coefficient set; then, for the correlation matrix of the independent-variable source dataset corresponding to the optimal predicted values, the geographically weighted ridge regression coefficient set corresponding to the optimal predicted values, and the correlation matrix of this coefficient set, carry out an analysis with at least one collinearity diagnostic tool to determine the local-regression collinearity status of the regression method and independent-variable source dataset corresponding to the optimal predicted values; go to step 015;
Step 015. From the measured soil organic matter value at each validation sampling point and the value predicted at each validation sampling point by geographically weighted ridge regression, obtain the residual set corresponding to the predicted soil organic matter values at the validation sampling points, and analyze the spatial trend of this residual set; go to step 016;
Step 016. Perform ordinary kriging interpolation on this residual set, and add the interpolation result corresponding to this residual set to the values predicted at the validation sampling points by geographically weighted ridge regression, generating new predicted soil organic matter values at the validation sampling points, namely the geographically weighted regression kriging predicted values of soil organic matter at the validation sampling points; go to step 017;
Step 017. Cross-validate the values predicted at the validation sampling points by geographically weighted ridge regression against the geographically weighted regression kriging predicted values at the validation sampling points, determine the optimal calculation method, and go to step 018;
Step 018. Using the optimal calculation method obtained, together with the independent-variable source dataset corresponding to the optimal predicted soil organic matter values at the validation sampling points, predict soil organic matter over the study region containing the soil sampling area.
2. The soil organic matter prediction method based on geographically weighted regression according to claim 1, characterized in that: in step 001, the various types of soil independent variables include elevation, slope gradient, slope aspect, topographic wetness index, profile curvature, plan curvature, land use, mean annual temperature, and mean annual precipitation.
3. The soil organic matter prediction method based on geographically weighted regression according to claim 1, characterized in that: in step 001, the preprocessing of all variables specifically comprises the following steps:
Step 00101. For the continuous variables among all variables, preprocess each variable type separately: first obtain the mean m and standard deviation s of each variable type; then, for each value, determine whether it lies within the interval [m-2s, m+2s] of its variable type; if so, treat the value as normal; otherwise, determine whether the value is less than the m-2s of its variable type; if so, set the value to m-2s; otherwise set the value to m+2s;
Step 00102. For each continuous variable among all variables, determine whether the variable follows a normal distribution; if so, do nothing; otherwise apply a natural logarithm transformation so that the variable follows a normal distribution;
Step 00103. Standardize the continuous variables among all variables using the standard score (z-score) method;
Step 00104. Convert the categorical variables among all variables into dummy variables.
4. The soil organic matter prediction method based on geographically weighted regression according to claim 3, characterized in that: in step 00102, for each continuous variable among all variables, whether the variable follows a normal distribution is determined by a one-sample K-S (Kolmogorov-Smirnov) test or by a frequency histogram.
5. The soil organic matter prediction method based on geographically weighted regression according to claim 1, characterized in that: in step 007, the analysis, with at least one collinearity diagnostic tool, of the correlation matrix of the independent-variable source dataset corresponding to the optimal predicted soil organic matter values at the validation sampling points, the geographically weighted regression coefficient set corresponding to the optimal predicted values, and the correlation matrix of this coefficient set, to determine the local-regression collinearity status of the regression method and independent-variable source dataset corresponding to the optimal predicted values, specifically comprises the following process:
First, compare element by element the correlation matrix of the independent-variable source dataset corresponding to the optimal predicted values with the correlation matrix of the geographically weighted regression coefficient set corresponding to the optimal predicted values; if more than 50% of the correlation coefficients in the correlation matrix of the coefficient set are smaller than the correlation coefficients at the same positions in the correlation matrix of the independent-variable source dataset, the local-regression collinearity problem of the regression method and independent-variable source dataset corresponding to the optimal predicted values is not particularly severe;
Obtain the variance inflation factor matrix corresponding to the independent-variable source dataset for the optimal predicted values, forming the local variance inflation factor matrix; at the same time, obtain, for all sampling points in the soil sampling area, the variance inflation factor of each variable type in that independent-variable source dataset, forming the global variance inflation factor set; and make the following judgments:
If the mean of every column of the local variance inflation factor matrix is less than 10, the local-regression collinearity problem of the regression method and independent-variable source dataset corresponding to the optimal predicted values is not serious;
If the mean of every column of the local variance inflation factor matrix is greater than 10, and the variance inflation factor of the corresponding variable type in the global variance inflation factor set is also greater than 10, the local-regression collinearity problem of the regression method and independent-variable source dataset corresponding to the optimal predicted values is not serious;
Otherwise, the regression method and independent-variable source dataset corresponding to the optimal predicted values have a potential collinearity problem;
For the independent variables whose variance inflation factor exceeds 10 in the independent-variable source dataset corresponding to the optimal predicted values, and the independent variables whose variance inflation factor exceeds 10 in the regression coefficient set corresponding to the optimal predicted values, carry out pairwise scatter-plot analysis; if the scatter plots of the independent variables in the regression coefficient set are more irregular than the scatter plots of the corresponding independent variables in the independent-variable source dataset, the collinearity problem of the regression method and independent-variable source dataset corresponding to the optimal predicted values can be ignored; otherwise, the regression method and independent-variable source dataset corresponding to the optimal predicted values have a fairly serious collinearity problem.
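The variance inflation factor (VIF) part of the diagnosis in claim 5 can be illustrated with a minimal sketch. Several points are assumptions rather than details given in the claims: the local VIF at each sampling point is computed from a correlation matrix weighted by a Gaussian distance kernel with a hypothetical `bandwidth` parameter, the VIFs are taken as the diagonal of the inverse correlation matrix (a standard identity), and `diagnose` applies the threshold of 10 variable by variable, which is one possible reading of the rule. All function names are hypothetical.

```python
import numpy as np


def vif_from_correlation(R):
    """VIF of each variable: diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(R))


def global_vif(X):
    """Global VIFs computed from all sampling points (rows of X)."""
    X = np.asarray(X, dtype=float)
    return vif_from_correlation(np.corrcoef(X, rowvar=False))


def local_vif(X, coords, bandwidth):
    """Local VIF matrix: one row per sampling point, one column per variable.

    Each row uses a correlation matrix weighted by a Gaussian kernel of
    the distance to that point (assumed kernel, not given in the source).
    """
    X = np.asarray(X, dtype=float)
    coords = np.asarray(coords, dtype=float)
    rows = []
    for center in coords:
        dist = np.linalg.norm(coords - center, axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)
        w /= w.sum()
        centered = X - w @ X                       # weighted centering
        cov = (centered * w[:, None]).T @ centered  # weighted covariance
        sd = np.sqrt(np.diag(cov))
        rows.append(vif_from_correlation(cov / np.outer(sd, sd)))
    return np.array(rows)


def diagnose(local, global_, threshold=10.0):
    """Apply the column-mean rule of claim 5, variable by variable."""
    col_mean = np.asarray(local, dtype=float).mean(axis=0)
    g = np.asarray(global_, dtype=float)
    acceptable = (col_mean < threshold) | (
        (col_mean > threshold) & (g > threshold)
    )
    return "not serious" if acceptable.all() else "potential collinearity"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))          # synthetic predictors
    coords = rng.uniform(0, 100, size=(50, 2))
    print(diagnose(local_vif(X, coords, bandwidth=30.0), global_vif(X)))
```

The element-by-element comparison of correlation matrices and the scatter-plot comparison described in the claim are not covered by this sketch.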
CN201510154714.2A 2015-04-02 2015-04-02 A kind of soil organic matter Forecasting Methodology based on Geographical Weighted Regression Active CN104764868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510154714.2A CN104764868B (en) 2015-04-02 2015-04-02 A kind of soil organic matter Forecasting Methodology based on Geographical Weighted Regression

Publications (2)

Publication Number Publication Date
CN104764868A CN104764868A (en) 2015-07-08
CN104764868B true CN104764868B (en) 2016-07-06

Family

ID=53646829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510154714.2A Active CN104764868B (en) 2015-04-02 2015-04-02 A kind of soil organic matter Forecasting Methodology based on Geographical Weighted Regression

Country Status (1)

Country Link
CN (1) CN104764868B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant