CN106980603B - Soil manganese content prediction method based on soil type merging and multiple regression

Info

Publication number: CN106980603B
Application number: CN201710099018.5A
Authority: CN
Inventors: 宋效东; 元野; 吴华勇; 刘峰; 杨金玲; 张甘霖; 李德成; 赵玉国
Original assignee: Institute of Soil Science of CAS
Current assignee: Institute of Soil Science of CAS
Priority date: 2017-02-23
Filing date: 2017-02-23
Publication date: 2019-05-17
Anticipated expiration: 2037-02-23
Also published as: CN106980603A

Abstract

The present invention relates to a soil manganese content prediction method based on soil type merging and multiple regression. It applies a divide-and-conquer treatment to the different spatial variation characteristics that soil trace elements exhibit across soil types. The spatial heterogeneity of soil available manganese can be detected by analyzing the dispersion of its spatial distribution, and multicollinearity problems in local regression analysis can be diagnosed through local regression analysis. In particular, during spatial prediction, a comprehensive prediction model is constructed by estimating the spatial distribution characteristics of soil available manganese content under different soil sampling densities.

Description

Soil manganese content prediction method based on soil type merging and multiple regression

Technical field

The present invention relates to a soil manganese content prediction method based on soil type merging and multiple regression, and belongs to the technical field of soil attribute prediction.

Background technique

As a trace element essential to plant growth, manganese (Mn) participates directly in photosynthesis, acts as an activator of many enzymes, and promotes seed germination; determining its content is therefore of great importance to soil science and biology. Whether a soil trace element is deficient generally depends not on the total content of its various forms but on its bioavailable content. Manganese ions occur mainly as Mn²⁺, Mn³⁺, and Mn⁴⁺, and manganese is retained in soil mainly as organic manganese, water-soluble manganese, exchangeable manganese, mineral manganese, and manganese-containing inorganic salts. Scholars at home and abroad generally define exchangeable, water-soluble, and easily reducible manganese as available manganese (Soil Available Mn). Available manganese has the most direct influence on plant growth, and its content directly determines the manganese-supplying capacity of the soil. Soil available manganese content is usually measured by DTPA extraction. The content of soil trace elements is determined mainly by the soil parent material and the soil-forming process. Under natural conditions the different forms of soil manganese remain in equilibrium; the main influencing factors include soil moisture, soil organic matter content, soil redox potential, and pH. In China, available manganese content is relatively high in red soils and yellow soils, whereas the calcareous soils of the north are comparatively deficient in manganese.

Soil survey and mapping are the basic means of producing regional soil maps, and they play an important role at home and abroad in pedotransfer functions, environmental planning, and plant nutrition analysis. Over the past three decades, conventional soil surveys, in particular the Second National Soil Survey of China, have produced a fairly detailed inventory of China's soil resources, accumulated valuable soil data, and yielded a series of spatial distribution maps of soil trace elements. However, with the rapid progress of farming technology, soil trace element content has also changed rapidly in space and time in recent years, and historical soil attribute maps can no longer meet the demands of real-time agricultural applications. Regarding the prediction of soil available manganese (Soil Available Mn) content, domestic scholars have focused mostly on macroscopic analyses such as spatial distribution, spatial variation characteristics, and influencing factors. In terms of study scale, work has concentrated on the field scale and the small-watershed scale, with few studies at larger regional scales. The reason is that the spatial heterogeneity of soil available manganese content is high, which makes it difficult to predict the spatial distribution of manganese content systematically across all areas with a single prediction model.

Soil type has a significant effect on soil trace element content. Soil available manganese content is often homogeneous within a specific genetic horizon, and soil types are classified according to genetic horizon types and diagnostic characteristics. Because of different soil-forming processes, the spatial distribution of soil available manganese content is highly heterogeneous, and classical Fisher statistical theory has clear shortcomings for studying the spatial variation of soil properties. Geostatistics focuses on analyzing and simulating the spatial structure of regionalized variables, characterizing their spatial dispersion with variogram functions and variogram curves. However, its basic theoretical assumption of second-order stationarity (the covariance of the regionalized variable exists and is identical) is often given insufficient consideration in practical applications. The multiple linear regression model is commonly used as a basic tool for studying the relationship between soil properties and landscape attributes, and it is the most widely used approach in soil attribute prediction based on soil-landscape models; however, it determines the relationship between soil attributes and landscape attributes through global exploratory analysis or correlation analysis, which has certain limitations. Considering the correlation between spatial prediction and geographic location, and according to the First Law of Geography, proximity between locations gives data differing degrees of spatial correlation; local regression models (geographically weighted regression models) have therefore achieved notable results over the past 20 years. This approach can handle spatial non-stationarity in regression analysis. When handling discrete variables such as soil type, however, it converts the environmental variable into dummy variables and thereby ignores the spatial distribution patterns that soil available manganese content exhibits across different soil types.

Statistical models (linear regression models, geostatistical models, and so on) can meet the demand for high-precision soil information: by quantifying the soil-landscape model, they enable spatial inference and prediction of soil attributes. Because soil available manganese content shows latent variation trends across soil types, and classical prediction models often ignore the particular nature of the soil type variable, spatial prediction of soil available manganese content faces several problems, summarized as follows:

(1) The conventional soil mapping workflow of "field survey → indoor interpretation → field verification → boundary delineation and map production" can no longer keep pace with the rapidly developing demands of precision agriculture, environmental management, and land management, and different application fields place higher requirements on the accuracy and timeliness of spatial distribution predictions for soil trace elements. The spread of geographic information technologies such as 3S creates an urgent need for a region-level spatial analysis and prediction method.

(2) In terms of soil classification, the soil taxonomic system is hierarchical and comprises six levels: order, suborder, group, subgroup, family, and series. Each soil type level corresponds to a different spatial distribution pattern of soil available manganese content, and fusing these complex patterns onto a soil attribute spatial distribution map based on a regular grid (raster data) inevitably causes conflicts between rules. The reason is that, to minimize model fitting error, the regularization term used in model training is generally a monotonically increasing function of model complexity, and a large number of rules causes model complexity to rise sharply. If, on the other hand, separate rule data are provided for each soil type level, the usability of the model is limited. How to quantify the influence of soil type level on soil mapping is a key difficulty in improving the generalization ability of quantitative models, and it restricts the application of statistical models.

(3) Different sampling densities often reflect different variation patterns of soil available manganese content. Choosing the sampling density arbitrarily seriously reduces the accuracy of soil quality assessment. In theory, a high sampling density improves prediction accuracy, but an excessively high sampling density causes labor and material costs to grow rapidly. When the objective function is minimized over features measured jointly across sampling densities, a smaller training error can be obtained, but when predicting soil available manganese content at a different sampling density the model considers only the automatically selected features, which limits its generalization ability. The constraints that different sampling densities impose on spatial prediction models often leave prediction results uncertain, and this uncertainty propagates through subsequent modeling, analysis, and decision-making and has a profound effect on the final result.

(4) Analyzing the spatial non-stationarity of spatial data is a prerequisite for deciding which prediction model to use. The conventional approach integrates a linear function of geographic location into a conventional linear regression model and estimates the expanded parameters by least squares, so that the model parameters vary with spatial position. This approach is still a form of trend fitting, however, and remains limited when the parameters vary in a complex way. How to choose a suitable prediction method according to the results of spatial stationarity tests under different sampling densities remains a challenge in research on the spatial prediction of soil attributes.

In conclusion being equally present in other analysis application aspects of the effective manganese of soil for the above-mentioned deficiency analyzed.

Summary of the invention

The technical problem to be solved by the present invention is to provide a soil manganese content prediction method based on soil type merging and multiple regression that systematically covers three key techniques under different soil sampling densities, namely spatial variability analysis of soil available manganese content, soil type merging, and second-order stationarity testing, and that effectively solves the problem of selecting a specific model in existing multiple regression analysis.

To solve the above technical problem, the present invention adopts the following technical solution: a soil manganese content prediction method based on soil type merging and multiple regression, comprising the following steps:

Step 1. For each sampling point in the soil region to be predicted, obtain the specified soil information indices and the corresponding soil manganese content, and construct a total data set; define the specified soil information indices as independent variables and the soil manganese content as the dependent variable; then, based on different soil sampling densities, construct a validation data set and at least two training data sets from the total data set;

Step 2. For each training data set, based on merged soil types, obtain the soil manganese content coefficient-of-variation data set of that training data set at the merged soil type level, and construct a histogram of the soil manganese content coefficients of variation of all sampling points in all training data sets; at the same time, construct principal component analysis scatter plots of all training data sets for the different soil type levels; then, based on the coefficient-of-variation histogram and the principal component analysis scatter plots, determine the significance of the influence of soil type on the spatial variability of the soil attribute at different sampling densities;

Step 3. Obtain the optimal independent variable set of each training data set, obtain the stationarity index of every independent variable of each training data set, and judge whether the model constructed from the independent variables and the dependent variable satisfies second-order stationarity;

Step 4. According to the degree of influence of soil type on the spatial variability of the soil attribute and the result of the second-order stationarity hypothesis test, select a prediction model set based on the training data sets;

Step 5. Train the prediction model set on one of the training data sets, select the optimal prediction model, and use it to predict over the soil region to be predicted, obtaining a spatial distribution map of soil manganese content for that region.

As a preferred technical solution of the present invention, step 1 comprises the following steps:

Step 1a. Obtain the projected coordinate set Site of the sampling points in the soil region to be predicted and the Euclidean distances between adjacent sampling points; then adjust and update the projected coordinates of the sampling points so that the Euclidean distance between adjacent sampling points is not less than d, where d denotes the spatial resolution of the specified soil information indices for the soil region to be predicted;

Step 1b. Take the soil information indices corresponding to the five preset major factors (time, climate, parent material, topography, and organisms) together as the specified soil information indices, and obtain the specified soil information indices of each sampling point in the soil region to be predicted as independent variables, forming Pred={X₁,…,X_k,…,X_K}, k={1, ..., K}, where K denotes the number of kinds of specified soil information indices and the vector X_k, of size n × 1, consists of the k-th specified soil information index of every sampling point, n being the total number of sampling points;

Step 1c. Obtain the soil type level of each sampling point in the soil region to be predicted, and group the sampling points by soil type level to form Soil_T=(Type₁,…,Type_m,…,Type_M), where Soil_T is an n × M matrix, m={1, ..., M}, Type_m denotes the sampling points belonging to the m-th soil type level, and M denotes the number of kinds of soil type level;

Step 1d. Taking soil manganese content as the dependent variable, construct the total data set Data=(Pred, S, Site, Soil_T) from Pred, the set S of soil manganese contents at the sampling point locations, Site, and Soil_T; at the same time, obtain the sampling density Density of the sampling points from the area of the soil region to be predicted;

Step 1e. Taking sampling points as the unit of extraction, randomly extract from the total data set Data the data of a preset proportion of all sampling points to form the validation data set; the remaining sampling points form the candidate training sampling point set;

Step 1f. Taking sampling points as the unit of extraction, extract from the total data set Data the data of sampling points in the candidate training sampling point set to construct at least two training data sets, where one training data set comprises all sampling points in the candidate training sampling point set and each remaining training data set comprises a subset of them.

As a preferred technical solution of the present invention, in step 1a all adjacent sampling points in the soil region to be predicted are processed as follows to adjust and update their projected coordinates so that the Euclidean distance between adjacent sampling points is not less than d:

If the Euclidean distance between sampling points p1 and p2 is less than d, let n1 be the number of sampling points within radius d of p1 and n2 the number of sampling points within radius d of p2. If n1 < n2, adjust the projected coordinates of p1 so that the Euclidean distance between p1 and p2 is d+g; if n1 ≥ n2, adjust the projected coordinates of p2 so that the Euclidean distance between p1 and p2 is d+g; where d denotes the spatial resolution of the specified soil information indices for the soil region to be predicted and g denotes a preset adjustment distance.
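The adjustment rule above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function name `enforce_min_spacing` is hypothetical, the point is moved along the line joining the pair (the text does not say in which direction to move it), coincident points are shifted along the x axis, and a single pass is made, so a moved point could in principle end up too close to a third point.

```python
import numpy as np

def enforce_min_spacing(site, d, g):
    """Return a copy of `site` in which each too-close pair is moved d + g apart.

    site : (n, 2) array of projected coordinates
    d    : spatial resolution of the covariate layers
    g    : preset adjustment distance
    """
    site = np.asarray(site, dtype=float).copy()
    n = len(site)
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(site[i] - site[j]) >= d:
                continue
            # Neighbour counts within radius d of each point (excluding itself).
            n1 = np.sum(np.linalg.norm(site - site[i], axis=1) <= d) - 1
            n2 = np.sum(np.linalg.norm(site - site[j], axis=1) <= d) - 1
            # Move p1 when n1 < n2, otherwise move p2 (rule from the text).
            move, fixed = (i, j) if n1 < n2 else (j, i)
            direction = site[move] - site[fixed]
            norm = np.linalg.norm(direction)
            # Assumption: coincident points are pushed along the +x axis.
            direction = direction / norm if norm > 0 else np.array([1.0, 0.0])
            site[move] = site[fixed] + direction * (d + g)
    return site
```

For example, with d = 30 and g = 5, two points 1 m apart end up 35 m apart while a distant third point is left untouched.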

As a preferred technical solution of the present invention, in step 1b Pred is standardized using the Z-score method so that each index is on the scale of the standard normal distribution (zero mean, unit standard deviation).
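A minimal sketch of Z-score standardization applied column by column to Pred is given below. The function name `zscore` and the handling of constant columns (left at zero rather than divided by zero) are assumptions made for illustration.

```python
import numpy as np

def zscore(pred):
    """Standardize each column of an (n, K) covariate matrix."""
    pred = np.asarray(pred, dtype=float)
    mean = pred.mean(axis=0)
    std = pred.std(axis=0)
    # Assumption: a constant column is mapped to zeros instead of dividing by 0.
    std = np.where(std == 0, 1.0, std)
    return (pred - mean) / std
```

Note that standardization rescales the data but does not change the shape of its distribution.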

As a preferred technical solution of the present invention, in step 1f, taking sampling points as the unit of extraction, the data of sampling points in the candidate training sampling point set are extracted from the total data set Data to construct four training data sets containing r, 50% × r, 25% × r, and 12% × r rows respectively, where r denotes the number of sampling points in the candidate training sampling point set.

As a preferred technical solution of the present invention, step 2 comprises the following steps:
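The construction of the validation set and the four training sets can be sketched as below. The validation proportion of 20%, the rounding of fractional sizes, the random seed, and the name `split_samples` are assumptions; the text only specifies a "preset proportion" and the fractions r, 50%, 25%, and 12%. The subsets here are drawn independently, which is one possible reading; nested subsets would be another.

```python
import numpy as np

def split_samples(n_points, val_fraction=0.2,
                  fractions=(1.0, 0.5, 0.25, 0.12), seed=0):
    """Return validation indices and one index array per training density."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_points)
    n_val = int(round(n_points * val_fraction))
    val_idx = order[:n_val]
    candidates = order[n_val:]          # candidate training sampling points
    r = len(candidates)
    train_sets = []
    for f in fractions:
        size = max(1, int(round(f * r)))
        # Draw without replacement from the candidate set only.
        train_sets.append(rng.choice(candidates, size=size, replace=False))
    return val_idx, train_sets
```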

Step 2a. For each training data set, calculate the coefficient of variation CV_m of soil manganese content under each soil type level of the training data set according to the following formula:

CV_m = SD_m / Mean_m

Then, from the coefficients of variation CV_m of soil manganese content under the soil type levels of the training data set, construct the soil manganese content coefficient-of-variation data set of the training data set based on soil type level, that is, obtain for each training data set the soil manganese content coefficient-of-variation data set CV_Soil_T based on soil type level; where SD_m and Mean_m are the standard deviation and mean of soil manganese content under the m-th soil type level;

Step 2b. For each training data set, apply Duncan's multiple range test to the soil data of the soil type levels in the training data set to analyze the significance of differences among the groups, obtaining the significance analysis result of that training data set, that is, the Duncan analysis result Dun_S of each training data set;

Step 2c. For each training data set, merge the soil types of the training data set according to its Duncan analysis result Dun_S and its soil type levels, recalculate the coefficient of variation of soil manganese content under each soil type level, and construct the soil manganese content coefficient-of-variation data set of the training data set at the merged soil type level, that is, obtain for each training data set the coefficient-of-variation data set CV_Soil_T_Dun based on merged soil type levels;

Step 2d. From the coefficient-of-variation data sets CV_Soil_T_Dun of the training data sets, construct the histogram of soil manganese content coefficients of variation of all sampling points in all training data sets;

Step 2e. Using principal component analysis, and according to the soil type level of each sampling point in each training data set, construct for all training data sets the principal component analysis scatter plots for the different soil type levels;

Step 2f. Based on the soil manganese content coefficient-of-variation histogram and the principal component analysis scatter plots, determine the significance of the influence of soil type on the spatial variability of the soil attribute at different sampling densities.
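Steps 2a and 2c can be illustrated with the sketch below. It assumes the coefficient of variation is the sample standard deviation divided by the mean, and it assumes that the Duncan result is available in the usual lettered form, where soil types sharing a letter are not significantly different. The names `cv_by_type` and `merge_types`, the input format, and the transitive merging of types that share letters are all illustrative assumptions; the text does not spell out the merging rule, and the test itself is not implemented here.

```python
import numpy as np
from collections import defaultdict

def cv_by_type(values, types):
    """Coefficient of variation (SD / mean) of Mn content per soil type."""
    groups = defaultdict(list)
    for v, t in zip(values, types):
        groups[t].append(float(v))
    # Types with a single sample have no sample SD and are skipped.
    return {t: np.std(v, ddof=1) / np.mean(v)
            for t, v in groups.items() if len(v) > 1}

def merge_types(duncan_letters):
    """Map each soil type to a merged label; types sharing a letter are merged."""
    parent = {t: t for t in duncan_letters}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]   # path compression
            t = parent[t]
        return t

    first_with_letter = {}
    for t, letters in duncan_letters.items():
        for ch in letters:
            if ch in first_with_letter:
                parent[find(t)] = find(first_with_letter[ch])
            else:
                first_with_letter[ch] = t
    return {t: find(t) for t in duncan_letters}
```

After merging, the coefficients of variation are recomputed on the merged labels, for example with `cv_by_type(values, [merged[t] for t in types])`.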

As a preferred technical solution of the present invention, step 3 comprises the following steps:

Step 3a. Obtain the optimal independent variable set Pred_OLS of each training data set by stepwise regression;

Step 3b. For each training data set, fit a model to the training data set by geographically weighted regression using its optimal independent variable set Pred_OLS, obtaining the regression coefficient set Coeff of the sampling points in the training data set;

Step 3c. For each training data set, calculate from the training data set and its optimal independent variable set Pred_OLS the standard deviation of the corresponding multiple regression analysis model, that is, obtain the standard deviation STD_MLR of the multiple regression analysis model of each training data set;

Step 3d. For each training data set, according to the following formula:

calculate the stationarity index SI of every independent variable of the training data set, that is, obtain for each training data set the stationarity index SI of all its independent variables, where INTQ_GWR denotes the interquartile range of the regression coefficients of the sampling points in the training data set;

Step 3e. Calculate the mean Average_SI of the stationarity indices SI of all independent variables of each training data set;

Step 3f. Compare Average_SI with 1 to judge whether the model constructed from the independent variables and the dependent variable satisfies second-order stationarity.
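Steps 3b to 3f can be sketched with plain NumPy as follows. Several assumptions are made here and should be read as such: the stationarity index is taken in a commonly used form, the interquartile range of the local (GWR) coefficients divided by twice the standard error of the corresponding global coefficient; STD_MLR is read as that standard error; and the local fit uses a Gaussian kernel with a fixed bandwidth. None of these choices is confirmed by the text, and the function names are hypothetical.

```python
import numpy as np

def ols_fit(X, y):
    """Global least squares: coefficients and their standard errors."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return beta, se

def gwr_coefficients(coords, X, y, bandwidth):
    """Local coefficients at every sample point (Gaussian kernel, fixed bandwidth)."""
    A = np.column_stack([np.ones(len(X)), X])
    coeff = np.empty_like(A)
    for i, c in enumerate(coords):
        dist = np.linalg.norm(coords - c, axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)
        AtW = A.T * w                      # weighted design matrix, transposed
        coeff[i] = np.linalg.solve(AtW @ A, AtW @ y)
    return coeff

def stationarity_index(coeff, se):
    """Assumed form: IQR of local coefficients / (2 * global standard error)."""
    q75, q25 = np.percentile(coeff, [75, 25], axis=0)
    return (q75 - q25) / (2.0 * se)
```

Under this reading, an index well above 1 for a covariate suggests that its local coefficients vary more than the global model's uncertainty would explain, and averaging the indices over the covariates gives a quantity comparable to Average_SI.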

As a preferred technical solution of the present invention, step 4 comprises the following steps:

Step 4a. For each training data set, use geostatistical analysis to construct the optimal semivariogram model of the training data set, and calculate its effective spatial range and the model coefficient of determination;

Step 4b. If the coefficients of determination of the optimal semivariogram models of all training data sets are all greater than 0.5 and the ratios of nugget to sill are all less than 25%, all training data sets use a geostatistical model and the method proceeds to step 4c; otherwise none of the training data sets uses a geostatistical model and the method proceeds to step 4d;

Step 4c. According to the conclusions of step 2, step 3, and step 4b, that is, the degree of influence of soil type on the spatial variability of the soil attribute and the result of the second-order stationarity hypothesis test, select prediction models as follows:

If the influence of soil type on the spatial variability of the soil attribute is not significant and the second-order stationarity hypothesis is satisfied, select multiple regression and the geostatistical model to form the prediction model set;

If the influence of soil type on the spatial variability of the soil attribute is not significant and the second-order stationarity hypothesis is not satisfied, select multiple regression and geographically weighted regression to form the prediction model set;

If the influence of soil type on the spatial variability of the soil attribute is significant and the second-order stationarity hypothesis is satisfied, select multiple regression and the zoned geostatistical model to form the prediction model set;

If the influence of soil type on the spatial variability of the soil attribute is significant and the second-order stationarity hypothesis is not satisfied, select multiple regression, geographically weighted regression, and zoned geographically weighted regression to form the prediction model set;

Step 4d. According to the conclusions of step 2, step 3, and step 4b, that is, the degree of influence of soil type on the spatial variability of the soil attribute and the result of the second-order stationarity hypothesis test, select prediction models as follows:

If the influence of soil type on the spatial variability of the soil attribute is not significant and the second-order stationarity hypothesis is satisfied, select multiple regression and regression kriging to form the prediction model set;

If the influence of soil type on the spatial variability of the soil attribute is significant and the second-order stationarity hypothesis is satisfied, select multiple regression to form the prediction model set;

If the influence of soil type on the spatial variability of the soil attribute is significant and the second-order stationarity hypothesis is not satisfied, select multiple regression, geographically weighted regression, and zoned geographically weighted regression to form the prediction model set.
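The decision rules of steps 4b to 4d amount to a lookup table, sketched below. The model labels are shorthand strings chosen for this example, and the function names are hypothetical. The one combination that the text does not cover (geostatistics not usable, soil type not significant, stationarity not satisfied) is left undefined and raises an error rather than being guessed.

```python
def use_geostatistics(r2_values, nugget_sill_ratios):
    """Step 4b: every semivariogram needs R^2 > 0.5 and nugget/sill < 25%."""
    return (all(r2 > 0.5 for r2 in r2_values)
            and all(q < 0.25 for q in nugget_sill_ratios))

# Key: (geostatistics usable, soil type effect significant, second-order stationary)
MODEL_TABLE = {
    (True, False, True): ["MLR", "geostatistics"],
    (True, False, False): ["MLR", "GWR"],
    (True, True, True): ["MLR", "zoned geostatistics"],
    (True, True, False): ["MLR", "GWR", "zoned GWR"],
    (False, False, True): ["MLR", "regression kriging"],
    (False, True, True): ["MLR"],
    (False, True, False): ["MLR", "GWR", "zoned GWR"],
}

def select_models(geostat_ok, type_significant, stationary):
    """Return the prediction model set for one combination of test outcomes."""
    key = (geostat_ok, type_significant, stationary)
    if key not in MODEL_TABLE:
        raise ValueError("combination not covered by steps 4c/4d")
    return MODEL_TABLE[key]
```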

As a preferred technical solution of the present invention, step 5 comprises the following steps:

Step 5a. Using the training data set whose sampling points are all the sampling points in the candidate training sampling point set, train the models in the prediction model set obtained in step 4 to obtain a trained prediction model set;

Step 5b. Use the models in the trained prediction model set to predict the soil manganese content of each sampling point in the validation data set, and calculate from the validation data set the root mean square error of each model's predictions;

Step 5c. Repeat step 5b a preset number of times, calculate the mean root mean square error over the cross-validation runs, and select the model with the smallest mean root mean square error as the optimal prediction model, forming the optimal prediction model set;

Step 5d. According to the optimal independent variable set Pred_OLS of each training data set obtained in step 3a, obtain the independent variable data of the unsampled points in the soil region to be predicted;

Step 5e. Using the optimal prediction model set obtained in step 5c and the independent variable data obtained in step 5d, predict the soil manganese content of the unsampled points in the soil region to be predicted, thereby predicting soil manganese content for the region and obtaining its soil manganese content spatial distribution map.
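The model comparison in steps 5b and 5c can be sketched as below. The helper names `rmse` and `select_best` are hypothetical, and the sketch assumes that the root mean square error of each repeat has already been computed for every candidate model; how the repeats differ from one another is not specified in the text.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted Mn content."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def select_best(rmse_runs):
    """rmse_runs maps a model name to its list of RMSE values, one per repeat.

    Returns the name with the smallest mean RMSE and the dict of means.
    """
    means = {name: float(np.mean(v)) for name, v in rmse_runs.items()}
    best = min(means, key=means.get)
    return best, means
```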

As a preferred technical solution of the present invention, in step 5a, if the prediction model set obtained in step 4 contains zoned geographically weighted regression or a zoned geostatistical model, the training method for zoned geographically weighted regression is as follows:

Taking the area of each merged soil type as a separate study area, train the local models with the following zoned geographically weighted regression model based on soil type:

where (u_i,v_i) are the coordinates of sample point i; β₀(u_i,v_i), β_km(u_i,v_i), and ε_i denote the constant term, the local regression coefficient, and the prediction residual of the local regression, respectively; m denotes the soil type level variable; M denotes the total number of soil type levels; and x_ikm denotes the k-th specified soil information index of sample point i under the m-th soil type level.
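From the symbols defined in the preceding paragraph, the zoned model can plausibly be written in the following form. This is a reconstruction offered as an assumption, not the formula of the original document: the response symbol y_i (soil manganese content at sample point i) and the summation limits over K indices and M soil type levels are inferred from the surrounding definitions.

```latex
y_i = \beta_0(u_i, v_i)
      + \sum_{m=1}^{M} \sum_{k=1}^{K} \beta_{km}(u_i, v_i)\, x_{ikm}
      + \varepsilon_i
```

Read this way, each merged soil type m carries its own set of location-dependent coefficients, which is consistent with treating each merged soil type area as a separate study area.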

Compared with the prior art, the soil manganese content prediction method based on soil type merging and multiple regression of the present invention, using the above technical solution, has the following technical effects:

(1) The soil manganese content prediction method based on soil type merging and multiple regression designed by the present invention can simulate the spatial differentiation of the variable more accurately; at the same time, the soil type merging operation better captures the minimal geographic unit in the soil-landscape model, improving the generalization ability of the prediction model and preventing the model from overfitting the training data. Without updating the soil type data, the training error is quickly minimized according to the correlation between the independent variables and the soil attribute, providing higher computational efficiency and prediction accuracy than conventional multiple regression analysis methods;

(2) The soil manganese content prediction method based on soil type merging and multiple regression designed by the present invention proposes an integrated mechanism for testing the second-order stationarity hypothesis that can combine several existing test methods. By comparing the model coefficients of global multiple linear regression and locally weighted regression, it detects the optimal feature selection for the target attribute at different spatial scales, improves the interpretability of the model, determines the optimal prediction model, and avoids the prediction error caused by blindly using a single prediction model. The method also avoids the limitation of relying on a single second-order stationarity test technique; the integrated testing mechanism has greater universality and stability and has broad prospects for industrial application;

(3) In the soil manganese content prediction method based on soil type merging and multiple regression designed by the present invention, the spatial heterogeneity of the soil attribute is analyzed comprehensively by increasing the sampling density, so that the latent spatial variation patterns present in the soil attribute under high sampling density can be measured more accurately; the number of sample points can then be expected to be reduced to some extent without lowering prediction accuracy, reducing the cost of future regional-scale sampling.

Brief description of the drawings

Fig. 1 is the main flow chart for constructing the dependent variable and independent variable data sets;

Fig. 2a is a schematic diagram of two adjacent sampling points whose Euclidean distance is less than the spatial resolution of the independent variable data layers;

Fig. 2b is a schematic diagram after the sample point position has been shifted slightly;

Fig. 3 is the flow chart of the comprehensive analysis of the spatial variability of the soil attribute across soil types, based on statistical methods and principal component analysis;

Fig. 4 is the flow chart of the second-order stationarity hypothesis test for the training data sets;

Fig. 5 is the flow chart for selecting the optimal prediction model and predicting the spatial distribution of the soil attribute;

Fig. 6 is a histogram of the coefficients of variation for the different soil types, for the merged soil types, and for the sampling densities in the embodiment of the present invention;

Fig. 7a is the spatial distribution map of soil available manganese content predicted by multiple regression in the embodiment of the present invention;

Fig. 7b is the spatial distribution map of soil available manganese content predicted by zoned geographically weighted regression in the embodiment of the present invention.

Specific embodiment

Specific embodiments of the present invention will be described in further detail with reference to the accompanying drawings of the specification.

The invention belongs to the analysis methods for spatial prediction of soil attributes in pedometrics. It concerns the divide-and-conquer treatment of the different spatial variation characteristics that soil trace elements show across soil types: spatial heterogeneity is detected by analysing the dispersion of the spatial distribution of soil available manganese content, and multicollinearity problems in local regression analysis are diagnosed through the local regression analysis itself. In particular, during spatial prediction, a comprehensive prediction model is constructed by estimating the spatial distribution characteristics of soil available manganese content under different soil sampling densities.

The present invention provides a soil manganese content prediction method based on soil type merging and multiple regression, which specifically comprises the following steps:

Step 1. As shown in Fig. 1, obtain, for each sampling point in the soil region to be predicted, the specified soil information indices and the corresponding soil manganese content, and construct the total data set; define the specified soil information indices as the independent variables and the soil manganese content as the dependent variable. Then, based on different soil sampling densities, construct a validation data set and at least two training datasets from the total data set.

Wherein, step 1 includes the following steps:

Step 1a. Obtain the projected coordinate set Site of the sampling points in the soil region to be predicted and the Euclidean distance between neighbouring sampling points; then adjust and update the projected coordinates of the sampling points so that the Euclidean distance between neighbouring sampling points is not less than d, where d denotes the spatial resolution of the specified soil information indices in the soil region to be predicted. As shown in Fig. 2a and Fig. 2b, in step 1a all neighbouring sampling points in the soil region to be predicted are processed as follows: if the Euclidean distance between sampling points p1 and p2 is less than d, let n1 be the number of sampling points within the circle of radius d centred on p1 and n2 the number within the circle of radius d centred on p2; if n1 < n2, adjust the projected coordinates of p1 so that the Euclidean distance between p1 and p2 becomes d + g; if n1 ≥ n2, adjust the projected coordinates of p2 so that the distance becomes d + g, where g denotes a preset adjustment distance.
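The coordinate adjustment of step 1a can be sketched as follows. This is only an illustration under stated assumptions: the function name `adjust_coordinates`, the choice to move a point along the line joining the pair, and the handling of coincident points are not given in the text, and a single pass over the pairs may leave or create other pairs closer than d.

```python
import numpy as np

def adjust_coordinates(site, d, g):
    """Shift sampling points so that a too-close pair ends up d + g apart.

    site: (n, 2) projected coordinates; d: spatial resolution of the
    independent-variable layers; g: preset adjustment distance.
    """
    site = np.asarray(site, dtype=float).copy()
    n = len(site)
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(site[i] - site[j]) >= d:
                continue
            # n1, n2: number of sampling points within radius d of p1 and p2
            n1 = np.sum(np.linalg.norm(site - site[i], axis=1) <= d)
            n2 = np.sum(np.linalg.norm(site - site[j], axis=1) <= d)
            # Move p1 if n1 < n2, otherwise move p2 (rule from step 1a)
            move, stay = (i, j) if n1 < n2 else (j, i)
            direction = site[move] - site[stay]
            norm = np.linalg.norm(direction)
            # Assumption: move along the line joining the two points;
            # coincident points are moved along the x axis.
            direction = direction / norm if norm > 0 else np.array([1.0, 0.0])
            site[move] = site[stay] + (d + g) * direction
    return site
```

For example, with d = 30 and g = 1, two points 1 unit apart are separated to a distance of 31 while distant points keep their coordinates.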

Step 1b. Take the soil information indices corresponding to the five preset major soil-forming factors (time, climate, parent material, topography, and organisms) together as the specified soil information indices, and obtain the specified soil information indices of each sampling point in the soil region to be predicted as the independent variables, forming Pred = {X₁,…,X_k,…,X_K}, k = {1, …, K}, where K denotes the number of kinds of specified soil information indices and the vector X_k, an n × 1 vector, consists of the values of the k-th specified soil information index at the sampling points, n being the total number of sampling points. Then standardize Pred with the Z-score method so that each index has zero mean and unit variance, as in the standard normal distribution.
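As a minimal sketch of the Z-score standardization in step 1b, each column of Pred can be centred and scaled as below. The function name `zscore` and the treatment of constant columns are assumptions. Note that this rescaling gives zero mean and unit variance but does not change the shape of a variable's distribution.

```python
import numpy as np

def zscore(pred):
    """Standardize each column of an (n, K) array to zero mean, unit variance."""
    pred = np.asarray(pred, dtype=float)
    mean = pred.mean(axis=0)
    std = pred.std(axis=0)
    # Assumption: a constant column is left at zero instead of dividing by 0
    std = np.where(std == 0, 1.0, std)
    return (pred - mean) / std
```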

Step 1c. Obtain the soil type level to which each sampling point in the soil region to be predicted belongs and group the sampling points by soil type level, forming Soil_T = (Type₁,…,Type_m,…,Type_M), where Soil_T is an n × M vector, m = {1, …, M}, Type_m denotes the sampling points corresponding to the m-th soil type level, and M denotes the number of kinds of soil type levels.

Step 1d. Taking soil manganese content as the dependent variable, construct the total data set Data = (Pred, S, Site, Soil_T) from Pred, the set S of soil manganese contents at the sampling point locations, Site, and Soil_T. At the same time, obtain the sampling density Density of the sampling points from the area of the soil region to be predicted.

Step 1e. Taking the sampling point as the unit of extraction, randomly extract from the total data set Data the data of a preset proportion of all sampling points to form the validation data set; the remaining sampling points form the candidate training sampling point set.

Step 1f. Taking the sampling point as the unit of extraction, extract from the total data set Data the data of sampling points in the candidate training sampling point set to construct at least two training datasets, where one training dataset corresponds to all sampling points in the candidate training sampling point set and each remaining training dataset corresponds to a subset of them. In practice, step 1f can be designed to construct four training datasets whose numbers of rows are r, 50% × r, 25% × r, and 12% × r respectively, where r denotes the number of sampling points in the candidate training sampling point set.
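Steps 1e and 1f can be illustrated with the following sketch, which returns row indices rather than data. The function name `split_datasets`, the 20% validation share (taken from the worked example later in the text), and the choice to make the smaller training sets nested subsets of the larger ones are assumptions; the text only requires that they be subsets of the candidate training points.

```python
import numpy as np

def split_datasets(n_points, val_fraction=0.2,
                   fractions=(1.0, 0.5, 0.25, 0.12), seed=0):
    """Return validation indices and a list of training index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_points)
    n_val = int(round(val_fraction * n_points))
    val_idx = order[:n_val]            # validation data set (step 1e)
    candidates = order[n_val:]         # candidate training sampling points
    r = len(candidates)
    # Training datasets with r, 50% x r, 25% x r and 12% x r rows (step 1f)
    train_idx = [candidates[: int(round(f * r))] for f in fractions]
    return val_idx, train_idx
```

With 100 sampling points this yields 20 validation points and training sets of 80, 40, 20, and 10 points.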

Step 2. For each training dataset, based on merged soil types, obtain the soil manganese content coefficient-of-variation data set of the training dataset by merged soil type level, and construct the histogram of the coefficient of variation of soil manganese content for all sampling points in all training datasets; at the same time, construct the principal component analysis scatter plots of all training datasets for the different soil type levels. Then, based on the coefficient-of-variation histogram and the principal component analysis scatter plots, determine how significantly soil type affects the spatial variability of the soil attribute under different sampling densities.

As shown in figure 3, above-mentioned steps 2 include the following steps:

Step 2a. For each training dataset, calculate the coefficient of variation CV_m of soil manganese content under each soil type level corresponding to the training dataset according to the following formula:

CV_m = SD_m / Mean_m

Then, from the coefficients of variation CV_m of soil manganese content under the soil type levels corresponding to the training dataset, construct the soil manganese content coefficient-of-variation data set of that training dataset by soil type level, thereby obtaining the coefficient-of-variation data set CV_Soil_T of each training dataset by soil type level. Here SD_m and Mean_m are the standard deviation and the mean of soil manganese content under the m-th soil type level.
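The coefficient of variation per soil type (the ratio of the standard deviation to the mean) can be computed as in the sketch below. The function name `cv_by_soil_type` and the use of the sample standard deviation (`ddof=1`) are assumptions; the text does not say which estimator is used.

```python
import numpy as np

def cv_by_soil_type(s, soil_type):
    """Return {soil type: SD / mean} of manganese content s per soil type."""
    s = np.asarray(s, dtype=float)
    soil_type = np.asarray(soil_type)
    cv = {}
    for t in np.unique(soil_type):
        values = s[soil_type == t]
        # Assumption: sample standard deviation (ddof=1)
        cv[t.item()] = values.std(ddof=1) / values.mean()
    return cv
```

Applying the same function to merged labels (for example gs1, gs2 in place of s1 to s6) gives the coefficients of variation for merged soil types.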

Step 2b. For each training dataset, apply the Duncan method to the soil data of each soil type level corresponding to the training dataset to analyse the significance of differences among the multiple groups of samples, thereby obtaining the Duncan analysis result Dun_S of each training dataset.

Step 2c. For each training dataset, according to its Duncan analysis result Dun_S and its different soil type levels, merge the soil types corresponding to the training dataset, recalculate the coefficient of variation of soil manganese content under each soil type level, and construct the soil manganese content coefficient-of-variation data set of the training dataset by merged soil type level, thereby obtaining the coefficient-of-variation data set CV_Soil_T_Dun of each training dataset by merged soil type level.

Step 2d. From the coefficient-of-variation data sets CV_Soil_T_Dun of the training datasets by merged soil type level, construct the histogram of the coefficient of variation of soil manganese content for all sampling points in all training datasets.

Step 2e. Using principal component analysis, and according to the soil type level to which each sampling point in each training dataset belongs, construct for all training datasets the principal component analysis scatter plots corresponding to the different soil type levels.
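The principal component scores behind the scatter plots of step 2e can be obtained as in this sketch, which uses a singular value decomposition of the centred data. The function name `pca_scores`, the number of components, and the choice of input columns (for example the standardized independent variables) are assumptions.

```python
import numpy as np

def pca_scores(x, n_components=2):
    """Return principal component scores and explained variance ratios."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = sing ** 2 / np.sum(sing ** 2)
    return scores, explained[:n_components]
```

Plotting the first two score columns with one colour per soil type gives a scatter plot in which overlapping clouds suggest that soil types are hard to separate, and well separated clouds suggest the opposite.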

Step 3. Obtain the optimal independent variable set of each training dataset and the stationarity index of all independent variables for each training dataset, and judge whether the model constructed from the independent variables and the dependent variable satisfies second-order stationarity.

As shown in figure 4, above-mentioned steps 3 include the following steps:

Step 3a. Obtain the optimal independent variable set Pred_OLS of each training dataset by stepwise regression.
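Stepwise regression comes in several variants, and the text does not say which one is used. The sketch below shows one common form, forward selection by AIC on ordinary least squares fits; the function names `ols_aic` and `forward_stepwise` and the AIC criterion are assumptions, not the method fixed by the text.

```python
import numpy as np

def ols_aic(x, y):
    """AIC (up to a constant) of an OLS fit with intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]

def forward_stepwise(x, y):
    """Return indices of selected columns of x (forward selection by AIC)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, remaining = [], list(range(x.shape[1]))
    best = ols_aic(np.empty((len(y), 0)), y)   # intercept-only model
    while remaining:
        aic, k = min((ols_aic(x[:, selected + [k]], y), k) for k in remaining)
        if aic >= best:        # no candidate improves the criterion
            break
        best = aic
        selected.append(k)
        remaining.remove(k)
    return selected
```

The returned column indices play the role of Pred_OLS for one training dataset.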

Step 3b. For each training dataset, using its optimal independent variable set Pred_OLS, fit a model to the training dataset by geographically weighted regression and obtain the regression coefficient set Coeff of the sampling points of that training dataset.
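A minimal sketch of geographically weighted regression is given below: a weighted least squares fit is solved at every sampling point, with weights that decay with distance. The function name `gwr_coefficients`, the Gaussian kernel, and the fixed bandwidth are assumptions; the text does not specify the kernel or how the bandwidth is chosen.

```python
import numpy as np

def gwr_coefficients(coords, x, y, bandwidth):
    """Return an (n, K + 1) array of local coefficients (intercept first)."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), np.asarray(x, dtype=float)])
    coeff = np.empty((n, design.shape[1]))
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)   # Gaussian kernel weights
        xtw = design.T * w                           # X^T W
        coeff[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return coeff
```

Each row of the result corresponds to one sampling point, so the array as a whole plays the role of the coefficient set Coeff.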

Step 3c. For each training dataset, using the training dataset and its optimal independent variable set Pred_OLS, calculate the standard deviation of the corresponding multiple regression model, thereby obtaining the standard deviation STD_MLR of the multiple regression model of each training dataset.

Step 3d. For each training dataset, calculate the stationarity index SI of all its independent variables from STD_MLR and INTQ_GWR, where INTQ_GWR denotes the interquartile range of the regression coefficients of the sampling points of the training dataset.

Step 3e. Calculate the mean Average_SI of the stationarity indices SI of all independent variables over the training datasets.
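The formula for the stationarity index does not survive in the text, so the following is only a sketch under an explicit assumption: a form often used in the geographically weighted regression literature divides the interquartile range of the local coefficients by twice the standard error of the corresponding global coefficient. Whether the method uses exactly this form is not confirmed; the function name `stationarity_index` is also hypothetical.

```python
import numpy as np

def stationarity_index(gwr_coeff, std_mlr):
    """SI per coefficient; ASSUMED form: IQR(GWR coefficients) / (2 * STD_MLR).

    gwr_coeff: (n, p) local coefficients, one row per sampling point.
    std_mlr:   length-p standard errors from the global multiple regression.
    """
    q75, q25 = np.percentile(np.asarray(gwr_coeff, dtype=float), [75, 25], axis=0)
    return (q75 - q25) / (2.0 * np.asarray(std_mlr, dtype=float))

def average_si(si_per_dataset):
    """Mean of SI over all independent variables and training datasets."""
    return float(np.mean(np.concatenate([np.ravel(s) for s in si_per_dataset])))
```

Under this assumed form, a value above 1 means the local coefficients vary more than the global fit's uncertainty would explain, which matches the rule used later in the text that a mean greater than 1 indicates the second-order stationarity hypothesis is not satisfied.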

Step 4. Select the prediction model set based on the training datasets, according to the degree to which soil type influences the spatial variability of the soil attribute and the result of the second-order stationarity hypothesis test. This specifically comprises the following steps:

Step 4a. For each training dataset, use geostatistical analysis to construct the optimal semivariogram model of the training dataset, and calculate its effective spatial range and the model's coefficient of determination.

Step 4b. If the coefficients of determination of the optimal semivariogram models of all training datasets are greater than 0.5 and the nugget-to-sill ratios are all less than 25%, a geostatistical model is used for all training datasets and the procedure goes to step 4c; otherwise no geostatistical model is used for any training dataset and the procedure goes to step 4d.
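The decision rule of step 4b can be written as a small predicate over the fitted semivariogram parameters of all training datasets. The function name `use_geostatistical_model` and its argument layout are assumptions; the thresholds 0.5 and 25% come from the text.

```python
def use_geostatistical_model(r2, nugget, sill, r2_min=0.5, ratio_max=0.25):
    """True only if every training dataset passes both semivariogram checks.

    r2, nugget, sill: sequences with one entry per training dataset.
    """
    return all(
        r > r2_min and n / s < ratio_max
        for r, n, s in zip(r2, nugget, sill)
    )
```

A single training dataset that fails either check is enough to rule out the geostatistical model for all of them.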

Step 5. Train the prediction model set on one of the training datasets, select the optimal prediction model, and predict for the soil region to be predicted to obtain the spatial distribution map of soil manganese content in that region. As shown in Fig. 5, this specifically comprises the following steps:

Step 5a. Using the training dataset whose sampling points are all sampling points in the candidate training sampling point set, train the models in the prediction model set obtained in step 4 to obtain the trained prediction model set. If the prediction model set obtained in step 4 contains partitioned geographically weighted regression or a partitioned geostatistical model, the training method for partitioned geographically weighted regression is as follows:

Step 5b. Using the models in the trained prediction model set, predict the soil manganese content of each sampling point in the validation data set, and calculate from the validation data set the root-mean-square error of each model's predictions.

Step 5c. Repeat step 5b a preset number of times, calculate the mean root-mean-square error over the cross-validation runs for each model, and select the model with the smallest mean root-mean-square error as the optimal prediction model, forming the optimal prediction model set.
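Steps 5b and 5c can be sketched as a repeated validation loop that keeps the model with the smallest mean root-mean-square error. The function names `rmse` and `select_best_model`, the representation of each model as a `fit_predict(x_train, y_train, x_val)` callable, and the choice to redraw a random training and validation split on every repetition are assumptions; the text only says that the evaluation is repeated a preset number of times.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def select_best_model(models, x, y, repeats=100, val_fraction=0.2, seed=0):
    """Return (name of best model, {name: mean RMSE}) over repeated splits."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n_val = int(round(val_fraction * len(y)))
    errors = {name: [] for name in models}
    for _ in range(repeats):
        order = rng.permutation(len(y))
        val, train = order[:n_val], order[n_val:]   # assumed random re-split
        for name, fit_predict in models.items():
            pred = fit_predict(x[train], y[train], x[val])
            errors[name].append(rmse(y[val], pred))
    mean_rmse = {name: float(np.mean(e)) for name, e in errors.items()}
    return min(mean_rmse, key=mean_rmse.get), mean_rmse
```

Any of the candidate models (multiple regression, geographically weighted regression, or a partitioned variant) can be plugged in as long as it is wrapped in a callable with this signature.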

Step 5d. According to the optimal independent variable set Pred_OLS of each training dataset obtained in step 3a, obtain the independent variable data of the unsampled points in the soil region to be predicted.

Next, the soil manganese content prediction method based on soil type merging and multiple regression designed by the present invention is applied to the prediction of soil available manganese content in Bozhou City, as follows:

Unlike soil macroelements, both the enrichment and the deficiency of soil manganese can hinder crop growth. For example, when a plant is manganese-deficient, chlorotic spots of varying severity appear on young or old leaves. When the manganese content of plant leaves exceeds 600 mg/kg, manganese toxicity occurs, with symptoms such as wilting and drooping leaves. Predicting the soil available manganese content of unsampled areas from limited soil sampling point data and easily obtained independent variables (soil-forming factors) is therefore of considerable practical value.

The procedure of the soil available manganese content prediction method based on soil type merging and multiple regression is as follows:

First step: construct training datasets of different sampling densities and the validation data set

(1): Prepare the geographic data layers that influence soil formation and evolution, extract the location information of the soil sampling points, and merge all data into one data set.

(2): Extract the information on the different levels of the soil type of each sampling point and update it into the data set Data = (Pred, S, Site, Soil_T) = (X₁, X₂, …, X_K, S, Long, Lati, Type₁, Type₂, …, Type_M).

(3): Calculate the pairwise Euclidean distances between neighbouring sampling points, update the coordinates of sampling points that are too close together, and update the data set Data = (Pred, S, Site, Soil_T).

(4): Calculate the sampling density of the sampling points, and construct the validation data set and the training datasets according to sampling density. Assuming the number of sampling points is L, the validation data set contains 20% × L points, and 4 training datasets (d1, d2, d3, d4) are constructed, containing 80% × L, 40% × L, 20% × L, and 10% × L points respectively.

Second step: analyse the spatial variability of the soil attribute under different soil types using statistical methods and principal component analysis

(1): The Soil Taxonomy hierarchy comprises six levels: order, suborder, group, subgroup, family, and series. Taking the group level as an example, calculate the coefficient of variation CV of the training datasets (d1, d2, d3, d4) within each soil type region. The region contains 6 soil types, denoted (s1, s2, s3, s4, s5, s6).

(2): Perform multi-group difference significance analysis on training dataset d1 with the Duncan method; the 6 soil types are merged into two major classes, gs1 and gs2.

(3): Calculate the coefficient of variation of the training datasets of different sampling densities (d1, d2, d3, d4) within the merged soil type regions, as shown in Fig. 6.

(4): At the different soil type levels, perform principal component analysis for each soil type on the training datasets of different sampling densities. Analyse together the histogram and the principal component scatter plots generated in the preceding steps to determine how significantly soil type affects the spatial variability of the soil attribute under different sampling densities. The analysis shows that the spatial variation of the soil attribute in this region is significantly influenced by soil type.

Third step: test the second-order stationarity hypothesis of the training datasets using several regression methods and the geographically weighted regression model

(1): Select the optimal independent variable set by stepwise regression;

(2): Fit a model to training dataset d1 by geographically weighted regression and record the regression coefficients of all sampling points;

(3): Calculate the standard deviation of the multiple regression model for d1;

(4): Calculate the stationarity index of all independent variables of data set d1;

(5): Repeat steps (2)-(4) to calculate the stationarity indices of the independent variables of data sets d1, d2, d3, and d4, and calculate their mean. The mean is greater than 1, which indicates that the second-order stationarity hypothesis is not satisfied.

Fourth step: select the prediction model set according to the degree to which soil type influences the spatial variability of the soil attribute and the result of the second-order stationarity hypothesis test

(1): Use geostatistical analysis to construct the semivariogram models of all training datasets. The coefficients of determination of the optimal semivariogram models of all training datasets are less than 0.5, so the spatial correlation of the data is low and a geostatistical model is not suitable; the candidate model set is (multiple regression, geographically weighted regression);

(2): Since the soil available manganese content of the study area is significantly influenced by soil type and does not satisfy the second-order stationarity hypothesis, the candidate model set is (multiple regression, geographically weighted regression, partitioned geographically weighted regression);

Fifth step: based on Monte Carlo simulation, train on the training dataset with the selected model set, select the optimal prediction method, and generate the prediction result file

(1): Using the prediction methods from the above steps, train the prediction models relating the soil attribute to the independent variables, and record the accuracy of each run (as root-mean-square error);

(2): Repeat the above step 1000 times and take the mean accuracy as the index for evaluating each method. The cross-validation results show that multiple regression has the lowest prediction accuracy and partitioned geographically weighted regression the highest;

(3): To compare and analyse the uncertainty of the results, maps are output for the worst and the best prediction models. Prediction data are generated from the multiple regression method, the partitioned geographically weighted regression model, and the independent variable data of the unsampled points in the study area, and are output as shown in Fig. 7a and Fig. 7b respectively.

Based on the above analysis, the prediction results of the example are shown in Fig. 7a and Fig. 7b. The integrated mechanism of the present invention for testing the second-order stationarity hypothesis can combine several existing test methods: by comparing the model coefficients of global multiple linear regression with those of locally weighted regression, it detects the optimal feature selection of the target attribute at different spatial scales, improves the interpretability of the model, determines the optimal prediction model, and avoids the prediction error caused by blindly adopting a single prediction model. The method has good feasibility and stability. It is not limited to merging by soil type; variables that strongly influence soil formation, such as vegetation type and land use type, can also be considered, and the accuracy verification mechanism is expected to yield satisfactory prediction accuracy. The method remains to be applied in more fields to examine its performance.

The embodiments of the present invention have been explained in detail above with reference to the accompanying drawings, but the present invention is not limited to the above embodiments; various changes can also be made within the knowledge of a person skilled in the art without departing from the purpose of the present invention.

Claims

1. A soil manganese content prediction method based on soil type merging and multiple regression, characterized by comprising the following steps:

Step 1. Obtain, for each sampling point in the soil region to be predicted, the specified soil information indices and the corresponding soil manganese content, and construct the total data set; define the specified soil information indices as the independent variables and the soil manganese content as the dependent variable; then, based on different soil sampling densities, construct a validation data set and at least two training datasets from the total data set;

Step 2. For each training dataset, based on merged soil types, obtain the soil manganese content coefficient-of-variation data set of the training dataset by merged soil type level, and construct the histogram of the coefficient of variation of soil manganese content for all sampling points in all training datasets; at the same time, construct the principal component analysis scatter plots of all training datasets for the different soil type levels; then, based on the coefficient-of-variation histogram and the principal component analysis scatter plots, determine how significantly soil type affects the spatial variability of the soil attribute under different sampling densities;

Step 3. Obtain the optimal independent variable set of each training dataset and the stationarity index of all independent variables for each training dataset, and judge whether the model constructed from the independent variables and the dependent variable satisfies second-order stationarity;

Step 4. Select the prediction model set based on the training datasets, according to the degree to which soil type influences the spatial variability of the soil attribute and the result of the second-order stationarity hypothesis test;

Step 5. Train the prediction model set on one of the training datasets, select the optimal prediction model, and predict for the soil region to be predicted to obtain the spatial distribution map of soil manganese content in that region.

2. The soil manganese content prediction method based on soil type merging and multiple regression according to claim 1, characterized in that step 1 comprises the following steps:

Step 1a. Obtain the projected coordinate set Site of the sampling points in the soil region to be predicted and the Euclidean distance between neighbouring sampling points; then adjust and update the projected coordinates of the sampling points so that the Euclidean distance between neighbouring sampling points is not less than d, where d denotes the spatial resolution of the specified soil information indices in the soil region to be predicted;

Step 1b. Take the soil information indices corresponding to the five preset major soil-forming factors (time, climate, parent material, topography, and organisms) together as the specified soil information indices, and obtain the specified soil information indices of each sampling point in the soil region to be predicted as the independent variables, forming Pred = {X₁,…,X_k,…,X_K}, k = {1, …, K}, where K denotes the number of kinds of specified soil information indices and the vector X_k, an n × 1 vector, consists of the values of the k-th specified soil information index at the sampling points, n being the total number of sampling points;

Step 1c. Obtain the soil type level to which each sampling point in the soil region to be predicted belongs and group the sampling points by soil type level, forming Soil_T = (Type₁,…,Type_m,…,Type_M), where Soil_T is an n × M vector, m = {1, …, M}, Type_m denotes the sampling points corresponding to the m-th soil type level, and M denotes the number of kinds of soil type levels;

Step 1d. Taking soil manganese content as the dependent variable, construct the total data set Data = (Pred, S, Site, Soil_T) from Pred, the set S of soil manganese contents at the sampling point locations, Site, and Soil_T; at the same time, obtain the sampling density Density of the sampling points from the area of the soil region to be predicted;

Step 1e. Taking the sampling point as the unit of extraction, randomly extract from the total data set Data the data of a preset proportion of all sampling points to form the validation data set; the remaining sampling points form the candidate training sampling point set;

Step 1f. Taking the sampling point as the unit of extraction, extract from the total data set Data the data of sampling points in the candidate training sampling point set to construct at least two training datasets, where one training dataset corresponds to all sampling points in the candidate training sampling point set and each remaining training dataset corresponds to a subset of them.

3. The soil manganese content prediction method based on soil type merging and multiple regression according to claim 2, characterized in that, in step 1a, all neighbouring sampling points in the soil region to be predicted are processed as follows to adjust and update the projected coordinates of the sampling points so that the Euclidean distance between neighbouring sampling points is not less than d: if the Euclidean distance between sampling points p1 and p2 is less than d, let n1 be the number of sampling points within the circle of radius d centred on p1 and n2 the number of sampling points within the circle of radius d centred on p2; if n1 < n2, adjust the projected coordinates of p1 so that the Euclidean distance between p1 and p2 is d + g; if n1 ≥ n2, adjust the projected coordinates of p2 so that the Euclidean distance between p1 and p2 is d + g; where d denotes the spatial resolution of the specified soil information indices in the soil region to be predicted and g denotes a preset adjustment distance.

4. The soil manganese content prediction method based on soil type merging and multiple regression according to claim 2, characterized in that, in step 1b, Pred is standardized with the Z-score method so that its data conform to the standard normal distribution.

5. The soil manganese content prediction method based on soil type merging and multiple regression according to claim 2, characterized in that, in step 1f, taking the sampling point as the unit of extraction, the data of sampling points in the candidate training sampling point set are extracted from the total data set Data to construct four training datasets whose numbers of rows are r, 50% × r, 25% × r, and 12% × r respectively, where r denotes the number of sampling points in the candidate training sampling point set.

6. The soil manganese content prediction method based on soil type merging and multiple regression according to claim 2, characterized in that step 2 comprises the following steps:

Step 2a. For each training dataset, calculate the coefficient of variation CV_m of soil manganese content under each soil type level corresponding to the training dataset according to the following formula:

CV_m = SD_m / Mean_m

Then, from the coefficients of variation CV_m of soil manganese content under the soil type levels corresponding to the training dataset, construct the soil manganese content coefficient-of-variation data set of that training dataset by soil type level, thereby obtaining the coefficient-of-variation data set CV_Soil_T of each training dataset by soil type level; where SD_m and Mean_m are the standard deviation and the mean of soil manganese content under the m-th soil type level;

Step 2b. For each training dataset, apply the Duncan method to the soil data of each soil type level corresponding to the training dataset to analyse the significance of differences among the multiple groups of samples, thereby obtaining the Duncan analysis result Dun_S of each training dataset;

Step 2c. For each training dataset, according to its Duncan analysis result Dun_S and its different soil type levels, merge the soil types corresponding to the training dataset, recalculate the coefficient of variation of soil manganese content under each soil type level, and construct the soil manganese content coefficient-of-variation data set of the training dataset by merged soil type level, thereby obtaining the coefficient-of-variation data set CV_Soil_T_Dun of each training dataset by merged soil type level;

Step 2d. From the coefficient-of-variation data sets CV_Soil_T_Dun of the training datasets by merged soil type level, construct the histogram of the coefficient of variation of soil manganese content for all sampling points in all training datasets;

Step 2e. Using principal component analysis, and according to the soil type level to which each sampling point in each training dataset belongs, construct for all training datasets the principal component analysis scatter plots corresponding to the different soil type levels;

Step 2f. Based on the soil manganese content coefficient-of-variation histogram and the principal component analysis scatter plots, determine how significantly soil type affects the spatial variability of the soil attribute under different sampling densities.

7. the soil sulphur element content prediction method based on soil types merger and multiple regression according to claim 6, feature It is, the step 3 includes the following steps:

Step 3a. obtains optimal independent variable set Pred_ corresponding to each training dataset by stepwise regression method respectively OLS；

Step 3b. is directed to each training dataset respectively, the optimal independent variable set Pred_ according to corresponding to training dataset OLS carries out models fitting to training dataset using Geographically weighted regression procedure, obtains collection point corresponding to the training dataset Regression coefficient collection Coeff；

Step 3c. is directed to each training dataset respectively, according to optimal corresponding to training dataset and the training dataset Independent variable set Pred_OLS calculates the standard deviation for obtaining Multivariable regressive analysis model corresponding to the training dataset, that is, obtains The standard deviation STD of each corresponding Multivariable regressive analysis model of training dataset difference_MLR；

Calculate the stationarity index SI of all independent variables of the training dataset, that is, obtain for each training dataset the stationarity index SI of all of its independent variables; in the formula, INTQ_GWR denotes the interquartile range of the collection-point regression coefficients of the training dataset;

Step 3f. Compare Average_SI with 1 to judge whether the model constructed from the independent variables and the dependent variable satisfies second-order stationarity.
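The formula for SI is not reproduced in this text. The sketch below therefore assumes a form common in the geographically weighted regression literature, SI = INTQ_GWR / (2 × STD_MLR), computed per independent variable and averaged to give Average_SI. This form, the per-variable standard deviations, the reading that a value not exceeding 1 indicates stationarity, and all names and numbers are assumptions, not the claimed formula.

```python
from statistics import mean, quantiles


def stationarity_index(gwr_coeffs, std_mlr):
    """Assumed form: interquartile range of the local GWR coefficients
    divided by twice the global regression standard deviation."""
    q1, _, q3 = quantiles(gwr_coeffs, n=4, method="inclusive")
    return (q3 - q1) / (2.0 * std_mlr)


def second_order_stationarity(coeffs_by_var, std_by_var):
    """Return (Average_SI, verdict); verdict is True when Average_SI <= 1
    (this threshold direction is an assumption)."""
    si_values = [stationarity_index(coeffs_by_var[name], std_by_var[name])
                 for name in coeffs_by_var]
    average_si = mean(si_values)
    return average_si, average_si <= 1.0


# Made-up local coefficients for two hypothetical predictors.
coeffs_by_var = {"ph": [0.8, 1.0, 1.2, 1.4, 1.6],
                 "organic_matter": [2.0, 2.0, 2.0, 2.0, 2.0]}
std_by_var = {"ph": 0.4, "organic_matter": 0.5}
average_si, stationary = second_order_stationarity(coeffs_by_var, std_by_var)
```

With these made-up inputs the local coefficients vary little relative to the global uncertainty, so the sketch reports the stationarity assumption as satisfied.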

8. The soil manganese content prediction method based on soil type merging and multiple regression according to claim 7, characterized in that step 4 comprises the following steps:

Step 4a. For each training dataset, using geostatistical analysis, construct the optimal semivariogram model of the training dataset, and calculate its effective spatial range and the coefficient of determination of the model;

Step 4b. If the coefficients of determination of the optimal semivariogram models of all training datasets are all greater than 0.5, and the ratios of nugget to sill are all less than 25%, then all training datasets use the geostatistical model, and the method proceeds to step 4c; otherwise, no training dataset uses the geostatistical model, and the method proceeds to step 4d;
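The decision rule of step 4b can be sketched as a single predicate. The dictionary keys `r2`, `nugget` and `sill` and the example values are hypothetical names chosen for this illustration.

```python
def use_geostatistics(variograms):
    """Return True only if every fitted semivariogram has a coefficient of
    determination above 0.5 and a nugget-to-sill ratio below 25%.

    variograms -- list of dicts with hypothetical keys 'r2', 'nugget', 'sill'
    """
    return all(v["r2"] > 0.5 and v["nugget"] / v["sill"] < 0.25
               for v in variograms)


# Made-up fits for two training datasets.
fits_ok = [{"r2": 0.82, "nugget": 0.10, "sill": 1.00},
           {"r2": 0.61, "nugget": 0.20, "sill": 1.20}]
fits_bad = [{"r2": 0.82, "nugget": 0.10, "sill": 1.00},
            {"r2": 0.45, "nugget": 0.20, "sill": 1.20}]
go_to_step_4c = use_geostatistics(fits_ok)    # all criteria met
go_to_step_4d = not use_geostatistics(fits_bad)  # one fit has r2 <= 0.5
```

Because the rule is all-or-nothing, a single weak semivariogram fit sends every training dataset to step 4d.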

Step 4c. According to the conclusions of step 2, step 3 and step 4b, namely the degree of influence of soil type on the spatial variability of the soil attribute and the result of the second-order stationarity test, select the prediction models as follows:

If the influence of soil type on the spatial variability of the soil attribute is not significant and the second-order stationarity assumption is satisfied, select multiple regression and the geostatistical model to constitute the prediction model set;

If the influence of soil type on the spatial variability of the soil attribute is not significant and the second-order stationarity assumption is not satisfied, select multiple regression and geographically weighted regression to constitute the prediction model set;

If the influence of soil type on the spatial variability of the soil attribute is significant and the second-order stationarity assumption is satisfied, select multiple regression and the subregion geostatistical model to constitute the prediction model set;

If the influence of soil type on the spatial variability of the soil attribute is significant and the second-order stationarity assumption is not satisfied, select multiple regression, geographically weighted regression and subregion geographically weighted regression to constitute the prediction model set;

Step 4d. According to the conclusions of step 2, step 3 and step 4b, namely the degree of influence of soil type on the spatial variability of the soil attribute and the result of the second-order stationarity test, select the prediction models as follows:

If the influence of soil type on the spatial variability of the soil attribute is not significant and the second-order stationarity assumption is satisfied, select multiple regression and regression kriging to constitute the prediction model set;

If the influence of soil type on the spatial variability of the soil attribute is significant and the second-order stationarity assumption is satisfied, select multiple regression to constitute the prediction model set;

If the influence of soil type on the spatial variability of the soil attribute is significant and the second-order stationarity assumption is not satisfied, select multiple regression, geographically weighted regression and subregion geographically weighted regression to constitute the prediction model set.
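The selection rules of steps 4c and 4d amount to a lookup on three yes/no conclusions. The sketch below encodes them as two tables; the function name, argument names and model labels are illustrative. Step 4d as written lists no rule for the case where the influence of soil type is not significant and stationarity is not satisfied, so the sketch raises an error there instead of guessing.

```python
def select_models(use_geostat, type_significant, stationary):
    """Return the prediction model set for one combination of conclusions.

    use_geostat      -- outcome of step 4b (True -> step 4c, False -> step 4d)
    type_significant -- soil type significantly affects spatial variability
    stationary       -- second-order stationarity assumption is satisfied
    """
    # Keys are (type_significant, stationary).
    table_4c = {
        (False, True): ["multiple regression", "geostatistics"],
        (False, False): ["multiple regression", "GWR"],
        (True, True): ["multiple regression", "subregion geostatistics"],
        (True, False): ["multiple regression", "GWR", "subregion GWR"],
    }
    table_4d = {
        (False, True): ["multiple regression", "regression kriging"],
        (True, True): ["multiple regression"],
        (True, False): ["multiple regression", "GWR", "subregion GWR"],
    }
    table = table_4c if use_geostat else table_4d
    key = (type_significant, stationary)
    if key not in table:
        # Combination not enumerated in step 4d of the claim.
        raise ValueError("no rule listed for this combination")
    return table[key]


models = select_models(use_geostat=True, type_significant=True, stationary=False)
```

Multiple regression appears in every listed combination, so it serves as the common baseline against which the spatial models are later compared in step 5.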

9. The soil manganese content prediction method based on soil type merging and multiple regression according to claim 8, characterized in that step 5 comprises the following steps:

Step 5a. Based on the training dataset whose sampling points are all of the sampling points in the candidate training sampling point set, train the models in the prediction model set obtained in step 4 to obtain the trained prediction model set;

Step 5b. Using the models in the trained prediction model set, predict the soil manganese content of each collection point in the validation dataset, and, against the validation dataset, calculate the root-mean-square error of the prediction of each model in the prediction model set;

Step 5c. Repeat step 5b a preset number of times, calculate the mean root-mean-square error over the cross-validation runs, and select the model with the lowest mean root-mean-square error as the optimal prediction model, constituting the optimal prediction model set;
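Steps 5b and 5c can be sketched as repeated hold-out validation followed by a minimum over mean RMSE. Re-splitting the data on every repeat, the 30% hold-out share, the two toy models (`fit_mean`, `fit_line`) and the synthetic data are all assumptions for illustration; they stand in for the trained prediction model set and the validation dataset.

```python
import math
import random
from statistics import mean


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def fit_mean(train):
    """Toy model: always predict the training mean."""
    level = mean([y for _, y in train])
    return lambda x: level


def fit_line(train):
    """Toy model: one-variable least-squares line."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in train)
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)


def pick_best_model(fitters, data, repeats=10, holdout=0.3, seed=0):
    """Return (best_name, {name: mean RMSE}) over repeated random splits."""
    rng = random.Random(seed)
    errors = {name: [] for name in fitters}
    n_valid = max(1, int(len(data) * holdout))
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        valid, train = shuffled[:n_valid], shuffled[n_valid:]
        for name, fit in fitters.items():
            predict = fit(train)
            errors[name].append(rmse([y for _, y in valid],
                                     [predict(x) for x, _ in valid]))
    mean_rmse = {name: mean(vals) for name, vals in errors.items()}
    return min(mean_rmse, key=mean_rmse.get), mean_rmse


# Synthetic data with an exact linear trend, for illustration only.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
best, mean_rmse = pick_best_model({"mean": fit_mean, "line": fit_line}, data)
```

Averaging over several splits reduces the chance that one favourable validation subset decides the choice of model.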

Step 5d. According to the optimal independent variable set Pred_OLS of each training dataset obtained in step 3a, obtain the independent variable data for the non-sampled points in the soil region to be predicted;

Step 5e. Using the optimal prediction model set obtained in step 5c and the independent variable data obtained in step 5d, predict the soil manganese content of the non-sampled points in the soil region to be predicted, thereby predicting soil manganese content over the soil region to be predicted and obtaining its soil manganese content spatial distribution map.

10. The soil manganese content prediction method based on soil type merging and multiple regression according to claim 9, characterized in that, in step 5a, if the prediction model set obtained in step 4 includes subregion geographically weighted regression or the subregion geostatistical model, the training method for subregion geographically weighted regression is as follows:

Taking the region of each merged soil type as an individual study region, the following subregion geographically weighted regression model based on soil type:

is used to train the local models, where (u_i, v_i) are the coordinates of sample point i; β₀(u_i, v_i), β_km(u_i, v_i) and ε_i denote, respectively, the constant term, the local regression coefficients and the prediction error of the local regression; m denotes the soil type level variable; M denotes the total number of soil type levels; and x_ikm denotes the k-th specified soil information index of sample point i under the m-th soil type level.
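The model equation itself is not reproduced in this text. A form consistent with the symbol definitions above, and with the usual structure of geographically weighted regression, would be the following. The symbol y_i for the soil manganese content at sample point i is an assumption, and the range of the index k is left open because the text does not state it.

```latex
y_i = \beta_0(u_i, v_i)
      + \sum_{m=1}^{M} \sum_{k} \beta_{km}(u_i, v_i)\, x_{ikm}
      + \varepsilon_i
```

Under this reading, each merged soil type level m contributes its own set of location-dependent coefficients, which is what allows the regression to be trained separately within each merged soil type region.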