CN106980603B - Soil sulphur element content prediction method based on soil types merger and multiple regression - Google Patents
- Publication number: CN106980603B
- Application number: CN201710099018.5A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
Abstract
The invention relates to a soil manganese content prediction method based on soil type merging and multiple regression. It applies a divide-and-conquer treatment to the different spatial variation characteristics that soil trace elements exhibit across soil types, detects spatial heterogeneity by analysing the dispersion of the spatial distribution of soil available manganese content, and diagnoses multicollinearity in local regression analysis. In particular, during spatial prediction, an integrated prediction model is constructed by estimating the spatial distribution characteristics of soil available manganese content under different soil sampling densities.
Description
Technical field
The invention relates to a soil manganese content prediction method based on soil type merging and multiple regression, and belongs to the technical field of soil attribute prediction.
Background art
As a trace element essential to plant growth, manganese (Mn) participates directly in photosynthesis, acts as an activator of many enzymes and promotes seed germination, so measuring it is of considerable importance in soil science and biology. Whether a soil is deficient in a trace element generally depends not on the total content of all its forms but on its bioavailable (available-state) content. Manganese ions occur mainly as Mn2+, Mn3+ and Mn4+, and manganese is retained in soil mainly as organic manganese, water-soluble manganese, exchangeable manganese, mineral manganese and manganese-containing inorganic salts. Researchers in China and abroad generally define exchangeable, water-soluble and easily reducible manganese together as available manganese (Soil Available Mn). Available manganese has the most direct influence on plant growth, and its level directly determines the manganese-supplying capacity of the soil. Soil available manganese content is usually determined by DTPA extraction. The content of soil trace elements is determined mainly by the parent material and the soil-forming process. In nature the different forms of soil manganese are held in a state of equilibrium; the main influencing factors include soil moisture, soil organic matter content, soil redox potential and pH. Available manganese content is relatively high in the red and yellow soils of China, whereas the calcareous soils of the north are relatively deficient in manganese.
Soil survey and mapping are the basic means of producing regional soil maps and play an important role, in China and abroad, in pedotransfer functions, environmental planning and plant nutrition analysis. Over the past three decades, conventional soil surveys, in particular China's Second National Soil Survey, have produced a fairly detailed inventory of the country's soil resources, accumulated valuable soil data and yielded a series of spatial distribution maps of soil trace elements. However, because farming practices now change rapidly, soil trace element content has also changed quickly in space and time in recent years, and historical soil attribute maps can no longer meet the needs of real-time agricultural projects. For predicting soil available manganese (Soil Available Mn) content, researchers in China have focused mostly on macroscopic analyses such as spatial distribution, spatial variability and influencing factors. In terms of study scale, work has concentrated on the field and small-watershed scales, with few studies at larger regional scales. The reason is that the spatial heterogeneity of soil available manganese content is high, so it is difficult to use a single prediction model to predict the manganese distribution of all areas systematically.
Soil type has a significant effect on soil trace element content. Soil available manganese content tends to be homogeneous within a given genetic horizon, and soil types are delineated according to genetic horizon types and diagnostic characteristics. Under the influence of different soil-forming processes, the spatial distribution of soil available manganese content is highly heterogeneous, and classical Fisher statistics is clearly inadequate for studying the spatial variation of soil properties. Geostatistics focuses on analysing and simulating the spatial structure of regionalized variables and characterizes their spatial dispersion with the variogram and its curve. However, the second-order stationarity assumption underlying geostatistical theory (the covariance of the regionalized variable exists and is identical) is often given too little consideration in practice. The multiple linear regression model is commonly used as the basic tool for analysing relationships between soil properties and landscape properties and is the most widely used model in soil attribute prediction based on the soil-landscape model; however, it determines the relationship between soil and landscape attributes through global exploratory or correlation analysis, which has certain limitations. Spatial prediction depends on geographical location: by the First Law of Geography, proximity between locations gives data differing degrees of spatial correlation. Local regression models (geographically weighted regression models) have therefore achieved notable results over the past 20 years, because they can handle spatial non-stationarity in regression analysis. When handling discrete variables such as soil type, however, this approach converts the environmental variable into dummy variables and so ignores the spatial distribution pattern that soil available manganese content exhibits across soil types.
Statistical models (linear regression models, geostatistical models and so on) can meet the demand for high-precision soil information: by quantifying the soil-landscape model, they achieve spatial inference and prediction of soil attributes. Traditional prediction models, however, often overlook the particular role of the soil type variable and hence the latent trends that soil available manganese content shows across soil types. As a result, spatial prediction of soil available manganese content faces a number of problems, which can be summarized as follows:
(1) The traditional soil-mapping workflow of "field survey → indoor interpretation → field verification → delineation and mapping" can no longer keep up with the needs of rapidly developing precision agriculture, environmental management and land management, and different application fields place higher demands on the accuracy and timeliness of spatial predictions of soil trace elements. With the spread of 3S and related geographic information technologies, a region-level method of spatial analysis and prediction is urgently needed.
(2) In soil classification, the soil taxonomy is a hierarchy of six levels: soil order, suborder, great group, subgroup, soil genus and soil series. Each soil type level corresponds to its own spatial distribution pattern of soil available manganese content, and fusing these complex patterns onto a soil attribute distribution map based on a regular grid (raster data) inevitably causes conflicts between them. The underlying reason is that, when model-fitting error is minimized, the regularization term used in training is generally a monotonically increasing function of model complexity, so a large number of rules makes model complexity rise steeply. Conversely, supplying separate rule data for each soil type level limits the usability of the model. How to quantify the effect of soil type level on soil mapping is therefore a major difficulty in improving the generalization ability of quantitative models, and it constrains the application of statistical models.
(3) Different sampling densities often reflect different variation patterns of soil available manganese content. Choosing the sampling density arbitrarily seriously reduces the accuracy of soil quality assessment. In theory, a high sampling density improves prediction accuracy, but an excessively high density causes labour and material costs to grow rapidly. When the objective function is minimized over the features of all sampling densities combined, a smaller training error can be obtained, but when soil available manganese content is predicted at a different sampling density the model considers only the automatically selected features, which restricts its generalization ability. Such limitations of sampling density in spatial prediction models make prediction results uncertain, and this uncertainty propagates through subsequent modelling, analysis and decision-making and profoundly affects the final result.
(4) Analysing the spatial non-stationarity of spatial data is a prerequisite for deciding which prediction model to use. The conventional approach incorporates a linear function of geographical location into an ordinary linear regression model and estimates the expanded parameters by least squares, so that the model parameters vary with spatial position. This is still a trend-fitting method, however, and remains limited when the parameters vary in complex ways. How to choose a suitable prediction method from the stationarity test results obtained at different sampling densities remains a challenge for research on spatial prediction of soil attributes.
In conclusion being equally present in other analysis application aspects of the effective manganese of soil for the above-mentioned deficiency analyzed.
Summary of the invention
The technical problem to be solved by the invention is to provide a soil manganese content prediction method based on soil type merging and multiple regression. The method systematically covers three key techniques under different soil sampling densities, namely spatial variability analysis of soil available manganese content, soil type merging and second-order stationarity testing, and thereby effectively solves the problem of selecting a specific model in existing multiple regression analysis.
In order to solve the above-mentioned technical problem the present invention uses following technical scheme: the present invention is devised to be returned based on soil types
And the soil sulphur element content prediction method with multiple regression, include the following steps:
Step 1. obtains in soil region to be predicted that each sampled point is corresponding respectively to specify each soil information to refer to respectively
Mark and corresponding soil manganese content, and total data set is constructed, meanwhile, it is independent variable that each soil information index is specified in definition,
Soil manganese content is dependent variable;It is then based on different soils sampling density, validation data set and at least two is constructed by total data set
A training dataset;
Step 2. is directed to each training dataset respectively, is based on merger soil types, obtains training dataset and is based on merger
The soil sulphur element content coefficient of variation data set of soil types rank, and construct corresponding all training datas and concentrate all collection points
Soil manganese content coefficient of variation histogram;Meanwhile it constructing all training datasets and corresponding respectively to soil type rank
Principal component analysis scatter plot;It is then based on soil manganese content coefficient of variation histogram and principal component analysis scatter plot, determines soil
The conspicuousness that type influences different sampling soil attribute Spatial Variability;
Step 3. obtains the corresponding optimal independent variable set of each training dataset difference respectively, and obtains each training
Data set respectively corresponds the riding quality of its all independent variable, judges whether model constructed by independent variable and dependent variable meets two
Rank stationarity;
Step 4. assumes test result, base to soil attribute spatial variability influence degree and second-order stationary according to soil types
In each training dataset, prediction model set is selected;
Step 5. is trained using prediction model set for one of training dataset, and optimum prediction mould is selected
Type is predicted for soil region to be predicted, obtains soil region soil manganese content spatial distribution map to be predicted.
As a preferred technical solution of the invention, step 1 comprises the following steps:
Step 1a. Obtain the set Site of projected coordinates of the sampling points in the soil region to be predicted and the Euclidean distance between adjacent sampling points, then adjust the projected coordinates of the sampling points so that the Euclidean distance between adjacent sampling points is not less than d, where d is the spatial resolution of the specified soil information indicators for the region;
Step 1b. Take the soil information indicators corresponding to the five preset soil-forming factors (time, climate, parent material, topography and organisms) together as the specified soil information indicators, and obtain their values at each sampling point in the soil region to be predicted as independent variables, forming Pred = {X_1, …, X_k, …, X_K}, k ∈ {1, …, K}, where K is the number of kinds of specified soil information indicator and X_k is the n × 1 vector of values of the k-th indicator at the sampling points, n being the total number of sampling points;
Step 1c. Obtain the soil type level of each sampling point in the soil region to be predicted and group the sampling points by soil type level, forming Soil_T = (Type_1, …, Type_m, …, Type_M), an n × M array with m ∈ {1, …, M}, where Type_m denotes the sampling points belonging to the m-th soil type level and M is the number of soil type levels;
Step 1d. Taking soil manganese content as the dependent variable, construct the total data set Data = (Pred, S, Site, Soil_T) from Pred, the set S of soil manganese contents at the sampling points, Site and Soil_T; at the same time, obtain the sampling density Density of the sampling points from the area of the soil region to be predicted;
Step 1e. Taking sampling points as the unit of extraction, randomly extract from the total data set Data the data for a preset proportion of all sampling points to form the validation data set; the remaining sampling points form the candidate training sampling point set;
Step 1f. Taking sampling points as the unit of extraction, extract from the total data set Data the data for sampling points in the candidate training sampling point set to construct at least two training data sets, such that one training data set covers all sampling points in the candidate training sampling point set and each of the others covers a subset of them.
As a preferred technical solution of the invention, in step 1a all adjacent sampling points in the soil region to be predicted are processed as follows, so that the projected coordinates are adjusted and the Euclidean distance between adjacent sampling points is not less than d:
If the Euclidean distance between sampling points p1 and p2 is less than d, let n1 be the number of sampling points within radius d of p1 and n2 the number of sampling points within radius d of p2. If n1 < n2, adjust the projected coordinates of p1 so that the Euclidean distance between p1 and p2 is d+g; if n1 ≥ n2, adjust the projected coordinates of p2 so that the Euclidean distance between p1 and p2 is d+g. Here d is the spatial resolution of the specified soil information indicators for the region and g is a preset adjustment distance.
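The adjustment rule of step 1a can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the direction in which a point is moved (straight away from its neighbour) is an assumption, since the text only fixes the resulting distance d+g.

```python
import math

def count_within(points, center, radius):
    """Count sampling points within `radius` of `center` (the centre point itself included)."""
    return sum(1 for q in points if math.dist(center, q) <= radius)

def move_away(p, anchor, target):
    """Move p along the ray from anchor through p until its distance to anchor is `target`.
    Assumption: the text does not specify the direction of the shift."""
    dx, dy = p[0] - anchor[0], p[1] - anchor[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        # Coincident points have no direction; arbitrarily shift along +x.
        dx, dy, dist = 1.0, 0.0, 1.0
    scale = target / dist
    return (anchor[0] + dx * scale, anchor[1] + dy * scale)

def separate_pair(points, i, j, d, g):
    """Apply the step 1a rule to points i and j and return a new list of coordinates."""
    p1, p2 = points[i], points[j]
    out = list(points)
    if math.dist(p1, p2) >= d:
        return out                      # already far enough apart
    n1 = count_within(points, p1, d)    # neighbours around p1
    n2 = count_within(points, p2, d)    # neighbours around p2
    if n1 < n2:
        out[i] = move_away(p1, p2, d + g)   # move p1
    else:
        out[j] = move_away(p2, p1, d + g)   # n1 >= n2: move p2
    return out
```

In practice the rule would be applied repeatedly over all close pairs, because moving one point can bring it within d of another.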
As a preferred technical solution of the invention, in step 1b Pred is standardized by the Z-score method, so that each indicator is expressed on the scale of the standard normal distribution (zero mean and unit standard deviation).
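A minimal sketch of Z-score standardization for the indicators in Pred is given below. The dictionary layout and the indicator names are illustrative assumptions, and the population standard deviation is used because the text does not say whether the sample or population form is intended.

```python
import statistics

def z_score(values):
    """Standardize one indicator column to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)   # assumption: population standard deviation
    if sd == 0:
        raise ValueError("a constant indicator cannot be standardized")
    return [(v - mean) / sd for v in values]

def standardize(pred):
    """pred maps an indicator name to its list of values, one per sampling point."""
    return {name: z_score(column) for name, column in pred.items()}

# Hypothetical indicators at three sampling points.
pred = {"elevation": [120.0, 180.0, 240.0], "ndvi": [0.2, 0.5, 0.8]}
pred_std = standardize(pred)
```

Note that Z-scores rescale the data but do not change the shape of its distribution.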
As a preferred technical solution of the invention, in step 1f, taking sampling points as the unit of extraction, the data for sampling points in the candidate training sampling point set are extracted from the total data set Data to construct four training data sets whose numbers of rows are r, 50% × r, 25% × r and 12% × r respectively, where r is the number of sampling points in the candidate training sampling point set.
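Steps 1e and 1f can be sketched as a random split followed by sub-sampling at the four stated fractions. The function name, the seed and the choice to draw each subset independently (rather than nesting them) are assumptions; the text does not specify how the partial training sets are drawn.

```python
import random

def split_samples(point_ids, validation_fraction,
                  fractions=(1.0, 0.5, 0.25, 0.12), seed=0):
    """Return (validation_ids, training_sets) for the given sampling point ids."""
    rng = random.Random(seed)
    ids = list(point_ids)
    rng.shuffle(ids)
    n_val = round(len(ids) * validation_fraction)
    validation, candidates = ids[:n_val], ids[n_val:]
    r = len(candidates)                      # size of the candidate training set
    training_sets = []
    for f in fractions:
        size = r if f >= 1.0 else max(1, round(f * r))
        # Assumption: each subset is an independent random draw from the candidates.
        training_sets.append(rng.sample(candidates, size))
    return validation, training_sets

validation, training_sets = split_samples(range(125), validation_fraction=0.2)
```

With 125 points and a 20% validation share, r is 100 and the training sets hold 100, 50, 25 and 12 points.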
As a preferred technical solution of the invention, step 2 comprises the following steps:
Step 2a. For each training data set, calculate the coefficient of variation CV_m of soil manganese content under each soil type level of the training data set according to the following formula:
CV_m = SD_m / Mean_m
where SD_m and Mean_m are the standard deviation and mean of soil manganese content under the m-th soil type level. Then, from the coefficients of variation CV_m of all soil type levels of the training data set, construct its data set of coefficients of variation of soil manganese content by soil type level, that is, obtain CV_Soil_T for each training data set;
Step 2b. For each training data set, apply Duncan's method to the soil data of the soil type levels in the training data set to analyse the significance of differences among the groups, obtaining the Duncan analysis result Dun_S for each training data set;
Step 2c. For each training data set, merge its soil types according to its Duncan analysis result Dun_S and its soil type levels, recalculate the coefficient of variation of soil manganese content under each soil type level, and construct its data set of coefficients of variation of soil manganese content by merged soil type level, that is, obtain CV_Soil_T_Dun for each training data set;
Step 2d. From the data sets CV_Soil_T_Dun of all training data sets, construct the histogram of the coefficient of variation of soil manganese content covering all sampling points in all training data sets;
Step 2e. Using principal component analysis, and according to the soil type level of each sampling point in each training data set, construct for all training data sets the principal component analysis scatter plots for each soil type level;
Step 2f. From the coefficient-of-variation histogram and the principal component analysis scatter plots, determine how significantly soil type affects the spatial variability of the soil attribute at different sampling densities.
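The per-type coefficient of variation of step 2a can be sketched as below, taking CV_m as the ratio of SD_m to Mean_m as those symbols are defined in the text. The function name and soil type labels are hypothetical, the sample standard deviation is an assumption, and soil types with fewer than two points are skipped because no standard deviation can be computed for them.

```python
import statistics
from collections import defaultdict

def cv_by_soil_type(mn_values, soil_types):
    """Return {soil type: coefficient of variation of manganese content}."""
    groups = defaultdict(list)
    for value, soil_type in zip(mn_values, soil_types):
        groups[soil_type].append(value)
    cv = {}
    for soil_type, values in groups.items():
        if len(values) < 2:
            continue                      # SD undefined for a single point
        mean = statistics.fmean(values)
        # Assumption: sample standard deviation; manganese contents are positive, so mean > 0.
        cv[soil_type] = statistics.stdev(values) / mean
    return cv

# Hypothetical manganese contents (mg/kg) and soil type labels.
mn = [10.0, 20.0, 30.0, 5.0, 5.0, 5.0, 7.0]
types = ["red", "red", "red", "yellow", "yellow", "yellow", "paddy"]
cv_soil_t = cv_by_soil_type(mn, types)
```

After the merging of step 2c, the same function can be called with the merged labels to obtain the updated coefficients.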
As a preferred technical solution of the invention, step 3 comprises the following steps:
Step 3a. Obtain the optimal set of independent variables Pred_OLS for each training data set by stepwise regression;
Step 3b. For each training data set, fit a geographically weighted regression model to the training data set using its optimal set of independent variables Pred_OLS, obtaining the set Coeff of regression coefficients at its sampling points;
Step 3c. For each training data set, calculate from the training data set and its optimal set of independent variables Pred_OLS the standard deviation of the corresponding multiple regression model, that is, obtain STD_MLR for each training data set;
Step 3d. For each training data set, calculate the stationarity index SI of each of its independent variables by a formula in which INTQ_GWR denotes the interquartile range of the regression coefficients at the sampling points of the training data set;
Step 3e. Calculate the mean Average_SI of the stationarity indices SI of all independent variables of each training data set;
Step 3f. Compare Average_SI with 1 to judge whether the model built from the independent variables and the dependent variable satisfies second-order stationarity.
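The formula for SI in step 3d is not reproduced in the text. The sketch below therefore rests on two labelled assumptions: that SI is the interquartile range of the local coefficients divided by twice the global standard error, a comparison commonly used with geographically weighted regression, and that an Average_SI above 1 indicates non-stationarity. All function names are hypothetical.

```python
import statistics

def interquartile_range(values):
    """Q3 - Q1 of the local regression coefficients (inclusive quartile method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def stationarity_index(local_coeffs, global_std):
    """Assumed form: SI = INTQ_GWR / (2 * STD_MLR)."""
    return interquartile_range(local_coeffs) / (2.0 * global_std)

def average_si(coeffs_by_variable, std_by_variable):
    """Mean SI over all independent variables of one training data set."""
    return statistics.fmean(
        stationarity_index(coeffs_by_variable[name], std_by_variable[name])
        for name in coeffs_by_variable
    )

def is_second_order_stationary(avg_si, threshold=1.0):
    """Assumed decision rule: stationary when the average SI does not exceed 1."""
    return avg_si <= threshold
```

Under these assumptions, local coefficients that vary more than the uncertainty of the global estimate push SI above 1.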
As a preferred technical solution of the invention, step 4 comprises the following steps:
Step 4a. For each training data set, use geostatistical analysis to build its optimal semivariogram model and calculate its effective spatial range and coefficient of determination;
Step 4b. If the coefficient of determination of the optimal semivariogram model exceeds 0.5 for every training data set and the nugget-to-sill ratio is below 25% for every training data set, all training data sets use the geostatistical model and the method proceeds to step 4c; otherwise none of the training data sets uses the geostatistical model and the method proceeds to step 4d;
Step 4c. According to the conclusions of step 2, step 3 and step 4b, that is, the degree to which soil type affects the spatial variability of the soil attribute and the result of the second-order stationarity test, select the prediction models as follows:
If the effect of soil type on the spatial variability of the soil attribute is not significant and the second-order stationarity assumption is satisfied, the prediction model set consists of multiple regression and geostatistics;
If the effect of soil type on the spatial variability of the soil attribute is not significant and the second-order stationarity assumption is not satisfied, the prediction model set consists of multiple regression and geographically weighted regression;
If the effect of soil type on the spatial variability of the soil attribute is significant and the second-order stationarity assumption is satisfied, the prediction model set consists of multiple regression and the zoned geostatistical model;
If the effect of soil type on the spatial variability of the soil attribute is significant and the second-order stationarity assumption is not satisfied, the prediction model set consists of multiple regression, geographically weighted regression and zoned geographically weighted regression;
Step 4d. According to the conclusions of step 2, step 3 and step 4b, that is, the degree to which soil type affects the spatial variability of the soil attribute and the result of the second-order stationarity test, select the prediction models as follows:
If the effect of soil type on the spatial variability of the soil attribute is not significant and the second-order stationarity assumption is satisfied, the prediction model set consists of multiple regression and regression kriging;
If the effect of soil type on the spatial variability of the soil attribute is not significant and the second-order stationarity assumption is not satisfied, the prediction model set consists of multiple regression and geographically weighted regression;
If the effect of soil type on the spatial variability of the soil attribute is significant and the second-order stationarity assumption is satisfied, the prediction model set consists of multiple regression;
If the effect of soil type on the spatial variability of the soil attribute is significant and the second-order stationarity assumption is not satisfied, the prediction model set consists of multiple regression, geographically weighted regression and zoned geographically weighted regression.
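The selection rules of steps 4b to 4d amount to a lookup on three yes/no conditions, which can be sketched as below. The function names and the model labels are illustrative; the thresholds 0.5 and 25% and the eight outcomes are taken from the steps above.

```python
MR = "multiple regression"
GWR = "geographically weighted regression"
ZONED_GWR = "zoned geographically weighted regression"

def use_geostatistics(r2_values, nugget_sill_ratios):
    """Step 4b: every training set needs R^2 > 0.5 and nugget/sill < 25%."""
    return (all(r2 > 0.5 for r2 in r2_values)
            and all(ratio < 0.25 for ratio in nugget_sill_ratios))

def select_models(geostat_ok, soil_type_significant, stationary):
    """Steps 4c and 4d: return the prediction model set for one combination of conditions."""
    if geostat_ok:   # step 4c
        table = {
            (False, True): [MR, "geostatistics"],
            (False, False): [MR, GWR],
            (True, True): [MR, "zoned geostatistics"],
            (True, False): [MR, GWR, ZONED_GWR],
        }
    else:            # step 4d
        table = {
            (False, True): [MR, "regression kriging"],
            (False, False): [MR, GWR],
            (True, True): [MR],
            (True, False): [MR, GWR, ZONED_GWR],
        }
    return table[(soil_type_significant, stationary)]
```

Writing the rules as a table makes it easy to see that the two branches differ only when the stationarity assumption holds.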
As a preferred technical solution of the invention, step 5 comprises the following steps:
Step 5a. Using the training data set that covers all sampling points in the candidate training sampling point set, train the models in the prediction model set obtained in step 4, obtaining a trained prediction model set;
Step 5b. Use the models in the trained prediction model set to predict the soil manganese content at each sampling point of the validation data set and, against the validation data set, calculate the root-mean-square error of each model's predictions;
Step 5c. Repeat step 5b a preset number of times, calculate the mean root-mean-square error over the cross-validation runs, and select the model with the smallest mean root-mean-square error as the best prediction model, forming the best prediction model set;
Step 5d. According to the optimal sets of independent variables Pred_OLS obtained in step 3a for the training data sets, obtain the independent-variable data for the unsampled points in the soil region to be predicted;
Step 5e. Using the best prediction model set obtained in step 5c and the independent-variable data obtained in step 5d, predict the soil manganese content at the unsampled points in the soil region to be predicted, thereby predicting soil manganese content over the region and obtaining its spatial distribution map.
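Steps 5b and 5c reduce to computing a root-mean-square error per model and run, then picking the model with the smallest mean. A minimal sketch follows; the function names and the example error values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted manganese contents."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def select_best_model(rmse_runs):
    """rmse_runs maps a model name to its RMSE values, one per repetition of step 5b.
    Returns the model with the smallest mean RMSE and the table of means."""
    mean_rmse = {name: sum(runs) / len(runs) for name, runs in rmse_runs.items()}
    best = min(mean_rmse, key=mean_rmse.get)
    return best, mean_rmse

# Hypothetical RMSE values from two repetitions.
best, means = select_best_model({
    "multiple regression": [2.0, 4.0],
    "geographically weighted regression": [1.0, 2.0],
})
```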
As a preferred technical solution of the invention, in step 5a, if the prediction model set obtained in step 4 includes zoned geographically weighted regression or the zoned geostatistical model, zoned geographically weighted regression is trained as follows:
The region of each merged soil type is treated as a separate study area, and the local models are trained with a zoned geographically weighted regression model based on soil type, in which (u_i, v_i) are the coordinates of sample point i; β_0(u_i, v_i), β_km(u_i, v_i) and ε_i are respectively the constant term, the local regression coefficient and the prediction deviation of the local regression; m indexes the soil type level and M is the total number of soil type levels; and x_ikm is the value of the k-th specified soil information indicator at sample point i under the m-th soil type level.
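One possible reading of zoned geographically weighted regression is sketched below: the local fit at a target location uses only the sampling points of the same merged soil type, weighted by distance. This is a simplified illustration with a single predictor; the Gaussian kernel, the bandwidth parameter and all function names are assumptions not stated in the text.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Assumed kernel: weights decay smoothly with distance from the target."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_fit(target, coords, x, y, bandwidth):
    """Weighted least squares of y on one predictor x, centred at `target`.
    Returns the local intercept and slope (b0, b1)."""
    w = [gaussian_weight(math.dist(target, c), bandwidth) for c in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def zoned_local_fit(target, target_zone, coords, zones, x, y, bandwidth):
    """Restrict the local fit to sampling points in the target's merged soil type."""
    idx = [i for i, zone in enumerate(zones) if zone == target_zone]
    return local_fit(target,
                     [coords[i] for i in idx],
                     [x[i] for i in idx],
                     [y[i] for i in idx],
                     bandwidth)
```

Restricting the fit to one zone keeps points from a different soil type from influencing the local coefficients, which is the motivation the text gives for treating each merged soil type as its own study area.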
Compared with the prior art, the soil manganese content prediction method based on soil type merging and multiple regression described here, using the above technical solution, has the following technical effects:
(1) The method designed by the invention can simulate the spatial differentiation of the variable more accurately. At the same time, the soil type merging operation better represents the smallest geographical unit in the soil-landscape model, improves the generalization ability of the prediction model and thereby prevents the model from overfitting the training data. When the soil type data are not updated, training error can be minimized quickly from the correlation between the independent variables and the soil attribute, giving higher computational efficiency and prediction accuracy than traditional multiple regression analysis;
(2) The method designed by the invention proposes an integrated mechanism for testing the second-order stationarity assumption that can combine several existing test methods. By comparing the model coefficients of global multiple linear regression and locally weighted regression, it detects the best feature selection for the target attribute at different spatial scales, improves the interpretability of the model, determines the best prediction model and avoids the prediction error caused by blindly using a single prediction model. The method also avoids the limitations of relying on a single test of the second-order stationarity assumption; the integrated test mechanism is more general and more stable and has broad prospects for industrial application;
(3) By increasing the sampling density, the method designed by the invention analyses the spatial heterogeneity of the soil attribute comprehensively and can measure more accurately the latent spatial variation patterns present in the soil attribute at high sampling density. Without reducing prediction accuracy, it is therefore expected to reduce the number of sampling points to some extent and so lower the cost of future region-level sampling.
Brief description of the drawings
Fig. 1 is the main flow chart for constructing the dependent-variable and independent-variable data sets;
Fig. 2a is a schematic diagram of adjacent sampling points whose pairwise Euclidean distance is less than the spatial resolution of the independent-variable data layers;
Fig. 2b is a schematic diagram after the sampling point positions have been shifted slightly;
Fig. 3 is the flow chart of the combined analysis, by statistical methods and principal component analysis, of the spatial variability of the soil attribute across soil types;
Fig. 4 is the flow chart of the second-order stationarity test for the training data sets;
Fig. 5 is the flow chart for selecting the best prediction model and predicting the spatial distribution of the soil attribute;
Fig. 6 is the histogram of coefficients of variation by soil type, by merged soil type and by sampling density in the embodiment of the invention;
Fig. 7a is the spatial distribution map of soil available manganese content predicted by multiple regression in the embodiment of the invention;
Fig. 7b is the spatial distribution map of soil available manganese content predicted by zoned geographically weighted regression in the embodiment of the invention.
Specific embodiment
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The invention belongs to the analysis methods for spatial prediction of soil attributes in pedometrics. It concerns divide-and-conquer processing of the different spatial variation characteristics that soil trace elements exhibit under different soil types: spatial heterogeneity can be detected by analyzing the dispersion of the spatial distribution of soil available manganese content, and multicollinearity problems in local regression analysis can be diagnosed through local regression analysis. In particular, during spatial prediction, a comprehensive prediction model is constructed by estimating the spatial distribution characteristics of soil available manganese content under different soil sampling densities.
The present invention designs a soil manganese content prediction method based on soil type merging and multiple regression, which specifically includes the following steps:
Step 1. As shown in Fig. 1, obtain the specified soil information indices and the soil manganese content corresponding to each sampling point in the soil region to be predicted, and construct the total dataset; meanwhile, define the specified soil information indices as the independent variables and the soil manganese content as the dependent variable. Then, based on different soil sampling densities, construct a validation dataset and at least two training datasets from the total dataset.
Step 1 includes the following steps:
Step 1a. Obtain the projection coordinate set Site of the sampling points in the soil region to be predicted and the Euclidean distances between adjacent sampling points, then adjust and update the projection coordinates of the sampling points so that the Euclidean distance between adjacent sampling points is not less than d, where d denotes the spatial resolution of the specified soil information indices in the soil region to be predicted. As shown in Fig. 2a and Fig. 2b, in step 1a all adjacent sampling points in the soil region to be predicted are processed as follows to achieve this adjustment:
If the Euclidean distance between sampling points p1 and p2 is less than d, let n1 be the number of sampling points within radius d of p1 and n2 the number of sampling points within radius d of p2. If n1 < n2, the projection coordinates of p1 are adjusted so that the Euclidean distance between p1 and p2 is d+g; if n1 >= n2, the projection coordinates of p2 are adjusted so that the Euclidean distance between p1 and p2 is d+g. Here d denotes the spatial resolution of the specified soil information indices in the soil region to be predicted, and g denotes the preset adjustment distance.
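The adjustment rule of step 1a can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the text does not say in which direction a point is moved, so the sketch assumes the chosen point is shifted along the line joining the pair.

```python
import math

def count_within(points, center, d):
    """Count sampling points within radius d of `center` (the point itself included)."""
    return sum(1 for q in points if math.dist(center, q) <= d)

def separate_close_pairs(points, d, g):
    """Single pass over all pairs: any pair closer than d is pushed to d + g apart.

    Assumption (not stated in the text): the moved point is shifted along
    the line joining the two points, away from the point that stays fixed.
    """
    pts = [tuple(map(float, p)) for p in points]
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            dist = math.dist(pts[i], pts[j])
            if dist >= d:
                continue
            n1 = count_within(pts, pts[i], d)
            n2 = count_within(pts, pts[j], d)
            # n1 < n2: move p1; otherwise (n1 >= n2): move p2.
            move, keep = (i, j) if n1 < n2 else (j, i)
            if dist == 0:
                ux, uy = 1.0, 0.0  # coincident points: arbitrary direction (assumption)
            else:
                ux = (pts[move][0] - pts[keep][0]) / dist
                uy = (pts[move][1] - pts[keep][1]) / dist
            pts[move] = (pts[keep][0] + ux * (d + g), pts[keep][1] + uy * (d + g))
    return pts
```

A single pass may bring a moved point too close to a third point; repeating the pass until no pair is closer than d is one possible refinement.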
Step 1b. The soil information indices corresponding to the five preset factors, namely time, climate, parent material, topography, and organisms, are taken together as the specified soil information indices. The specified soil information indices of each sampling point in the soil region to be predicted are obtained as independent variables and form Pred={X1,…,Xk,…,XK}, k={1,…,K}, where K denotes the number of kinds of specified soil information indices and the vector Xk, an n × 1 vector, consists of the values of the k-th specified soil information index at all sampling points; n denotes the total number of sampling points. Pred is then standardized using the Z-score method so that each variable is on the scale of the standard normal distribution (zero mean, unit variance).
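As a small illustration of the Z-score standardization in step 1b, the following sketch standardizes one column Xk. The function name is hypothetical, and the use of the population standard deviation is an assumption; the text does not say which estimator is used.

```python
import statistics

def z_score(column):
    """Standardize one independent variable to zero mean and unit variance.

    Assumption: the population standard deviation is used as the divisor.
    """
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    if sd == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0] * len(column)
    return [(x - mean) / sd for x in column]
```

Applying `z_score` to every column of Pred puts indices with different units on a common scale before regression.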
Step 1c. Obtain the soil type level corresponding to each sampling point in the soil region to be predicted, and group the sampling points by soil type level to form Soil_T=(Type1,…,Typem,…,TypeM), an n × M matrix, where m={1,…,M}, Typem denotes the sampling points corresponding to the m-th soil type level, and M denotes the number of soil type levels.
Step 1d. With soil manganese content as the dependent variable, construct the total dataset Data=(Pred, S, Site, Soil_T) from Pred, the set S of soil manganese contents at the sampling point positions, Site, and Soil_T; meanwhile, obtain the sampling density Density of the sampling points according to the area of the soil region to be predicted.
Step 1e. Taking sampling points as the objects of extraction, arbitrarily extract from the total dataset Data the data of a preset proportion of all sampling points to form the validation dataset; the remaining sampling points form the candidate training sampling point set.
Step 1f. Taking sampling points as the objects of extraction, extract from the total dataset Data the data of sampling points in the candidate training sampling point set to construct at least two training datasets. One of the training datasets contains all sampling points in the candidate training sampling point set, and each remaining training dataset contains a subset of them. In practical application, step 1f can be designed to construct four training datasets whose numbers of rows are r, 50% × r, 25% × r, and 12% × r respectively, where r denotes the number of sampling points in the candidate training sampling point set.
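Steps 1e and 1f can be sketched as follows. The function name and defaults are hypothetical: the 20% validation share is taken from the worked example later in the text, and drawing each reduced training set independently at random is an assumption, since the text does not say whether the subsets are nested.

```python
import random

def build_datasets(sample_ids, validation_fraction=0.2,
                   fractions=(1.0, 0.5, 0.25, 0.12), seed=0):
    """Split sampling points into a validation set and training sets of decreasing density."""
    rng = random.Random(seed)
    ids = list(sample_ids)
    rng.shuffle(ids)
    n_val = round(len(ids) * validation_fraction)
    validation = ids[:n_val]
    candidates = ids[n_val:]          # candidate training sampling point set
    r = len(candidates)
    # One training set per sampling density: r, 50% x r, 25% x r, 12% x r.
    training_sets = [rng.sample(candidates, max(1, round(r * f))) for f in fractions]
    return validation, training_sets
```

With 100 sampling points this gives 20 validation points and training sets of 80, 40, 20, and 10 points.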
Step 2. For each training dataset, based on the merged soil types, obtain the soil manganese content coefficient-of-variation dataset of the training dataset at the merged soil type level, and construct the histogram of soil manganese content coefficients of variation for all sampling points in all training datasets. Meanwhile, construct the principal component analysis scatter plots of all training datasets for the different soil type levels. Then, based on the soil manganese content coefficient-of-variation histogram and the principal component analysis scatter plots, determine the significance of the influence of soil type on the spatial variability of the soil attribute at different sampling densities.
As shown in Fig. 3, step 2 includes the following steps:
Step 2a. For each training dataset, calculate the coefficient of variation CVm of soil manganese content at each soil type level of the training dataset according to the following formula:
$$CV_m = \frac{SD_m}{Mean_m}$$
where SDm and Meanm are the standard deviation and the mean of soil manganese content at the m-th soil type level. The coefficients CVm of all soil type levels of the training dataset then form its soil manganese content coefficient-of-variation dataset at the soil type level, which yields the dataset CV_Soil_T for each training dataset.
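The per-type coefficient of variation in step 2a can be illustrated with a short sketch. It assumes the usual definition, standard deviation divided by mean, and uses the sample standard deviation; the function name and the input layout (pairs of soil type and manganese content) are hypothetical.

```python
import statistics
from collections import defaultdict

def cv_by_soil_type(records):
    """Coefficient of variation of manganese content per soil type.

    `records` is an iterable of (soil_type, manganese_content) pairs.
    Assumptions: CV = SD / Mean with the sample standard deviation, and
    soil types with fewer than two samples are skipped.
    """
    groups = defaultdict(list)
    for soil_type, content in records:
        groups[soil_type].append(content)
    return {
        soil_type: statistics.stdev(values) / statistics.fmean(values)
        for soil_type, values in groups.items()
        if len(values) > 1
    }
```

The resulting dictionary plays the role of CV_Soil_T for one training dataset; after merging soil types, the same function applied to the merged labels would give the counterpart of CV_Soil_T_Dun.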
Step 2b. For each training dataset, Duncan's method is applied to the soil data of each soil type level of the training dataset to analyze the significance of differences among multiple sample groups, which yields the Duncan analysis result Dun_S for each training dataset.
Step 2c. For each training dataset, the soil types of the training dataset are merged according to its Duncan analysis result Dun_S and its soil type levels, and the coefficients of variation of soil manganese content at each merged soil type level are recalculated. These form the soil manganese content coefficient-of-variation dataset of the training dataset at the merged soil type level, which yields the dataset CV_Soil_T_Dun for each training dataset.
Step 2d. From the merged-level coefficient-of-variation datasets CV_Soil_T_Dun of all training datasets, construct the histogram of soil manganese content coefficients of variation for all sampling points in all training datasets.
Step 2e. Using principal component analysis, and according to the soil type level to which each sampling point in each training dataset belongs, construct for all training datasets the principal component analysis scatter plots for the different soil type levels.
Step 2f. Based on the soil manganese content coefficient-of-variation histogram and the principal component analysis scatter plots, determine the significance of the influence of soil type on the spatial variability of the soil attribute at different sampling densities.
Step 3. Obtain the optimal independent variable set of each training dataset and the stationarity index of all independent variables of each training dataset, and judge whether the model constructed from the independent variables and the dependent variable satisfies second-order stationarity.
As shown in Fig. 4, step 3 includes the following steps:
Step 3a. Obtain the optimal independent variable set Pred_OLS of each training dataset by stepwise regression.
Step 3b. For each training dataset, fit a model to the training dataset by geographically weighted regression using its optimal independent variable set Pred_OLS, and obtain the regression coefficient set Coeff of the sampling points of the training dataset.
Step 3c. For each training dataset, calculate the standard deviation of the multiple regression analysis model of the training dataset from the training dataset and its optimal independent variable set Pred_OLS, which yields the standard deviation STDMLR of the multiple regression analysis model for each training dataset.
Step 3d. For each training dataset, the stationarity index SI of all independent variables of the training dataset is calculated according to the formula, in which INTQGWR denotes the interquartile range of the regression coefficients of the sampling points of the training dataset; this yields the stationarity index SI of all independent variables for each training dataset.
Step 3e. Calculate the mean Average_SI of the stationarity indices SI of all independent variables of each training dataset.
Step 3f. Compare Average_SI with 1 to judge whether the model constructed from the independent variables and the dependent variable satisfies second-order stationarity.
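The formula for SI is not reproduced in this text, so the following is only a sketch under stated assumptions: SI is taken to be the interquartile range of the local (geographically weighted) regression coefficients divided by the standard deviation of the global multiple regression model, and any constant factor in the original formula is unknown. The function names are hypothetical. The decision rule follows the worked example later in the text, where a mean above 1 is read as a violation of the second-order stationarity hypothesis; treating a mean of exactly 1 as stationary is an assumption.

```python
import statistics

def stationarity_index(local_coefficients, global_std):
    """Assumed form of SI: IQR of the local coefficients over the global model's standard deviation."""
    q1, _median, q3 = statistics.quantiles(local_coefficients, n=4, method="inclusive")
    return (q3 - q1) / global_std

def satisfies_second_order_stationarity(si_values):
    """Mean SI above 1 is read as non-stationary; a mean of at most 1 as stationary."""
    return statistics.fmean(si_values) <= 1
```

In use, `stationarity_index` would be evaluated once per independent variable and training dataset, and the resulting values averaged as in step 3e.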
Step 4. According to the degree of influence of soil type on the spatial variation of the soil attribute and the result of the second-order stationarity hypothesis test, select the prediction model set based on the training datasets. This specifically includes the following steps:
Step 4a. For each training dataset, use geostatistical analysis to construct the optimal semivariogram model of the training dataset, and calculate its effective spatial range and its coefficient of determination.
Step 4b. If the coefficients of determination of the optimal semivariogram models of all training datasets are all greater than 0.5 and the nugget-to-sill ratios are all less than 25%, geostatistical models are used for all training datasets and the method proceeds to step 4c; otherwise geostatistical models are not used for any training dataset and the method proceeds to step 4d.
Step 4c. According to the conclusions of step 2, step 3, and step 4b, that is, the degree of influence of soil type on the spatial variation of the soil attribute and the result of the second-order stationarity hypothesis test, select the prediction models as follows:
If the influence of soil type on the spatial variation of the soil attribute is not significant and the second-order stationarity hypothesis is satisfied, multiple regression and geostatistics are selected to form the prediction model set;
If the influence of soil type on the spatial variation of the soil attribute is not significant and the second-order stationarity hypothesis is not satisfied, multiple regression and geographically weighted regression are selected to form the prediction model set;
If the influence of soil type on the spatial variation of the soil attribute is significant and the second-order stationarity hypothesis is satisfied, multiple regression and the subregion geostatistical model are selected to form the prediction model set;
If the influence of soil type on the spatial variation of the soil attribute is significant and the second-order stationarity hypothesis is not satisfied, multiple regression, geographically weighted regression, and subregion geographically weighted regression are selected to form the prediction model set;
Step 4d. According to the conclusions of step 2, step 3, and step 4b, that is, the degree of influence of soil type on the spatial variation of the soil attribute and the result of the second-order stationarity hypothesis test, select the prediction models as follows:
If the influence of soil type on the spatial variation of the soil attribute is not significant and the second-order stationarity hypothesis is satisfied, multiple regression and regression kriging are selected to form the prediction model set;
If the influence of soil type on the spatial variation of the soil attribute is not significant and the second-order stationarity hypothesis is not satisfied, multiple regression and geographically weighted regression are selected to form the prediction model set;
If the influence of soil type on the spatial variation of the soil attribute is significant and the second-order stationarity hypothesis is satisfied, multiple regression is selected to form the prediction model set;
If the influence of soil type on the spatial variation of the soil attribute is significant and the second-order stationarity hypothesis is not satisfied, multiple regression, geographically weighted regression, and subregion geographically weighted regression are selected to form the prediction model set.
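The selection rules of steps 4b to 4d amount to a lookup table over three yes/no conditions. The sketch below encodes them directly; the function names, the tuple layout for the semivariogram results, and the model name strings are hypothetical labels chosen for illustration.

```python
MR = "multiple regression"
GWR = "geographically weighted regression"
SUB_GWR = "subregion geographically weighted regression"

def use_geostatistics(variograms):
    """Step 4b: `variograms` holds (r_squared, nugget, sill) for each training dataset."""
    return all(r2 > 0.5 and nugget / sill < 0.25 for r2, nugget, sill in variograms)

# Keys: (geostatistics usable, soil type influence significant, second-order stationary)
MODEL_TABLE = {
    (True, False, True): [MR, "geostatistics"],            # step 4c
    (True, False, False): [MR, GWR],
    (True, True, True): [MR, "subregion geostatistical model"],
    (True, True, False): [MR, GWR, SUB_GWR],
    (False, False, True): [MR, "regression kriging"],      # step 4d
    (False, False, False): [MR, GWR],
    (False, True, True): [MR],
    (False, True, False): [MR, GWR, SUB_GWR],
}

def select_models(geostat_ok, soil_type_significant, stationary):
    """Return the prediction model set for one combination of test outcomes."""
    return list(MODEL_TABLE[(geostat_ok, soil_type_significant, stationary)])
```

For the worked example later in the text (geostatistics not usable, soil type influence significant, stationarity not satisfied), the table returns multiple regression, geographically weighted regression, and subregion geographically weighted regression.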
Step 5. Train the prediction model set on one of the training datasets, select the optimum prediction model, and predict for the soil region to be predicted to obtain the spatial distribution map of soil manganese content in that region. As shown in Fig. 5, this specifically includes the following steps:
Step 5a. Using the training dataset whose sampling points are all the sampling points in the candidate training sampling point set, train the models in the prediction model set obtained in step 4 to obtain the trained prediction model set. If the prediction model set obtained in step 4 includes subregion geographically weighted regression or a subregion geostatistical model, subregion geographically weighted regression is trained as follows:
The region of each merged soil type is treated as a separate study region, and the local model is trained with the following subregion geographically weighted regression model based on soil type:
$$y_i = \beta_0(u_i,v_i) + \sum_{m=1}^{M}\sum_{k=1}^{K}\beta_{km}(u_i,v_i)\,x_{ikm} + \varepsilon_i$$
where y_i is the soil manganese content at sampling point i, (ui,vi) are the coordinates of sampling point i, and β0(ui,vi), βkm(ui,vi), and εi denote the constant term, the local regression coefficient, and the prediction deviation of the local regression respectively; m denotes the soil type level variable, M denotes the total number of soil type levels, and xikm denotes the k-th specified soil information index of sampling point i at the m-th soil type level.
Step 5b. Using the models in the trained prediction model set, predict the soil manganese content of each sampling point in the validation dataset, and calculate from the validation dataset the root-mean-square error of the predictions of each model in the prediction model set.
Step 5c. Repeat step 5b a preset number of times, calculate the mean root-mean-square error over the cross-validation runs, and select the model with the smallest mean root-mean-square error as the optimum prediction model, which forms the optimum prediction model set.
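Steps 5b and 5c reduce to computing a root-mean-square error per model and repetition, averaging over repetitions, and taking the minimum. A minimal sketch follows; the function names and the dictionary layout are hypothetical, and the model training itself is outside the scope of the example.

```python
import math
import statistics

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted manganese contents."""
    return math.sqrt(statistics.fmean((o - p) ** 2 for o, p in zip(observed, predicted)))

def select_best_model(rmse_runs):
    """Pick the model with the smallest mean RMSE.

    `rmse_runs` maps a model name to the list of RMSE values, one per repetition.
    """
    means = {name: statistics.fmean(values) for name, values in rmse_runs.items()}
    best = min(means, key=means.get)
    return best, means
```

In the worked example the repetition count is 1000, so each list in `rmse_runs` would hold 1000 values.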
Step 5d. According to the optimal independent variable set Pred_OLS of each training dataset obtained in step 3a, obtain the independent variable data of the unsampled points in the soil region to be predicted.
Step 5e. Using the optimum prediction model set obtained in step 5c and the independent variable data obtained in step 5d, predict the soil manganese content of the unsampled points in the soil region to be predicted, thereby predicting soil manganese content for the region and obtaining the spatial distribution map of soil manganese content in the soil region to be predicted.
Next, the soil manganese content prediction method based on soil type merging and multiple regression designed by the present invention is applied to the prediction of soil available manganese content in Bozhou City, as follows:
Unlike soil macroelements, both enrichment and deficiency of soil manganese hinder crop growth. For example, when a plant is deficient in manganese, chlorotic spots of varying degree appear on young or old leaves. When the manganese content of plant leaves exceeds 600 mg/kg, manganese toxicity occurs, causing symptoms such as withering and drooping leaves. Therefore, predicting the soil available manganese content of unsampled areas from limited soil sample data and easily obtained independent variables (soil-forming factors) is of great practical value.
The flow of the soil available manganese content prediction method based on soil type merging and multiple regression is as follows:
First step: construct the training datasets and the validation dataset at different sampling densities
(1): Prepare the geographic data layers that influence soil formation and evolution, extract the position information of the soil sampling points, and merge all data into one dataset.
(2): Extract the information of the different soil type levels of the sampling points and update it into the dataset Data=(Pred, S, Site, Soil_T)=(X1,X2,…,XK, S, Long, Lati, Type1, Type2, …, TypeM).
(3): Calculate the Euclidean distance between every two adjacent sampling points, update the coordinates of sampling points that are too close together, and update the dataset Data=(Pred, S, Site, Soil_T).
(4): Calculate the sampling density of the sampling points, and construct the validation dataset and the training datasets according to sampling density. Assuming the number of sampling points is L, the validation dataset contains 20% × L points, and four training datasets (d1, d2, d3, d4) are constructed containing 80% × L, 40% × L, 20% × L, and 10% × L points respectively.
Second step: analyze the spatial variability of the soil attribute under different soil types using the statistical method and principal component analysis
(1): The soil classification system contains six levels: soil order, suborder, great soil group, subgroup, soil genus, and soil series. Taking the great soil group as an example, the coefficient of variation CV of the training datasets (d1, d2, d3, d4) is calculated in each soil type region. The region contains six soil types, denoted (s1, s2, s3, s4, s5, s6).
(2): Using Duncan's method, significance analysis of the differences among multiple sample groups is carried out on training dataset d1, and the six soil types are merged into two major classes, gs1 and gs2.
(3): The coefficients of variation of the training datasets (d1, d2, d3, d4) of different sampling densities are calculated in the merged soil type regions, as shown in Fig. 6.
(4): At the different soil type levels, principal component analysis is performed on the training datasets of different sampling densities for each soil type. The histograms and principal component scatter plots generated in the preceding steps are analyzed together to determine the significance of the influence of soil type on the spatial variability of the soil attribute at different sampling densities. The analysis shows that the spatial variation of the soil attribute in this region is significantly influenced by soil type.
Third step: test the second-order stationarity hypothesis of the training datasets using multiple regression methods and the geographically weighted regression model
(1): Select the optimal independent variable set by stepwise regression;
(2): Fit a model to training dataset d1 by geographically weighted regression and record the regression coefficients of all sampling points;
(3): Calculate the standard deviation of the multiple regression analysis model of d1;
(4): Calculate the stationarity index of all independent variables of dataset d1;
(5): Repeat steps (2) to (4) to calculate the stationarity indices of the independent variables of datasets d1, d2, d3, and d4, and calculate their mean. The mean is greater than 1, which indicates that the second-order stationarity hypothesis is not satisfied.
Fourth step: select the prediction model set according to the degree of influence of soil type on the spatial variation of the soil attribute and the result of the second-order stationarity hypothesis test
(1): Using geostatistical analysis, construct the semivariogram models of all training datasets. The coefficients of determination of the optimal semivariogram models of all training datasets are less than 0.5, so the spatial correlation of the data is low and geostatistical models are not suitable; the candidate model set is (multiple regression, geographically weighted regression);
(2): Because the soil available manganese content in the study area is significantly influenced by soil type and does not satisfy the second-order stationarity hypothesis, the selected model set is (multiple regression, geographically weighted regression, subregion geographically weighted regression);
Fifth step: based on the Monte Carlo simulation method, train on the training datasets with the selected model set, select the optimum prediction method, and generate the prediction result files
(1): Using the prediction methods selected above, train the prediction models relating the soil attribute to the independent variables, and record the accuracy of each run (as root-mean-square error);
(2): Repeat the above step 1000 times and take the mean accuracy as the index for evaluating each method. The cross-validation results show that the multiple regression method has the lowest prediction accuracy and the subregion geographically weighted regression method the highest;
(3): To compare the uncertainty of the results, maps are output for the worst and the best prediction models. Prediction data are generated using the multiple regression method, the subregion geographically weighted regression model, and the independent variable data of the unsampled points in the study region, and are output as shown in Fig. 7a and Fig. 7b respectively.
Based on the above analysis, the prediction results of the example are shown in Fig. 7a and Fig. 7b. The integrated second-order stationarity hypothesis testing mechanism of the present invention can combine several existing test methods: by comparing the model coefficients of global multiple linear regression with those of locally weighted regression, it detects the optimal feature selection of the target attribute at different spatial scales, improves the interpretability of the model, and determines the optimum prediction model, avoiding the prediction error caused by blindly using a single prediction model. The method has good feasibility and stability: it can be applied not only to soil type merging but also to other variables that strongly influence soil formation, such as vegetation type and land use type, and with its accuracy verification mechanism it is expected to achieve satisfactory prediction accuracy. The method remains to be applied in more fields to examine its performance.
The embodiments of the present invention have been explained in detail above with reference to the drawings, but the present invention is not limited to the above embodiments; within the knowledge of a person skilled in the art, various changes can also be made without departing from the purpose of the present invention.
Claims (10)
1. A soil manganese content prediction method based on soil type merging and multiple regression, characterized by comprising the following steps:
Step 1. Obtain the specified soil information indices and the soil manganese content corresponding to each sampling point in the soil region to be predicted, and construct the total dataset; meanwhile, define the specified soil information indices as the independent variables and the soil manganese content as the dependent variable; then, based on different soil sampling densities, construct a validation dataset and at least two training datasets from the total dataset;
Step 2. For each training dataset, based on the merged soil types, obtain the soil manganese content coefficient-of-variation dataset of the training dataset at the merged soil type level, and construct the histogram of soil manganese content coefficients of variation for all sampling points in all training datasets; meanwhile, construct the principal component analysis scatter plots of all training datasets for the different soil type levels; then, based on the soil manganese content coefficient-of-variation histogram and the principal component analysis scatter plots, determine the significance of the influence of soil type on the spatial variability of the soil attribute at different sampling densities;
Step 3. Obtain the optimal independent variable set of each training dataset and the stationarity index of all independent variables of each training dataset, and judge whether the model constructed from the independent variables and the dependent variable satisfies second-order stationarity;
Step 4. According to the degree of influence of soil type on the spatial variation of the soil attribute and the result of the second-order stationarity hypothesis test, select the prediction model set based on the training datasets;
Step 5. Train the prediction model set on one of the training datasets, select the optimum prediction model, and predict for the soil region to be predicted to obtain the spatial distribution map of soil manganese content in the soil region to be predicted.
2. The soil manganese content prediction method based on soil type merging and multiple regression according to claim 1, characterized in that step 1 includes the following steps:
Step 1a. Obtain the projection coordinate set Site of the sampling points in the soil region to be predicted and the Euclidean distances between adjacent sampling points, then adjust and update the projection coordinates of the sampling points in the soil region to be predicted so that the Euclidean distance between adjacent sampling points is not less than d, where d denotes the spatial resolution of the specified soil information indices in the soil region to be predicted;
Step 1b. The soil information indices corresponding to the five preset factors, namely time, climate, parent material, topography, and organisms, are taken together as the specified soil information indices; the specified soil information indices of each sampling point in the soil region to be predicted are obtained as independent variables and form Pred={X1,…,Xk,…,XK}, k={1,…,K}, where K denotes the number of kinds of specified soil information indices, the vector Xk, an n × 1 vector, consists of the values of the k-th specified soil information index at all sampling points, and n denotes the total number of sampling points;
Step 1c. Obtain the soil type level corresponding to each sampling point in the soil region to be predicted, and group the sampling points by soil type level to form Soil_T=(Type1,…,Typem,…,TypeM), an n × M matrix, where m={1,…,M}, Typem denotes the sampling points corresponding to the m-th soil type level, and M denotes the number of soil type levels;
Step 1d. With soil manganese content as the dependent variable, construct the total dataset Data=(Pred, S, Site, Soil_T) from Pred, the set S of soil manganese contents at the sampling point positions, Site, and Soil_T; meanwhile, obtain the sampling density Density of the sampling points according to the area of the soil region to be predicted;
Step 1e. Taking sampling points as the objects of extraction, arbitrarily extract from the total dataset Data the data of a preset proportion of all sampling points to form the validation dataset; the remaining sampling points form the candidate training sampling point set;
Step 1f. Taking sampling points as the objects of extraction, extract from the total dataset Data the data of sampling points in the candidate training sampling point set to construct at least two training datasets, wherein one of the training datasets contains all sampling points in the candidate training sampling point set and each remaining training dataset contains a subset of the sampling points in the candidate training sampling point set.
3. The soil sulphur element content prediction method based on soil types merger and multiple regression according to claim 2, characterized in that, in step 1a, every pair of neighbouring sampling points in the soil region to be predicted is processed as follows, updating the projected coordinates of sampling points in the soil region to be predicted so that the Euclidean distance between neighbouring sampling points is not less than d: if the Euclidean distance between sampling points p1 and p2 is less than d, let n1 be the number of sampling points within radius d of p1 and n2 the number of sampling points within radius d of p2; if n1 < n2, adjust the projected coordinates of p1 so that the Euclidean distance between p1 and p2 is d + g; if n1 ≥ n2, adjust the projected coordinates of p2 so that the Euclidean distance between p1 and p2 is d + g; where d is the spatial resolution of the specified soil information indices in the soil region to be predicted and g is a preset adjustment distance.
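The adjustment rule of claim 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `separate_close_points` is hypothetical, the moved point is assumed to be shifted along the line joining the pair (the claim does not state a direction), the neighbour counts include the point itself, and a single pass is made, so a moved point may end up close to a third point.

```python
import numpy as np

def separate_close_points(coords, d, g):
    """Push apart sampling points that lie closer than d.

    coords: (n, 2) array of projected coordinates
    d: minimum allowed distance (spatial resolution of the indices)
    g: preset adjustment distance
    """
    pts = np.asarray(coords, dtype=float).copy()
    n = len(pts)
    for i in range(n):
        for j in range(i + 1, n):
            delta = pts[j] - pts[i]
            dist = np.hypot(delta[0], delta[1])
            if dist >= d:
                continue
            # n1, n2: number of sampling points within radius d of p1 and p2
            n1 = np.sum(np.linalg.norm(pts - pts[i], axis=1) <= d)
            n2 = np.sum(np.linalg.norm(pts - pts[j], axis=1) <= d)
            # Assumption: move along the line joining the pair; coincident
            # points fall back to the +x direction.
            unit = delta / dist if dist > 0 else np.array([1.0, 0.0])
            if n1 < n2:
                pts[i] = pts[j] - unit * (d + g)  # move p1 away from p2
            else:
                pts[j] = pts[i] + unit * (d + g)  # move p2 away from p1
    return pts
```

After the call, each adjusted pair is separated by exactly d + g at the moment it is processed; a production version would probably repeat the pass until no pair is closer than d.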
4. The soil sulphur element content prediction method based on soil types merger and multiple regression according to claim 2, characterized in that, in step 1b, Pred is standardized using the Z-score standardization method so that its data conform to a standard normal distribution.
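A minimal sketch of the Z-score standardization of Pred, assuming Pred is held as an n × K NumPy array with one column per soil information index. The function name `zscore` and the handling of constant columns are assumptions made for illustration.

```python
import numpy as np

def zscore(pred):
    """Standardize each column of pred to zero mean and unit variance."""
    pred = np.asarray(pred, dtype=float)
    mean = pred.mean(axis=0)
    std = pred.std(axis=0, ddof=0)
    std[std == 0] = 1.0  # assumption: constant columns are mapped to zeros
    return (pred - mean) / std
```

Note that the transform only rescales each column to zero mean and unit variance; it does not by itself change the shape of the distribution.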
5. The soil sulphur element content prediction method based on soil types merger and multiple regression according to claim 2, characterized in that, in step 1f, taking sampling points as the unit of extraction, four training data sets are constructed by extracting from the total data set Data the data of sampling points in the candidate training sampling point set, the numbers of rows in the four training data sets being r, 50% × r, 25% × r and 12% × r respectively, where r is the number of sampling points in the candidate training sampling point set.
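The hold-out split of step 1e and the four training data sets of this claim can be sketched on sampling point indices as follows. The function name `split_sampling_points`, the 20% validation proportion (the claims only say "preset proportion"), the rounding of the subset sizes, and the independent random draw of each subset are all assumptions.

```python
import numpy as np

def split_sampling_points(n_points, validation_fraction=0.2,
                          fractions=(1.0, 0.5, 0.25, 0.12), seed=0):
    """Return validation indices and a list of training index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_points)
    n_val = int(round(validation_fraction * n_points))
    validation_idx = order[:n_val]   # step 1e: validation data set
    candidate_idx = order[n_val:]    # candidate training sampling points
    r = len(candidate_idx)
    training_sets = []
    for f in fractions:
        # One set keeps all r points; the others keep a fraction of them.
        size = r if f == 1.0 else int(round(f * r))
        training_sets.append(rng.choice(candidate_idx, size=size, replace=False))
    return validation_idx, training_sets
```

The returned index arrays would then be used to select the corresponding rows of Data = (Pred, S, Site, Soil_T).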
6. The soil sulphur element content prediction method based on soil types merger and multiple regression according to claim 2, characterized in that step 2 comprises the following steps:
Step 2a. For each training data set, calculate the coefficient of variation CV_m of soil manganese content under each soil type level of the training data set according to the following formula:
$$CV_m = \frac{SD_m}{Mean_m}$$
Then, from the coefficients of variation CV_m of soil manganese content under the soil type levels of the training data set, construct the soil manganese content coefficient-of-variation data set of the training data set based on soil type level, i.e. obtain for each training data set the soil manganese content coefficient-of-variation data set CV_Soil_T based on soil type level; where SD_m and Mean_m are the standard deviation and mean of soil manganese content under the m-th soil type level;
Step 2b. For each training data set, apply Duncan's multiple range test to the soil data of the soil type levels of the training data set to analyse the significance of differences among the groups, obtaining the Duncan analysis result Dun_S of each training data set;
Step 2c. For each training data set, merge the soil types of the training data set according to its Duncan analysis result Dun_S and its soil type levels, recalculate the coefficient of variation of soil manganese content under each merged soil type level, and construct the soil manganese content coefficient-of-variation data set of the training data set based on merged soil type levels, i.e. obtain for each training data set the soil manganese content coefficient-of-variation data set CV_Soil_T_Dun based on merged soil type levels;
Step 2d. From the data sets CV_Soil_T_Dun of the training data sets, construct a histogram of the soil manganese content coefficients of variation covering all sampling points of all training data sets;
Step 2e. Using principal component analysis and the soil type level of each sampling point in each training data set, construct for all training data sets a principal component analysis scatter plot for each soil type level;
Step 2f. Based on the soil manganese content coefficient-of-variation histogram and the principal component analysis scatter plots, determine whether soil type significantly influences the spatial variability of the soil attribute at the different sampling densities.
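Steps 2a and 2c can be sketched with pandas as below. This is an illustration under stated assumptions: the column names `soil_type` and `mn` are hypothetical, the sample standard deviation (ddof=1) is assumed, and the Duncan test itself is not computed here. Its result is assumed to be supplied as a list of soil type pairs that do not differ significantly, and pairs are merged transitively, which the claims do not specify.

```python
import pandas as pd

def cv_by_soil_type(df, type_col="soil_type", value_col="mn"):
    """Coefficient of variation SD_m / Mean_m for each soil type level."""
    grouped = df.groupby(type_col)[value_col]
    return grouped.std(ddof=1) / grouped.mean()

def merge_soil_types(types, similar_pairs):
    """Map each soil type to a merged label using union-find.

    similar_pairs: pairs of types judged not significantly different
    (assumed to come from the Duncan analysis result Dun_S).
    """
    parent = {t: t for t in types}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path compression
            t = parent[t]
        return t

    for a, b in similar_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the smaller label
    return {t: find(t) for t in types}
```

Applying the returned mapping to the soil type column and calling `cv_by_soil_type` again would give the recomputed coefficients of variation for the merged levels, corresponding to CV_Soil_T_Dun.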
7. The soil sulphur element content prediction method based on soil types merger and multiple regression according to claim 6, characterized in that step 3 comprises the following steps:
Step 3a. Obtain the optimal independent variable set Pred_OLS of each training data set by stepwise regression;
Step 3b. For each training data set, fit a model to the training data set by geographically weighted regression using its optimal independent variable set Pred_OLS, obtaining the regression coefficient set Coeff of the sampling points of the training data set;
Step 3c. For each training data set, calculate from the training data set and its optimal independent variable set Pred_OLS the standard deviation of the corresponding multiple regression model, i.e. obtain the standard deviation STD_MLR of the multiple regression model of each training data set;
Step 3d. For each training data set, according to the following formula:
calculate the stationarity index SI of every independent variable of the training data set, i.e. obtain for each training data set the stationarity index SI of each of its independent variables, where INTQ_GWR in the formula is the interquartile range of the regression coefficients of the sampling points of the training data set;
Step 3e. Calculate the mean Average_SI of the stationarity indices SI of all independent variables of each training data set;
Step 3f. Compare Average_SI with 1 to judge whether the model built from the independent variables and the dependent variable satisfies second-order stationarity.
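The formula for the index SI in step 3d is not reproduced in the text. The sketch below assumes the form commonly used for a stationarity index in the geographically weighted regression literature, SI = INTQ_GWR / (2 × STD_MLR); this form, the function names, and the reading that a mean value above 1 points to non-stationarity are assumptions and are not confirmed by the claims.

```python
import numpy as np

def stationarity_index(gwr_coeffs, std_mlr):
    """Assumed index: IQR of the local coefficients over twice STD_MLR.

    gwr_coeffs: (n_points, n_vars) array of local regression coefficients
    std_mlr: scalar or (n_vars,) array from the global multiple regression
    """
    q75, q25 = np.percentile(np.asarray(gwr_coeffs, dtype=float), [75, 25], axis=0)
    return (q75 - q25) / (2.0 * np.asarray(std_mlr, dtype=float))

def average_si(si_values):
    """Mean of the per-variable indices, to be compared with 1 (step 3f)."""
    return float(np.mean(si_values))
```

With real data, `gwr_coeffs` would be the coefficient set Coeff from step 3b and `std_mlr` the value STD_MLR from step 3c.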
8. The soil sulphur element content prediction method based on soil types merger and multiple regression according to claim 7, characterized in that step 4 comprises the following steps:
Step 4a. For each training data set, construct the optimal semivariogram model of the training data set by geostatistical analysis, and calculate its effective spatial range and its coefficient of determination;
Step 4b. If the coefficients of determination of the optimal semivariogram models of all training data sets are all greater than 0.5 and their nugget-to-sill ratios are all less than 25%, all training data sets use geostatistical models and the method proceeds to step 4c; otherwise no training data set uses a geostatistical model and the method proceeds to step 4d;
Step 4c. According to the conclusions of step 2, step 3 and step 4b, i.e. the degree to which soil type influences the spatial variability of the soil attribute and the result of the second-order stationarity test, select the prediction models as follows:
If soil type does not significantly influence the spatial variability of the soil attribute and the second-order stationarity assumption is satisfied, select multiple regression and geostatistics to form the prediction model set;
If soil type does not significantly influence the spatial variability of the soil attribute and the second-order stationarity assumption is not satisfied, select multiple regression and geographically weighted regression to form the prediction model set;
If soil type significantly influences the spatial variability of the soil attribute and the second-order stationarity assumption is satisfied, select multiple regression and the partitioned geostatistical model to form the prediction model set;
If soil type significantly influences the spatial variability of the soil attribute and the second-order stationarity assumption is not satisfied, select multiple regression, geographically weighted regression and partitioned geographically weighted regression to form the prediction model set;
Step 4d. According to the conclusions of step 2, step 3 and step 4b, i.e. the degree to which soil type influences the spatial variability of the soil attribute and the result of the second-order stationarity test, select the prediction models as follows:
If soil type does not significantly influence the spatial variability of the soil attribute and the second-order stationarity assumption is satisfied, select multiple regression and regression kriging to form the prediction model set;
If soil type does not significantly influence the spatial variability of the soil attribute and the second-order stationarity assumption is not satisfied, select multiple regression and geographically weighted regression to form the prediction model set;
If soil type significantly influences the spatial variability of the soil attribute and the second-order stationarity assumption is satisfied, select multiple regression to form the prediction model set;
If soil type significantly influences the spatial variability of the soil attribute and the second-order stationarity assumption is not satisfied, select multiple regression, geographically weighted regression and partitioned geographically weighted regression to form the prediction model set.
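The selection rules of steps 4b to 4d amount to a lookup on three yes/no conditions, which can be sketched as follows. The function names, the dictionary `MODEL_SETS` and the model labels are hypothetical; the thresholds 0.5 and 25% and the eight model sets are taken from the claim text.

```python
def use_geostatistics(r2_values, nugget_sill_ratios):
    """Step 4b: every training data set must pass both thresholds."""
    return (all(r2 > 0.5 for r2 in r2_values)
            and all(ratio < 0.25 for ratio in nugget_sill_ratios))

MR = "multiple regression"
GWR = "geographically weighted regression"
PGWR = "partitioned geographically weighted regression"

# Keys: (geostatistics usable, soil type influence significant,
#        second-order stationarity satisfied)
MODEL_SETS = {
    (True, False, True): [MR, "geostatistics"],                    # step 4c
    (True, False, False): [MR, GWR],
    (True, True, True): [MR, "partitioned geostatistical model"],
    (True, True, False): [MR, GWR, PGWR],
    (False, False, True): [MR, "regression kriging"],              # step 4d
    (False, False, False): [MR, GWR],
    (False, True, True): [MR],
    (False, True, False): [MR, GWR, PGWR],
}

def select_models(geostat_ok, type_significant, stationary):
    """Return the prediction model set for the three test outcomes."""
    key = (bool(geostat_ok), bool(type_significant), bool(stationary))
    return list(MODEL_SETS[key])
```

For example, `select_models(use_geostatistics([0.7, 0.6], [0.1, 0.2]), True, True)` yields multiple regression together with the partitioned geostatistical model.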
9. The soil sulphur element content prediction method based on soil types merger and multiple regression according to claim 8, characterized in that step 5 comprises the following steps:
Step 5a. Using the training data set whose sampling points are all of the sampling points in the candidate training sampling point set, train the models in the prediction model set obtained in step 4 to obtain the trained prediction model set;
Step 5b. Use the models in the trained prediction model set to predict the soil manganese content of each sampling point in the validation data set and, from the validation data set, calculate the root-mean-square error of the predictions of each model in the prediction model set;
Step 5c. Repeat step 5b a preset number of times, calculate the mean root-mean-square error over the cross-validation runs, and select the model with the lowest mean root-mean-square error as the optimal prediction model, forming the optimal prediction model set;
Step 5d. According to the optimal independent variable set Pred_OLS of each training data set obtained in step 3a, obtain the independent variable data of the unsampled points in the soil region to be predicted;
Step 5e. Using the optimal prediction model set obtained in step 5c and the independent variable data obtained in step 5d, predict the soil manganese content of the unsampled points in the soil region to be predicted, thereby predicting soil manganese content for the soil region to be predicted and obtaining the soil manganese content spatial distribution map of the soil region to be predicted.
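The error measure of step 5b and the selection of step 5c can be sketched as below. The function names and the dictionary layout (model name mapped to a list of root-mean-square errors, one per repeated validation run) are assumptions made for illustration.

```python
import numpy as np

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted contents."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def pick_best_model(rmse_runs):
    """Return the model with the lowest mean RMSE and all mean RMSEs.

    rmse_runs: dict mapping a model name to its RMSE in each repeated run.
    """
    means = {name: float(np.mean(values)) for name, values in rmse_runs.items()}
    best = min(means, key=means.get)
    return best, means
```

The selected model would then be applied to the independent variable data of the unsampled points in step 5e.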
10. The soil sulphur element content prediction method based on soil types merger and multiple regression according to claim 9, characterized in that, in step 5a, if the prediction model set obtained in step 4 includes partitioned geographically weighted regression or a partitioned geostatistical model, the partitioned geographically weighted regression is trained as follows:
The region of each merged soil type is taken as a separate study region, and the local model is trained with the following soil-type-based partitioned geographically weighted regression model:
where (u_i, v_i) are the coordinates of sampling point i; β_0(u_i, v_i), β_km(u_i, v_i) and ε_i are the constant term, the local regression coefficient and the prediction residual of the local regression respectively; m is the soil type level index; M is the total number of soil type levels; and x_ikm is the k-th specified soil information index of sampling point i under the m-th soil type level.
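The model equation referred to in this claim is not reproduced in the text. One form consistent with the symbol definitions given in the claim is shown below as an assumption, not as the patent's own formula: the dependent-variable symbol y_i (soil manganese content at sampling point i) and the summation over k = 1, …, K are assumed.

```latex
y_i = \beta_0(u_i, v_i)
      + \sum_{m=1}^{M} \sum_{k=1}^{K} \beta_{km}(u_i, v_i)\, x_{ikm}
      + \varepsilon_i
```

Under this reading, each merged soil type level m has its own set of local coefficients β_km, estimated from the sampling points that fall inside that soil type's region.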
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710099018.5A CN106980603B (en) | 2017-02-23 | 2017-02-23 | Soil sulphur element content prediction method based on soil types merger and multiple regression |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106980603A CN106980603A (en) | 2017-07-25 |
CN106980603B true CN106980603B (en) | 2019-05-17 |
Family
ID=59339202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710099018.5A Active CN106980603B (en) | 2017-02-23 | 2017-02-23 | Soil sulphur element content prediction method based on soil types merger and multiple regression |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106980603B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107608938B (en) * | 2017-08-08 | 2020-12-08 | 安徽师范大学 | Factor screening method for binary classification based on enhanced regression tree algorithm |
CN107766295A (en) * | 2017-09-05 | 2018-03-06 | 湖北大学 | A kind of method and system that soil nutrients variability data are carried out with special datum processing |
CN109711652A (en) * | 2017-10-26 | 2019-05-03 | 厦门一品威客网络科技股份有限公司 | A kind of Chuan Ke team potential methods of marking |
CN107944387B (en) * | 2017-11-22 | 2021-12-17 | 重庆邮电大学 | Method for analyzing spatial heterogeneity of urban heat island based on semi-variation theory |
CN109063895A (en) * | 2018-06-27 | 2018-12-21 | 李林 | Based on beneficial organism content prediction method in soil types merger and soil |
CN111540418A (en) * | 2019-11-14 | 2020-08-14 | 中国科学院地理科学与资源研究所 | Method and system for predicting probability value of excessive arsenic in plant |
CN111508569B (en) * | 2020-03-19 | 2023-05-09 | 中国科学院南京土壤研究所 | Target soil property content prediction method based on soil transfer function |
CN112382349B (en) * | 2020-11-10 | 2022-03-25 | 中国石油大学(华东) | Method for judging origin of basalt from EM I type or EM II type mantle |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104408258B (en) * | 2014-12-01 | 2017-12-15 | 四川农业大学 | The large scale soil organic matter spatial distribution analogy method of integrated environment factor |
CN104764868B (en) * | 2015-04-02 | 2016-07-06 | 中国科学院南京土壤研究所 | A kind of soil organic matter Forecasting Methodology based on Geographical Weighted Regression |
CN105911037B (en) * | 2016-04-19 | 2018-05-15 | 湖南科技大学 | Manganese and association heavy metal distribution Forecasting Methodology in the soil water termination contaminated stream of Manganese Ore District |
Also Published As
Publication number | Publication date |
---|---|
CN106980603A (en) | 2017-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106980603B (en) | Soil sulphur element content prediction method based on soil types merger and multiple regression | |
CN111508569B (en) | Target soil property content prediction method based on soil transfer function | |
Bouma | Using soil survey data for quantitative land evaluation | |
CN103106347B (en) | A kind of agricultural area source phosphorus based on soil attribute space distribution pollutes evaluation method | |
Burrough et al. | The state of the art in pedometrics | |
CN105699624B (en) | A kind of Soil Carbon Stock evaluation method based on soil genetic horizon thickness prediction | |
US10830750B2 (en) | Functional soil maps | |
CN107103378B (en) | Corn planting environment test site layout method and system | |
CN108446715A (en) | A kind of heavy metal pollution of soil Source Apportionment, system and device | |
CN106980750B (en) | A kind of Soil Nitrogen estimation method of reserve based on typical correspondence analysis | |
CN104750884A (en) | Quantitative evaluation method of shale oil and gas enrichment index on the basis of multi-factor nonlinear regression | |
Della Chiesa et al. | Farmers as data sources: Cooperative framework for mapping soil properties for permanent crops in South Tyrol (Northern Italy) | |
CN113360587B (en) | Land surveying and mapping equipment and method based on GIS technology | |
Jianchang et al. | Validation of an agricultural non-point source (AGNPS) pollution model for a catchment in the Jiulong River watershed, China | |
CN108764527A (en) | A kind of Soil organic carbon pool space-time dynamic prediction suitable environment Variable Selection method | |
CN106528788A (en) | Method for analyzing space distribution feature of ground rainfall runoff pollution based on GIS (Geographic Information System) technology | |
CN114169161A (en) | Method and system for estimating space-time variation and carbon sequestration potential of soil organic carbon | |
CN114186491A (en) | Fine particulate matter concentration space-time characteristic distribution method based on improved LUR model | |
CN113901348A (en) | Oncomelania snail distribution influence factor identification and prediction method based on mathematical model | |
CN109063895A (en) | Based on beneficial organism content prediction method in soil types merger and soil | |
CN105891442A (en) | Soil texture particle content predicting method | |
CN107063210A (en) | The mapping method of project is renovated in a kind of reallocation of land | |
CN114254802B (en) | Prediction method for vegetation coverage space-time change under climate change drive | |
KR20050063615A (en) | Method for providing surface roughness in geographic information system | |
CN111401683B (en) | Method and device for measuring tradition of ancient villages |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||