CN110531054B - Soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling
- Publication number
- CN110531054B (application CN201910931442.0A)
- Authority
- CN
- China
- Prior art keywords
- soil
- sample
- organic carbon
- sampling
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N1/00—Sampling; Preparing specimens for investigation
- G01N1/28—Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q
- G01N1/286—Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q involving mechanical work, e.g. chopping, disintegrating, compacting, homogenising
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N1/00—Sampling; Preparing specimens for investigation
- G01N1/28—Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q
- G01N1/34—Purifying; Cleaning
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
- G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
- G01N21/359—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/24—Earth materials
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N1/00—Sampling; Preparing specimens for investigation
- G01N1/28—Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q
- G01N1/286—Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q involving mechanical work, e.g. chopping, disintegrating, compacting, homogenising
- G01N2001/2866—Grinding or homogeneising
Abstract
The invention provides a soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling, which comprises the following steps: collecting and pretreating soil samples; acquiring and preprocessing the organic carbon content data and the soil hyperspectral data of the soil samples; establishing a soil organic carbon prediction model from the organic carbon content data and the soil hyperspectral data by partial least squares regression; randomly sampling the original measured sample data with replacement, each draw yielding one sub-sample, and calculating an estimate of the parameter of each sub-sample from a matrix of measured and predicted values; repeatedly drawing a given number of samples from the original samples with replacement by the Bootstrap resampling technique; evaluating the uncertainty of the soil organic carbon prediction model from the parameters of the Bootstrap samples; and evaluating the accuracy of the model. The method mitigates the loss of prediction-model accuracy caused by sampling representativeness and spatial variability.
Description
Technical Field
The invention relates to the technical field of uncertainty statistical analysis of soil organic carbon prediction models, in particular to a soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling.
Background
Soil organic carbon is a key indicator for balancing the global carbon cycle and maintaining soil quality. Owing to natural and human factors, however, its content and spatial distribution are uncertain. This uncertainty introduces errors into the evaluation of soil organic carbon content, and those errors degrade both the prediction accuracy and the mapping precision of soil organic carbon.
Visible and near-infrared (Vis-NIR) spectroscopy provides a rapid and accurate proximal sensing method for estimating soil organic carbon; the whole process from data acquisition to predictive modeling is quite simple, and the cost of laboratory analysis is saved. Studies that exploit this interdisciplinary advantage, combining modeling with modern analysis techniques to predict, map and characterize the spatial variation of soil organic carbon, have obtained satisfactory results (Lishuo, 2010; Kuang and Mouazen, 2013; Camboule et al, 2014; Sithole et al, 2018; Zhou et al, 2019). It should not be overlooked, however, that some documents only mention the problem of uncertainty analysis (Simbahan et al, 2006; beam two, 2007; Helioscopy, 2012), or touch on uncertainty in their discussion or outlook sections, and lack a quantitative analysis of the uncertainty of the organic carbon prediction model. Throughout the soil organic carbon prediction process, sampling representativeness, the prediction model (training samples and model parameters) and the spatial variability of soil organic carbon as a natural attribute all reduce the accuracy of the predicted maps, so that practical goals are difficult to achieve.
At the end of the 20th century, with the development of uncertainty theory, generalized likelihood uncertainty estimation (GLUE; Beven and Binley, 1992), Markov chain Monte Carlo methods (Kuczera and Parent, 1998), BaRE (Thiemann et al, 2001), the Bootstrap method (Pan and Politis, 2016) and similar methods came into increasingly wide use for evaluating model uncertainty. For example, Linqing (2011), Muleta et al (2013), Sellami et al (2013) and Tian et al (2014) adopted the GLUE method to analyze the parameter uncertainty of water and solute transport models, and other studies (2018; 2019) applied the Bootstrap method to vehicle insurance rates and to the optimization of soil parameter sensor positions. These studies achieved good results, but such methods have not yet been seen in uncertainty analyses of soil organic carbon prediction models. Bootstrap is a nonparametric resampling method, and because its sampling does not depend on assumptions about the parameter distribution, it has become an important method for studying uncertainty in recent years. For soil organic carbon predictive modeling, studies abroad (2013; Hoffmann et al, 2014) indicate that when Vis-NIR is used for predictive mapping of soil organic carbon, the uncertainty comes mainly from the modeling samples and from the propagation process from prediction to mapping. The invention focuses on the modeling samples and introduces the Bootstrap method to evaluate the uncertainty of the soil organic carbon prediction model.
Disclosure of Invention
To address the technical problem that the uncertainty of existing soil organic carbon prediction models lacks quantitative analysis, the invention provides a soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling. It introduces the Bootstrap parameter estimation method of classical statistics into soil science, realizes uncertainty estimation for the soil organic carbon prediction model, and mitigates the loss of accuracy caused by sampling representativeness, the prediction model (training samples and model parameters) and spatial variability.
In order to achieve the purpose, the technical scheme of the invention is realized as follows: a soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling comprises the following steps:
step one: collecting and pretreating a soil sample: after the soil is leveled, a surface soil sample is obtained by a 5-point mixed sampling method, and is naturally air-dried, ground and sieved;
step two: acquiring organic carbon content data of the soil sample in the first step: measuring the organic carbon content of the soil sample in the step one by adopting a total organic carbon analyzer, and storing the organic carbon content in a computer in a file form;
step three: acquiring soil hyperspectral data of the soil sample in the step one and preprocessing the soil hyperspectral data;
step four: constructing a prediction model: establishing a soil organic carbon prediction model of the organic carbon content data of the soil sample obtained in the step two and the soil hyperspectral data preprocessed in the step three by adopting a partial least square regression method, constructing an actual measured value and a predicted value of the obtained soil organic carbon into a matrix vector, and calculating to obtain a matrix vector parameter;
step five: the idea of the Bootstrap sampling method is as follows: suppose the sample $X = (X_1, X_2, \ldots, X_n)$ is drawn from a population with distribution $F$, and $R(X, F)$ is a known function of the sample $X$ and the distribution $F$; given the measured values $x = (x_1, x_2, \ldots, x_n)$ of the sample, the uncertainty distribution properties of the function $R(X, F)$ are analyzed from these measured values; first, the original measured sample data are randomly sampled with replacement, each draw yielding one sub-sample, where the measured sample has size $n$ and each original datum is drawn with probability $1/n$; after the Bootstrap samples are determined, the estimate $\hat{\theta}_i$ of the parameter of each sub-sample is calculated by constructing the matrix of measured and predicted values, and these sub-sample estimates form a row vector $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_q)$;
Step six: bootstrap sampling: sampling a certain number of samples from original samples by adopting a Bootstrap sampling technology, repeating the sampling process, and recording the sampling times q;
step seven: from the row vector $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_q)$ of the $q$ estimates of the sample parameter obtained by Bootstrap sampling, the mean $\bar{\theta}$ and variance $\sigma^2$ of the sample parameter are calculated to evaluate the uncertainty of the soil organic carbon prediction model;
step eight: evaluating the model precision: the coefficient of determination $R^2$, the root mean square error RMSE and the ratio of performance to deviation RPD are used to evaluate the soil organic carbon prediction model established in step four and to determine its robustness.
In step one, the soil samples are taken from the 0-0.20 m soil layer below the ground surface, and 42 soil samples are collected after the corn harvest in 2018; the 5-point mixed sampling method is as follows: select an area of 1 square meter, take the intersection of the two diagonals of the area as the first, central sampling point, then select four points on the diagonals equidistant from the central point as the second to fifth sampling points, and mix the samples obtained at the five sampling points to obtain one soil sample.
In step three, the surface spectral reflectance of each soil sample from step one is acquired with the high-density reflection probe of an ASD spectrometer; 3 points are randomly selected on each soil sample, 10 spectra are measured at each point, and the arithmetic mean of the 30 spectra is taken as the actual reflectance spectrum; the noisy edge bands are then removed, and the data are preprocessed by derivative transformation and Savitzky-Golay smoothing.
The spectral range of the ASD spectrometer is 350 nm-2500 nm, covering the visible and near-infrared bands; the actual reflectance spectrum is obtained by deleting the noisy edge bands at 350-399 nm and 2451-2500 nm and retaining the 400-2450 nm reflectance spectrum; the derivative transformation is a first-order differential transformation; Savitzky-Golay smoothing takes $m$ consecutive data points around a position $p$ in the hyperspectral data to be processed, selects a fitting order $D$ and performs a least squares fit, takes the value of the fitted curve at the center point of the data window as the smoothed hyperspectral value, then moves the window and repeats the process until all hyperspectral data have been processed; the processing formula is:
$$x_{p,\mathrm{smooth}} = \sum_{j=0}^{m} a_j\, x_{k+j}, \quad j = 0, 1, \ldots, m;$$
wherein $x_{p,\mathrm{smooth}}$ denotes the smoothed hyperspectral value at position $p$, $x_{k+j}$ denotes the hyperspectral value at position $k+j$, $a_j$ denotes the smoothing coefficient of the $j$-th position, and $k$ denotes the $k$-th position.
The soil organic carbon prediction model established by partial least squares regression in step four is used as follows: the hyperspectral data are substituted into the soil organic carbon prediction model to obtain the predicted soil organic carbon values; the matrix vector takes the form
$$A = \begin{pmatrix} x_{11} & \hat{x}_{12} \\ x_{21} & \hat{x}_{22} \\ \vdots & \vdots \\ x_{n1} & \hat{x}_{n2} \end{pmatrix},$$
the matrix vector parameter $\theta$ is the correlation coefficient of the two column vectors, $x_{i1}$ denotes the measured value, $\hat{x}_{i2}$ denotes the predicted value, and $n$ denotes the number of samples.
The population distribution $F$ in step five corresponds to the matrix vector parameter $\theta$ in step four, and $F_n(x)$ is the empirical distribution function of the measured sample $X$:
$$F_n(x) = \begin{cases} 0, & x < x_{(1)} \\ \dfrac{n_1 + n_2 + \cdots + n_l}{n}, & x_{(l)} \le x < x_{(l+1)},\ l = 1, 2, \ldots, r-1 \\ 1, & x \ge x_{(r)} \end{cases}$$
where the measured samples $x_1, x_2, \ldots, x_n$ are arranged from small to large as the distinct values $x_{(1)} < x_{(2)} < \cdots < x_{(r)}$, the value $x_{(l)}$ occurs with frequency $n_l$, $l = 1, 2, \ldots, r$, and $n_1 + n_2 + \cdots + n_r = n$; $x$ denotes a measured sample value, $x_{(l)}$, $x_{(l+1)}$ and $x_{(r)}$ denote the measured samples at the $l$-th, $(l+1)$-th and $r$-th positions after sorting by size, $l$ denotes the $l$-th sorted position, and $r$ denotes a natural number not exceeding $n$.
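The mean $\bar{\theta}$ and variance $\sigma^2$ of the sample parameter $\hat{\theta}$ of the Bootstrap samples are calculated as follows:
$$\bar{\theta} = \frac{1}{q}\sum_{i=1}^{q}\hat{\theta}_i \tag{3}$$
$$\sigma^2 = \frac{1}{q-1}\sum_{i=1}^{q}\left(\hat{\theta}_i - \bar{\theta}\right)^2 \tag{4}$$
The mean $\bar{\theta}$ and variance $\sigma^2$ are used to estimate the degree of deviation of the matrix vector parameter $\theta$; the coefficient of variation CV of the matrix vector parameter is:
$$CV = \frac{\sigma}{\bar{\theta}} \times 100\% \tag{5}$$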
The coefficient of determination $R^2$, the root mean square error RMSE and the ratio RPD in step eight are calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2}{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \tag{6}$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2} \tag{7}$$
$$RPD = \frac{SD}{RMSE} \tag{8}$$
wherein $x_i$ and $\hat{x}_i$ are respectively the measured and predicted values of soil organic carbon, $\bar{x}$ is the mean of the soil organic carbon samples, $SD$ is the standard deviation of the measured values, and $n$ is the number of samples; the larger the coefficient of determination $R^2$, the smaller the root mean square error RMSE and the larger the ratio RPD, the better the soil organic carbon prediction model.
The beneficial effects of the invention are as follows: the method addresses the estimation and quantification of uncertainty in soil organic carbon prediction, in particular when soil organic carbon is modeled and predicted from a small sample. After the soil spectral data acquired by ASD hyperspectral measurement are preprocessed, a prediction model relating the soil hyperspectral data to the measured soil organic carbon is established by partial least squares regression. The collected sample data are then resampled with replacement by the Bootstrap method, and the mean, standard error and 95% confidence interval of the parameter of the soil organic carbon prediction model are calculated, thereby evaluating the uncertainty of the model. By applying the Bootstrap parameter estimation method of statistics to the uncertainty evaluation of the soil organic carbon prediction model, the invention mitigates the loss of prediction-model accuracy caused by sampling representativeness and spatial variability, and the research results help to address the difficulty of accurately estimating soil organic carbon stocks at national and global scales.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a schematic diagram of the Bootstrap sampling method in FIG. 1.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, a soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling comprises the following steps:
Step one: collecting and pretreating a soil sample: after the soil is leveled, a surface soil sample is obtained by a 5-point mixed sampling method, and is naturally air-dried, ground and sieved for later use.
The soil samples are taken from the 0-0.20 m soil layer below the ground surface, and 42 soil samples were collected after the corn harvest in 2018. The 5-point mixed sampling method is implemented as follows: select an area of 1 square meter, take the intersection of the two diagonals of the area as the first, central sampling point, and then select four points on the diagonals equidistant from the central point as the second to fifth sampling points. The samples obtained at the five sampling points are then mixed to form one soil sample.
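The 5-point layout described above can be sketched as follows. This is only an illustration: the function name `five_point_layout` and the `offset` parameter (how far the four outer points sit from the centre along each axis) are assumptions, since the text does not state the distance used.

```python
import math

def five_point_layout(side=1.0, offset=0.25):
    """Return five (x, y) sampling points inside a square plot.

    side   : side length of the plot in metres (1 m x 1 m in the text)
    offset : hypothetical per-axis distance from the centre to each of the
             four diagonal points; must be smaller than side / 2
    """
    if not 0 < offset < side / 2:
        raise ValueError("offset must lie strictly between 0 and side / 2")
    c = side / 2.0
    centre = (c, c)  # intersection of the two diagonals
    # Equal |dx| and |dy| keeps every outer point on a diagonal.
    outer = [(c + sx * offset, c + sy * offset) for sx in (-1, 1) for sy in (-1, 1)]
    return [centre] + outer

points = five_point_layout()
distances = [math.dist(points[0], p) for p in points[1:]]
```

All four entries of `distances` are equal, which is the "equidistant from the central point" condition.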
Step two: acquiring organic carbon content data of the soil sample from step one: the organic carbon content of the soil sample from step one is measured with a TOC-L CPH total organic carbon analyzer and stored on a computer as a file. The TOC-L CPH total organic carbon analyzer is manufactured by Shimadzu Corporation, Japan.
Step three: acquiring soil hyperspectral data of the soil sample in the first step and preprocessing the soil hyperspectral data: acquiring surface spectrum reflection information of the soil sample in the step one by adopting a high-density reflection probe of an ASD spectrometer, randomly selecting 3 points for measurement of each soil sample, measuring 10 spectra at each measurement point, and taking the arithmetic average value of 30 spectra as actual reflection spectrum data; then removing the edge wave band with larger noise, and adopting derivative transformation and S-G smoothing method to make pretreatment.
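The averaging of the 30 spectra per soil sample (3 measurement points, 10 spectra each) can be sketched as below. The readings are synthetic stand-in values and the function name `mean_spectrum` is an assumption; real data would come from the spectrometer export.

```python
import random
from statistics import fmean

def mean_spectrum(spectra):
    """Band-wise arithmetic mean of equally long reflectance spectra."""
    n_bands = len(spectra[0])
    if any(len(s) != n_bands for s in spectra):
        raise ValueError("all spectra must have the same number of bands")
    return [fmean(s[b] for s in spectra) for b in range(n_bands)]

# Synthetic stand-in data: 3 measurement points x 10 repeat spectra x 5 bands.
rng = random.Random(0)
readings = [[[0.3 + 0.01 * b + rng.gauss(0, 0.002) for b in range(5)]
             for _ in range(10)]
            for _ in range(3)]

# Flatten to the 30 spectra of one soil sample, then average band by band.
all_30 = [spectrum for point in readings for spectrum in point]
sample_spectrum = mean_spectrum(all_30)
```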
The ASD spectrometer is an ASD FieldSpec Pro FR spectrometer (Analytical Spectral Devices, Boulder, CO, USA) with a spectral range of 350 nm-2500 nm, covering the visible and near-infrared bands. Other similar types of spectrometers may also be used. The actual reflectance spectrum is obtained by deleting the noisy edge bands at 350-399 nm and 2451-2500 nm and retaining the 400-2450 nm reflectance spectrum. To reduce the influence of differences in the optical environment and in the grinding and sieving of samples on the data, preprocessing combines Savitzky-Golay smoothing (second-order polynomial, 11-point window) with a first-order differential transformation.
Savitzky-Golay smoothing takes $m$ consecutive data points (the window width) around a position $p$ in the hyperspectral data to be processed, selects a fitting order $D$ and performs a least squares fit, takes the value of the fitted curve at the center point of the data window as the smoothed hyperspectral value, and then moves the window and repeats the process until all hyperspectral data have been processed. The specific formula is as follows:
$$x_{p,\mathrm{smooth}} = \sum_{j=0}^{m} a_j\, x_{k+j}, \quad j = 0, 1, \ldots, m$$
wherein $x_{p,\mathrm{smooth}}$ denotes the smoothed hyperspectral value at position $p$, $x_{k+j}$ denotes the hyperspectral value at position $k+j$, $a_j$ denotes the smoothing coefficient of the $j$-th position, and $k$ denotes the $k$-th position.
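A minimal sketch of the second-order, 11-point Savitzky-Golay window is given below. It uses the standard closed-form weights for a quadratic fit and indexes the window symmetrically from $-5$ to $+5$ around the center, which is an assumption about how the window is positioned. The function names are illustrative, unit band spacing is assumed for the derivative, and edge positions without a full window are simply dropped. The text does not say how smoothing and differentiation are chained, so the two filters are shown separately.

```python
def savgol_quadratic_coeffs(half_width):
    """Closed-form Savitzky-Golay weights for a quadratic least-squares fit.

    Returns (smoothing weights, first-derivative weights) for a window of
    2 * half_width + 1 points centred on the evaluated position.
    """
    m = half_width
    offsets = range(-m, m + 1)
    denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    smooth = [3 * (3 * m * m + 3 * m - 1 - 5 * j * j) / denom for j in offsets]
    sum_j2 = sum(j * j for j in offsets)
    deriv = [j / sum_j2 for j in offsets]  # assumes unit spacing between bands
    return smooth, deriv

def apply_window(values, weights):
    """Weighted moving sum; only positions with a full window are returned."""
    m = len(weights) // 2
    return [sum(w * values[p + j] for w, j in zip(weights, range(-m, m + 1)))
            for p in range(m, len(values) - m)]

smooth_w, deriv_w = savgol_quadratic_coeffs(5)  # 11-point window
signal = [0.01 * i * i for i in range(30)]      # synthetic quadratic "spectrum"
smoothed = apply_window(signal, smooth_w)
first_derivative = apply_window(signal, deriv_w)
```

Because the fit is quadratic, a noise-free quadratic signal passes through the smoother unchanged, which makes a convenient sanity check.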
Step four: constructing a prediction model: a soil organic carbon prediction model is established by partial least squares regression from the organic carbon content data of the soil samples obtained in step two and the soil hyperspectral data preprocessed in step three; the measured values $x_i$ and predicted values $\hat{x}_i$ of soil organic carbon are assembled into a matrix vector $A$, and the matrix vector parameter $\theta$ is calculated.
The soil organic carbon prediction model established by partial least squares regression is used as follows: the hyperspectral data are substituted into the model to obtain the predicted soil organic carbon values; the matrix vector takes the form
$$A = \begin{pmatrix} x_{11} & \hat{x}_{12} \\ x_{21} & \hat{x}_{22} \\ \vdots & \vdots \\ x_{n1} & \hat{x}_{n2} \end{pmatrix},$$
the matrix vector parameter $\theta$ is the correlation coefficient of the two column vectors, and $x_{i1}$, $\hat{x}_{i2}$ and $n$ denote the measured value, the predicted value and the number of samples, respectively.
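Assuming the partial least squares predictions are already available, building the two-column matrix and computing its parameter as a Pearson correlation coefficient might look like the sketch below. The measured and predicted values are synthetic placeholders, and `pearson_r` is an illustrative helper rather than the patent's own routine.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Synthetic stand-in values (not data from the patent).
measured = [12.1, 15.4, 9.8, 20.3, 17.6, 11.2]
predicted = [12.9, 14.8, 10.5, 19.1, 18.2, 10.7]

# n x 2 matrix: column 0 holds measured values, column 1 predicted values.
A = [list(row) for row in zip(measured, predicted)]
theta = pearson_r([row[0] for row in A], [row[1] for row in A])
```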
Step five: the idea of the Bootstrap sampling method is as follows: suppose the sample $X = (X_1, X_2, \ldots, X_n)$ is drawn from a population with distribution $F$, and $R(X, F)$ is a known function of the sample $X$ and the distribution $F$. Given the measured values $x = (x_1, x_2, \ldots, x_n)$ of the sample, the uncertainty distribution properties of the function $R(X, F)$ are analyzed from these measured values. First, the original measured sample data are randomly sampled with replacement, and each draw is recorded as one sub-sample. If the measured sample has size $n$, each original datum is drawn with probability $1/n$. After the Bootstrap samples are determined, the estimate $\hat{\theta}_i$ of the parameter of each sub-sample is calculated by constructing the matrix vector $A$ of measured and predicted values, and these sub-sample estimates form a row vector $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_q)$.
As shown in FIG. 2, the population distribution $F$ corresponds to the matrix vector parameter $\theta$ in step four, and $F_n(x)$ is the empirical distribution function of the measured sample $X$. The empirical distribution function is defined as follows:
the measured samples $x_1, x_2, \ldots, x_n$ are arranged from small to large as the distinct values $x_{(1)} < x_{(2)} < \cdots < x_{(r)}$, where $x_{(l)}$ occurs with frequency $n_l$, $l = 1, 2, \ldots, r$, and $n_1 + n_2 + \cdots + n_r = n$; the sample distribution function, or empirical distribution function, of the population $X$ is then
$$F_n(x) = \begin{cases} 0, & x < x_{(1)} \\ \dfrac{n_1 + n_2 + \cdots + n_l}{n}, & x_{(l)} \le x < x_{(l+1)},\ l = 1, 2, \ldots, r-1 \\ 1, & x \ge x_{(r)} \end{cases}$$
wherein $x$ denotes a measured sample value; $x_{(l)}$, $x_{(l+1)}$ and $x_{(r)}$ denote the measured samples at the $l$-th, $(l+1)$-th and $r$-th positions after sorting by size; $l$ denotes the sorted position; and $r$ is a natural number not exceeding $n$.
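The empirical distribution function defined above is a step function that jumps by $n_l / n$ at each distinct sorted value. A small sketch, with an illustrative function name and toy data, is:

```python
from collections import Counter

def empirical_cdf(samples):
    """Return F_n, the empirical distribution function of the samples."""
    counts = Counter(samples)   # frequency n_l of each distinct value
    values = sorted(counts)     # distinct values x_(1) < ... < x_(r)
    n = len(samples)
    cumulative, running = [], 0
    for v in values:
        running += counts[v]
        cumulative.append(running / n)  # (n_1 + ... + n_l) / n

    def F(x):
        result = 0.0                    # F_n(x) = 0 below x_(1)
        for v, c in zip(values, cumulative):
            if x >= v:
                result = c
            else:
                break
        return result

    return F

F_n = empirical_cdf([3.0, 1.0, 2.0, 2.0])  # toy sample with one repeated value
```

Drawing from this $F_n$ is the same as drawing one of the $n$ measured values with probability $1/n$ each, which is why sampling with replacement realizes the Bootstrap.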
Step six: Bootstrap sampling: a given number of samples is drawn with replacement from the original samples by the Bootstrap sampling technique, the sampling process is repeated, and the number of repetitions $q$ is recorded; in this experiment, $q = 50$.
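The repeated resampling can be sketched as follows, under stated assumptions: each sub-sample has the same size $n$ as the original sample (the text only says "a given number"), the statistic is the correlation between the measured and predicted columns, and the rows are synthetic placeholders. Function names and the `seed` argument are illustrative.

```python
import math
import random

def pearson_r(rows):
    """Pearson correlation between column 0 (measured) and column 1 (predicted)."""
    n = len(rows)
    mean_x = sum(r[0] for r in rows) / n
    mean_y = sum(r[1] for r in rows) / n
    cov = sum((r[0] - mean_x) * (r[1] - mean_y) for r in rows)
    var_x = sum((r[0] - mean_x) ** 2 for r in rows)
    var_y = sum((r[1] - mean_y) ** 2 for r in rows)
    return cov / math.sqrt(var_x * var_y)

def bootstrap_estimates(rows, statistic, q=50, seed=0):
    """Draw q sub-samples with replacement and evaluate the statistic on each."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    while len(estimates) < q:
        # Every row is drawn with probability 1/n, with replacement.
        subsample = [rows[rng.randrange(n)] for _ in range(n)]
        if len(set(subsample)) < 2:
            continue  # degenerate draw (one row repeated n times): redraw
        estimates.append(statistic(subsample))
    return estimates

# Synthetic (measured, predicted) pairs, not data from the patent.
A = [(12.1, 12.9), (15.4, 14.8), (9.8, 10.5), (20.3, 19.1),
     (17.6, 18.2), (11.2, 10.7), (14.0, 13.1), (18.9, 19.6)]
theta_hat = bootstrap_estimates(A, pearson_r, q=50)
```

Resampling whole rows keeps each measured value paired with its own prediction, so the statistic is always computed on matched pairs.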
Step seven: from the row vector $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_q)$ of the $q$ estimates of the sample parameter obtained by Bootstrap sampling, the mean $\bar{\theta}$ and variance $\sigma^2$ of the parameter are calculated, and the uncertainty of the soil organic carbon prediction model is evaluated.
The mean $\bar{\theta}$ and variance $\sigma^2$ of the sample parameter $\hat{\theta}$ of the Bootstrap samples are calculated as follows:
$$\bar{\theta} = \frac{1}{q}\sum_{i=1}^{q}\hat{\theta}_i \tag{3}$$
$$\sigma^2 = \frac{1}{q-1}\sum_{i=1}^{q}\left(\hat{\theta}_i - \bar{\theta}\right)^2 \tag{4}$$
Using formulas (3) and (4), the sample parameter $\hat{\theta}$ obtained from 50 rounds of Bootstrap sampling ranges from 0.7116 to 1.0958. The mean $\bar{\theta}$ and variance $\sigma^2$, which are used to estimate the degree of deviation of the matrix vector parameter $\theta$, are 0.942 and 0.0080 respectively. The probability plot of the matrix vector parameter of the samples is shown in FIG. 3, which also shows the 95% confidence interval. The confidence interval expresses the degree to which the true value of the matrix vector parameter $\theta$ has a certain probability of falling around the measurement result; that is, it gives the degree of confidence in the measured value of the parameter. The uncertainty of the soil organic carbon prediction model can therefore be estimated by Bootstrap sampling: the coefficient of variation CV of the parameter is 9.48%, and all sample parameters fall within the 95% confidence interval.
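The summary statistics can be sketched as below. Two points are assumptions rather than statements from the text: the variance uses the $q - 1$ divisor, and the 95% interval is taken as a simple percentile interval of the Bootstrap estimates. The five demo estimates are placeholders. The last line recomputes the coefficient of variation from the reported mean (0.942) and variance (0.0080); it comes out at roughly 9.5%, in line with the reported 9.48% given that both inputs are rounded.

```python
import math
from statistics import fmean, quantiles, variance

def summarize(estimates):
    """Mean, variance, coefficient of variation and a 95 % percentile interval."""
    mean = fmean(estimates)
    var = variance(estimates)   # sample variance, divisor q - 1 (assumption)
    cv = math.sqrt(var) / mean  # standard deviation relative to the mean
    # 39 cut points at 2.5 % steps; the first and last bound the central 95 %.
    cuts = quantiles(estimates, n=40, method="inclusive")
    return {"mean": mean, "variance": var, "cv": cv,
            "interval": (cuts[0], cuts[-1])}

demo = summarize([0.90, 0.95, 1.00, 0.85, 1.05])  # placeholder estimates

# Consistency check against the figures reported in the text.
cv_reported = math.sqrt(0.0080) / 0.942
```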
Step eight: and (3) evaluating the model precision: using a determining coefficient R2And evaluating the soil organic carbon prediction model established in the fourth step by the ratio RPD of the root mean square error RMSE and the standard prediction error to determine the robustness of the soil organic carbon prediction model.
The coefficient of determination $R^2$, the root mean square error RMSE and the RPD are calculated as follows:
where $x_i$ and $\hat{x}_i$ are respectively the measured value and the predicted value of soil organic carbon, $\bar{x}$ is the mean of the soil organic carbon samples, and n is the number of samples. The larger the coefficient of determination $R^2$, the smaller the root mean square error RMSE, and the larger the RPD, the better the soil organic carbon prediction model.
The established soil organic carbon prediction model was evaluated using formulas (6), (7) and (8). The coefficient of determination $R^2$ of the hyperspectral prediction model is 0.960, and the root mean square error RMSE and the RPD are 1.44 and 4.87, respectively. Since RPD > 2, the soil organic carbon prediction model has excellent prediction capability and is robust.
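The three accuracy measures of step eight can be sketched as below, under conventional definitions that are assumptions here: $R^2$ as one minus the ratio of the residual to the total sum of squares, RMSE with divisor n, and RPD as the sample standard deviation of the measured values divided by the RMSE. The function name `evaluate` and the data are hypothetical.

```python
from math import sqrt


def evaluate(measured, predicted):
    """Return (R2, RMSE, RPD) for measured vs. predicted values.

    Conventional definitions, assumed for this sketch:
      R2   = 1 - SS_res / SS_tot
      RMSE = sqrt(SS_res / n)
      RPD  = standard deviation of measured values / RMSE
    """
    n = len(measured)
    mean_x = sum(measured) / n
    ss_res = sum((x - p) ** 2 for x, p in zip(measured, predicted))
    ss_tot = sum((x - mean_x) ** 2 for x in measured)
    r2 = 1.0 - ss_res / ss_tot
    rmse = sqrt(ss_res / n)
    sd = sqrt(ss_tot / (n - 1))  # sample standard deviation
    rpd = sd / rmse
    return r2, rmse, rpd


# Hypothetical measured and predicted values (not patent data).
measured = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [10.5, 11.5, 14.0, 16.5, 17.5]

r2, rmse, rpd = evaluate(measured, predicted)
print(r2, rmse, rpd)
print("robust by the RPD > 2 criterion:", rpd > 2)
```

The final line applies the same RPD > 2 criterion that the text uses to judge the model robust.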
By applying the statistical Bootstrap parameter estimation method to the uncertainty evaluation of the soil organic carbon prediction model, the invention addresses the low prediction accuracy caused by the limited representativeness of sampling and by the spatial variability affecting the prediction model (training samples and model parameters). The results help with the difficult task of accurately estimating soil organic carbon stocks at national and global scales.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (6)
1. A soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling is characterized by comprising the following steps:
step one: collecting and pretreating a soil sample: after the soil is leveled, a surface soil sample is obtained by a 5-point mixed sampling method and is naturally air-dried, ground and sieved;
step two: acquiring organic carbon content data of the soil sample from step one: the organic carbon content of the soil sample from step one is measured with a total organic carbon analyzer and stored on a computer as a file;
step three: acquiring soil hyperspectral data of the soil sample in the step one and preprocessing the soil hyperspectral data;
step four: constructing a prediction model: a soil organic carbon prediction model is established by partial least squares regression from the organic carbon content data of the soil sample obtained in step two and the soil hyperspectral data preprocessed in step three; the measured and predicted values of soil organic carbon are assembled into a matrix vector, and the matrix vector parameter is calculated;
the soil organic carbon prediction model is established by partial least squares regression in step four as follows: the hyperspectral data are substituted into the soil organic carbon prediction model to obtain the predicted values of soil organic carbon; the matrix vector consists of two column vectors, one holding the measured values $x_{i1}$ and the other the corresponding predicted values, $i = 1, \ldots, n$; the matrix vector parameter is the correlation coefficient of these two column vectors; and n is the number of samples;
step five: the idea of the Bootstrap sampling method is as follows: suppose the sample $X = (X_1, X_2, \ldots, X_n)$ comes from a population with distribution F, and R(X, F) is a known function of the sample X and the distribution F; given the observed values $x = (x_1, x_2, \ldots, x_n)$ of the sample, the uncertainty distribution of the function R(X, F) is analyzed from the observed values x; first, the original measured sample data are randomly sampled with replacement, each round of sampling yielding one sub-sample; the size of the measured sample is n, and each original datum has probability 1/n of being drawn; once the Bootstrap samples are determined, the parameter estimate of each sub-sample is calculated by constructing the matrix of measured and predicted values, and the estimates of these sub-samples form a row vector;
step six: Bootstrap sampling: a certain number of samples are drawn from the original samples by Bootstrap sampling, the sampling process is repeated, and the number of sampling rounds q is recorded;
step seven: from the estimates of the sample parameter obtained in the q rounds of Bootstrap sampling, which form a row vector, the mean and variance of the sample parameter are calculated to evaluate the uncertainty of the soil organic carbon prediction model;
the mean and variance of the sample parameter over the Bootstrap samples are calculated as follows:
the mean and variance are used to estimate the degree of deviation of the matrix vector parameter; the coefficient of variation CV of the matrix vector parameter is:
step eight: model accuracy evaluation: the soil organic carbon prediction model established in step four is evaluated using the coefficient of determination $R^2$, the root mean square error RMSE, and the RPD (the ratio of the standard deviation to the standard prediction error), to determine the robustness of the soil organic carbon prediction model.
2. The soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling according to claim 1, characterized in that in step one, the soil samples are taken from the 0-0.20 m soil layer below the ground surface, and 42 soil samples are collected after the corn harvest in 2018; the 5-point mixed sampling method is as follows: an area of 1 square meter is selected; the intersection of the two diagonals of the area is first taken as the central (first) sampling point; four points on the diagonals at equal distances from the central sampling point are then selected as the second to fifth sampling points; and the samples obtained at the five sampling points are mixed to obtain one soil sample.
3. The soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling according to claim 1 or 2, characterized in that in step three, the surface spectral reflectance of the soil samples from step one is acquired with the high-density reflection probe of an ASD spectrometer; 3 points are randomly selected on each soil sample for measurement, 10 spectra are recorded at each measurement point, and the arithmetic mean of the 30 spectra is taken as the actual reflection spectrum data; the noisy edge bands are then removed, and the data are preprocessed by derivative transformation and Savitzky-Golay smoothing.
4. The soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling according to claim 3, characterized in that the spectral band range of the ASD spectrometer is 400 nm-2500 nm, covering the visible and near-infrared bands; the actual reflection spectrum data retain the 400-2450 nm reflection spectrum after the noisy edge bands 350-399 nm and 2451-2500 nm are deleted; the derivative transformation is a first-order differential transformation; the Savitzky-Golay smoothing takes m consecutive data points around a position p in the hyperspectral data to be processed, selects a fitting order D and performs a least-squares fit, takes the value of the fitted curve at the center of the data window as the smoothed hyperspectral value, then moves the window and repeats the process until all hyperspectral data are processed; the processing formula is as follows:
where $x_{p,\mathrm{smooth}}$ is the smoothed hyperspectral value at position p, $x_{k+j}$ is the hyperspectral value at position k + j, $a_j$ is the smoothing coefficient of the j-th position, and k denotes the k-th position.
5. The Bootstrap-sampling-based soil organic carbon prediction uncertainty estimation method according to claim 1, wherein the population distribution F in step five corresponds to the matrix vector parameter in step four, and $F_n(x)$ is the empirical distribution function of the measured sample X, given by:
where the measured samples $x_1, x_2, \ldots, x_n$ are arranged in ascending order, $x_1 < x_2 < \cdots < x_n$; $x_{(l)}$ occurs with frequency $n_l$, $l = 1, 2, \ldots, r$, with $n_1 + n_2 + \cdots + n_r = n$; $x$ denotes a measured sample, $x_{(l)}$ the measured sample at the $l$-th position, $x_{(l+1)}$ the measured sample at the $(l+1)$-th position, and $x_{(r)}$ the measured sample at the $r$-th position; $l$ denotes the $l$-th position after sorting by the size of the measured samples, and $r$ denotes a natural number smaller than n.
6. The Bootstrap-sampling-based soil organic carbon prediction uncertainty estimation method according to claim 1, wherein in step eight the coefficient of determination $R^2$, the root mean square error RMSE and the RPD are calculated as follows:
where $x_i$ and $\hat{x}_i$ are respectively the measured value and the predicted value of soil organic carbon, $\bar{x}$ is the mean of the soil organic carbon samples, and n is the number of samples; the larger the coefficient of determination $R^2$, the smaller the root mean square error RMSE, and the larger the RPD, the better the soil organic carbon prediction model.
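The window-by-window smoothing described in claim 4 can be illustrated with a minimal sketch. It assumes a window of m = 5 points and a fitting order D = 2, values the claims do not state; for that case the least-squares fit reduces to the well-known fixed weights (-3, 12, 17, 12, -3)/35, so each smoothed value is a weighted sum of the neighbouring values. The first-order differential transformation is approximated here by a simple first difference at unit band spacing. The names `savgol_smooth` and `first_difference` are hypothetical.

```python
# Savitzky-Golay weights for a 5-point window with a quadratic fit.
# Window size and fitting order are assumptions of this sketch.
SG_COEFFS_5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def savgol_smooth(values, coeffs=SG_COEFFS_5):
    """Smooth `values` with a moving weighted sum (window centred on k)."""
    half = len(coeffs) // 2
    smoothed = list(values)  # edge points are left unsmoothed in this sketch
    for k in range(half, len(values) - half):
        # x_smooth[k] = sum over j of a_j * x[k + j], j = -half .. +half
        smoothed[k] = sum(
            a * values[k + j] for j, a in zip(range(-half, half + 1), coeffs)
        )
    return smoothed


def first_difference(values):
    """First-order difference between adjacent bands (unit spacing assumed)."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical reflectance values with one noisy spike (not patent data).
spectrum = [0.20, 0.21, 0.22, 0.30, 0.24, 0.25, 0.26, 0.27]
smoothed = savgol_smooth(spectrum)
print([round(v, 4) for v in smoothed])
print([round(v, 4) for v in first_difference(smoothed)])
```

Because the weights come from a quadratic least-squares fit, data that already follow a quadratic curve pass through unchanged, while isolated spikes are damped.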
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910931442.0A CN110531054B (en) | 2019-09-29 | 2019-09-29 | Soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110531054A (en) | 2019-12-03 |
CN110531054B (en) | 2022-02-08 |
Family ID: 68670748
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910931442.0A Active CN110531054B (en) | 2019-09-29 | 2019-09-29 | Soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110531054B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111595806A (en) * | 2020-05-25 | 2020-08-28 | 中国农业大学 | Method for monitoring soil carbon component by using mid-infrared diffuse reflection spectrum |
CN112100574A (en) * | 2020-08-21 | 2020-12-18 | 西安交通大学 | Resampling-based AAKR model uncertainty calculation method and system |
CN112461770B (en) * | 2020-11-17 | 2022-11-29 | 山东省科学院海洋仪器仪表研究所 | Method for acquiring performance of spectrometer |
CN113420412B (en) * | 2021-05-26 | 2023-05-09 | 南京信息工程大学 | Soil organic carbon content continuous depth distribution extraction method based on imaging spectrum |
WO2023220934A1 (en) * | 2022-05-17 | 2023-11-23 | 中山大学 | Method and system for determining deviation and reliability of hydrometeorological ensemble forecast |
CN117219182A (en) * | 2023-06-19 | 2023-12-12 | 浙江大学 | Organic carbon component rapid prediction method based on in-situ spectrum and machine learning model |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102798607A (en) * | 2012-08-13 | 2012-11-28 | 浙江大学 | Method for estimating soil organic carbon content by using mid-infrared spectrum technology |
CN103234922A (en) * | 2013-03-29 | 2013-08-07 | 浙江大学 | Rapid soil organic matter detection method based on large sample soil visible-near infrared spectrum classification |
Non-Patent Citations (4)
Title |
---|
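Small-sample uncertainty analysis of load models based on Bootstrap; Han Dong et al.; Power System Protection and Control; 2012-09-16; Vol. 40, No. 18; Section 1.1 * |
Time-series reconstruction of FY-2F land surface temperature products based on the Savitzky-Golay filtering algorithm; Wu Di et al.; Remote Sensing for Land and Resources; 2019-06-30; Vol. 31, No. 2; pp. 59-65 * |
Filtering technique for mud-logging gas curves based on the Savitzky-Golay algorithm; Wang Baohua; West-China Exploration Engineering; 2017-12-31; No. 6; pp. 30-31 * |
Comprehensive hyperspectral inversion model of farmland soil properties in Xinzheng City; Liu Wenkai et al.; Journal of Henan Polytechnic University (Natural Science); 2018-09-30; Vol. 37, No. 5; Sections 1.2, 1.3, 2.1 * |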
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||