CN110531054B - Soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling

Info

Publication number
CN110531054B
Authority
CN
China
Prior art keywords
soil
sample
organic carbon
sampling
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910931442.0A
Other languages
Chinese (zh)
Other versions
CN110531054A (en)
Inventor
郭燕
王来刚
贺佳
郑国清
黎世民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute Of Agricultural Economics And Information Henan Academy Of Agricultural Sciences
Original Assignee
Institute Of Agricultural Economics And Information Henan Academy Of Agricultural Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute Of Agricultural Economics And Information Henan Academy Of Agricultural Sciences
Priority to CN201910931442.0A
Publication of CN110531054A
Application granted
Publication of CN110531054B

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N1/00 Sampling; Preparing specimens for investigation
    • G01N1/28 Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q
    • G01N1/286 Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q involving mechanical work, e.g. chopping, disintegrating, compacting, homogenising
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N1/00 Sampling; Preparing specimens for investigation
    • G01N1/28 Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q
    • G01N1/34 Purifying; Cleaning
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/24 Earth materials
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N1/00 Sampling; Preparing specimens for investigation
    • G01N1/28 Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q
    • G01N1/286 Preparing specimens for investigation including physical details of (bio-)chemical methods covered elsewhere, e.g. G01N33/50, C12Q involving mechanical work, e.g. chopping, disintegrating, compacting, homogenising
    • G01N2001/2866 Grinding or homogeneising

Abstract

The invention provides a method for estimating the uncertainty of soil organic carbon prediction based on Bootstrap sampling, comprising the following steps: collecting and preprocessing soil samples; acquiring and preprocessing the organic carbon content data and soil hyperspectral data of the soil samples; establishing a soil organic carbon prediction model from the organic carbon content data and the soil hyperspectral data by partial least squares regression; randomly sampling the original measured sample data with replacement, each round of sampling yielding one sub-sample, and calculating the parameter estimate of each sub-sample from a matrix of measured and predicted values; drawing a given number of samples from the original samples with replacement by the Bootstrap resampling technique; evaluating the uncertainty of the soil organic carbon prediction model from the parameters of the Bootstrap samples; and evaluating the accuracy of the model. The method mitigates the low accuracy of the prediction model caused by sampling representativeness and by spatial variability.

Description

Soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling
Technical Field
The invention relates to the technical field of uncertainty statistical analysis of soil organic carbon prediction models, in particular to a soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling.
Background
Soil organic carbon is a key indicator for the balance of the global carbon cycle and for maintaining soil quality. Owing to natural and human factors, however, its content and spatial distribution are uncertain, which introduces errors into estimates of soil organic carbon content; these errors in turn degrade the accuracy of soil organic carbon prediction and mapping.
Visible and near-infrared (Vis-NIR) spectroscopy provides a rapid and accurate proximal sensing method for estimating soil organic carbon; the whole process from data acquisition to predictive modeling is fairly simple and saves the cost of laboratory analysis. Drawing on the strengths of several disciplines, studies that build models with modern analytical techniques to predict, map, and characterize the spatial variation of soil organic carbon have obtained satisfactory results (Lishuo, 2010; Kuang and Mouazen, 2013; Camboule et al, 2014; Sithole et al, 2018; Zhou et al, 2019). It should not be overlooked, however, that some of this literature only mentions the problem of uncertainty analysis in its discussion or outlook sections (Simbahan et al, 2006; beam two, 2007; Helioscopy, 2012), or touches on uncertainty analysis, but lacks a quantitative analysis of the uncertainty of the organic carbon prediction model. Throughout the soil organic carbon prediction process, the representativeness of sampling, the prediction model (training samples and model parameters), and the spatial variability of soil organic carbon as a natural attribute all reduce the accuracy of the predicted maps, making it difficult to meet practical needs.
At the end of the 20th century, with the development of uncertainty theory, methods such as generalized likelihood uncertainty estimation (GLUE; Beven and Binley, 1992), Markov chain Monte Carlo (Kuczera and Parent, 1998), BaRE (Thiemann et al, 2001), and the Bootstrap method (Pan and Politis, 2016) came into increasingly wide use for evaluating model uncertainty. For example, Linqing (2011), Muleta et al (2013), Sellami et al (2013), and Tian et al (2014) used the GLUE method to analyze the parameter uncertainty of water and solute transport models; pilot and present (2018),
Figure GDA0003344598760000011
et al (2019) used the Bootstrap method to optimize vehicle insurance rates and the placement of soil parameter sensors. These studies achieved good results, but such methods have not yet been applied to uncertainty analysis of soil organic carbon prediction models. Bootstrap is a nonparametric resampling method; because its sampling does not depend on assumptions about the parameter distribution, it has become an important tool for uncertainty research in recent years. In soil organic carbon predictive modeling, researchers abroad such as
Figure GDA0003344598760000012
et al (2013) and Hoffmann et al (2014) pointed out that when Vis-NIR is used for predictive mapping of soil organic carbon, the uncertainty comes mainly from the modeling samples and from the propagation process from prediction to mapping. The present invention introduces the Bootstrap method into the soil organic carbon prediction model and evaluates uncertainty with respect to the modeling samples.
Disclosure of Invention
To address the technical problem that existing soil organic carbon prediction models lack a quantitative analysis of their uncertainty, the invention provides a soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling. It introduces the Bootstrap parameter estimation method of traditional statistics into soil science, realizes uncertainty estimation for the soil organic carbon prediction model, and mitigates the low accuracy of the prediction model caused by sampling representativeness, the prediction model itself (training samples and model parameters), and spatial variability.
To achieve this purpose, the technical solution of the invention is as follows: a soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling, comprising the following steps:
Step one: collecting and pretreating soil samples: after the soil is leveled, a surface soil sample is obtained by the 5-point mixed sampling method, then naturally air-dried, ground, and sieved;
Step two: acquiring the organic carbon content data of the soil sample from step one: the organic carbon content of the soil sample is measured with a total organic carbon analyzer and stored on a computer as a file;
Step three: acquiring soil hyperspectral data of the soil sample from step one and preprocessing the data;
Step four: constructing a prediction model: a soil organic carbon prediction model is established by partial least squares regression from the organic carbon content data obtained in step two and the hyperspectral data preprocessed in step three; the measured and predicted soil organic carbon values are assembled into a matrix vector, and the matrix vector parameter is calculated;
Step five: the idea of the Bootstrap sampling method is as follows: suppose the sample $X = (X_1, X_2, \ldots, X_n)$ comes from a population with distribution $F$, and $R(X, F)$ is a known function of the sample $X$ and the distribution $F$. Given the measured values $x = (x_1, x_2, \ldots, x_n)$ of the sample, the uncertainty distribution properties of $R(X, F)$ are analyzed from these measured values. First, the original measured sample data are randomly sampled with replacement, each round of sampling yielding one sub-sample; with a measured sample of size $n$, each original datum is drawn with probability $1/n$. Once the Bootstrap samples are determined, the parameter estimate of each sub-sample is calculated by constructing a matrix of measured and predicted values
Figure GDA0003344598760000021
Estimates of these subsamples
Figure GDA0003344598760000022
Form a row vector
Figure GDA0003344598760000023
Step six: Bootstrap sampling: a given number of samples are drawn from the original samples with replacement by the Bootstrap sampling technique; the sampling process is repeated, and the number of sampling rounds q is recorded;
Step seven: from the sample parameter extracted by the Bootstrap sampling method
Figure GDA0003344598760000024
its estimate in the q-th round
Figure GDA0003344598760000025
and the row vector
Figure GDA0003344598760000026
Figure GDA0003344598760000027
the following are calculated for the sample parameter
Figure GDA0003344598760000028
the mean
Figure GDA0003344598760000029
and the variance
Figure GDA00033445987600000210
which are used to evaluate the uncertainty of the soil organic carbon prediction model;
Step eight: evaluating model accuracy: the coefficient of determination $R^2$, the root mean square error (RMSE), and the ratio of standard prediction error (RPD) are used to evaluate the soil organic carbon prediction model established in step four and to determine its robustness.
In step one, the soil samples are taken from the 0-0.20 m soil layer below the ground surface, and sampling was carried out 42 times after the 2018 corn harvest. The 5-point mixed sampling method is as follows: an area of 1 square meter is selected; the midpoint of its two diagonals is taken as the first, central sampling point; four points on the diagonals equidistant from the central point are taken as the second to fifth sampling points; the samples from the five points are then mixed to form one soil sample.
In step three, the surface spectral reflectance of each soil sample from step one is acquired with the high-density reflection probe of an ASD spectrometer: 3 points are randomly selected on each soil sample, 10 spectra are measured at each point, and the arithmetic mean of the 30 spectra is taken as the actual reflectance spectrum. The noisy edge bands are then removed, and the data are preprocessed by derivative transformation and Savitzky-Golay smoothing.
The spectral range of the ASD spectrometer is 350 nm-2500 nm, covering the visible and near-infrared bands. The actual reflectance spectrum data are obtained by deleting the noisy edge bands at 350-399 nm and 2451-2500 nm and retaining the 400-2450 nm reflectance spectrum. The derivative transformation is a first-order derivative. Savitzky-Golay smoothing takes m consecutive data points around a position p in the hyperspectral data, performs a least squares fit of a chosen order D, and takes the value of the fitted curve at the center of the data window as the smoothed hyperspectral value; the window is then moved and the process repeated until all hyperspectral data have been processed. The formula is:
Figure GDA0003344598760000031
$j = 0, 1, \ldots, m$;
where $x_{p,\mathrm{smooth}}$ is the smoothed hyperspectral value at position $p$, $x_{k+j}$ is the hyperspectral value at position $k+j$, $a_j$ is the smoothing coefficient at the $j$-th position, and $k$ denotes the $k$-th position.
The soil organic carbon prediction model is established by partial least squares regression in step four as follows: the hyperspectral data are substituted into the soil organic carbon prediction model to obtain the predicted soil organic carbon values. The matrix vector takes the form
Figure GDA0003344598760000032
The matrix vector parameter
Figure GDA0003344598760000033
is the correlation coefficient of the two column vectors, where $x_{i1}$ denotes the measured value,
Figure GDA0003344598760000034
denotes the predicted value, and $n$ denotes the number of samples.
The population distribution $F$ in step five corresponds to the matrix vector parameter of step four
Figure GDA0003344598760000035
and
Figure GDA0003344598760000036
where $F_n(x)$ is the empirical distribution function of the measured sample $x$, given by:
Figure GDA0003344598760000037
Here the measured samples $x_1, x_2, \ldots, x_n$ are arranged in ascending order, with distinct values $x_{(1)} < x_{(2)} < \cdots < x_{(r)}$, where $x_{(l)}$ occurs with frequency $n_l$, $l = 1, 2, \ldots, r$, and $n_1 + n_2 + \cdots + n_r = n$; $x$ denotes the measured sample, $x_{(l)}$ the measured sample at the $l$-th position, $x_{(l+1)}$ the measured sample at the $(l+1)$-th position, $x_{(r)}$ the measured sample at the $r$-th position, $l$ the $l$-th position after sorting by size, and $r$ a natural number not exceeding $n$.
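A reconstruction of the empirical distribution function consistent with these definitions is sketched below. It is the standard textbook form and is assumed, not confirmed, to match the formula in the original figure.

```latex
F_n(x) =
\begin{cases}
0, & x < x_{(1)} \\[4pt]
\dfrac{n_1 + n_2 + \cdots + n_l}{n}, & x_{(l)} \le x < x_{(l+1)}, \quad l = 1, 2, \ldots, r-1 \\[8pt]
1, & x \ge x_{(r)}
\end{cases}
```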
For the sample parameter of the Bootstrap samples
Figure GDA0003344598760000041
the mean
Figure GDA0003344598760000042
and the variance
Figure GDA0003344598760000043
are calculated as follows:
Figure GDA0003344598760000044
Figure GDA0003344598760000045
The mean
Figure GDA0003344598760000046
and the variance
Figure GDA0003344598760000047
are used to estimate the degree of deviation of the matrix vector parameter
Figure GDA0003344598760000048
The coefficient of variation CV of the matrix vector parameter
Figure GDA0003344598760000049
is:
Figure GDA00033445987600000410
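The mean, variance, and coefficient of variation can be written out as follows. This is a hedged reconstruction using the usual Bootstrap definitions: the symbol $\hat{\theta}_b$ for the estimate from the $b$-th sampling round, the divisor $q-1$ in the variance, and the percentage form of CV are assumptions, since the original formulas survive only as figures.

```latex
\bar{\theta} = \frac{1}{q} \sum_{b=1}^{q} \hat{\theta}_b, \qquad
\sigma^2 = \frac{1}{q-1} \sum_{b=1}^{q} \left( \hat{\theta}_b - \bar{\theta} \right)^2, \qquad
\mathrm{CV} = \frac{\sigma}{\bar{\theta}} \times 100\%
```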
The coefficient of determination $R^2$, the root mean square error RMSE, and the ratio of standard prediction error RPD in step eight are calculated as follows:
Figure GDA00033445987600000411
Figure GDA00033445987600000412
Figure GDA00033445987600000413
where $x_i$ and
Figure GDA00033445987600000414
are the measured and predicted values of soil organic carbon, respectively,
Figure GDA00033445987600000415
is the mean of the soil organic carbon samples, and $n$ is the number of samples. The larger the coefficient of determination $R^2$, the smaller the root mean square error RMSE, and the larger the ratio RPD, the better the soil organic carbon prediction model.
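For reference, the three measures are commonly defined as below. These are the standard forms and are assumed, not confirmed, to match the original figures; the symbols $\hat{x}_i$ (predicted value), $\bar{x}$ (mean of the measured values), and SD (sample standard deviation of the measured values) are notation chosen here.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (x_i - \hat{x}_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2}, \qquad
\mathrm{RPD} = \frac{\mathrm{SD}}{\mathrm{RMSE}}, \quad
\mathrm{SD} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
```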
The beneficial effects of the invention are as follows: the method addresses the estimation and quantification of uncertainty in soil organic carbon prediction, in particular when modeling and prediction are based on a small sample. After the soil spectral data acquired by ASD hyperspectral measurement are preprocessed, a prediction model relating soil hyperspectral data to measured soil organic carbon is established by partial least squares regression. The collected sample data are then resampled with replacement by the Bootstrap method, and the mean, standard error, and 95% confidence interval of the model parameter are calculated, thereby evaluating the uncertainty of the soil organic carbon prediction model. By applying the Bootstrap parameter estimation method of statistics to the uncertainty evaluation of the soil organic carbon prediction model, the invention mitigates the low accuracy of the prediction model caused by sampling representativeness and spatial variability, and the results help address the difficulty of accurately estimating soil organic carbon stocks at national and global scales.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a schematic diagram of the Bootstrap sampling method in FIG. 1.
FIG. 3 is a probability distribution map of the matrix vector parameter
Figure GDA0003344598760000051
of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, a soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling comprises the following steps:
Step one: collecting and pretreating soil samples: after the soil is leveled, a surface soil sample is obtained by the 5-point mixed sampling method, then naturally air-dried, ground, and sieved for later use.
The soil samples are taken from the 0-0.20 m soil layer below the ground surface, and sampling was carried out 42 times after the corn harvest in 2018. The 5-point mixed sampling method is implemented as follows: an area of 1 square meter is selected; the midpoint of its two diagonals is taken as the first, central sampling point; four points on the diagonals equidistant from the central point are taken as the second to fifth sampling points. The samples from the five points are then mixed to form one soil sample.
Step two: acquiring the organic carbon content data of the soil sample from step one: the organic carbon content of the soil sample is measured with a TOC-L CPH total organic carbon analyzer and stored on a computer as a file. The TOC-L CPH total organic carbon analyzer is manufactured by Shimadzu Corporation, Japan.
Step three: acquiring soil hyperspectral data of the soil sample from step one and preprocessing the data: the surface spectral reflectance of each soil sample is acquired with the high-density reflection probe of an ASD spectrometer; 3 points are randomly selected on each soil sample, 10 spectra are measured at each point, and the arithmetic mean of the 30 spectra is taken as the actual reflectance spectrum. The noisy edge bands are then removed, and the data are preprocessed by derivative transformation and Savitzky-Golay (S-G) smoothing.
The ASD spectrometer is an ASD FieldSpec Pro FR spectrometer (Analytical Spectral Devices, Boulder, CO, USA) with a spectral range of 350 nm-2500 nm, covering the visible and near-infrared bands. Other spectrometers of a similar type may also be used. The actual reflectance spectrum data are obtained by deleting the noisy edge bands at 350-399 nm and 2451-2500 nm and retaining the 400-2450 nm reflectance spectrum. To reduce the influence of differences in the optical environment and of sample grinding and sieving on the data, preprocessing combines Savitzky-Golay smoothing (second-order polynomial, 11-point window) with a first-order derivative transformation.
Savitzky-Golay smoothing takes m consecutive data points (the window width) around a position p in the hyperspectral data, performs a least squares fit of a chosen order D, and takes the value of the fitted curve at the center of the data window as the smoothed hyperspectral value; the window is then moved and the process repeated until all hyperspectral data have been processed. The formula is:
Figure GDA0003344598760000061
where $x_{p,\mathrm{smooth}}$ is the smoothed hyperspectral value at position $p$, $x_{k+j}$ is the hyperspectral value at position $k+j$, $a_j$ is the smoothing coefficient at the $j$-th position, and $k$ denotes the $k$-th position.
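The preprocessing chain of step three (averaging replicate spectra, trimming the edge bands, and Savitzky-Golay filtering with a second-order, 11-point window) can be sketched as follows. This is a minimal illustration, not the inventors' implementation: the function names are hypothetical, a 1 nm sampling interval is assumed, and reflection padding at the spectrum edges is an assumption the text does not specify.

```python
import math
import numpy as np

def average_spectra(replicates):
    """Arithmetic mean of replicate spectra (one per row), e.g. the 30 scans of one sample."""
    return np.asarray(replicates, dtype=float).mean(axis=0)

def trim_edges(wavelengths, spectrum, lo=400, hi=2450):
    """Keep only the lo..hi nm range, dropping the noisy edge bands."""
    wavelengths = np.asarray(wavelengths)
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    return wavelengths[keep], np.asarray(spectrum, dtype=float)[keep]

def savgol_coefficients(window=11, order=2, deriv=0):
    """Least squares (Savitzky-Golay) weights a_j evaluated at the window centre."""
    half = window // 2
    j = np.arange(-half, half + 1)
    # Design matrix with columns j**0, j**1, ..., j**order
    design = np.vander(j, order + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse yields the fitted coefficient of that degree;
    # multiplying by deriv! converts it into the derivative at the centre point.
    return np.linalg.pinv(design)[deriv] * math.factorial(deriv)

def savgol_apply(spectrum, window=11, order=2, deriv=0):
    """Slide the window over the spectrum (edges padded by reflection, an assumption)."""
    half = window // 2
    coeffs = savgol_coefficients(window, order, deriv)
    padded = np.pad(np.asarray(spectrum, dtype=float), half, mode="reflect")
    # np.convolve flips its kernel, so reverse the weights to obtain a correlation
    return np.convolve(padded, coeffs[::-1], mode="valid")
```

Calling `savgol_apply(spectrum, deriv=1)` returns the smoothed first derivative in a single pass, which is one common way to combine the two preprocessing steps; values within half a window of either end depend on the padding choice and should be treated with caution.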
Step four: constructing a prediction model: a soil organic carbon prediction model is established by partial least squares regression from the organic carbon content data obtained in step two and the hyperspectral data preprocessed in step three; the measured values $x_i$ and the predicted values
Figure GDA00033445987600000611
of soil organic carbon are assembled into a matrix vector A, and the matrix vector parameter is calculated
Figure GDA0003344598760000062
The soil organic carbon prediction model is established by partial least squares regression as follows: the hyperspectral data are substituted into the soil organic carbon prediction model to obtain the predicted soil organic carbon values. The matrix vector takes the form
Figure GDA0003344598760000063
The matrix vector parameter
Figure GDA0003344598760000064
is the correlation coefficient of the two column vectors; $x_{i1}$,
Figure GDA0003344598760000065
and $n$ denote the measured value, the predicted value, and the number of samples, respectively.
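Assembling the measured and predicted values into a two-column matrix and computing its parameter can be sketched as below. The sketch follows the text in reading the parameter as the correlation coefficient of the two columns and assumes the Pearson form; the function names and the sample numbers are illustrative and do not come from the patent.

```python
import math

def build_matrix(measured, predicted):
    """Pair measured and predicted values into an n x 2 matrix (list of rows)."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    return [[float(m), float(p)] for m, p in zip(measured, predicted)]

def column_correlation(matrix):
    """Pearson correlation coefficient between the two columns of an n x 2 matrix."""
    n = len(matrix)
    col1 = [row[0] for row in matrix]
    col2 = [row[1] for row in matrix]
    mean1, mean2 = sum(col1) / n, sum(col2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(col1, col2))
    ss1 = sum((a - mean1) ** 2 for a in col1)
    ss2 = sum((b - mean2) ** 2 for b in col2)
    return cov / math.sqrt(ss1 * ss2)

# Illustrative numbers only, not data from the patent
measured = [10.2, 12.5, 9.8, 15.1, 11.0]
predicted = [10.0, 12.9, 9.5, 14.6, 11.4]
A = build_matrix(measured, predicted)
theta = column_correlation(A)
```

In practice the predicted column would come from the fitted partial least squares model; any regression routine that returns one prediction per sample can supply it.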
Step five: the idea of the Bootstrap sampling method is as follows: suppose the sample $X = (X_1, X_2, \ldots, X_n)$ comes from a population with distribution $F$, and $R(X, F)$ is a known function of the sample $X$ and the distribution $F$. Given the measured values $x = (x_1, x_2, \ldots, x_n)$ of the sample, the uncertainty distribution properties of $R(X, F)$ are analyzed from these measured values. First, the original measured sample data are randomly sampled with replacement, and each round of sampling is recorded as one sub-sample. If the measured sample has size $n$, each original datum is drawn with probability $1/n$. Once the Bootstrap samples are determined, the parameter estimate of each sub-sample can be calculated by constructing the matrix vector A of measured and predicted values
Figure GDA0003344598760000066
Estimates of these subsamples
Figure GDA0003344598760000067
Form a row vector
Figure GDA0003344598760000068
As shown in FIG. 2, the population distribution $F$ corresponds to the matrix vector parameter in step four
Figure GDA0003344598760000069
Figure GDA00033445987600000610
$F_n(x)$ is the empirical distribution function of the measured sample $x$, defined as follows:
The measured samples $x_1, x_2, \ldots, x_n$ are arranged in ascending order, with distinct values $x_{(1)} < x_{(2)} < \cdots < x_{(r)}$, where $x_{(l)}$ occurs with frequency $n_l$, $l = 1, 2, \ldots, r$, and $n_1 + n_2 + \cdots + n_r = n$. $F_n(x)$, the sample distribution function or empirical distribution function of the population $X$, is:
Figure GDA0003344598760000071
where $x$, $x_{(l)}$, $x_{(l+1)}$, $x_{(r)}$, $l$, and $r$ denote, respectively, the measured sample, the measured sample at the $l$-th position, the measured sample at the $(l+1)$-th position, the measured sample at the $r$-th position, the $l$-th position after sorting by size, and a natural number.
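The empirical distribution function described here is the fraction of measured samples not exceeding a given value, which is the standard definition and is assumed to match the original figure. A minimal sketch, with a hypothetical function name:

```python
from bisect import bisect_right

def empirical_cdf(samples):
    """Return F_n, the empirical distribution function of the measured samples."""
    ordered = sorted(samples)
    n = len(ordered)

    def F_n(x):
        # Number of samples <= x, found by binary search, divided by n
        return bisect_right(ordered, x) / n

    return F_n

# Illustrative values: the repeated 2.0 contributes a frequency n_l = 2
F = empirical_cdf([3.0, 1.0, 2.0, 2.0])
```

Drawing with replacement from the measured samples, each with probability 1/n, is equivalent to sampling from this distribution.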
Step six: Bootstrap sampling: a given number of samples are drawn from the original samples with replacement by the Bootstrap sampling technique; the sampling process is repeated, and the number of sampling rounds q is recorded. In this test, q = 50.
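The resampling loop can be sketched as follows. It is an illustration under stated assumptions: each round draws n measured/predicted pairs (the text says only "a given number"), the statistic is taken to be the Pearson correlation, and the function names, seed, and sample values are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs)
    sy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sx * sy)

def bootstrap_estimates(measured, predicted, statistic=pearson, q=50, seed=0):
    """Draw q resamples of (measured, predicted) pairs with replacement
    and return the statistic computed on each resample."""
    rng = random.Random(seed)
    n = len(measured)
    estimates = []
    for _ in range(q):
        # Each index is drawn with probability 1/n, independently of earlier draws
        idx = [rng.randrange(n) for _ in range(n)]
        sub_measured = [measured[i] for i in idx]
        sub_predicted = [predicted[i] for i in idx]
        estimates.append(statistic(sub_measured, sub_predicted))
    return estimates

# Illustrative values only, not data from the patent
measured = [10.2, 12.5, 9.8, 15.1, 11.0, 13.3, 8.7, 14.2]
predicted = [10.0, 12.9, 9.5, 14.6, 11.4, 13.0, 9.1, 14.5]
estimates = bootstrap_estimates(measured, predicted, q=50, seed=42)
```

Pairs are resampled together so that each measured value stays matched to its own prediction; fixing the seed makes the q estimates reproducible.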
Step seven: the estimates of the sample parameter over the q rounds of Bootstrap sampling
Figure GDA0003344598760000072
form the vector
Figure GDA0003344598760000073
Figure GDA0003344598760000074
from which, for the parameter
Figure GDA0003344598760000075
the mean
Figure GDA0003344598760000076
and the variance
Figure GDA0003344598760000077
are calculated to evaluate the uncertainty of the soil organic carbon prediction model.
For the sample parameter of the Bootstrap samples
Figure GDA0003344598760000078
the mean
Figure GDA0003344598760000079
and the variance
Figure GDA00033445987600000710
are calculated as follows:
Figure GDA00033445987600000711
Figure GDA00033445987600000712
Using formulas (3) and (4), the sample parameter obtained from 50 rounds of Bootstrap sampling
Figure GDA00033445987600000713
ranges from 0.7116 to 1.0958. The mean
Figure GDA00033445987600000714
and the variance
Figure GDA00033445987600000715
are 0.942 and 0.0080, respectively, and are used to estimate the degree of deviation of the matrix vector parameter
Figure GDA00033445987600000716
The probability distribution of the matrix vector parameter of the samples
Figure GDA00033445987600000717
is shown in FIG. 3, which also marks the 95% confidence interval. The confidence interval expresses the degree to which the true value of the matrix vector parameter
Figure GDA00033445987600000718
falls, with a certain probability, around the measured result; it gives the degree of confidence in the measured value of the parameter, that is, the "probability". The uncertainty of the soil organic carbon prediction model can therefore be estimated by Bootstrap sampling: the coefficient of variation CV of the parameter
Figure GDA00033445987600000719
is 9.48%, and all sample parameters fall within the 95% confidence interval.
Figure GDA00033445987600000720
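Steps six and seven can be sketched as follows. This is only an illustration under stated assumptions: formulas (3) to (5) are images not reproduced in this text, so the sketch assumes the usual sample mean, the sample variance with divisor q - 1, and CV defined as standard deviation divided by mean. The function names and the measured/predicted values are hypothetical.

```python
import math
import random
import statistics

def pearson_r(xs, ys):
    """Correlation coefficient of two equally long columns."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_parameter(measured, predicted, q=50, seed=0):
    """Resample (measured, predicted) pairs with replacement q times."""
    rng = random.Random(seed)
    n = len(measured)
    estimates = []
    while len(estimates) < q:
        # Each original pair is drawn with probability 1/n.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [measured[i] for i in idx]
        ys = [predicted[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # degenerate resample: correlation is undefined
        estimates.append(pearson_r(xs, ys))
    mean = statistics.fmean(estimates)
    variance = statistics.variance(estimates)  # divisor q - 1 (assumption)
    cv = math.sqrt(variance) / mean            # CV = sd / mean (assumption)
    return estimates, mean, variance, cv

# Hypothetical measured and predicted values (illustrative only).
measured = [8.2, 9.1, 10.4, 12.0, 13.5, 15.1, 16.8, 18.0]
predicted = [8.6, 8.9, 10.9, 11.6, 13.9, 14.7, 17.2, 17.6]
estimates, mean, variance, cv = bootstrap_parameter(measured, predicted)
print(f"mean={mean:.4f} variance={variance:.6f} CV={cv:.2%}")
```

Under the same CV assumption, the reported mean 0.942 and variance 0.0080 give sqrt(0.0080) / 0.942 ≈ 0.095, which is in line with the stated CV of 9.48%; the small difference is presumably due to rounding of the reported variance.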
Step eight: evaluating the model accuracy: the soil organic carbon prediction model established in step four is evaluated using the coefficient of determination $R^2$, the root mean square error (RMSE) and the ratio of standard prediction error (RPD), to determine the robustness of the soil organic carbon prediction model.
The coefficient of determination $R^2$, the root mean square error RMSE and the ratio of standard prediction error RPD are calculated as follows:
Figure GDA00033445987600000721
Figure GDA0003344598760000081
Figure GDA0003344598760000082
where $x_i$ and
Figure GDA0003344598760000083
are the measured and predicted values of soil organic carbon respectively,
Figure GDA0003344598760000084
is the mean of the soil organic carbon samples, and n is the number of samples. The larger the coefficient of determination $R^2$, the smaller the RMSE and the larger the RPD, the better the soil organic carbon prediction model.
The established soil organic carbon prediction model was evaluated with formulas (6), (7) and (8). The hyperspectral prediction model has a coefficient of determination $R^2$ of 0.960, and its RMSE and RPD are 1.44 and 4.87 respectively. Since RPD > 2, the soil organic carbon prediction model has excellent predictive ability and is robust.
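The three accuracy metrics can be sketched as below. Formulas (6) to (8) are images not reproduced in this text, so the definitions here are assumptions based on common practice: $R^2$ as one minus the ratio of residual to total sum of squares, RMSE with divisor n, and RPD as the standard deviation of the measured values divided by RMSE. The function name and data are hypothetical.

```python
import math

def evaluate(measured, predicted):
    """Return (R^2, RMSE, RPD) under the assumed definitions."""
    n = len(measured)
    mean = sum(measured) / n
    ss_res = sum((x - p) ** 2 for x, p in zip(measured, predicted))
    ss_tot = sum((x - mean) ** 2 for x in measured)
    r2 = 1.0 - ss_res / ss_tot         # coefficient of determination
    rmse = math.sqrt(ss_res / n)       # root mean square error
    sd = math.sqrt(ss_tot / (n - 1))   # sample standard deviation of measurements
    rpd = sd / rmse                    # assumed definition: SD / RMSE
    return r2, rmse, rpd

# Hypothetical values for illustration only.
r2, rmse, rpd = evaluate([10, 12, 14, 16], [11, 12, 13, 17])
print(f"R2={r2:.3f} RMSE={rmse:.3f} RPD={rpd:.3f}")
```

With these definitions a model is considered better as $R^2$ and RPD grow and RMSE shrinks, which matches the criterion stated above.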
The invention applies the Bootstrap parameter estimation method from statistics to the uncertainty evaluation of the soil organic carbon prediction model. This addresses the low accuracy of prediction models caused by sampling representativeness and by the spatial variability of the prediction model (training samples and model parameters), and the results help tackle the difficulty of accurately estimating soil organic carbon stocks at national and global scales.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (6)

1. A soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling is characterized by comprising the following steps:
step one: collecting and pretreating a soil sample: after the soil is leveled, a surface soil sample is obtained by the 5-point mixed sampling method, then naturally air-dried, ground and sieved;
step two: acquiring organic carbon content data of the soil sample in the first step: measuring the organic carbon content of the soil sample in the step one by adopting a total organic carbon analyzer, and storing the organic carbon content in a computer in a file form;
step three: acquiring soil hyperspectral data of the soil sample in the step one and preprocessing the soil hyperspectral data;
step four: constructing a prediction model: establishing, by partial least squares regression, a soil organic carbon prediction model from the organic carbon content data of the soil sample obtained in step two and the soil hyperspectral data preprocessed in step three; constructing the measured values and predicted values of soil organic carbon into a matrix vector, and calculating the matrix vector parameter;
the method for establishing the soil organic carbon prediction model by partial least squares regression in step four is as follows: the hyperspectral data are substituted into the soil organic carbon prediction model to obtain the predicted value of soil organic carbon, wherein the matrix vector has the form
Figure FDA0003344598750000011
the matrix vector parameter
Figure FDA0003344598750000012
is the correlation coefficient of the two column vectors, $x_{i1}$ denotes the measured value,
Figure FDA0003344598750000013
denotes the predicted value, and n denotes the number of samples;
step five: the idea of the Bootstrap sampling method is as follows: suppose the sample $X = (X_1, X_2, \ldots, X_n)$ comes from a population with distribution F, and $R(X, F)$ is a known function of the sample X and the distribution F; given the measured values $x = (x_1, x_2, \ldots, x_n)$ of the sample, the uncertainty distribution properties of the function $R(X, F)$ are analysed from the measured values x; first, random sampling with replacement is performed on the original measured sample data, each sampling yielding one subsample, where the measured sample size is n and each original sample datum is drawn with probability 1/n; after the Bootstrap samples are determined, the estimate of each subsample parameter is calculated by constructing the matrix of measured and predicted values
Figure FDA0003344598750000014
the estimates of these subsamples
Figure FDA0003344598750000015
form a row vector
Figure FDA0003344598750000016
Step six: bootstrap sampling: sampling a certain number of samples from original samples by adopting a Bootstrap sampling technology, repeating the sampling process, and recording the sampling times q;
step seven: taking the sample parameter obtained by the Bootstrap sampling method
Figure FDA0003344598750000017
with its q estimates
Figure FDA0003344598750000018
collected in the row vector
Figure FDA0003344598750000019
Figure FDA00033445987500000110
calculating, for the sample parameter
Figure FDA00033445987500000111
the mean
Figure FDA00033445987500000112
and the variance
Figure FDA00033445987500000113
and evaluating the uncertainty of the soil organic carbon prediction model;
the sample parameter of the Bootstrap samples
Figure FDA00033445987500000114
has mean
Figure FDA00033445987500000115
and variance
Figure FDA00033445987500000116
calculated as follows:
Figure FDA0003344598750000021
Figure FDA0003344598750000022
the mean
Figure FDA0003344598750000023
and the variance
Figure FDA0003344598750000024
are used to estimate the degree of deviation of the matrix vector parameter
Figure FDA0003344598750000025
and the coefficient of variation CV of the matrix vector parameter
Figure FDA0003344598750000026
is:
Figure FDA0003344598750000027
step eight: evaluating the model accuracy: the soil organic carbon prediction model established in step four is evaluated using the coefficient of determination $R^2$, the root mean square error (RMSE) and the ratio of standard prediction error (RPD), to determine the robustness of the soil organic carbon prediction model.
2. The soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling according to claim 1, characterized in that in step one, the soil sample is taken from the 0-0.20 m soil layer below the ground surface, and 42 soil samples are collected after the corn harvest in 2018; the 5-point mixed sampling method is as follows: an area of 1 square meter is selected; the midpoint of the two diagonals of the area is taken as the first, central sampling point; four points on the diagonals equidistant from the central sampling point are then selected as the second to fifth sampling points; and the samples from the five sampling points are mixed to obtain the soil sample.
3. The soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling according to claim 1 or 2, characterized in that in the third step, a high-density reflection probe of an ASD spectrometer is adopted to obtain surface spectrum reflection information of the soil sample in the first step, 3 points are randomly selected for each soil sample to be measured, 10 spectra are measured at each measurement point, and the arithmetic mean of 30 spectra is taken as actual reflection spectrum data; and then removing the edge wave band with larger noise, and preprocessing by adopting derivative transformation and a Savitzky-Golay smoothing method.
4. The soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling according to claim 3, characterized in that the spectral band range of the ASD spectrometer is 400 nm-2500 nm, covering the visible and near-infrared bands; the actual reflection spectrum data are obtained by deleting the noisy edge bands 350-399 nm and 2451-2500 nm and retaining the 400-2450 nm reflection spectrum; the derivative transform is a first-order differential transform; the Savitzky-Golay smoothing takes m consecutive data points around a given position p in the hyperspectral data to be processed, selects a fitting order D and performs a least-squares fit, takes the value of the fitted curve at the center point of the data window as the smoothed hyperspectral value, then moves the window and repeats the process until all hyperspectral data have been processed; the processing formula is as follows:
Figure FDA0003344598750000028
where $x_{p,\mathrm{smooth}}$ denotes the smoothed hyperspectral value at position p, $x_{k+j}$ denotes the hyperspectral value at position k + j, $a_j$ denotes the smoothing coefficient at position j, and k denotes the k-th position.
5. The Bootstrap-sampling-based soil organic carbon prediction uncertainty estimation method according to claim 1, wherein the population distribution F in step five corresponds to the matrix vector parameter in step four
Figure FDA0003344598750000031
and
Figure FDA0003344598750000032
$F_n(x)$ is the empirical distribution function of the measured sample X, given by:
Figure FDA0003344598750000033
the measured samples $x_1, x_2, \ldots, x_n$ are arranged in ascending order, $x_1 < x_2 < \cdots < x_n$; $x_{(l)}$ occurs with frequency $n_l$, $l = 1, 2, \ldots, r$, with $n_1 + n_2 + \cdots + n_r = n$; x denotes a measured sample, $x_{(l)}$ denotes the measured sample at the l-th position, $x_{(l+1)}$ denotes the measured sample at the (l+1)-th position, $x_{(r)}$ denotes the measured sample at the r-th position, l denotes the l-th position after sorting the measured samples by value, and r denotes a natural number smaller than n.
6. The Bootstrap-sampling-based soil organic carbon prediction uncertainty estimation method according to claim 1, wherein in step eight the coefficient of determination $R^2$, the root mean square error RMSE and the ratio of standard prediction error RPD are calculated as follows:
Figure FDA0003344598750000034
Figure FDA0003344598750000035
Figure FDA0003344598750000036
where $x_i$ and
Figure FDA0003344598750000037
are the measured and predicted values of soil organic carbon respectively,
Figure FDA0003344598750000038
is the mean of the soil organic carbon samples, and n is the number of samples; the larger the coefficient of determination $R^2$, the smaller the root mean square error RMSE and the larger the ratio RPD, the better the soil organic carbon prediction model.
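As an illustration of the spectral preprocessing named in claims 3 and 4, the sketch below applies Savitzky-Golay smoothing followed by a first-order difference. It is not the patented formula, which is an image not reproduced in this text. It assumes a window of m = 5 points and fitting order D = 2, for which the well-known smoothing coefficients are (-3, 12, 17, 12, -3) / 35; the function names, the edge handling (edge points left unsmoothed) and the spectrum values are hypothetical choices.

```python
# Savitzky-Golay coefficients for a 5-point window and quadratic fit (assumed m = 5, D = 2).
SG_COEFFS = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]

def savgol_smooth(spectrum, coeffs=SG_COEFFS):
    """Replace each interior point by the fitted value at the window center."""
    half = len(coeffs) // 2
    smoothed = list(spectrum)  # edge points are left unsmoothed (assumption)
    for p in range(half, len(spectrum) - half):
        smoothed[p] = sum(
            a * spectrum[p + j]
            for j, a in zip(range(-half, half + 1), coeffs)
        )
    return smoothed

def first_difference(spectrum, step=1.0):
    """First-order differential transform as a forward difference."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]

# Hypothetical reflectance values at consecutive wavelengths (illustrative only).
reflectance = [0.21, 0.23, 0.22, 0.26, 0.27, 0.31, 0.30, 0.34]
smoothed = savgol_smooth(reflectance)
derivative = first_difference(smoothed)
print(smoothed)
print(derivative)
```

Because the window fit is quadratic, a spectrum that is itself a quadratic function of position passes through the smoother unchanged, which is a convenient sanity check for the coefficients.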
CN201910931442.0A 2019-09-29 2019-09-29 Soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling Active CN110531054B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910931442.0A CN110531054B (en) 2019-09-29 2019-09-29 Soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910931442.0A CN110531054B (en) 2019-09-29 2019-09-29 Soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling

Publications (2)

Publication Number Publication Date
CN110531054A CN110531054A (en) 2019-12-03
CN110531054B true CN110531054B (en) 2022-02-08

Family

ID=68670748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910931442.0A Active CN110531054B (en) 2019-09-29 2019-09-29 Soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling

Country Status (1)

Country Link
CN (1) CN110531054B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111595806A (en) * 2020-05-25 2020-08-28 中国农业大学 Method for monitoring soil carbon component by using mid-infrared diffuse reflection spectrum
CN112100574A (en) * 2020-08-21 2020-12-18 西安交通大学 Resampling-based AAKR model uncertainty calculation method and system
CN112461770B (en) * 2020-11-17 2022-11-29 山东省科学院海洋仪器仪表研究所 Method for acquiring performance of spectrometer
CN113420412B (en) * 2021-05-26 2023-05-09 南京信息工程大学 Soil organic carbon content continuous depth distribution extraction method based on imaging spectrum
WO2023220934A1 (en) * 2022-05-17 2023-11-23 中山大学 Method and system for determining deviation and reliability of hydrometeorological ensemble forecast
CN117219182A (en) * 2023-06-19 2023-12-12 浙江大学 Organic carbon component rapid prediction method based on in-situ spectrum and machine learning model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102798607A (en) * 2012-08-13 2012-11-28 浙江大学 Method for estimating soil organic carbon content by using mid-infrared spectrum technology
CN103234922A (en) * 2013-03-29 2013-08-07 浙江大学 Rapid soil organic matter detection method based on large sample soil visible-near infrared spectrum classification

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
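Bootstrap-based small-sample uncertainty analysis of load models; Han Dong et al.; Power System Protection and Control; 2012-09-16; Vol. 40 (No. 18); Section 1.1 *
Time-series reconstruction of FY-2F land surface temperature products based on the Savitzky-Golay filtering algorithm; Wu Di et al.; Remote Sensing for Land and Resources; 2019-06-30; Vol. 31 (No. 2); pp. 59-65 *
Filtering technique for mud-logging gas curves based on the Savitzky-Golay algorithm; Wang Baohua; West-China Exploration Engineering; 2017-12-31 (No. 6); pp. 30-31 *
Comprehensive hyperspectral inversion model of farmland soil properties in Xinzheng City; Liu Wenkai et al.; Journal of Henan Polytechnic University (Natural Science); 2018-09-30; Vol. 37 (No. 5); Sections 1.2, 1.3, 2.1 *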

Also Published As

Publication number Publication date
CN110531054A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
CN110531054B (en) Soil organic carbon prediction uncertainty estimation method based on Bootstrap sampling
CN101915744B (en) Near infrared spectrum nondestructive testing method and device for material component content
CN110174359B (en) Aviation hyperspectral image soil heavy metal concentration assessment method based on Gaussian process regression
Jin et al. Non-destructive estimation of field maize biomass using terrestrial lidar: an evaluation from plot level to individual leaf level
CN107478580B (en) Soil heavy metal content estimation method and device based on hyperspectral remote sensing
Mahmood et al. Sensor data fusion to predict multiple soil properties
CN108801934A (en) A modeling method for a soil organic carbon hyperspectral prediction model
CN110376139A (en) Soil organic matter content quantitative inversion method based on ground high-spectrum
CN105486655B (en) The soil organism rapid detection method of model is intelligently identified based on infrared spectroscopy
CN110455722A (en) Rubber tree leaf phosphorus content hyperspectral inversion method and system
CN101825567A (en) Screening method for near infrared spectrum wavelength and Raman spectrum wavelength
CN103854305A (en) Module transfer method based on multiscale modeling
Song et al. Chlorophyll content estimation based on cascade spectral optimizations of interval and wavelength characteristics
Kosnik et al. Radiocarbon-calibrated multiple amino acid geochronology of Holocene molluscs from Bramble and Rib Reefs (Great Barrier Reef, Australia)
CN113436153B (en) Undisturbed soil profile carbon component prediction method based on hyperspectral imaging and support vector machine technology
CN102072767A (en) Wavelength similarity consensus regression-based infrared spectrum quantitative analysis method and device
CN110779875B (en) Method for detecting moisture content of winter wheat ear based on hyperspectral technology
CN113466143B (en) Soil nutrient inversion method, device, equipment and medium
CN114112941A (en) Aviation hyperspectral water eutrophication evaluation method based on support vector regression
CN111141809B (en) Soil nutrient ion content detection method based on non-contact type conductivity signal
CN116818687B (en) Soil organic carbon spectrum prediction method and device based on spectrum guide integrated learning
CN111595806A (en) Method for monitoring soil carbon component by using mid-infrared diffuse reflection spectrum
Liu et al. Detection of Apple Taste Information Using Model Based on Hyperspectral Imaging and Electronic Tongue Data.
CN112924401A (en) Semi-empirical inversion method for chlorophyll content of vegetation canopy
CN116773516A (en) Soil carbon content analysis system based on remote sensing data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant