CN114509404A - Method for predicting content of available boron in hyperspectral soil - Google Patents


Info

Publication number
CN114509404A
Authority
CN
China
Prior art keywords
soil
hyperspectral
data
model
boron content
Legal status
Pending
Application number
CN202210141354.2A
Other languages
Chinese (zh)
Inventor
李绍稳
朱娟娟
金秀
韩亚鲁
郑文瑞
Current Assignee
Anhui Agricultural University AHAU
Original Assignee
Anhui Agricultural University AHAU
Application filed by Anhui Agricultural University AHAU
Priority to CN202210141354.2A
Publication of CN114509404A
Pending (current legal status)

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/01Arrangements or apparatus for facilitating the optical investigation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/01Arrangements or apparatus for facilitating the optical investigation
    • G01N2021/0106General arrangement of respective parts
    • G01N2021/0112Apparatus in one mechanical, optical or electronic block
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Analytical Chemistry (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Immunology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Mathematical Physics (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Geometry (AREA)

Abstract

The invention discloses a method for predicting the available (effective) boron content of soil from visible near-infrared (VIS-NIR) hyperspectral data by combining preprocessing and modeling. The collected VIS-NIR hyperspectral data are first preprocessed and transformed. A regression algorithm is then used to build a prediction model for soil available boron content, so that the available boron content of a soil can be predicted from its spectral data. The method enables nondestructive, real-time, rapid and accurate laboratory detection of soil available boron content using VIS-NIR hyperspectra.

Description

Method for predicting content of available boron in hyperspectral soil
Technical Field
The invention relates to the technical field of chemical detection, and in particular to a method for predicting the available boron content of soil from hyperspectral data.
Background
Boron is an essential trace element that plays a crucial role in crop flowering, fertilization, yield and quality. In coarse-textured sandy calcareous soils, boron may be one of the key limiting micronutrients. Boron deficiency occurs globally and can be a major limiting factor in crop production. It is considered the second most important micronutrient limiting crop growth. Because boron usually has the lowest content of all chemical elements in soil, rapid and accurate detection of soil available boron is of great significance. However, existing detection methods for soil available boron (such as the curcumin method and the azomethine-H method) rely mainly on chemical analysis. They suffer from low efficiency, high cost and potential environmental pollution.
Visible near-infrared (VIS-NIR) hyperspectroscopy enables in-situ, nondestructive, real-time and rapid detection of the physicochemical properties of crops and soil. VIS-NIR is therefore widely used in agriculture to monitor organic compounds and mineral nutrients. TahmasBionet et al. used laboratory-based hyperspectral images (400-; their partial least squares regression (PLSR) model gave a coefficient of determination (R²) > 0.8 for all tested components. Tamburini et al. studied the influence of moisture and particle size on the quantitative prediction of soil total organic carbon by near-infrared spectroscopy. They found that combining the standard normal variate (SNV) and the second derivative with PLSR gave the best predictions. Jinxiu et al. used VIS-NIR spectroscopy to predict soil available potassium and found that boosting algorithms (GBRT and AdaBoost) gave the best R². Near-infrared spectroscopy has attracted much attention in soil science over recent decades. Even so, the accuracy and generality of VIS-NIR models for predicting available minerals in soil remain unsatisfactory, especially for trace element contents.
Therefore, accurately predicting soil boron content from VIS-NIR hyperspectra is a problem that those skilled in the art urgently need to solve.
Disclosure of Invention
In view of the above, the invention provides a method for predicting the available boron content of soil from hyperspectral data. Soil spectral data are collected and analyzed by laboratory non-imaging VIS-NIR hyperspectroscopy, and boron content is determined by physicochemical analysis. The collected VIS-NIR hyperspectral data are first preprocessed by detrending correction (DT). A prediction model for soil available boron content is then built with support vector machine regression using a Gaussian radial basis function kernel (SVM_RBF), so that soil available boron content can be predicted from soil spectral data. To select this combination, the spectral data were transformed by 29 preprocessing methods, such as detrending (DT) and Savitzky-Golay convolution smoothing. The transformed data were then modeled with 9 regression algorithms, such as ElasticNet, Ridge and SVM_RBF. This gave 270 prediction models: 30 spectral inputs (the raw spectrum plus its 29 preprocessed versions) times 9 algorithms. The accuracy, reliability and stability of each model were evaluated with the coefficient of determination R², the ratio of performance to deviation (RPD) and the ratio of performance to interquartile distance (RPIQ). The evaluation showed that the DT + SVM_RBF combination is the optimal method. The invention can provide a reference for remote sensing monitoring of trace element information on soil fertility.
In order to achieve the purpose, the invention adopts the following technical scheme:
A method for predicting the available boron content of soil from hyperspectral data comprises the following steps:
Step 1: collect visible near-infrared hyperspectral data of the soil;
Step 2: preprocess and transform the soil VIS-NIR hyperspectral data to obtain model data;
Step 3: build a VIS-NIR spectral model from the model data with a regression algorithm, and train it to obtain a prediction model for soil available boron content;
Step 4: input the hyperspectral image of the soil to be predicted into the prediction model to obtain its available boron content.
Preferably, the preprocessing in step 2 is as follows. The soil VIS-NIR hyperspectral data are denoised by deleting noisy, invalid bands. The denoised hyperspectral data are then processed by detrending correction.
Preferably, the noisy, invalid bands at the start of the collected soil VIS-NIR hyperspectral data are removed, and the spectral data in the 350-1655 nm region (1306 bands) are retained.
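The band counts quoted in this document can be checked with a short sketch. It assumes a spectrum resampled at 1 nm and stored as plain Python lists; `trim_bands` is a hypothetical helper, not part of the patent.

```python
# Hypothetical helper (not from the patent): keep bands inside [lo_nm, hi_nm].
def trim_bands(wavelengths, values, lo_nm=350, hi_nm=1655):
    """Return (wavelengths, values) restricted to lo_nm <= wavelength <= hi_nm."""
    kept = [(w, v) for w, v in zip(wavelengths, values) if lo_nm <= w <= hi_nm]
    return [w for w, _ in kept], [v for _, v in kept]


# A 200-1700 nm spectrum resampled at 1 nm has 1501 bands.
wavelengths = list(range(200, 1701))
reflectance = [0.3] * len(wavelengths)  # placeholder values, not measured data
w_kept, r_kept = trim_bands(wavelengths, reflectance)
print(len(wavelengths), len(w_kept))  # 1501 1306
```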
Preferably, the regression algorithm in step 3 uses a support vector machine with a Gaussian radial basis function kernel. The kernel maps the model data into a high-dimensional feature space, in which the VIS-NIR spectral model is built.
Compared with the prior art, the disclosed method uses VIS-NIR hyperspectra to detect soil available boron content in the laboratory nondestructively, promptly, rapidly and accurately. The spectral data are transformed by a preprocessing method and a regression model is built. The established model then analyzes the collected spectra and predicts the corresponding soil available boron content. The method achieves rapid nondestructive detection of soil available boron with a non-imaging hyperspectral spectrometer covering 200-1700 nm. It addresses the insufficient boron prediction accuracy of current approaches based on mid-infrared spectra (2500-25000 nm), HSI imaging hyperspectra (400-1000 nm) and airborne imaging hyperspectra, and greatly improves prediction accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flow chart of a method for predicting the effective boron content of hyperspectral soil provided by the invention;
FIG. 2 is a schematic view of a soil sampling area provided by the present invention;
FIG. 3 is a schematic diagram of an average spectrum before various pre-processing transformations provided by the present invention;
FIG. 4 is a schematic diagram of the average spectra after various pre-processing transformations provided by the present invention;
FIG. 5 is a diagram of the R² values of the regression models on the test data set for each preprocessing transformation provided by the present invention;
FIG. 6 is a schematic diagram of the RPIQ values of the regression model for different pre-processing transformations provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a method for predicting the available boron content of soil from VIS-NIR hyperspectral data by preprocessing and modeling. First, the collected VIS-NIR hyperspectral data are preprocessed and transformed: noisy, invalid bands are deleted to denoise the data, and the denoised data are processed by detrending correction. A prediction model for soil available boron content is then built with a regression algorithm. In this algorithm, a support vector machine with a Gaussian radial basis function kernel maps the model data into a high-dimensional feature space to construct the VIS-NIR spectral model. The model is then used to predict soil available boron content from soil spectral data. The method enables nondestructive, real-time, rapid and accurate laboratory detection of soil available boron content using VIS-NIR hyperspectra.
Examples
Soil samples were collected. The prediction performance of various preprocessing methods combined with regression algorithms was then compared.
(1) Collecting soil samples
A total of 188 yellow-red soil samples were collected in a typical mountainous area in the south of a certain province, at 117°29'7" to 118°11'1" E and 30°8'23" to 30°22'25" N, as shown in FIG. 2. Samples were taken by the diagonal sampling method at a depth of 0-20 cm. Plant roots, gravel and other impurities were removed, and 1.5 kg of clean soil was collected for each sample. Each sample was numbered, air-dried, ground and passed through a 2 mm sieve. Each sample was analyzed by VIS-NIR hyperspectroscopy, and its available boron content was determined by azomethine-H colorimetry.
VIS-NIR measurements were performed with a portable non-imaging spectrometer (Ocean Optics OFS-1700) covering 200-1700 nm. Its spectral resolution is 2 nm at 200-950 nm and 5 nm at 950-1700 nm, and the data were resampled at 1 nm intervals. Measurements between 200 and 349 nm were filtered out as noise, and denoising reduced the original 1501 bands to 1306 bands.
The sieved (2 mm) soil was placed in a sample container, which was covered with black cloth to prevent stray-light interference. For each sample, three randomly selected portions were measured, and their average spectrum was used as the soil spectral data.
(2) Pre-processing transformations
A total of 29 preprocessing transformations were used, as shown in Table 1. They apply the following methods alone or in combination: detrending (DT), first derivative (FD), second derivative (SD), logarithmic transformation (LG), mean centering (MC), multivariate scatter correction (MSC), standard normal variate (SNV) and Savitzky-Golay convolution smoothing (SG). SG processing is generally used to smooth the spectral curve. It markedly suppresses high-frequency noise, improves the signal-to-noise ratio (SNR), and preserves the peak features of the original spectral signal as far as possible. FD and SD effectively remove baseline effects but can amplify noise. SNV corrects for the effects of soil particle size and surface scattering, while MC and DT reduce spectral shifts. Combining several preprocessing methods can therefore exploit their complementary strengths and offset their weaknesses.
1) Savitzky-Golay convolution smoothing
Savitzky-Golay (SG) convolution smoothing is a common denoising algorithm in soil hyperspectral data analysis. Interference from the spectrometer detector and from environmental factors introduces some noise into soil spectra. Smoothing raises the spectral signal-to-noise ratio and thus reduces the influence of noise on prediction accuracy. The SG algorithm improves on moving-average smoothing: the coefficients c_k for the points in a moving window are obtained by local polynomial least-squares fitting. The smoothed value at the i-th wavelength is

$$x_{i,SG} = \frac{1}{H}\sum_{k=-m}^{m} c_k\, x_{i+k}$$

where m is the number of points in the smoothing window on one side of the wavelength (window size 2m + 1), H is the normalization index (the sum of the c_k), and c_k are the smoothing coefficients. Both the window size and the polynomial order affect the processed curve.
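The SG formula above can be illustrated with the classic tabulated 5-point quadratic coefficients (m = 2, c_k = -3, 12, 17, 12, -3, H = 35). This is a minimal sketch only: the patent does not state the window size or polynomial order it used, and `savgol5` is a hypothetical name.

```python
# Tabulated 5-point Savitzky-Golay coefficients c_k (quadratic fit, m = 2).
SG5_COEFFS = [-3, 12, 17, 12, -3]
SG5_NORM = 35  # H, the sum of the coefficients


def savgol5(x):
    """Smooth a spectrum with the 5-point quadratic Savitzky-Golay filter.

    Interior points use x_i,SG = (1/H) * sum_{k=-m..m} c_k * x_{i+k}.
    The m = 2 points at each end are copied unchanged (a simplification).
    """
    m = len(SG5_COEFFS) // 2
    out = list(x)
    for i in range(m, len(x) - m):
        out[i] = sum(c * x[i + k - m] for k, c in enumerate(SG5_COEFFS)) / SG5_NORM
    return out


spike = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(savgol5(spike)[3])  # about 0.486 (17/35): the isolated spike is damped
```

A quadratic (or cubic) curve passes through this filter unchanged, which is why SG keeps peak shapes better than a plain moving average. For arbitrary windows and orders, `scipy.signal.savgol_filter(x, window_length, polyorder)` computes the coefficients itself and handles the edges differently.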
2) Derivative transformation
Derivative transformation removes the influence of baseline drift and enhances characteristic features of soil spectra, so it is often used for preprocessing in spectral analysis. The first derivative (FD) of the original spectrum eliminates the influence of a constant baseline offset. The second derivative (SD) also eliminates the influence of a first-order (linear) baseline:

$$x_{i,FD} = x_i', \qquad x_{i,SD} = x_i''$$
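The baseline-removal claims above can be checked numerically with central finite differences. This is an assumption made for illustration: the patent does not say how the derivatives were computed, and SG-based derivatives, such as `scipy.signal.savgol_filter(..., deriv=1)`, are also common. The function names are hypothetical.

```python
def first_derivative(x, step=1.0):
    """Central-difference first derivative; returns len(x) - 2 points."""
    return [(x[i + 1] - x[i - 1]) / (2 * step) for i in range(1, len(x) - 1)]


def second_derivative(x, step=1.0):
    """Central-difference second derivative; returns len(x) - 2 points."""
    return [(x[i + 1] - 2 * x[i] + x[i - 1]) / step ** 2 for i in range(1, len(x) - 1)]


signal = [0.1 * (i - 5) ** 2 for i in range(11)]              # curved feature
offset = [s + 0.4 for s in signal]                             # constant baseline
tilted = [s + 0.4 + 0.02 * i for i, s in enumerate(signal)]    # linear baseline

# FD cancels the constant offset; SD also cancels the linear slope.
fd_same = all(abs(a - b) < 1e-9 for a, b in zip(first_derivative(signal), first_derivative(offset)))
sd_same = all(abs(a - b) < 1e-9 for a, b in zip(second_derivative(signal), second_derivative(tilted)))
print(fd_same, sd_same)  # True True
```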
3) Standard normal transformation
The standard normal variate (SNV) transformation is mainly used to eliminate the influence of soil particle size, optical path variation and surface scattering on the spectrum. Uneven particle size and an uneven soil surface cause scattering, which produces interfering signals. Common scatter-correction methods are SNV and multivariate scatter correction (MSC). SNV first computes the mean of each spectrum,

$$\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}$$

It then subtracts this mean from every variable of the spectrum and divides by the spectrum's standard deviation. Each wavelength j of spectrum i is transformed as

$$x_{ij,SNV} = \frac{x_{ij} - \bar{x}_i}{\sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ij} - \bar{x}_i\right)^{2}}}$$

where x_{ij,SNV} is the SNV-processed spectrum, n is the number of variables (wavelengths), x_{ij} is the value of the j-th variable of the i-th sample, and \bar{x}_i is the mean of the i-th spectrum.
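A minimal sketch of SNV as it is conventionally defined: each spectrum is centred on its own mean and scaled by its own sample standard deviation. Spectra are assumed to be plain lists of reflectance values, and `snv` is a hypothetical name.

```python
import statistics


def snv(spectrum):
    """Standard normal variate of one spectrum.

    Centres the spectrum on its own mean and divides by its own sample
    standard deviation (n - 1 denominator), as in the SNV formula.
    """
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(v - mean) / sd for v in spectrum]


a = [0.21, 0.25, 0.30, 0.28, 0.35]
b = [1.8 * v + 0.05 for v in a]  # same sample, different scatter gain and offset
print(all(abs(p - q) < 1e-9 for p, q in zip(snv(a), snv(b))))  # True
```

The usage shows why SNV suits scatter correction: a multiplicative gain plus an additive offset, the typical effect of particle size and surface roughness, disappears after the transform.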
4) Multivariate scatter correction
The multivariate scatter correction (MSC) preprocessing algorithm is similar to the standard normal variate (SNV) algorithm but more complex. MSC effectively eliminates or reduces errors due to baseline shifts caused by scattering differences between soil samples.
MSC is computed as follows. The original spectrum A(λ) of each sample is transformed into the spectrum A_0(λ) at a desired reference particle size. The spectrum is written as

$$A(\lambda) = \alpha A_0(\lambda) + \beta + e(\lambda)$$

where e(λ) is the residual. α and β are determined by least squares, with estimates α′ and β′, which gives the inverse transformation

$$A_0(\lambda) = \frac{A(\lambda) - \beta}{\alpha}$$

The average spectrum of all soil samples can be used as the reference for estimating α′ and β′:

$$\bar{A} = \frac{1}{n}\sum_{i=1}^{n} A_i$$

Each spectrum is regressed linearly on the average spectrum, and the corrected spectrum follows:

$$A_i = \alpha_i \bar{A} + \beta_i, \qquad A_{i,MSC} = \frac{A_i - \beta_i}{\alpha_i}$$

where A_i and A_{i,MSC} are the original and corrected spectral data of the i-th soil sample, A is the spectral matrix of the modeling set, n is the number of samples, and α_i and β_i are obtained by linear regression.
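The MSC procedure just described can be sketched in a few lines. MSC is also widely known as multiplicative scatter correction. The sketch assumes the mean spectrum of the set is the reference and fits α_i and β_i by ordinary least squares; `msc` is a hypothetical name, not the patent's code.

```python
import statistics


def msc(spectra):
    """Scatter-correct each spectrum against the mean spectrum of the set.

    For each spectrum A_i, fit A_i ~ alpha_i * A_mean + beta_i by ordinary
    least squares, then return A_i,MSC = (A_i - beta_i) / alpha_i.
    """
    n_bands = len(spectra[0])
    ref = [statistics.mean(s[j] for s in spectra) for j in range(n_bands)]
    ref_mean = statistics.mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = statistics.mean(s)
        # OLS slope and intercept of this spectrum regressed on the mean spectrum.
        alpha = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / sxx
        beta = s_mean - alpha * ref_mean
        corrected.append([(v - beta) / alpha for v in s])
    return corrected, ref


base = [1.0, 2.0, 3.0, 5.0]
spectra = [base, [2.0 * v + 1.0 for v in base]]  # same shape, different scatter
corrected, ref = msc(spectra)
```

Here both input spectra are exact linear transforms of the mean spectrum, so after correction both collapse onto it.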
5) Logarithmic transformation
Logarithmic transformation (LG) of the original soil reflectance spectrum gives the absorbance spectrum of the soil. Converting reflectance into absorbance makes spectral intensity linearly related to target concentration. Research in China and abroad has shown that this increases the accuracy of soil-property prediction models, and LG is a common spectral preprocessing method. Using base-10 logarithms, the original spectrum x_i is transformed as

$$x_{i,LG} = \log_{10}(1/x_i)$$
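The LG formula is a one-liner in code. The sketch assumes reflectance is expressed as a fraction in (0, 1]; `log_transform` is a hypothetical name.

```python
import math


def log_transform(reflectance):
    """Apparent absorbance log10(1/R) for reflectance values 0 < R <= 1."""
    return [math.log10(1.0 / r) for r in reflectance]


absorbance = log_transform([1.0, 0.1, 0.01])
```

Each tenfold drop in reflectance adds one absorbance unit, which is the linearizing effect described above.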
6) mean centering
Mean centering (MC) is applied in machine learning algorithms, regression analysis and neural network training. It removes errors caused by variation within the data and by large differences between variables, and it prevents spectral features with very large or very small values from dominating. Before modeling, zero-mean standardization is applied to both the spectral data and the soil attribute data, so that the processed data have a mean of 0 and a standard deviation of 1. Whenever the spectral matrix is mean-centered, the measured physicochemical values are standardized as well. The MC-processed spectrum at wavelength i is

$$x_{i,MC} = x_i - \bar{x}$$

where x_i is the spectral reflectance at wavelength i and \bar{x} is the average reflectance over all samples.
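A minimal sketch of column-wise mean centering over a samples × bands matrix. The optional scaling step (autoscaling) is included because the text also requires a standard deviation of 1. The matrix layout and the name `mean_center` are assumptions.

```python
import statistics


def mean_center(matrix, scale=False):
    """Column-wise mean centring of a samples x bands matrix (list of rows).

    With scale=True each column is also divided by its sample standard
    deviation (autoscaling), giving zero mean and unit standard deviation.
    """
    cols = list(zip(*matrix))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) if scale else 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in matrix]


X = [[0.2, 0.5], [0.4, 0.7], [0.6, 0.3]]
Z = mean_center(X)               # column means become 0
Zs = mean_center(X, scale=True)  # column means 0, standard deviations 1
```

The same transform would be applied to the measured boron values when the spectra are standardized.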
7) Data de-trending
Detrending (DT) is used in spectral analysis to eliminate or reduce baseline drift in diffuse reflectance spectra. A polynomial trend line d_i is first fitted to the spectrum x_i as a function of wavelength. d_i is then subtracted from x_i to remove the overall variation:

$$x_{i,DT} = x_i - d_i$$
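DT can be sketched with a least-squares straight line as the trend d_i. The degree-1 fit is an assumption made to keep the example short, since the patent does not state the polynomial order. For the linear case, `scipy.signal.detrend(x, type='linear')` does the same job. `detrend_linear` is a hypothetical name.

```python
def detrend_linear(wavelengths, spectrum):
    """Subtract a least-squares straight line d_i fitted to spectrum vs wavelength.

    A degree-1 polynomial is an assumption for this sketch; the patent does
    not state the polynomial order used for DT.
    """
    n = len(spectrum)
    w_mean = sum(wavelengths) / n
    x_mean = sum(spectrum) / n
    sxx = sum((w - w_mean) ** 2 for w in wavelengths)
    slope = sum((w - w_mean) * (x - x_mean)
                for w, x in zip(wavelengths, spectrum)) / sxx
    intercept = x_mean - slope * w_mean
    # x_i,DT = x_i - d_i, with d_i the fitted trend at wavelength i
    return [x - (slope * w + intercept) for w, x in zip(wavelengths, spectrum)]


wl = list(range(400, 411))
trend = [0.1 + 0.001 * w for w in wl]           # sloping baseline
bump = [0.05 if w == 405 else 0.0 for w in wl]  # a small absorption feature
residual = detrend_linear(wl, [t + b for t, b in zip(trend, bump)])
```

The sloping baseline is removed completely, while the local feature at 405 nm survives as the largest residual.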
TABLE 1 Preprocessing methods for soil-sample VIS-NIR spectra
[Table 1 image not reproduced]
(3) Regression algorithm
A total of 9 algorithms were used for regression. Among them, Support Vector Regression (SVR) is a popular algorithm in the field of machine learning. Different kernel functions, including linear, polynomial, sigmoid, and Radial Basis Function (RBF), are used to map the input to the high-dimensional feature space.
Suppose the samples are (x_i, y_i), i = 1, 2, 3, …, n, where x_i = (x_{i1}, x_{i2}, x_{i3}, …, x_{ip})^T is a p-dimensional explanatory variable and y_i is the response variable of the i-th observation. After x_i is mapped into a high-dimensional feature space φ(x_i), the optimal linear function is

$$f(x) = \omega^{T}\varphi(x) + b$$

where ω is the weight vector and b is the bias term.
An ε-insensitive error function is introduced in the high-dimensional space, defined as

$$L_{\varepsilon}\big(y, f(x)\big) = \begin{cases} 0, & |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise} \end{cases}$$

Variants of the insensitive error function include the linear ε-insensitive, quadratic ε-insensitive and Huber loss functions.
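The SVR decision function and the three loss variants named above can be written directly. This is a minimal sketch with hypothetical function names, and it assumes φ(x) has already been computed. It only evaluates losses; it does not train an SVR.

```python
def predict(w, b, phi_x):
    """Linear SVR decision function f(x) = w . phi(x) + b."""
    return sum(wi * xi for wi, xi in zip(w, phi_x)) + b


def eps_linear(residual, eps):
    """Linear epsilon-insensitive loss: zero inside the tube |r| <= eps."""
    return max(0.0, abs(residual) - eps)


def eps_quadratic(residual, eps):
    """Quadratic epsilon-insensitive loss: squared excess beyond the tube."""
    return max(0.0, abs(residual) - eps) ** 2


def huber(residual, delta):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = abs(residual)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)


f = predict([2.0, -1.0], 0.5, [1.0, 3.0])  # 2*1 - 1*3 + 0.5 = -0.5
```

Errors smaller than ε cost nothing, which is what makes the fitted SVR depend only on the support vectors outside the tube.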
The Gaussian function is the radial basis function most commonly used as a kernel. A radial basis function is a real-valued function whose value depends only on the distance from the origin (or from a center point). The Gaussian kernel is a nonlinear kernel commonly used in regression algorithms:

$$K(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^{2}}{2\sigma^{2}}\right)$$

where \lVert x - x' \rVert^{2} is the squared Euclidean distance between the two feature vectors and σ is a free parameter.
The sigmoid function, also called the S-shaped growth curve, is defined as

$$S(x) = \frac{1}{1 + e^{-x}}$$
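Both functions above can be evaluated directly. This is a minimal sketch with hypothetical names. Note that scikit-learn parameterizes the RBF kernel by `gamma` = 1/(2σ²), and its SVR "sigmoid" kernel is tanh(gamma·⟨x, x′⟩ + coef0) rather than the logistic curve shown here.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def sigmoid(t):
    """Logistic S-shaped curve 1 / (1 + e^-t)."""
    return 1.0 / (1.0 + math.exp(-t))


k_same = rbf_kernel([0.3, 0.7], [0.3, 0.7])  # identical spectra give 1.0
k_far = rbf_kernel([0.0, 0.0], [1.0, 1.0])   # exp(-1) with sigma = 1
```

Larger σ makes the kernel decay more slowly with distance, so σ (or `gamma`) is the main tuning parameter alongside the SVR penalty.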
Ridge regression is a biased estimation method designed for analyzing collinear data; it is essentially an improved least-squares method. It gives up the unbiasedness of least squares, at the cost of losing some information and precision. In return it obtains regression coefficients that are more realistic and reliable, and it fits ill-conditioned data better than least squares. When a linear regression model has highly correlated independent variables, ridge regression estimates the coefficients of the multiple regression model with ridge estimators, which provides more accurate coefficient estimates.
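Ridge regression has the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy. The sketch below solves it with a small Gauss-Jordan elimination so that it needs no dependencies. It assumes X and y are already mean-centred, so no intercept is fitted. `ridge_fit` is a hypothetical name; in practice `sklearn.linear_model.Ridge(alpha=...)` would be used.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y by Gauss-Jordan elimination.

    Assumes X (list of rows) and y are mean-centred, so no intercept is fitted.
    """
    p = len(X[0])
    # Augmented normal-equation matrix [X^T X + lam*I | X^T y].
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Two perfectly collinear predictors: plain least squares (lam = 0) is singular,
# but any lam > 0 gives a unique, evenly shared solution.
X_col = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
w_col = ridge_fit(X_col, [2.0, 4.0, 6.0], 0.1)
```

The collinear example mirrors the situation described above. Adjacent VIS-NIR bands are highly correlated, and the penalty λ keeps the coefficient estimates finite and stable.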
Lasso regression performs variable selection and regularization to improve the prediction accuracy and interpretability of the model. The lasso process encourages the use of simple sparse models with fewer parameters and is well suited to models with higher levels of multicollinearity.
Elastic net is a regularized regression method that linearly integrates the penalties of lasso and ridge regression methods to effectively shrink coefficients (as in ridge regression) and set some coefficients to zero (as in lasso).
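The difference between lasso and ridge shrinkage can be seen in the one-coefficient case, using an orthonormal design. Lasso's soft-thresholding sets small coefficients exactly to zero, while the ridge part only rescales them. The (`alpha`, `l1_ratio`) parameterization below is similar to scikit-learn's `ElasticNet`. The function names are hypothetical, and this is an illustration rather than the fitting procedure used in the patent.

```python
def soft_threshold(z, t):
    """Lasso proximal step: shrink z toward 0 by t, returning 0 if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_update(z, alpha, l1_ratio):
    """Minimiser of 0.5*(w - z)**2 + alpha*l1_ratio*|w| + 0.5*alpha*(1 - l1_ratio)*w**2.

    l1_ratio = 1 gives the lasso solution, l1_ratio = 0 the ridge solution.
    """
    return soft_threshold(z, alpha * l1_ratio) / (1.0 + alpha * (1.0 - l1_ratio))


lasso_small = elastic_net_update(0.5, 1.0, 1.0)  # set exactly to 0
ridge_small = elastic_net_update(0.5, 1.0, 0.0)  # shrunk to 0.25, never 0
mixed = elastic_net_update(3.0, 1.0, 0.5)        # both effects: 2.5 / 1.5
```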
(4) Evaluation index
The present invention uses the coefficient of determination (R²), the root mean square error (RMSE) and the ratio of performance to deviation (RPD) as prediction evaluation indicators.
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{n}(y_i - \bar{y})^{2}}$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^{2}}$$

$$RPD = \frac{S.D.}{RMSE}$$

where n is the number of samples in the prediction set, y_i is the measured chemical value of the i-th sample, \hat{y}_i is the model prediction for the i-th sample, and \bar{y} is the mean of the y_i. S.D. is the standard deviation of the measured values. Models are classified into levels by RPD value, as shown in Table 2.
TABLE 2 classes of different models based on RPD values
RPD Level
RPD≤1.4 C
1.4<RPD≤2.0 B
RPD>2.0 A
Because soil physical properties and chemical contents generally follow a skewed distribution, the ratio of performance to interquartile distance (RPIQ) is a better indicator than RPD. RPIQ is the ratio of IQ to RMSE, where IQ is the difference between the third quartile Q3 (75th percentile of the samples) and the first quartile Q1 (25th percentile). The larger the RPIQ value, the better the model performance.

$$IQ = Q_{3} - Q_{1} \tag{4}$$

$$RPIQ = \frac{IQ}{RMSE}$$
In summary, the regression models are compared using R², RMSE, RPD and RPIQ.
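The four indicators and the RPD levels of Table 2 can be computed together. This is a minimal sketch with two assumptions the patent does not fix: the sample standard deviation (n - 1) for S.D., and Python's "inclusive" quartile method for Q1 and Q3. The function names are hypothetical.

```python
import math
import statistics


def evaluate(y_true, y_pred):
    """Return R2, RMSE, RPD and RPIQ for a prediction set."""
    n = len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    y_mean = statistics.mean(y_true)
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    rmse = math.sqrt(ss_res / n)
    sd = statistics.stdev(y_true)  # assumption: sample SD (n - 1)
    q1, _, q3 = statistics.quantiles(y_true, n=4, method="inclusive")  # assumption
    return {"R2": 1.0 - ss_res / ss_tot, "RMSE": rmse,
            "RPD": sd / rmse, "RPIQ": (q3 - q1) / rmse}


def rpd_level(rpd):
    """Map an RPD value to the A/B/C levels of Table 2."""
    if rpd <= 1.4:
        return "C"
    if rpd <= 2.0:
        return "B"
    return "A"


metrics = evaluate([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```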
(5) Comparison results
1) Soil sample statistics
The 188 soil samples were divided into training and test sets at a ratio of 7:3 by the Kennard-Stone method, giving a training set of 131 samples and a test set of 57 samples. The statistics show that the available boron contents of the two sets are distributed differently, which benefits model training and generalization, as shown in Table 3.
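A minimal sketch of the Kennard-Stone split described above. It starts from the two most distant samples, then repeatedly adds the sample whose nearest already-selected neighbour is farthest away. It assumes Euclidean distance on the spectra, since the patent does not state the distance or feature space. `kennard_stone` is a hypothetical name.

```python
import math


def kennard_stone(samples, n_train):
    """Return (train_idx, test_idx) chosen by the Kennard-Stone algorithm."""
    n = len(samples)
    # Start with the two samples that are farthest apart.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: math.dist(samples[ij[0]], samples[ij[1]]))
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in selected]
    # Distance from each remaining sample to its nearest selected sample.
    min_dist = {k: min(math.dist(samples[k], samples[s]) for s in selected)
                for k in remaining}
    while len(selected) < n_train:
        nxt = max(remaining, key=lambda k: min_dist[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_dist[k] = min(min_dist[k], math.dist(samples[k], samples[nxt]))
    return selected, remaining


# A 7:3 split of 188 samples: int(0.7 * 188) = 131 training, 57 test.
n_train = int(0.7 * 188)
train, test = kennard_stone([[0.0], [1.0], [2.0], [3.0], [10.0]], 3)
```

Because the selection covers the extremes of the spectral space first, the training set spans the full range of the data.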
TABLE 3 soil available boron sample statistics
[Table 3 image not reproduced]
Preprocessing is an essential step for accurate VIS-NIR spectral analysis, and various preprocessing methods are used to filter noise and reduce complexity. FIGS. 3-4 show the reflectance spectra (Rs) under different preprocessing. The SG method is generally used to reduce spectral noise and smooth curves, so it is always combined with other preprocessing methods. FIG. 3 shows the average spectra without Savitzky-Golay (SG) processing, and FIG. 4 shows the average spectra with SG processing. As the figures show, all methods except the scatter-correction methods SNV and MSC significantly change the shape of the spectral curves. FD, SD and LG almost completely reshape the curves.
2) Performance evaluation of different regression models
Combining the preprocessing transformations with the regression algorithms generated 270 VIS-NIR spectral models. FIG. 5 shows the R² of each model on the test set. Whichever preprocessing transformation is used, the SVM with the RBF kernel gives the highest R² on the test data, followed by the PLS model with the RBF kernel. Whichever regression model is used, SD, MSC + SD or SNV + SD preprocessing generally gives the worst R², especially SNV + SD. The RPD levels and RPIQ values of the models are shown in Table 4 and FIG. 6; FIG. 6 shows the regression-model RPIQ values for the different preprocessing transformations. Consistent with the R² results, the SVM with the RBF kernel and the PLS with the RBF kernel produced the most level-A results. The ElasticNet and Lasso models generally performed worse than the others.
Table 4 below shows the RPD level of each model, which is used to determine the effect of preprocessing. Level A indicates the highest model stability and level C the lowest. Without any preprocessing, the raw spectra (RS) did not reach level A with any model (Table 4). The ElasticNet, Lasso and SVM_Sigmoid models were at level C, while the Ridge, SVM_Linear and SVM_RBF models reached level B. The VIS-NIR data yielded level-A models (Table 4) when preprocessed by DT, LG, SNV, MSC, SNV + DT, SG + DT or SG + SNV + DT and then regressed with SVM_RBF. This indicates that DT or SNV is preferable to the other preprocessing transformations.
TABLE 4 RPD levels of regression models for different preprocessing transformations
Table 5 shows the RPD level statistics grouped by preprocessing type. Preprocessing transformations are expected to reduce noise and improve accuracy, yet some produce worse results than the raw spectra, especially FD and SD. Most transformations involving FD or SD yield level C for every model. This strongly suggests that these two transformations are unsuitable for predicting available boron from VIS-NIR data. The DT and LG methods raise overall performance above that of the raw RS data. MSC and SNV improve performance in some models but degrade it in others. SG is a typical preprocessing step in NIR data analysis, but it produced no observable improvement.
TABLE 5 RPD level statistics of the models, grouped by preprocessing transformation
Table 6 shows the statistical results of different model RPD levels based on the regression method. The PLS model produced the most level a results, indicating its predictive stability.
TABLE 6 RPD level statistics for different models based on regression method
(6) Preferred VIS-NIR model for predicting available boron
Each regression algorithm was combined with the different preprocessing transformations to find its best model. ElasticNet and SVM_RBF give their best models with DT preprocessing, and LG is the best partner for Ridge, SVM_Linear and SVM_Sigmoid. SG is the preferred transformation for both PLS and Lasso regression. All of these combinations give R² ≥ 0.72, and SVM_RBF gives the highest R² (0.82) and the best RPD level (level A). Different regression algorithms therefore need different preprocessing to reach their best performance, and DT + SVM_RBF performs best of all the models tested in the embodiments of the invention. Consistent with the R² and RPD level results, SVM_RBF also has the highest RPIQ value among these models. In conclusion, the SVM_RBF algorithm performs best at predicting soil available boron content from VIS-NIR spectra. The performance and parameters of the best models are shown in Table 7 below.
TABLE 7 Performance and parameters of the best model
In the verification of the invention, 29 preprocessing transformations were applied and the raw RS data were added. Combining these 30 inputs with 9 regression algorithms generated 270 models for predicting soil available boron content from the VIS-NIR spectra of the soil samples. Two models are significantly better than all the others: the SVM_RBF model with DT preprocessing (R² = 0.82) and the PLS_RBF model with the SG_SNV_DT transformation (R² = 0.80). Both reach RPD level A, as shown in Table 7. SVMs are widely used for calibrating VIS-NIR spectra, and the nonlinear RBF kernel is a Gaussian kernel.
Because the number of samples is much smaller than the number of features (the number of wavelengths), the Gaussian kernel plays a dimension-reducing role. When R², RMSE, RPD and RPIQ are used to evaluate performance, the PLS_RBF model performs practically as well as SVM_RBF, as shown in Table 7. Both of the two best models use the RBF kernel. This demonstrates the effectiveness of the Gaussian kernel for predicting available soil minerals and reinforces the need for dimensionality reduction when predicting soil nutrient content. DT preprocessing removes the baseline trend and keeps the true fluctuations, thereby eliminating spurious correlations. DT is typically applied after SNV, and in the SVM_RBF model both SNV alone and SNV + DT reach level A, as shown in Table 4. Whatever regression algorithm is used, DT generally appears to improve model performance when stacked on other transformations. The results also show that SNV can produce an acceptable soil available boron prediction model when paired with a well-tuned regression algorithm.
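One way to read the sample/feature argument: an RBF-kernel model works through the n × n Gram matrix of pairwise Gaussian similarities. Its size is set by the number of samples, not by the number of bands. The sketch below builds that matrix directly and checks it against scikit-learn's `rbf_kernel`. The 188 samples match Example 2, but the 1000-band count and gamma = 1e-3 are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def gaussian_gram(X, gamma):
    """Gaussian (RBF) Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b; clip tiny negatives caused by rounding.
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * sq_dists)


rng = np.random.default_rng(0)
n_samples, n_bands = 188, 1000  # 188 valid samples; the band count is hypothetical
X = rng.random((n_samples, n_bands))
K = gaussian_gram(X, gamma=1e-3)
print(K.shape)  # (188, 188): depends only on the sample count
```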
Meanwhile, the RBF kernel (with SVM or PLS) generates the most level A models, and the SVM_RBF model has the highest R², 0.82, as shown in Table 7. This suggests that the RBF kernel may be particularly suited to some (but not all) preprocessing transformations. Of the two, the SVM_RBF model has the higher prediction accuracy.
Preprocessing transformations are expected to smooth the curves, reduce noise and improve model performance, but not all of them are effective for predicting soil available boron. Savitzky-Golay smoothing is applied to soil spectral curves as standard preparation in almost every near-infrared analysis. However, the results show that SG contributes little to model performance. In some models it even makes performance worse, for example DT + SG compared with DT alone in the ElasticNet model, as shown in Table 4. Furthermore, the SD transformation severely degrades performance for almost all models. This strongly indicates that SD is unsuitable for VIS-NIR-based prediction of available soil chemical content.
In summary, the optimal soil boron content prediction model is the SVM with an RBF kernel trained on DT-preprocessed spectral data, followed by PLS with an RBF kernel on SG_SNV_DT-transformed data. With parameters C = 200000 and gamma = 1, the SVM_RBF model achieves a high R² of 0.82 and RPD level A. The SVM_RBF algorithm is significantly better than the other algorithms, and in most cases SD preprocessing leads to poor performance.
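The reported best configuration is DT followed by an RBF-kernel SVM with C = 200000 and gamma = 1. It can be sketched as a scikit-learn pipeline under these assumptions:

- DT is a second-order polynomial detrend.
- Every setting the text does not state keeps its default, including SVR's epsilon and the absence of feature scaling.

The helper name `build_dt_svm_rbf` and the data in the test are hypothetical.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVR


def detrend(X, wavelengths, order=2):
    """DT (assumed form): subtract a per-spectrum polynomial baseline fitted against wavelength."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(wavelengths, dtype=float)
    w = (w - w.mean()) / w.std()
    V = np.vander(w, order + 1)
    coeffs, *_ = np.linalg.lstsq(V, X.T, rcond=None)
    return X - (V @ coeffs).T


def build_dt_svm_rbf(wavelengths):
    """Hypothetical helper: DT preprocessing followed by SVR(kernel='rbf', C=200000, gamma=1)."""
    return make_pipeline(
        FunctionTransformer(detrend, kw_args={"wavelengths": wavelengths}),
        SVR(kernel="rbf", C=200000, gamma=1),
    )
```

Wrapping DT in the pipeline means new spectra are detrended exactly as the training spectra were before prediction.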
Example 2
(1) Collect experimental samples.
The experimental samples were collected from rapeseed fields at two sites in a certain province from July 16 to July 19, 2018. Rapeseed is a boron-loving crop with a high boron demand and a strong capacity to take up boron, and it is very sensitive to boron nutrition. A total of 200 soil samples (188 valid) were collected just after the rapeseed had matured and been harvested. Data from this period are less affected by other external factors and reflect the soil's available boron well. Data acquisition consists mainly of field soil sampling and indoor spectral measurement. A diagonal sampling method is used to reduce errors caused by the uneven distribution of nutrient elements in the soil and to keep the samples representative.
The sampling process is as follows. First, remove impurities such as vegetation and stones from the soil surface. Then collect plough-layer soil to a depth of 20 cm with a soil sampler, mix the collected soil thoroughly, discard the excess by the quartering method, and keep about 1.5 kg of clean soil as one experimental sample. Finally, seal the sample in a sterile sealable bag and label the bag with a black marker for identification.
Soil spectra are measured in the laboratory by two operators working together. Measurements are taken between 9:00 and 16:00 each day to ensure sufficient, stable light and reduce interference. First, a soil sample that has already been air-dried, ground and sieved is placed in a laboratory aluminum soil box 4.5 cm in diameter and 2.5 cm deep. The bottom of the box is lined with thick black velvet cloth to avoid stray-light interference. The surface of the sample is then leveled and smoothed with a steel ruler. One operator presses the spectrometer's reflection probe firmly onto the soil and holds it still for five seconds to prevent stray light and light leakage. Meanwhile, the other operator uses the computer to run the Uspectral-RIT software and acquire the spectrum. Three fixed positions are measured on each soil sample, giving 3 spectra per sample, and their average is taken as the sample's raw spectral DN value. To reduce experimental error, the participants, experimental steps and operating procedure are kept the same throughout the experiment.
The spectral acquisition instrument is a portable non-imaging ground-object spectrometer (model: OFS-1700) produced by Ulsea Optical Instruments Inc., with a wavelength range of 200 nm-1700 nm. The experiments were carried out at room temperature. A non-imaging ground-object hyperspectral acquisition system was set up in the laboratory to acquire non-imaging hyperspectral data of the soil samples. It consists mainly of the OFS-1700 spectrometer, a standard ground-object reflection probe, a Lenovo ThinkPad E450 computer, optical fibers and other components.
The spectral range of the OFS-1700 spectrometer is 200 nm-1700 nm (because the head and tail of the spectrum are noisy, these regions are usually cut off, and 350-.

When the OFS-1700 is used to collect near-infrared hyperspectral data of soil, the following procedure applies:

1. A standard ground-object reflection probe is connected to the spectrometer through an optical fiber.
2. Light emitted by the light source in the probe is reflected by the soil and enters the spectrometer.
3. The spectrometer's sensor responds to the reflected light and converts it into a brightness value (DN value).
4. The Uspectral-RIT software saves the collected DN values to a txt file. Note that the spectrometer records brightness values, not soil reflectance.
5. The txt file has two columns: the first is the wavelength of each wavelength point, and the second is the corresponding DN value.

Before each soil sample is measured, the spectrometer is first calibrated against a standard white board. The white-board DN files with the light on (bright signal) and the light off (dark signal) are saved separately. The soil sample is then measured in the same way and its bright and dark DN files are saved. Finally, the soil spectral reflectance is calculated with the formula below.
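Reading the exported DN files, trimming the noisy ends and averaging the three repeat measurements per sample could look like the sketch below. It assumes the Uspectral-RIT export is plain whitespace-delimited numbers with no header, which may not match the real format. The cut-off range in the text above is truncated, so `low` and `high` are required arguments with no defaults. The function names are hypothetical.

```python
import io

import numpy as np


def read_dn_file(source):
    """Load a two-column (wavelength, DN) text file; assumes no header line."""
    data = np.loadtxt(source, ndmin=2)
    return data[:, 0], data[:, 1]


def trim_band(wavelengths, dn, low, high):
    """Keep only the wavelength points inside [low, high], dropping the noisy ends."""
    mask = (wavelengths >= low) & (wavelengths <= high)
    return wavelengths[mask], dn[mask]


def mean_dn(sources, low, high):
    """Average the trimmed DN spectra of repeat measurements (e.g. 3 positions per sample)."""
    wl_ref, spectra = None, []
    for src in sources:
        wl, dn = trim_band(*read_dn_file(src), low, high)
        if wl_ref is not None and not np.array_equal(wl, wl_ref):
            raise ValueError("repeat measurements use different wavelength grids")
        wl_ref = wl
        spectra.append(dn)
    return wl_ref, np.mean(spectra, axis=0)


# Usage with in-memory stand-ins for two exported files (illustrative values only):
files = [io.StringIO("349 5\n350 100\n351 110\n"), io.StringIO("349 7\n350 120\n351 130\n")]
wl, dn = mean_dn(files, low=350, high=351)
```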
The reflectance is calculated as follows:

$$R = \frac{S_{\text{sample}} - S_{\text{dark sample}}}{S_{\text{standard light}} - S_{\text{standard dark}}}$$

where:

- $R$ is the reflectance of the sample being measured.
- $S_{\text{sample}}$ is the DN value with the probe on the sample and the light on.
- $S_{\text{dark sample}}$ is the DN value with the probe on the sample and the light off.
- $S_{\text{standard light}}$ is the DN value with the probe on the standard white board and the light on.
- $S_{\text{standard dark}}$ is the DN value with the probe on the standard white board and the light off.
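Reflectance can be computed per wavelength from the four DN quantities defined above. This sketch assumes the formula is the usual dark-corrected white-reference ratio: sample signal minus its dark signal, divided by white-board signal minus its dark signal. Wavelength points where the white-board signal equals its dark signal are returned as NaN rather than raising an error. The function name is hypothetical.

```python
import numpy as np


def reflectance(s_sample, s_dark_sample, s_standard_light, s_standard_dark):
    """R = (S_sample - S_dark sample) / (S_standard light - S_standard dark), per wavelength."""
    num = np.asarray(s_sample, dtype=float) - np.asarray(s_dark_sample, dtype=float)
    den = np.asarray(s_standard_light, dtype=float) - np.asarray(s_standard_dark, dtype=float)
    out = np.full(np.broadcast(num, den).shape, np.nan)
    # Divide only where the white-reference difference is non-zero; leave NaN elsewhere.
    return np.divide(num, den, out=out, where=den != 0)


r = reflectance([600, 350], [100, 50], [1100, 650], [100, 50])
print(r)  # [0.5 0.5]
```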
(2) Measure the relevant physicochemical parameters of the collected soil samples.
The collected soil samples are sent to the School of Resources and Environment, Anhui Agricultural University, where professionals measure the relevant soil physicochemical parameters.
(3) Apply the VIS-NIR spectral model to the soil sample spectra to predict soil available boron content. Compare the predictions with the available boron content obtained from the soil parameter measurements to verify the reliability of the method.
The embodiments in this description are described progressively. Each embodiment focuses on its differences from the others, and the embodiments may be cross-referenced for their identical or similar parts. The device disclosed in an embodiment corresponds to the method disclosed in that embodiment, so its description is brief; refer to the description of the method for the relevant details.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (3)

1. A method for predicting the content of available boron in hyperspectral soil, characterized by comprising the following steps:
Step 1: collecting visible near-infrared hyperspectral data of soil;
Step 2: preprocessing and transforming the visible near-infrared hyperspectral data of the soil to obtain model data;
Step 3: building a VIS-NIR spectral model from the model data with a regression algorithm, and training it to obtain a soil available boron content prediction model;
Step 4: inputting the hyperspectral image of the soil to be predicted into the soil available boron content prediction model to predict the soil available boron content.
2. The method for predicting the content of available boron in hyperspectral soil according to claim 1, wherein the preprocessing in step 2 is as follows: denoising the soil visible near-infrared hyperspectral data, deleting the noisy, invalid bands, and then applying a detrending correction method to the denoised hyperspectral data.
3. The method for predicting the content of available boron in hyperspectral soil according to claim 1, wherein the regression algorithm in step 3 uses a support vector machine with a radial basis function (Gaussian) kernel to map the model data into a high-dimensional feature space so as to construct the VIS-NIR spectral model.
CN202210141354.2A 2022-02-16 2022-02-16 Method for predicting content of available boron in hyperspectral soil Pending CN114509404A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210141354.2A CN114509404A (en) 2022-02-16 2022-02-16 Method for predicting content of available boron in hyperspectral soil

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210141354.2A CN114509404A (en) 2022-02-16 2022-02-16 Method for predicting content of available boron in hyperspectral soil

Publications (1)

Publication Number Publication Date
CN114509404A (en) 2022-05-17

Family

ID=81552030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210141354.2A Pending CN114509404A (en) 2022-02-16 2022-02-16 Method for predicting content of available boron in hyperspectral soil

Country Status (1)

Country Link
CN (1) CN114509404A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102636450A (en) * 2012-04-18 2012-08-15 Northwest A&F University Method for nondestructively detecting wolfberry polysaccharide content in Chinese wolfberry based on near infrared spectrum technology
CN206114526U (en) * 2016-09-18 2017-04-19 Anhui Agricultural University Soil topsoil nutrient dynamic testing system based on spectral reflectivity
US20180172659A1 (en) * 2016-12-16 2018-06-21 Farmers Edge Inc. Classification of Soil Texture and Content by Near-Infrared Spectroscopy
WO2019028540A1 (en) * 2017-08-10 2019-02-14 Speclab Holding S.A. Method of soil fertility analysis by chemical and physical parameters using vis-nir spectroscopy in large-scale routine
CN109682762A (en) * 2017-10-18 2019-04-26 Zhu Guihua Soil organic matter content evaluation method based on hyperspectral data
CN110082310A (en) * 2019-05-30 2019-08-02 Hainan University Near-infrared-band hyperspectral diagnostic method for rubber tree LTN content
CN110376139A (en) * 2019-08-05 2019-10-25 Beijing Lütu Technology Co., Ltd. Soil organic matter content quantitative inversion method based on ground-based hyperspectral data
CN110907393A (en) * 2019-11-22 2020-03-24 Heilongjiang Bayi Agricultural University Method and device for detecting saline-alkali stress degree of plants
CN113971989A (en) * 2021-09-10 2022-01-25 Guangxi Zhuang Autonomous Region Forestry Research Institute Forest soil organic carbon content hyperspectral modeling method based on OPLS



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220517
