CN110991064B - Soil heavy metal content inversion model generation method, system and inversion method - Google Patents

Soil heavy metal content inversion model generation method, system and inversion method

Info

Publication number
CN110991064B
Authority
CN
China
Prior art keywords
heavy metal
soil
metal content
sample
spectral
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911265449.XA
Other languages
Chinese (zh)
Other versions
CN110991064A (en
Inventor
郭云开
钱佳
张晓炯
章琼
张思爱
谢晓峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou City Construction College
Original Assignee
Guangzhou City Construction College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou City Construction College
Priority to CN201911265449.XA
Publication of CN110991064A
Application granted
Publication of CN110991064B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention discloses a method and a system for generating a soil heavy metal content inversion model, a storage medium, and an inversion method. The generation method comprises the following steps: acquiring the heavy metal content and indoor spectral data of soil samples; preprocessing the indoor spectral data and extracting the heavy metal spectral characteristic bands with the competitive adaptive re-weighting method; constructing a measured sample set; generating a soil virtual sample set from the measured sample set with a least-squares-based virtual sample generation method; and, based on the measured and virtual sample sets, training a BP neural network heavy metal content inversion regression model that takes the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output. Because the virtual sample set is generated with the least-squares-based virtual sample generation method while the training sample set is constructed, fewer measured samples are needed, which reduces the sample acquisition cost and time of soil heavy metal content inversion; and the sufficient supply of training samples supports the accuracy, robustness and generalization capability of the heavy metal content inversion model.

Description

Soil heavy metal content inversion model generation method, system and inversion method
Technical Field
The invention relates to the technical field of environmental monitoring, in particular to a soil heavy metal content inversion model generation method, a soil heavy metal content inversion model generation system, a storage medium and an inversion method.
Background
Heavy metal contamination of soil has long been a serious environmental problem, especially for countries undergoing rapid industrialization and urbanization. According to surveys, more than 10% of the cultivated land in China is polluted by heavy metals. Because heavy metals are difficult for soil microorganisms to degrade, they accumulate and seriously affect crop growth, which in turn threatens food safety and human health. How to detect soil heavy metal pollution rapidly and accurately, and how to prevent it, has therefore become a key problem that urgently needs to be solved.
Traditional heavy metal monitoring methods are time-consuming and expensive, especially when large-area soil sampling and chemical analysis are required. They cannot meet China's pressing needs to survey the spatial pattern of heavy metal pollution over wide geographic areas quickly and efficiently, to identify the spatio-temporal evolution and formation mechanism of heavy metal pollution, to control the spread and transfer of pollution, to plan agricultural production rationally, and to reduce the harm of pollution to public health.
At present, soil heavy metal content inversion based on spectral analysis is developing rapidly and has become a main means of investigating soil heavy metal pollution. The main principle is as follows: the indoor spectral data of soil samples are observed with a ground-object spectrometer, the response relation between the soil heavy metal content and the full-band soil spectrum is explored, and a spectral inversion model of soil heavy metal content is constructed. The method has the advantage of being less expensive than the traditional statistical analysis method. However, such inversion models place specific requirements on data characteristics and sample size: their performance is easily influenced by the sample size and by the number of input and output variables, and sufficient sample capacity and uniform sample distribution are two key factors determining their accuracy and robustness. In addition, sufficient training samples are an important guarantee of the generalization capability of data-driven models. Because the training samples are themselves obtained with the traditional heavy metal monitoring method, and as many training samples as possible are needed to ensure the accuracy, robustness and generalization capability of the inversion model, soil heavy metal content inversion undoubtedly remains costly in both time and money.
Disclosure of Invention
The invention provides a method and a system for generating a soil heavy metal content inversion model, a storage medium and an inversion method, and aims to solve the problems of high time consumption and high cost in soil heavy metal content inversion in the prior art.
The invention provides a method for generating a soil heavy metal content inversion model, which comprises the following steps:
step 1: acquiring the heavy metal content and indoor spectral data of a soil sample;
step 2: preprocessing indoor spectral data of a soil sample and extracting a heavy metal spectral characteristic waveband by adopting a competitive self-adaptive re-weighting method;
step 3: constructing an actual measurement sample set based on the heavy metal content of the soil sample and the corresponding heavy metal spectral characteristic band;
step 4: based on the heavy metal content of the soil samples in the actually measured sample set and the corresponding heavy metal spectral characteristic bands, generating a soil virtual sample set by applying the least-squares-based virtual sample generation method, wherein the ratio of the number of soil virtual samples to the number of actually measured samples is 1:1 to 10:1, and the actually measured sample set and the soil virtual sample set form a mixed sample set;
step 5: based on the mixed sample set, training to obtain a BP neural network heavy metal content inversion regression model by taking the heavy metal spectral characteristic band as input and the corresponding heavy metal content as output.
The method comprises the steps of firstly obtaining corresponding heavy metal content and indoor spectral data through sampling, extracting heavy metal spectral characteristic wave bands from the indoor spectral data of a soil sample by adopting a competitive self-adaptive re-weighting method, then generating a soil virtual sample set by applying a least square virtual sample generation method based on an actually measured sample set constructed by the heavy metal content of the soil sample and the corresponding heavy metal spectral characteristic wave bands, and finally training to obtain a BP neural network heavy metal content inversion regression model based on a mixed sample set formed by the actually measured sample set and the soil virtual sample set, wherein the heavy metal spectral characteristic wave bands are used as input, and the corresponding heavy metal content is used as output. In the process of constructing the training sample set, the virtual sample set is generated by applying a virtual sample generation method based on least square, so that the number of actually measured samples can be greatly reduced, and the cost and time for soil heavy metal content inversion are greatly reduced; the generated virtual sample set provides sufficient training samples for model training, so that the accuracy, robustness and generalization capability of the trained heavy metal content inversion model are ensured.
Further, the step 1 of obtaining the heavy metal content and the indoor spectrum data of the soil sample comprises the following processes:
according to the soil type, texture and cover crop type in the research area, at least 30 soil samples are selected uniformly and at random within the research area; the heavy metal content of the collected soil samples is determined by chemical analysis in a laboratory using triacid digestion-atomic absorption spectrophotometry, and the soil spectra are measured with an AvaField-3 full-band ground-object spectroradiometer to obtain the indoor spectral data.
Further, the preprocessing the indoor spectral data of the soil sample in the step 2 includes:
performing spectrum resampling on indoor spectrum data of the soil sample:
with the preset spectrum length as an interval unit, performing spectrum resampling on indoor spectrum data of the soil sample by adopting the following formula:
$$r = \frac{r(\lambda_1) + \cdots + r(\lambda_i) + \cdots + r(\lambda_f)}{f}$$
where r represents the reflectance value of the hyperspectral curve after resampling, f is the number of spectral bands contained in one interval unit, and $r(\lambda_i)$ represents the reflectance value of the ith spectral band $\lambda_i$;
carrying out fractional differential pretreatment on indoor spectral data of the soil sample obtained after the spectrum resampling:
calculating indoor spectral data of each soil sample obtained after spectral resampling by adopting the following formula based on fractional order differentiation:
Figure GDA0003110195450000021
where $r(\lambda_i)$ is the reflectance value of the ith spectral band $\lambda_i$, $r^{v}(\lambda_i)$ is the v-order fractional differential value of the spectral band $\lambda_i$, $\Gamma$ is the Gamma function, s represents the number of spectral bands in the indoor spectral data of the measured soil sample,
Figure GDA0003110195450000031
h is the differential step length, 0.2 in the invention; t is the upper differential limit, 2 in the invention; and a is the lower differential limit, 1 in the invention.
The fractional order differential spectrum can effectively improve the correlation between the spectrum and the heavy metal content, and further highlight the heavy metal characteristic wave band.
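The spectral resampling formula above amounts to averaging every f consecutive bands. A minimal Python sketch follows; the function name `resample_spectrum`, the sample reflectance values, and the choice to drop an incomplete trailing window are illustrative assumptions, not details taken from the patent.

```python
def resample_spectrum(reflectance, f):
    """Average every f consecutive bands into one resampled reflectance value."""
    if f <= 0:
        raise ValueError("f must be a positive integer")
    resampled = []
    # Walk the spectrum in non-overlapping windows of f bands.
    # Assumption: an incomplete trailing window is discarded.
    for start in range(0, len(reflectance) - f + 1, f):
        window = reflectance[start:start + f]
        resampled.append(sum(window) / f)
    return resampled


# Hypothetical reflectance values for nine adjacent bands.
raw = [0.10, 0.12, 0.14, 0.16, 0.20, 0.22, 0.24, 0.26, 0.30]
smoothed = resample_spectrum(raw, 4)
```

With f = 4, the nine bands collapse into two averaged values and the ninth band is discarded under the stated assumption.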
Further, the step 2 of extracting the heavy metal spectral characteristic wave band by using the competitive adaptive re-weighting method comprises the following steps:
using M to denote the number of sampling runs of the Monte Carlo simulation method, m the total number of soil samples after indoor spectrum preprocessing, s the number of spectral bands in the indoor spectral data of the measured soil samples, and i the ith sampling run, the competitive adaptive re-weighting method comprises the following steps:
1) initializing i = 1;
2) judging whether i ≤ M; if so, entering step 3); if not, entering step 7);
3) randomly extracting d samples from the sample set after indoor spectrum preprocessing by the Monte Carlo simulation method, denoted $V_1$; then, based on the $V_1$ data, constructing a partial least squares regression model for heavy metal content inversion with the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output, where $0.8m \le d < m$;
4) taking the absolute value of the regression coefficient vector B of the partial least squares regression model, denoted $B_i$, and computing the weight of each regression coefficient $w_j = B_{ij}/\mathrm{sum}(B_i)$, where $B_{ij}$ represents the contribution of the jth spectral band to the heavy metal content in the ith sampling run (the larger the value, the more important the jth spectral band) and $B_i$ is the absolute value of the regression coefficient vector in the ith run; removing the spectral bands with relatively small weights by means of an exponentially decreasing function, the proportion of retained spectral bands being $r_i = a e^{-ki}$, where e is the base of the natural logarithm and a and k are given by:
$$a = (s/2)^{1/(M-1)}$$
$$k = \frac{\ln(s/2)}{M-1}$$
5) using the adaptive re-weighting algorithm, selecting a subset of spectral bands from the $s \times r_i$ retained spectral bands, denoted $V_2$;
6) after calculating the RMSECV value based on the spectral band subset $V_2$, setting $V_1 = V_2$ and $i = i + 1$ and returning to step 2), where RMSECV denotes the cross-validation root mean square error;
7) after the M sampling runs are finished, obtaining M spectral band subsets and M corresponding RMSECV values from the competitive adaptive re-weighting algorithm;
8) selecting the spectral band subset corresponding to the minimum RMSECV value as the optimal spectral band subset, namely the heavy metal spectral characteristic bands extracted by the competitive adaptive re-weighting method.
The competitive adaptive re-weighting method is an optimized characteristic band screening method that determines the optimal spectral band subset by progressively deleting unimportant spectral bands. It first identifies the spectral bands with large absolute regression coefficients in the partial least squares regression model, and then, through ten-fold cross validation, selects the spectral band subset with the minimum cross-validation root mean square error (RMSECV) as the optimal spectral band subset. The method can screen out the spectral bands sensitive to soil heavy metals, reduce wavelength redundancy, and improve calculation efficiency and inversion accuracy.
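Two pieces of the band-screening procedure are easy to check numerically: the normalised weights $w_j$ and the exponentially decreasing retention ratio $r_i = a e^{-ki}$. The sketch below assumes the usual form of the decay constant, $k = \ln(s/2)/(M-1)$, which pairs with $a = (s/2)^{1/(M-1)}$ so that all s bands are kept in the first run and two bands remain in the last. The function names and the example coefficients are hypothetical.

```python
import math


def retention_ratios(s, M):
    """Ratio of bands retained in sampling runs i = 1..M: r_i = a * exp(-k * i)."""
    a = (s / 2) ** (1 / (M - 1))
    k = math.log(s / 2) / (M - 1)  # assumed standard form of the decay constant
    return [a * math.exp(-k * i) for i in range(1, M + 1)]


def band_weights(coefficients):
    """Normalise absolute regression coefficients into weights w_j."""
    magnitudes = [abs(b) for b in coefficients]
    total = sum(magnitudes)
    return [value / total for value in magnitudes]


ratios = retention_ratios(s=200, M=50)
bands_kept = [round(200 * r) for r in ratios]

# Hypothetical regression coefficients for four bands.
weights = band_weights([0.5, -1.5, 0.0, 2.0])
```

Under these assumptions `bands_kept` starts at 200 and ends at 2, and the band with the zero coefficient gets zero weight, so it is the first candidate for removal.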
Further, in step 4, the virtual sample set is generated by applying a virtual sample generation method based on the content of the heavy metal in the soil sample in the actually measured sample set and the heavy metal spectral characteristic band corresponding to the content, specifically including the following steps:
4.1, taking the measured sample set as the initial training set $D^{(0)}$, selecting the neighborhood control factor δ with a value range of [0.0001, 0.01], setting the maximum number of searches to V and the number of soil virtual samples to be generated to R, and initializing the loop variable r = 1;
4.2, based on the initial training set $D^{(0)}$, training to obtain a partial least squares regression model for heavy metal content inversion by taking the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output, and obtaining, by leave-one-out validation, the absolute training error $E^{(0)}$ of the initial training set $D^{(0)}$; letting $E = E^{(0)}$;
4.3, judging whether r ≤ R; if so, entering step 4.4; otherwise skipping to step 4.14;
4.4, sorting the samples in the initial training set $D^{(0)}$ from high to low by the training error of each sample, recorded as
Figure GDA0003110195450000041
where N is the total number of samples in the initial training set $D^{(0)}$, x represents the heavy metal characteristic spectral bands, y represents the corresponding heavy metal content, and the training error is the absolute difference between the predicted heavy metal content
Figure GDA0003110195450000042
and the measured value $y_n$;
4.5, initializing the loop variable q = 1;
4.6, judging whether q ≤ N; if so, entering step 4.7; if not, letting r = r + 1 and skipping to step 4.3;
4.7, initializing the search count v = 1;
4.8, judging whether v ≤ V; if so, entering step 4.9; if not, skipping to step 4.13;
4.9, randomly generating, in the δ neighborhood of the sample
Figure GDA0003110195450000043
a soil virtual sample $(x^{(r)}, y^{(r)})$;
4.10, based on the training set $D^{(r-1)} \cup \{(x^{(r)}, y^{(r)})\}$, training to obtain a partial least squares regression model for heavy metal content inversion by taking the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output, and then calculating the absolute training error $E^{(r)}$ on the initial training set $D^{(0)}$;
4.11, if $E^{(r)} < E$, letting $D^{(r)} = D^{(r-1)} \cup \{(x^{(r)}, y^{(r)})\}$, $E = E^{(r)}$ and $r = r + 1$, and skipping to step 4.3; otherwise skipping to step 4.12;
4.12, letting v = v + 1 and skipping to step 4.8;
4.13, letting q = q + 1 and skipping to step 4.6;
4.14, outputting a soil virtual sample set containing R soil virtual samples.
Partial least squares has few parameters and is simple to apply, which makes it suitable for generating virtual samples.
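The accept-or-retry loop of steps 4.1 to 4.14 can be sketched in Python as follows. This is an illustration under loud assumptions, not the patented implementation: an ordinary least-squares line on a single band stands in for the partial least squares model, the plain training error stands in for leave-one-out validation, the δ neighborhood is taken to be symmetric with half-width δ times the data range, and all names and data values are hypothetical.

```python
import random


def fit_line(samples):
    """Ordinary least-squares line y = intercept + slope * x (stand-in for PLS)."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def mean_abs_error(model, samples):
    """Mean absolute difference between predicted and measured content."""
    intercept, slope = model
    return sum(abs(intercept + slope * x - y) for x, y in samples) / len(samples)


def generate_virtual_samples(measured, R, V=20, delta=0.01, seed=0):
    rng = random.Random(seed)
    x_extent = max(x for x, _ in measured) - min(x for x, _ in measured)
    y_extent = max(y for _, y in measured) - min(y for _, y in measured)
    train = list(measured)                                   # D^(r-1)
    best_error = mean_abs_error(fit_line(train), measured)   # E
    virtual = []
    for attempt in range(R):                                 # r = 1..R
        intercept, slope = fit_line(train)
        # Step 4.4: visit the worst-fitted measured samples first.
        ordered = sorted(measured,
                         key=lambda s: abs(intercept + slope * s[0] - s[1]),
                         reverse=True)
        accepted = None
        for x0, y0 in ordered:                               # q = 1..N
            for search in range(V):                          # v = 1..V
                trial = (x0 + rng.uniform(-delta, delta) * x_extent,
                         y0 + rng.uniform(-delta, delta) * y_extent)
                trial_error = mean_abs_error(fit_line(train + [trial]), measured)
                if trial_error < best_error:                 # step 4.11
                    accepted, best_error = trial, trial_error
                    break
            if accepted is not None:
                break
        if accepted is not None:
            train.append(accepted)
            virtual.append(accepted)
    return virtual, best_error


# Hypothetical (band value, heavy metal content) pairs.
measured = [(0.10, 12.0), (0.20, 15.5), (0.30, 14.0),
            (0.40, 21.0), (0.50, 19.5), (0.60, 25.0)]
initial_error = mean_abs_error(fit_line(measured), measured)
virtual, final_error = generate_virtual_samples(measured, R=5)
```

As in step 4.6, a round in which no candidate lowers the error still advances r, so the sketch may return fewer than R virtual samples. The error measured on the original samples never increases, because a candidate is kept only when it lowers that error.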
Further, randomly generating in step 4.9 a soil virtual sample $(x^{(r)}, y^{(r)})$ in the δ neighborhood of the sample
Figure GDA0003110195450000051
comprises the following:
determining the δ neighborhood range of the sample
Figure GDA0003110195450000052
as follows: the δ neighborhood of the sample
Figure GDA0003110195450000053
is an (n+1)-dimensional "hyper-cuboid", where n+1 corresponds to n heavy metal spectral characteristic bands plus 1 heavy metal content output; for the mth heavy metal spectral characteristic band $\lambda_m$, the value range of the soil virtual sample is
Figure GDA0003110195450000054
where
Figure GDA0003110195450000055
represents the range of the mth heavy metal spectral characteristic band $\lambda_m$ in the original training set $D^{(0)}$;
for the heavy metal content y, the value range of the soil virtual sample is
Figure GDA0003110195450000056
where
Figure GDA0003110195450000057
$L^{(0)}$ represents the range of the heavy metal content y in the original training set $D^{(0)}$;
and randomly generating a soil virtual sample $(x^{(r)}, y^{(r)})$ within the δ neighborhood range of the sample
Figure GDA0003110195450000058
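Drawing one virtual sample from the (n+1)-dimensional hyper-cuboid can be sketched as below. The interval formulas are not legible in this text, so the sketch assumes each coordinate is drawn uniformly from a symmetric interval whose half-width is δ times that coordinate's range (maximum minus minimum) in the original training set. That assumption, the function names and the data values are all illustrative.

```python
import random


def delta_neighborhood(center, training_set, delta):
    """Return one (low, high) interval per coordinate around `center`."""
    bounds = []
    for dim, value in enumerate(center):
        column = [row[dim] for row in training_set]
        extent = max(column) - min(column)  # assumed meaning of the range L
        bounds.append((value - delta * extent, value + delta * extent))
    return bounds


def draw_virtual_sample(center, training_set, delta, rng):
    """Draw one point uniformly from the delta neighborhood of `center`."""
    return tuple(rng.uniform(low, high)
                 for low, high in delta_neighborhood(center, training_set, delta))


# Hypothetical rows: two characteristic bands followed by heavy metal content.
training_set = [(0.21, 0.35, 18.0), (0.25, 0.31, 22.5), (0.30, 0.40, 30.0)]
rng = random.Random(42)
bounds = delta_neighborhood(training_set[0], training_set, delta=0.01)
virtual = draw_virtual_sample(training_set[0], training_set, 0.01, rng)
```

With δ at the upper end of the stated range (0.01), a virtual sample stays within 1% of each coordinate's range of the measured sample it was generated from.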
Further, the step 5 of training to obtain the BP neural network heavy metal content inversion regression model based on the mixed sample set, with the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output, specifically includes the following steps:
5.1, determining the input and output of the BP neural network heavy metal content inversion regression model, the input being the heavy metal spectral characteristic bands of the mixed sample set and the output being the corresponding heavy metal content;
5.2, establishing a three-layer feedforward neural network topology with g inputs and 1 output, wherein the activation function of the hidden layer and the output layer is the Sigmoid-type function
Figure GDA0003110195450000059
the layers are fully interconnected, nodes within the same layer are not connected, and g is the number of heavy metal spectral characteristic bands;
and 5.3, learning and training the BP neural network heavy metal content inversion regression model by using a BP algorithm until preset training precision is met, and then establishing the BP neural network heavy metal content inversion regression model.
The BP neural network is a nonlinear model and is suitable for the hyperspectral heavy metal content inversion of a small sample.
Further, the learning training process consists of a forward propagation part and a backward propagation part;
wherein the forward propagation comprises:
for a given input training pattern $X = (x_1, x_2, \ldots, x_g)$, where $x_g$ denotes the g-th heavy metal spectral characteristic band, the pattern is first transmitted from the input layer units to the hidden layer units, processed by the hidden layer units and transmitted to the output layer, and finally processed by the output layer units to generate an output pattern $Y = (y_1, y_2, \ldots, y_p)$, where $y_p$ represents the p-th output heavy metal content value;
let the numbers of nodes in the input layer, hidden layer and output layer be g, h and p respectively, the connection weight from the input layer to the hidden layer be $w_{ij}$, and the connection weight from the hidden layer to the output layer be $v_{jl}$; then the outputs of the nodes of the hidden layer and the output layer are respectively:
Figure GDA0003110195450000061
Figure GDA0003110195450000062
where i = 1, 2, ..., g; j = 1, 2, ..., h; l = 1, 2, ..., p; $\theta_j$ is the threshold of the hidden layer node,
Figure GDA0003110195450000066
is the threshold of the output layer node; and f is the Sigmoid-type function,
Figure GDA0003110195450000063
if the output obtained on the output layer does not reach the preset training precision, turning to reverse propagation;
the back propagation includes:
propagating the error value back along the original connection path and updating the connection weights $w_{ij}$ and $v_{jl}$ and the thresholds of the neurons in each layer so as to reduce the error value:
$$v_{jl}(t+1) = v_{jl}(t) + \alpha d_l z_j$$
Figure GDA0003110195450000064
Figure GDA0003110195450000065
$$w_{ij}(t+1) = w_{ij}(t) + \beta e_j x_i$$
$$\theta_j(t+1) = \theta_j(t) + \beta e_j$$
Figure GDA0003110195450000071
where α and β are learning rates in the range 0.001 to 0.1,
Figure GDA0003110195450000072
is the desired output of output node l, and $y_l$ is the actual output; in this scheme the number of output values p is 1, so l is also 1;
selecting samples from the mixed sample set as training samples and repeating the forward propagation and backward propagation processes to train the network until the preset training precision is reached, whereupon the BP neural network heavy metal content inversion regression model is established.
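The forward and backward passes described above can be sketched in pure Python for the single-output case (p = 1). Several details are assumptions because the corresponding formulas are not legible in this text: thresholds enter as additive terms, the output error term is $d = (\hat{y} - y)\,y\,(1 - y)$, the hidden error term is $e_j = d\,v_j\,z_j(1 - z_j)$, targets are scaled into (0, 1) to suit the sigmoid output, and training runs for a fixed number of epochs instead of stopping at a preset precision. All names and data are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(g, h, seed=0):
    """Random initial weights and thresholds for a g-h-1 network."""
    rng = random.Random(seed)
    return {
        "w": [[rng.uniform(-0.5, 0.5) for _ in range(h)] for _ in range(g)],
        "theta": [rng.uniform(-0.5, 0.5) for _ in range(h)],
        "v": [rng.uniform(-0.5, 0.5) for _ in range(h)],
        "gamma": rng.uniform(-0.5, 0.5),
    }


def forward(net, x):
    """Forward propagation: hidden outputs z_j, then the single output y."""
    z = [sigmoid(sum(net["w"][i][j] * x[i] for i in range(len(x))) + net["theta"][j])
         for j in range(len(net["theta"]))]
    y = sigmoid(sum(vj * zj for vj, zj in zip(net["v"], z)) + net["gamma"])
    return z, y


def train(net, samples, alpha=0.1, beta=0.1, epochs=5000):
    """Online back propagation with learning rates alpha and beta."""
    for _ in range(epochs):
        for x, target in samples:
            z, y = forward(net, x)
            d = (target - y) * y * (1.0 - y)              # output error term
            # Hidden error terms use the weights from before this update.
            e = [d * net["v"][j] * z[j] * (1.0 - z[j]) for j in range(len(z))]
            for j in range(len(z)):
                net["v"][j] += alpha * d * z[j]
                net["theta"][j] += beta * e[j]
                for i in range(len(x)):
                    net["w"][i][j] += beta * e[j] * x[i]
            net["gamma"] += alpha * d
    return net


def mse(net, samples):
    return sum((forward(net, x)[1] - t) ** 2 for x, t in samples) / len(samples)


# Hypothetical scaled samples: two characteristic bands -> scaled content.
samples = [([0.1, 0.2], 0.2), ([0.4, 0.3], 0.4), ([0.7, 0.9], 0.8), ([0.9, 0.6], 0.7)]
net = init_network(g=2, h=4)
error_before = mse(net, samples)
train(net, samples)
error_after = mse(net, samples)
```

In practice the predicted values would be rescaled back to concentration units after inference. The learning rates here sit at the upper end of the stated 0.001 to 0.1 range.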
In a second aspect of the present invention, a computer-readable storage medium is provided, wherein the storage medium comprises stored program instructions, and the program instructions are adapted to be loaded by a processor and execute the soil heavy metal content inversion model generation method as described above.
In a third aspect of the present invention, a soil heavy metal content inversion model generation system is provided, including:
a data acquisition module: the method is used for acquiring the heavy metal content and the indoor spectral data of the soil sample;
a feature extraction module: the method is used for preprocessing indoor spectral data of the soil sample and extracting the spectral characteristic wave band of the heavy metal by adopting a competitive self-adaptive re-weighting method;
a virtual sample generation module: used for constructing an actual measurement sample set based on the heavy metal content of the soil samples and the corresponding heavy metal spectral characteristic bands, and for generating a soil virtual sample set by the least-squares-based virtual sample generation method, wherein the ratio of the number of soil virtual samples to the number of actual measurement samples is 1:1 to 10:1;
an inversion model generation module: the method is used for training to obtain a BP neural network heavy metal content inversion regression model based on a mixed sample set formed by an actually measured sample set and a soil virtual sample set, wherein a heavy metal spectral characteristic wave band is used as input, and the corresponding heavy metal content is used as output.
The invention provides a soil heavy metal content inversion method, which comprises the following steps:
acquiring indoor spectral data of a soil sample to be detected;
preprocessing indoor spectral data of a soil sample to be detected and extracting a heavy metal spectral characteristic waveband by adopting a competitive self-adaptive re-weighting method;
and taking the extracted heavy metal spectral characteristic band as an input of a BP neural network heavy metal content inversion regression model, and outputting the corresponding heavy metal content, wherein the BP neural network heavy metal content inversion regression model is generated by adopting the soil heavy metal content inversion model generation method.
According to the soil heavy metal content inversion method, the heavy metal content to be measured can be rapidly inverted through the indoor spectral data of the soil sample to be measured, and compared with the detection through actual measurement, the detection cost and the time cost are greatly reduced.
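The inversion flow for a new sample (preprocess, pick the characteristic bands, feed the model) can be sketched as a small pipeline. The sketch assumes that the band indices selected during model generation are reused at inversion time. `demo_preprocess` and `demo_model` are hypothetical stand-ins for the real preprocessing chain and the trained BP network, and the spectrum values are made up.

```python
def invert_heavy_metal(spectrum, preprocess, band_indices, model):
    """Estimate heavy metal content from one indoor spectrum."""
    processed = preprocess(spectrum)                   # resampling, fractional differentiation
    features = [processed[i] for i in band_indices]    # characteristic bands only
    return model(features)


def demo_preprocess(spectrum):
    # Stand-in: returns the spectrum unchanged.
    return list(spectrum)


def demo_model(features):
    # Stand-in for the trained regression model.
    return 10.0 + 50.0 * sum(features) / len(features)


estimate = invert_heavy_metal([0.21, 0.25, 0.30, 0.28, 0.35],
                              demo_preprocess, [1, 3], demo_model)
```

Passing the preprocessing step and the model in as callables keeps the pipeline independent of how either was produced.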
Advantageous effects
The invention provides a method, a system, a storage medium and an inversion method for generating a soil heavy metal content inversion model. In the process of constructing the training sample set, the virtual sample set is generated by applying a virtual sample generation method based on least square, so that the number of actually measured samples can be greatly reduced, and the cost and time for soil heavy metal content inversion are greatly reduced; the generated virtual sample set provides sufficient training samples for model training, so that the accuracy, robustness and generalization capability of the trained heavy metal content inversion model are ensured.
Drawings
FIG. 1 is a flow chart of a soil heavy metal content inversion model generation method provided by an embodiment of the invention;
FIG. 2 is a graph of a sample point distribution of a study area in an experiment provided by an embodiment of the present invention;
FIG. 3 is a graph of the original spectral reflectance of a soil sample from an experiment provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a competitive adaptive re-weighting method screening process in an experiment provided by an embodiment of the present invention;
FIG. 5 is a training error curve of heavy metal Cr and Pb models in the experiment provided by the embodiment of the invention;
FIG. 6 shows the predicted values and measured values of Cr and Pb under the optimal model in the experiment provided by the embodiment of the invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
Example 1
As shown in fig. 1, an embodiment of the present invention provides a method for generating a soil heavy metal content inversion model, including the following steps:
step 1: acquiring the heavy metal content and indoor spectral data of a soil sample;
step 2: preprocessing indoor spectral data of a soil sample and extracting a heavy metal spectral characteristic waveband by adopting a competitive self-adaptive re-weighting method;
step 3: constructing an actual measurement sample set based on the heavy metal content of the soil sample and the corresponding heavy metal spectral characteristic band;
step 4: based on the heavy metal content of the soil samples in the actually measured sample set and the corresponding heavy metal spectral characteristic bands, generating a soil virtual sample set by applying the least-squares-based virtual sample generation method, wherein the ratio of the number of soil virtual samples to the number of actually measured samples is 1:1 to 10:1, and the actually measured sample set and the soil virtual sample set form a mixed sample set;
step 5: based on the mixed sample set, training to obtain a BP neural network heavy metal content inversion regression model by taking the heavy metal spectral characteristic band as input and the corresponding heavy metal content as output.
The method comprises the steps of firstly obtaining corresponding heavy metal content and indoor spectral data through sampling, extracting heavy metal spectral characteristic wave bands from the indoor spectral data of a soil sample by adopting a competitive self-adaptive re-weighting method, then generating a soil virtual sample set by applying a least square virtual sample generation method based on an actually measured sample set constructed by the heavy metal content of the soil sample and the corresponding heavy metal spectral characteristic wave bands, and finally training to obtain a BP neural network heavy metal content inversion regression model based on a mixed sample set formed by the actually measured sample set and the soil virtual sample set, wherein the heavy metal spectral characteristic wave bands are used as input, and the corresponding heavy metal content is used as output. In the process of constructing the training sample set, the virtual sample set is generated by applying a virtual sample generation method based on least square, so that the number of actually measured samples can be greatly reduced, and the cost and time for soil heavy metal content inversion are greatly reduced; the generated virtual sample set provides sufficient training samples for model training, so that the accuracy, robustness and generalization capability of the trained heavy metal content inversion model are ensured.
The method for acquiring the heavy metal content and the indoor spectral data of the soil sample in the step 1 comprises the following steps:
according to the soil type, texture and cover crop type of the research area, at least 30 quadrats of 15 m × 15 m are selected uniformly at random within the research area, and one soil sample is randomly collected in each quadrat; the heavy metal content of the collected soil samples is determined in the laboratory by chemical analysis using tri-acid digestion and atomic absorption spectrophotometry, and the soil spectra are measured with an AvaField-3 full-band field spectroradiometer to obtain the indoor spectral data.
In this embodiment, the preprocessing the indoor spectral data of the soil sample in step 2 includes:
performing spectrum resampling on indoor spectrum data of the soil sample:
with the preset spectrum length as an interval unit, performing spectrum resampling on indoor spectrum data of the soil sample by adopting the following formula:
$r = \dfrac{r(\lambda_1) + \dots + r(\lambda_i) + \dots + r(\lambda_f)}{f}$
where $r$ is the reflectance value of the hyperspectral curve after resampling, $f$ is the number of spectral bands contained in the interval unit, and $r(\lambda_i)$ is the reflectance value of the i-th spectral band $\lambda_i$;
carrying out fractional differential pretreatment on indoor spectral data of the soil sample obtained after the spectrum resampling:
calculating indoor spectral data of each soil sample obtained after spectral resampling by adopting the following formula based on fractional order differentiation:
Figure GDA0003110195450000091
where $r(\lambda_i)$ is the reflectance value of the i-th spectral band $\lambda_i$, $r^{v}(\lambda_i)$ is the order-$v$ fractional differential value at spectral band $\lambda_i$, $\Gamma$ is the Gamma function, $s$ is the number of spectral bands in the indoor spectral data of the measured soil sample,
Figure GDA0003110195450000092
h is the differential step, which is 0.2 in this embodiment, t is the upper differential limit, 2 in this embodiment, and a is the lower differential limit, 1 in this embodiment.
The fractional order differential spectrum can effectively improve the correlation between the spectrum and the heavy metal content, and further highlight the heavy metal characteristic wave band.
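The two preprocessing steps can be sketched as follows. This is a minimal illustration, not the patented implementation: `resample_mean` follows the averaging formula given above, while `fractional_diff` uses the standard Grünwald-Letnikov coefficients with unit band spacing, which is an assumption because the patent's own differentiation formula survives only as an image. All function names are hypothetical.

```python
def resample_mean(reflectance, f):
    """Average every f consecutive bands: r = [r(l1) + ... + r(lf)] / f."""
    n_full = len(reflectance) // f  # an incomplete trailing interval is dropped
    return [sum(reflectance[k * f:(k + 1) * f]) / f for k in range(n_full)]


def gl_weights(v, n_terms):
    """Grunwald-Letnikov coefficients w_0..w_{n_terms-1} for order v.

    w_0 = 1 and w_k = w_{k-1} * (k - 1 - v) / k, i.e. 1, -v, (-v)(-v+1)/2, ...
    """
    w = [1.0]
    for k in range(1, n_terms):
        w.append(w[-1] * (k - 1 - v) / k)
    return w


def fractional_diff(reflectance, v):
    """Order-v difference along the band axis (unit band spacing assumed)."""
    w = gl_weights(v, len(reflectance))
    return [sum(w[k] * reflectance[i - k] for k in range(i + 1))
            for i in range(len(reflectance))]


spectrum = [0.10, 0.12, 0.16, 0.18, 0.24, 0.26]   # made-up reflectance values
coarse = resample_mean(spectrum, 2)               # two bands per interval
first_order = fractional_diff(coarse, 1.0)        # reduces to a first difference
half_order = fractional_diff(coarse, 0.5)         # a genuinely fractional order
```

With `v = 0` the coefficients collapse to `[1, 0, 0, ...]` and the spectrum is returned unchanged; with `v = 1` they collapse to `[1, -1, 0, ...]`, the ordinary first difference. The first element of each output is affected by truncation at the start of the spectrum.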
The process for extracting the heavy metal spectral characteristic wave band by adopting the competitive self-adaptive re-weighting method in the step 2 comprises the following steps:
let M denote the number of sampling runs of the Monte Carlo simulation method, m the total number of soil samples after indoor spectrum preprocessing, s the number of spectral bands in the indoor spectral data of the measured soil samples, and i the i-th sampling run; the competitive adaptive re-weighting method then comprises the following steps:
1) initializing i = 1;
2) judging whether i ≤ M; if so, go to step 3); if not, go to step 7);
3) randomly extracting d samples from the sample set after indoor spectrum preprocessing by the Monte Carlo simulation method, denoted $V_1$, and then, based on $V_1$, constructing a partial least squares heavy metal content inversion regression model with the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output, where 0.8m ≤ d < m;
4) taking the absolute value of the regression coefficient vector B of the partial least squares heavy metal content inversion regression model, denoted $B_i$, and computing the weight of each regression coefficient, $w_j = B_{ij} / \mathrm{sum}(B_i)$, where $B_{ij}$ represents the contribution of the j-th spectral band to the heavy metal content in the i-th sampling run (the larger the value, the more important the j-th spectral band) and $B_i$ is the absolute value of the regression coefficient vector of the i-th run; removing the spectral bands with relatively small weights by an exponential decay function, the proportion of retained spectral bands being $r_i = a e^{-ki}$, where $e \approx 2.718281$ is the base of the natural logarithm and a and k are given by:
$a = (s/2)^{1/(M-1)}$
$k = \ln(s/2)/(M-1)$
5) selecting, by the adaptive re-weighting sampling algorithm, a subset of spectral bands from the $s \times r_i$ retained spectral bands, denoted $V_2$;
6) calculating the RMSECV value based on the spectral band subset $V_2$, then setting $V_1 = V_2$ and $i = i + 1$ and returning to step 2), where RMSECV denotes the cross-validation root mean square error;
7) after the M sampling runs are finished, M spectral band subsets and the M corresponding RMSECV values have been obtained by the competitive adaptive re-weighting algorithm;
8) selecting the spectral band subset with the minimum RMSECV value as the optimal spectral band subset, namely the heavy metal spectral characteristic bands extracted by the competitive adaptive re-weighting method.
The competitive adaptive re-weighting method is an optimized characteristic band screening method that determines the optimal spectral band subset by progressively deleting unimportant spectral bands. It first identifies the spectral bands with large absolute regression coefficients in the partial least squares regression model, and then selects, by ten-fold cross-validation, the spectral band subset with the minimum cross-validation root mean square error (RMSECV) as the optimal spectral band subset. The method screens out the spectral bands sensitive to soil heavy metals, reduces wavelength redundancy, and improves computational efficiency and inversion accuracy.
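The band retention schedule and the coefficient weighting of step 4) can be sketched as below. This is a hedged illustration only: it assumes k = ln(s/2)/(M-1), the usual choice in the CARS literature, and it covers only the deterministic removal of low-weight bands, not the Monte Carlo sampling, the PLS fit or the random re-weighted sampling of step 5). The function names and the values s = 100 and the coefficient vector are made up; M = 50 matches the experiment described later.

```python
import math


def retention_ratios(s, M):
    """Ratio r_i = a * exp(-k * i) of bands kept at sampling run i = 1..M."""
    a = (s / 2) ** (1 / (M - 1))
    k = math.log(s / 2) / (M - 1)   # assumed form of k (standard CARS choice)
    return [a * math.exp(-k * i) for i in range(1, M + 1)]


def band_weights(coefs):
    """Normalised weights w_j = |B_ij| / sum(|B_i|) for one coefficient vector."""
    abs_coefs = [abs(b) for b in coefs]
    total = sum(abs_coefs)
    return [b / total for b in abs_coefs]


def keep_top_bands(coefs, ratio):
    """Indices of the round(len * ratio) bands with the largest weights."""
    w = band_weights(coefs)
    n_keep = max(1, round(len(w) * ratio))
    ranked = sorted(range(len(w)), key=lambda j: w[j], reverse=True)
    return sorted(ranked[:n_keep])


ratios = retention_ratios(s=100, M=50)                 # illustrative band count
kept = keep_top_bands([0.5, -2.0, 0.1, 1.5], ratio=0.5)
```

Under these definitions the schedule keeps every band in the first run (r_1 = 1) and exactly two bands in the last run (s × r_M = 2), which is why a and k are tied to s/2.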
Further, in step 4, the virtual sample set is generated by applying a virtual sample generation method based on the content of the heavy metal in the soil sample in the actually measured sample set and the heavy metal spectral characteristic band corresponding to the content, specifically including the following steps:
4.1, taking the measured sample set as the initial training set $D^{(0)}$, selecting a neighborhood control factor δ, setting the maximum number of searches to V and the number of soil virtual samples to be generated to R, and initializing the loop variable r = 1; the value range of δ is [0.0001, 0.01], and δ = 0.001 in this embodiment;
4.2, based on the initial training set $D^{(0)}$, training a partial least squares heavy metal content inversion regression model with the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output, and obtaining the absolute training error $E^{(0)}$ of the initial training set $D^{(0)}$ by leave-one-out cross-validation; let $E = E^{(0)}$;
4.3, judging whether r ≤ R; if so, go to step 4.4; otherwise, jump to step 4.14;
4.4, sorting the samples in the initial training set $D^{(0)}$ from high to low by the training error of each sample, recorded as
Figure GDA0003110195450000111
where N is the total number of samples in the initial training set $D^{(0)}$, x denotes the heavy metal spectral characteristic bands, y denotes the corresponding heavy metal content, and the training error is the absolute difference between the predicted heavy metal content $\hat{y}_n$ and the measured value $y_n$;
4.5, initializing a cycle variable q to be 1;
4.6, judging whether q ≤ N; if so, go to step 4.7; if not, let r = r + 1 and jump to step 4.3;
4.7, initializing the search counter v = 1;
4.8, judging whether v ≤ V; if so, go to step 4.9; if not, jump to step 4.13;
4.9, randomly generating, for the sample
Figure GDA0003110195450000113
a soil virtual sample $(x^{(r)}, y^{(r)})$ within its δ neighborhood;
4.10, based on the training set $D^{(r-1)} \cup (x^{(r)}, y^{(r)})$, training a partial least squares heavy metal content inversion regression model with the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output, and then calculating the absolute training error $E^{(r)}$ on the initial training set $D^{(0)}$;
4.11, if $E^{(r)} < E$, let $D^{(r)} = D^{(r-1)} \cup (x^{(r)}, y^{(r)})$, $E = E^{(r)}$ and r = r + 1, and go to step 4.3; otherwise, jump to step 4.12;
4.12, let v = v + 1 and jump to step 4.8;
4.13, let q = q + 1 and jump to step 4.6;
4.14, outputting the soil virtual sample set containing R soil virtual samples.
Partial least squares has few parameters and is simple to apply, which makes it suitable for generating virtual samples.
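The accept-or-retry logic of steps 4.1 to 4.14 can be sketched as below, under several stated assumptions. A one-dimensional ordinary least-squares line (`fit_line`) stands in for the partial least squares model, the error is a plain mean absolute error on D(0) rather than a leave-one-out estimate, and the candidate generator `jitter` is a hypothetical placeholder for the δ neighborhood sampling. Unlike step 4.6, the sketch stops when no candidate lowers the error instead of advancing r. All names and data are illustrative.

```python
import random


def fit_line(samples):
    """Ordinary least-squares line through (x, y) pairs; a stand-in for PLS."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples)
    slope = sum((x - mx) * (y - my) for x, y in samples) / sxx
    return lambda x: my + slope * (x - mx)


def mean_abs_error(predict, samples):
    return sum(abs(predict(x) - y) for x, y in samples) / len(samples)


def generate_virtual_samples(d0, n_virtual, max_search, fit, propose):
    train = list(d0)
    best = mean_abs_error(fit(train), d0)              # step 4.2: E = E(0)
    virtual = []
    while len(virtual) < n_virtual:                    # step 4.3
        predict = fit(train)
        ranked = sorted(d0, key=lambda s: abs(predict(s[0]) - s[1]),
                        reverse=True)                  # step 4.4
        accepted = False
        for seed in ranked:                            # steps 4.5, 4.6, 4.13
            for _ in range(max_search):                # steps 4.7, 4.8, 4.12
                candidate = propose(seed)              # step 4.9
                err = mean_abs_error(fit(train + [candidate]), d0)  # step 4.10
                if err < best:                         # step 4.11
                    train.append(candidate)
                    virtual.append(candidate)
                    best = err
                    accepted = True
                    break
            if accepted:
                break
        if not accepted:
            break  # sketch-only exit: no candidate lowered the error
    return virtual, best


rng = random.Random(0)
measured = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.3), (3.0, 2.8), (4.0, 4.2)]
initial_error = mean_abs_error(fit_line(measured), measured)
jitter = lambda s: (s[0] + rng.uniform(-0.05, 0.05), s[1] + rng.uniform(-0.05, 0.05))
virtual, final_error = generate_virtual_samples(measured, 10, 30, fit_line, jitter)
```

Because a candidate is kept only when it lowers the error on D(0), the error returned can never exceed the initial one, which is the property the greedy search relies on.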
Further, in step 4.9, randomly generating, for the sample
Figure GDA0003110195450000121
a soil virtual sample $(x^{(r)}, y^{(r)})$ within its δ neighborhood comprises the following steps:
determining the δ neighborhood range of the sample
Figure GDA0003110195450000122
as follows: the δ neighborhood of the sample
Figure GDA0003110195450000123
is an (n+1)-dimensional hyper-cuboid, where n+1 corresponds to the n heavy metal spectral characteristic bands and the 1 heavy metal content output; for the m-th heavy metal spectral characteristic band $\lambda_m$, the value range of the soil virtual sample is
Figure GDA0003110195450000124
where
Figure GDA0003110195450000125
denotes the range of the m-th heavy metal spectral characteristic band $\lambda_m$ in the original training set $D^{(0)}$;
for the heavy metal content y, the value range of the soil virtual sample is
Figure GDA0003110195450000126
where
Figure GDA0003110195450000127
$L^{(0)}$ denotes the range of the heavy metal content y in the original training set $D^{(0)}$;
randomly generating, within the δ neighborhood range of the sample
Figure GDA0003110195450000128
a soil virtual sample $(x^{(r)}, y^{(r)})$.
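A possible reading of the δ neighborhood is sketched below. The interval formulas exist only as images in the source, so the sketch assumes a symmetric interval of half-width δ times the range of each band (and of the content y) over D(0), centred on the seed sample; that interval form, the function names and the data are assumptions made for illustration.

```python
import random


def value_ranges(d0):
    """Range (max - min) of every band and of the content y over D(0)."""
    n_bands = len(d0[0][0])
    band_ranges = [max(x[m] for x, _ in d0) - min(x[m] for x, _ in d0)
                   for m in range(n_bands)]
    y_range = max(y for _, y in d0) - min(y for _, y in d0)
    return band_ranges, y_range


def sample_in_neighbourhood(seed, band_ranges, y_range, delta, rng):
    """Draw one virtual sample uniformly from the assumed hyper-cuboid."""
    x, y = seed
    x_new = [xm + rng.uniform(-delta * lm, delta * lm)
             for xm, lm in zip(x, band_ranges)]
    y_new = y + rng.uniform(-delta * y_range, delta * y_range)
    return x_new, y_new


rng = random.Random(1)
# Three made-up samples: two characteristic bands and one content value each.
d0 = [([0.10, 0.50], 20.0), ([0.30, 0.40], 25.0), ([0.20, 0.90], 18.0)]
band_ranges, y_range = value_ranges(d0)
x_new, y_new = sample_in_neighbourhood(d0[0], band_ranges, y_range, 0.001, rng)
```

With δ = 0.001, the value used in this embodiment, each coordinate of the virtual sample moves by at most 0.1% of that coordinate's range, so virtual samples stay very close to measured ones.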
Further, in step 5, training the BP neural network heavy metal content inversion regression model based on the mixed sample set, with the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output, specifically includes the following steps:
5.1, determining the input and output of the BP neural network heavy metal content inversion regression model: the input is the heavy metal spectral characteristic bands of the mixed sample set, and the output is the corresponding heavy metal content;
5.2, establishing a three-layer feed-forward neural network topology with g inputs and 1 output, wherein the activation functions of the hidden layer and the output layer are both the Sigmoid function
Figure GDA0003110195450000129
the layers are fully connected to each other, nodes within the same layer are not connected, and g is the number of heavy metal spectral characteristic bands;
5.3, training the BP neural network heavy metal content inversion regression model with the BP algorithm, the training precision being set to 1E-3, until the preset training precision is met, thereby establishing the BP neural network heavy metal content inversion regression model.
The BP neural network is a nonlinear model and is suitable for the hyperspectral heavy metal content inversion of a small sample.
Further, the learning training process consists of a forward propagation part and a backward propagation part;
wherein the forward propagation comprises:
for a given input training pattern $X = (x_1, x_2, \ldots, x_g)$, where $x_g$ denotes the g-th heavy metal spectral characteristic band, the pattern is first passed from the input layer units to the hidden layer units, processed by the hidden layer units and passed to the output layer, and finally processed by the output layer units to produce the output pattern $Y = (y_1, y_2, \ldots, y_p)$, where $y_p$ denotes the p-th output heavy metal content value;
let the numbers of nodes in the input layer, hidden layer and output layer be g, h and p respectively, the connection weight from the input layer to the hidden layer be $w_{ij}$, and the connection weight from the hidden layer to the output layer be $v_{jl}$; the outputs of the hidden layer nodes and the output layer nodes are then respectively:
Figure GDA0003110195450000131
Figure GDA0003110195450000132
where $i = 1, 2, \ldots, g$; $j = 1, 2, \ldots, h$; $l = 1, 2, \ldots, p$; $\theta_j$ is the threshold of hidden layer node j,
Figure GDA0003110195450000133
is the threshold of the output layer node; f is the Sigmoid function,
Figure GDA0003110195450000134
if the output obtained on the output layer does not reach the preset training precision, turning to reverse propagation;
the back propagation includes:
the error value is propagated back along the original connection path, and the connection weights $w_{ij}$ and $v_{jl}$ and the thresholds of the neurons in each layer are updated to reduce the error value:
$v_{jl}(t+1) = v_{jl}(t) + \alpha d_l z_j$
Figure GDA0003110195450000135
Figure GDA0003110195450000136
$w_{ij}(t+1) = w_{ij}(t) + \beta e_j x_i$
$\theta_j(t+1) = \theta_j(t) + \beta e_j$
Figure GDA0003110195450000141
where α and β are the learning rates, with values in the range 0.001 to 0.1,
Figure GDA0003110195450000142
is the desired output of output node l, and $y_l$ is the actual output; in this scheme the number of output values p is 1, so l is also 1;
samples of the mixed sample set are selected as training samples, and the forward propagation and back propagation processes are repeated to train the network until the preset training precision is reached, thereby establishing the BP neural network heavy metal content inversion regression model.
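The forward and backward passes described above can be sketched for the single-output case (p = 1) as follows. This is textbook back-propagation, not a transcription of the patent's formulas, several of which survive only as images. It assumes the logistic Sigmoid f(u) = 1/(1 + e^(-u)), thresholds that are added to the weighted sums, and a squared-error loss; the class name, the hidden layer size and the sample values are hypothetical.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


class TinyBP:
    """Three-layer network: g inputs, h hidden nodes, one output."""

    def __init__(self, g, h, rng):
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(h)] for _ in range(g)]
        self.theta = [0.0] * h                          # hidden thresholds
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(h)]
        self.phi = 0.0                                  # output threshold

    def forward(self, x):
        z = [sigmoid(sum(self.w[i][j] * x[i] for i in range(len(x))) + self.theta[j])
             for j in range(len(self.v))]
        y = sigmoid(sum(self.v[j] * z[j] for j in range(len(z))) + self.phi)
        return z, y

    def train_step(self, x, target, alpha, beta):
        """One forward pass plus one weight update; returns the squared error."""
        z, y = self.forward(x)
        d = (target - y) * y * (1.0 - y)                # output error signal d_l
        e = [d * self.v[j] * z[j] * (1.0 - z[j])        # hidden error signal e_j
             for j in range(len(z))]
        for j in range(len(z)):
            self.v[j] += alpha * d * z[j]               # v_jl += alpha * d_l * z_j
            for i in range(len(x)):
                self.w[i][j] += beta * e[j] * x[i]      # w_ij += beta * e_j * x_i
            self.theta[j] += beta * e[j]
        self.phi += alpha * d
        return 0.5 * (target - y) ** 2


rng = random.Random(0)
net = TinyBP(g=2, h=3, rng=rng)
# One made-up sample: two band values and a content scaled into (0, 1).
errors = [net.train_step([0.2, 0.7], 0.8, alpha=0.1, beta=0.1) for _ in range(200)]
_, prediction = net.forward([0.2, 0.7])
```

Because the output node is a Sigmoid, the network can only emit values in (0, 1), so measured contents would have to be scaled into that interval before training and scaled back after prediction. In practice training would loop over the whole mixed sample set until the error falls below the preset precision.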
Example 2
An embodiment of the present invention provides a computer-readable storage medium storing program instructions, the program instructions being adapted to be loaded by a processor to execute the soil heavy metal content inversion model generation method according to embodiment 1.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Example 3
The embodiment of the invention provides a soil heavy metal content inversion model generation system, which comprises:
a data acquisition module, configured to acquire the heavy metal content and indoor spectral data of soil samples;
a feature extraction module, configured to preprocess the indoor spectral data of the soil samples and extract the heavy metal spectral characteristic bands by the competitive adaptive re-weighting method;
a virtual sample generation module, configured to construct a measured sample set based on the heavy metal content of the soil samples and the corresponding heavy metal spectral characteristic bands, and to generate a soil virtual sample set by the least squares based virtual sample generation method, the ratio of the number of soil virtual samples to the number of measured samples being 1:1 to 10:1;
an inversion model generation module, configured to train a BP neural network heavy metal content inversion regression model based on the mixed sample set formed by the measured sample set and the soil virtual sample set, with the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output.
In the system, specific implementation schemes of functions of each module can refer to the soil heavy metal content inversion model generation method described in embodiment 1, and details are not repeated here.
Example 4
The embodiment of the invention also provides a soil heavy metal content inversion method, which comprises the following steps:
acquiring indoor spectral data of a soil sample to be detected;
preprocessing indoor spectral data of a soil sample to be detected and extracting a heavy metal spectral characteristic waveband by adopting a competitive self-adaptive re-weighting method;
taking the extracted heavy metal spectral characteristic bands as the input of the BP neural network heavy metal content inversion regression model and outputting the corresponding heavy metal content, wherein the BP neural network heavy metal content inversion regression model is generated by the soil heavy metal content inversion model generation method described above.
According to the soil heavy metal content inversion method, the heavy metal content to be measured can be rapidly inverted through the indoor spectral data of the soil sample to be measured, and compared with the detection through actual measurement, the detection cost and the time cost are greatly reduced. For specific implementation of the soil heavy metal content inversion method, reference may be made to the soil heavy metal content inversion model generation method described in embodiment 1, and details are not repeated here.
The effect of this scheme will be further explained by combining with specific experiments.
The research area is a cultivated area in Yueyang City, Hunan Province, with relatively flat terrain and a subtropical continental humid climate. The annual average air temperature is 15-16 ℃, and the annual average precipitation is 1260 mm. The soil of the research area is mainly yellow soil, and the area is an important grain production area. With accelerating industrialization, discharges of chemical pollutants, pesticides containing heavy metals and various fertilizer wastes have made soil heavy metal pollution in the research area increasingly evident. In the research area, 50 sample points were collected (as shown in fig. 2) and their geographic coordinates were recorded with a GPS positioning instrument; the study focuses on the contents of the heavy metals Cr and Pb.
During soil collection, five points for each soil sample in the research area were determined along an S-shaped curve, with a collection depth of 20 cm; about 200 g of surface soil was collected at each point and placed in a labeled bag, and the air in the bag was expelled. The soil samples were naturally air-dried in a dry, ventilated place; impurities such as crushed stones, bricks, litter and plant roots were removed; the samples were then passed through a 100-mesh nylon sieve, divided into two parts, and stored in dedicated containers. One part was used to measure the heavy metal concentration, the Pb and Cr contents of the soil being measured with a flame atomic absorption spectrophotometer; the other part was used to collect soil spectral data with an AvaField-3 high-precision field spectrometer over a sampling range of 350-2500 nm.
The soil spectral data are resampled and then preprocessed by fractional differentiation; the fractional differential spectrum effectively improves the correlation between the spectrum and the heavy metal content and highlights the spectral characteristic bands of the heavy metals Cr and Pb. The spectral characteristic bands of Cr and Pb are then extracted by the competitive adaptive reweighting sampling method (CARS). CARS is an optimized characteristic band screening method that determines the optimal variable subset by progressively deleting unimportant variables. The algorithm first identifies the wavelength variables with large absolute regression coefficients in the PLS model, and then selects, by ten-fold cross-validation, the variable subset with the minimum cross-validation root mean square error (RMSECV) as the optimal variable subset. The method screens out the wavelength variables sensitive to soil heavy metals, reduces wavelength redundancy, and improves computational efficiency and inversion accuracy.
Then, a virtual soil sample set is generated by applying a virtual least square sample generation method (VSGPLS), and the specific implementation can refer to the following scheme:
Figure GDA0003110195450000161
Figure GDA0003110195450000171
In the experiment, the BPNN (BP neural network) algorithm is selected as the inversion model for the contents of the heavy metals Pb and Cr, and the accuracy of the inversion results is evaluated with the following parameters: the coefficient of determination of the modeling set (determination coefficient of calibration, $R_c^2$), the root mean square error of the modeling set (RMSEC), the coefficient of determination of the validation set (determination coefficient of prediction, $R_p^2$), the root mean square error of the validation set (RMSEP), and the relative prediction error (PRD). The formulas are shown below. RMSEC and RMSEP reflect the accuracy of the model: the smaller the value, the higher the accuracy. When $R^2 > 0.66$ and PRD > 2, the model predicts well; when $0.50 < R^2 < 0.66$ and 1.4 < PRD < 2.0, the model performs moderately; when $R^2 < 0.50$ and PRD < 1.4, modeling fails.
Figure GDA0003110195450000173
Figure GDA0003110195450000174
Figure GDA0003110195450000175
In the formulas, $y_{ci}$ denotes the predicted value of the modeling set,
Figure GDA0003110195450000176
denotes the measured value of the modeling set, $y_{pi}$ denotes the predicted value of the validation set,
Figure GDA0003110195450000177
denotes the measured value of the validation set, SD denotes the standard deviation of the validation set, and n is the number of samples.
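The evaluation parameters and the three-level judgment standard can be sketched as below. The patent's formulas are available only as images, so the sketch assumes the common definitions: R² = 1 - SS_res / SS_tot, RMSE as the root of the mean squared residual, and PRD as the sample standard deviation (n - 1 denominator) of the measured values divided by the RMSE. The function names and the four-point data set are made up.

```python
import math


def rmse(measured, predicted):
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def prd(measured, predicted):
    """Standard deviation of the measured values divided by the RMSE."""
    n = len(measured)
    mean = sum(measured) / n
    sd = math.sqrt(sum((m - mean) ** 2 for m in measured) / (n - 1))
    return sd / rmse(measured, predicted)


def grade(r2, prd_value):
    """Three-level judgment standard quoted in the text."""
    if r2 > 0.66 and prd_value > 2.0:
        return "good"
    if 0.50 < r2 < 0.66 and 1.4 < prd_value < 2.0:
        return "moderate"
    if r2 < 0.50 and prd_value < 1.4:
        return "failed"
    return "unclassified"   # combinations the quoted standard does not cover


measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
scores = (r_squared(measured, predicted), rmse(measured, predicted),
          prd(measured, predicted))
```

Under this definition R² becomes negative whenever the residual sum of squares exceeds the total sum of squares, that is, when the model predicts worse than the mean of the measured values.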
As can be seen from the descriptive statistical characteristics of the contents of Pb and Cr in soil (as shown in Table 1), the mean values of the contents of Pb and Cr corresponding to all samples are 9.33g/kg and 21.25g/kg, respectively, and the Coefficients of Variation (CV) are 11.41% and 27.56%, respectively, which belong to low variation and medium variation, and are both between the mean values and the coefficients of variation of the contents of Pb and Cr in the modeling set and the verification set.
TABLE 1 statistical characteristics of the contents of heavy metals Pb and Cr in soil
Figure GDA0003110195450000178
Figure GDA0003110195450000181
Wavelengths at the edges of the spectrometer's range have a low signal-to-noise ratio, so the spectral bands in the ranges 350-400 nm and 2400-2500 nm are removed (as shown in FIG. 3). The spectral curves rise overall. Owing to environmental factors and moisture, high reflection peaks appear at 1300-1500 nm and 1800-2100 nm, where the reflectance of some bands exceeds 1 and the error is large, so these band ranges are also removed. The range 2100-2400 nm fluctuates strongly, which is mainly related to the organic matter characteristics of the soil.
In this experiment the spectra were resampled at 10 nm intervals to reduce noise and improve computational efficiency. The highest correlations between the original reflectance spectrum and the contents of the heavy metals Cr and Pb are 0.191 and 0.135, at the 620 nm and 2400 nm bands respectively, and neither passes the significance test. To further improve the correlation between the heavy metal elements and the spectral reflectance, differential transformations of order 0 to 2 at fractional order intervals of 0.2 were applied to the spectra using the hyperspectral mathematical transformation software 1.0 (registration number: 2018R11L1038727) developed by the research group. Compared with the original spectrum, the correlation between the differentially transformed spectra and the Cr and Pb contents improves markedly (as shown in Table 2). The highest correlations for Cr and Pb are 0.598 and 0.412, obtained on the 1.4-order and 1.6-order differential spectra respectively, which are used in the subsequent experimental analysis.
TABLE 2 correlation coefficient between spectral reflectivity optimum differential transformation and heavy metals Cr and Pb
Figure GDA0003110195450000182
When extracting the spectral characteristic bands of the heavy metals Cr and Pb with the competitive adaptive reweighting sampling method (CARS) in this study, the number of Monte Carlo sampling runs is set to 50. The sampling is iterated, the RMSECV values of the runs are compared, and the variables of the run with the minimum RMSECV are selected as the preferred variable subset; the process is shown in fig. 4. 16 and 25 bands are selected for Cr and Pb respectively, with compression rates of 11.11% and 17.36%. The preferred variable subsets of both Cr and Pb include the band with the highest correlation.
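The two compression rates can be cross-checked with a short calculation. The text does not state how many candidate bands entered CARS; the value 144 used below is an inference from 16 bands corresponding to 11.11% and should be read as an assumption, not a reported figure. The function name is hypothetical.

```python
def compression_rate(n_selected, n_total):
    """Share of candidate bands that survive selection, in percent."""
    return 100.0 * n_selected / n_total


n_total = 144  # assumption: inferred from 16 bands <-> 11.11 %, not stated in the text
cr_rate = round(compression_rate(16, n_total), 2)
pb_rate = round(compression_rate(25, n_total), 2)
```

Both reported rates are reproduced by the same total, which suggests the compression rate is the number of selected bands divided by the number of bands remaining after resampling and band removal.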
In the virtual sample set generation process, the neighborhood control factor and the maximum number of searches are set to 0.001 and 30 respectively, and the number of virtual samples is set in turn to 20, 50, 100, 150, 200 and 500. As the number of virtual samples increases, the absolute training errors for the heavy metals Cr and Pb decrease continuously (as shown in FIG. 5). For reasons of space, only the statistical characteristics of the Cr content for the different numbers of virtual samples are shown (Table 3). The mean values and coefficients of variation are broadly similar across the different numbers of virtual samples, which indicates that the virtual samples do not change the overall distribution of the soil sample contents.
TABLE 3 heavy metal Cr virtual sample feature statistics
Figure GDA0003110195450000183
Figure GDA0003110195450000191
When BPNN modeling is carried out with the full band, the $R_p^2$ values of the Cr and Pb predictions are both negative (Table 4), falling outside the range $0 \le R^2 \le 1$, and the RMSEP is too large, indicating a large difference between the predicted and measured values. When the characteristic bands extracted by CARS are used for modeling, the inversion accuracy for Cr and Pb improves markedly; the highest accuracy is obtained on the BPNN model, with $R_p^2$ of 0.61 and 0.28 and PRD of 1.16 and 0.99 respectively. According to the accuracy judgment standard, the accuracy and reliability of the Cr regression model are moderate, and the Pb modeling fails.
TABLE 4 prediction results of heavy metal Cr and Pb content before and after spectral dimensionality reduction
Figure GDA0003110195450000194
Note: none in the table indicates that modeling is performed using all bands, and CARS in the table indicates that modeling is performed using a characteristic band extracted by a competitive adaptive re-weighted sampling method.
Tables 5 and 6 show the inversion accuracy of the heavy metal Cr and Pb contents on the BPNN model for different numbers of virtual samples. As the number of virtual samples increases, the training set $R_c^2$ on the BPNN model increases overall and the RMSEC decreases overall. For the test set, $R_p^2$ and PRD generally first increase and then decrease, while RMSEP first decreases and then increases, with large fluctuations. This indicates that more virtual samples are not always better; rather, accuracy is optimal at a certain threshold. This will be further analyzed in the discussion. As can be seen from the tables, the best inversion accuracy for the Cr and Pb contents is obtained when the number of virtual samples is 50 (as shown in FIG. 6): $R_c^2$ is 0.96 and 0.94, RMSEC is 1.71 and 0.34, $R_p^2$ is 0.77 and 0.76, RMSEP is 1.90 and 0.43, and PRD is 2.68 and 2.07 respectively. According to the accuracy judgment standard, the accuracy and reliability of the Cr and Pb regression models meet the monitoring requirements. The method effectively improves the inversion accuracy of heavy metals and is applicable to different heavy metals.
TABLE 5 comparison of prediction accuracy of heavy metal Cr in different quantities of virtual samples
Figure GDA0003110195450000199
TABLE 6 comparison of prediction accuracy of heavy metals Pb in different quantities for virtual samples
Figure GDA0003110195450000201
In this study, the optimal variable sets (heavy metal characteristic spectral bands) of the heavy metals Pb and Cr are screened out by coupling spectral fractional differentiation with competitive adaptive reweighted sampling, virtual samples are generated by the partial least squares virtual sample generation method (VSGPLS) proposed in this study to expand the training set, and finally the BP neural network (BPNN) algorithm is used to model the heavy metal content. Discussion and analysis of the inversion results lead to the following conclusions:
1) Fractional-order differentiation effectively improves the correlation between heavy metal content and the spectral bands and highlights the characteristic bands of the heavy metals.
2) Compared with full-band modeling, inverting the Cr and Pb contents on the BPNN model using the characteristic bands extracted by competitive adaptive reweighted sampling effectively improves prediction accuracy.
3) The virtual samples generated by the partial least squares virtual sample generation method effectively improve the inversion accuracy for the Cr and Pb contents. On the BPNN model, the optimal accuracy is obtained when the number of virtual samples is 50: $R_c^2$ is 0.96 and 0.94, RMSEC is 1.71 and 0.34, $R_p^2$ is 0.77 and 0.76, RMSEP is 1.90 and 0.43, and PRD is 2.68 and 2.07, respectively, which shows that the method has high accuracy and applicability for farmland soil heavy metal inversion.
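The accuracy indices quoted above can be computed along the following lines. This is a minimal sketch, not the patent's own code: the function name `regression_metrics` is hypothetical, and it assumes that PRD denotes the ratio of the standard deviation of the measured values to the RMSE, a definition this section does not state.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and PRD for measured vs. predicted contents."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    # Root mean square error (RMSEC on the training set, RMSEP on the test set).
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    # Coefficient of determination.
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    # Assumption: PRD = sample standard deviation of measured values / RMSE.
    prd = float(np.std(y_true, ddof=1)) / rmse
    return {"r2": r2, "rmse": rmse, "prd": prd}

if __name__ == "__main__":
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

Applying the same function to the training set and to the test set gives the calibration and prediction variants of each index.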
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A method for generating a soil heavy metal content inversion model is characterized by comprising the following steps:
step 1: acquiring the heavy metal content and indoor spectral data of a soil sample;
step 2: preprocessing indoor spectral data of a soil sample and extracting a heavy metal spectral characteristic waveband by adopting a competitive self-adaptive re-weighting method;
step 3: constructing a measured sample set based on the heavy metal content of the soil samples and the corresponding heavy metal spectral characteristic bands;
step 4: based on the heavy metal content of the soil samples in the measured sample set and the corresponding heavy metal spectral characteristic bands, generating a soil virtual sample set by a least-squares-based virtual sample generation method, the number ratio of soil virtual samples to measured samples being (1-10):1, the measured sample set and the soil virtual sample set forming a mixed sample set;
step 5: based on the mixed sample set, training a BP neural network heavy metal content inversion regression model with the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output;
wherein, in step 4, generating the soil virtual sample set by the least-squares-based virtual sample generation method, based on the heavy metal content of the soil samples in the measured sample set and the corresponding heavy metal spectral characteristic bands, specifically comprises the following steps:
4.1, taking the measured sample set as the initial training set $D^{(0)}$; setting the neighborhood control factor δ, with δ in the range [0.0001, 0.01], the maximum number of searches V, and the number R of soil virtual samples to be generated; and initializing the loop variable r = 1;
4.2, based on the initial training set $D^{(0)}$, training a partial least squares regression model for heavy metal content inversion with the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output; obtaining, by leave-one-out validation, the absolute training error $E^{(0)}$ of the initial training set $D^{(0)}$; and setting $E = E^{(0)}$;
4.3, judging whether r ≤ R; if so, proceeding to step 4.4; otherwise, jumping to step 4.14;
4.4, sorting the samples in the initial training set $D^{(0)}$ from high to low according to the training error of each sample, and recording the sorted samples as
Figure FDA0003103402730000011
wherein N is the total number of samples in the initial training set $D^{(0)}$, x denotes the heavy metal characteristic spectral bands, y denotes the corresponding heavy metal content, and the training error is the absolute difference between the predicted heavy metal content $\hat{y}_n$ and the measured value $y_n$;
4.5, initializing the loop variable q = 1;
4.6, judging whether q ≤ N; if so, proceeding to step 4.7; if not, setting r = r + 1 and jumping to step 4.3;
4.7, initializing the search count v = 1;
4.8, judging whether v ≤ V; if so, proceeding to step 4.9; if not, jumping to step 4.13;
4.9, randomly generating, within the δ neighborhood corresponding to the sample
Figure FDA0003103402730000013
a soil virtual sample $(x^{(r)}, y^{(r)})$;
4.10, based on the training set $D^{(r-1)} \cup (x^{(r)}, y^{(r)})$, training a partial least squares regression model for heavy metal content inversion with the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output, and then calculating the absolute training error $E^{(r)}$ of the initial training set $D^{(0)}$;
4.11, if $E^{(r)} < E$, setting $D^{(r)} = D^{(r-1)} \cup (x^{(r)}, y^{(r)})$, $E = E^{(r)}$ and r = r + 1, and jumping to step 4.3; otherwise, jumping to step 4.12;
4.12, setting v = v + 1 and jumping to step 4.8;
4.13, setting q = q + 1 and jumping to step 4.6;
4.14, outputting the soil virtual sample set containing R soil virtual samples.
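The control flow of steps 4.1 to 4.14 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: an ordinary least-squares linear model (`numpy.linalg.lstsq`) stands in for the partial least squares regression, the mean absolute error on $D^{(0)}$ stands in for the leave-one-out error, and the δ neighborhood is assumed to be the sample value plus or minus δ times the range of each variable in $D^{(0)}$. All function names are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear model with intercept (stand-in for PLS)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ coef

def abs_errors(X_train, y_train, X0, y0):
    """Per-sample absolute error on D0 of a model fitted on the given set."""
    coef = fit_linear(X_train, y_train)
    return np.abs(predict_linear(coef, X0) - y0)

def generate_virtual_samples(X0, y0, R, delta=0.01, V=20, seed=0):
    rng = np.random.default_rng(seed)
    x_range = X0.max(axis=0) - X0.min(axis=0)    # per-band range in D0
    y_range = y0.max() - y0.min()                # content range in D0
    X_cur, y_cur = X0.copy(), y0.copy()          # step 4.1: D(0)
    E = abs_errors(X_cur, y_cur, X0, y0).mean()  # step 4.2 (simplified)
    virt_X, virt_y = [], []
    r = 1
    while r <= R:                                # step 4.3
        # Step 4.4: visit D0 samples from highest to lowest error.
        order = np.argsort(-abs_errors(X_cur, y_cur, X0, y0))
        accepted = False
        for q in order:                          # steps 4.5, 4.6, 4.13
            for _ in range(V):                   # steps 4.7, 4.8, 4.12
                # Step 4.9: draw a candidate in the assumed delta neighborhood.
                x_new = X0[q] + rng.uniform(-1, 1, X0.shape[1]) * delta * x_range
                y_new = y0[q] + rng.uniform(-1, 1) * delta * y_range
                X_try = np.vstack([X_cur, x_new])
                y_try = np.append(y_cur, y_new)
                E_r = abs_errors(X_try, y_try, X0, y0).mean()  # step 4.10
                if E_r < E:                      # step 4.11: accept candidate
                    X_cur, y_cur, E = X_try, y_try, E_r
                    virt_X.append(x_new)
                    virt_y.append(y_new)
                    accepted = True
                    break
            if accepted:
                break
        # r advances after an acceptance (4.11) and also when every q has
        # been exhausted without one (4.6), as the steps are written.
        r += 1
    return np.array(virt_X).reshape(-1, X0.shape[1]), np.array(virt_y), E
```

Because r also advances when no candidate is accepted, this sketch may return fewer than R virtual samples; a caller that needs exactly R could raise V or widen δ within the stated range.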
2. The soil heavy metal content inversion model generation method according to claim 1, wherein the preprocessing the indoor spectral data of the soil sample in the step 2 comprises:
performing spectrum resampling on indoor spectrum data of the soil sample:
with the preset spectral length as the interval unit, performing spectrum resampling on the indoor spectral data of the soil sample by the following formula:
$$r = \frac{r(\lambda_1) + \dots + r(\lambda_i) + \dots + r(\lambda_f)}{f}$$
wherein r represents the reflectance value of the hyperspectral curve after resampling, f is the number of spectral bands contained in the interval unit, and $r(\lambda_i)$ represents the reflectance value of the i-th spectral band $\lambda_i$;
carrying out fractional differential pretreatment on indoor spectral data of the soil sample obtained after the spectrum resampling:
calculating indoor spectral data of each soil sample obtained after spectral resampling by adopting the following formula based on fractional order differentiation:
Figure FDA0003103402730000021
wherein $r(\lambda_i)$ is the reflectance value of the i-th spectral band $\lambda_i$, $r^{v}(\lambda_i)$ is the $v$-order fractional differential value at the spectral band $\lambda_i$, Γ is the Gamma function, s represents the number of spectral bands in the indoor spectral data of the measured soil sample,
Figure FDA0003103402730000022
h is the differential step, t is the upper differential limit, and a is the lower differential limit.
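The two preprocessing operations of claim 2 can be illustrated with a short sketch. The resampling function follows the averaging formula given in the claim. The fractional-order differential formula itself is not legible in this text, so the second function assumes the commonly used Grünwald-Letnikov form, in which the weights $w_0 = 1$, $w_k = w_{k-1}(k-1-v)/k$ are equivalent to a ratio of Gamma functions. Both function names are hypothetical.

```python
import numpy as np

def resample_mean(reflectance, f):
    """Average every f adjacent bands: r = [r(l_1) + ... + r(l_f)] / f."""
    r = np.asarray(reflectance, dtype=float)
    n_blocks = r.size // f  # trailing bands that do not fill a block are dropped
    return r[: n_blocks * f].reshape(n_blocks, f).mean(axis=1)

def gl_fractional_derivative(reflectance, v, h=1.0):
    """Grunwald-Letnikov fractional derivative of order v (assumed form)."""
    r = np.asarray(reflectance, dtype=float)
    n = r.size
    # Recursive weights: w_0 = 1, w_1 = -v, w_2 = (-v)(-v + 1) / 2, ...
    w = np.ones(n)
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - v) / k
    # At band i, combine the current band with all preceding bands.
    out = np.array([np.dot(w[: i + 1], r[i::-1]) for i in range(n)])
    return out / h ** v

if __name__ == "__main__":
    spectrum = np.array([0.10, 0.12, 0.15, 0.19, 0.24, 0.30])
    print(resample_mean(spectrum, 2))
    print(gl_fractional_derivative(spectrum, 0.5))
```

With v = 0 the function returns the input unchanged, and with v = 1 it reduces to the first difference, which is a convenient sanity check on the weights.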
3. The soil heavy metal content inversion model generation method according to claim 1 or 2, wherein the extracting of the heavy metal spectral characteristic wave band by using the competitive adaptive re-weighting method in the step 2 comprises the following processes:
with M denoting the number of sampling runs of the Monte Carlo simulation method, m the total number of soil samples after indoor spectrum preprocessing, s the number of spectral bands in the indoor spectral data of the measured soil samples, and i the i-th sampling run, the competitive adaptive re-weighting method comprises the following steps:
1) initializing i = 1;
2) judging whether i ≤ M; if so, proceeding to step 3); if not, proceeding to step 7);
3) randomly drawing d samples from the indoor-spectrum-preprocessed sample set by the Monte Carlo simulation method, denoted $V_1$, and then constructing a partial least squares regression model for heavy metal content inversion based on the data in $V_1$, wherein $0.8m \le d < m$;
4) taking the absolute value of the regression coefficient vector B of the partial least squares regression model, denoted $B_i$, and computing the weight of each regression coefficient in the model, $w_j = B_{ij} / \mathrm{sum}(B_i)$, wherein $B_{ij}$ represents the contribution of the j-th spectral band in the i-th sampling run to the heavy metal content; removing the spectral bands with relatively small weights by an exponential decay function, the proportion of retained spectral bands being $r_i = a e^{-ki}$, where e is the base of the natural logarithm, and a and k are given by:
$$a = (s/2)^{1/(M-1)}$$
$$k = \frac{\ln(s/2)}{M-1}$$
5) selecting, by the adaptive re-weighting sampling algorithm, a subset of spectral bands from the $s \times r_i$ retained spectral bands, denoted $V_2$;
6) calculating the RMSECV value based on the spectral band subset $V_2$, then setting $V_1 = V_2$ and i = i + 1 and returning to step 2), where RMSECV denotes the root mean square error of cross-validation;
7) after the M sampling runs are completed, obtaining the M spectral band subsets and the M corresponding RMSECV values produced by the competitive self-adaptive re-weighting algorithm;
8) selecting the spectral band subset corresponding to the minimum RMSECV value as the optimal spectral band subset, namely the heavy metal spectral characteristic bands extracted by the competitive self-adaptive re-weighting method.
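The weighting and exponential decay in step 4) can be made concrete with a small sketch. It assumes the usual competitive adaptive reweighted sampling choice of constants, $a = (s/2)^{1/(M-1)}$ and $k = \ln(s/2)/(M-1)$, which makes the retained proportion equal to 1 in the first run and 2/s in the last. The function names are hypothetical, and rounding the retained band count to an integer is an illustrative choice.

```python
import math

def band_weights(coefficients):
    """w_j = |b_j| / sum(|b|) for one sampling run's regression coefficients."""
    absolute = [abs(b) for b in coefficients]
    total = sum(absolute)
    return [b / total for b in absolute]

def cars_retention_schedule(s, M):
    """Retained-band proportion r_i = a * exp(-k * i) for i = 1..M."""
    a = (s / 2) ** (1 / (M - 1))
    k = math.log(s / 2) / (M - 1)
    ratios = [a * math.exp(-k * i) for i in range(1, M + 1)]
    # Number of bands kept in each run (never fewer than one).
    counts = [max(1, round(s * r)) for r in ratios]
    return a, k, ratios, counts

if __name__ == "__main__":
    a, k, ratios, counts = cars_retention_schedule(s=200, M=50)
    print(counts[0], counts[-1])
    print(band_weights([0.4, -0.1, 0.5]))
```

The schedule shrinks the candidate set quickly in early runs and slowly in later ones, so coarse elimination is followed by fine selection among the surviving bands.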
4. The soil heavy metal content inversion model generation method according to claim 1, wherein, in step 4.9, randomly generating, within the δ neighborhood corresponding to the sample
Figure FDA0003103402730000032
the soil virtual sample
Figure FDA0003103402730000033
comprises the following steps:
determining the δ neighborhood range corresponding to the sample
Figure FDA0003103402730000034
as follows: for the m-th heavy metal spectral characteristic band $\lambda_m$, the value range of the soil virtual sample is
Figure FDA0003103402730000035
wherein
Figure FDA0003103402730000036
Figure FDA0003103402730000037
Figure FDA0003103402730000038
represents the range of the m-th heavy metal spectral characteristic band $\lambda_m$ in the original training set $D^{(0)}$;
for the heavy metal content y, the value range of the soil virtual sample is
Figure FDA0003103402730000039
wherein
Figure FDA00031034027300000310
$L^{(0)}$ represents the range of the heavy metal content y in the original training set $D^{(0)}$;
randomly generating, within the δ neighborhood range corresponding to the sample
Figure FDA00031034027300000311
the soil virtual sample
Figure FDA00031034027300000312
5. The soil heavy metal content inversion model generation method according to claim 1, wherein the step 5 of training to obtain the BP neural network heavy metal content inversion regression model based on the mixed sample set by taking the heavy metal spectral characteristic wave band as input and the corresponding heavy metal content as output specifically comprises the following steps:
5.1, determining the input and output of the BP neural network heavy metal content inversion regression model, the input being the heavy metal spectral characteristic bands of the mixed sample set and the output being the corresponding heavy metal content;
5.2, establishing a three-layer feed-forward neural network topology with g inputs and 1 output, wherein the activation functions of the hidden layer and the output layer are Sigmoid functions, adjacent layers are fully connected, nodes within the same layer are not connected, and g is the number of heavy metal spectral characteristic bands;
5.3, training the BP neural network heavy metal content inversion regression model with the BP algorithm until the preset training accuracy is met, thereby establishing the BP neural network heavy metal content inversion regression model.
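A three-layer network of the kind described in steps 5.1 to 5.3 can be sketched in plain NumPy. This is a minimal illustration, not the patented model: the hidden-layer size, learning rate, epoch count and tolerance are arbitrary assumptions, the function names are hypothetical, and because the output layer is a Sigmoid the targets are assumed to have been scaled to the interval [0, 1] beforehand.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bpnn(X, y, hidden=8, lr=0.5, epochs=2000, tol=1e-4, seed=0):
    """Train a g-input, 1-output network by full-batch backpropagation."""
    rng = np.random.default_rng(seed)
    n, g = X.shape
    W1 = rng.normal(0.0, 0.5, (g, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    t = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Forward pass: fully connected layers with Sigmoid activations.
        H = sigmoid(X @ W1 + b1)
        out = sigmoid(H @ W2 + b2)
        err = out - t
        losses.append(float(np.mean(err ** 2)))
        if losses[-1] < tol:  # preset training accuracy reached
            break
        # Backward pass: propagate the error through both Sigmoid layers.
        d_out = err * out * (1.0 - out)
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * H.T @ d_out / n
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_hid / n
        b1 -= lr * d_hid.mean(axis=0)
    return (W1, b1, W2, b2), losses

def predict_bpnn(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel()
```

Predictions come back on the scaled [0, 1] range, so they would need to be mapped back to content units with the inverse of whatever scaling was applied to the targets.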
6. The soil heavy metal content inversion model generation method according to claim 1, wherein the step 1 of obtaining the heavy metal content and the indoor spectral data of the soil sample comprises the following processes:
according to the soil type, texture and cover crop type in the study area, uniformly and randomly selecting at least 30 soil samples in the study area; determining the heavy metal content of the collected soil samples by chemical analysis in a laboratory using triacid digestion and atomic absorption spectrophotometry; and measuring the soil spectra with an AvaField-3 full-band field spectroradiometer to obtain the indoor spectral data.
7. A soil heavy metal content inversion model generation system is characterized by comprising:
a data acquisition module, configured to acquire the heavy metal content and indoor spectral data of soil samples;
a feature extraction module, configured to preprocess the indoor spectral data of the soil samples and extract the heavy metal spectral characteristic bands by a competitive self-adaptive re-weighting method;
a virtual sample generation module, configured to construct a measured sample set based on the heavy metal content of the soil samples and the corresponding heavy metal spectral characteristic bands, and to generate a soil virtual sample set by a least-squares-based virtual sample generation method based on the heavy metal content of the soil samples in the measured sample set and the corresponding heavy metal spectral characteristic bands, such that the number ratio of soil virtual samples to measured samples is (1-10):1; wherein generating the soil virtual sample set specifically comprises the following steps:
4.1, taking the measured sample set as the initial training set $D^{(0)}$; setting the neighborhood control factor δ, with δ in the range [0.0001, 0.01], the maximum number of searches V, and the number R of soil virtual samples to be generated; and initializing the loop variable r = 1;
4.2, based on the initial training set $D^{(0)}$, training a partial least squares regression model for heavy metal content inversion with the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output; obtaining, by leave-one-out validation, the absolute training error $E^{(0)}$ of the initial training set $D^{(0)}$; and setting $E = E^{(0)}$;
4.3, judging whether r ≤ R; if so, proceeding to step 4.4; otherwise, jumping to step 4.14;
4.4, sorting the samples in the initial training set $D^{(0)}$ from high to low according to the training error of each sample, and recording the sorted samples as
Figure FDA0003103402730000051
wherein N is the total number of samples in the initial training set $D^{(0)}$, x denotes the heavy metal characteristic spectral bands, y denotes the corresponding heavy metal content, and the training error is the absolute difference between the predicted heavy metal content $\hat{y}_n$ and the measured value $y_n$;
4.5, initializing the loop variable q = 1;
4.6, judging whether q ≤ N; if so, proceeding to step 4.7; if not, setting r = r + 1 and jumping to step 4.3;
4.7, initializing the search count v = 1;
4.8, judging whether v ≤ V; if so, proceeding to step 4.9; if not, jumping to step 4.13;
4.9, randomly generating, within the δ neighborhood corresponding to the sample
Figure FDA0003103402730000053
a soil virtual sample $(x^{(r)}, y^{(r)})$;
4.10, based on the training set $D^{(r-1)} \cup (x^{(r)}, y^{(r)})$, training a partial least squares regression model for heavy metal content inversion with the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output, and then calculating the absolute training error $E^{(r)}$ of the initial training set $D^{(0)}$;
4.11, if $E^{(r)} < E$, setting $D^{(r)} = D^{(r-1)} \cup (x^{(r)}, y^{(r)})$, $E = E^{(r)}$ and r = r + 1, and jumping to step 4.3; otherwise, jumping to step 4.12;
4.12, setting v = v + 1 and jumping to step 4.8;
4.13, setting q = q + 1 and jumping to step 4.6;
4.14, outputting the soil virtual sample set containing R soil virtual samples;
an inversion model generation module, configured to train a BP neural network heavy metal content inversion regression model based on the mixed sample set formed by the measured sample set and the soil virtual sample set, with the heavy metal spectral characteristic bands as input and the corresponding heavy metal content as output.
8. A computer readable storage medium, characterized in that the storage medium comprises stored program instructions adapted to be loaded by a processor and to execute the soil heavy metal content inversion model generation method according to any one of claims 1 to 6.
9. A soil heavy metal content inversion method, characterized by comprising the following steps:
acquiring indoor spectral data of a soil sample to be tested;
preprocessing the indoor spectral data of the soil sample to be tested and extracting the heavy metal spectral characteristic bands by a competitive self-adaptive re-weighting method;
taking the extracted heavy metal spectral characteristic bands as the input of a BP neural network heavy metal content inversion regression model and outputting the corresponding heavy metal content, wherein the BP neural network heavy metal content inversion regression model is generated by the method of any one of claims 1 to 6.
CN201911265449.XA 2019-12-11 2019-12-11 Soil heavy metal content inversion model generation method, system and inversion method Active CN110991064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911265449.XA CN110991064B (en) 2019-12-11 2019-12-11 Soil heavy metal content inversion model generation method, system and inversion method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911265449.XA CN110991064B (en) 2019-12-11 2019-12-11 Soil heavy metal content inversion model generation method, system and inversion method

Publications (2)

Publication Number Publication Date
CN110991064A CN110991064A (en) 2020-04-10
CN110991064B true CN110991064B (en) 2021-07-20

Family

ID=70092241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911265449.XA Active CN110991064B (en) 2019-12-11 2019-12-11 Soil heavy metal content inversion model generation method, system and inversion method

Country Status (1)

Country Link
CN (1) CN110991064B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112525829A (en) * 2020-04-30 2021-03-19 中国科学院地球化学研究所 Heavy metal content detection equipment
CN111487211B (en) * 2020-05-11 2022-09-30 安徽理工大学 Incoherent broadband cavity enhanced absorption spectrum fitting waveband selection method
CN111579500A (en) * 2020-05-20 2020-08-25 湖南城市学院 Heavy metal content support vector machine regression method combining wave bands and ratios of indoor and outdoor spectrums
CN112485203A (en) * 2020-11-04 2021-03-12 天水师范学院 Hyperspectral imaging analysis-based heavy metal pollution analysis method
CN113076692B (en) * 2021-03-29 2021-09-28 中国农业科学院农业资源与农业区划研究所 Method for inverting nitrogen content of leaf
CN117874480A (en) * 2021-12-31 2024-04-12 三峡大学 ICO-BOSS algorithm-based soil heavy metal spectral feature extraction method
CN114384031A (en) * 2022-01-12 2022-04-22 广西壮族自治区地理信息测绘院 Satellite-air-ground hyperspectral remote sensing water body heavy metal pollution three-dimensional monitoring method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107870150A (en) * 2017-11-02 2018-04-03 北京师范大学 Soil parameters EO-1 hyperion inversion method based on falling zone heavy metal-polluted soil
CN110376139A (en) * 2019-08-05 2019-10-25 北京绿土科技有限公司 Soil organic matter content quantitative inversion method based on ground high-spectrum

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105046320A (en) * 2015-08-13 2015-11-11 中国人民解放军61599部队计算所 Virtual sample generation method
CN106767687B (en) * 2017-02-22 2019-05-28 河海大学 A method of utilizing remote sensing moisture measurement beach elevation
CN106918565B (en) * 2017-03-02 2019-08-30 中南大学 Heavy metal-polluted soil Cd content Inverse modeling and its spectral response characteristics wave band recognition methods based on indoor standard specimen bloom spectrum signature
CN107132190A (en) * 2017-04-21 2017-09-05 武汉大学 A kind of soil organism spectra inversion model calibration samples collection construction method
CN108152235B (en) * 2018-03-21 2020-09-22 中南大学 Heavy metal content inversion method combining soil indoor and outdoor spectra
CN108982406A (en) * 2018-07-06 2018-12-11 浙江大学 A kind of soil nitrogen near-infrared spectral characteristic band choosing method based on algorithm fusion

Also Published As

Publication number Publication date
CN110991064A (en) 2020-04-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant