CN110987853B - Method and device for predicting soil lead pollution degree based on terahertz spectrum - Google Patents

Publication number
CN110987853B
CN110987853B CN201911089940.1A CN201911089940A CN110987853B CN 110987853 B CN110987853 B CN 110987853B CN 201911089940 A CN201911089940 A CN 201911089940A CN 110987853 B CN110987853 B CN 110987853B
Authority
CN
China
Prior art keywords
soil
pollution degree
lead pollution
value
degree prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911089940.1A
Other languages
Chinese (zh)
Other versions
CN110987853A (en)
Inventor
李斌
李超
李银坤
王姝言
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Research Center for Information Technology in Agriculture
Original Assignee
Beijing Research Center for Information Technology in Agriculture
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Research Center for Information Technology in Agriculture
Priority to CN201911089940.1A
Publication of CN110987853A
Application granted
Publication of CN110987853B

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/3563 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light for analysing solids; Preparation of samples therefor
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/3581 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using far infrared light; using Terahertz radiation
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/24 Earth materials
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/24 Earth materials
    • G01N2033/245 Earth materials for agricultural purposes

Abstract

An embodiment of the invention discloses a method and a device for predicting the lead pollution degree of soil based on a terahertz spectrum. The method comprises: acquiring the pH value of the soil to be detected; acquiring terahertz spectrum data of the soil to be detected; selecting, according to the pH value of the soil to be detected, a feature extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected; and extracting feature data from the terahertz spectrum data of the soil to be detected using the selected feature extraction mode, and inputting the feature data into the selected lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected. According to the embodiment of the invention, a more accurate prediction of the soil lead pollution degree can be obtained.

Description

Method and device for predicting soil lead pollution degree based on terahertz spectrum
Technical Field
The invention relates to the technical field of soil detection, in particular to a method and a device for predicting the lead pollution degree of soil based on terahertz spectrum.
Background
Lead pollution is one of the major forms of heavy metal pollution in soil. Lead is biotoxic, non-degradable and accumulates in organisms. After entering soil, it is readily enriched in the surface layer in different chemical forms through dissolution, precipitation, complexation, adsorption and similar processes, and this surface layer is precisely where crop roots exchange nutrients with the soil. Lead can therefore enter the human body through the food chain, and once it accumulates to a certain concentration it can damage the visceral tissues, the nervous system, the bone hematopoietic system and other parts of the body. Detecting the degree of lead pollution in soil is therefore important for managing and controlling pollution risk on agricultural land and for guaranteeing the quality and safety of agricultural products.
Traditional laboratory determination of heavy metal concentrations in soil is tedious and time-consuming, requires hazardous chemical reagents, and produces experimental waste liquid that can cause secondary pollution of the environment. Traditional spectral detection techniques, such as inductively coupled plasma emission spectrometry and atomic absorption spectrometry, have high operating and maintenance costs when used to measure heavy metal concentrations in soil.
In recent years, with in-depth research and rapid development in optics, remote sensing and related disciplines, soil heavy metal detection methods based on spectral analysis, such as terahertz spectroscopy, have emerged. However, when terahertz spectroscopy is currently used to analyze the lead content of soil, the results are not accurate enough.
Disclosure of Invention
In view of the above problems with existing methods, embodiments of the invention provide a method and a device for predicting the lead pollution degree of soil based on terahertz spectrum.
In a first aspect, an embodiment of the present invention provides a method for predicting a lead pollution degree of soil based on a terahertz spectrum, including:
acquiring the pH value of soil to be detected;
acquiring terahertz spectrum data of soil to be detected;
selecting a characteristic extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected according to the pH value of the soil to be detected;
and extracting characteristic data of the terahertz spectrum data of the soil to be detected by using the characteristic extraction mode, and inputting the characteristic data into the lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected.
Further, the selecting, according to the pH value of the soil to be detected, a feature extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected specifically include:
inquiring a first data table according to the pH value of the soil to be detected, and acquiring an optimal characteristic extraction mode corresponding to the pH value and a corresponding optimal lead pollution degree prediction model;
taking the obtained optimal feature extraction mode and the optimal lead pollution degree prediction model as a feature extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected;
the first data table stores in advance the mapping between each pH value and its corresponding optimal feature extraction mode and optimal lead pollution degree prediction model;
the optimal feature extraction mode and the optimal lead pollution degree prediction model corresponding to a pH value are those for which, when features are extracted from the terahertz spectrum data of soil with that pH value using the feature extraction mode and input into the prediction model, the accuracy of the predicted lead pollution degree is highest.
Further, the method further comprises establishing the first data table, which specifically comprises the following steps:
performing feature extraction on the terahertz spectrum data of the sample soil at each pH value using principal component analysis, and inputting the feature extraction result into a lead pollution degree prediction model based on a support vector machine (SVM) to obtain a first lead pollution degree prediction result for the sample soil at each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil at each pH value using principal component analysis, and inputting the feature extraction result into a lead pollution degree prediction model based on a back-propagation neural network (BPNN) to obtain a second lead pollution degree prediction result for the sample soil at each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil at each pH value using the continuous projection method (more commonly known as the successive projections algorithm, SPA), and inputting the feature extraction result into an SVM-based lead pollution degree prediction model to obtain a third lead pollution degree prediction result for the sample soil at each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil at each pH value using the continuous projection method, and inputting the feature extraction result into a BPNN-based lead pollution degree prediction model to obtain a fourth lead pollution degree prediction result for the sample soil at each pH value;
for the sample soil at each pH value, selecting, from the first, second, third and fourth lead pollution degree prediction results, the feature extraction mode and the lead pollution degree prediction model that yield the most accurate prediction as the optimal feature extraction mode and the optimal lead pollution degree prediction model for that pH value;
and establishing the first data table from the mapping between each pH value and its corresponding optimal feature extraction mode and optimal lead pollution degree prediction model.
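The table-building steps above can be sketched as a selection of the best of the four combinations for each pH value. This is a minimal illustration only: the function name `build_first_data_table`, the labels "PCA" and "SPA", and all accuracy numbers are hypothetical placeholders chosen so the outcome matches the mapping stated in this document; they are not measurements from the source.

```python
from typing import Dict, Tuple

# (feature extraction mode, prediction model)
Combo = Tuple[str, str]


def build_first_data_table(
    accuracies: Dict[float, Dict[Combo, float]],
) -> Dict[float, Combo]:
    """Keep, for each pH value, the combination with the highest accuracy."""
    return {ph: max(results, key=results.get) for ph, results in accuracies.items()}


# PLACEHOLDER accuracies invented for illustration; NOT data from the source.
example_accuracies: Dict[float, Dict[Combo, float]] = {
    5.5: {("PCA", "SVM"): 0.70, ("PCA", "BPNN"): 0.60,
          ("SPA", "SVM"): 0.80, ("SPA", "BPNN"): 0.65},
    7.0: {("PCA", "SVM"): 0.72, ("PCA", "BPNN"): 0.61,
          ("SPA", "SVM"): 0.81, ("SPA", "BPNN"): 0.66},
    8.5: {("PCA", "SVM"): 0.85, ("PCA", "BPNN"): 0.62,
          ("SPA", "SVM"): 0.75, ("SPA", "BPNN"): 0.64},
}

first_data_table = build_first_data_table(example_accuracies)
```

In practice each accuracy would come from evaluating a trained model on held-out sample soil at that pH value; how the source performs that evaluation is not specified here.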
Further, the first data table stores the following mapping relationships:
when the pH value is in an alkaline pH value range, the corresponding optimal characteristic extraction mode and the optimal lead pollution degree prediction model are a principal component analysis method and an SVM-based lead pollution degree prediction model;
when the pH value is in a neutral pH value range, the corresponding optimal feature extraction mode and the optimal lead pollution degree prediction model are a continuous projection method and an SVM-based lead pollution degree prediction model;
when the pH value is in an acidic pH value range, the corresponding optimal feature extraction mode and the optimal lead pollution degree prediction model are a continuous projection method and an SVM-based lead pollution degree prediction model.
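The mapping above amounts to a small lookup keyed by pH interval. The sketch below is one possible encoding. The document does not give numeric boundaries for the acidic, neutral and alkaline intervals, so the thresholds `ACIDIC_UPPER` and `NEUTRAL_UPPER` are assumptions, as are the function names.

```python
from typing import Tuple

# ASSUMPTION: interval boundaries are not stated in the source; these
# thresholds are hypothetical and should be replaced by the real ones.
ACIDIC_UPPER = 6.5
NEUTRAL_UPPER = 7.5

# Mapping stated in the text: interval -> (feature extraction mode, model).
FIRST_DATA_TABLE = {
    "acidic": ("continuous projection method", "SVM"),
    "neutral": ("continuous projection method", "SVM"),
    "alkaline": ("principal component analysis", "SVM"),
}


def ph_interval(ph: float) -> str:
    """Classify a pH value into one of the three intervals."""
    if not 0.0 <= ph <= 14.0:
        raise ValueError(f"pH out of range: {ph}")
    if ph < ACIDIC_UPPER:
        return "acidic"
    if ph <= NEUTRAL_UPPER:
        return "neutral"
    return "alkaline"


def select_mode_and_model(ph: float) -> Tuple[str, str]:
    """Query the first data table with the pH of the soil to be detected."""
    return FIRST_DATA_TABLE[ph_interval(ph)]
```

With these assumed thresholds, the three sample pH values mentioned later in the description (5.5, 7.0 and 8.5) fall into the acidic, neutral and alkaline intervals respectively.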
Further, the extracting the characteristic data of the terahertz spectrum data of the soil to be detected by using the characteristic extraction mode, and inputting the characteristic data into the lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected specifically comprises:
if the pH value interval of the soil to be detected is an alkaline pH value interval, extracting characteristic data of terahertz spectrum data of the soil to be detected by using a principal component analysis method, and inputting the characteristic data into an SVM-based lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected;
if the pH value interval of the soil to be detected is a neutral pH value interval, extracting characteristic data of terahertz spectrum data of the soil to be detected by using a continuous projection method, and inputting the characteristic data into a lead pollution degree prediction model based on an SVM (support vector machine) to obtain the lead pollution degree of the soil to be detected;
if the pH value interval of the soil to be detected is an acidic pH value interval, extracting characteristic data of terahertz spectrum data of the soil to be detected by using a continuous projection method, and inputting the characteristic data into a lead pollution degree prediction model based on an SVM (support vector machine) to obtain the lead pollution degree of the soil to be detected.
In a second aspect, an embodiment of the present invention further provides a device for predicting a lead contamination degree of soil based on a terahertz spectrum, including:
the first acquisition module is used for acquiring the pH value of the soil to be detected;
the second acquisition module is used for acquiring terahertz spectrum data of the soil to be detected;
the selection module is used for selecting a characteristic extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected according to the pH value of the soil to be detected;
and the prediction module is used for extracting the characteristic data of the terahertz spectrum data of the soil to be detected by using the characteristic extraction mode, and inputting the characteristic data into the lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected.
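One way to picture how the four modules cooperate is a small class whose fields are the two acquisition modules and the selection module, with the prediction module as a method. This is a structural sketch under assumptions: the class name `LeadPollutionPredictor`, the type aliases, and the stub callables in the usage lines are all hypothetical and stand in for real instruments and trained models.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Spectrum = Sequence[float]
FeatureExtractor = Callable[[Spectrum], Sequence[float]]
Model = Callable[[Sequence[float]], str]


@dataclass
class LeadPollutionPredictor:
    # First acquisition module: returns the pH of the soil to be detected.
    acquire_ph: Callable[[], float]
    # Second acquisition module: returns its terahertz spectrum data.
    acquire_spectrum: Callable[[], Spectrum]
    # Selection module: maps pH to (feature extractor, prediction model).
    select: Callable[[float], Tuple[FeatureExtractor, Model]]

    def predict(self) -> str:
        """Prediction module: extract features, then run the chosen model."""
        ph = self.acquire_ph()
        spectrum = self.acquire_spectrum()
        extract, model = self.select(ph)
        return model(extract(spectrum))


# Toy stubs for illustration only; they are not real extractors or models.
device = LeadPollutionPredictor(
    acquire_ph=lambda: 8.5,
    acquire_spectrum=lambda: [0.1, 0.4, 0.9],
    select=lambda ph: (
        lambda s: [max(s)],
        lambda f: "high" if f[0] > 0.5 else "low",
    ),
)
print(device.predict())  # prints: high
```

Passing the modules in as callables keeps the selection logic independent of any particular spectrometer or model library.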
Further, the selection module is specifically configured to:
inquiring a first data table according to the pH value of the soil to be detected, and acquiring an optimal characteristic extraction mode corresponding to the pH value and a corresponding optimal lead pollution degree prediction model;
taking the obtained optimal feature extraction mode and the optimal lead pollution degree prediction model as a feature extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected;
the first data table stores in advance the mapping between each pH value and its corresponding optimal feature extraction mode and optimal lead pollution degree prediction model;
the optimal feature extraction mode and the optimal lead pollution degree prediction model corresponding to a pH value are those for which, when features are extracted from the terahertz spectrum data of soil with that pH value using the feature extraction mode and input into the prediction model, the accuracy of the predicted lead pollution degree is highest.
Further, the device further comprises an establishment module, which is used for:
performing feature extraction on the terahertz spectrum data of the sample soil at each pH value using principal component analysis, and inputting the feature extraction result into a lead pollution degree prediction model based on a support vector machine (SVM) to obtain a first lead pollution degree prediction result for the sample soil at each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil at each pH value using principal component analysis, and inputting the feature extraction result into a lead pollution degree prediction model based on a back-propagation neural network (BPNN) to obtain a second lead pollution degree prediction result for the sample soil at each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil at each pH value using the continuous projection method, and inputting the feature extraction result into an SVM-based lead pollution degree prediction model to obtain a third lead pollution degree prediction result for the sample soil at each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil at each pH value using the continuous projection method, and inputting the feature extraction result into a BPNN-based lead pollution degree prediction model to obtain a fourth lead pollution degree prediction result for the sample soil at each pH value;
for the sample soil at each pH value, selecting, from the first, second, third and fourth lead pollution degree prediction results, the feature extraction mode and the lead pollution degree prediction model that yield the most accurate prediction as the optimal feature extraction mode and the optimal lead pollution degree prediction model for that pH value;
and establishing the first data table from the mapping between each pH value and its corresponding optimal feature extraction mode and optimal lead pollution degree prediction model.
In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method for predicting the soil lead pollution level based on the terahertz spectrum according to the first aspect is implemented.
In a fourth aspect, the embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for predicting the soil lead pollution degree based on the terahertz spectrum according to the first aspect is implemented.
According to the technical scheme, the method and the device for predicting the lead pollution degree of the soil based on the terahertz spectrum select a characteristic extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected according to the pH value of the soil to be detected, then extract the characteristic data of the terahertz spectrum data of the soil to be detected by using the characteristic extraction mode, and input the characteristic data into the lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected. It should be noted that, since the pH value may affect the chemical binding state of lead in the soil, and further affect the terahertz spectrum of the soil, when the soil lead pollution degree is predicted according to the terahertz spectrum data, the pH value of the soil needs to be considered, that is, when the soil lead pollution degree is predicted, corresponding prediction is performed according to different pH conditions, so as to improve the accuracy of the prediction. In addition, because the characteristic extraction mode and the lead pollution degree prediction model which are applicable to the soil terahertz spectrum data corresponding to different pH values are different, the characteristic extraction mode and the lead pollution degree prediction model which correspond to the soil pH values are further selected, and a more accurate soil lead pollution degree prediction result can be obtained.
Drawings
To illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for predicting soil lead pollution degree based on terahertz spectrum according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a soil lead pollution degree prediction device based on terahertz spectroscopy according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Before the embodiments of the invention are described, the concept behind them is explained. Based on independent research and on measurements of the chemical binding states of lead in lead-containing soil, the inventors found that, after entering the soil, lead exists mainly in the carbonate-bound state and the iron-manganese oxide-bound state, with relatively little in the other binding states, and that the content of lead in each binding state changes with pH. Table 1 shows how the content of lead in each binding state varies with pH. As the pH value increases, the content of exchangeable lead decreases sharply, while the content of lead in the other binding states increases gradually; the increase is pronounced for carbonate-bound and iron-manganese oxide-bound lead, but slight for organic-bound and residual lead. The content of lead in the different binding states in soil is therefore strongly influenced by pH.
Table 1. Content of lead in each chemical binding state in lead-containing soil
[Table 1 appears as an image in the original publication: Figure BDA0002266544240000081]
The inventors also collected terahertz spectrum data from the soil samples and performed spectral analysis and cluster analysis. They found that pH has a large influence on the terahertz spectral curve of a sample and that this influence grows as the lead concentration increases, probably because a change in pH alters the content of lead in the different binding states in the soil. The degree of lead pollution in soil should therefore be identified separately for different pH conditions. Based on this theoretical research, the embodiment of the invention provides a method and a device for predicting the soil lead pollution degree based on a terahertz spectrum. In addition, because the feature extraction mode and the lead pollution degree prediction model suited to soil terahertz spectrum data differ from one pH value to another, selecting the feature extraction mode and the prediction model that correspond to the soil pH value yields a more accurate prediction of the soil lead pollution degree. Specific examples are described in detail below.
Fig. 1 shows a flowchart of a method for predicting a degree of soil lead pollution based on a terahertz spectrum according to an embodiment of the present invention, and as shown in fig. 1, the method for predicting a degree of soil lead pollution based on a terahertz spectrum according to an embodiment of the present invention specifically includes the following steps:
Step 101: acquiring the pH value of the soil to be detected.
In this step, a general method for measuring the pH value of the soil may be adopted to obtain the pH value of the soil to be measured, and this embodiment does not limit the specific method for obtaining the pH value.
Step 102: acquiring terahertz spectrum data of the soil to be detected.
In this step, the soil to be detected is irradiated with terahertz radiation to obtain its terahertz spectrum data. For example, the TERA K15 terahertz time-domain spectroscopy system from Menlo Systems (Germany) can be used, with a femtosecond fiber laser having a center wavelength of 1560 nm, a repetition frequency of 100 MHz and a pulse width of less than 90 fs. The signal dynamic range of this system exceeds 70 dB, its spectral coverage is wide (0.1-6.4 THz), and its spectral resolution can reach 6 GHz. In addition, because water vapor in the air strongly absorbs terahertz waves, the terahertz generation and detection devices are preferably placed in a sealed transparent acrylic box that is continuously purged with dry nitrogen, keeping the relative humidity below 5%. The test temperature is kept at around 23 ℃.
In this step, the specific spectral data acquisition and preprocessing process is as follows:
The sample tablets are mounted on a sample holder and scanned in the THz-TDS system to collect time-domain spectral data from 0 to 80 ps. Each tablet is measured at 3 different locations, and the average of the 3 measurements is taken as the spectral data of the sample. The time-domain spectral data are then processed with the Teralyzer software on a PC to obtain the corresponding absorption spectrum data. Because the signal-to-noise ratio of the terahertz spectrum of the soil is low at the low-frequency end and in the high-frequency band, this embodiment uses only the data in the 0.075-1.5 THz band for subsequent analysis.
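The averaging of the 3 measurements and the restriction to the 0.075-1.5 THz band can be sketched in a few lines. The function names are hypothetical, and the conversion from time-domain data to absorption spectra (done with the Teralyzer software in this embodiment) is deliberately left out.

```python
from typing import List, Sequence, Tuple


def average_measurements(measurements: Sequence[Sequence[float]]) -> List[float]:
    """Point-by-point mean of repeated scans of the same sample."""
    if not measurements:
        raise ValueError("at least one measurement is required")
    n = len(measurements)
    return [sum(points) / n for points in zip(*measurements)]


def crop_band(
    freqs_thz: Sequence[float],
    absorption: Sequence[float],
    low_thz: float = 0.075,
    high_thz: float = 1.5,
) -> Tuple[List[float], List[float]]:
    """Keep only the points whose frequency lies within [low_thz, high_thz]."""
    kept = [(f, a) for f, a in zip(freqs_thz, absorption) if low_thz <= f <= high_thz]
    return [f for f, _ in kept], [a for _, a in kept]


# Toy values for illustration only.
mean_trace = average_measurements([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
band_freqs, band_abs = crop_band([0.05, 0.075, 1.0, 1.5, 2.0],
                                 [1.0, 2.0, 3.0, 4.0, 5.0])
```

Whether the band limits are treated as inclusive is an assumption of this sketch; the text only gives the range.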
It should be noted that, after the spectral data are obtained, they may be preprocessed using multiplicative scatter correction (MSC), baseline correction and Savitzky-Golay smoothing to reduce the influence of noise, sample particle size and optical path variation on the spectrum. The absorption spectrum curve can be baseline-corrected using the baseline inclination method and smoothed using 5-point cubic-polynomial Savitzky-Golay smoothing.
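Two of the preprocessing steps named above can be sketched with the standard library alone. The MSC function uses the common formulation (regress each spectrum on the mean spectrum, then remove the fitted offset and slope), and the smoothing function uses the standard 5-point cubic Savitzky-Golay weights (-3, 12, 17, 12, -3)/35. Leaving the two points at each edge unchanged is an assumption of this sketch, and baseline correction is not covered because the text does not detail the baseline inclination method.

```python
from typing import List, Sequence

SG_WEIGHTS = (-3, 12, 17, 12, -3)  # 5-point cubic Savitzky-Golay, divisor 35


def msc(spectra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Multiplicative scatter correction against the mean spectrum."""
    n = len(spectra)
    ref = [sum(col) / n for col in zip(*spectra)]
    ref_mean = sum(ref) / len(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for x in spectra:
        x_mean = sum(x) / len(x)
        # Least-squares fit x ~ intercept + slope * ref.
        slope = sum((r - ref_mean) * (v - x_mean) for r, v in zip(ref, x)) / sxx
        intercept = x_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in x])
    return corrected


def savgol_5pt_cubic(y: Sequence[float]) -> List[float]:
    """Smooth interior points; the two points at each edge are left as-is."""
    out = [float(v) for v in y]
    for i in range(2, len(y) - 2):
        out[i] = sum(w * y[i + k - 2] for k, w in enumerate(SG_WEIGHTS)) / 35
    return out


# Toy data: two spectra that differ only by a scale factor.
corrected = msc([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
smoothed = savgol_5pt_cubic([float(i ** 3) for i in range(7)])
```

After MSC both toy spectra collapse onto the mean spectrum, and the smoother leaves an exact cubic unchanged, which is the defining property of a cubic Savitzky-Golay filter.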
Step 103: selecting, according to the pH value of the soil to be detected, a feature extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected.
In this step, because the feature extraction mode and the lead pollution degree prediction model suited to soil terahertz spectrum data differ from one pH value to another, selecting the feature extraction mode and the prediction model that correspond to the soil pH value yields a more accurate prediction of the soil lead pollution degree.
Step 104: extracting feature data from the terahertz spectrum data of the soil to be detected using the selected feature extraction mode, and inputting the feature data into the selected lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected.
In the step, the characteristic data of the terahertz spectrum data of the soil to be detected is extracted by using a characteristic extraction mode matched with the pH value of the soil to be detected, and the characteristic data is input into a lead pollution degree prediction model matched with the pH value of the soil to be detected, so that the lead pollution degree of the soil to be detected is obtained.
In this embodiment, it should be noted that a terahertz (THz) wave is an electromagnetic wave with a frequency in the range of 0.1 to 10 THz (a wavelength of 3000 to 30 μm), lying between the infrared and microwave regions. Compared with near-infrared spectroscopy and hyperspectral techniques, the most important and distinctive feature of THz spectroscopy is that it can simultaneously observe the low-frequency intermolecular and intramolecular modes of a chemical substance. In a preliminary study of lead content in soil, the inventors found a certain correspondence between the lead content of a soil sample and its terahertz absorption spectrum, and concluded that determining the heavy metal content of soil by terahertz spectroscopy is feasible. After entering the soil, heavy metals generally exist in 5 forms: the exchangeable state (EXE), the carbonate-bound state (CAB), the iron-manganese oxide-bound state (FMO), the organic-bound state (OM) and the residual state (RES). Soil spectral information is correlated to some extent with the composition of heavy metal forms, and pH is one of the important factors influencing the distribution of those forms. Therefore, to detect the lead pollution degree in soil more accurately, the influence of pH needs to be taken into account. Based on this theory, the present embodiment combines terahertz spectroscopy with chemometric analysis to explore the classification and identification of lead pollution degree in soils with different pH values.
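The frequency and wavelength ranges quoted for terahertz waves are related by λ = c / f. The short check below, with a hypothetical helper name, shows that 0.1 THz and 10 THz correspond to roughly 3000 μm and 30 μm.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def wavelength_um(freq_thz: float) -> float:
    """Free-space wavelength in micrometres for a frequency in terahertz."""
    if freq_thz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_LIGHT_M_PER_S / (freq_thz * 1e12) * 1e6


low_edge_um = wavelength_um(0.1)    # about 2998 micrometres
high_edge_um = wavelength_um(10.0)  # about 30 micrometres
```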
The measured contents of lead in each chemical binding state in the lead-containing soil are shown in Table 1. It can be seen that, after entering the soil, lead exists mainly in the carbonate-bound state and the iron-manganese oxide-bound state, that the contents of the other binding states are relatively low, and that the content of lead in each binding state changes with pH. As can be seen from Table 1, as the pH value increases, the content of exchangeable-state lead decreases sharply, while the contents of lead in the other binding states increase gradually; the increase is more obvious for carbonate-bound and iron-manganese oxide-bound lead, and not obvious for organic-bound and residual-state lead. It can be seen that the content of lead in the different binding states in soil is strongly influenced by pH.
The following describes the spectral characteristic analysis and cluster analysis of samples with different pH values at the same concentration. The original absorption spectra of the soil samples with lead contents of 200 mg/kg, 600 mg/kg and 1000 mg/kg at 3 different pH values were pretreated by multiplicative scatter correction (MSC), baseline correction, Savitzky-Golay smoothing and the like, which suppresses invalid noise and enhances the discriminative information among the samples. Although no obvious absorption peak appears, the absorption characteristics of the different samples show an overall trend in which the absorption coefficient gradually increases with increasing pH. At a lead content of 200 mg/kg, the spectral curves for the different pH values overlap one another without significant differences. As the concentration increases, at 600 mg/kg the spectral curve at pH 8.5 is already clearly separated from those at pH 5.5 and pH 7.0, and at 1000 mg/kg the spectral curves at the three pH values show a clear layered separation. In order to study the differences between samples with the same concentration and different pH values, principal component analysis is performed on the spectral data and cluster analysis is performed on the first three principal components. The clustering result shows that, when the lead content is 200 mg/kg, the samples with different pH values overlap in space and are not obviously separated. The spatial separation of the samples gradually increases with increasing concentration, and the three groups of samples are already clearly clustered when the lead content is 1000 mg/kg. The results of the cluster analysis are consistent with the characteristics of the spectral curves.
From this, it can be concluded that pH has a large influence on the terahertz spectral curve of a sample and that the influence grows as the lead concentration increases, probably because a change in pH changes the content of lead in each chemical binding state in the soil. Therefore, the lead pollution level of the soil should be identified separately for different pH conditions.
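As an illustration of the MSC pretreatment mentioned above, the following is a minimal sketch of multiplicative scatter correction against the mean spectrum, written with the Python standard library only. It assumes that each spectrum is a list of absorption values sampled at the same frequencies; the function name `msc` and the choice of the mean spectrum as reference are assumptions, since the embodiment does not specify how MSC was carried out.

```python
from statistics import fmean


def msc(spectra):
    """Multiplicative scatter correction using the mean spectrum as reference.

    Each spectrum s is fitted as s ~ intercept + slope * reference by ordinary
    least squares and then corrected to (s - intercept) / slope.
    """
    n_points = len(spectra[0])
    reference = [fmean(s[i] for s in spectra) for i in range(n_points)]
    ref_mean = fmean(reference)
    ref_var = sum((r - ref_mean) ** 2 for r in reference)

    corrected = []
    for s in spectra:
        s_mean = fmean(s)
        # Least-squares slope and intercept of s regressed on the reference.
        slope = sum((r - ref_mean) * (x - s_mean)
                    for r, x in zip(reference, s)) / ref_var
        intercept = s_mean - slope * ref_mean
        # A slope of zero (flat spectrum) would need special handling; omitted here.
        corrected.append([(x - intercept) / slope for x in s])
    return corrected
```

Spectra that differ from one another only by an offset and a scale factor are mapped onto the same curve, which is the scatter effect the correction is meant to remove before PCA and clustering.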
Based on the theoretical analysis, the characteristic extraction mode matched with the terahertz spectrum data of the soil to be detected and the lead pollution degree prediction model matched with the soil to be detected are selected according to the pH value of the soil to be detected, then the characteristic data of the terahertz spectrum data of the soil to be detected are extracted by using the characteristic extraction mode and input into the lead pollution degree prediction model, and therefore a more accurate soil lead pollution degree prediction result can be obtained. It should be noted that, since the pH value may affect the chemical binding state of lead in the soil, and further affect the terahertz spectrum of the soil, when the soil lead pollution degree is predicted according to the terahertz spectrum data, the pH value of the soil needs to be considered, that is, when the soil lead pollution degree is predicted, corresponding prediction is performed according to different pH conditions, so as to improve the accuracy of the prediction. In addition, because the soil terahertz spectrum data corresponding to different pH values have different characteristics, and the soil terahertz spectrum data corresponding to different pH values are different in applicable characteristic extraction mode and applicable lead pollution degree prediction model, the characteristic extraction mode corresponding to the soil pH value and the corresponding lead pollution degree prediction model are further selected, and a more accurate soil lead pollution degree prediction result can be obtained.
Further, based on the content of the foregoing embodiment, in this embodiment, the step 103 selects, according to the pH value of the soil to be detected, a feature extraction manner matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected, and specifically includes:
inquiring a first data table according to the pH value of the soil to be detected, and acquiring an optimal characteristic extraction mode corresponding to the pH value and a corresponding optimal lead pollution degree prediction model;
taking the obtained optimal feature extraction mode and the optimal lead pollution degree prediction model as a feature extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected;
the first data table stores in advance the mapping relationship between each pH value and its corresponding optimal feature extraction mode and optimal lead pollution degree prediction model;
the optimal feature extraction mode and the optimal lead pollution degree prediction model corresponding to each pH value are the combination for which, when the terahertz spectrum data of soil with that pH value is subjected to spectral feature extraction in that feature extraction mode and the result is input into that prediction model, the accuracy of the predicted lead pollution degree is the highest.
In this embodiment, a first data table in which each pH value and a mapping relationship between an optimal feature extraction manner and an optimal lead pollution degree prediction model corresponding to each pH value are stored is established in advance, then the first data table is queried according to the pH value of the soil to be detected, the optimal feature extraction manner corresponding to the pH value and the corresponding optimal lead pollution degree prediction model are obtained, and the obtained optimal feature extraction manner and the obtained optimal lead pollution degree prediction model are used as a feature extraction manner matched with terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected.
In this embodiment, a first data table shown in table 2 may be established in advance, and a mapping relationship between the acidic pH range, the neutral pH range, the alkaline pH range, the corresponding optimal feature extraction manner, and the optimal lead pollution degree prediction model is established in the first data table shown in table 2.
In this embodiment, it should be noted that, when the first data table shown in Table 2 is queried according to the pH value of the soil to be detected to obtain the corresponding optimal feature extraction mode and optimal lead pollution degree prediction model, the pH interval to which the pH value of the soil belongs may be determined first, and the optimal feature extraction mode and optimal lead pollution degree prediction model corresponding to that pH interval are then used as those of the soil to be detected. For example, assuming that the pH value of the soil to be detected is 7.4, it can be seen from Table 2 that this value falls in the neutral pH interval, that is, the soil is neutral soil, so the optimal feature extraction mode for this soil is the continuous projection method and the optimal lead pollution degree prediction model is the SVM-based prediction model.
Table 2 First data table
Figure BDA0002266544240000131
Figure BDA0002266544240000141
Further, based on the content of the foregoing embodiment, in this embodiment, the method further includes establishing the first data table, which specifically comprises the following steps:
performing feature extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a principal component analysis method, inputting a feature extraction result into an SVM (support vector machine) -based lead pollution degree prediction model, and obtaining a first lead pollution degree prediction result of the sample soil with each pH value;
performing characteristic extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a principal component analysis method, inputting the characteristic extraction result into a BPNN-based lead pollution degree prediction model, and obtaining a second lead pollution degree prediction result of the sample soil with each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a continuous projection method, inputting the feature extraction result into an SVM (support vector machine) -based lead pollution degree prediction model, and obtaining a third lead pollution degree prediction result of the sample soil with each pH value;
carrying out feature extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a continuous projection method, inputting the feature extraction result into a BPNN-based lead pollution degree prediction model, and obtaining a fourth lead pollution degree prediction result of the sample soil with each pH value;
respectively selecting a characteristic extraction mode corresponding to a prediction result with the highest lead pollution degree accuracy and a corresponding lead pollution degree prediction model from a first lead pollution degree prediction result, a second lead pollution degree prediction result, a third lead pollution degree prediction result and a fourth lead pollution degree prediction result as an optimal characteristic extraction mode and an optimal lead pollution degree prediction model corresponding to the corresponding pH value aiming at the sample soil with each pH value;
and establishing a first data table according to the pH values and the mapping relation between the optimal characteristic extraction mode corresponding to each pH value and the optimal lead pollution degree prediction model.
In this embodiment, the processing procedures of spectral feature analysis and cluster analysis of samples with different degrees of contamination at different pH are first described:
From the absorption spectra of the lightly, moderately and heavily polluted samples at pH 5.5, pH 7.0 and pH 8.5, it can be seen that the spectral absorption curves are very similar in overall trend and differ only in amplitude, with the absorption coefficient gradually increasing as the concentration increases. At pH 5.5, the spectral curves of the three samples with different pollution degrees overlap one another, and the discriminative information is weak. As the pH increases, the discriminative information is further enhanced at pH 7.0, and the spectral curves of the three pollution degrees show a clear layered separation at pH 8.5. In order to study the ability of the absorption spectrum to identify the pollution degree, principal component analysis is performed on the spectral data, and the first three principal components are then used for cluster analysis. As can be seen from the clustering result, the samples at pH 5.5 overlap in space, and the heavily polluted samples are more widely dispersed. With increasing pH, the spatial separation between different classes of samples gradually increases, and the dispersion within each class gradually decreases. At pH 8.5, the three classes of samples are clearly clustered, which indicates that the absorption spectrum has a certain discriminating ability. However, the samples at pH 5.5 and pH 7.0 cluster poorly and require further chemometric analysis. Therefore, a classification and discrimination model is established by combining chemometric methods with the absorption spectrum.
In the present embodiment, the chemometric analysis methods include principal component analysis (PCA) and the continuous projection method, also known as the successive projections algorithm (SPA).
In this embodiment, it should be noted that the principal component analysis method is an effective method for reducing data dimensionality and removing overlapping information, and converts a large data set into a small number of uncorrelated variables (called principal components, PCs) without losing basic information. Generally, when the cumulative variance contribution rate of several principal components reaches more than 85%, the principal components can be used to briefly summarize the characteristics of the original data, and meanwhile, the score scatter diagram of the principal components can depict the difference between samples. In this context, principal component analysis can provide very important information for sample classification, and the principal component analysis method is used for dimensionality reduction of spectral data, so that the data volume of modeling can be reduced. And the difference between samples can be explored by carrying out cluster analysis on the data subjected to dimensionality reduction.
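The 85% cumulative variance contribution rule described above can be sketched as follows. The function takes the variances (eigenvalues) of the principal components and returns the smallest number of components whose cumulative contribution reaches the threshold. The function name and the eigenvalues in the usage note are hypothetical and are not taken from the experiments.

```python
def components_for_threshold(eigenvalues, threshold=0.85):
    """Return (k, cumulative_ratio) for the smallest k reaching the threshold.

    eigenvalues: variances explained by each principal component.
    threshold:   required cumulative variance contribution rate (0..1).
    """
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    ordered = sorted(eigenvalues, reverse=True)  # largest component first
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k, running / total
    return len(ordered), running / total
```

For hypothetical eigenvalues `[7.0, 2.0, 0.6, 0.4]`, the first two components already account for about 90% of the variance, so `components_for_threshold` returns `k = 2` at the 85% threshold, while a 99% threshold would require all four.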
In this embodiment, it should be noted that the continuous projection method (successive projections algorithm, SPA) is a forward variable selection algorithm that minimizes collinearity in the vector space, and it is widely applied in spectral analysis. SPA selects the relevant wavelengths by analyzing the influence of the independent variables on the dependent variable, thereby eliminating redundant information in the original spectral matrix. Characteristic wavelengths with little collinearity and little redundant information help to reduce the amount of calculation, simplify the model and improve its robustness. In this embodiment, characteristic frequency selection is performed on the terahertz spectrum data, and the characteristic frequencies that perform well are selected for the modeling analysis of sample classification.
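The projection step at the core of SPA can be sketched as below, in plain Python. This is a minimal illustration under stated assumptions, not the procedure used in the experiments: the starting column and the number of variables are passed in by the caller, whereas a complete SPA would choose both by validation error. The names `spa_select` and `_dot` are illustrative.

```python
def _dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def spa_select(X, n_select, start=0):
    """Select n_select column indices of X with minimal collinearity.

    X is a list of rows (samples x frequencies). Starting from column
    `start`, every remaining column is projected onto the subspace
    orthogonal to the last selected column, and the column with the
    largest remaining norm is selected next.
    """
    n_vars = len(X[0])
    if not 1 <= n_select <= n_vars:
        raise ValueError("n_select must be between 1 and the number of columns")
    # Column-wise working copy; columns are overwritten by their projections.
    columns = [[row[j] for row in X] for j in range(n_vars)]
    selected = [start]
    while len(selected) < n_select:
        ref = columns[selected[-1]]
        ref_norm_sq = _dot(ref, ref)
        best_j, best_norm_sq = None, -1.0
        for j in range(n_vars):
            if j in selected:
                continue
            if ref_norm_sq > 0.0:
                # Remove the component of column j that lies along ref.
                coef = _dot(columns[j], ref) / ref_norm_sq
                columns[j] = [a - coef * b for a, b in zip(columns[j], ref)]
            norm_sq = _dot(columns[j], columns[j])
            if norm_sq > best_norm_sq:
                best_j, best_norm_sq = j, norm_sq
        selected.append(best_j)
    return selected
```

A column that is an exact multiple of an already selected column projects to zero and is therefore never preferred, which is how the algorithm avoids collinear frequencies.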
In this embodiment, it should be noted that a support vector machine (SVM) is a supervised learning method for analyzing data and recognizing patterns. It improves the generalization ability of the learning machine by seeking structural risk minimization, that is, by jointly minimizing the empirical risk and the confidence bound. Compared with other analysis methods, the SVM can achieve good statistical performance with few training variables or samples. The choice of kernel function has a marked influence on the performance of an SVM model, and the penalty factor c and the kernel function parameter g are key factors affecting its learning, prediction and generalization ability. This embodiment uses the commonly used radial basis function (RBF) as the kernel function and determines c and g by a grid search. In modeling, the labels of the three pollution degree classes (mild, moderate, severe) are defined as 1, 2 and 3, respectively. The input data are the spectral data after PCA or SPA. Multi-class classification is implemented with the LIBSVM toolbox developed by Professor Chih-Jen Lin of National Taiwan University.
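The RBF kernel and the grid search over c and g can be sketched as follows. This is only an illustration of the idea: the exponent ranges (powers of 2 from -5 to 5) and the `score` callback are assumptions, and in practice the score would be the cross-validation accuracy reported by an SVM library such as LIBSVM rather than the toy function used in the usage note.

```python
import math
from itertools import product


def rbf_kernel(x, y, g):
    """RBF kernel K(x, y) = exp(-g * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * sq_dist)


def grid_search(score, c_exponents=range(-5, 6), g_exponents=range(-5, 6)):
    """Return (best_score, c, g) over a grid of c = 2**i and g = 2**j.

    score: hypothetical callable score(c, g) -> float, higher is better,
           for example the cross-validation accuracy of an SVM trained
           with those parameters.
    """
    best = None
    for ce, ge in product(c_exponents, g_exponents):
        c, g = 2.0 ** ce, 2.0 ** ge
        value = score(c, g)
        if best is None or value > best[0]:
            best = (value, c, g)
    return best
```

With a toy score such as `lambda c, g: -(math.log2(c) - 2) ** 2 - (math.log2(g) + 3) ** 2`, the search returns c = 4 and g = 0.125, the grid point at which that function peaks.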
In this embodiment, it should be noted that the BP neural network (BPNN) is a nonlinear multi-layer feedforward neural network based on the error back-propagation algorithm, and comprises an input layer, one or more hidden layers, and an output layer. Information is propagated forward in one direction through the nodes of the successive layers, while errors are propagated backward; the strength of the connections between neurons of adjacent layers is controlled by connection weights, and the weights are adjusted during network training according to the cross-validation mean square error of the network. In this study, parameters such as the input layer nodes, output layer nodes, target error, maximum number of iterations and learning rate are initially set to 3, 1 × 10⁻³, 200 and 0.01, respectively, and the number of hidden layer nodes is selected according to the empirical formula

$$m = \sqrt{n + l} + \alpha$$

with the optimal number of hidden layer nodes determined according to the network training effect. The network is trained using the Bayesian regularization method; the logarithmic sigmoid transfer function logsig is selected as the hidden layer transfer function, and the linear transfer function purelin is used as the output layer transfer function. Network training stops when the target error or the maximum number of iterations is reached. Here m is the number of hidden layer nodes, n is the number of input layer nodes, l is the number of output layer nodes, and α is a constant between 1 and 10.
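The candidate hidden layer sizes implied by the symbol definitions above can be enumerated with a few lines of Python. The sketch assumes the commonly used empirical rule m = sqrt(n + l) + α with integer α from 1 to 10, and rounds to the nearest integer; both the rounding and the example value l = 1 are assumptions, since the embodiment does not state them explicitly.

```python
import math


def hidden_node_candidates(n_inputs, n_outputs, alphas=range(1, 11)):
    """Candidate hidden layer sizes from m = sqrt(n + l) + alpha.

    n_inputs:  number of input layer nodes (n)
    n_outputs: number of output layer nodes (l)
    alphas:    integer values tried for the constant alpha (assumed 1..10)
    """
    base = math.sqrt(n_inputs + n_outputs)
    # Round to the nearest integer and drop duplicates.
    return sorted({round(base + a) for a in alphas})
```

For n = 3 and an assumed l = 1, the candidates run from 3 to 12 hidden nodes; each candidate would then be trained and the one with the best training effect kept.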
In this embodiment, when the first data table is established, four sets of analyses are performed to obtain the prediction accuracy of the sample soil with different pH values under the following four sets of conditions: a principal component analysis method PCA + lead pollution degree prediction model based on SVM; PCA + BPNN-based lead pollution degree prediction model; a continuous projection method SPA + lead pollution degree prediction model based on SVM; a continuous projection method SPA + lead pollution degree prediction model based on BPNN; and then selecting a characteristic extraction mode corresponding to the prediction result with the highest lead pollution degree accuracy and a corresponding lead pollution degree prediction model as an optimal characteristic extraction mode and an optimal lead pollution degree prediction model corresponding to the corresponding pH value.
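The selection step described above amounts to taking, for each pH value, the combination with the highest prediction accuracy. A minimal sketch follows; the function names are illustrative, and the usage note uses only the two prediction-set accuracies for pH 5.5 that are quoted in this description (83.33% for PCA with BPNN and 90% for SPA with SVM), because the remaining entries appear only in Tables 3 and 4.

```python
def best_combination(accuracy_by_combo):
    """Return the (feature_extraction, model) key with the highest accuracy.

    accuracy_by_combo maps (feature_extraction, model) -> accuracy in percent.
    Ties are resolved in favour of the combination listed first.
    """
    if not accuracy_by_combo:
        raise ValueError("no prediction results supplied")
    return max(accuracy_by_combo, key=accuracy_by_combo.get)


def build_first_data_table(results_by_ph):
    """Map each pH value to its best (feature_extraction, model) combination."""
    return {ph: best_combination(results)
            for ph, results in results_by_ph.items()}
```

For example, `build_first_data_table({5.5: {("PCA", "BPNN"): 83.33, ("SPA", "SVM"): 90.0}})` yields `{5.5: ("SPA", "SVM")}`.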
In this embodiment, the prediction experiments for the acidic pH range, the neutral pH range and the alkaline pH range are carried out using pH 5.5, pH 7.0 and pH 8.5, respectively, as representative values.
In this embodiment, since the spectrum obtained by using the principal component analysis method PCA contains more than 99% of the original spectral data information, the spectrum obtained by using the principal component analysis method PCA may also be referred to as a full spectrum. The following two classification models based on full spectrum and the prediction effects of the two classification models are introduced:
In this embodiment, in order to reduce the amount of input data, the first three principal components obtained by PCA (containing more than 99% of the information of the original spectral data) are used as the model input features, and SVM and BPNN are used respectively to identify the pollution degree of the samples under different pH conditions. The calculated SVM parameters c and g, the optimal number of BPNN hidden layer nodes and the classification prediction results of the two models are shown in Table 3. As can be seen from Table 3, the samples at all pH values are identified with relatively high accuracy, and the prediction accuracy is above 80% in every case. For the pH 8.5 samples, the best model is the SVM model, with a correction set recognition accuracy of 100% and a prediction set recognition accuracy of 96.67%. For the pH 7.0 samples, the best model is the SVM model, and for the pH 5.5 samples the best model is the BPNN model, with a correction set recognition accuracy of 91.67% and a prediction set recognition accuracy of 83.33%. The recognition accuracy for the pH 7.0 and pH 5.5 samples is relatively low, probably because the full-spectrum data overlap. Therefore, characteristic frequency selection may be performed on the terahertz spectrum data, and the characteristic frequencies that perform well are selected for modeling analysis (that is, the spectral data are screened by the continuous projection method described later) in order to improve the recognition accuracy of the model.
Table 3 Prediction results of the full-spectrum models
Figure BDA0002266544240000181
This embodiment addresses the problem that the pollution degree of soil with a lower pH value may be predicted inaccurately because the full-spectrum data obtained by principal component analysis may overlap. To this end, the continuous projection method (SPA) is used to perform characteristic frequency selection on the terahertz spectrum data, and the characteristic frequencies that perform well are selected for modeling analysis, so as to improve the recognition accuracy of the model. For example, when the spectral data are screened with the SPA algorithm, 7 characteristic frequencies are selected for the soil samples at pH 5.5 (0.294, 0.525, 0.738, 1.069, 1.163, 1.388 and 1.431 THz), 6 characteristic frequencies for the soil samples at pH 7.0 (0.187, 0.331, 0.625, 0.788, 1.1 and 1.475 THz), and 6 characteristic frequencies for the soil samples at pH 8.5 (0.081, 0.119, 1.131, 1.231, 1.312 and 1.419 THz).
Two classification models based on characteristic frequencies and the prediction effects of the two classification models are described as follows:
Two prediction models, namely SPA-SVM and SPA-BPNN, are established based on the characteristic frequencies selected by SPA, and the parameters and classification prediction results of the two models are shown in Table 4. It can be seen that, after SPA selects the characteristic frequencies, the recognition accuracy for the pH 5.5 and pH 7.0 samples is clearly improved, probably because some useless interfering frequencies are removed and the discriminative information is enhanced. The recognition accuracy for the pH 8.5 samples is lower than with the full spectrum. For the pH 5.5 samples, the best model is the SPA-SVM model, with a correction set recognition accuracy of 95% and a prediction set recognition accuracy of 90%. For the pH 7.0 samples, the best model is also the SPA-SVM model, with a correction set recognition accuracy of 100% and a prediction set recognition accuracy of 96.67%. Comparing the models, the best classification model for the pH 5.5 samples is SPA-SVM, the best classification model for the pH 7.0 samples is also SPA-SVM, and the best classification model for the pH 8.5 samples is the full-spectrum SVM model. It can be seen that the classification performance of SVM is better than that of BPNN in this experiment. The experimental results show that modeling with selected characteristic frequencies can reduce the influence of overlapping spectral data and improve the classification accuracy of the model. The experimental results also show that it is feasible to identify the lead pollution degree of soils with different pH values by using the terahertz spectrum in combination with chemometric analysis.
TABLE 4 prediction results based on characteristic frequencies
Figure BDA0002266544240000191
Further, based on the content of the above embodiment, in the present embodiment, the following mapping relationship is stored in the first data table:
when the pH value is in an alkaline pH value range, the corresponding optimal characteristic extraction mode and the optimal lead pollution degree prediction model are a principal component analysis method and an SVM-based lead pollution degree prediction model;
when the pH value is in a neutral pH value range, the corresponding optimal feature extraction mode and the optimal lead pollution degree prediction model are a continuous projection method and an SVM-based lead pollution degree prediction model;
when the pH value is in an acidic pH value range, the corresponding optimal feature extraction mode and the optimal lead pollution degree prediction model are a continuous projection method and an SVM-based lead pollution degree prediction model.
In this embodiment, the first data table stores the optimal feature extraction manner and the optimal lead pollution degree prediction model corresponding to soil in three typical pH value ranges (alkaline range, neutral range, and acidic range), so that the corresponding optimal feature extraction manner and the optimal lead pollution degree prediction model can be found regardless of whether the soil to be detected is alkaline, neutral, or acidic, and thus the lead pollution degree of the soil to be detected can be accurately determined.
In this embodiment, the alkaline pH range may refer to a pH range of 7.5 to 13.5, the neutral pH range may refer to a pH range of 6.8 to 7.5, and the acidic pH range may refer to a pH range of 0.5 to 6.8.
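The query of the first data table by pH interval can be sketched as a simple lookup. The interval limits and the method assigned to each interval are those stated in this embodiment. The handling of the shared boundaries 6.8 and 7.5 (lower limit inclusive, upper limit exclusive, with 13.5 itself counted as alkaline) is an assumption, because the description does not say to which interval a boundary value belongs, and the names `FIRST_DATA_TABLE` and `lookup` are illustrative.

```python
# (lower limit, upper limit, soil type, feature extraction, prediction model)
FIRST_DATA_TABLE = [
    (0.5, 6.8, "acidic", "SPA", "SVM"),
    (6.8, 7.5, "neutral", "SPA", "SVM"),
    (7.5, 13.5, "alkaline", "PCA", "SVM"),
]


def lookup(ph):
    """Return (soil type, feature extraction, prediction model) for a pH value."""
    for lower, upper, soil_type, extraction, model in FIRST_DATA_TABLE:
        # Assumed convention: lower limit inclusive, upper limit exclusive.
        if lower <= ph < upper:
            return soil_type, extraction, model
    # Assumed convention: the top of the last interval is included.
    if ph == FIRST_DATA_TABLE[-1][1]:
        return FIRST_DATA_TABLE[-1][2:]
    raise ValueError(f"pH {ph} is outside the tabulated range 0.5 to 13.5")
```

With this table, `lookup(7.4)` returns `("neutral", "SPA", "SVM")`, matching the worked example of a soil with a pH value of 7.4 given for Table 2.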
Further, based on the content of the foregoing embodiment, in this embodiment, the step 104 extracts feature data of the terahertz spectrum data of the soil to be detected by using the feature extraction method, and inputs the feature data into the lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected, which specifically includes:
if the pH value interval of the soil to be detected is an alkaline pH value interval, extracting characteristic data of terahertz spectrum data of the soil to be detected by using a principal component analysis method, and inputting the characteristic data into an SVM-based lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected;
if the pH value interval of the soil to be detected is a neutral pH value interval, extracting characteristic data of terahertz spectrum data of the soil to be detected by using a continuous projection method, and inputting the characteristic data into a lead pollution degree prediction model based on an SVM (support vector machine) to obtain the lead pollution degree of the soil to be detected;
if the pH value interval of the soil to be detected is an acidic pH value interval, extracting characteristic data of terahertz spectrum data of the soil to be detected by using a continuous projection method, and inputting the characteristic data into a lead pollution degree prediction model based on an SVM (support vector machine) to obtain the lead pollution degree of the soil to be detected.
In this embodiment, since the first data table stores the optimal feature extraction manner and the optimal lead pollution degree prediction model corresponding to the soil with typical three types of pH values (alkaline, neutral, and acidic), the corresponding optimal feature extraction manner and the optimal lead pollution degree prediction model can be determined according to the pH value of the soil to be detected, so as to conveniently and accurately determine the lead pollution degree of the soil to be detected.
It should be noted that, to address the problem of detecting the lead pollution degree of soil, the present embodiment carries out exploratory research on the classification and identification of the lead pollution degree of soils with different pH values by using terahertz spectroscopy in combination with chemometric analysis. In the embodiment, terahertz spectrum data of soil samples are collected and preprocessed, dimensionality reduction and characteristic frequency selection are performed on the preprocessed data by principal component analysis (PCA) and the continuous projection method (SPA), respectively, and classification models based on a support vector machine (SVM) and an error back-propagation neural network (BPNN) are then established from the full-spectrum data after PCA and from the characteristic frequency data selected by SPA. The experimental results show that the optimal classification model for the pH 5.5 samples is SPA-SVM, with a correction set recognition accuracy of 95% and a prediction set recognition accuracy of 90%; the optimal classification model for the pH 7.0 samples is also SPA-SVM, with a correction set recognition accuracy of 100% and a prediction set recognition accuracy of 96.67%; and the optimal classification model for the pH 8.5 samples is the full-spectrum SVM model, with a correction set recognition accuracy of 100% and a prediction set recognition accuracy of 96.67%. The research results show that it is feasible to identify the lead pollution degree of soils with different pH values by using the terahertz spectrum in combination with chemometric analysis.
The research provides a new idea for identifying the lead pollution degree in the soil with different pH values, and also provides theoretical methods and technical supports for identifying the pollution degree of other heavy metals in the soil with different pH values.
Fig. 2 is a schematic structural diagram of a device for predicting the degree of soil lead pollution based on a terahertz spectrum according to an embodiment of the present invention, and as shown in fig. 2, the device for predicting the degree of soil lead pollution based on a terahertz spectrum according to an embodiment of the present invention includes: a first obtaining module 21, a second obtaining module 22, a selecting module 23 and a predicting module 24, wherein:
the first acquisition module 21 is used for acquiring the pH value of the soil to be detected;
the second acquisition module 22 is used for acquiring terahertz spectrum data of the soil to be detected;
the selection module 23 is configured to select a feature extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected according to the pH value of the soil to be detected;
and the prediction module 24 is configured to extract feature data of the terahertz spectrum data of the soil to be detected by using the feature extraction method, and input the feature data into the lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected.
Further, based on the content of the foregoing embodiment, in this embodiment, the selection module 23 is specifically configured to:
querying a first data table according to the pH value of the soil to be detected, and acquiring the optimal feature extraction mode and the optimal lead pollution degree prediction model corresponding to the pH value;
taking the obtained optimal feature extraction mode and the optimal lead pollution degree prediction model as the feature extraction mode matched with the terahertz spectrum data of the soil to be detected and the lead pollution degree prediction model matched with the soil to be detected;
the first data table stores in advance the mapping between each pH value and its corresponding optimal feature extraction mode and optimal lead pollution degree prediction model;
for each pH value, the optimal feature extraction mode and the optimal lead pollution degree prediction model are the combination that yields the highest lead pollution degree prediction accuracy when the terahertz spectrum data of soil with that pH value is subjected to spectral feature extraction in that mode and the extracted features are input to that model.
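The lookup performed by the selection module 23 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the pH thresholds 6.5 and 7.5, the names `ph_interval`, `select_method` and `FIRST_DATA_TABLE`, and the string labels are all hypothetical. The three table rows follow the mapping recited in claim 2, with "PCA" standing for the principal component analysis method and "SPA" for the continuous projection method.

```python
from typing import Dict, Tuple

# (feature extraction mode, prediction model) pairs keyed by pH interval.
# Rows follow the mapping recited in claim 2; the labels are illustrative.
FIRST_DATA_TABLE: Dict[str, Tuple[str, str]] = {
    "acidic": ("SPA", "SVM"),
    "neutral": ("SPA", "SVM"),
    "alkaline": ("PCA", "SVM"),
}


def ph_interval(ph: float, acidic_below: float = 6.5,
                alkaline_above: float = 7.5) -> str:
    """Classify a pH value into an interval name.

    The default thresholds are assumptions made for this sketch; the
    source only names acidic, neutral and alkaline intervals.
    """
    if not 0.0 <= ph <= 14.0:
        raise ValueError(f"pH out of range: {ph}")
    if ph < acidic_below:
        return "acidic"
    if ph > alkaline_above:
        return "alkaline"
    return "neutral"


def select_method(ph: float) -> Tuple[str, str]:
    """Return the (extraction mode, model) pair stored for this pH."""
    return FIRST_DATA_TABLE[ph_interval(ph)]
```

Keying the table by interval rather than by exact pH value keeps the lookup total over the whole 0 to 14 range, which is one reasonable reading of "mapping relation" in the text.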
Further, based on the content of the foregoing embodiment, in this embodiment, the device further includes an establishing module configured to:
performing feature extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a principal component analysis method, inputting the feature extraction result into an SVM (support vector machine)-based lead pollution degree prediction model, and obtaining a first lead pollution degree prediction result of the sample soil with each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a principal component analysis method, inputting the feature extraction result into a BPNN-based lead pollution degree prediction model, and obtaining a second lead pollution degree prediction result of the sample soil with each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a continuous projection method, inputting the feature extraction result into an SVM (support vector machine)-based lead pollution degree prediction model, and obtaining a third lead pollution degree prediction result of the sample soil with each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a continuous projection method, inputting the feature extraction result into a BPNN-based lead pollution degree prediction model, and obtaining a fourth lead pollution degree prediction result of the sample soil with each pH value;
for the sample soil with each pH value, selecting, from the first, second, third and fourth lead pollution degree prediction results, the feature extraction mode and the lead pollution degree prediction model corresponding to the prediction result with the highest accuracy, as the optimal feature extraction mode and the optimal lead pollution degree prediction model for that pH value;
and establishing a first data table according to the pH values and the mapping relation between the optimal characteristic extraction mode corresponding to each pH value and the optimal lead pollution degree prediction model.
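The table-building procedure above amounts to an argmax over four combinations for each pH value. The sketch below shows only that selection step and assumes the four accuracies have already been measured elsewhere; the function name `build_first_data_table`, the labels "PCA", "SPA", "SVM" and "BPNN", and the input layout are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Combo = Tuple[str, str]  # (feature extraction mode, prediction model)

# The four combinations evaluated for every pH value, in the order of the
# first to fourth prediction results described in the text.
COMBOS: List[Combo] = [
    ("PCA", "SVM"),
    ("PCA", "BPNN"),
    ("SPA", "SVM"),
    ("SPA", "BPNN"),
]


def build_first_data_table(
    accuracy: Dict[str, Dict[Combo, float]]
) -> Dict[str, Combo]:
    """Pick, for each pH key, the combination with the highest accuracy.

    `accuracy[ph_key][combo]` is the measured prediction accuracy of that
    combination on sample soil with that pH (supplied by the caller).
    Ties are resolved in favour of the earlier entry in COMBOS.
    """
    table: Dict[str, Combo] = {}
    for ph_key, scores in accuracy.items():
        missing = [c for c in COMBOS if c not in scores]
        if missing:
            raise ValueError(f"no accuracy for {missing} at pH {ph_key!r}")
        table[ph_key] = max(COMBOS, key=lambda c: scores[c])
    return table
```

Requiring all four scores for every pH key makes an incomplete evaluation fail loudly instead of silently producing a table from partial data.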
Since the device for predicting the degree of soil lead pollution based on the terahertz spectrum provided by the embodiment can be used for executing the method for predicting the degree of soil lead pollution based on the terahertz spectrum provided by the above embodiment, the working principle and the beneficial effects are similar, and detailed description is omitted here.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which specifically includes the following components, with reference to fig. 3: a processor 301, a memory 302, a communication interface 303, and a communication bus 304;
the processor 301, the memory 302 and the communication interface 303 complete mutual communication through the communication bus 304; the communication interface 303 is used for realizing information transmission between the devices;
the processor 301 is configured to call a computer program in the memory 302, and the processor implements all the steps of the method for predicting the soil lead pollution level based on the terahertz spectrum when executing the computer program, for example, the processor implements the following steps when executing the computer program: acquiring the pH value of soil to be detected; acquiring terahertz spectrum data of soil to be detected; selecting a characteristic extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected according to the pH value of the soil to be detected; and extracting characteristic data of the terahertz spectrum data of the soil to be detected by using the characteristic extraction mode, and inputting the characteristic data into the lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected.
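The four steps executed by the processor can be wired together roughly as shown below. This is a sketch, not the program stored in the memory 302: the function `predict_lead_pollution`, its parameters, and the choice of a string as the pollution degree label are assumptions, and the feature extractors and models are passed in as plain callables so that any trained implementation can be substituted.

```python
from typing import Callable, Dict, Sequence, Tuple

Spectrum = Sequence[float]   # terahertz spectrum data of one soil sample
Features = Sequence[float]   # output of a feature extraction mode


def predict_lead_pollution(
    ph: float,
    spectrum: Spectrum,
    select: Callable[[float], Tuple[str, str]],
    extractors: Dict[str, Callable[[Spectrum], Features]],
    models: Dict[str, Callable[[Features], str]],
) -> str:
    """Run selection, feature extraction and prediction for one sample.

    `ph` and `spectrum` are the results of the two acquisition steps.
    `select` maps the pH value to (extraction mode name, model name).
    The return type (a string label) is an assumption of this sketch.
    """
    extractor_name, model_name = select(ph)
    features = extractors[extractor_name](spectrum)
    return models[model_name](features)
```

Passing the selection rule and the models as arguments keeps the control flow independent of how the first data table or the models are actually built.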
Based on the same inventive concept, another embodiment of the present invention provides a non-transitory computer-readable storage medium, having a computer program stored thereon, which when executed by a processor implements all the steps of the method for predicting the soil lead pollution level based on terahertz spectrum, for example, the processor implements the following steps when executing the computer program: acquiring the pH value of soil to be detected; acquiring terahertz spectrum data of soil to be detected; selecting a characteristic extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected according to the pH value of the soil to be detected; and extracting characteristic data of the terahertz spectrum data of the soil to be detected by using the characteristic extraction mode, and inputting the characteristic data into the lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the technical solutions mentioned above may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method for predicting the soil lead pollution level based on the terahertz spectrum according to the embodiments or some portions of the embodiments.
In addition, in the present invention, terms such as "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Moreover, in the present invention, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Furthermore, in the description herein, reference to the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (6)

1. A method for predicting the lead pollution degree of soil based on a terahertz spectrum is characterized by comprising the following steps:
acquiring the pH value of soil to be detected;
acquiring terahertz spectrum data of soil to be detected;
selecting a characteristic extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected according to the pH value of the soil to be detected;
extracting characteristic data of the terahertz spectrum data of the soil to be detected by using the characteristic extraction mode, and inputting the characteristic data into the lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected;
wherein selecting, according to the pH value of the soil to be detected, a feature extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected specifically comprises:
inquiring a first data table according to the pH value of the soil to be detected, and acquiring an optimal characteristic extraction mode corresponding to the pH value and a corresponding optimal lead pollution degree prediction model;
taking the obtained optimal feature extraction mode and the optimal lead pollution degree prediction model as a feature extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected;
the first data table stores in advance the mapping between each pH value and its corresponding optimal feature extraction mode and optimal lead pollution degree prediction model;
for each pH value, the optimal feature extraction mode and the optimal lead pollution degree prediction model are the combination that yields the highest lead pollution degree prediction accuracy when the terahertz spectrum data of soil with that pH value is subjected to spectral feature extraction in that mode and the extracted features are input to that model;
the method further comprises establishing the first data table, which specifically comprises:
performing feature extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a principal component analysis method, inputting the feature extraction result into an SVM (support vector machine)-based lead pollution degree prediction model, and obtaining a first lead pollution degree prediction result of the sample soil with each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a principal component analysis method, inputting the feature extraction result into a BPNN-based lead pollution degree prediction model, and obtaining a second lead pollution degree prediction result of the sample soil with each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a continuous projection method, inputting the feature extraction result into an SVM (support vector machine)-based lead pollution degree prediction model, and obtaining a third lead pollution degree prediction result of the sample soil with each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a continuous projection method, inputting the feature extraction result into a BPNN-based lead pollution degree prediction model, and obtaining a fourth lead pollution degree prediction result of the sample soil with each pH value;
for the sample soil with each pH value, selecting, from the first, second, third and fourth lead pollution degree prediction results, the feature extraction mode and the lead pollution degree prediction model corresponding to the prediction result with the highest accuracy, as the optimal feature extraction mode and the optimal lead pollution degree prediction model for that pH value;
and establishing a first data table according to the pH values and the mapping relation between the optimal characteristic extraction mode corresponding to each pH value and the optimal lead pollution degree prediction model.
2. The method for predicting the lead pollution degree of soil based on terahertz spectrum according to claim 1, wherein the first data table stores the following mapping relationship:
when the pH value is in an alkaline pH value range, the corresponding optimal characteristic extraction mode and the optimal lead pollution degree prediction model are a principal component analysis method and an SVM-based lead pollution degree prediction model;
when the pH value is in a neutral pH value range, the corresponding optimal feature extraction mode and the optimal lead pollution degree prediction model are a continuous projection method and an SVM-based lead pollution degree prediction model;
when the pH value is in an acidic pH value range, the corresponding optimal feature extraction mode and the optimal lead pollution degree prediction model are a continuous projection method and an SVM-based lead pollution degree prediction model.
3. The method for predicting the lead pollution degree of soil based on the terahertz spectrum as claimed in claim 2, wherein the step of extracting the characteristic data of the terahertz spectrum data of the soil to be detected by using the characteristic extraction method and inputting the characteristic data into the lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected specifically comprises the steps of:
if the pH value interval of the soil to be detected is an alkaline pH value interval, extracting characteristic data of terahertz spectrum data of the soil to be detected by using a principal component analysis method, and inputting the characteristic data into an SVM-based lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected;
if the pH value interval of the soil to be detected is a neutral pH value interval, extracting characteristic data of terahertz spectrum data of the soil to be detected by using a continuous projection method, and inputting the characteristic data into a lead pollution degree prediction model based on an SVM (support vector machine) to obtain the lead pollution degree of the soil to be detected;
if the pH value interval of the soil to be detected is an acidic pH value interval, extracting characteristic data of terahertz spectrum data of the soil to be detected by using a continuous projection method, and inputting the characteristic data into a lead pollution degree prediction model based on an SVM (support vector machine) to obtain the lead pollution degree of the soil to be detected.
4. A soil lead pollution degree prediction device based on terahertz spectrum is characterized by comprising:
the first acquisition module is used for acquiring the pH value of the soil to be detected;
the second acquisition module is used for acquiring terahertz spectrum data of the soil to be detected;
the selection module is used for selecting a characteristic extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected according to the pH value of the soil to be detected;
the selection module is specifically configured to: inquiring a first data table according to the pH value of the soil to be detected, and acquiring an optimal characteristic extraction mode corresponding to the pH value and a corresponding optimal lead pollution degree prediction model;
taking the obtained optimal feature extraction mode and the optimal lead pollution degree prediction model as a feature extraction mode matched with the terahertz spectrum data of the soil to be detected and a lead pollution degree prediction model matched with the soil to be detected;
the first data table stores in advance the mapping between each pH value and its corresponding optimal feature extraction mode and optimal lead pollution degree prediction model;
for each pH value, the optimal feature extraction mode and the optimal lead pollution degree prediction model are the combination that yields the highest lead pollution degree prediction accuracy when the terahertz spectrum data of soil with that pH value is subjected to spectral feature extraction in that mode and the extracted features are input to that model;
the prediction module is used for extracting the characteristic data of the terahertz spectrum data of the soil to be detected by using the characteristic extraction mode, and inputting the characteristic data into the lead pollution degree prediction model to obtain the lead pollution degree of the soil to be detected;
the device further comprises an establishing module configured to:
performing feature extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a principal component analysis method, inputting the feature extraction result into an SVM (support vector machine)-based lead pollution degree prediction model, and obtaining a first lead pollution degree prediction result of the sample soil with each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a principal component analysis method, inputting the feature extraction result into a BPNN-based lead pollution degree prediction model, and obtaining a second lead pollution degree prediction result of the sample soil with each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a continuous projection method, inputting the feature extraction result into an SVM (support vector machine)-based lead pollution degree prediction model, and obtaining a third lead pollution degree prediction result of the sample soil with each pH value;
performing feature extraction on the terahertz spectrum data of the sample soil with each pH value by adopting a continuous projection method, inputting the feature extraction result into a BPNN-based lead pollution degree prediction model, and obtaining a fourth lead pollution degree prediction result of the sample soil with each pH value;
for the sample soil with each pH value, selecting, from the first, second, third and fourth lead pollution degree prediction results, the feature extraction mode and the lead pollution degree prediction model corresponding to the prediction result with the highest accuracy, as the optimal feature extraction mode and the optimal lead pollution degree prediction model for that pH value;
and establishing a first data table according to the pH values and the mapping relation between the optimal characteristic extraction mode corresponding to each pH value and the optimal lead pollution degree prediction model.
5. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the method for predicting the soil lead pollution level based on terahertz spectrum according to any one of claims 1 to 3.
6. A non-transitory computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for predicting the lead contamination level of soil based on terahertz spectroscopy according to any one of claims 1 to 3.
CN201911089940.1A 2019-11-08 2019-11-08 Method and device for predicting soil lead pollution degree based on terahertz spectrum Active CN110987853B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911089940.1A CN110987853B (en) 2019-11-08 2019-11-08 Method and device for predicting soil lead pollution degree based on terahertz spectrum

Publications (2)

Publication Number Publication Date
CN110987853A CN110987853A (en) 2020-04-10
CN110987853B true CN110987853B (en) 2022-02-11

Family

ID=70083861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911089940.1A Active CN110987853B (en) 2019-11-08 2019-11-08 Method and device for predicting soil lead pollution degree based on terahertz spectrum

Country Status (1)

Country Link
CN (1) CN110987853B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639763B (en) * 2020-06-03 2023-08-11 三一重机有限公司 Detection model training method, detection method and device for pollution degree of hydraulic oil

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104062255A (en) * 2014-06-04 2014-09-24 北京农业智能装备技术研究中心 Method and device for detecting heavy metal content in soil based on sample box method
CN104502288A (en) * 2014-11-26 2015-04-08 西安科技大学 Soil lead content measurement method using visible and near-infrared spectroscopy technology
CN106569430A (en) * 2016-11-16 2017-04-19 江苏智石科技有限公司 Soil heavy metal pollution detection and control system
CN108535200A (en) * 2018-01-23 2018-09-14 江苏大学 The detection device and method of the leaf vegetables blade heavy metal cadmium of spectral technique are merged based on visible light, Terahertz
CN109061110A (en) * 2018-09-07 2018-12-21 中山大学 A kind of quantitative forecasting technique of soil acidification to Nutrient availability
CN109374569A (en) * 2018-09-27 2019-02-22 华东交通大学 A kind of detection method of the testing melamine content in milk powder based on tera-hertz spectra

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140012504A1 (en) * 2012-06-14 2014-01-09 Ramot At Tel-Aviv University Ltd. Quantitative assessment of soil contaminants, particularly hydrocarbons, using reflectance spectroscopy
US10564316B2 (en) * 2014-09-12 2020-02-18 The Climate Corporation Forecasting national crop yield during the growing season

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
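Research on heavy metal ions detection in soil with terahertz time-domain spectroscopy;Bin Li et al.;《International Symposium on Photoelectronic Detection and Imaging 2011 - Terahertz Wave Technologies and Applications》;20110811;pp. 81951V-1 to 81951V-8 *
Study on Form Distribution and Correlation of Heavy Metals in the Sediment of Urban Water;Xu Ying et al.;《2010 4th International Conference on Bioinformatics and Biomedical Engineering》;20100723;p. 3 *
Study on the influence of soil pH on heavy metal speciation and their correlation;杨秀敏 et al.;《中国矿业》;20170630;Vol. 26 (No. 6);Abstract *
Rapid spectral detection methods and sensing technology for soil components and characteristic parameters;李民赞 et al.;《农业机械学报》;20130325 (No. 03);pp. 73-87 *
Preliminary study on detection of heavy metal lead content in soil based on terahertz spectroscopy;李斌 et al.;《农业机械学报》;20161031;Vol. 47;pp. 292-295 *
Study on hyperspectral estimation of soil total nitrogen content based on characteristic wavelength selection and modeling;王文才 et al.;《浙江农业学报》;20180925 (No. 09);pp. 1576-1584 *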

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant