CN115116558A

CN115116558A - Dynamic prediction method for heavy metal content

Info

Publication number: CN115116558A
Application number: CN202210798958.4A
Authority: CN
Inventors: 安毅; 霍莉莉; 武丽娜; 刘春湘; 丁健; 林大松; 杜兆林; 秦莉; 姚彦坡
Original assignee: Agro Environmental Protection Institute Ministry of Agriculture
Current assignee: Agro Environmental Protection Institute Ministry of Agriculture
Priority date: 2022-07-06
Filing date: 2022-07-06
Publication date: 2022-09-27
Anticipated expiration: 2042-07-06
Also published as: CN115116558B

Abstract

The invention relates to a dynamic prediction method for heavy metal content, which combines the technical means of similarity calculation, cluster analysis, interpolation operation, hyperspectral inversion, error iteration and the like to acquire the monitoring data of a research area and predicts the heavy metal content and the spatial distribution information in soil in real time, quickly and efficiently by four steps of producing area environment database establishment, area division, remote sensing model construction and error analysis.

Description

Dynamic prediction method for heavy metal content

Technical Field

The invention belongs to the technical field of agricultural environment, and particularly relates to a dynamic prediction method for heavy metal content.

Background

The problem of soil environmental pollution is severe day by day, the quality safety of agricultural products and the health of human bodies are influenced, the soil environmental monitoring is comprehensive, real-time and accurate, the important practical significance is realized on the guarantee of the quality of the agricultural products and the quality of the soil, and the accurate and effective acquisition of the soil pollution distribution condition is the basis for developing the soil pollution prevention and control work. Most of the existing methods for acquiring soil environment monitoring data are fixed-point sampling and point source observation, and partial research combines methods such as land statistical interpolation and hyperspectral remote sensing to indirectly acquire the soil monitoring data.

The main problems of the above technical methods are: (1) the monitoring technology is single, the data acquisition is difficult, the time consumption is long, and the energy investment is large; (2) the accuracy of the soil monitoring data prediction method is low; (3) the existing hyperspectral remote sensing inversion model is low in applicability and large in error between monitored data and inverted data.

Disclosure of Invention

According to the method, through four steps of establishing a producing area environment database, dividing regions, constructing a remote sensing model and analyzing errors, technical means such as similarity calculation, cluster analysis, interpolation operation, hyperspectral inversion and error iteration are integrated, monitoring data of a research region are jointly obtained through a multi-layer multi-method, and heavy metal content and spatial distribution information in soil are predicted quickly and efficiently in real time.

In order to solve the technical problem, the invention discloses a dynamic prediction method of heavy metal content, which comprises the following steps:

(1) place of origin environment database establishment

(1.1) collecting historical monitoring information of a research area, wherein the historical monitoring information comprises soil monitoring data and environmental data;

the soil monitoring data includes two types: research indexes and corresponding detection data of historical soil point locations, other indexes and corresponding detection data;

(1.2) collecting historical remote sensing information of research area

(1.3) Collection of homologous indicator information

The homologous index is an index of which the content changes along with the change of activation, accumulation or migration of research indexes and the like in soil and can acquire remote sensing spectral information;

the homology indexes include: na, Ca, Mg, Zn, Pb, Fe ₂ O ₃ 、MnO、K ₂ O、SiO ₂ Etc.;

(2) region partitioning

(2.1) grid layout

Dividing the research area into squares to ensure that historical soil points exist in all the squares;

when the area of the farmland in the grid is smaller than 1/10 of the area of the grid, the grid can be merged with the grid where the adjacent farmland is located;

the squares are consistent in size and connected with each other;

(2.2) soil similarity solution

(2.2.1) data preprocessing

Preprocessing soil monitoring data of historical soil point locations in a research area, wherein the preprocessing comprises two steps of logarithm processing and normalization processing:

the calculation formula of the logarithmic processing is specifically as follows:

c’＝log ₁₀ ^(c)

wherein c is the detection data of research indexes and other indexes;

b, carrying out normalization processing on the processed data by adopting a z-score method to obtain a normalized monitoring value, wherein a calculation formula is as follows:

wherein mu is the average value of the detection data of the research index and other indexes after logarithmic processing, delta is the standard deviation of the detection data of the research index and other indexes after logarithmic processing, and the calculation formula is

N is the number of point positions;

(2.2.2) matrix transformation

Performing matrix conversion operation on the normalized monitoring values of all historical soil point positions in a research area to obtain a score coefficient matrix, wherein the behavior research indexes and other index score coefficients of the score coefficient matrix are listed as main component categories;

obtaining a matrix relation formula of the research index j and other indexes i under any main component E category according to the score coefficient matrix as follows:

E＝m1I1+m2I2+…+miIi+mjIj

wherein: mi is a score coefficient corresponding to other indexes i in the score coefficient matrix, mj is a score coefficient corresponding to a research index j in the score coefficient matrix, Ii is a score coefficient corresponding to other indexes i and only indicates no value, and Ij is a research index j and only indicates no value;

(2.2.3) similarity solving

The corresponding relation between the research index j and other indexes i is shown as follows, the global weight wj is solved through a least square method, and the relation is as follows:

Ej＝Σyimi+yjmj

yj＝w1E1+w2E2+…+wiEi+wjEj

wherein yi is the normalized monitoring value of other index i, and yj is the normalized monitoring value of the research index j;

b, respectively calculating the grid weight vj of each grid by adopting a least square method ⁿ Wherein n is a grid serial number;

respectively constructing a matrix according to the global weight and the square grid weight, and further solving the soil similarity between each square grid and the research area, wherein the concrete formula is as follows:

where cos θ j ⁿ The soil similarity of the nth grid and a research area about a research index j is shown, w is a global weight matrix, and v is a grid weight matrix;

(2.3) sub-region delineation

Clustering the soil similarity of each grid, and dividing the research area into sub-areas with different soil characteristics;

(3) remote sensing model construction

The remote sensing model building methods of all the subareas of the research area are consistent, and the subarea remote sensing model building method comprises the following steps:

(3.1) combing the homologous index information and the remote sensing information in the producing area environment database, determining the homologous index information related to the subregion, simultaneously extracting the historical remote sensing spectrum of the subregion in the producing area environment database, performing spectrum preprocessing, and acquiring inversion data corresponding to historical soil point locations;

(3.2) remote sensing model

(3.2.1) establishing a monitoring value model by taking the detection data of the research index j as a dependent variable and the detection data of the homologous index as an independent variable, and screening out the optimal monitoring value model, namely a judgment coefficient R ² A maximum monitored value model;

(3.2.2) respectively establishing spectral models by taking the detection data of the homologous indexes as dependent variables and the hyperspectral reflectivities corresponding to the homologous indexes as independent variables, and respectively screening out the spectral models with the best homologous indexes, namely the judgment coefficients R ² A maximum spectral model;

(3.2.3) combining the two models in the step (3.2.1) and the step (3.2.2) to finally obtain a remote sensing model of the research index j;

the fitting method of the monitoring value model and the spectrum model comprises least square, multiple linearity, stepwise regression, power/exponential function, decision tree, neural network and the like;

the remote sensing model independent variables are B1, B2.. Ba and the like, wherein a corresponds to the reflectivity extracted from the a-th wave band of the remote sensing spectrum;

(3.3) Point location prediction error calculation

Respectively extracting research index detection data and remote sensing model result data of all historical soil point locations in the sub-area, namely inversion data, and calculating point location prediction errors Dj of all historical soil point locations by adopting a relative error method, wherein the calculation formula is as follows:

wherein cj is detection data of a research index j, and Xj is inversion data of the research index j;

(3.4) error interpolation

Performing spatial interpolation operation on the prediction errors of all historical soil point locations in the sub-region to obtain a sub-region prediction error distribution map;

(4) error analysis

(4.1) calculating a first round of area prediction error (Aj) based on point location prediction errors of all historical soil point locations of the sub-area ¹ ) The calculation formula is as follows:

wherein k is the number of calculation rounds of the area prediction errors, and Avj is the mean value of the point location prediction errors in the sub-area;

(4.2) determining the prediction error range of the research index j according to the prediction error distribution map, taking whether the prediction error range exceeds the R% of the region prediction error as a region prediction error classification value, dividing the sub-region into an error conforming region and an error non-conforming region, directly executing the step (4.4) in the error conforming region, and executing the step (4.3) in the error non-conforming region in sequence;

(4.3) aiming at the error inconsistent area, repeating the steps (4.1) - (4.4) established by the remote sensing model and the steps (4.1) - (4.2) of error analysis until all areas are error consistent areas;

(4) and dynamically predicting the content distribution diagram of the research indexes in the research region based on the remote sensing model of each sub-region and by combining the remote sensing spectrum.

Further, the research index is one or more of Cd, Hg, As, Pb and Cr; other indicators include Cu, Zn, Ni, Ca, Na, Mg, N, P, K, Fe, Al, Si, Se, Ti, B, Mn, Co, Sr, Mo, V, pH, SOM, CEC, etc.;

further, the environment data includes: regional area, administrative boundary, altitude, point location information, sampling time, peripheral pollution sources, water source information, fertilization conditions and the like;

furthermore, by means of looking up documents, books, experimental reports, meeting records and the like, information such as a spectrum data source, a remote sensing spectrum, a spectrum sensitive waveband, a spectrum processing method, reflectivity, an inversion model, inversion accuracy and the like related to research indexes in a research area is determined;

the spectral data source is satellite remote sensing, unmanned aerial vehicle scanning, laboratory detection and the like;

further, the research index comprises one or more homologous indexes, and the homologous indexes of different research indexes can be the same;

further, grid division is carried out according to the characteristics of a research area, wherein the characteristics of the research area comprise area, farmland type, surrounding ground objects and the like;

furthermore, the number of the squares is manually set according to the research requirements;

further, the clustering includes K-means clustering, mean shift clustering, density-based clustering, maximum expected clustering, agglomerative hierarchy clustering, and the like;

further, the spectrum preprocessing step comprises remote sensing noise removal, image correction, spectrum transformation, atmospheric correction, image cutting and the like; the influence correction comprises radiation correction and geometric correction;

further, the inversion data comprises information of homologous indexes, sensitive wave bands and reflectivity;

further, error analysis methods of all sub-areas of the research area are consistent;

further, the R% is set manually according to research requirements, and if the value range is 0-200%;

the dynamic prediction method for the heavy metal content has the following advantages:

1. the invention comprehensively utilizes the database information and the remote sensing spectrum information, reduces the difficulty in acquiring the monitoring point position, improves the monitoring efficiency, provides a convenient and efficient monitoring method and a faster monitoring data acquisition method, and saves about 20% of capital investment and 35% of time cost;

2. according to the method, a small area is split firstly, and then a large area is fused, an iterative updating error inconsistent area is combined, a research area is reasonably and accurately divided according to soil characteristics, a homologous index and an inversion model are selected preferentially, and the applicability and the accuracy of a prediction result of the prediction method are enhanced.

Drawings

FIG. 1 is a technical flow chart of a dynamic prediction method of heavy metal content;

FIG. 2 is a distribution diagram of Q county grid divisions;

FIG. 3 is a square soil similarity distribution graph;

FIG. 4 is a sub-region distribution map;

FIG. 5 is a first round of region division result distribution diagram of region 1;

FIG. 6 is a Cd content distribution diagram;

Detailed Description

The present invention is further described in detail below with reference to examples so that those skilled in the art can practice the invention with reference to the description.

It will be understood that terms such as "having," "including," and "comprising," when used herein, do not preclude the presence or addition of one or more other elements or groups thereof.

Example 1

1. Place of origin environment database establishment

(1) Establishing a production area environment database: selecting farmlands in Q county as research objects, and collecting environmental information and historical soil monitoring data of the area: selecting Cd As a research index, wherein other indexes comprise 21 items in total of Hg, As, Pb, Cr, pH, CEC, SOM, K, Cu, Zn, Ni, Al, Si, Fe, Mn, Ca, Mg, Ce, La, Mo and Na;

(2) acquiring historical remote sensing information about Q county by consulting documents, experimental reports and other methods: the remote sensing spectrum mainly comes from a Landsat 8OLI image (2016, 9 months, cloud content of 0.81%), and the image shooting time is consistent with the sampling time of historical soil monitoring point positions;

(3) using Cd as a research index, collecting related data to determine homology indexes related to Cd, including Zn, Mg, Fe, Ca, Pb and K;

2. region division

(1) The terrain of Q county is relatively flat, no large flow of rivers flows through, the paddy field is mainly used, and single-season rice is planted; dividing the grids according to the specification of 800 × 800m according to county characteristics, preliminarily dividing the grids into 9 grids, wherein one grid has no historical soil points, and the farmland area in one grid is smaller than 1/10 of the grid, so that 7 grids are finally reserved, and the dividing result is shown in fig. 2;

(2) pre-processing data of detection data of the soil Cd index and other indexes in the Q county of 2015 in a production area environment database, and calculating the soil similarity of all grids according to a global weight formula, a grid weight formula and a soil similarity formula, wherein the result is shown in FIG. 3;

(3) adopting a K-means clustering method to perform clustering analysis on all the squares, and dividing the squares into 3 sub-regions in total, as shown in figure 4;

3. remote sensing model construction

(1) Taking subregion 1 as an example, combing the remote sensing information and homologous indexes related to the subregion in the production area environment database, including Ca, Mg, Zn and Fe ₂ O ₃ Four items; based on four homologous indexes and Cd detection data, selecting various methods for model fitting, and finally screening out a judgment coefficient R ² Maximum monitoring value model: cd ═ 0.121+0.02Zn-0.0003Mg-0.019Fe-0.0003 Ca;

(2) carrying out noise removal and atmospheric correction on the remote sensing data in the production place environment database, carrying out pretreatment such as radiometric calibration on the remote sensing spectrum by adopting envi software, and extracting the reflectivity of all historical soil point locations; performing correlation analysis on the detection data of the homologous indexes and the reflectivity of 7 wave bands by adopting the sps, respectively selecting the wave bands with the strongest correlation, and performing model fitting to determine an optimal spectrum model of the four homologous indexes;

(3) a monitoring value model and a spectrum model of four homologous indexes are combined, namely a remote sensing model of a subregion 1, wherein the remote sensing model is specifically a remote sensing model of Cd being 0.6+0.03ln (B5) +0.002ln (B6) +0.0002(B6), and B5 and B6 respectively represent the reflectivity of a fifth waveband (near infrared waveband) and a sixth waveband (short wave infrared 1) extracted according to Landsat 8OLI images;

(4) calculating point location prediction errors of all historical soil point locations in the sub-area 1 according to a point location prediction error formula, and performing common kriging interpolation through Arcgis to obtain a prediction error distribution diagram of the sub-area 1, wherein the prediction error range is 0.1-3.7;

(5) calculating a first round of regional prediction error according to a regional prediction error calculation formula, dividing regions by taking 95% of the regional prediction error as a condition, and dividing the regions into two types, namely error conforming regions and error non-conforming regions, as shown in fig. 5;

(6) aiming at the screened error inconsistent regions, repeating the steps until all the regions are error consistent regions, and iterating for 6 times;

4. error analysis

(1) Obtaining prediction error distribution maps of the region 2 and the region 3 by adopting the same steps, calculating region prediction errors of all sub-regions according to a first round of region prediction error formula, and dividing the Q county into two types of regions with error conforming regions and regions with error non-conforming regions by not more than 95% of the prediction errors;

(2) performing region refinement by adopting the same method aiming at regions with inconsistent errors until the region prediction errors of all the regions are within 95 percent of the prediction error, wherein a region 4 is refined in two rounds, and a region 3 is refined in 2 rounds;

(3) obtaining Landsat 8OLI image of Q county in 2022, performing spectrum pretreatment by adopting the related method, and obtaining a Cd content distribution diagram by combining remote sensing models of various areas, wherein the diagram is shown in FIG. 6.

While embodiments of the invention have been described above, it is not limited to the applications set forth in the description and the embodiments, which are fully applicable to various fields of endeavor for which the invention may be embodied with additional modifications as would be readily apparent to those skilled in the art, and the invention is therefore not limited to the details given herein and to the embodiments shown and described without departing from the generic concept as defined by the claims and their equivalents.

Claims

1. The dynamic prediction method for the heavy metal content is characterized by comprising the following steps:

(1) place of origin environment database establishment

(1.2) collecting historical remote sensing information of research area

(1.3) Collection of homologous indicator information

The homologous indexes refer to indexes of which the content changes along with the activation, accumulation or migration change of research indexes in soil and can acquire remote sensing spectral information;

the homology indexes include: na, Ca, Mg, Zn, Pb, Fe ₂ O ₃ 、MnO、K ₂ O、SiO ₂ ；

(2) Region partitioning

(2.1) grid layout

Dividing the research area into squares to ensure that historical soil point locations exist in all the squares;

the squares are consistent in size and are connected with each other;

(2.2) soil similarity solution

(2.2.1) data preprocessing

c’＝log ₁₀ ^(c)

wherein c is the detection data of research indexes and other indexes;

N is the number of point positions;

(2.2.2) matrix transformation

E＝m1I1+m2I2+…+miIi+mjIj

wherein: mi is a score coefficient corresponding to other indexes i in the score coefficient matrix, mj is a score coefficient corresponding to a research index j in the score coefficient matrix, Ii is other indexes i and only indicates no value, and Ij is a research index j and only indicates no value;

(2.2.3) similarity solving

The corresponding relation between the research index j and other indexes i is as follows, the global weight wj is solved by a least square method, and the relation is as follows:

Ej＝∑yimi+yjmj

yj＝w1E1+w2E2+…+wiEi+wjEj

(2.3) defining sub-regions

(3) remote sensing model construction

(3.2) remote sensing model

(3.2.2) respectively establishing spectral models by taking the detection data of the homologous indexes as dependent variables and the hyperspectral reflectivities corresponding to the homologous indexes as independent variables, and respectively screening out the spectral models with the best homologous indexesI.e. determining the coefficient R ² A maximum spectral model;

the fitting method of the monitoring value model and the spectrum model comprises least square, multiple linearity, stepwise regression, power/exponential function, decision tree and neural network;

the remote sensing model independent variables are B1 and B2.. Ba, wherein a corresponds to the reflectivity extracted from the a-th wave band of the remote sensing spectrum;

(3.3) Point location prediction error calculation

(3.4) error interpolation

(4) error analysis

2. The dynamic prediction method of heavy metal content according to claim 1, characterized in that: the research index is one or more of Cd, Hg, As, Pb and Cr; other indicators include Cu, Zn, Ni, Ca, Na, Mg, N, P, K, Fe, Al, Si, Se, Ti, B, Mn, Co, Sr, Mo, V, pH, SOM, CEC.

3. The dynamic prediction method of heavy metal content according to claim 1, characterized in that: the environmental data includes: regional area, administrative boundary, altitude, point location information, sampling time, peripheral pollution sources, water source information, and fertilization conditions.

4. The dynamic prediction method of heavy metal content according to claim 1, characterized in that: determining a spectral data source, a remote sensing spectrum, a spectrum sensitive waveband, a spectrum processing method, a reflectivity, an inversion model and an inversion accuracy rate related to research indexes in a research area by looking up documents, books, experimental reports and meeting records;

the spectral data source is satellite remote sensing, unmanned aerial vehicle scanning and laboratory detection.

5. The dynamic prediction method for heavy metal content according to claim 1, characterized in that: the one research index includes one or more homologous indexes, and the homologous indexes of different research indexes can be the same.

6. The dynamic prediction method of heavy metal content according to claim 1, characterized in that: and carrying out grid division according to the characteristics of a research area, wherein the characteristics of the research area comprise area, farmland type and peripheral ground features.

7. The dynamic prediction method of heavy metal content according to claim 1, characterized in that: the number of the square grids is manually set according to the research requirement.

8. The dynamic prediction method of heavy metal content according to claim 1, characterized in that: the clustering includes K-means clustering, mean shift clustering, density-based clustering, maximum expected clustering, and agglomerative hierarchical clustering.

9. The dynamic prediction method of heavy metal content according to claim 1, characterized in that: the spectrum preprocessing step comprises remote sensing noise removal, image correction, spectrum transformation, atmosphere correction and image cutting; the influence correction comprises radiation correction and geometric correction.

10. The dynamic prediction method of heavy metal content according to claim 1, characterized in that: the value range of the R% is 0-200%.