WO2021184550A1 - A method for predicting target soil property content based on a soil transfer function
- Publication number
- WO2021184550A1 (PCT/CN2020/093267)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- soil
- variable
- sampling point
- data value
- variables
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/30—Prediction of properties of chemical compounds, compositions or mixtures
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/24—Earth materials
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Definitions
- The invention relates to a method for predicting target soil property content based on a soil transfer function, and belongs to the technical field of soil measurement.
- Phosphorus in the soil is one of the three main nutrients necessary for plant growth.
- The soil phosphorus content of most agricultural ecosystems in China is lower than plants require, which has led to increased use of phosphate fertilizers in China over the past 30 years.
- Excessive application of phosphate fertilizer leads directly to the accumulation of large amounts of phosphorus in the soil and significantly affects the in-season utilization rate of the applied fertilizer (about 10%-20%), resulting in the loss and waste of phosphate fertilizer resources.
- The large accumulation of phosphorus in farmland soil also directly causes serious environmental problems, chiefly the eutrophication of water bodies as phosphorus migrates with water. Regular monitoring of soil property content is therefore of great significance.
- Soil databases created by different application departments generally cover only basic soil physical and chemical properties, and rarely include soil trace element or heavy metal content data.
- For example, the WoSIS soil database created by Wageningen University and the HWSD (Harmonized World Soil Database v1.2) created by the Food and Agriculture Organization of the United Nations contain only basic physical and chemical properties such as soil organic matter, pH, texture, nitrogen, phosphorus, and potassium.
- Available methods for measuring soil available copper content include atomic absorption spectrometry and DTPA-TEA extraction followed by atomic absorption spectrophotometry; available methods for measuring soil total phosphorus content include high-temperature ignition with acid extraction, strong acid digestion, alkali fusion, and continuous flow analysis.
- The conventional way to obtain soil property content information is field sampling followed by laboratory chemical analysis. This approach is accurate but time-consuming and laborious, and it makes the spatial distribution of soil property content at the regional scale difficult to obtain.
- Soil transfer function (pedotransfer function, PTF)
- The soil transfer function exploits the correlation between soil physical and chemical properties: missing data are filled in by constructing a prediction model that relates measured soil properties to unmeasured ones.
- Commonly used soil transfer function models include statistical regression models, artificial neural networks, and physical-empirical models. Of these, statistical regression models are the ones most often used by application departments, because they are easy to implement, predict accurately, and offer a high degree of variable interpretability in specific applications.
- Predicting soil property content with a soil transfer function nevertheless has limitations in prediction and evaluation techniques, including:
- The resulting soil transfer function can only predict soil information at the sample point scale; it cannot be extended to the regional scale to produce soil maps that serve more application sectors. Because the inputs of a traditional soil transfer function are laboratory-analyzed soil physical and chemical property data, the constructed function only represents the relationship between the measured soil property content and other laboratory-measured properties. Soil maps covering an area cannot match the accuracy of laboratory analysis, so these soil transfer functions cannot be applied directly to historical soil maps to produce spatial distribution maps of soil properties.
- The technical problem to be solved by the present invention is to provide a method for predicting target soil property content based on a soil transfer function that adopts a new design framework to make up for the shortcomings of the existing technology, predicts target soil property content efficiently and accurately, and improves work efficiency.
- The present invention designs a target soil property content prediction method based on a soil transfer function, used to predict the target soil property content in a target area, including the following steps:
- Step A. Based on existing soil data, select the sampling point locations in the target area at which the data values of all preset soil physical and chemical properties are non-empty, form the first-level area from the smallest circumscribed polygon of these sampling point locations, and take them as the sampling point locations corresponding to the first-level area. The preset soil physical and chemical properties include the target soil property. Define the target soil property as the soil dependent variable; the remaining soil physical and chemical properties constitute the soil independent variable set. Then proceed to step B;
- Step B. Obtain the layers, covering the first-level area, of each specified environmental variable related to the soil dependent variable; for each sampling point location corresponding to the first-level area, extract the data value of each specified environmental variable at that location; add the specified environmental variables to the soil independent variable set to update it; then go to step C;
- Step C. Delete from the soil independent variable set the independent variables that cause multicollinearity and the independent variables whose correlation with the soil dependent variable falls below the preset correlation significance threshold, thereby updating the soil independent variable set; then proceed to step D;
- Step D. For the sampling point locations corresponding to the first-level area, use a stepwise multiple linear regression model to train, for a preset number of iterations, the linear relationship between the soil dependent variable data values and the data values of the independent variables in the soil independent variable set. Record the temporary optimal independent variable set selected in each iteration and the number of times each distinct temporary optimal set is selected. After the preset number of iterations, take the most frequently selected temporary optimal independent variable set as the optimal independent variable set corresponding to the first-level area, and go to step E;
- Step E. Obtain the soil division layer of a preset attribute covering the first-level area and, for each sampling point location corresponding to the first-level area, extract the soil division area under the preset attribute in which that location lies. Based on the soil dependent variable data values at the sampling point locations, analyze the differences in the soil dependent variable between the different soil division areas. If none of the difference results is greater than the preset significant difference threshold, go to step G. If any difference result is greater than the threshold, merge the soil division areas whose difference results are not greater than it; the merged areas together with the unmerged soil division areas form the secondary regions. Obtain the sampling point locations corresponding to each secondary region from those of the first-level area, and then go to step F;
- Step F. For each secondary region, use the method of step D to obtain the optimal independent variable set corresponding to that secondary region, and proceed to step G;
- Step G. For the sampling point locations corresponding to the first-level area, train a linear regression model and a nonlinear regression model between the soil dependent variable data values and the data values of the independent variables in the corresponding optimal independent variable set, and obtain the coefficient of determination of each model, namely R_OLS for the linear regression model and R_NLS for the nonlinear regression model of the first-level area. If there is no secondary region, go directly to step H. If there are secondary regions, then for each secondary region train the linear and nonlinear regression models in the same way on its sampling point locations and its optimal independent variable set, obtain the two coefficients of determination for each secondary region, and compute their means over all secondary regions, R_OLS_mean and R_NLS_mean; then go to step H;
- Step H. If there is no secondary region, go to step I; if there are secondary regions and R_OLS_mean is greater than both R_OLS and R_NLS, or R_NLS_mean is greater than both R_OLS and R_NLS, go to step M;
- Step I. Based on the sampling point locations corresponding to the first-level area and the corresponding optimal independent variable set, take the data values at the sampling point locations of each soil physical and chemical property in the optimal independent variable set and build, for each such property, a prediction model based on all the specified environmental variables of step B. Then apply these models to the specified environmental variable layers of step B to obtain, for each soil physical and chemical property in the optimal independent variable set, its spatial distribution prediction layer over the first-level area, and go to step J;
- Step J. Combine the spatial distribution prediction layers of the soil physical and chemical properties in the optimal independent variable set with the first-level-area layers of the environmental variables in the optimal independent variable set to form the optimal independent variable layer set corresponding to the first-level area, and then go to step K;
- Step K. If R_OLS ≥ R_NLS, then for each sampling point location corresponding to the first-level area, extract the data value of each independent variable from the optimal independent variable layer set corresponding to the first-level area, and train the linear regression model between these values and the soil dependent variable data values to form the first-level regional prediction model; then go to step L;
- Step L. Apply the first-level regional prediction model to the optimal independent variable layer set corresponding to the first-level area to obtain the spatial distribution map of the soil dependent variable, that is, the spatial distribution map of the target soil property in the target area, thereby predicting the target soil property content in the target area;
- Step M. For each secondary region, use the method of steps I to J to obtain the optimal independent variable layer set corresponding to that secondary region, and then go to step N;
- Step N. If R_OLS_mean ≥ R_NLS_mean, then for each secondary region and each sampling point location corresponding to it, extract the data value of each independent variable from the optimal independent variable layer set corresponding to that secondary region, and train the linear regression model between these values and the soil dependent variable data values to form the secondary regional prediction model; having obtained the prediction model of every secondary region, go to step O;
- Step O. For each secondary region, apply its secondary regional prediction model to its optimal independent variable layer set to obtain the spatial distribution map of the soil dependent variable in that region; then combine the maps of all secondary regions to form the spatial distribution map of the target soil property in the target area, thereby predicting the target soil property content in the target area.
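The map-production idea in steps L and O can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function name `predict_map`, the nested-list raster layout, and the example coefficients are all hypothetical, and only the linear-model case is shown.

```python
def predict_map(layers, region_ids, models):
    """Apply each region's linear model cell by cell and mosaic the results.

    layers:     {variable name: 2-D list}, standing in for the optimal
                independent variable layer set (hypothetical layout)
    region_ids: 2-D list giving the region id of every raster cell
    models:     {region id: (intercept, {variable name: coefficient})}
    """
    rows, cols = len(region_ids), len(region_ids[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            intercept, coefs = models[region_ids[r][c]]
            out[r][c] = intercept + sum(
                coef * layers[name][r][c] for name, coef in coefs.items()
            )
    return out


# Illustrative values only: two regions, each with its own fitted model.
layers = {"pH": [[6.0, 7.0], [5.0, 8.0]], "slope": [[1.0, 2.0], [3.0, 4.0]]}
region_ids = [["A", "A"], ["B", "B"]]
models = {"A": (1.0, {"pH": 0.5}), "B": (0.0, {"pH": 1.0, "slope": 2.0})}
target_map = predict_map(layers, region_ids, models)
```

With a single region id covering every cell, the same function covers the first-level case of step L.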
- Step H may further be carried out as follows:
- Step H. If there is no secondary region, go to step H-I; if there are secondary regions and R_OLS_mean is greater than both R_OLS and R_NLS, or R_NLS_mean is greater than both R_OLS and R_NLS, go to step H-M;
- Step H-I. If R_OLS ≥ R_NLS, apply the linear regression model of the first-level area from step G to predict and fill in the soil dependent variable data value at each sampling point location in the target area where that value is missing, and then go to step I; if R_NLS > R_OLS, apply the nonlinear regression model of the first-level area from step G to predict and fill in those missing values, and then go to step I;
- Step H-M. If R_OLS_mean ≥ R_NLS_mean, apply the linear regression model of each secondary region from step G to predict and fill in the soil dependent variable data value at each sampling point location in the target area where that value is missing, and then go to step M; if R_NLS_mean > R_OLS_mean, apply the nonlinear regression model of each secondary region from step G to predict and fill in those missing values, and then go to step M.
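The branching among steps H, H-I and H-M can be written as a small decision function. The sketch below is one reading of the text, with two labeled assumptions: "greater than R_OLS, R_NLS" is taken to mean greater than both, and the case where secondary regions exist but neither mean exceeds both first-level coefficients is assumed to fall back to the first-level branch, which the text does not state. The name `choose_branch` is hypothetical.

```python
def choose_branch(r_ols, r_nls, r_ols_mean=None, r_nls_mean=None):
    """Return (scale, model) chosen from the coefficients of determination.

    r_ols_mean and r_nls_mean are None when no secondary region exists.
    """
    best_primary = max(r_ols, r_nls)
    has_secondary = r_ols_mean is not None and r_nls_mean is not None
    if has_secondary and (r_ols_mean > best_primary or r_nls_mean > best_primary):
        # Step H-M: ties go to the linear model (R_OLS_mean >= R_NLS_mean).
        return "secondary", "OLS" if r_ols_mean >= r_nls_mean else "NLS"
    # Step H-I, plus the assumed fallback when the secondary means do not win.
    return "primary", "OLS" if r_ols >= r_nls else "NLS"
```

For example, `choose_branch(0.60, 0.55, 0.50, 0.70)` selects the nonlinear models of the secondary regions, because the mean nonlinear coefficient exceeds both first-level coefficients.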
- Step A includes the following steps:
- Step A1. From the existing soil data in each specified data source, collect, at the preset sampling point locations in the target area, the data values of each preset soil physical and chemical property, including the target soil property, and then go to step A2;
- Step A2. Select the sampling point locations at which the data values of all the soil physical and chemical properties are non-empty, form the first-level area from the smallest circumscribed polygon of these locations, take them as the sampling point locations corresponding to the first-level area, and then go to step A3;
- Step A3. Define the target soil property as the soil dependent variable; the soil physical and chemical properties other than the target soil property constitute the soil independent variable set. Then proceed to step B.
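Step A2 can be illustrated with a short sketch. It assumes that the "smallest circumscribed polygon" is the convex hull of the retained points, which the text does not say explicitly, and it uses Andrew's monotone chain algorithm to build that hull. The record layout and the names `complete_points` and `convex_hull` are hypothetical.

```python
def complete_points(records, properties):
    """Keep the locations whose values for every listed property are non-empty."""
    return [r["xy"] for r in records
            if all(r.get(p) is not None for p in properties)]


def convex_hull(points):
    """Convex hull in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


# Illustrative records: TP (total phosphorus) is the target soil property.
records = [
    {"xy": (0.0, 0.0), "TP": 0.61, "pH": 6.2},
    {"xy": (2.0, 0.0), "TP": 0.55, "pH": 5.9},
    {"xy": (2.0, 2.0), "TP": 0.70, "pH": 6.8},
    {"xy": (0.0, 2.0), "TP": 0.48, "pH": 7.1},
    {"xy": (1.0, 1.0), "TP": 0.52, "pH": 6.5},
    {"xy": (5.0, 5.0), "TP": 0.66, "pH": None},  # dropped: pH is empty
]
points = complete_points(records, ["TP", "pH"])
first_level_area = convex_hull(points)
```

The point at (5.0, 5.0) is excluded before the hull is built, so it does not enlarge the first-level area.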
- Step B includes the following steps:
- Step B1. Obtain the layers, covering the first-level area, of the specified environmental variables related to the soil dependent variable, and then proceed to step B2;
- Step B2. Convert the layers of the specified environmental variables into environmental variable raster layers; if an environmental variable contains at least one band, convert each band of that variable into its own environmental variable raster layer. Then go to step B3;
- Step B3. Resample all environmental variable raster layers by bilinear interpolation so that the raster data share the preset spatial resolution, and then proceed to step B4;
- Step B4. Obtain, on all environmental variable raster layers, the portion corresponding to the first-level area and, for each sampling point location corresponding to the first-level area, extract the data value of each specified environmental variable at that location, and then go to step B5;
- Step B5. Add the specified environmental variables to the soil independent variable set to update it, and then proceed to step C.
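The bilinear resampling of step B3 can be sketched in a few lines. This is an illustration under simplifying assumptions: the raster is a nested list, the output grid is aligned to the corner cells of the input, and georeferencing is ignored, whereas a real GIS workflow would resample by cell centers and map coordinates. The names `bilinear` and `resample` are hypothetical.

```python
def bilinear(grid, row, col):
    """Interpolate a value at fractional (row, col) from the four nearest cells."""
    r0 = min(int(row), len(grid) - 2)
    c0 = min(int(col), len(grid[0]) - 2)
    dr, dc = row - r0, col - c0
    top = grid[r0][c0] * (1 - dc) + grid[r0][c0 + 1] * dc
    bottom = grid[r0 + 1][c0] * (1 - dc) + grid[r0 + 1][c0 + 1] * dc
    return top * (1 - dr) + bottom * dr


def resample(grid, new_rows, new_cols):
    """Resample a raster (at least 2x2) to new_rows x new_cols (each >= 2)."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            bilinear(grid,
                     i * (rows - 1) / (new_rows - 1),
                     j * (cols - 1) / (new_cols - 1))
            for j in range(new_cols)
        ]
        for i in range(new_rows)
    ]


coarse = [[0.0, 10.0], [20.0, 30.0]]
fine = resample(coarse, 3, 3)
```

Once every layer shares one grid, the extraction of step B4 reduces to reading the cell (or calling `bilinear`) at each sampling point location.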
- Step C includes the following steps:
- Step C1. For the sampling point locations corresponding to the first-level area, train a linear regression model between the data values of the soil dependent variable and the data values of the independent variables in the soil independent variable set, and obtain the coefficient of determination R_k^2 of each independent variable in the set, where R_k^2 denotes the coefficient of determination of the k-th independent variable in the soil independent variable set; then go to step C2;
- Step C2. For each independent variable in the soil independent variable set, compute VIF_k = 1 / (1 - R_k^2) to obtain its variance inflation factor, and then go to step C3;
- Step C3. Determine whether the variance inflation factor of every independent variable in the soil independent variable set is less than the preset coefficient threshold. If so, proceed to step C4; otherwise, delete the independent variable with the largest variance inflation factor from the set, update the set, and return to step C1;
- Step C4. For the sampling point locations corresponding to the first-level area, compute the correlation between the soil dependent variable data values and the data values of each independent variable in the soil independent variable set, delete the independent variables whose correlation falls below the preset correlation significance threshold, update the set, and then go to step D.
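The elimination loop of steps C1 to C3 can be sketched as below. The sketch assumes the conventional definition of the variance inflation factor, in which R_k^2 comes from regressing the k-th independent variable on the remaining independent variables; the threshold of 10 is a common rule of thumb rather than a value from the patent, and the helper names are hypothetical. Step C4 (correlation screening) is not shown.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on the columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def drop_collinear(data, threshold=10.0):
    """Repeatedly delete the variable with the largest VIF until all are below threshold."""
    names = list(data)
    while len(names) > 1:
        vifs = {}
        for k in names:
            r2 = r_squared([data[n] for n in names if n != k], data[k])
            vifs[k] = float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)
        worst = max(vifs, key=vifs.get)
        if vifs[worst] < threshold:
            break
        names.remove(worst)
    return names


# Illustrative data: x2 is almost exactly 2 * x1, x3 is only loosely related.
data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8],
    "x3": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0, 8.0, 7.0],
}
kept = drop_collinear(data)
```

One of the two nearly collinear variables is removed and the loosely related one is retained.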
- In step D, after the preset number of iterations, the most frequently selected temporary optimal independent variable set is taken as the candidate optimal independent variable set corresponding to the first-level area, and the following steps are then also performed:
- Step D1. Continue using the stepwise multiple linear regression model to train, for a further preset number of iterations, the linear relationship between the soil dependent variable data values at the sampling point locations and the data values of the independent variables in the soil independent variable set; record the temporary optimal independent variable set selected in each iteration and continue counting the number of times each distinct temporary optimal set is selected. After these iterations, take the most frequently selected temporary optimal independent variable set as a candidate optimal independent variable set corresponding to the first-level area, and then go to step D2;
- Step D2. Determine whether the two most recently obtained candidate optimal independent variable sets corresponding to the first-level area are identical. If so, take that candidate set as the optimal independent variable set corresponding to the first-level area and go to step E; otherwise, return to step D1.
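The vote-and-repeat logic of steps D, D1 and D2 can be sketched independently of the regression itself. In the sketch below, `select_once` is a hypothetical callable standing in for one iteration of stepwise multiple linear regression that returns the variable set it selected; how successive iterations differ (for example by resampling) is not specified in the text, so a fixed repeating sequence is used for illustration. The `max_rounds` guard is also an addition.

```python
from collections import Counter
from itertools import cycle


def stable_optimal_set(select_once, n_iter, max_rounds=100):
    """Run rounds of n_iter selections until the most frequent set repeats."""
    counts = Counter()          # cumulative counts across rounds (step D1)
    previous = None
    for _ in range(max_rounds):
        for _ in range(n_iter):
            counts[frozenset(select_once())] += 1
        candidate = counts.most_common(1)[0][0]
        if candidate == previous:   # step D2: two successive candidates agree
            return candidate
        previous = candidate
    raise RuntimeError("candidate sets did not stabilise")


# Stand-in for stepwise regression: a fixed, repeating sequence of selections.
_picks = cycle([{"pH", "organic_matter"}, {"pH", "organic_matter"}, {"pH"}])
optimal = stable_optimal_set(lambda: next(_picks), n_iter=6)
```

Because the counts accumulate across rounds, later rounds refine the frequencies of earlier ones rather than restarting the tally.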
- Step E includes the following steps:
- Step E1. Obtain the land-use layer and the soil parent material layer covering the first-level area and, for each sampling point location corresponding to the first-level area, extract the land-use division area and the soil parent material division area in which it lies, and then go to step E2;
- Step E2. Based on the soil dependent variable data values at the sampling point locations corresponding to the first-level area, use Duncan's multiple comparison method to obtain the difference results of the soil dependent variable between different land-use division areas and between different soil parent material division areas, and then go to step E3;
- Step E3. If neither the difference results between land-use division areas nor those between soil parent material division areas are greater than the preset significant difference threshold, go to step G; otherwise, go to step E4;
- Step E4. If some difference result between land-use division areas is greater than the preset significant difference threshold while none of the difference results between soil parent material division areas is, merge the land-use division areas whose difference results are not greater than the threshold to form secondary areas, and then go to step E7; otherwise, go to step E5;
- Step E5. If some difference result between soil parent material division areas is greater than the preset significant difference threshold while none of the difference results between land-use division areas is, merge the soil parent material division areas whose difference results are not greater than the threshold to form secondary areas, and then go to step E7; otherwise, go to step E6;
- Step E6. If both the difference results between soil parent material division areas and those between land-use division areas include results greater than the preset significant difference threshold, merge the soil parent material division areas whose difference results are not greater than the threshold to form secondary areas, merge the land-use division areas whose difference results are not greater than the threshold to form secondary areas, and then go to step E7;
- Step E7. Spatially overlay the merged secondary areas and unmerged division areas of the land-use layer with the merged secondary areas and unmerged division areas of the soil parent material layer to obtain the secondary regions, obtain the sampling point locations corresponding to each secondary region from those of the first-level area, and then proceed to step F.
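The merging in steps E4 to E6 and the overlay in step E7 can be sketched once the pairwise test results are available. The sketch takes those results as given (it does not implement Duncan's test) and merges division areas with a union-find structure. Treating "not significantly different" as transitive is an assumption of this sketch, and the names `merge_zones` and `group_of` and all category labels are hypothetical.

```python
def merge_zones(zones, differs):
    """Merge division areas whose pairwise difference is not significant.

    differs: {(zone_a, zone_b): True if the difference is significant}
    Returns sorted groups of zone names.
    """
    parent = {z: z for z in zones}

    def find(z):
        while parent[z] != z:
            parent[z] = parent[parent[z]]   # path halving
            z = parent[z]
        return z

    for (a, b), significant in differs.items():
        if not significant:
            parent[find(a)] = find(b)
    groups = {}
    for z in zones:
        groups.setdefault(find(z), []).append(z)
    return sorted(sorted(g) for g in groups.values())


def group_of(zone, groups):
    """Index of the merged group that contains the given division area."""
    return next(i for i, g in enumerate(groups) if zone in g)


land_groups = merge_zones(
    ["cropland", "forest", "grassland"],
    {("cropland", "forest"): True,
     ("cropland", "grassland"): True,
     ("forest", "grassland"): False},
)
parent_groups = merge_zones(["alluvium", "loess"], {("alluvium", "loess"): False})

# Overlay for one sampling point: its secondary region is the pair of group indices.
point = {"land_use": "forest", "parent_material": "loess"}
region = (group_of(point["land_use"], land_groups),
          group_of(point["parent_material"], parent_groups))
```

Identifying a secondary region by the pair of group indices mirrors the spatial overlay of the two merged layers at a single location.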
- Step G includes the following steps:
- Step G1. Divide the sampling point locations corresponding to the first-level area into training samples, comprising a first preset proportion of the locations, and verification samples, comprising the rest, where the first preset proportion is greater than 50%; then go to step G2;
- Step G2. For the sampling point locations in the training samples, train the linear regression model OLS between the soil dependent variable data values and the data values of the independent variables in the corresponding optimal independent variable set, and proceed to step G3;
- Step G3. Apply the linear regression model OLS to the data values of the independent variables in the optimal independent variable set at each sampling point location in the verification samples to obtain the predicted soil dependent variable data value at each of those locations, and go to step G4;
- Step G4. Calculate the coefficient of determination between the soil dependent variable data values at the sampling point locations in the verification samples and the corresponding predicted values, that is, the coefficient of determination R_OLS of the linear regression model corresponding to the first-level area, and then go to step G5;
- Step G5. For each independent variable in the optimal independent variable set corresponding to the first-level area, fit each of the preset designated functions to the soil dependent variable data values and the data values of that independent variable at the training sample locations, and select the function with the highest prediction accuracy as the nonlinear fitting method for that independent variable. Having obtained the nonlinear fitting method for every independent variable in the optimal independent variable set, proceed to step G6;
- Step G6. Using the nonlinear fitting methods of the independent variables in the optimal independent variable set corresponding to the first-level area, train by nonlinear least squares, on the training sample locations, the nonlinear regression model NLS between the soil dependent variable data values and the data values of the independent variables in the corresponding optimal independent variable set, and go to step G7;
- Step G7. Apply the nonlinear regression model NLS to the data values of the independent variables in the optimal independent variable set at each sampling point location in the verification samples to obtain the predicted soil dependent variable data value at each of those locations, and go to step G8;
- Step G8. Calculate the coefficient of determination between the soil dependent variable data values at the sampling point locations in the verification samples and the corresponding predicted values, that is, the coefficient of determination R_NLS of the nonlinear regression model corresponding to the first-level area, and then go to step G9;
- Step G10. For each secondary region, execute the method of steps G1 to G8 to obtain the coefficients of determination of the linear and nonlinear regression models corresponding to that secondary region; then compute, over all secondary regions, the mean R_OLS_mean of the linear regression model coefficients of determination and the mean R_NLS_mean of the nonlinear regression model coefficients of determination, and go to step H.
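The split, fit and validate pattern of steps G1 to G8 can be sketched for a single independent variable. To stay short, the sketch restricts the "designated functions" to forms that become linear after transforming the predictor (identity, logarithm, square root), so it uses a closed-form line fit instead of an iterative nonlinear least squares solver; a genuine NLS model would need such a solver. The candidate list, the 70% training proportion and all names are assumptions.

```python
import math
import random


def split_samples(samples, train_ratio=0.7, seed=0):
    """Step G1: shuffle, then split into training and verification samples."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


def fit_line(u, y):
    """Least squares intercept and slope of y on u."""
    n = len(u)
    mu, my = sum(u) / n, sum(y) / n
    slope = (sum((a - mu) * (b - my) for a, b in zip(u, y))
             / sum((a - mu) ** 2 for a in u))
    return my - slope * mu, slope


def r2(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


CANDIDATES = {"linear": lambda v: v, "log": math.log, "sqrt": math.sqrt}


def best_function(train):
    """Step G5 analogue: pick the candidate that fits the training samples best."""
    scores = {}
    y = [t for _, t in train]
    for name, f in CANDIDATES.items():
        u = [f(x) for x, _ in train]
        a, b = fit_line(u, y)
        scores[name] = r2(y, [a + b * v for v in u])
    return max(scores, key=scores.get)


def validation_r2(train, valid, name):
    """Fit on the training samples, score on the verification samples."""
    f = CANDIDATES[name]
    a, b = fit_line([f(x) for x, _ in train], [t for _, t in train])
    return r2([t for _, t in valid], [a + b * f(x) for x, _ in valid])


# Synthetic samples that follow a logarithmic relationship exactly.
samples = [(float(x), 2.0 + 3.0 * math.log(x)) for x in range(1, 11)]
train, valid = split_samples(samples)
chosen = best_function(train)
r_ols = validation_r2(train, valid, "linear")   # counterpart of R_OLS
r_nls = validation_r2(train, valid, chosen)     # counterpart of R_NLS
```

Running the same functions on the samples of each secondary region and averaging the two scores would give the counterparts of R_OLS_mean and R_NLS_mean in step G10.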
- Step I includes the following steps:
- Step I1. Based on the sampling point locations corresponding to the first-level area and the corresponding optimal independent variable set, for each soil physical and chemical property in the optimal independent variable set, use the data values of that property and of the specified environmental variables of step B at the sampling point locations to train each designated prediction model with ten-fold cross-validation, and select the model with the highest prediction accuracy as that property's prediction model based on all the specified environmental variables of step B. Having obtained such a model for every soil physical and chemical property in the optimal independent variable set, go to step I2;
- Step I2. Apply the prediction model of each soil physical and chemical property in the optimal independent variable set corresponding to the first-level area to the layers of the specified environmental variables of step B to obtain the spatial distribution prediction layer of each property over the first-level area.
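The model selection of step I1 can be sketched with a plain ten-fold cross-validation loop. The two candidate models (a constant mean and a one-variable linear fit), the interleaved fold assignment, the use of RMSE as the accuracy measure, and the variable names are all assumptions for illustration; the patent does not name its designated prediction models, and it uses all specified environmental variables rather than one.

```python
def k_folds(n, k=10):
    """Assign sample indices to k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def fit_mean(x, y):
    """Baseline model: always predict the training mean."""
    mean = sum(y) / len(y)
    return lambda v: mean


def fit_linear(x, y):
    """One-variable least squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return lambda v: my + slope * (v - mx)


def cv_rmse(fit, x, y, k=10):
    """Root mean squared error over the held-out folds."""
    squared = 0.0
    for fold in k_folds(len(x), k):
        held = set(fold)
        x_train = [v for i, v in enumerate(x) if i not in held]
        y_train = [v for i, v in enumerate(y) if i not in held]
        model = fit(x_train, y_train)
        squared += sum((y[i] - model(x[i])) ** 2 for i in fold)
    return (squared / len(x)) ** 0.5


def select_model(candidates, x, y, k=10):
    """Return the candidate with the lowest cross-validated RMSE, plus all scores."""
    scores = {name: cv_rmse(fit, x, y, k) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Synthetic data: a soil property that depends linearly on one environmental variable.
elevation = [float(i) for i in range(20)]
clay = [1.0 + 2.0 * e for e in elevation]
best, scores = select_model({"mean": fit_mean, "linear": fit_linear}, elevation, clay)
```

The selected model would then be applied cell by cell to the environmental variable layers to produce the prediction layer of step I2.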
- the method for predicting the content of target soil properties based on the soil transfer function of the present invention has the following technical effects:
- The proposed soil transfer function at the sampling-point scale makes full use of existing geographic element information to address the low prediction accuracy of target soil property content. It can directly serve the supplementation and updating of national natural resource survey and monitoring data, and can also provide data supplementation for dynamic ecological models and for simulations of surface processes. In particular, the dynamic screening mechanism for environmental variables in the prediction process overcomes the limitations of traditional prediction techniques and realizes a universal soil property prediction technique of "limited resources, multi-source applications", which has broad industrial application prospects in agriculture, land and resources, and other sectors;
- The proposed partitioning mechanism for the spatial heterogeneity of the target soil property content allows the variables and prediction functions used in fitting the soil transfer function to be measured more accurately. The procedure for dynamically selecting the optimal set of independent variables can both quantify the uncertainty involved in the production process and determine, as far as possible, the optimal set of independent variables required by the soil transfer function, thereby significantly improving the universality and robustness of the present invention;
- The target soil property content prediction method based on the soil transfer function designed by the present invention differs from the traditional sample-oriented soil transfer function: the proposed technical process covers both the determination of soil physical and chemical properties and the mapping of soil maps. It improves the soil transfer function at the sampling-point scale, optimizes the compatibility of the function parameters with soil mapping, and then scales the fitted function to different study areas to produce soil maps at the scale of the coverage area. The technique makes full use of the advantages of existing geographic information systems and can provide urgently needed soil map products to more application departments.
- Figure 1 is a flow chart of the steps of the method for predicting the content of target soil properties based on the soil transfer function designed by the present invention
- Figure 2 is a schematic diagram of the construction of the original soil data set and the core soil data set in the present invention
- Fig. 3 is a schematic diagram of the secondary area division including 2 types of land use in the present invention.
- Fig. 4 is a schematic diagram of the division of the secondary region including 3 types of soil-forming parent materials in the present invention.
- Fig. 5 is a schematic diagram of the secondary area division including 2 types of land use and 3 types of soil-forming parent material in the present invention
- Figure 6 is a schematic diagram of the grid environment variable layer and soil sample points in the present invention.
- FIG. 7 shows the spatial distribution of secondary area layers and sampling points based on two types of soil-forming parent materials in an embodiment of the present invention
- Fig. 8 is an elevation layer covering a first-level area in an embodiment of the present invention.
- Fig. 9 is an annual average rainfall layer covering a first-level area in an embodiment of the present invention.
- Figure 10 is a spatial distribution diagram of the effective zinc content in the optimal set of independent variables predicted to be generated in an embodiment of the present invention
- Fig. 11 is a spatial distribution diagram of predicted and generated soil dependent variables in an embodiment of the present invention.
- the present invention designs a target soil property content prediction method based on the soil transfer function.
- The basic idea is as follows: on the basis of collecting multi-source soil data sets and environmental variables, extract the data set containing all the measurement information; divide secondary regions according to the spatial heterogeneity of the soil property; screen the optimal set of independent variables of the soil transfer function in each region; and then fit linear and nonlinear soil transfer functions for each region. By comparing the accuracy of the different partitions and the different functions, the best sample-oriented soil transfer function is selected to supplement the soil sample database.
- the present invention designs a target soil property content prediction method based on a soil transfer function, which is used to achieve the prediction of the target soil property content in the target area.
- Step A Based on the existing soil data, select the sampling point locations in the target area at which the data values of all the preset soil physical and chemical properties are non-empty, form a first-level area from the minimum bounding polygon of these sampling point locations, and take them as the sampling point locations corresponding to the first-level area. The preset soil physical and chemical properties include the target soil property. The target soil property is defined as the soil dependent variable, and the remaining soil physical and chemical properties constitute the set of soil independent variables; then proceed to step B.
- step A specifically executes the following steps A1 to A3.
- Step A1 From the existing soil data in each specified data source, collect, for each preset sampling point location in the target area, the data values of the preset soil physical and chemical properties, including the target soil property, as shown in Figure 2, where S_1, S_2, S_3, S_4 and S_5 are the soil physical and chemical properties; then go to step A2.
- The data values of all the preset soil physical and chemical properties are taken at the same soil depth, for example the values for the 0-1 m soil profile or the values for the 0-20 cm soil profile.
- Step A2 Select the sampling point locations at which the data values of all the soil physical and chemical properties are non-empty, form a first-level area from the minimum bounding polygon of these sampling point locations, and take them as the sampling point locations corresponding to the first-level area; then go to step A3.
- Step A3. Define the target soil property as the soil dependent variable; the soil physical and chemical properties other than the target soil property constitute the set of soil independent variables; then proceed to step B.
- Step B Obtain the layers, covering the first-level area, of the specified environmental variables related to the soil dependent variable; for each sampling point location corresponding to the first-level area, extract the data value of each specified environmental variable at that location; add the specified environmental variables to the soil independent variable set to update it; then go to step C.
- step B specifically executes the following steps B1 to B5.
- Step B1. Obtain the layers, covering the first-level area, of the specified environmental variables related to the soil dependent variable, and then proceed to step B2.
- These environmental variables affect the formation and evolution of soil to some extent; examples of selected environmental variable layers are shown in Table 1 below.
- An environmental variable layer can be in a vector format (Shapefile) or a raster format (such as TIFF or Grid).
- Step B2. Convert the layers of the specified environmental variables into environmental variable raster layers; if an environmental variable contains one or more bands, each band is converted into its own environmental variable raster layer. Then go to step B3.
- Step B3. Resample all environmental variable raster layers using bilinear interpolation to unify the spatial resolution of the raster data to the preset spatial resolution, and then proceed to step B4. For example, if each raster cell covers 100 m × 100 m, the spatial resolution is 100 m. The higher the raster resolution, the more spatial detail of the represented features is expressed.
- The raster resampling method is not limited to bilinear interpolation; the nearest-neighbor method and cubic convolution interpolation can also be used.
- Step B4 Obtain, on all environmental variable raster layers, the area corresponding to the first-level area, and for each sampling point location corresponding to the first-level area extract the data value of each specified environmental variable at that location; then go to step B5.
- Step B5. Add the specified environmental variables to the soil independent variable set to update the soil independent variable set, and then proceed to step C.
- Step C Delete from the soil independent variable set the independent variables that cause multicollinearity and the independent variables whose correlation with the soil dependent variable is below the preset significance threshold, thereby updating the soil independent variable set; then proceed to step D.
- step C specifically executes the following steps C1 to C4.
- Step C1 For the sampling point locations corresponding to the first-level region, train a linear regression model between the data value of the soil dependent variable and the data values of the independent variables in the soil independent variable set, and obtain the coefficient of determination R_k^2 of each independent variable in the set, where R_k^2 denotes the coefficient of determination of the k-th independent variable in the soil independent variable set; then go to step C2.
- Step C2. For each independent variable in the soil independent variable set, compute its variance inflation factor as VIF_k = 1 / (1 - R_k^2), and then go to step C3.
- Step C3. Determine whether the variance inflation factor of every independent variable in the soil independent variable set is less than the preset coefficient threshold (for example, 5). If so, go to step C4; otherwise, delete the independent variable with the largest variance inflation factor from the soil independent variable set, update the set, and return to step C1.
- Step C4 For the sampling point locations corresponding to the first-level region, calculate the correlation between the data value of the soil dependent variable and the data value of each independent variable in the soil independent variable set, delete from the set every independent variable whose correlation is below the preset significance threshold, update the set, and then go to step D.
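The multicollinearity screening of steps C1 to C3 can be illustrated with a minimal NumPy sketch. It assumes the standard definition of the variance inflation factor, VIF_k = 1 / (1 - R_k^2), where R_k^2 comes from regressing the k-th independent variable on the remaining independent variables; the function names `r_squared`, `vif` and `prune_by_vif` are hypothetical and not taken from the patent.

```python
import numpy as np

def r_squared(y, X):
    """Coefficient of determination of an OLS fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / ss_tot

def vif(X):
    """VIF_k = 1 / (1 - R_k^2), regressing column k on all other columns."""
    out = []
    for k in range(X.shape[1]):
        others = np.delete(X, k, axis=1)
        r2 = r_squared(X[:, k], others)
        out.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(out)

def prune_by_vif(X, names, threshold=5.0):
    """Drop the variable with the largest VIF until every VIF is below threshold."""
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] < threshold:
            break  # corresponds to "go to step C4"
        X = np.delete(X, worst, axis=1)  # corresponds to "return to step C1"
        names.pop(worst)
    return X, names
```

The threshold of 5 follows the example value given in step C3. The correlation screening of step C4 is not covered by this sketch.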
- Step D For the sampling point locations corresponding to the first-level area, use a stepwise multiple linear regression model and, over a preset number of iterations (for example, 100), train the linear relationship between the soil dependent variable data values at the sampling point locations and the data values of the independent variables in the soil independent variable set. Record the temporary optimal set of independent variables selected in each training iteration and count how many times each distinct temporary optimal set is selected. After the preset number of iterations is completed, take the temporary optimal set selected most often as the optimal set of independent variables corresponding to the first-level region; then further execute the following steps D1 to D2.
- Step D1 Continue to use the stepwise multiple linear regression model and, over a preset number of additional iterations (for example, 50), continue training the linear relationship between the soil dependent variable data values at the sampling point locations and the data values of the independent variables in the soil independent variable set. Record the temporary optimal set of independent variables selected in each training iteration and continue counting how many times each distinct temporary optimal set is selected. After the additional iterations are completed, take the temporary optimal set selected most often as the candidate optimal set of independent variables corresponding to the first-level region, and then go to step D2.
- Step D2. Determine whether the two most recently obtained candidate optimal sets of independent variables corresponding to the first-level region are identical. If so, take that candidate set as the optimal set of independent variables corresponding to the first-level region and go to step E; otherwise, return to step D1.
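The counting-and-convergence logic of steps D, D1 and D2 can be sketched as follows. This is only an illustration: `select_once` is a hypothetical callback standing for one run of stepwise multiple linear regression that returns the selected variable names (the text does not say how the runs differ from one another), `max_rounds` is an added safety cap that is not part of the described method, and the first consistency check is assumed to compare the result of step D with the first result of step D1.

```python
from collections import Counter

def most_selected(select_once, n_iter, counts=None):
    """Run the selector n_iter times and return the most frequently chosen set."""
    counts = Counter() if counts is None else counts
    for _ in range(n_iter):
        counts[frozenset(select_once())] += 1
    best, _ = counts.most_common(1)[0]
    return best, counts

def stable_optimal_set(select_once, first=100, increment=50, max_rounds=20):
    """Step D (first iterations), then repeat D1/D2 until two results agree."""
    best, counts = most_selected(select_once, first)
    for _ in range(max_rounds):
        candidate, counts = most_selected(select_once, increment, counts)
        if candidate == best:  # step D2: the two latest candidates are identical
            return best
        best = candidate       # otherwise return to step D1
    return best
```

The default values 100 and 50 mirror the example iteration counts given in the text; the selection counts are accumulated across rounds, as the phrase "continue to count" suggests.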
- Step E Obtain the soil area division layer, for the preset attribute, covering the first-level area; for each sampling point location corresponding to the first-level area, extract the soil division area under the preset attribute in which the location lies; and, based on the soil dependent variable data values at the sampling point locations, analyze the differences in the soil dependent variable between the different soil division areas. If no difference result is greater than the preset significant difference threshold, go to step G. If any difference result is greater than the preset significant difference threshold, merge the soil division areas between which the difference result is not greater than the threshold; the merged areas, together with the unmerged soil division areas, form the secondary regions. Then, based on the sampling point locations corresponding to the first-level region, obtain the sampling point locations corresponding to each secondary region, and go to step F.
- step E specifically executes the following steps E1 to E7.
- Step E1 Obtain the land use layer and the soil-forming parent material layer covering the first-level area, and for each sampling point location corresponding to the first-level area extract the land use division area and the soil-forming parent material division area in which the location lies; then go to step E2.
- Step E2 Based on the soil dependent variable data values at the sampling point locations corresponding to the first-level area, use Duncan's multiple comparison method to analyze the differences in the soil dependent variable between the different land use division areas and between the different soil-forming parent material division areas; then go to step E3.
- Step E3 If the difference results of the corresponding soil dependent variables between different land use division areas and the difference results of the corresponding soil dependent variables between different soil parent material division areas are not greater than the preset significant difference threshold, proceed to step G; otherwise, go to step E4.
- Step E4 If, among the difference results of the soil dependent variable between the different land use division areas, there is a result greater than the preset significant difference threshold, while none of the difference results between the different soil-forming parent material division areas is greater than the threshold, merge the land use division areas between which the difference result is not greater than the threshold to form secondary areas, and then proceed to step E7; otherwise, proceed to step E5.
- Step E5. If, among the difference results of the soil dependent variable between the different soil-forming parent material division areas, there is a result greater than the preset significant difference threshold, while none of the difference results between the different land use division areas is greater than the threshold, merge the soil-forming parent material division areas between which the difference result is not greater than the threshold to form secondary areas, and then go to step E7; otherwise, go to step E6.
- Step E6 If there are results greater than the preset significant difference threshold both among the difference results between the different soil-forming parent material division areas and among the difference results between the different land use division areas, merge the soil-forming parent material division areas between which the difference result is not greater than the threshold to form secondary areas, and likewise merge the land use division areas between which the difference result is not greater than the threshold to form secondary areas; then go to step E7.
- Step E7 Spatially overlay the secondary areas obtained by merging in the land use layer and the unmerged land use division areas with the secondary areas obtained by merging in the soil-forming parent material layer and the unmerged soil-forming parent material division areas, to obtain the final secondary areas; then, based on the sampling point locations corresponding to the first-level area, obtain the sampling point locations corresponding to each secondary area, and proceed to step F.
- The division into secondary areas for 2 land use types is shown in Figure 3; for 3 soil-forming parent material types, in Figure 4; and for 2 land use types combined with 3 soil-forming parent material types, in Figure 5.
- Step F For each secondary region, use the method of step D to obtain the optimal set of independent variables corresponding to each secondary region, and go to step G.
- Step G For the sampling point locations corresponding to the first-level region, train a linear regression model and a nonlinear regression model between the soil dependent variable data values and the data values of the independent variables in the corresponding optimal independent variable set, and obtain the coefficient of determination of each model, namely the coefficient of determination R_OLS of the linear regression model and the coefficient of determination R_NLS of the nonlinear regression model corresponding to the first-level region.
- If there are no secondary regions, go directly to step H. If there are secondary regions, then for each secondary region train a linear regression model and a nonlinear regression model between the soil dependent variable data values at its sampling point locations and the data values of the independent variables in its optimal independent variable set, and obtain the coefficients of determination of the two models for each secondary region. From these, obtain the mean R_OLS_mean of the linear regression model coefficients of determination and the mean R_NLS_mean of the nonlinear regression model coefficients of determination over all secondary regions; then go to step H.
- step G specifically executes the following steps G1 to G10.
- Step G1 From the sampling point locations corresponding to the first-level area, take a first preset proportion of the locations as the training sample and the remaining locations as the verification sample, and then go to step G2. The first preset proportion is greater than 50%, for example 75%.
- Step G2 For each sampling point position in the training sample, train the linear regression model OLS between the data value of the soil dependent variable and the data value of the respective variable in the corresponding optimal independent variable set, and proceed to step G3.
- Step G3 According to the data value of each variable in the corresponding optimal independent variable set corresponding to each sampling point position in the verification sample, apply the linear regression model OLS to obtain the predicted data value of the soil dependent variable corresponding to each sampling point position in the verification sample, and Go to step G4.
- Step G4 Calculate the coefficient of determination between the data value of the soil dependent variable corresponding to the location of each sampling point in the verification sample and the predicted data value of the corresponding soil dependent variable, that is, the coefficient of determination R_OLS of the linear regression model corresponding to the first-level region, and then enter Step G5.
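Steps G1 to G4 amount to a hold-out evaluation of an ordinary least squares model. The sketch below is one possible reading under stated assumptions: the split is random (the text does not say how the training proportion is drawn), the coefficient of determination is computed as 1 - SS_res / SS_tot on the verification sample, and all function names are hypothetical.

```python
import numpy as np

def split_indices(n, train_fraction=0.75, seed=0):
    """Randomly split n sampling points into training and verification indices."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return order[:n_train], order[n_train:]

def fit_ols(X, y):
    """Least-squares coefficients; the first entry is the intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def coefficient_of_determination(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def validation_r2(X, y, train_fraction=0.75, seed=0):
    """R_OLS: fit on the training sample, score on the verification sample."""
    train, val = split_indices(len(y), train_fraction, seed)
    coef = fit_ols(X[train], y[train])
    return coefficient_of_determination(y[val], predict_ols(coef, X[val]))
```

Here `X` is a two-dimensional array with one column per variable in the optimal independent variable set and `y` holds the soil dependent variable values.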
- Step G5. For each independent variable in the optimal set of independent variables corresponding to the first-level region, fit each preset designated function (for example, a power function, exponential function, hyperbolic function, and logarithmic function) to the soil dependent variable data values and the corresponding independent variable data values at the sampling point locations in the training sample, and select the function with the highest prediction accuracy as the nonlinear fitting form for that independent variable. In this way the nonlinear fitting form of each independent variable in the optimal set is obtained; then go to step G6.
- Step G6 According to the nonlinear fitting form of each independent variable in the optimal independent variable set corresponding to the first-level region, use the nonlinear least squares method on the sampling point locations in the training sample to train the nonlinear regression model NLS between the soil dependent variable data values and the data values of the independent variables in the corresponding optimal independent variable set, and go to step G7.
- Step G7 According to the data value of each variable in the corresponding optimal independent variable set corresponding to each sampling point position in the verification sample, apply the non-linear regression model NLS to obtain the predicted data value of the soil dependent variable corresponding to each sampling point position in the verification sample. And go to step G8.
- Step G8 Calculate the determination coefficient between the soil dependent variable data value corresponding to each sampling point position in the verification sample and the corresponding soil dependent variable predicted data value, that is, the nonlinear regression model determination coefficient R_NLS corresponding to the first-level region, and then Go to step G9.
- Step G10 For each secondary region, execute the method from step G1 to step G8 to obtain the linear regression model determination coefficient and the nonlinear regression model determination coefficient corresponding to each secondary region; and further obtain all the secondary regions The mean value R_OLS_mean of the corresponding linear regression model determination coefficient and the mean value R_NLS_mean of the nonlinear regression model determination coefficient, and then go to step H.
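The per-variable choice of a nonlinear form in step G5 can be illustrated with a small sketch. It is a simplification under explicit assumptions: each candidate form is fitted by linearized least squares on transformed data rather than by the nonlinear least squares of step G6, the accuracy measure is the coefficient of determination on the data passed in, both x and y must be strictly positive, and the parameterizations and names (`FORMS`, `best_form`) are hypothetical.

```python
import numpy as np

def _r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Each form: (transform of x, transform of y, prediction from slope b and intercept a).
FORMS = {
    "power":       (np.log, np.log, lambda x, b, a: np.exp(a) * x ** b),
    "exponential": (lambda x: x, np.log, lambda x, b, a: np.exp(a + b * x)),
    "logarithmic": (np.log, lambda y: y, lambda x, b, a: a + b * np.log(x)),
    "hyperbolic":  (lambda x: 1.0 / x, lambda y: y, lambda x, b, a: a + b / x),
}

def best_form(x, y):
    """Fit every candidate form to one variable and return the best-scoring one."""
    scores = {}
    for name, (tx, ty, predict) in FORMS.items():
        slope, intercept = np.polyfit(tx(x), ty(y), 1)  # straight line in transformed space
        scores[name] = _r2(y, predict(x, slope, intercept))
    return max(scores, key=scores.get), scores
```

Calling `best_form` once per independent variable yields the fitting forms that step G6 then combines into the nonlinear regression model.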
- Step H If there is no secondary area, go to step H-I; if there are secondary areas, go to step H-M.
- Step H-I If R_OLS ≥ R_NLS, apply the linear regression model of the first-level region from step G to predict and supplement the soil dependent variable data value at each sampling point location in the target region where it is missing, and then proceed to step I; if R_NLS > R_OLS, apply the nonlinear regression model of the first-level region from step G to predict and supplement the soil dependent variable data value at each sampling point location in the target region where it is missing, and then go to step I.
- Step H-M If R_OLS_mean ≥ R_NLS_mean, apply the linear regression model of each secondary region from step G to predict and supplement the soil dependent variable data value at each sampling point location in the target region where it is missing, and then go to step M; if R_NLS_mean > R_OLS_mean, apply the nonlinear regression model of each secondary region from step G to predict and supplement the soil dependent variable data value at each sampling point location in the target region where it is missing, and then go to step M.
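The branching in step H can be written as a small decision rule. The sketch assumes that the linear model is chosen when its coefficient of determination is greater than or equal to that of the nonlinear model, which is the complement of the stated condition R_NLS_mean > R_OLS_mean; the function names are hypothetical.

```python
from statistics import mean

def choose_model(r_ols, r_nls):
    """Return 'linear' when R_OLS >= R_NLS, otherwise 'nonlinear'."""
    return "linear" if r_ols >= r_nls else "nonlinear"

def choose_for_regions(region_scores):
    """region_scores: list of (R_OLS, R_NLS) pairs, one per secondary region."""
    if not region_scores:
        raise ValueError("at least one secondary region is required")
    r_ols_mean = mean(r_ols for r_ols, _ in region_scores)
    r_nls_mean = mean(r_nls for _, r_nls in region_scores)
    return choose_model(r_ols_mean, r_nls_mean)
```

With no secondary regions, `choose_model` is applied to the first-level values R_OLS and R_NLS; otherwise `choose_for_regions` compares the means over all secondary regions, so one model family is used for every secondary region.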
- Step I Based on the sampling point locations corresponding to the first-level area and the corresponding optimal set of independent variables, use the data values of the soil physical and chemical properties in the optimal set at the sampling point locations to obtain, for each soil physical and chemical property in the optimal set, a prediction model based on all the environmental variables specified in step B. Then apply these models to the layers of the environmental variables specified in step B to obtain, for each soil physical and chemical property in the optimal set, its spatial distribution prediction layer over the first-level region, and go to step J.
- step I specifically executes the following steps I1 to I2.
- Step I1 Based on the sampling point locations corresponding to the first-level area and the corresponding optimal set of independent variables, for each soil physical and chemical property in the optimal set of independent variables, train each designated prediction model with ten-fold cross-validation on the data values of that property at the sampling point locations and the data values of the environmental variables specified in step B. The designated prediction models include, for example, geographically weighted regression, ordinary kriging, regression kriging, artificial neural networks, and boosted regression trees. Select the prediction model with the highest prediction accuracy as the prediction model of that soil physical and chemical property based on all the environmental variables specified in step B. In this way, a prediction model based on all the environmental variables specified in step B is obtained for each soil physical and chemical property in the optimal set of independent variables, and then go to step I2.
- Step I2 Apply the prediction model obtained for each soil physical and chemical property in the optimal set of independent variables corresponding to the first-level area, based on all the environmental variables specified in step B, to the layers of those environmental variables, to obtain for each soil physical and chemical property in the optimal set of independent variables its spatial distribution prediction layer over the first-level region.
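The model selection by ten-fold cross-validation in step I1 can be sketched generically. The two candidate models below (a mean baseline and a linear model) are simple stand-ins chosen to keep the example self-contained; they are not the geographically weighted regression, kriging, neural network or boosted tree models named in the text. Using the cross-validated RMSE as the accuracy measure is likewise an assumption, and all names are hypothetical.

```python
import numpy as np

def k_fold_rmse(fit, X, y, k=10, seed=0):
    """Cross-validated RMSE; fit(X, y) must return a predict(X_new) function."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    sq_err = 0.0
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        predict = fit(X[train], y[train])
        sq_err += np.sum((y[test] - predict(X[test])) ** 2)
    return float(np.sqrt(sq_err / len(y)))

def fit_mean(X, y):
    """Baseline stand-in: always predict the training mean."""
    m = y.mean()
    return lambda X_new: np.full(len(X_new), m)

def fit_linear(X, y):
    """Stand-in model: ordinary least squares on the environmental variables."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: coef[0] + X_new @ coef[1:]

def select_model(candidates, X, y, k=10):
    """Return the candidate name with the lowest cross-validated RMSE."""
    scores = {name: k_fold_rmse(fit, X, y, k) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores
```

In this reading, `X` holds the environmental variable values at the sampling points and `y` the values of one soil physical and chemical property; the selection is repeated for each property in the optimal independent variable set.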
- Step J Combine the spatial distribution prediction layer, over the first-level region, of each soil physical and chemical property in the optimal independent variable set with the layer, over the first-level region, of each environmental variable in the optimal independent variable set, to form the optimal independent variable layer set corresponding to the first-level region, and then go to step K.
- Step K If R_OLS ≥ R_NLS, then for the sampling point locations corresponding to the first-level area, extract the independent variable data values from the optimal independent variable layer set corresponding to the first-level area, and train the linear regression model between them and the soil dependent variable data values to constitute the first-level regional prediction model, and go to step L;
- Step L According to the optimal independent variable layer set corresponding to the first-level area, apply the first-level regional prediction model to obtain the spatial distribution map of the soil dependent variable, that is, the spatial distribution map of the target soil properties in the target area, to achieve the target soil properties in the target area Prediction of content.
- Step M For each secondary region, use the method from step I to step J to obtain the optimal independent variable layer set corresponding to each secondary region, and then go to step N.
- Step N If R_OLS_mean ≥ R_NLS_mean, then for each secondary region, extract, for the sampling point locations corresponding to that secondary region, the independent variable data values from the optimal independent variable layer set corresponding to the secondary region, and train the linear regression model between them and the soil dependent variable data values to constitute the secondary region prediction model; once the prediction model of every secondary region has been obtained, go to step O;
- Step O For each secondary region, apply the secondary region prediction model to the optimal independent variable layer set corresponding to that secondary region to obtain the spatial distribution map of the soil dependent variable in the secondary region. Then combine the spatial distribution maps of the soil dependent variable in all secondary regions to form the spatial distribution map of the target soil property in the target region, thereby achieving the prediction of the target soil property content in the target region.
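The final combination in step O can be pictured as a raster mosaic. The sketch assumes that the secondary regions are encoded as integer identifiers in a raster aligned with the prediction maps and that cells outside every region are left as no-data; the function name `mosaic` and this encoding are illustrative assumptions.

```python
import numpy as np

def mosaic(region_ids, region_maps, nodata=np.nan):
    """Combine per-region prediction rasters into one map.

    region_ids  : 2-D integer array giving the secondary region of each cell.
    region_maps : dict mapping region id -> 2-D prediction array of the same shape.
    """
    out = np.full(region_ids.shape, nodata, dtype=float)
    for rid, pred in region_maps.items():
        mask = region_ids == rid
        out[mask] = pred[mask]  # keep each model's prediction only inside its own region
    return out
```

For example, with two secondary regions labeled 1 and 2, each cell of the output takes its value from the prediction map of the region it belongs to.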
- Xuancheng is a prefecture-level city in Anhui province. It is located in the southeast of Anhui province. It is a central city in the region where Anhui, Jiangsu and Zhejiang meet.
- Soil copper is relevant to the remediation of soil heavy metal pollution, and soil available copper is also an essential trace element for crop growth and development, so techniques for predicting soil available copper content have long attracted attention.
- Step A1 From the existing soil data in each specified data source, for each preset sampling point location in the target area and based on a 20 cm soil sample collection depth, collect the data values of the soil's effective copper content, organic matter content, effective phosphorus content, effective potassium content, effective iron content, effective manganese content, effective zinc content, pH, and total nitrogen content.
- Step A2 Select each sampling point location that meets the requirement that the data values corresponding to the physical and chemical properties of the soil are non-empty.
- there are 383 sampling point locations and the smallest circumscribed polygon of the 383 sampling point locations is used, A first-level area is formed, and each sampling point position is taken as the sampling point position corresponding to the first-level area, and then step A3 is entered.
- the soil dependent variable here is the effective copper content.
- Organic matter content, effective phosphorus content, effective potassium content, effective iron content, effective manganese content, effective zinc content, pH, and total nitrogen content constitute a set of soil independent variables, and then proceed to step B.
- Step B1. Obtain layers of the specified environmental variables that cover the first-level region and are related to the soil dependent variable, including elevation (DEM), slope (Slope), profile curvature (ProCur), plan curvature (PlanCur), topographic wetness index (TWI), mean annual rainfall (MAP), mean annual temperature (MAT), mean annual soil temperature (SoilTemp), mean annual sunshine (Solar), normalized difference vegetation index (NDVI) and net primary productivity (NPP). For example, the elevation layer covering the first-level region is shown in Figure 8, and the mean annual rainfall layer covering the first-level region is shown in Figure 9.
- Then, with the spatial resolution set to 500 m, execute steps B2 to B4 and then step B5: add the specified environmental variables to the soil independent variable set, updating it to {SOM, AP, AK, AFe, AMn, AZn, pH, TN, DEM, Slope, ProCur, PlanCur, TWI, MAP, MAT, SoilTemp, Solar, NDVI, NPP}, then go to step C.
- Step C. Delete from the soil independent variable set the variables that cause multicollinearity, as well as the variables whose correlation with the soil dependent variable (available copper content) is below the preset correlation significance threshold. The updated soil independent variable set is {AP, AK, AFe, AZn, pH, TN, DEM, Slope, ProCur, MAP, Solar, NDVI}, then go to step D.
- Step D. Execute step D as described above to obtain the optimal independent variable set {AZn, pH, TN, DEM, MAP} of the first-level region, then go to step E.
- In step E, the following steps E1 to E7 are executed.
- Step E1. Obtain the land-use layer and the soil parent material layer covering the first-level region and, for each sampling point of the first-level region, extract the land-use zone and the soil parent material zone in which the point lies, then go to step E2.
- Step E2. Based on the soil dependent variable data values at the sampling points of the first-level region, use Duncan's multiple comparison method to obtain the differences in the soil dependent variable between land-use zones and between soil parent material zones, then go to step E3.
- The analysis shows that, for the available copper content at the 383 sampling points of this embodiment, some differences in available copper content between soil parent material zones exceed the preset significant-difference threshold, while none of the differences between land-use zones does.
- The soil parent material types are grouped into two classes: class a (weathering products of calcareous sedimentary rock and the corresponding metamorphic rock) and class b (weathering products of light-colored crystalline rock, weathering products of clastic sedimentary rock and the corresponding metamorphic rock, and loess). The area covered by each merged class is one secondary region; the secondary regions based on the two parent material classes are shown in Figure 7. Then go to step E7.
- Class a: weathering products of calcareous sedimentary rock and the corresponding metamorphic rock
- Class b: weathering products of light-colored crystalline rock
- Step E7. Spatially overlay the merged secondary regions and the unmerged land-use zones from the land-use layer with the merged secondary regions and the unmerged soil parent material zones from the soil parent material layer to obtain the secondary regions; from the sampling points of the first-level region, obtain the sampling points of each secondary region, then go to step F.
- Step F. For each secondary region, apply the method of step D to obtain the optimal independent variable set of that secondary region, then go to step G.
- Step G is carried out by executing the following steps G1 to G10.
- Step G1. Of the sampling points of the first-level region, assign 75% to the training sample and the remaining 25% to the validation sample, then go to step G2.
- Step G2. For the sampling points in the training sample, train the linear regression model OLS between the soil dependent variable data values and the data values of the variables in the optimal independent variable set {AZn, pH, TN, DEM, MAP}, giving: ACu = -5.453 + 0.802×AZn + 0.615×pH + 0.712×TN - 0.00552×DEM + 0.00232×MAP
- Step G5. For each independent variable in the optimal independent variable set of the first-level region, fit power, exponential, hyperbolic and logarithmic functions to the soil dependent variable data values and the data values of that independent variable at the sampling points in the training sample. Select the function with the highest prediction accuracy as the nonlinear fitting form for that independent variable, and in this way obtain the nonlinear fitting form for every variable in the optimal set, then go to step G6.
- Step G10. For each secondary region, execute steps G1 to G8 to obtain the linear and nonlinear regression coefficients of determination of that region; then compute over all secondary regions the mean linear regression coefficient of determination, R_OLS_mean = 0.45, and the mean nonlinear regression coefficient of determination, R_NLS_mean = 0.37, then go to step H.
- Based on step H and step H-I, since R_OLS ≥ R_NLS, R_OLS ≥ R_OLS_mean and R_OLS ≥ R_NLS_mean, the linear regression model of the first-level region from step G is applied to predict and fill in the soil dependent variable value at each sampling point in the target region where it is missing, then go to step I.
- Step I is completed by executing steps I1 to I2 as described above, giving the predicted spatial distribution layer over the first-level region of each soil physical and chemical property in the optimal independent variable set of the first-level region. For example, the predicted available zinc content layer covering the first-level region is shown in Figure 10. Then go to step J.
- Step J. Merge the predicted spatial distribution layers of the soil physical and chemical properties in the optimal independent variable set of the first-level region with the first-level-region layers of the environmental variables in that set to form the optimal independent variable layer set of the first-level region, then go to step K.
- Step K. Since R_OLS ≥ R_NLS, for each sampling point of the first-level region extract the data values of the variables from the optimal independent variable layer set of the first-level region and train a linear regression model between them and the soil dependent variable data values, giving: ACu = -3.459 + 1.249×AZn + 0.939×pH + 0.127×TN - 0.00509×DEM - 0.000522×MAP. This model is the first-level regional prediction model; then go to step L.
- Step L. Using the optimal independent variable layer set of the first-level region, apply the first-level regional prediction model to obtain the spatial distribution map of the soil dependent variable, that is, the spatial distribution map of available copper content in the target region, as shown in Figure 11, thereby predicting the target soil property content in the target region.
- The target soil property content prediction method based on a soil transfer function designed in the present invention takes environmental variables and uncertainty analysis into account in sample-point-oriented soil data prediction, and integrates the soil transfer function with provisionally produced soil maps in region-oriented digital soil mapping.
- This avoids the uncertainty in sample-point prediction and regional mapping of soil data, and effectively addresses the technical bottleneck of traditional digital soil mapping, in which the low correlation between environmental variables and soil physical and chemical properties leads to soil maps of low accuracy.
- The method of the present invention is readily transferable: it can be applied to the production of soil maps at different scales and for different soil physical and chemical properties, and also to the completion of soil databases of different sizes.
- The proposed technique remains to be applied in more technical fields in order to test its performance.
Landscapes
- Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Theoretical Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Computing Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Crystallography & Structural Chemistry (AREA)
- Bioinformatics & Computational Biology (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Environmental & Geological Engineering (AREA)
- Immunology (AREA)
- Pathology (AREA)
- Biochemistry (AREA)
- Analytical Chemistry (AREA)
- Medicinal Chemistry (AREA)
- Food Science & Technology (AREA)
- Remote Sensing (AREA)
- Geology (AREA)
- General Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
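The present invention relates to a method for predicting target soil property content based on a soil transfer function. Multi-source soil datasets and environmental variables are collected, and a dataset containing complete measurement information is delineated; secondary regions are delineated according to the spatial heterogeneity of the soil property; the optimal independent variable set of the soil transfer function is screened for each region; and linear and nonlinear soil transfer functions are then fitted for each region. By comparing the accuracy of the different regions and functions, the best sample-point-oriented soil transfer function is selected and used to complete the soil sample-point database. In addition, machine learning is used to construct regional layers of the soil independent variables, a region-oriented soil transfer function is built, and a spatial distribution map of the target soil property content in the target region is produced. The target soil property content can thus be predicted efficiently and accurately, improving work efficiency.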
Description
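The present invention relates to a method for predicting target soil property content based on a soil transfer function, and belongs to the technical field of pedometrics.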
作为人类生存环境的重要载体,社会经济的发展与土壤存在着紧密的联系。自20世纪以来,全球工业化的快速发展导致了土壤重金属含量的急剧增加,尤其工业区与各种矿区周边土壤的重金属含量非常高,已经严重影响到生态系统的稳定,引起了世界各国相关部门、广大人民的重视,是全球公认的热点话题。可持续生态农业的发展建议使用更为生态、环保的有机肥料。但是在当前农业生产中广泛施用的牛粪和猪粪会在土壤中产生糖类、酚类、有机酸等化合物,易导致土壤铜的螯合或络合,带来潜在的环境污染。土壤中的磷元素是植物生长所必需的三大主要营养元素之一。然而,据相关资料统计,我国大多数的农业生态系统的土壤磷含量低于植物的需求量,这也导致了我国近三十年来磷肥的使用量逐年上升。不合理的过量施用磷肥直接导致了土壤中大量的磷酸盐累积,显著影响了施用磷肥的当季利用率(约10%-20%),造成了磷肥资源的损失与浪费。农田土壤中磷元素的大量累积也直接造成了严重的环境问题,主要体现在磷随水体迁移而造成的水体富营养化。因此,对土壤性质含量进行定期监测意义重大。
不同应用部门所创建的土壤数据库一般只是涉及到基本的土壤物理、化学性质,鲜有涉及到土壤微量元素含量、重金属含量数据,例如瓦赫宁根大学创建的WoSIS土壤数据库与联合国粮农组织创建的HWSD(Harmonized World Soil Database v 1.2)数据库只是包含了土壤有机质、pH、质地、氮、磷、钾等基本的理化性质。
国内外相关技术部门、公司与学者已提出一系列的土壤性质含量化学测定方法。例如,可用的土壤有效铜含量测量方法包括:原子吸收光谱仪、DTPA-TEA浸提法与原子吸收分光光度计等;可用的土壤全磷含量测量方法包括:高温烧灼酸浸提法、强酸消煮法、碱熔法、连续流动分析仪等。常规获取土壤性质含量信息的方法是野外样品采集和室内化学分析测试,该方法精度高,但费时费力,且难以获得区域土壤性质含量的空间分布信息。近年来有学者尝试使用室内反射光谱可见光/近红外光谱等技术来反演部分土壤性质含量。土壤中的铁氧化物、有机质对土壤重金属有一定的吸附作用,且在光谱曲线上体现一定的吸收特征,进而可以间接地预测土壤重金属含量。基于土壤元素对于光谱的响应特征,还可以构建土壤全磷与 不同光谱指标的预测模型(例如偏最小二乘回归)。这一类方法具有高效、无损、快速等显著优点,在土壤成分快速检测中的应用潜力较大。但该方法在具体应用中存在一定的测量误差,不同研究区、不同操作人员的测定误差相差非常大。
然而,不同的土壤调查相关部门在实际中的需求各异,限于预算支出,这些部门不可能测定所有的土壤理化指标,只会测定与具体需求相关的一些土壤理化性质,例如土壤微生物调查、工程土壤调查。因此,部分的土壤调查虽然收集了不少的土壤样品,但后续的化学分析没有测定土壤重金属含量。由于历史土壤数据库的土壤样品大多数已经丢失,无法通过化学实验弥补缺失的土壤养分数据(例如土壤的氮磷钾含量)。当农业、生态相关的部门集成了多源的土壤数据进行土壤质量、肥力评价时,经常发现土壤数据库中不少的土壤样点数据缺少土壤有效钾含量的信息。
针对类似的数据库缺少数据的问题,技术人员提出使用土壤传递函数(PTF)来弥补缺失的土壤数据。土壤传递函数的原理是基于土壤物理、化学性质间的相关性,通过构建已测定土壤性质与未测定土壤性质的预测模型,来实现缺失数据的更新。常用的土壤传递函数模型主要包括统计回归模型、人工神经网络、物理经验模型等。其中,统计回归模型是应用部门经常采用的研究模型,在具体应用中存在容易实现、预测精度高、变量解释程度高等优点。
随着传感器技术、地理信息系统、全球定位系统等技术的飞速发展,地理、地质、气象、遥感、土地规划等部门生产制作了大量的环境要素图层,例如土壤温度、蒸散发、年平均降水量、年平均气温、年平均日照、湿润指数、土地利用、高程、坡向等图层。从土壤演化的角度出发,土壤的演化受到了多种成土要素的综合作用:气候、地形、母质、生物和时间,可以使用成土要素变量来模拟预测土壤理化性质的空间变异特征,即土壤-景观模型。该技术在数字土壤制图领域已取得较为广泛的应用。
目前,基于土壤传递函数预测土壤性质含量,在预测技术与评估技术上存在一定的局限性,具体包括:
(1)通过对相关技术文献、专利与技术报告的检索,发现土壤传递函数预测土壤性质含量的技术较为匮乏。这主要是由于部分土壤性质含量与其他土壤其他理化性质相关性较低造成的。传统的土壤传递函数的构建也较少考虑不同尺度、地理范围全覆盖的环境变量的集成。尤其是当土壤理化性质间的相关性较低时,土壤传递函数的构建需要大量的土壤数据的支持,数据量的大小直接影响到了预测模型的精度。欠缺对环境变量的集成一定程度上影响了土壤传递函数在土壤性质含量预测的精度。
(2)数理统计模型在实际应用中需要考虑的一个重要因素就是不确定性。不确定性也是土壤传递函数在实际应用中较为欠缺的一个要素。例如基于最小二乘法的土壤传递函数,必须厘定各输入要素(土壤理化性质)的误差,才能通过相关预测模型评估该线性模型涉及到的不确定性传播。
(3)获取到的土壤传递函数仅能预测样点尺度的土壤信息,无法扩展到区域尺度,形成能够服务于更多应用部门的土壤图。由于传统的土壤传递函数的输入数据是实验室分析的土壤理化性质数据,所构建的函数仅代表了测定的土壤性质含量与其他理化性质数据间的关系。覆盖区域的土壤图的精度是无法达到实验室分析的精度,因此无法直接将这些土壤传递函数直接应用在历史土壤图上进行土壤性质含量空间分布图的制作。
以上所述现有土壤性质含量预测技术的不足,已影响到生物、农学和环境等相关应用部门生产、加工土壤信息产品的具体效益,一定程度上给国家生态规划、精细农业部署造成了经济损失。
发明内容
本发明所要解决的技术问题是提供一种基于土壤传递函数的目标土壤性质含量预测方法,采用全新设计架构,弥补了现有技术的不足,能够高效实现目标土壤性质含量的准确预测,提高工作效率。
本发明为了解决上述技术问题采用以下技术方案:本发明设计了一种基于土壤传递函数的目标土壤性质含量预测方法,用于实现目标区域中目标土壤性质含量的预测,包括如下步骤:
步骤A.基于已有土壤数据,选择目标区域中满足对应预设各土壤物理化学性质的数据值均为非空要求的各个采样点位置,并以该各采样点位置的最小外接多边形,构成一级区域,且该各采样点位置作为一级区域所对应的各采样点位置,预设各土壤物理化学性质中包含目标土壤性质,定义目标土壤性质为土壤因变量,除目标土壤性质以外的各土壤物理化学性质构成土壤自变量集合,然后进入步骤B;
步骤B.获得覆盖一级区域、与土壤因变量相关各指定环境变量的图层,分别针对一级区域所对应的各采样点位置,提取采样点位置所对应各指定环境变量的数据值,并将该各指定环境变量加入土壤自变量集合中,实现对土壤自变量集合的更新,然后进入步骤C;
步骤C.删除土壤自变量集合中引起多重共线性的各自变量、以及与土壤因变量相关性结果低于预设相关性显著差异阈值的各自变量,实现对土壤自变量集合的更新,然后进入步 骤D;
步骤D.针对一级区域所对应的各采样点位置,采用逐步多元线性回归模型,基于预设迭代次数,训练采样点位置土壤因变量数据值与土壤自变量集合中各土壤自变量数据值之间的线性关系,分别获取每次迭代训练中所筛选的临时最优自变量集合,并统计不同临时最优自变量集合分别被选中的次数;待完成预设迭代次数的训练后,将被选中次数最高的临时最优自变量集合作为一级区域所对应的最优自变量集合,并进入步骤E;
步骤E.获得覆盖一级区域的预设属性土壤区域的划分图层,并针对一级区域所对应的各采样点位置,提取该各采样点位置分别所在该预设属性下的土壤划分区域,基于该各采样点位置的土壤因变量的数据值,分析获得该不同土壤划分区域之间对应土壤因变量的差异性结果,若该各差异性结果均不大于预设显著差异阈值,则进入步骤G;若该各差异性结果中存在大于预设显著差异阈值的差异性结果,则针对其中差异性结果不大于预设显著差异阈值的不同土壤划分区域进行合并,并结合未合并的各土壤划分区域,构成各个二级区域,以及基于一级区域所对应的各采样点位置,获得各个二级区域分别所对应的各采样点位置,然后进入步骤F;
步骤F.分别针对各个二级区域,采用步骤D的方法,获得各个二级区域分别所对应的最优自变量集合,并进入步骤G;
步骤G.针对一级区域所对应的各采样点位置,训练土壤因变量数据值与对应最优自变量集合中各自变量数据值之间的线性回归模型、非线性回归模型,并获得该线性回归模型的确定系数、以及非线性回归模型的确定系数,即一级区域所对应的线性回归模型确定系数R_OLS、以及非线性回归模型确定系数R_NLS;
进一步若不存在二级区域,则直接进入步骤H;若存在二级区域,则分别针对各个二级区域,训练对应各采样点位置土壤因变量数据值与对应最优自变量集合中各自变量数据值之间的线性回归模型、非线性回归模型,并获得该线性回归模型的确定系数、以及非线性回归模型的确定系数,进而获得各二级区域分别所对应的线性回归模型确定系数、以及非线性回归模型确定系数,并进一步获得所有二级区域所对应的线性回归模型确定系数的均值R_OLS_mean、以及非线性回归模型确定系数的均值R_NLS_mean;然后进入步骤H;
步骤H.若不存在二级区域,则进入步骤I;
若存在二级区域,当R_OLS均大于R_OLS_mean、R_NLS_mean,或者R_NLS均大于R_OLS_mean、R_NLS_mean,则进入步骤I;
当R_OLS_mean均大于R_OLS、R_NLS,或者R_NLS_mean均大于R_OLS、R_NLS,则进入步 骤M;
步骤I.基于一级区域所对应的各个采样点位置、以及所对应的最优自变量集合,根据该各采样点位置分别对应该最优自变量集合中各土壤物理化学性质的数据值,获得该最优自变量集合中各土壤物理化学性质分别基于步骤B中全部指定环境变量的预测模型;然后结合步骤B中各指定环境变量图层,获得该最优自变量集合中各土壤物理化学性质分别对应一级区域的空间分布预测图层,然后进入步骤J;
步骤J.将一级区域所对应最优自变量集合中各土壤物理化学性质的空间分布预测图层、与该最优自变量集合中各环境变量对应一级区域的图层进行合并,构成一级区域所对应最优自变量图层集合,然后进入步骤K;
步骤K.若R_OLS≥R_NLS,则针对一级区域所对应的各采样点位置,由一级区域所对应最优自变量图层集合中提取各自变量数据值,并训练其与土壤因变量数据值之间的线性回归模型,构成一级区域预测模型,并进入步骤L;
若R_NLS>R_OLS,则针对一级区域所对应的各采样点位置,由一级区域所对应最优自变量图层集合中提取各自变量数据值,并训练其与土壤因变量数据值之间的非线性回归模型,构成一级区域预测模型,并进入步骤L;
步骤L.根据一级区域所对应最优自变量图层集合,应用一级区域预测模型,获得土壤因变量空间分布图,即目标区域中目标土壤性质空间分布图,实现目标区域中目标土壤性质含量的预测;
步骤M.分别针对各个二级区域,采用步骤I至步骤J的方法,获得各二级区域分别所对应最优自变量图层集合,然后进入步骤N;
步骤N.若R_OLS_mean≥R_NLS_mean,则分别针对各个二级区域,针对二级区域所对应各采样点位置,由该二级区域所对应最优自变量图层集合中提取各自变量数据值,并训练其与土壤因变量数据值之间的线性回归模型,构成该二级区域预测模型;进而获得各个二级区域预测模型,并进入步骤O;
若R_NLS_mean>R_OLS_mean,则分别针对各个二级区域,针对二级区域所对应各采样点位置,由该二级区域所对应最优自变量图层集合中提取各自变量数据值,并训练其与土壤因变量数据值之间的非线性回归模型,构成该二级区域预测模型;进而获得各个二级区域预测模型,并进入步骤O;
步骤O.分别针对各个二级区域,根据二级区域所对应最优自变量图层集合,应用该二级区域预测模型,获得该二级区域中土壤因变量空间分布图;进而获得各二级区域中土壤因 变量空间分布图,通过组合构成目标区域中目标土壤性质空间分布图,实现目标区域中目标土壤性质含量的预测。
作为本发明的一种优选技术方案,还包括步骤H-I和步骤H-M分别如下,且步骤H如下:
步骤H.若不存在二级区域,则进入步骤H-I;
若存在二级区域,当R_OLS均大于R_OLS_mean、R_NLS_mean,或者R_NLS均大于R_OLS_mean、R_NLS_mean,则进入步骤H-I;
当R_OLS_mean均大于R_OLS、R_NLS,或者R_NLS_mean均大于R_OLS、R_NLS,则进入步骤H-M;
步骤H-I.若R_OLS≥R_NLS,则应用步骤G中一级区域的线性回归模型,针对目标区域中缺失土壤因变量数据值的各采样点位置,进行土壤因变量数据值预测补充,然后进入步骤I;
若R_NLS>R_OLS,则应用步骤G中一级区域的非线性回归模型,针对目标区域中缺失土壤因变量数据值的各采样点位置,进行土壤因变量数据值预测补充,然后进入步骤I;
步骤H-M.若R_OLS_mean≥R_NLS_mean,则分别应用步骤G中各二级区域的线性回归模型,针对目标区域中缺失土壤因变量数据值的各采样点位置,进行土壤因变量数据值预测补充,然后进入步骤M;
若R_NLS_mean>R_OLS_mean,则分别应用步骤G中各二级区域的非线性回归模型,针对目标区域中缺失土壤因变量数据值的各采样点位置,进行土壤因变量数据值预测补充,然后进入步骤M。
作为本发明的一种优选技术方案,所述步骤A包括如下步骤:
步骤A1.由指定各个数据源中的已有土壤数据,针对目标区域中预设各采样点位置,分别进行包含目标土壤性质在内的各个预设土壤物理化学性质的数据值收集操作,然后进入步骤A2;
步骤A2.选择满足对应各土壤物理化学性质的数据值均为非空要求的各个采样点位置,并以该各采样点位置的最小外接多边形,构成一级区域,且该各采样点位置作为一级区域所对应的各采样点位置,然后进入步骤A3;
步骤A3.定义目标土壤性质为土壤因变量,除目标土壤性质以外的各土壤物理化学性质构成土壤自变量集合,然后进入步骤B。
作为本发明的一种优选技术方案,所述步骤B包括如下步骤:
步骤B1.获得覆盖一级区域、与土壤因变量相关各指定环境变量的图层,然后进入步骤 B2;
步骤B2.将各指定环境变量的图层分别转换为环境变量栅格图层,其中,若环境变量包含至少一个波段,则该环境变量的各波段分别转换为相对应的环境变量栅格图层,然后进入步骤B3;
步骤B3.使用双线性内插法,针对所有环境变量栅格图层进行重采样,统一栅格数据的空间分辨率为预设空间分辨率,然后进入步骤B4;
步骤B4.获得所有环境变量栅格图层上对应一级区域的区域,并分别针对一级区域所对应的各采样点位置,提取采样点位置所对应各指定环境变量的数据值,然后进入步骤B5;
步骤B5.将该各指定环境变量加入土壤自变量集合中,实现对土壤自变量集合的更新,然后进入步骤C。
作为本发明的一种优选技术方案,所述步骤C包括如下步骤:
步骤C1.针对一级区域所对应的各采样点位置,训练土壤因变量数据值与土壤自变量集合中各自变量数据值之间的线性回归模型,并获得土壤自变量集合中各自变量的确定系数
然后进入步骤C2;
表示土壤自变量集合中第k个自变量的确定系数;
步骤C3.判断土壤自变量集合中各自变量方差膨胀系数是否均小于预设系数阈值,是则进入步骤C4;否则删除土壤自变量集合中最大方差膨胀系数的自变量,更新土壤自变量集合,并返回步骤C1;
步骤C4.针对一级区域所对应的各采样点位置,计算获得土壤因变量数据值分别与土壤自变量集合中各自变量数据值之间的相关性,并删除土壤自变量集合中相关性低于预设相关性显著差异阈值的各个自变量,更新土壤自变量集合,然后进入步骤D。
作为本发明的一种优选技术方案,所述步骤D中,在完成预设迭代次数的训练后,将被选中次数最高的临时最优自变量集合作为一级区域所对应的待选最优自变量集合,然后还包括如下步骤:
步骤D1.继续采用逐步多元线性回归模型,基于预设增幅迭代次数,继续训练采样点位置土壤因变量数据值与土壤自变量集合中各土壤自变量数据值之间的线性关系,分别获取每次迭代训练中所筛选的临时最优自变量集合,并继续统计不同临时最优自变量集合分别被选 中的次数;待完成预设增幅迭代次数的训练后,将被选中次数最高的临时最优自变量集合作为一级区域所对应的待选最优自变量集合,然后进入步骤D2;
步骤D2.判断最新所获一级区域对应的两个待选最优自变量集合是否一致,是则将该待选最优自变量集合作为一级区域所对应的最优自变量集合,并进入步骤E;否则返回步骤D1。
作为本发明的一种优选技术方案,所述步骤E包括如下步骤:
步骤E1.获得覆盖一级区域的土地利用图层和成土母质图层,并针对一级区域所对应的各采样点位置,提取该各采样点位置分别所在土地利用划分区域、成土母质划分区域,然后进入步骤E2;
步骤E2.基于一级区域所对应各采样点位置的土壤因变量数据值,使用Duncan多重比较分析方法,分析获得不同土地利用划分区域之间对应土壤因变量的差异性结果,以及分析获得不同成土母质划分区域之间对应土壤因变量的差异性结果,然后进入步骤E3;
步骤E3.若不同土地利用划分区域之间对应土壤因变量的差异性结果、以及不同成土母质划分区域之间对应土壤因变量的差异性结果,均不大于预设显著差异阈值,则进入步骤G;否则进入步骤E4;
步骤E4.若不同土地利用划分区域之间对应土壤因变量的差异性结果中,存在大于预设显著差异阈值的差异性结果,且不同成土母质划分区域之间对应土壤因变量的差异性结果,均不大于预设显著差异阈值,则针对其中差异性结果不大于预设显著差异阈值的不同土地利用划分区域进行合并,构成各个二级区域,然后进入步骤E7;否则进入步骤E5;
步骤E5.若不同成土母质划分区域之间对应土壤因变量的差异性结果中,存在大于预设显著差异阈值的差异性结果,且不同土地利用划分区域之间对应土壤因变量的差异性结果,均不大于预设显著差异阈值,则针对其中差异性结果不大于预设显著差异阈值的不同成土母质划分区域进行合并,构成各个二级区域,然后进入步骤E7;否则进入步骤E6;
步骤E6.若不同成土母质划分区域之间对应土壤因变量的差异性结果中、以及不同土地利用划分区域之间对应土壤因变量的差异性结果中,均存在大于预设显著差异阈值的差异性结果,则针对其中差异性结果不大于预设显著差异阈值的不同成土母质划分区域进行合并,构成各个二级区域,以及针对其中差异性结果不大于预设显著差异阈值的不同土地利用划分区域进行合并,构成各个二级区域,然后进入步骤E7;
步骤E7.将土地利用图层中合并所得各个二级区域、以及未合并的各土地利用划分区域,与成土母质图层中合并所得各个二级区域、以及未合并的各成土母质划分区域,进行空间叠加,获得各个二级区域,并基于一级区域所对应的各采样点位置,获得各个二级区域分别所 对应的各采样点位置,然后进入步骤F。
作为本发明的一种优选技术方案,所述步骤G包括如下步骤:
步骤G1.针对一级区域所对应的各采样点位置,划分其中第一预设比例数量的各采样点位置,作为训练样本,剩余各采样点位置作为验证样本,然后进入步骤G2,第一预设比例大于50%;
步骤G2.针对训练样本中的各采样点位置,训练土壤因变量数据值与对应最优自变量集合中各自变量数据值之间的线性回归模型OLS,并进入步骤G3;
步骤G3.针对验证样本中各采样点位置对应相应最优自变量集合中各自变量的数据值,应用该线性回归模型OLS,获得验证样本中各采样点位置所对应土壤因变量预测数据值,并进入步骤G4;
步骤G4.计算验证样本中各采样点位置所对应土壤因变量数据值、与所对应土壤因变量预测数据值之间的确定系数,即一级区域所对应的线性回归模型确定系数R_OLS,然后进入步骤G5;
步骤G5.分别针对一级区域所对应最优自变量集合中的各个自变量,针对训练样本中各采样点位置土壤因变量数据值与对应自变量的数据值,进行预设各指定函数的拟合,并选择预测精度最高的函数,作为该自变量所对应的非线性拟合方式;进而获得该最优自变量集合中各自变量分别所对应的非线性拟合方式,然后进入步骤G6;
步骤G6.根据一级区域所对应最优自变量集合中各自变量分别所对应的非线性拟合方式,使用非线性最小二乘法,针对训练样本中的各采样点位置,训练土壤因变量数据值与对应最优自变量集合中各自变量数据值之间的非线性回归模型NLS,并进入步骤G7;
步骤G7.针对验证样本中各采样点位置对应相应最优自变量集合中各自变量的数据值,应用该非线性回归模型NLS,获得验证样本中各采样点位置所对应土壤因变量预测数据值,并进入步骤G8;
步骤G8.计算验证样本中各采样点位置所对应土壤因变量数据值、与所对应土壤因变量预测数据值之间的确定系数,即一级区域所对应的非线性回归模型确定系数R_NLS,然后进入步骤G9;
步骤G9.若不存在二级区域,则直接进入步骤H;若存在二级区域,则进入步骤G10;
步骤G10.分别针对各个二级区域,执行步骤G1至步骤G8的方法,获得各二级区域分别所对应的线性回归模型确定系数、以及非线性回归模型确定系数;并进一步获得所有二级区域所对应的线性回归模型确定系数的均值R_OLS_mean、以及非线性回归模型确定系数的均 值R_NLS_mean,然后进入步骤H。
作为本发明的一种优选技术方案,所述步骤I包括如下步骤:
步骤I1.基于一级区域所对应的各个采样点位置、以及所对应的最优自变量集合,分别针对该最优自变量集合中的各个土壤物理化学性质,根据该各采样点位置分别对应土壤物理化学性质数据值、以及分别对应步骤B中各指定环境变量数据值,使用十折交叉验证的方式,针对指定各预测模型进行训练、获得各个预测模型,并选择最高预测精度的预测模型作为该土壤物理化学性质基于步骤B中全部指定环境变量的预测模型;进而获得该最优自变量集合中各土壤物理化学性质分别基于步骤B中全部指定环境变量的预测模型,然后进入步骤I2;
步骤I2.根据一级区域所对应最优自变量集合中、各土壤物理化学性质分别基于步骤B中全部指定环境变量的预测模型,结合步骤B中各指定环境变量图层,获得该最优自变量集合中各土壤物理化学性质分别对应一级区域的空间分布预测图层。
本发明所述一种基于土壤传递函数的目标土壤性质含量预测方法,采用以上技术方案与现有技术相比,具有以下技术效果:
(1)本发明所设计基于土壤传递函数的目标土壤性质含量预测方法中,所提出样点尺度的土壤传递函数,能够充分利用现有的地理要素信息,改进目标土壤性质含量预测精度低的难题,能够直接服务于国家自然资源调查数据监测补充、更新,也能为动态的生态模型、地表过程模拟中的数据补充提供技术服务;尤其是预测过程中对环境变量的动态筛选机制,修正了传统预测技术的局限性,实现了“有限资源,多源应用”的通用土壤性质预测技术,在农业应用、国土资源等部门具有广阔的工业化应用前景;
(2)本发明所设计基于土壤传递函数的目标土壤性质含量预测方法中,所提出目标土壤性质含量空间异质性的分区机制,更能准确地度量土壤传递函数拟合过程中变量与预测函数的不确定,动态筛选最优自变量集合的技术规程,不仅能够量化相关生产流程中涉及到的不确定性,也能最大程度上确定土壤传递函数所需的最优因变量集合,进而显著提升了本发明的普适性与稳健性;
(3)本发明所设计基于土壤传递函数的目标土壤性质含量预测方法,有别于传统的面向样点的土壤传递函数,所提出的技术流程涵盖了土壤图与测定土壤理化性质的映射机制,能够改进面向样点尺度的土壤传递函数,优化函数参数对土壤图的兼容性,进而将拟合的函数升尺度至不同的研究区域,实现覆盖区域尺度的土壤图制作;该技术充分利用了现有地理信息系统的技术优势,能够为更多应用部门提供更为迫切的土壤图产品。
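Figure 1 is a flowchart of the steps of the target soil property content prediction method based on a soil transfer function designed in the present invention;
Figure 2 is a schematic diagram of the construction of the original soil dataset and the core soil dataset in the present invention;
Figure 3 is a schematic diagram of a secondary-region delineation with 2 land-use types in the present invention;
Figure 4 is a schematic diagram of a secondary-region delineation with 3 soil parent material types in the present invention;
Figure 5 is a schematic diagram of a secondary-region delineation with 2 land-use types and 3 soil parent material types in the present invention;
Figure 6 is a schematic diagram of the raster environmental variable layers and soil sample points in the present invention;
Figure 7 shows the secondary-region layer based on two classes of soil parent material and the spatial distribution of the sampling points in the embodiment of the present invention;
Figure 8 is the elevation layer covering the first-level region in the embodiment of the present invention;
Figure 9 is the mean annual rainfall layer covering the first-level region in the embodiment of the present invention;
Figure 10 is the predicted spatial distribution map of available zinc content, a member of the optimal independent variable set, in the embodiment of the present invention;
Figure 11 is the predicted spatial distribution map of the soil dependent variable in the embodiment of the present invention.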
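Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The basic idea of the target soil property content prediction method based on a soil transfer function designed in the present invention is as follows. Multi-source soil datasets and environmental variables are collected, and a dataset containing complete measurement information is delineated; secondary regions are delineated according to the spatial heterogeneity of the soil property; the optimal independent variable set of the soil transfer function is screened for each region; and linear and nonlinear soil transfer functions are fitted for each region. By comparing the accuracy of the different regions and functions, the best sample-point-oriented soil transfer function is selected and used to complete the soil sample-point database. Well-established machine learning methods are then used to construct regional soil maps; the laboratory analysis data of the soil sample points and the corresponding soil-map data are extracted and their errors compared; and the sample-point-oriented soil transfer function is retrained into a region-oriented soil transfer function, from which the spatial distribution map of the target soil property content in the target region is produced.
The present invention provides a method for predicting target soil property content based on a soil transfer function, used to predict the target soil property content in a target region. In practical application, as shown in Figure 1, the following steps A to O are executed.
Step A. Based on existing soil data, select the sampling points in the target region for which the data values of all preset soil physical and chemical properties are non-empty, and take the minimum bounding polygon of these sampling points as the first-level region, with these sampling points serving as the sampling points of the first-level region. The preset soil physical and chemical properties include the target soil property. Define the target soil property as the soil dependent variable; the remaining soil physical and chemical properties constitute the soil independent variable set. Then go to step B.
In a specific implementation, step A is executed as the following steps A1 to A3.
Step A1. From the existing soil data in each specified data source, collect, for each preset sampling point in the target region, the data values of each preset soil physical and chemical property, including the target soil property, as illustrated in Figure 2, where S_1, S_2, S_3, S_4 and S_5 are individual soil physical and chemical properties. Then go to step A2.
The data values of the preset soil physical and chemical properties are taken at the same soil depth, for example from the 0-1 m soil profile or from the 0-20 cm soil profile.
Step A2. Select the sampling points for which the data values of all soil physical and chemical properties are non-empty, and take the minimum bounding polygon of these sampling points as the first-level region, with these sampling points serving as the sampling points of the first-level region. Then go to step A3.
Step A3. Define the target soil property as the soil dependent variable; the remaining soil physical and chemical properties constitute the soil independent variable set. Then go to step B.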
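Steps A1 to A3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout, the property names and the reading of "minimum bounding polygon" as a convex hull are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

def complete_points(records: List[Dict[str, Optional[float]]],
                    props: List[str]) -> List[Dict[str, Optional[float]]]:
    """Step A2 (sketch): keep sampling points whose listed properties are all non-empty."""
    return [r for r in records if all(r.get(p) is not None for p in props)]

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    upper: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# Hypothetical records: coordinates plus two properties (values are made up).
records = [
    {"x": 0.0, "y": 0.0, "ACu": 1.2, "pH": 6.1},
    {"x": 4.0, "y": 0.0, "ACu": 0.9, "pH": 5.8},
    {"x": 4.0, "y": 3.0, "ACu": 1.5, "pH": 6.4},
    {"x": 0.0, "y": 3.0, "ACu": 1.1, "pH": 6.0},
    {"x": 2.0, "y": 1.0, "ACu": 1.0, "pH": 6.2},   # interior point
    {"x": 9.0, "y": 9.0, "ACu": None, "pH": 6.3},  # target value missing, excluded
]
kept = complete_points(records, ["ACu", "pH"])
first_level_region = convex_hull([(r["x"], r["y"]) for r in kept])
```

Points excluded here, such as the last record, are the ones whose missing dependent variable is later filled in by the sample-point-oriented transfer function.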
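Step B. Obtain layers of the specified environmental variables that cover the first-level region and are related to the soil dependent variable; for each sampling point of the first-level region, extract the data value of each specified environmental variable at that point; add the specified environmental variables to the soil independent variable set, thereby updating it. Then go to step C.
In a specific implementation, step B is executed as the following steps B1 to B5.
Step B1. Obtain layers of the specified environmental variables that cover the first-level region and are related to the soil dependent variable, then go to step B2. These environmental variables influence soil formation and evolution to some degree; Table 1 below lists some of the environmental variable layers that may be selected.
Table 1
Note: an environmental variable layer may be in the vector Shapefile format or in a raster format (such as TIFF or Grid).
Step B2. Convert the layer of each specified environmental variable into an environmental variable raster layer; if an environmental variable contains at least one band, each band of that variable is converted into its own environmental variable raster layer. Then go to step B3.
Step B3. Resample all environmental variable raster layers using bilinear interpolation so that the raster data share the preset spatial resolution, then go to step B4. For example, if one raster cell covers 100 m × 100 m, its spatial resolution is 100 m; the higher the raster resolution, the more spatial detail the layer expresses.
In practical application, raster resampling is not limited to bilinear interpolation; nearest-neighbor or cubic convolution interpolation may also be used.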
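The bilinear resampling of step B3 can be illustrated with a small pure-Python sketch. The function names and the nested-list raster representation are assumptions for the example; a production workflow would normally rely on a GIS library instead.

```python
from typing import List

Grid = List[List[float]]

def bilinear_sample(grid: Grid, row: float, col: float) -> float:
    """Interpolate a raster value at fractional (row, col) coordinates.

    Assumes the grid has at least 2 rows and 2 columns and that
    0 <= row <= rows - 1 and 0 <= col <= cols - 1.
    """
    r0 = max(0, min(int(row), len(grid) - 2))
    c0 = max(0, min(int(col), len(grid[0]) - 2))
    dr, dc = row - r0, col - c0
    top = grid[r0][c0] * (1 - dc) + grid[r0][c0 + 1] * dc
    bottom = grid[r0 + 1][c0] * (1 - dc) + grid[r0 + 1][c0 + 1] * dc
    return top * (1 - dr) + bottom * dr

def resample(grid: Grid, new_rows: int, new_cols: int) -> Grid:
    """Resample a raster to new_rows x new_cols (both >= 2) over the same extent."""
    rows, cols = len(grid), len(grid[0])
    out: Grid = []
    for i in range(new_rows):
        r = i * (rows - 1) / (new_rows - 1)
        out.append([
            bilinear_sample(grid, r, j * (cols - 1) / (new_cols - 1))
            for j in range(new_cols)
        ])
    return out

# A 2 x 2 layer refined to 3 x 3.
coarse = [[0.0, 10.0], [20.0, 30.0]]
fine = resample(coarse, 3, 3)
```

Each new cell is a distance-weighted average of the four surrounding source cells, which is why bilinear interpolation suits continuous variables such as elevation or rainfall better than categorical layers.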
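In practical application, a schematic diagram of the resulting raster environmental variable layers and soil sample points is shown in Figure 6.
Step B4. Obtain the portion of every environmental variable raster layer that corresponds to the first-level region and, for each sampling point of the first-level region, extract the data value of each specified environmental variable at that point. Then go to step B5.
Step B5. Add the specified environmental variables to the soil independent variable set, thereby updating it. Then go to step C.
Step C. Delete from the soil independent variable set the variables that cause multicollinearity, as well as the variables whose correlation with the soil dependent variable is below the preset correlation significance threshold, thereby updating the soil independent variable set. Then go to step D.
In a specific implementation, step C is executed as the following steps C1 to C4.
Step C1. For the sampling points of the first-level region, train a linear regression model between the soil dependent variable data values and the data values of the variables in the soil independent variable set, and obtain the coefficient of determination of each variable in the soil independent variable set, the k-th of which belongs to the k-th independent variable in the set. Then go to step C2.
Step C3. Determine whether the variance inflation factors of all variables in the soil independent variable set are below the preset factor threshold, set for example to 5. If so, go to step C4; otherwise delete the variable with the largest variance inflation factor from the soil independent variable set, update the set, and return to step C1.
Step C4. For the sampling points of the first-level region, compute the correlation between the data values of the soil dependent variable and the data values of each variable in the soil independent variable set, delete the variables whose correlation is below the preset correlation significance threshold, and update the soil independent variable set. Then go to step D.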
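The formula of step C2 is not reproduced in this text. For reference, the variance inflation factor tested in step C3 is conventionally defined as below; treating this as the formula intended in step C2 is an assumption, as is the symbol $R_k^2$ for the coefficient of determination of the k-th independent variable.

```latex
\mathrm{VIF}_k = \frac{1}{1 - R_k^{2}}
```

Under this definition, the example threshold of 5 in step C3 corresponds to $R_k^2 < 0.8$.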
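Step D. For the sampling points of the first-level region, use a stepwise multiple linear regression model with a preset number of iterations, for example 100, to train the linear relationship between the soil dependent variable data values and the data values of the soil independent variables in the set. Record the temporary optimal independent variable set screened in each iteration and count how many times each distinct temporary optimal set is selected. After the preset number of iterations, take the most frequently selected temporary optimal set as the candidate optimal independent variable set of the first-level region, and then execute the following steps D1 to D2.
Step D1. Continue with the stepwise multiple linear regression model for a preset number of additional iterations, for example 50, training the linear relationship between the soil dependent variable data values and the data values of the soil independent variables in the set. Record the temporary optimal independent variable set screened in each iteration and continue counting how many times each distinct temporary optimal set is selected. After the additional iterations, take the most frequently selected temporary optimal set as a candidate optimal independent variable set of the first-level region. Then go to step D2.
Step D2. Determine whether the two most recently obtained candidate optimal independent variable sets of the first-level region are identical. If so, take that candidate set as the optimal independent variable set of the first-level region and go to step E; otherwise return to step D1.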
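The counting and convergence logic of steps D, D1 and D2 can be sketched as below. The stepwise regression itself is not implemented: `run_batch` is a hypothetical callable that returns the variable subsets selected in `n` iterations, and `max_rounds` is a safety cap added for the example; the text does not state what varies between iterations.

```python
from collections import Counter
from typing import Callable, FrozenSet, Iterable, List

def stable_optimal_set(run_batch: Callable[[int], List[Iterable[str]]],
                       first: int = 100, step: int = 50,
                       max_rounds: int = 20) -> FrozenSet[str]:
    """Return the most frequently selected subset once it stops changing."""
    # Step D: initial batch of iterations and first candidate set.
    counts = Counter(frozenset(s) for s in run_batch(first))
    candidate = counts.most_common(1)[0][0]
    for _ in range(max_rounds):
        # Step D1: additional iterations, cumulative counting.
        counts.update(frozenset(s) for s in run_batch(step))
        latest = counts.most_common(1)[0][0]
        # Step D2: accept when two successive candidates agree.
        if latest == candidate:
            return latest
        candidate = latest
    return candidate

def fake_batch(n: int) -> List[Iterable[str]]:
    """Deterministic stand-in for n stepwise-regression runs (illustrative only)."""
    pattern = [
        ("AZn", "pH", "TN", "DEM", "MAP"),
        ("AZn", "pH", "TN", "DEM", "MAP"),
        ("AZn", "pH", "DEM"),
    ]
    return [pattern[i % 3] for i in range(n)]

optimal = stable_optimal_set(fake_batch)
```

Using `frozenset` makes the count independent of the order in which variables were entered into the model.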
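Step E. Obtain a layer that divides the first-level region into soil zones according to a preset attribute and, for each sampling point of the first-level region, extract the soil zone under that attribute in which the point lies. Based on the soil dependent variable data values at the sampling points, analyze the differences in the soil dependent variable between the soil zones. If none of the differences exceeds the preset significant-difference threshold, go to step G. If any difference exceeds the threshold, merge the soil zones whose mutual differences do not exceed the threshold and, together with the unmerged soil zones, form the secondary regions; from the sampling points of the first-level region, obtain the sampling points of each secondary region. Then go to step F.
In a specific implementation, step E is executed as the following steps E1 to E7.
Step E1. Obtain the land-use layer and the soil parent material layer covering the first-level region and, for each sampling point of the first-level region, extract the land-use zone and the soil parent material zone in which the point lies. Then go to step E2.
Step E2. Based on the soil dependent variable data values at the sampling points of the first-level region, use Duncan's multiple comparison method to obtain the differences in the soil dependent variable between land-use zones and between soil parent material zones. Then go to step E3.
Step E3. If neither the differences between land-use zones nor the differences between soil parent material zones exceed the preset significant-difference threshold, go to step G; otherwise go to step E4.
Step E4. If some difference between land-use zones exceeds the preset significant-difference threshold while no difference between soil parent material zones does, merge the land-use zones whose mutual differences do not exceed the threshold to form the secondary regions, then go to step E7; otherwise go to step E5.
Step E5. If some difference between soil parent material zones exceeds the preset significant-difference threshold while no difference between land-use zones does, merge the soil parent material zones whose mutual differences do not exceed the threshold to form the secondary regions, then go to step E7; otherwise go to step E6.
Step E6. If differences exceeding the preset significant-difference threshold exist both between soil parent material zones and between land-use zones, merge the soil parent material zones whose mutual differences do not exceed the threshold to form secondary regions, and likewise merge the land-use zones whose mutual differences do not exceed the threshold to form secondary regions. Then go to step E7.
Step E7. Spatially overlay the merged secondary regions and the unmerged land-use zones from the land-use layer with the merged secondary regions and the unmerged soil parent material zones from the soil parent material layer to obtain the secondary regions; from the sampling points of the first-level region, obtain the sampling points of each secondary region. Then go to step F.
In practical application, a secondary-region delineation with 2 land-use types is shown in Figure 3; one with 3 soil parent material types is shown in Figure 4; and one with 2 land-use types and 3 soil parent material types is shown in Figure 5.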
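The merging rule of steps E4 to E6 can be sketched with a small union-find. Duncan's test itself is not implemented here: the set of significantly different pairs is taken as given, and the class names and pairwise outcomes below are hypothetical. Merging transitively is also a simplifying assumption, since homogeneous groups from a multiple comparison can overlap.

```python
from typing import Dict, FrozenSet, List, Set

def merge_zones(classes: List[str],
                significant: Set[FrozenSet[str]]) -> List[Set[str]]:
    """Group zone classes whose pairwise differences are not significant."""
    parent: Dict[str, str] = {c: c for c in classes}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            if frozenset((a, b)) not in significant:
                parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for c in classes:
        groups.setdefault(find(c), set()).add(c)
    return list(groups.values())

# Hypothetical outcome: one parent material differs from the other three.
materials = ["calcareous", "crystalline", "clastic", "loess"]
different = {
    frozenset(("calcareous", "crystalline")),
    frozenset(("calcareous", "clastic")),
    frozenset(("calcareous", "loess")),
}
secondary = merge_zones(materials, different)
```

Each resulting group corresponds to one secondary region on that layer; the spatial overlay of step E7 would then intersect the groups from the two layers.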
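Step F. For each secondary region, apply the method of step D to obtain the optimal independent variable set of that secondary region. Then go to step G.
Step G. For the sampling points of the first-level region, train a linear regression model and a nonlinear regression model between the soil dependent variable data values and the data values of the variables in the corresponding optimal independent variable set, and obtain the coefficient of determination of each model, namely the linear regression coefficient of determination R_OLS and the nonlinear regression coefficient of determination R_NLS of the first-level region.
If no secondary region exists, go directly to step H. If secondary regions exist, then for each secondary region train a linear regression model and a nonlinear regression model between the soil dependent variable data values at its sampling points and the data values of the variables in its optimal independent variable set, and obtain the coefficients of determination of both models; from the per-region values, compute the mean linear regression coefficient of determination R_OLS_mean and the mean nonlinear regression coefficient of determination R_NLS_mean over all secondary regions. Then go to step H.
In a specific implementation, step G is executed as the following steps G1 to G10.
Step G1. Of the sampling points of the first-level region, assign a first preset proportion to the training sample and the remainder to the validation sample, then go to step G2. The first preset proportion is greater than 50%, for example 75%.
Step G2. For the sampling points in the training sample, train the linear regression model OLS between the soil dependent variable data values and the data values of the variables in the corresponding optimal independent variable set. Then go to step G3.
Step G3. Apply the linear regression model OLS to the data values of the variables in the optimal independent variable set at the sampling points in the validation sample to obtain the predicted soil dependent variable value at each validation point. Then go to step G4.
Step G4. Compute the coefficient of determination between the observed and the predicted soil dependent variable values at the sampling points in the validation sample; this is the linear regression coefficient of determination R_OLS of the first-level region. Then go to step G5.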
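The sample split of step G1 and the validation score of step G4 can be sketched as follows. The random shuffle, the seed and the definition of the coefficient of determination as 1 - SS_res / SS_tot are assumptions; the text does not specify how the split is drawn or which formula is used.

```python
import random
from typing import List, Sequence, Tuple

def split_indices(n: int, train_fraction: float = 0.75,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Step G1 (sketch): shuffle point indices and split into training/validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_fraction))
    return idx[:cut], idx[cut:]

def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Step G4 (sketch): coefficient of determination on the validation sample."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# With 383 sampling points and a 75% training share:
train_idx, valid_idx = split_indices(383)
```

Scoring on the held-out validation points rather than on the training points keeps R_OLS and R_NLS comparable across models of different flexibility.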
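Step G5. For each independent variable in the optimal independent variable set of the first-level region, fit each of the preset specified functions to the soil dependent variable data values and the data values of that independent variable at the sampling points in the training sample; the preset functions include, for example, power, exponential, hyperbolic and logarithmic functions. Select the function with the highest prediction accuracy as the nonlinear fitting form for that independent variable, and in this way obtain the nonlinear fitting form for every variable in the optimal set. Then go to step G6.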
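The per-variable choice of functional form in step G5 might look like the sketch below. Two simplifications are assumed and are not taken from the text: each candidate is fitted by ordinary least squares on a linearizing transform rather than by true nonlinear least squares, and "prediction accuracy" is measured as R² on the fitted data. Inputs must be strictly positive for the logarithmic transforms.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

def _linear_fit(u: Sequence[float], v: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope of v = a + b * u."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    slope = (sum((a - mu) * (b - mv) for a, b in zip(u, v))
             / sum((a - mu) ** 2 for a in u))
    return mv - slope * mu, slope

def _r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    m = sum(obs) / len(obs)
    return 1.0 - (sum((o - p) ** 2 for o, p in zip(obs, pred))
                  / sum((o - m) ** 2 for o in obs))

# name -> (transform of x, transform of y, builder of the back-transformed model)
FORMS: Dict[str, Tuple[Callable, Callable, Callable]] = {
    "power": (math.log, math.log,
              lambda a, b: (lambda x: math.exp(a) * x ** b)),
    "exponential": (lambda x: x, math.log,
                    lambda a, b: (lambda x: math.exp(a + b * x))),
    "hyperbolic": (lambda x: 1.0 / x, lambda y: y,
                   lambda a, b: (lambda x: a + b / x)),
    "logarithmic": (math.log, lambda y: y,
                    lambda a, b: (lambda x: a + b * math.log(x))),
}

def best_form(x: Sequence[float], y: Sequence[float]) -> Tuple[str, Dict[str, float]]:
    """Fit every candidate form and return the best-scoring name plus all scores."""
    scores: Dict[str, float] = {}
    for name, (fx, fy, build) in FORMS.items():
        a, b = _linear_fit([fx(v) for v in x], [fy(v) for v in y])
        model = build(a, b)
        scores[name] = _r2(y, [model(v) for v in x])
    return max(scores, key=scores.get), scores

# Synthetic data generated from a logarithmic relationship.
xs: List[float] = [float(i) for i in range(1, 9)]
ys: List[float] = [2.0 + 3.0 * math.log(v) for v in xs]
chosen, form_scores = best_form(xs, ys)
```

The selected forms would then be combined in the multi-variable nonlinear least-squares fit of step G6.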
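Step G6. Using the nonlinear fitting form of each variable in the optimal independent variable set of the first-level region, apply nonlinear least squares to the sampling points in the training sample to train the nonlinear regression model NLS between the soil dependent variable data values and the data values of the variables in the optimal set. Then go to step G7.
Step G7. Apply the nonlinear regression model NLS to the data values of the variables in the optimal independent variable set at the sampling points in the validation sample to obtain the predicted soil dependent variable value at each validation point. Then go to step G8.
Step G8. Compute the coefficient of determination between the observed and the predicted soil dependent variable values at the sampling points in the validation sample; this is the nonlinear regression coefficient of determination R_NLS of the first-level region. Then go to step G9.
Step G9. If no secondary region exists, go directly to step H; if secondary regions exist, go to step G10.
Step G10. For each secondary region, execute steps G1 to G8 to obtain the linear and nonlinear regression coefficients of determination of that region; then compute the mean linear regression coefficient of determination R_OLS_mean and the mean nonlinear regression coefficient of determination R_NLS_mean over all secondary regions. Then go to step H.
Step H. If no secondary region exists, go to step H-I;
If secondary regions exist and R_OLS is greater than both R_OLS_mean and R_NLS_mean, or R_NLS is greater than both R_OLS_mean and R_NLS_mean, go to step H-I;
If R_OLS_mean is greater than both R_OLS and R_NLS, or R_NLS_mean is greater than both R_OLS and R_NLS, go to step H-M.
Step H-I. If R_OLS ≥ R_NLS, apply the linear regression model of the first-level region from step G to predict and fill in the soil dependent variable value at each sampling point in the target region where it is missing, then go to step I;
If R_NLS > R_OLS, apply the nonlinear regression model of the first-level region from step G to predict and fill in the soil dependent variable value at each sampling point in the target region where it is missing, then go to step I.
Step H-M. If R_OLS_mean ≥ R_NLS_mean, apply the linear regression model of each secondary region from step G to predict and fill in the soil dependent variable value at each sampling point in the target region where it is missing, then go to step M;
If R_NLS_mean > R_OLS_mean, apply the nonlinear regression model of each secondary region from step G to predict and fill in the soil dependent variable value at each sampling point in the target region where it is missing, then go to step M.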
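The branching of steps H, H-I and H-M reduces to comparing the best first-level score with the best secondary-region mean. A compact sketch follows; the function name and return labels are hypothetical, and sending exact ties between the two scales to the first-level branch is an assumption, since the text does not cover that case.

```python
from typing import Optional, Tuple

def choose_models(r_ols: float, r_nls: float,
                  r_ols_mean: Optional[float] = None,
                  r_nls_mean: Optional[float] = None) -> Tuple[str, str]:
    """Return (scale, model type) chosen by steps H, H-I and H-M."""
    no_secondary = r_ols_mean is None or r_nls_mean is None
    if no_secondary or max(r_ols, r_nls) >= max(r_ols_mean, r_nls_mean):
        # Step H-I: first-level model; linear wins ties with nonlinear.
        return "first-level", "linear" if r_ols >= r_nls else "nonlinear"
    # Step H-M: one model per secondary region.
    return "secondary", "linear" if r_ols_mean >= r_nls_mean else "nonlinear"

# Scores of the kind reported in the embodiment described later in this text.
decision = choose_models(0.51, 0.46, 0.45, 0.37)
```

The same pair of decisions also governs the region-oriented modelling later on: steps K and L for the first-level branch, steps M to O for the secondary branch.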
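Step I. Based on the sampling points of the first-level region and its optimal independent variable set, and using the data values of each soil physical and chemical property in that set at those sampling points, obtain for each such property a prediction model based on all the specified environmental variables of step B. Then, combining these models with the environmental variable layers of step B, obtain the predicted spatial distribution layer of each soil physical and chemical property in the optimal set over the first-level region. Then go to step J.
In a specific implementation, step I is executed as the following steps I1 to I2.
Step I1. Based on the sampling points of the first-level region and its optimal independent variable set, and for each soil physical and chemical property in that set, use the data values of that property and of the specified environmental variables of step B at the sampling points to train each of the specified prediction models with ten-fold cross-validation. The prediction models include, for example, geographically weighted regression, ordinary kriging, regression kriging, artificial neural networks and boosted regression trees.
Then select the model with the highest prediction accuracy as the prediction model of that soil property based on all the specified environmental variables of step B; in this way obtain such a prediction model for every soil physical and chemical property in the optimal set. Then go to step I2.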
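The model selection of step I1 can be sketched with a generic ten-fold cross-validation loop. The two candidate trainers below (a mean predictor and a one-variable line) are deliberately trivial stand-ins for the models named in the text; the fold assignment by index stride and the use of RMSE as the accuracy measure are assumptions.

```python
import math
from typing import Callable, Dict, List

Row = List[float]
Predictor = Callable[[Row], float]
Trainer = Callable[[List[Row], List[float]], Predictor]

def kfold_rmse(train: Trainer, X: List[Row], y: List[float], k: int = 10) -> float:
    """Cross-validated RMSE; every k-th point forms one fold (requires len(y) >= k)."""
    n = len(y)
    sq_err = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        fit_X = [X[i] for i in range(n) if i not in test_idx]
        fit_y = [y[i] for i in range(n) if i not in test_idx]
        predict = train(fit_X, fit_y)
        sq_err += sum((predict(X[i]) - y[i]) ** 2 for i in test_idx)
    return math.sqrt(sq_err / n)

def select_model(candidates: Dict[str, Trainer], X: List[Row],
                 y: List[float], k: int = 10) -> str:
    """Return the name of the candidate with the lowest cross-validated RMSE."""
    scores = {name: kfold_rmse(t, X, y, k) for name, t in candidates.items()}
    return min(scores, key=scores.get)

def train_mean(X: List[Row], y: List[float]) -> Predictor:
    m = sum(y) / len(y)
    return lambda row: m

def train_line(X: List[Row], y: List[float]) -> Predictor:
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, y))
             / sum((a - mx) ** 2 for a in xs))
    return lambda row: my + slope * (row[0] - mx)

# Synthetic property that depends linearly on one environmental variable.
X_demo = [[float(i)] for i in range(30)]
y_demo = [2.0 * i + 1.0 for i in range(30)]
winner = select_model({"mean": train_mean, "line": train_line}, X_demo, y_demo)
```

The winning model is then applied to the full environmental variable layers in step I2 to produce the predicted layer for that soil property.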
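Step I2. Using the prediction models, based on all the specified environmental variables of step B, of the soil physical and chemical properties in the optimal independent variable set of the first-level region, together with the environmental variable layers of step B, obtain the predicted spatial distribution layer of each such property over the first-level region.
Step J. Merge the predicted spatial distribution layers of the soil physical and chemical properties in the optimal independent variable set of the first-level region with the first-level-region layers of the environmental variables in that set to form the optimal independent variable layer set of the first-level region. Then go to step K.
Step K. If R_OLS ≥ R_NLS, then for each sampling point of the first-level region extract the data values of the variables from the optimal independent variable layer set of the first-level region and train a linear regression model between them and the soil dependent variable data values; this model is the first-level regional prediction model. Then go to step L;
If R_NLS > R_OLS, then for each sampling point of the first-level region extract the data values of the variables from the optimal independent variable layer set of the first-level region and train a nonlinear regression model between them and the soil dependent variable data values; this model is the first-level regional prediction model. Then go to step L.
Step L. Using the optimal independent variable layer set of the first-level region, apply the first-level regional prediction model to obtain the spatial distribution map of the soil dependent variable, that is, the spatial distribution map of the target soil property in the target region, thereby predicting the target soil property content in the target region.
Step M. For each secondary region, apply the method of steps I to J to obtain the optimal independent variable layer set of that secondary region. Then go to step N.
Step N. If R_OLS_mean ≥ R_NLS_mean, then for each secondary region and each of its sampling points, extract the data values of the variables from the optimal independent variable layer set of that secondary region and train a linear regression model between them and the soil dependent variable data values; this model is the prediction model of that secondary region. Once the prediction model of every secondary region has been obtained, go to step O;
If R_NLS_mean > R_OLS_mean, then for each secondary region and each of its sampling points, extract the data values of the variables from the optimal independent variable layer set of that secondary region and train a nonlinear regression model between them and the soil dependent variable data values; this model is the prediction model of that secondary region. Once the prediction model of every secondary region has been obtained, go to step O.
Step O. For each secondary region, apply its prediction model to its optimal independent variable layer set to obtain the spatial distribution map of the soil dependent variable in that secondary region; the maps of all secondary regions are then combined into the spatial distribution map of the target soil property in the target region, thereby predicting the target soil property content in the target region.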
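The final combination in step O can be sketched as a cell-wise mosaic driven by a region-id raster. The nested-list rasters, the region labels and the `nodata` argument are assumptions for the example.

```python
from typing import Dict, List, Optional

Grid = List[List[float]]

def mosaic(region_ids: List[List[str]], region_maps: Dict[str, Grid],
           nodata: Optional[float] = None) -> List[List[Optional[float]]]:
    """Take each cell's value from the prediction map of the region it belongs to."""
    return [
        [region_maps[rid][r][c] if rid in region_maps else nodata
         for c, rid in enumerate(row)]
        for r, row in enumerate(region_ids)
    ]

# Two hypothetical secondary regions, each with a full-extent prediction raster.
ids = [["a", "a"], ["b", "b"]]
maps = {
    "a": [[1.0, 1.1], [1.2, 1.3]],
    "b": [[2.0, 2.1], [2.2, 2.3]],
}
target_map = mosaic(ids, maps)
```

Because every cell belongs to exactly one secondary region after the overlay of step E7, the mosaic needs no blending at region boundaries.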
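The target soil property content prediction method based on a soil transfer function designed in the present invention is described in detail below, taking the prediction of soil available copper content in a study area in Xuancheng City, Anhui Province, as a specific embodiment.
Xuancheng is a prefecture-level city in southeastern Anhui Province. It is a central city of the region where Anhui, Jiangsu and Zhejiang meet, and an important corridor linking the southeastern coast with the interior. In recent years, rapid industrial and urban development has led to continuous emission and accumulation of heavy metals, seriously affecting grain production and the ecological environment. Soil copper is relevant to the remediation of soil heavy metal pollution, and soil available copper is also an essential trace element for crop growth and development, so techniques for determining soil available copper content have long attracted attention.
The prediction method designed in the present invention is applied to predict soil available copper, to fill in the available copper data missing from the database, and, together with the environmental variable layers, to produce the spatial distribution map of soil available copper content in the region. As shown in Figure 1, the procedure is as follows.
Step A1. From the existing soil data in each specified data source, and for each preset sampling point in the target region, collect the data values of soil available copper, organic matter, available phosphorus, available potassium, available iron, available manganese, available zinc, pH and total nitrogen, based on a soil sampling depth of 20 cm. Then go to step A2.
Step A2. Select the sampling points for which the data values of all soil physical and chemical properties are non-empty, 383 sampling points in this embodiment, and take the minimum bounding polygon of these 383 points as the first-level region, with these points serving as the sampling points of the first-level region. Then go to step A3.
Step A3. Here the soil dependent variable is the available copper content; organic matter content, available phosphorus content, available potassium content, available iron content, available manganese content, available zinc content, pH and total nitrogen content constitute the soil independent variable set. Then go to step B.
Step B is executed as the following steps B1 to B5.
Step B1. Obtain layers of the specified environmental variables that cover the first-level region and are related to the soil dependent variable, including elevation (DEM), slope (Slope), profile curvature (ProCur), plan curvature (PlanCur), topographic wetness index (TWI), mean annual rainfall (MAP), mean annual temperature (MAT), mean annual soil temperature (SoilTemp), mean annual sunshine (Solar), normalized difference vegetation index (NDVI) and net primary productivity (NPP). For example, the elevation layer covering the first-level region is shown in Figure 8, and the mean annual rainfall layer covering the first-level region is shown in Figure 9.
Then, with the spatial resolution set to 500 m, execute steps B2 to B4 and then step B5: add the specified environmental variables to the soil independent variable set, updating it to {SOM, AP, AK, AFe, AMn, AZn, pH, TN, DEM, Slope, ProCur, PlanCur, TWI, MAP, MAT, SoilTemp, Solar, NDVI, NPP}. Then go to step C.
Step C. Delete from the soil independent variable set the variables that cause multicollinearity, as well as the variables whose correlation with the soil dependent variable (available copper content) is below the preset correlation significance threshold. The updated soil independent variable set is {AP, AK, AFe, AZn, pH, TN, DEM, Slope, ProCur, MAP, Solar, NDVI}. Then go to step D.
Step D is executed as described above to obtain the optimal independent variable set {AZn, pH, TN, DEM, MAP} of the first-level region; then go to step E.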
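In step E, the following steps E1 to E7 are executed.
Step E1. Obtain the land-use layer and the soil parent material layer covering the first-level region and, for each sampling point of the first-level region, extract the land-use zone and the soil parent material zone in which the point lies. Then go to step E2.
Step E2. Based on the soil dependent variable data values at the sampling points of the first-level region, use Duncan's multiple comparison method to obtain the differences in the soil dependent variable between land-use zones and between soil parent material zones. Then go to step E3.
After steps E1 and E2, the analysis shows that, for the available copper content at the 383 sampling points of this embodiment, some differences in available copper content between soil parent material zones exceed the preset significant-difference threshold, while none of the differences between land-use zones does.
The area has four soil parent materials: weathering products of calcareous sedimentary rock and the corresponding metamorphic rock, weathering products of light-colored crystalline rock, weathering products of clastic sedimentary rock and the corresponding metamorphic rock, and loess.
According to the differences in available copper content between soil parent material zones, the parent material types are therefore grouped into two classes: class a (weathering products of calcareous sedimentary rock and the corresponding metamorphic rock) and class b (weathering products of light-colored crystalline rock, weathering products of clastic sedimentary rock and the corresponding metamorphic rock, and loess). The area covered by each merged class is one secondary region; the secondary regions based on the two parent material classes are shown in Figure 7. Then go to step E7.
Step E7. Spatially overlay the merged secondary regions and the unmerged land-use zones from the land-use layer with the merged secondary regions and the unmerged soil parent material zones from the soil parent material layer to obtain the secondary regions; from the sampling points of the first-level region, obtain the sampling points of each secondary region. Then go to step F.
Step F. For each secondary region, apply the method of step D to obtain the optimal independent variable set of that secondary region. Then go to step G.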
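Step G is carried out by executing the following steps G1 to G10.
Step G1. Of the sampling points of the first-level region, assign 75% to the training sample and the remaining 25% to the validation sample. Then go to step G2.
Step G2. For the sampling points in the training sample, train the linear regression model OLS between the soil dependent variable data values and the data values of the variables in the optimal independent variable set {AZn, pH, TN, DEM, MAP}, giving:
ACu = -5.453 + 0.802×AZn + 0.615×pH + 0.712×TN - 0.00552×DEM + 0.00232×MAP
Then go to step G3.
Steps G3 to G4 are then executed, giving the linear regression coefficient of determination of the first-level region, R_OLS = 0.51. Then go to step G5.
Step G5. For each independent variable in the optimal independent variable set of the first-level region, fit power, exponential, hyperbolic and logarithmic functions to the soil dependent variable data values and the data values of that independent variable at the sampling points in the training sample. Select the function with the highest prediction accuracy as the nonlinear fitting form for that independent variable, and in this way obtain the nonlinear fitting form for every variable in the optimal set. Then go to step G6.
Steps G6 to G8 are then executed in sequence, giving the nonlinear regression coefficient of determination of the first-level region, R_NLS = 0.46. Then go to step G9.
Step G9. If no secondary region exists, go directly to step H; if secondary regions exist, go to step G10.
Step G10. For each secondary region, execute steps G1 to G8 to obtain the linear and nonlinear regression coefficients of determination of that region; then compute over all secondary regions the mean linear regression coefficient of determination, R_OLS_mean = 0.45, and the mean nonlinear regression coefficient of determination, R_NLS_mean = 0.37. Then go to step H.
Based on step H and step H-I, since R_OLS ≥ R_NLS, R_OLS ≥ R_OLS_mean and R_OLS ≥ R_NLS_mean, the linear regression model of the first-level region from step G is applied to predict and fill in the soil dependent variable value at each sampling point in the target region where it is missing. Then go to step I.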
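Step I is completed by executing steps I1 and I2 as described above, yielding, for each soil physicochemical property in the optimal independent-variable set of the primary region, a predicted spatial distribution layer over the primary region; for example, the predicted available zinc content layer covering the primary region is shown in Figure 10. Then proceed to step J.
Step J. Combine the predicted spatial distribution layers of the soil physicochemical properties in the optimal independent-variable set of the primary region with the primary-region layers of the environmental variables in that set, forming the optimal independent-variable layer set of the primary region; then proceed to step K.
Step K. Since R_OLS ≥ R_NLS, for the sampling points of the primary region, extract the values of the independent variables from the optimal independent-variable layer set of the primary region and train the linear regression model between them and the soil dependent-variable values, as follows: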
ACu=-3.459+1.249×AZn+0.939×pH+0.127×TN-0.00509×DEM-0.000522×MAP
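This constitutes the primary-region prediction model; proceed to step L.
Step L. Apply the primary-region prediction model to the optimal independent-variable layer set of the primary region to obtain the spatial distribution map of the soil dependent variable, that is, the spatial distribution map of available copper content in the target area, as shown in Figure 11, thereby predicting the content of the target soil property in the target area.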
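Applying the primary-region prediction model to the layer set is a cell-by-cell evaluation of the step K equation. The sketch below uses the coefficients stated in step K and assumes that each layer has already been read into a NumPy array on a common grid; the layer values in the usage lines are hypothetical.

```python
import numpy as np

# Coefficients of the step K linear model for available copper (ACu).
INTERCEPT = -3.459
COEF = {"AZn": 1.249, "pH": 0.939, "TN": 0.127,
        "DEM": -0.00509, "MAP": -0.000522}

def predict_acu(layers):
    """Evaluate the linear model on co-registered raster arrays.

    layers: dict mapping each variable name in COEF to a 2-D array.
    """
    out = np.full_like(np.asarray(layers["AZn"], float), INTERCEPT)
    for name, c in COEF.items():
        out = out + c * np.asarray(layers[name], float)
    return out


# Hypothetical 2 x 2 grid with constant values per layer.
demo = {"AZn": np.full((2, 2), 1.0), "pH": np.full((2, 2), 7.0),
        "TN": np.full((2, 2), 1.0), "DEM": np.full((2, 2), 100.0),
        "MAP": np.full((2, 2), 600.0)}
acu_map = predict_acu(demo)
```

Reading and writing the actual raster files (and masking cells outside the primary region) is left out, since the source does not name a GIS toolchain.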
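The method designed in the present invention for predicting the content of a target soil property based on a pedotransfer function takes environmental variables and uncertainty analysis into account in sample-point-oriented soil data prediction, and integrates pedotransfer functions with provisionally produced soil maps in region-oriented digital soil mapping. This reduces the uncertainty in sample-point prediction and regional mapping, and effectively addresses the technical bottleneck of conventional digital soil mapping, in which low correlation between environmental variables and soil physicochemical properties leads to soil maps of low accuracy. The method is readily transferable: it can be applied to producing soil maps at different scales and for different soil physicochemical properties, and to completing soil databases of different sizes. The proposed technique remains to be applied in more technical fields to test its performance.
The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the invention is not limited to these embodiments; within the knowledge of a person of ordinary skill in the art, various changes may be made without departing from the spirit of the invention.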
Claims (9)
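- A method for predicting the content of a target soil property based on a pedotransfer function, used to predict the content of the target soil property in a target area, characterized by comprising the following steps:
  Step A. Based on existing soil data, select the sampling points in the target area at which the data values of all preset soil physicochemical properties are non-empty, and form a primary region from the minimum bounding polygon of these sampling points, which serve as the sampling points of the primary region; the preset soil physicochemical properties include the target soil property; define the target soil property as the soil dependent variable, with the remaining soil physicochemical properties forming the soil independent-variable set; then proceed to step B;
  Step B. Obtain layers, covering the primary region, of the specified environmental variables related to the soil dependent variable; for each sampling point of the primary region, extract the data values of the specified environmental variables at that point; add the specified environmental variables to the soil independent-variable set, thereby updating it; then proceed to step C;
  Step C. From the soil independent-variable set, delete the independent variables that cause multicollinearity and those whose correlation with the soil dependent variable falls below the preset correlation significance threshold, thereby updating the set; then proceed to step D;
  Step D. For the sampling points of the primary region, use a stepwise multiple linear regression model and, over a preset number of iterations, train the linear relationship between the soil dependent-variable values and the values of the soil independent variables in the set; record the temporary optimal independent-variable set selected in each iteration and count how many times each distinct temporary optimal set is selected; after the preset number of iterations, take the temporary optimal set selected most often as the optimal independent-variable set of the primary region; then proceed to step E;
  Step E. Obtain a division layer of soil regions under a preset attribute covering the primary region and, for each sampling point of the primary region, extract the soil division under that attribute in which the point lies; based on the soil dependent-variable values at the sampling points, analyze the differences in the soil dependent variable between the soil divisions; if none of the differences exceeds the preset significance threshold, proceed to step G; if any difference exceeds the preset significance threshold, merge the soil divisions whose differences do not exceed the threshold and, together with the unmerged soil divisions, form the secondary regions; assign the sampling points of the primary region to their secondary regions; then proceed to step F;
  Step F. For each secondary region, apply the method of step D to obtain the optimal independent-variable set of that secondary region; then proceed to step G;
  Step G. For the sampling points of the primary region, train a linear regression model and a nonlinear regression model between the soil dependent-variable values and the values of the variables in the corresponding optimal independent-variable set, and obtain the coefficient of determination of each model, namely R_OLS for the linear regression model and R_NLS for the nonlinear regression model of the primary region; if no secondary region exists, proceed directly to step H; if secondary regions exist, then for each secondary region train a linear regression model and a nonlinear regression model between the soil dependent-variable values at its sampling points and the values of the variables in its optimal independent-variable set, obtain the coefficients of determination of both models for each secondary region, and compute the mean R_OLS_mean of the linear-model coefficients and the mean R_NLS_mean of the nonlinear-model coefficients over all secondary regions; then proceed to step H;
  Step H. If no secondary region exists, proceed to step I; if secondary regions exist: when R_OLS is greater than both R_OLS_mean and R_NLS_mean, or R_NLS is greater than both R_OLS_mean and R_NLS_mean, proceed to step I; when R_OLS_mean is greater than both R_OLS and R_NLS, or R_NLS_mean is greater than both R_OLS and R_NLS, proceed to step M;
  Step I. Based on the sampling points of the primary region and its optimal independent-variable set, and using the data values at these sampling points of each soil physicochemical property in the optimal set, obtain for each such property a prediction model based on all the specified environmental variables of step B; then, using the environmental-variable layers of step B, obtain the predicted spatial distribution layer over the primary region of each soil physicochemical property in the optimal set; then proceed to step J;
  Step J. Combine the predicted spatial distribution layers of the soil physicochemical properties in the optimal independent-variable set of the primary region with the primary-region layers of the environmental variables in that set, forming the optimal independent-variable layer set of the primary region; then proceed to step K;
  Step K. If R_OLS ≥ R_NLS, then for the sampling points of the primary region, extract the values of the independent variables from the optimal independent-variable layer set of the primary region and train a linear regression model between them and the soil dependent-variable values, which constitutes the primary-region prediction model, and proceed to step L; if R_NLS > R_OLS, do the same but train a nonlinear regression model, which constitutes the primary-region prediction model, and proceed to step L;
  Step L. Apply the primary-region prediction model to the optimal independent-variable layer set of the primary region to obtain the spatial distribution map of the soil dependent variable, that is, the spatial distribution map of the target soil property in the target area, thereby predicting the content of the target soil property in the target area;
  Step M. For each secondary region, apply the method of steps I to J to obtain the optimal independent-variable layer set of that secondary region; then proceed to step N;
  Step N. If R_OLS_mean ≥ R_NLS_mean, then for each secondary region, extract at its sampling points the values of the independent variables from its optimal independent-variable layer set and train a linear regression model between them and the soil dependent-variable values, which constitutes the prediction model of that secondary region; having obtained the prediction models of all secondary regions, proceed to step O; if R_NLS_mean > R_OLS_mean, do the same but train nonlinear regression models, and proceed to step O;
  Step O. For each secondary region, apply its prediction model to its optimal independent-variable layer set to obtain the spatial distribution map of the soil dependent variable in that region; combine the maps of all secondary regions into the spatial distribution map of the target soil property in the target area, thereby predicting the content of the target soil property in the target area.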
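- The method for predicting the content of a target soil property based on a pedotransfer function according to claim 1, characterized by further comprising steps H-I and H-M, with step H as follows:
  Step H. If no secondary region exists, proceed to step H-I; if secondary regions exist: when R_OLS is greater than both R_OLS_mean and R_NLS_mean, or R_NLS is greater than both R_OLS_mean and R_NLS_mean, proceed to step H-I; when R_OLS_mean is greater than both R_OLS and R_NLS, or R_NLS_mean is greater than both R_OLS and R_NLS, proceed to step H-M;
  Step H-I. If R_OLS ≥ R_NLS, apply the linear regression model of the primary region from step G to predict and fill in the soil dependent-variable values at the sampling points in the target area where those values are missing, then proceed to step I; if R_NLS > R_OLS, apply the nonlinear regression model of the primary region from step G to do the same, then proceed to step I;
  Step H-M. If R_OLS_mean ≥ R_NLS_mean, apply the linear regression models of the respective secondary regions from step G to predict and fill in the soil dependent-variable values at the sampling points in the target area where those values are missing, then proceed to step M; if R_NLS_mean > R_OLS_mean, apply the nonlinear regression models of the respective secondary regions from step G to do the same, then proceed to step M.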
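- The method for predicting the content of a target soil property based on a pedotransfer function according to claim 1, characterized in that step A comprises the following steps:
  Step A1. From the existing soil data in the specified data sources, collect, for each preset sampling point in the target area, the data values of each preset soil physicochemical property, including the target soil property; then proceed to step A2;
  Step A2. Select the sampling points at which the data values of all soil physicochemical properties are non-empty, and form the primary region from the minimum bounding polygon of these sampling points, which serve as the sampling points of the primary region; then proceed to step A3;
  Step A3. Define the target soil property as the soil dependent variable, with the remaining soil physicochemical properties forming the soil independent-variable set; then proceed to step B.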
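- The method for predicting the content of a target soil property based on a pedotransfer function according to claim 1, characterized in that step B comprises the following steps:
  Step B1. Obtain layers, covering the primary region, of the specified environmental variables related to the soil dependent variable; then proceed to step B2;
  Step B2. Convert the layer of each specified environmental variable into an environmental-variable raster layer, wherein, if an environmental variable contains at least one band, each of its bands is converted into a corresponding environmental-variable raster layer; then proceed to step B3;
  Step B3. Resample all environmental-variable raster layers by bilinear interpolation so that the raster data share the preset spatial resolution; then proceed to step B4;
  Step B4. Obtain the area corresponding to the primary region on all environmental-variable raster layers and, for each sampling point of the primary region, extract the data values of the specified environmental variables at that point; then proceed to step B5;
  Step B5. Add the specified environmental variables to the soil independent-variable set, thereby updating it; then proceed to step C.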
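- The method for predicting the content of a target soil property based on a pedotransfer function according to claim 1, characterized in that step C comprises the following steps:
  Step C1. For the sampling points of the primary region, train a linear regression model between the soil dependent-variable values and the values of the variables in the soil independent-variable set, and obtain the coefficient of determination $R_k^2$ of each independent variable in the set; then proceed to step C2;
  Step C2. Compute the variance inflation factor of each independent variable in the soil independent-variable set as $VIF_k = \dfrac{1}{1 - R_k^2}$, where $R_k^2$ denotes the coefficient of determination of the k-th independent variable in the set; then proceed to step C3;
  Step C3. Determine whether the variance inflation factors of all independent variables in the set are below the preset factor threshold; if so, proceed to step C4; otherwise delete the independent variable with the largest variance inflation factor, update the soil independent-variable set, and return to step C1;
  Step C4. For the sampling points of the primary region, compute the correlation between the soil dependent-variable values and the values of each independent variable in the set, delete the independent variables whose correlation is below the preset correlation significance threshold, and update the soil independent-variable set; then proceed to step D.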
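- The method for predicting the content of a target soil property based on a pedotransfer function according to claim 1, characterized in that, in step D, after the preset number of iterations, the temporary optimal independent-variable set selected most often is taken as a candidate optimal independent-variable set of the primary region, and the following steps are then performed:
  Step D1. Continue to use the stepwise multiple linear regression model and, over a preset additional number of iterations, continue training the linear relationship between the soil dependent-variable values at the sampling points and the values of the soil independent variables in the set; record the temporary optimal independent-variable set selected in each iteration and continue counting how many times each distinct temporary optimal set is selected; after the preset additional number of iterations, take the temporary optimal set selected most often as a candidate optimal independent-variable set of the primary region; then proceed to step D2;
  Step D2. Determine whether the two most recent candidate optimal independent-variable sets of the primary region are identical; if so, take that candidate set as the optimal independent-variable set of the primary region and proceed to step E; otherwise return to step D1.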
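- The method for predicting the content of a target soil property based on a pedotransfer function according to claim 1, characterized in that step E comprises the following steps:
  Step E1. Obtain the land-use layer and the soil parent-material layer covering the primary region and, for each sampling point of the primary region, extract the land-use division and the parent-material division in which the point lies; then proceed to step E2;
  Step E2. Based on the soil dependent-variable values at the sampling points of the primary region, use Duncan's multiple comparison test to obtain the differences in the soil dependent variable between land-use divisions and between parent-material divisions; then proceed to step E3;
  Step E3. If neither the differences between land-use divisions nor the differences between parent-material divisions exceed the preset significance threshold, proceed to step G; otherwise proceed to step E4;
  Step E4. If the differences between land-use divisions include results exceeding the preset significance threshold and none of the differences between parent-material divisions exceeds it, merge the land-use divisions whose differences do not exceed the threshold to form secondary regions, then proceed to step E7; otherwise proceed to step E5;
  Step E5. If the differences between parent-material divisions include results exceeding the preset significance threshold and none of the differences between land-use divisions exceeds it, merge the parent-material divisions whose differences do not exceed the threshold to form secondary regions, then proceed to step E7; otherwise proceed to step E6;
  Step E6. If both the differences between parent-material divisions and the differences between land-use divisions include results exceeding the preset significance threshold, merge the parent-material divisions whose differences do not exceed the threshold to form secondary regions, and merge the land-use divisions whose differences do not exceed the threshold to form secondary regions; then proceed to step E7;
  Step E7. Spatially overlay the merged secondary regions and the unmerged land-use divisions of the land-use layer with the merged secondary regions and the unmerged parent-material divisions of the parent-material layer to obtain the final secondary regions, and assign the sampling points of the primary region to their secondary regions; then proceed to step F.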
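- The method for predicting the content of a target soil property based on a pedotransfer function according to claim 1, characterized in that step G comprises the following steps:
  Step G1. Of the sampling points of the primary region, assign a first preset proportion to the training sample and the remaining sampling points to the validation sample, the first preset proportion being greater than 50%; then proceed to step G2;
  Step G2. For the sampling points in the training sample, train the linear regression model OLS between the soil dependent-variable values and the values of the variables in the corresponding optimal independent-variable set; then proceed to step G3;
  Step G3. Apply the linear regression model OLS to the values, at the validation sampling points, of the variables in the corresponding optimal independent-variable set to obtain the predicted soil dependent-variable values at the validation sampling points; then proceed to step G4;
  Step G4. Compute the coefficient of determination between the observed and predicted soil dependent-variable values at the validation sampling points, that is, the coefficient of determination R_OLS of the linear regression model of the primary region; then proceed to step G5;
  Step G5. For each independent variable in the optimal independent-variable set of the primary region, fit the soil dependent-variable values at the training sampling points against the values of that variable using each of the preset specified functions, and select the function with the highest prediction accuracy as the nonlinear fitting form for that variable; having obtained the nonlinear fitting form of every variable in the optimal set, proceed to step G6;
  Step G6. Using the nonlinear fitting forms of the variables in the optimal independent-variable set of the primary region and the nonlinear least-squares method, train, for the sampling points in the training sample, the nonlinear regression model NLS between the soil dependent-variable values and the values of the variables in the corresponding optimal independent-variable set; then proceed to step G7;
  Step G7. Apply the nonlinear regression model NLS to the values, at the validation sampling points, of the variables in the corresponding optimal independent-variable set to obtain the predicted soil dependent-variable values at the validation sampling points; then proceed to step G8;
  Step G8. Compute the coefficient of determination between the observed and predicted soil dependent-variable values at the validation sampling points, that is, the coefficient of determination R_NLS of the nonlinear regression model of the primary region; then proceed to step G9;
  Step G9. If no secondary region exists, proceed directly to step H; if secondary regions exist, proceed to step G10;
  Step G10. For each secondary region, apply the method of steps G1 to G8 to obtain the coefficients of determination of its linear and nonlinear regression models; then compute the mean R_OLS_mean of the linear-model coefficients and the mean R_NLS_mean of the nonlinear-model coefficients over all secondary regions; then proceed to step H.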
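- The method for predicting the content of a target soil property based on a pedotransfer function according to claim 1, characterized in that step I comprises the following steps:
  Step I1. Based on the sampling points of the primary region and its optimal independent-variable set, for each soil physicochemical property in the optimal set, use the data values of that property and of the specified environmental variables of step B at the sampling points to train each of the specified candidate prediction models with ten-fold cross-validation, and select the model with the highest prediction accuracy as the prediction model of that property based on all the specified environmental variables of step B; having obtained such a prediction model for every soil physicochemical property in the optimal set, proceed to step I2;
  Step I2. Using the prediction models of the soil physicochemical properties in the optimal independent-variable set of the primary region, each based on all the specified environmental variables of step B, together with the environmental-variable layers of step B, obtain the predicted spatial distribution layer over the primary region of each soil physicochemical property in the optimal set.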
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/577,006 US11600363B2 (en) | 2020-03-19 | 2022-01-16 | PTF-based method for predicting target soil property and content |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010195020.4 | 2020-03-19 | ||
CN202010195020.4A CN111508569B (zh) | 2020-03-19 | 2020-03-19 | 一种基于土壤传递函数的目标土壤性质含量预测方法 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/577,006 Continuation US11600363B2 (en) | 2020-03-19 | 2022-01-16 | PTF-based method for predicting target soil property and content |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021184550A1 true WO2021184550A1 (zh) | 2021-09-23 |
Family
ID=71875808
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/093267 WO2021184550A1 (zh) | 2020-03-19 | 2020-05-29 | 一种基于土壤传递函数的目标土壤性质含量预测方法 |
Country Status (3)
Country | Link |
---|---|
US (1) | US11600363B2 (zh) |
CN (1) | CN111508569B (zh) |
WO (1) | WO2021184550A1 (zh) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114004147A (zh) * | 2021-10-27 | 2022-02-01 | 金陵科技学院 | 一种土壤湿润状态下同时预测多种土壤属性的方法 |
CN114021348A (zh) * | 2021-11-05 | 2022-02-08 | 中国矿业大学(北京) | 一种精细土地利用类型的矿区植被碳汇遥感反演方法 |
CN114357891A (zh) * | 2022-01-11 | 2022-04-15 | 中国冶金地质总局矿产资源研究院 | 一种土壤镉元素含量的高光谱遥感定量反演方法 |
CN114780914A (zh) * | 2022-04-08 | 2022-07-22 | 华中科技大学 | 一种通过pH值快速判定普洱茶发酵程度的方法 |
CN114819751A (zh) * | 2022-06-24 | 2022-07-29 | 广东省农业科学院农业质量标准与监测技术研究所 | 一种农产品产地环境风险诊断方法及系统 |
CN114821359A (zh) * | 2022-05-06 | 2022-07-29 | 华中农业大学 | 一种基于土壤—环境知识获取多尺度环境因子集方法 |
CN114965315A (zh) * | 2022-05-18 | 2022-08-30 | 重庆大学 | 一种基于高光谱成像的岩体损伤劣化快速评估方法 |
WO2023168519A1 (en) * | 2022-03-07 | 2023-09-14 | Roshan Water Solutions Incorporated | Cloud-based apparatus, system, and method for sample-testing |
CN116754738A (zh) * | 2023-08-16 | 2023-09-15 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | 水域碳动态监测方法、设备、机器人及计算机存储介质 |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112651108B (zh) * | 2020-12-07 | 2024-04-19 | 中国水利水电科学研究院 | 一种解耦气象要素和植被动态对水文要素影响的方法 |
US11899006B2 (en) * | 2022-02-22 | 2024-02-13 | Trace Genomics, Inc. | Precision farming system with scaled soil characteristics |
CN115759524B (zh) * | 2022-10-20 | 2023-12-08 | 中国农业大学 | 一种基于遥感影像植被指数的土壤生产力等级识别方法 |
CN115859596B (zh) * | 2022-11-24 | 2023-07-04 | 中国科学院生态环境研究中心 | 城-郊梯度区域土壤重金属累积过程时空模拟方法 |
CN116050935B (zh) * | 2023-02-24 | 2024-03-15 | 北京师范大学 | 一种确定生物多样性优先保护地信息的方法及装置 |
CN116483807B (zh) * | 2023-05-09 | 2023-10-24 | 生态环境部南京环境科学研究所 | 一种土壤污染物环境与毒性数据库构建方法 |
CN116580302B (zh) * | 2023-05-09 | 2023-11-21 | 湖北一方科技发展有限责任公司 | 一种高维水文数据处理系统及方法 |
CN117271968B (zh) * | 2023-11-22 | 2024-02-23 | 中国农业科学院农业环境与可持续发展研究所 | 一种土壤固碳量的核算方法及系统 |
CN117422313B (zh) * | 2023-12-18 | 2024-04-19 | 中科星图智慧科技安徽有限公司 | 一种综合气象因素和土壤因素预测森林火险等级的方法 |
CN117633720B (zh) * | 2024-01-09 | 2024-08-27 | 广东省农业科学院农业资源与环境研究所 | 一种土壤剖面的铁结合态有机碳含量预测方法 |
CN118136109A (zh) * | 2024-03-29 | 2024-06-04 | 江苏中气环境科技有限公司 | 一种农作物土壤中重金属污染的检测方法及系统 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106980750A (zh) * | 2017-02-23 | 2017-07-25 | 中国科学院南京土壤研究所 | 一种基于典型对应分析的土壤氮储量估算方法 |
CN106980603A (zh) * | 2017-02-23 | 2017-07-25 | 中国科学院南京土壤研究所 | 基于土壤类型归并与多元回归的土壤锰含量预测方法 |
CN108764527A (zh) * | 2018-04-23 | 2018-11-06 | 中国科学院南京土壤研究所 | 一种土壤有机碳库时空动态预测最优环境变量筛选方法 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102680413B (zh) * | 2012-05-25 | 2014-02-19 | 浙江大学 | 全景环带高光谱快速检测野外土壤有机质含量的装置和方法 |
US11109523B2 (en) * | 2016-08-19 | 2021-09-07 | The Regents Of The University Of California | Precision crop production-function models |
CA3041142C (en) * | 2016-12-16 | 2021-06-15 | Farmers Edge Inc. | Classification of soil texture and content by near-infrared spectroscopy |
US11263707B2 (en) * | 2017-08-08 | 2022-03-01 | Indigo Ag, Inc. | Machine learning in agricultural planting, growing, and harvesting contexts |
CN110455726B (zh) * | 2019-07-30 | 2022-02-11 | 东方智感(浙江)科技股份有限公司 | 一种实时预测土壤水分和全氮含量的方法 |
- 2020-03-19 CN CN202010195020.4A patent/CN111508569B/zh active Active
- 2020-05-29 WO PCT/CN2020/093267 patent/WO2021184550A1/zh active Application Filing
- 2022-01-16 US US17/577,006 patent/US11600363B2/en active Active
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114004147A (zh) * | 2021-10-27 | 2022-02-01 | 金陵科技学院 | 一种土壤湿润状态下同时预测多种土壤属性的方法 |
CN114004147B (zh) * | 2021-10-27 | 2024-04-12 | 金陵科技学院 | 一种土壤湿润状态下同时预测多种土壤属性的方法 |
CN114021348A (zh) * | 2021-11-05 | 2022-02-08 | 中国矿业大学(北京) | 一种精细土地利用类型的矿区植被碳汇遥感反演方法 |
CN114357891A (zh) * | 2022-01-11 | 2022-04-15 | 中国冶金地质总局矿产资源研究院 | 一种土壤镉元素含量的高光谱遥感定量反演方法 |
WO2023168519A1 (en) * | 2022-03-07 | 2023-09-14 | Roshan Water Solutions Incorporated | Cloud-based apparatus, system, and method for sample-testing |
CN114780914A (zh) * | 2022-04-08 | 2022-07-22 | 华中科技大学 | 一种通过pH值快速判定普洱茶发酵程度的方法 |
CN114821359A (zh) * | 2022-05-06 | 2022-07-29 | 华中农业大学 | 一种基于土壤—环境知识获取多尺度环境因子集方法 |
CN114965315A (zh) * | 2022-05-18 | 2022-08-30 | 重庆大学 | 一种基于高光谱成像的岩体损伤劣化快速评估方法 |
CN114819751A (zh) * | 2022-06-24 | 2022-07-29 | 广东省农业科学院农业质量标准与监测技术研究所 | 一种农产品产地环境风险诊断方法及系统 |
CN114819751B (zh) * | 2022-06-24 | 2022-09-20 | 广东省农业科学院农业质量标准与监测技术研究所 | 一种农产品产地环境风险诊断方法及系统 |
CN116754738A (zh) * | 2023-08-16 | 2023-09-15 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | 水域碳动态监测方法、设备、机器人及计算机存储介质 |
CN116754738B (zh) * | 2023-08-16 | 2023-11-28 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | 水域碳动态监测方法、设备、机器人及计算机存储介质 |
Also Published As
Publication number | Publication date |
---|---|
CN111508569A (zh) | 2020-08-07 |
US11600363B2 (en) | 2023-03-07 |
CN111508569B (zh) | 2023-05-09 |
US20220189588A1 (en) | 2022-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021184550A1 (zh) | 一种基于土壤传递函数的目标土壤性质含量预测方法 | |
CN101916337B (zh) | 一种基于地理信息系统的水稻生产潜力动态预测方法 | |
CN108549858B (zh) | 一种城市热岛效应的定量评价方法 | |
CN108446999B (zh) | 基于冠气温差和遥感信息进行灌区不同作物et估算方法 | |
Long et al. | Reconstruction of historical arable land use patterns using constrained cellular automata: A case study of Jiangsu, China | |
CN105912836A (zh) | 一种纯遥感数据驱动的流域水循环模拟方法 | |
CN105184427A (zh) | 一种对农田生态环境进行预警的方法及装置 | |
CN116757357B (zh) | 一种耦合多源遥感信息的土地生态状况评估方法 | |
CN110287615B (zh) | 基于遥感解译和降雨实验的雨水径流污染负荷测算方法 | |
CN110378576A (zh) | 城市化植被效应有效距离的定量化检测方法 | |
Liu et al. | Optimization of planning structure in irrigated district considering water footprint under uncertainty | |
CN110413666A (zh) | 一种耕地质量多源异构数据整合方法 | |
CN101836561B (zh) | 一种海滨盐土蓖麻产量预测模型及其构建方法和应用 | |
CN110909933A (zh) | 一种耦合作物模型与机器学习语言的农业干旱快速诊断和评估方法 | |
CN108764527B (zh) | 一种土壤有机碳库时空动态预测最优环境变量筛选方法 | |
CN113706357A (zh) | 基于gis和csle的区域土壤侵蚀评价 | |
Shi et al. | Influence of climate and socio-economic factors on the spatio-temporal variability of soil organic matter: A case study of Central Heilongjiang Province, China | |
CN108205718B (zh) | 一种粮食作物抽样测产方法及系统 | |
Liu et al. | Spatial and temporal evolution and greenhouse gas emissions of China's agricultural plastic greenhouses | |
CN113592186B (zh) | 一种基于实测径流数据的水文预报状态变量实时修正方法 | |
Chen et al. | Variation of gross primary productivity dominated by leaf area index in significantly greening area | |
CN101276446B (zh) | 一种区域作物需水量测算方法 | |
Liu et al. | Development of ecohydrological assessment tool and its application | |
CN113987778B (zh) | 一种基于野外站点的水土流失模拟值时空加权校正方法 | |
CN113435667B (zh) | 一种稻田灌排模式水资源利用综合评价方法及系统 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20926205; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 20926205; Country of ref document: EP; Kind code of ref document: A1 |