CN114140591B - Soil organic matter remote sensing mapping method combining machine learning and geostatistics

Info

Publication number
CN114140591B
Authority
CN
China
Prior art keywords
soil organic
organic matter
remote sensing
machine learning
data
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111437930.XA
Other languages
Chinese (zh)
Other versions
CN114140591A (en
Inventor
李振旺
杜昌文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Soil Science of CAS
Original Assignee
Institute of Soil Science of CAS
Application filed by Institute of Soil Science of CAS
Priority to CN202111437930.XA
Publication of CN114140591A
Application granted
Publication of CN114140591B
Legal status: Active

Classifications

    • G06T17/05 Geographic models
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G06N3/02 Neural networks
    • G06T2207/10032 Satellite or aerial image; Remote sensing
    • G06T2207/10044 Radar image
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30188 Vegetation; Agriculture
    • Y02A40/10 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Geometry (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Remote Sensing (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention belongs to the technical field of soil remote sensing mapping and precision agriculture, and discloses a soil organic matter remote sensing mapping method combining machine learning and geostatistics. DEM, optical and radar remote sensing data are acquired on the GEE remote sensing cloud platform, and characteristic variables are derived through mathematical transformation and processing. The characteristic variable information corresponding to the sampling points is extracted, and exploratory data analysis and screening are carried out on the sample parameter set to obtain the model input variables. The sample set is divided, a soil organic matter inversion machine learning model is established, and preliminary mapping is carried out. The residuals between the measured and predicted soil organic matter are then calculated; with the residuals as the primary variable and the principal components of the characteristic variables as covariates, a spatial distribution map of the soil organic matter residuals is predicted by a geostatistical method. This residual map is added to the soil organic matter map predicted by the machine learning model to obtain the final soil organic matter prediction map, and the result is verified. The invention enables fine and accurate mapping of soil organic matter.

Description

Soil organic matter remote sensing mapping method combining machine learning and geostatistics
Technical Field
The invention belongs to the technical field of soil remote sensing mapping and precision agriculture, and particularly relates to a soil organic matter remote sensing mapping method combining machine learning and geostatistics.
Background
Soil organic matter is an important index for evaluating soil quality and a decisive factor for soil fertility and crop yield, and it is of great significance for soil nutrient cycling and sustainable agricultural development. Precision agricultural management therefore requires high-resolution, high-precision mapping of soil organic matter.
The factors influencing farmland soil organic matter are complex and closely related to topography, climate, parent material, vegetation, human activities and the like. Classical soil-landscape theory estimates the spatial variation of soil organic matter, which is difficult to observe continuously in space, by relating it to soil-forming environmental factors. Spatial mapping methods represented by geostatistics (such as kriging, regression kriging and cokriging) are the main means of mapping soil organic matter over large areas because they are simple, interpolate effectively and make good use of spatial autocorrelation. However, for small, scattered cropland patches, the environmental gradients of climate, topography, parent material, vegetation and the like are small and their spatial covariation with soil organic matter is weak, so soil-forming environmental variable data of medium or low resolution cannot reflect, or only partially reflect, the spatial variation of the soil, which makes high-precision mapping of soil organic matter at the field scale difficult. Moreover, geostatistical methods do not account for the complex nonlinear relationship between soil properties and environmental factors, which limits their application in areas, such as those with complex topography, where soil organic matter may change sharply.
The development of satellite remote sensing technology provides a stable data source for acquiring global land surface environmental parameters. Satellite remote sensing images are spatially continuous, timely, of high resolution and easy to acquire. The surface spectral information they contain can be used to retrieve the properties of ground objects and to classify them, and the various remote sensing indices extracted from them can express vegetation growth quantitatively or qualitatively, so remote sensing information is increasingly applied to soil mapping. The Sentinel-2A and Sentinel-2B satellites, launched by the European Space Agency, carry an optical multispectral imager that observes the ground in 13 bands spanning the visible, near infrared and shortwave infrared with a revisit period of 5 days; the spatial resolution reaches 10 meters in the visible and near-infrared bands and 20 meters in the shortwave infrared bands. Compared with the Landsat satellites of the United States, the Sentinel-2 system has richer spectral bands, a shorter revisit period and higher spatial resolution, so Sentinel-2 imagery performs better in field-scale monitoring of land use change, ecological environment evolution and crop growth. The Sentinel-1 synthetic aperture radar (SAR), based on active remote sensing, differs from passive optical remote sensing: it has stronger penetrating power, is largely unaffected by cloud, rain and fog, and can acquire information such as the surface dielectric constant, the physical characteristics of the soil and the geometry of the target by measuring the target's polarimetric scattering characteristics.
Research shows that SAR parameters such as polarization characteristics and backscattering coefficients are highly correlated with soil properties, and they have been applied successfully to the estimation of soil moisture, soil texture and surface roughness. Combining optical and SAR data in digital soil mapping should also help to improve the predictive power of models. However, previous research has relied too heavily on optical remote sensing data such as multispectral and hyperspectral data, and relatively little research has addressed synthetic aperture radar in soil organic matter mapping.
Prediction models between auxiliary environmental variables and soil properties established by machine learning algorithms are increasingly used for regional spatial prediction of soil properties. Because of their good generalization and predictive ability on high-dimensional and nonlinear problems, machine learning algorithms have great potential in the spatial prediction of soil organic matter. However, machine learning models cannot take the spatial autocorrelation among samples into account when predicting soil properties. Combining machine learning with geostatistical methods for soil organic matter mapping therefore retains the predictive accuracy of machine learning while further improving it by accounting for the spatial autocorrelation of soil properties.
The prior related art is as follows:
In the first related art, the existing mainstream soil organic matter mapping method is the soil-landscape model method, which takes pedogenesis as its theoretical basis. This theory holds that soil is the product of the combined effects of climate, organisms, topography, parent material and time, so a given soil-forming environment necessarily produces soil properties corresponding to that environment. By sampling a limited number of points, a model of the relationship between environmental factors and soil organic matter is established, such as linear regression or a fuzzy inference classification tree, and the model is then applied to the environmental factor database of the study area to predict and map its soil organic matter. Common methods include geostatistical models such as cokriging and regression kriging, and mathematical models such as linear regression, fuzzy mathematics, expert knowledge, linear discriminant analysis and generalized linear models.
In the second related art, methods based on remote sensing data and machine learning use machine learning and spatial data mining techniques, such as artificial neural network models, Bayesian models, decision trees and random forests, to establish the relationship between remote sensing spectral signals (and the environmental elements retrieved from them) and the spatial variation of soil organic matter, and infer the spatial distribution of soil organic matter from that relationship. Machine learning can effectively handle the nonlinearity between soil and environmental factors and is widely applied at present.
Through the above analysis, the problems and defects existing in the prior art are as follows:
(1) The existing soil-landscape model method is only suitable for large-scale soil organic matter mapping under pronounced topographic relief or complex environmental conditions, requires a large number of sample points, and is not suitable for fine mapping of fragmented farmland under long-term human influence. It also fails to account for the complex nonlinear relationship between soil organic matter and environmental factors, which limits its application in areas where soil organic matter may change sharply under complex environmental conditions.
(2) In the prior art, when machine learning algorithms and remote sensing data are used to map soil organic matter, the spatial autocorrelation of soil organic matter is not considered, so the local spatial features of soil organic matter are not described finely enough and the maps show abrupt spatial changes in soil organic matter that are inconsistent with reality.
The difficulty of solving the above problems and defects is as follows: for scattered patches under human influence, the gradients of common environmental elements such as climate, topography, parent material and vegetation are small and their spatial covariation with soil organic matter is weak; soil-forming environmental variable data of medium or low resolution cannot reflect, or reflect only slightly, the spatial variation of the soil, so farmland soil organic matter is difficult to map with high precision at the field scale. Characteristic variables and new methods that can reflect the spatial variation of soil organic matter at the field scale are needed.
The significance of solving the above problems and defects is as follows: the invention can be applied to both gentle and complex terrain, reduces the workload of collecting ground samples, improves the precision of field-scale soil organic matter mapping over large areas, and supports precision crop management and sustainable agricultural development.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides a soil organic matter remote sensing mapping method combining machine learning and geostatistics. By using high-spatial-resolution DEM data, Sentinel-1 radar remote sensing data and Sentinel-2 optical remote sensing data, and by combining a geostatistical method with a machine learning algorithm, the invention makes full use of machine learning to handle the nonlinear relationship between soil organic matter and environmental elements while also accounting for the spatial autocorrelation of soil organic matter, thereby improving mapping precision and achieving fine mapping of soil organic matter at the field scale.
The invention is realized as follows: a soil organic matter remote sensing mapping method combining machine learning and geostatistics comprises the following steps:
Step one, acquiring DEM (digital elevation model), optical and radar remote sensing data on the GEE (Google Earth Engine) remote sensing cloud platform, performing mathematical transformation and processing, and extracting topographic factors, spectral variables, vegetation indices, color indices, brightness indices and texture indices as characteristic variables;
Step two, acquiring soil organic matter sample point data in the study area; extracting the characteristic variable information at the corresponding sampling points, carrying out exploratory data analysis on the data set, and screening out the model input variables;
Step three, dividing the sample set into a training set and a verification set, establishing the corresponding soil organic matter inversion machine learning model, and carrying out preliminary mapping of soil organic matter;
Step four, subtracting the predicted soil organic matter from the measured soil organic matter of the training samples to obtain the residuals; carrying out principal component analysis on the model input variables, and, with the residuals as the primary variable and the resulting principal components as covariates, predicting the spatial distribution map of the soil organic matter residuals over the whole study area by a geostatistical method;
Step five, adding the soil organic matter map predicted by the machine learning model and the residual map obtained by the geostatistical method to obtain the final soil organic matter prediction map, and verifying the prediction result with the verification set data.
Further, in step one, the remote sensing data are acquired on the platform as follows: DEM, Sentinel-1 and Sentinel-2 remote sensing data are obtained on the Google Earth Engine platform;
Further, in step one, the mathematical transformation and processing of the remote sensing data are as follows: mathematical transformations such as band combination and gray level co-occurrence matrix calculation are applied to the Sentinel-2 optical remote sensing data, and 5 groups of characteristic variables are extracted: surface reflectance spectral variables, vegetation index variables, brightness index variables, color index variables and texture index variables; the annual maximum, annual minimum, annual mean and annual root mean square error of the vegetation index, brightness index and color index variables are also calculated;
band combination and gray level co-occurrence matrix calculation are applied to the VV and VH polarization data of the Sentinel-1 radar data, and 2 groups of characteristic variables are extracted: backscatter variables and texture variables;
three topographic factor characteristic variables are extracted from the DEM data: elevation, slope and aspect.
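The optical feature derivation described above can be illustrated with a minimal NumPy sketch. It assumes surface reflectance scaled to 0-1 and uses the standard enhanced vegetation index (EVI) formula; the description names EVI as the vegetation index and states that brightness is computed from the blue, green and red bands, but gives no brightness formula, so the root-mean-square form below is an assumption. Interpreting the "annual root mean square error" as the per-pixel standard deviation over the year is likewise an assumption, and all function names are hypothetical.

```python
import numpy as np

def evi(nir, red, blue):
    """Enhanced vegetation index; inputs are reflectances scaled to 0-1."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def brightness_index(red, green, blue):
    """Assumed brightness form: root mean square of the three visible bands."""
    return np.sqrt((red ** 2 + green ** 2 + blue ** 2) / 3.0)

def annual_statistics(stack):
    """Per-pixel annual statistics of an index time series.

    stack has shape (time, rows, cols); NaN marks cloud-masked observations
    and is ignored. The standard deviation stands in for the annual
    'root mean square error' named in the text (an assumption).
    """
    return {
        "max": np.nanmax(stack, axis=0),
        "min": np.nanmin(stack, axis=0),
        "mean": np.nanmean(stack, axis=0),
        "std": np.nanstd(stack, axis=0),
    }
```

In use, an index such as EVI would be computed for every cloud-free acquisition of the year, stacked along the time axis, and passed to `annual_statistics` to obtain four feature layers per index.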
Further, in step two, the exploratory data analysis (EDA) of the data set specifically comprises: removing outlying sample points, calculating the correlation coefficient matrix, performing significance tests, removing variables that are not significantly correlated with soil organic matter, and removing characteristic variables that are highly correlated with one another.
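The screening logic of this EDA step can be sketched as follows. The text does not specify the outlier rule, the significance test or the collinearity threshold, so the three-standard-deviation rule, the large-sample t test on Pearson's r (critical value 1.96) and the 0.9 inter-feature threshold are all assumptions chosen for illustration; the function names are hypothetical.

```python
import numpy as np

def outlier_mask(y, k=3.0):
    """True for samples within k standard deviations of the mean (assumed rule)."""
    y = np.asarray(y, dtype=float)
    return np.abs(y - y.mean()) <= k * y.std()

def screen_variables(X, y, names, t_crit=1.96, max_inter_corr=0.9):
    """Return names of features that pass both screening criteria.

    1. Pearson correlation with y must be significant (t test, n - 2 d.o.f.,
       large-sample critical value assumed).
    2. Among significant features, a feature is dropped if it is highly
       correlated with an already kept, more strongly correlated feature.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    # t statistic of r; the floor avoids division by zero when |r| is 1
    t = np.abs(r) * np.sqrt((n - 2) / np.maximum(1.0 - r ** 2, 1e-12))
    ranked = [j for j in np.argsort(-np.abs(r)) if t[j] > t_crit]
    kept = []
    for j in ranked:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < max_inter_corr
               for k in kept):
            kept.append(j)
    return [names[j] for j in kept]
```

Ranking by absolute correlation before the collinearity check means that, of two nearly duplicate features, the one more strongly related to soil organic matter is the one retained.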
Further, in step three, the sample set is divided into a training set and a verification set and the corresponding soil organic matter inversion machine learning model is established as follows: the EDA-processed sample set is randomly divided into a training set and a verification set at a ratio of 7:3, and a machine learning method is trained on the training set to obtain the corresponding soil organic matter inversion machine learning model.
Further, the machine learning method in step three includes random forests, artificial neural networks, support vector machines and the like.
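The 7:3 random division can be sketched as below. The fixed seed and the function name are illustrative assumptions; the returned index arrays would then be used to select rows for whichever learner (random forest, neural network or support vector machine) is trained.

```python
import numpy as np

def split_70_30(n_samples, seed=42):
    """Randomly split sample indices into training (70%) and verification (30%)."""
    rng = np.random.default_rng(seed)      # seed fixed only for reproducibility
    order = rng.permutation(n_samples)     # shuffled indices 0..n_samples-1
    n_train = int(round(0.7 * n_samples))
    return order[:n_train], order[n_train:]
```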
Further, in step four, the principal component analysis of the model input variables is as follows: principal component analysis is carried out on the model input parameter set obtained in step two, the proportion of the data set explained by each principal component is analyzed, and the principal components that represent the main information of the parameter set are selected.
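A minimal sketch of this component selection, using a singular value decomposition of the standardized input variables, is given below. The text does not state how many components are kept, so the cumulative explained-variance threshold of 0.85 is an assumed default, and the function name is hypothetical. Columns are assumed to be non-constant so that standardization is defined.

```python
import numpy as np

def principal_components(X, variance_to_keep=0.85):
    """Return PCA scores and explained-variance ratios of the kept components.

    X has shape (samples, variables). Components are kept until their
    cumulative explained variance reaches variance_to_keep (assumed threshold).
    """
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each variable
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)              # variance ratio per component
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(S))
    scores = Xs @ Vt[:k].T                           # project onto kept components
    return scores, explained[:k]
```

The returned scores, evaluated at sample points and over the prediction grid, are what the method uses as covariates in the subsequent interpolation of residuals.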
Further, in step four, the spatial distribution map of the soil organic matter residuals over the whole study area is predicted by the geostatistical method as follows: with the residuals as the primary variable and the principal components of the model parameter set as covariates, cokriging interpolation is carried out to predict the spatial distribution map of the soil organic matter residuals over the whole study area.
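Full cokriging requires fitted direct and cross semivariograms for the residuals and each covariate, which is beyond a short example. As a simplified stand-in, the sketch below interpolates the residuals with ordinary kriging only, that is, without the principal-component covariates. The exponential semivariogram and its sill, range and nugget values are assumptions rather than fitted parameters, and the function names are hypothetical.

```python
import numpy as np

def exponential_variogram(h, sill=1.0, range_=500.0, nugget=0.0):
    """Exponential semivariogram gamma(h); gamma(0) is 0 by definition."""
    g = nugget + sill * (1.0 - np.exp(-3.0 * h / range_))
    return np.where(h > 0, g, 0.0)

def ordinary_kriging(xy, values, targets, variogram=exponential_variogram):
    """Ordinary kriging of `values` observed at `xy` onto `targets`.

    xy: (n, 2) sample coordinates; values: (n,) residuals;
    targets: (m, 2) prediction coordinates. Returns (m,) predictions.
    """
    n = len(values)
    # Left-hand side: semivariances between samples plus the unbiasedness row
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0
    # Right-hand side: semivariances between samples and each target
    d0 = np.linalg.norm(targets[:, None, :] - xy[None, :, :], axis=2)
    B = np.ones((n + 1, len(targets)))
    B[:n, :] = variogram(d0).T
    weights = np.linalg.solve(A, B)      # rows 0..n-1 are kriging weights
    return weights[:n].T @ values
```

With a zero nugget this interpolator reproduces the residual exactly at each sample location, which is the property that lets the residual map correct the machine learning map at the sampled points.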
Further, in step five, the prediction result is verified as follows: the measured soil organic matter values in the verification set are compared with the soil organic matter predicted by the machine learning model alone and by the combined machine learning and geostatistical model, and the prediction accuracy of the models is evaluated with the coefficient of determination, the root mean square error and the mean absolute error.
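The three accuracy measures can be computed as in the following sketch. The coefficient of determination is taken here as 1 minus the ratio of residual to total sum of squares; the text does not say which definition is used, so that choice is an assumption, as is the function name.

```python
import numpy as np

def evaluate(measured, predicted):
    """Return R^2, RMSE and MAE for verification-set predictions."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - measured
    ss_res = np.sum(err ** 2)                              # residual sum of squares
    ss_tot = np.sum((measured - measured.mean()) ** 2)     # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
    }
```

Calling `evaluate` once with the machine learning predictions and once with the combined predictions on the same verification samples gives a like-for-like comparison of the two models.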
It is another object of the present invention to provide a storage medium for receiving user input, wherein the stored computer program causes an electronic device to execute the following steps:
Step one, acquiring DEM (digital elevation model), optical and radar remote sensing data on the GEE (Google Earth Engine) remote sensing cloud platform, performing mathematical transformation and processing, and extracting topographic factors, spectral variables, vegetation indices, color indices, brightness indices and texture indices as characteristic variables;
Step two, acquiring soil organic matter sample point data in the study area; extracting the characteristic variable information at the corresponding sampling points, carrying out exploratory data analysis on the data set, and screening out the model input variables;
Step three, dividing the sample set into a training set and a verification set, establishing the corresponding soil organic matter inversion machine learning model, and carrying out preliminary mapping of soil organic matter;
Step four, subtracting the predicted soil organic matter from the measured soil organic matter of the training samples to obtain the residuals; carrying out principal component analysis on the model input variables, and, with the residuals as the primary variable and the resulting principal components as covariates, predicting the spatial distribution map of the soil organic matter residuals over the whole study area by a geostatistical method;
Step five, adding the soil organic matter map predicted by the machine learning model and the residual map obtained by the geostatistical method to obtain the final soil organic matter prediction map, and verifying the prediction result with the verification set data.
In combination with all the above technical schemes, the advantages and positive effects of the invention are as follows: the method comprehensively considers both the spatial autocorrelation of soil organic matter and its nonlinear relationship with environmental factors, thereby improving the prediction accuracy of soil organic matter; at the same time, by using high-spatial-resolution satellite remote sensing data and products such as terrain, synthetic aperture radar and multispectral data to map soil organic matter at the scale of fragmented plots, it reduces the workload of ground sample collection, improves prediction precision and facilitates precision soil management.
Compared with the prior art, the invention also has the following characteristics: it comprehensively considers the spatial autocorrelation of soil organic matter and its nonlinear relationship with environmental factors, and combines the advantages of machine learning and geostatistics by using the residuals predicted by geostatistics, with the principal components of the model parameters as auxiliary information, to compensate the preliminary machine learning prediction, which resolves the abrupt spatial changes in machine learning predictions of soil organic matter and improves on the prediction precision of a single model;
it couples high-spatial-resolution satellite remote sensing data and products such as terrain, synthetic aperture radar and multispectral data, mines characteristic variables that characterize the spatial variation of soil organic matter, improves the spatial explanatory power for soil organic matter, and achieves fine field-scale mapping of soil organic matter that is applicable to both gentle and undulating terrain.
Drawings
Fig. 1 is a flowchart of the soil organic matter remote sensing mapping method combining machine learning and geostatistics according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the field-scale soil organic matter remote sensing mapping process provided by an embodiment of the invention.
Fig. 3 is a semivariogram with the residuals as the primary variable and the principal components of the characteristic parameters as covariates according to an embodiment of the present invention.
Fig. 4 is the spatial distribution map of soil organic matter predicted by the random forest method for the Large River Bay farm in Inner Mongolia.
Fig. 5 is the spatial distribution map of soil organic matter predicted by the random forest method combined with ordinary kriging for the Large River Bay farm in Inner Mongolia.
Fig. 6 is the spatial distribution map of soil organic matter predicted by the random forest method combined with cokriging for the Large River Bay farm in Inner Mongolia.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the invention is described in further detail below with reference to the following examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Aiming at the problems existing in the prior art, the invention provides a soil organic matter remote sensing mapping method combining machine learning and geostatistics, which is described in detail below with reference to the accompanying drawings.
Those skilled in the art may also implement the soil organic matter remote sensing mapping method combining machine learning and geostatistics provided by the invention with other steps; the method shown in fig. 1 is only one specific embodiment.
As shown in fig. 1, the soil organic matter remote sensing mapping method combining machine learning and geostatistics provided by the embodiment of the invention comprises the following steps:
S101: acquiring remote sensing data on the remote sensing cloud platform, carrying out mathematical transformation and processing on the DEM and remote sensing data, and extracting the model characteristic variables.
S102: acquiring soil organic matter sample point data in the study area; extracting the remote sensing characteristic variable information corresponding to the sampling points, carrying out exploratory data analysis on the data set, and screening out the model input variables.
S103: dividing the sample set into a training set and a verification set, establishing the corresponding soil organic matter inversion machine learning model, and carrying out preliminary mapping of soil organic matter.
S104: subtracting the predicted soil organic matter from the measured soil organic matter of the training samples to obtain the residuals; carrying out principal component analysis on the model input variables, and, with the residuals as the primary variable and the resulting principal components as covariates, predicting the spatial distribution map of the soil organic matter residuals over the whole study area by a geostatistical method.
S105: adding the soil organic matter map predicted by the machine learning model and the residual map obtained by the geostatistical method to obtain the final soil organic matter prediction map, and verifying the prediction result with the verification set data.
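The residual correction in S104 and S105 reduces to two array operations, sketched below with hypothetical function names. The point of the sketch is the sign convention: because the residual is defined as measured minus predicted, adding the interpolated residual map to the machine learning map moves the prediction toward the measurements.

```python
import numpy as np

def training_residuals(measured, predicted):
    """Residuals at training points: measured minus machine learning prediction."""
    return np.asarray(measured, dtype=float) - np.asarray(predicted, dtype=float)

def final_map(ml_map, residual_map):
    """Final prediction: machine learning map plus interpolated residual map.

    Both inputs are arrays on the same grid (same shape and georeferencing,
    which is assumed here and not checked beyond NumPy broadcasting).
    """
    return np.asarray(ml_map, dtype=float) + np.asarray(residual_map, dtype=float)
```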
In S101 provided by the embodiment of the present invention, the specific process of acquiring remote sensing data on the platform is: DEM, Sentinel-1 and Sentinel-2 remote sensing data are obtained on the Google Earth Engine platform. The DEM has an original spatial resolution of 30 meters and is resampled to 10 meters. The Sentinel-1 data comprise all images covering the research area in the study year; they contain the backscattering coefficients for the two polarization modes VV and VH, have been preprocessed by orbit correction, GRD border noise removal, thermal noise removal, radiometric calibration, orthorectification and the like, and have a spatial resolution of 10 meters. The Sentinel-2 data comprise all images covering the research area in the study year; they have been preprocessed by atmospheric correction and the like, cloud screening is applied to each image so that only cloud-free data are retained, and the reflectance data of all bands are resampled to 10 meters.
In S101 provided by the embodiment of the present invention, the mathematical transformation and processing of the DEM and remote sensing data are specifically:
Band combination and gray level co-occurrence matrix calculation are carried out on the Sentinel-2 optical remote sensing data, and 5 groups of characteristic variables are extracted: (1) the surface reflectance spectral variables, comprising the reflectance of 10 bands: blue, green, red, red edge 1, red edge 2, red edge 3, red edge 4, near infrared, short wave infrared 1 and short wave infrared 2; (2) the vegetation index variable, namely the enhanced vegetation index (EVI) calculated from the band reflectances, which reflects the growth condition of plants; (3) the brightness index variable, an index representing ground brightness, calculated from the reflectances of the blue, green and red bands; (4) the color index variables, comprising a color index and a redness index, which represent the ground color; (5) the texture index variables, obtained by calculating the gray level co-occurrence matrix of the enhanced vegetation index with a window size of 11×11, which generates 10 texture variables reflecting the roughness of the land surface, the structural information of ground features in the image, and the relationship between ground features and their surroundings;
The VV and VH polarization data of the Sentinel-1 radar data are combined and processed to obtain three groups of characteristic variables: (1) the polarized backscattering coefficient variables, comprising the VV and VH polarizations; (2) the index variables, obtained by subtracting and dividing the VV and VH polarizations; (3) the texture variables, obtained by calculating the gray level co-occurrence matrix of each of the two polarizations with a window size of 11×11, each yielding 10 texture variables.
Three topographic factor characteristic variables are calculated and extracted from the DEM data: elevation, slope, and aspect.
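A few of the band-combination variables described above can be sketched in Python with NumPy. The EVI expression uses the standard coefficients; the patent does not state the formulas for the brightness and color indices, so the forms below (root mean square of the visible bands, and the normalized red-green difference) are assumptions chosen for illustration, as are all function names.

```python
import numpy as np

def evi(nir, red, blue):
    """Enhanced vegetation index with the standard coefficients (reflectance in 0-1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def brightness_index(red, green, blue):
    """Assumed form: root mean square of the three visible reflectances."""
    return np.sqrt((red**2 + green**2 + blue**2) / 3.0)

def color_index(red, green):
    """Assumed form: normalized difference of red and green reflectance."""
    return (red - green) / (red + green)

def radar_indices(vv, vh):
    """Difference and ratio of the VV and VH backscatter values."""
    return vv - vh, vv / vh
```

All functions operate element-wise, so they accept single values or whole image arrays of matching shape. The gray level co-occurrence matrix textures are not covered by this sketch.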
In S102 provided by the embodiment of the present invention, the exploratory data analysis (EDA) performed on the data set specifically includes: the soil organic matter values measured on the ground are analysed and, based on the mean and the root mean square error, samples deviating strongly from the mean are removed as outliers; the values of the characteristic variables at the ground sampling points are extracted to form a training sample set, its correlation coefficient matrix is calculated, and a significance test is performed. Variables that are not significantly correlated with soil organic matter are removed first; the correlation coefficient matrix is then analysed to find highly intercorrelated variables within the characteristic variable set, and among these the variables less correlated with soil organic matter are removed, giving the characteristic variable set used as model input.
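The screening logic of the exploratory data analysis can be sketched as follows. The thresholds are not given in the source, so k (outlier cut-off in standard deviations), t_crit (a large-sample critical value for the correlation t statistic) and r_pair (the inter-variable correlation regarded as "high") are hypothetical values, and the use of the sample standard deviation as the spread measure is an assumption.

```python
import numpy as np

def remove_outliers(som, k=3.0):
    """Mask keeping samples within k standard deviations of the mean (k is assumed)."""
    return np.abs(som - som.mean()) <= k * som.std(ddof=1)

def screen_features(X, som, names, t_crit=1.96, r_pair=0.9):
    """Two-stage correlation screen; returns the names of the retained features."""
    n = len(som)
    r_y = np.array([np.corrcoef(X[:, j], som)[0, 1] for j in range(X.shape[1])])
    # t statistic of a Pearson correlation; the clip avoids division by zero at |r| = 1.
    t = np.abs(r_y) * np.sqrt((n - 2) / np.maximum(1.0 - r_y**2, 1e-12))
    # Stage 1: keep only variables significantly correlated with soil organic matter,
    # ordered from the strongest to the weakest correlation.
    order = [j for j in np.argsort(-np.abs(r_y)) if t[j] > t_crit]
    # Stage 2: within each highly intercorrelated group, keep the variable that is
    # most strongly correlated with soil organic matter and drop the others.
    r_xx = np.atleast_2d(np.corrcoef(X, rowvar=False))
    keep = []
    for j in order:
        if all(abs(r_xx[j, i]) < r_pair for i in keep):
            keep.append(j)
    return [names[j] for j in sorted(keep)]
```

The greedy pass in stage 2 is one simple way to realise "remove the less related variable of a highly correlated pair"; other tie-breaking rules would be equally consistent with the description.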
In S103 provided by the embodiment of the present invention, the sample set is divided into a training set and a verification set, and the specific process of establishing the corresponding soil organic matter inversion machine learning model is as follows:
The sample set after exploratory data analysis is randomly divided into a training set and a verification set at a ratio of 7:3; a random forest model is trained on the training set to obtain the corresponding soil organic matter inversion model, and the inversion model is applied to the selected characteristic variable set to produce a preliminary soil organic matter map of the whole research area.
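A minimal sketch of this step with scikit-learn is shown below. The 7:3 split and the random forest come from the description; the number of trees, the random seed and the function names are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def fit_som_model(X, som, seed=0):
    """Random 7:3 split followed by random forest training (hyperparameters assumed)."""
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, som, test_size=0.3, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_tr, y_tr)
    return model, (X_tr, y_tr), (X_va, y_va)

def map_som(model, feature_stack):
    """Apply the model to a (rows, cols, n_features) raster stack, returning (rows, cols)."""
    rows, cols, n_features = feature_stack.shape
    flat = feature_stack.reshape(-1, n_features)
    return model.predict(flat).reshape(rows, cols)
```

The verification subset returned by fit_som_model is held back untouched so that it can later be used to evaluate both the machine learning map and the residual-corrected map.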
In S104 provided by the embodiment of the present invention, the specific process of predicting the spatial distribution map of the residual soil organic matter in the whole research area is:
First, principal component analysis is carried out on the characteristic variable set selected in S102, the explanatory power of each principal component for the data set is analysed, and the first 5 principal components (cumulative explanatory power greater than 80%) are selected as covariates for the geostatistical analysis. Then the predicted soil organic matter value is subtracted from the measured value of each training sample; with these residuals between the measured values and the machine learning predictions as the primary variable, and the selected principal components of the characteristic variable set as covariates, the spatial distribution map of the soil organic matter residuals over the whole research area is predicted.
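The selection of principal components by explained variance can be sketched with NumPy as follows. Standardizing the variables before the decomposition and the function name are assumptions; the 80% threshold follows the description.

```python
import numpy as np

def principal_components(X, min_explained=0.80):
    """Return scores of the leading components whose cumulative explained
    variance first reaches min_explained, plus their explained-variance ratios."""
    # Standardize so that variables with different units contribute equally (assumed).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), min_explained)) + 1
    k = min(k, len(s))
    scores = Z @ Vt[:k].T
    return scores, explained[:k]
```

The returned scores would serve as the covariates of the geostatistical step, evaluated both at the sampling points and on the prediction grid.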
In S105 provided by the embodiment of the present invention, the prediction result is verified as follows: the 30% of samples randomly set aside from the total sample are used as verification data to verify the soil organic matter predicted by the random forest, by the random forest combined with ordinary kriging, and by the random forest combined with co-kriging. Model accuracy is evaluated with the coefficient of determination (R²), the mean absolute error (MAE) and the root mean square error (RMSE); the results are shown in table 1. Both methods combining machine learning with geostatistics are more accurate than the machine learning algorithm alone, and the random forest combined with co-kriging is the most accurate: relative to the random forest alone, R² rises from 0.50 to 0.55, and MAE and RMSE fall by 7.25% and 9.45% respectively. This shows that combining machine learning with geostatistics reduces the error of soil organic matter prediction and improves prediction accuracy.
The soil organic matter spatial distribution maps predicted by the three methods (fig. 3) show that, because the geostatistical method compensates for the residuals of the machine learning predictions at the soil samples, the soil organic matter predicted by machine learning combined with geostatistics is spatially more continuous, whereas the predictions of the machine learning method alone show relatively larger abrupt spatial changes.
Table 1. Comparison of the soil organic matter prediction accuracy of the three methods

Model                                          R²     MAE (g/kg)   RMSE (g/kg)
Random forest                                  0.50   4.00         4.87
Random forest combined with ordinary kriging   0.53   3.83         4.59
Random forest combined with co-kriging         0.55   3.71         4.41
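The three accuracy measures and the relative reductions quoted for table 1 can be checked with a short pure-Python sketch. The source does not say which definition of R² was used; the form below (one minus the ratio of residual to total sum of squares) is an assumption, and the function names are illustrative.

```python
import math

def mae(measured, predicted):
    """Mean absolute error."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)

def rmse(measured, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
    )

def r2(measured, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def pct_reduction(before, after):
    """Relative reduction in percent."""
    return 100.0 * (before - after) / before

# Values from table 1: random forest alone versus random forest with co-kriging.
mae_drop = round(pct_reduction(4.00, 3.71), 2)
rmse_drop = round(pct_reduction(4.87, 4.41), 2)
```

Applied to the table values, mae_drop and rmse_drop evaluate to 7.25 and 9.45, which matches the percentages stated in the description.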
The machine learning model is not limited to the random forest, artificial neural network and support vector machine described in the invention; deep learning and other machine learning methods not described here are also applicable to this patent. Likewise, the geostatistical method includes not only ordinary kriging and co-kriging but also interpolation methods such as regression kriging, inverse distance weighting (IDW), spline functions, universal kriging and drift kriging.
It should be noted that the embodiments of the present invention can be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those of ordinary skill in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or embodied in processor control code, for example provided on a carrier medium such as a magnetic disk, CD or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The device of the present invention and its modules may be implemented by hardware circuitry such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices; by software executed by various types of processors; or by a combination of such hardware circuitry and software, such as firmware.
The foregoing is merely illustrative of specific embodiments of the present invention, and the scope of protection of the invention is not limited thereto; any modifications, equivalent substitutions, improvements and alternatives made by those skilled in the art within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A soil organic matter remote sensing mapping method combining machine learning and ground statistics, characterized by comprising the following steps:
Step one, obtaining DEM, optical and radar remote sensing data on the GEE remote sensing cloud platform, carrying out mathematical transformation and processing, and extracting topographic factors, spectral variables, vegetation indices, color indices, brightness indices and texture indices as characteristic variables;
Step two, acquiring soil organic matter sample point data in the research area; extracting the characteristic variable information of the corresponding sampling points, carrying out exploratory data analysis on the data set, and screening out the model input variables;
Step three, dividing the sample set into a training set and a verification set, establishing the corresponding soil organic matter inversion machine learning model, and producing a preliminary soil organic matter map;
Step four, subtracting the predicted soil organic matter value from the measured soil organic matter value of each training sample to obtain the residual; carrying out principal component analysis on the model input variables, taking the residuals as the primary variable and the resulting principal components of the characteristic variables as covariates, and predicting the spatial distribution map of the soil organic matter residuals over the whole research area using a geostatistical method;
Step five, adding the soil organic matter map predicted by the machine learning model to the residual map obtained by the geostatistical method to obtain the final soil organic matter prediction map; and at the same time verifying the prediction result using the verification set data.
2. The method for remote sensing mapping of soil organic matter by combining machine learning and ground statistics as set forth in claim 1, wherein in step one, the specific process of obtaining remote sensing data on the platform is: obtaining DEM, Sentinel-1 and Sentinel-2 remote sensing data on the Google Earth Engine platform.
3. The method for remote sensing mapping of soil organic matter by combining machine learning and ground statistics as claimed in claim 1, wherein the mathematical transformation and processing of the remote sensing data in step one is as follows: carrying out mathematical transformations such as band combination and gray level co-occurrence matrix calculation on the Sentinel-2 optical remote sensing data, and extracting 5 groups of characteristic variables: a surface reflectance spectral variable, a vegetation index variable, a brightness index variable, a color index variable and a texture index variable; and calculating the annual maximum, annual minimum, annual mean and annual root mean square error of the vegetation index variable, the brightness index variable and the color index variable;
carrying out combination calculation and gray level co-occurrence matrix calculation on the VV and VH polarization data of the Sentinel-1 radar data, and extracting 2 groups of characteristic variables: a backscatter variable and a texture variable; and extracting three topographic factor characteristic variables from the DEM data: elevation, slope, and aspect.
4. The method for remote sensing mapping of soil organic matter by combining machine learning and ground statistics according to claim 1, wherein the exploratory data analysis (EDA) carried out on the data set in step two is: removing outlier samples, calculating the correlation coefficient matrix, performing a significance test, removing variables not significantly correlated with soil organic matter, and removing highly intercorrelated variables among the characteristic variables.
5. The method for remote sensing mapping of soil organic matter by combining machine learning and ground statistics according to claim 1, wherein in step three, the process of dividing the sample set and establishing the corresponding soil organic matter inversion machine learning model is as follows: randomly dividing the EDA-processed sample set into a training set and a verification set at a ratio of 7:3, training on the training set with a machine learning method to obtain the corresponding soil organic matter inversion machine learning model, and producing the preliminary soil organic matter map.
6. The method for remote sensing mapping of soil organic matter by combining machine learning and ground statistics according to claim 4, wherein the machine learning method in step three comprises: random forest, artificial neural network, and support vector machine.
7. The method for remote sensing mapping of soil organic matter by combining machine learning and ground statistics as set forth in claim 1, wherein the process of performing principal component analysis on the model input variables in step four is as follows: principal component analysis is carried out on the model input parameter set obtained in step two, the explanatory power of each principal component for the data set is analysed, and the principal components that represent the main information of the parameter set are selected.
8. The method for remote sensing mapping of soil organic matter by combining machine learning and ground statistics according to claim 1, wherein the process of predicting the spatial distribution map of the soil organic matter residuals over the whole research area using the geostatistical method in step four is as follows: taking the residuals between the measured and predicted values as the primary variable and the principal components of the characteristic variable set as covariates, carrying out co-kriging interpolation, and predicting the soil organic matter residual spatial distribution map of the whole research area.
9. The method for remote sensing mapping of soil organic matter by combining machine learning and ground statistics as claimed in claim 1, wherein the verification of the prediction result in step five is as follows: comparing the measured soil organic matter values of the verification set with the soil organic matter predicted by the machine learning model and by the model combining machine learning and geostatistics, and evaluating the model prediction accuracy using the coefficient of determination, the root mean square error and the mean absolute error.
10. A program storage medium receiving user input, wherein the stored computer program causes an electronic device to perform the method of any one of claims 1-9, comprising the following steps:
Step one, acquiring remote sensing data on a remote sensing cloud platform, carrying out mathematical transformation and processing on the DEM and the remote sensing data, and extracting the model characteristic variables;
Step two, acquiring soil organic matter sample point data in the research area; extracting the remote sensing characteristic variable information corresponding to the sampling points, carrying out exploratory data analysis on the data set, and screening out the model input variables;
Step three, dividing the sample set into a training set and a verification set, establishing the corresponding soil organic matter inversion machine learning model, and producing a preliminary soil organic matter map;
Step four, subtracting the predicted soil organic matter value from the measured soil organic matter value of each training sample to obtain the residual; carrying out principal component analysis on the model input variables, taking the residuals as the primary variable and the resulting principal components of the characteristic variables as covariates, and predicting the spatial distribution map of the soil organic matter residuals over the whole research area using a geostatistical method;
Step five, adding the soil organic matter map predicted by the machine learning model to the residual map obtained by the geostatistical method to obtain the final soil organic matter prediction map; and at the same time verifying the prediction result using the verification set data.
CN202111437930.XA 2021-11-29 2021-11-29 Soil organic matter remote sensing mapping method combining machine learning and ground statistics Active CN114140591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111437930.XA CN114140591B (en) 2021-11-29 2021-11-29 Soil organic matter remote sensing mapping method combining machine learning and ground statistics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111437930.XA CN114140591B (en) 2021-11-29 2021-11-29 Soil organic matter remote sensing mapping method combining machine learning and ground statistics

Publications (2)

Publication Number Publication Date
CN114140591A CN114140591A (en) 2022-03-04
CN114140591B true CN114140591B (en) 2024-04-26

Family

ID=80389488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111437930.XA Active CN114140591B (en) 2021-11-29 2021-11-29 Soil organic matter remote sensing mapping method combining machine learning and ground statistics

Country Status (1)

Country Link
CN (1) CN114140591B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114693915A (en) * 2022-03-29 2022-07-01 华中农业大学 Soil organic matter content prediction method based on smart phone image
CN116049768B (en) * 2023-04-03 2023-07-11 中国科学院空天信息创新研究院 Active and passive microwave soil moisture fusion algorithm based on generalized linear regression model
CN117290910A (en) * 2023-09-13 2023-12-26 爬山虎科技股份有限公司 Automatic soil map compiling method based on spatial interpolation algorithm

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107860889A (en) * 2017-09-22 2018-03-30 华南农业大学 The Forecasting Methodology and equipment of the soil organism
CN110046415A (en) * 2019-04-08 2019-07-23 中国科学院南京地理与湖泊研究所 A kind of soil organic matter content remote sensing dynamic playback method of space-time fining
WO2021226976A1 (en) * 2020-05-15 2021-11-18 安徽中科智能感知产业技术研究院有限责任公司 Soil available nutrient inversion method based on deep neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2019398104A1 (en) * 2018-12-11 2021-07-08 Climate Llc Mapping soil properties with satellite data using machine learning approaches

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
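Inversion method for topsoil organic matter content based on Zhuhai-1 hyperspectral remote sensing; Sun Haoran; Zhao Zhigen; Zhao Jiaxing; Chen Weiwei; Remote Sensing Information; 2020-08-20 (04); full text *
Digital mapping of soil properties in arid areas integrating soil-environment relationships and machine learning; Zhang Zhenhua; Ding Jianli; Wang Jingzhe; Ge Xiangyu; Wang Jinjie; Tian Meiling; Zhao Qidong; Scientia Agricultura Sinica; 2020-02-01 (No. 03); full text *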

Also Published As

Publication number Publication date
CN114140591A (en) 2022-03-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant