CN114140591B - Soil organic matter remote sensing mapping method combining machine learning and geostatistics
- Publication number
- CN114140591B
- Authority
- CN
- China
- Prior art keywords
- soil organic
- organic matter
- remote sensing
- machine learning
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/05—Geographic models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
- G06T2207/10044—Radar image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30181—Earth observation
- G06T2207/30188—Vegetation; Agriculture
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/10—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
Abstract
The invention belongs to the technical field of soil remote sensing mapping and precision agriculture, and discloses a soil organic matter remote sensing mapping method combining machine learning and geostatistics. DEM, optical, and radar remote sensing data are acquired on the GEE remote sensing cloud platform, and feature variables are derived through mathematical transformation and processing. The feature variable values at the sampling points are extracted, and exploratory data analysis and screening of the sample parameter set yield the model input variables. The sample set is divided, a machine learning model for soil organic matter inversion is established, and a preliminary map is produced. The residuals between the measured and predicted soil organic matter are then calculated; with the residuals as the primary variable and the principal components of the feature variables as covariates, a geostatistical method predicts the spatial distribution map of the soil organic matter residuals. This residual map is added to the soil organic matter map predicted by the machine learning model to obtain the final soil organic matter prediction map, and the result is verified. The invention enables fine and accurate mapping of soil organic matter.
Description
Technical Field
The invention belongs to the technical field of soil remote sensing mapping and precision agriculture, and particularly relates to a soil organic matter remote sensing mapping method combining machine learning and geostatistics.
Background
At present, soil organic matter is an important index for evaluating soil quality, a decisive factor affecting soil fertility and crop yield, and significant for soil nutrient cycling and sustainable agricultural development. High-resolution, high-precision maps of soil organic matter are therefore needed for precision agricultural management.
The factors influencing farmland soil organic matter are complex and closely related to topography, climate, parent material, vegetation, human activities, and the like. Classical soil-landscape theory estimates the spatial variation of soil organic matter, which is difficult to observe continuously in space, by relating it to soil-forming environmental factors. Spatial mapping methods represented by geostatistics (such as kriging, regression kriging, and co-kriging) are the main means of mapping soil organic matter over large areas because they are simple, interpolate well, and account for spatial autocorrelation. However, for small, scattered cultivated land parcels, the environmental gradients of climate, topography, parent material, vegetation, and the like are small and their spatial covariation with soil organic matter is weak, and medium- and low-resolution soil-forming environmental variable data reflect little or none of the soil's spatial variation, so high-precision mapping of soil organic matter at the field scale is difficult. Meanwhile, geostatistical methods fail to consider the complex nonlinear relationship between soil properties and environmental factors, which limits their application in areas, such as those with complex terrain and landforms, where soil organic matter may change sharply.
The development of satellite remote sensing technology provides a stable data source for acquiring global land surface environmental parameters. Satellite remote sensing imagery is spatially continuous, timely, high in resolution, and easy to acquire. The surface spectral information it contains can be used to invert the properties of ground objects and to classify them, and the many remote sensing indices derived from it can express vegetation growth conditions quantitatively or qualitatively, so remote sensing information is increasingly applied to soil mapping. The Sentinel-2A and Sentinel-2B satellites, launched by the European Space Agency, carry an optical multispectral imager that observes the ground in 13 bands across the visible, near-infrared, and short-wave infrared with a 5-day revisit period; the spatial resolution reaches 10 meters in the visible and near-infrared bands and 20 meters in the short-wave infrared. Compared with the Landsat satellites of the United States, the Sentinel-2 system has richer spectral bands, a shorter revisit period, and higher spatial resolution, so Sentinel-2 imagery performs better in field-scale monitoring of land use change, ecological environment evolution, and crop growth. The Sentinel-1 synthetic aperture radar (SAR), which is based on active remote sensing, differs from passive optical remote sensing: it has stronger penetrating power, effectively avoids the influence of cloud, rain, and fog, and can obtain information such as the surface dielectric constant, soil physical characteristics, and the geometry of a target by measuring the target's polarimetric scattering characteristics.
Research shows that SAR parameters such as polarization characteristics and backscattering coefficients correlate strongly with soil properties and have been applied successfully to estimating soil moisture, soil texture, and surface roughness. Combining optical and SAR data in digital soil mapping should also help improve model predictive power. However, past research has relied too heavily on optical remote sensing data such as multispectral and hyperspectral imagery, and relatively few studies have used synthetic aperture radar for soil organic matter mapping.
Prediction models between auxiliary environmental variables and soil properties built with machine learning algorithms are increasingly used for regional spatial prediction of soil properties. Because machine learning algorithms generalize and predict well on high-dimensional and nonlinear problems, they have great potential in spatial prediction of soil organic matter. However, a machine learning model cannot account for the spatial autocorrelation among samples when predicting soil properties. Combining machine learning with geostatistical methods for soil organic matter mapping can therefore retain the accuracy of machine learning while further improving it by accounting for the spatial autocorrelation of soil properties.
The prior related art is as follows:
In the first related art, the existing mainstream soil organic matter mapping method is the soil-landscape model method, which takes pedogenesis as its theoretical basis. The theory holds that soil is the product of the combined effects of climate, organisms, topography, parent material, and time, so a given soil-forming environment necessarily produces the corresponding soil properties. By sampling a finite number of points, a relationship model between environmental factors and soil organic matter is established, such as linear regression, fuzzy inference, or a classification tree, and the model is then applied to the environmental factor database of the research area to predict and map its soil organic matter. Common methods include geostatistical models such as co-kriging and regression kriging, and mathematical models such as linear regression, fuzzy mathematics, expert knowledge, linear discriminant analysis, and generalized linear models.
In the second related art, methods based on remote sensing data and machine learning use machine learning and spatial data mining techniques, such as artificial neural networks, Bayesian models, decision trees, and random forests, to establish the relationship between remote sensing spectral signals (and the environmental elements inverted from them) and the spatial variation of soil organic matter, and infer the spatial distribution of soil organic matter from that relationship. Machine learning methods can effectively handle the nonlinearity between soil and environmental factors and are the most widely applied methods to date.
Through the above analysis, the problems and defects existing in the prior art are as follows:
(1) The existing soil-landscape model method is suitable only for large-area soil organic matter mapping under pronounced relief or complex environmental conditions, requires a large number of sample points, and is not suitable for fine mapping of fragmented farmland under long-term human influence. It also fails to consider the complex nonlinear relationship between soil organic matter and environmental factors, which limits its application in areas where complex environmental conditions may cause soil organic matter to change sharply.
(2) In the prior art, when soil organic matter is mapped with machine learning algorithms and remote sensing data, its spatial autocorrelation is not considered, so local spatial features are not described finely enough and the maps show abrupt spatial changes in soil organic matter that are inconsistent with reality.
The difficulty of solving the above problems and defects is as follows: for scattered parcels under human influence, the environmental gradients of common factors such as climate, topography, parent material, and vegetation are small, their spatial covariation with soil organic matter is weak, and medium- and low-resolution soil-forming environmental variable data reflect little or none of the soil's spatial variation, so high-precision mapping of farmland soil organic matter at the field scale is difficult. Feature variables and new methods that reflect the spatial variation of soil organic matter at the field scale are needed.
The significance of solving the above problems and defects is as follows: the invention can be applied to both gentle and complex terrain, reduces the workload of collecting ground samples, improves the accuracy of field-scale soil organic matter mapping over large areas, and supports precision crop management and sustainable agricultural development.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a soil organic matter remote sensing mapping method combining machine learning and geostatistics. Using a high-spatial-resolution DEM, Sentinel-1 radar remote sensing data, and Sentinel-2 optical remote sensing data, and combining a geostatistical method with a machine learning algorithm, the invention uses machine learning to handle the nonlinear relationship between soil organic matter and environmental factors while accounting for the spatial autocorrelation of soil organic matter, thereby improving mapping accuracy and achieving fine mapping of soil organic matter at the field scale.
The invention is realized as follows. A soil organic matter remote sensing mapping method combining machine learning and geostatistics comprises the following steps:
Step one, acquiring DEM (digital elevation model), optical, and radar remote sensing data on the GEE (Google Earth Engine) remote sensing cloud platform, performing mathematical transformation and processing, and extracting terrain factors, spectral variables, vegetation indices, color indices, brightness indices, and texture indices as feature variables;
Step two, acquiring soil organic matter sample point data in the research area; extracting the feature variable values at the corresponding sampling points, carrying out exploratory data analysis on the data set, and screening out the model input variables;
Step three, dividing the sample set into a training set and a verification set, establishing the corresponding soil organic matter inversion machine learning model, and producing a preliminary soil organic matter map;
Step four, subtracting the predicted soil organic matter from the measured soil organic matter of the training set to obtain the residuals; carrying out principal component analysis on the model input variables, taking the residuals as the primary variable and the resulting principal components as covariates, and predicting the spatial distribution map of the soil organic matter residuals over the whole research area by a geostatistical method;
Step five, adding the soil organic matter map predicted by the machine learning model to the residual map obtained by the geostatistical method to obtain the final soil organic matter prediction map; and verifying the prediction result with the verification set data.
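The last two steps amount to a residual-correction scheme, which can be written compactly. The notation below is introduced here for illustration only and does not appear in the source: s denotes a location, x(s) the model input variables at that location, y the measured soil organic matter, f-hat the trained machine learning model, and r-hat the geostatistical prediction of the residual.

```latex
\begin{aligned}
r(s_i) &= y(s_i) - \hat{f}\big(\mathbf{x}(s_i)\big), \qquad i = 1, \dots, n \quad \text{(training samples)} \\
\hat{y}(s_0) &= \hat{f}\big(\mathbf{x}(s_0)\big) + \hat{r}(s_0) \quad \text{(any location } s_0 \text{ in the research area)}
\end{aligned}
```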
Further, in step one, the remote sensing data are obtained on the platform as follows: DEM, Sentinel-1, and Sentinel-2 remote sensing data are obtained on the Google Earth Engine platform;
Further, in step one, the mathematical transformation and processing of the remote sensing data are as follows: mathematical transformations such as band combination and gray-level co-occurrence matrix calculation are applied to the Sentinel-2 optical remote sensing data, and 5 groups of feature variables are extracted: surface reflectance spectral variables, vegetation index variables, brightness index variables, color index variables, and texture index variables; the annual maximum, annual minimum, annual mean, and annual root mean square error are calculated for the vegetation index, brightness index, and color index variables;
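As an illustration of the band combination and annual statistics described for the optical data, the following is a minimal sketch in Python with NumPy. It is not the patented implementation: the choice of NDVI as the vegetation index, the particular brightness index formula, the function names, and the reading of "annual root mean square error" as the per-pixel standard deviation about the annual mean are all assumptions made for the example.

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized difference vegetation index from surface reflectance."""
    return (nir - red) / (nir + red + eps)

def brightness_index(red, green):
    """One common brightness index form (assumed here): sqrt((R^2 + G^2) / 2)."""
    return np.sqrt((red ** 2 + green ** 2) / 2.0)

def annual_stats(stack):
    """Per-pixel statistics over a (time, rows, cols) stack of one index.

    "std" is the standard deviation about the annual mean, used here as an
    assumed interpretation of the source's "annual root mean square error".
    """
    return {
        "max": stack.max(axis=0),
        "min": stack.min(axis=0),
        "mean": stack.mean(axis=0),
        "std": stack.std(axis=0),
    }
```

Each returned array has the shape of one image, so the four statistics of each index can be stacked directly as four feature layers.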
combination calculation and gray-level co-occurrence matrix calculation are applied to the VV and VH polarization data of the Sentinel-1 radar data, and 2 groups of feature variables are extracted: backscatter variables and texture variables;
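The gray-level co-occurrence matrix (GLCM) step can be sketched as follows. This is a hedged example with NumPy only: the number of gray levels, the single pixel offset, the three texture measures, and all function names are assumptions, since the source does not state which settings or texture statistics are used.

```python
import numpy as np

def quantize(band, levels):
    """Rescale a float band (e.g. backscatter in dB) to integer gray levels."""
    lo, hi = np.nanmin(band), np.nanmax(band)
    scaled = (band - lo) / (hi - lo + 1e-12) * (levels - 1)
    return np.round(scaled).astype(int)

def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized GLCM for one non-negative offset (dy, dx)."""
    img = np.asarray(image, dtype=int)
    rows, cols = img.shape
    # Pair each pixel with its neighbour dy rows down and dx columns right.
    a = img[0:rows - dy, 0:cols - dx]
    b = img[dy:rows, dx:cols]
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (a.ravel(), b.ravel()), 1.0)
    m = m + m.T  # count every pair in both directions
    return m / m.sum()

def glcm_features(p):
    """Contrast, homogeneity, and entropy of a normalized GLCM."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "entropy": float(-np.sum(nz * np.log(nz))),
    }

def vv_vh_difference_db(vv_db, vh_db):
    """One simple polarization combination: VV minus VH in dB (a ratio in linear units)."""
    return vv_db - vh_db
```

In practice the GLCM would be computed in a moving window around each pixel; the functions above show the calculation for a single window.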
three terrain factor feature variables are extracted from the DEM data: elevation, slope, and aspect.
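Apart from elevation itself, the terrain factors are first-order derivatives of the DEM. The sketch below assumes that the two derived factors are slope gradient and slope direction (aspect), that the grid is in a projected coordinate system with square cells, and that row 0 is the northern edge; the function name and these conventions are illustrative, not taken from the source.

```python
import numpy as np

def slope_aspect(dem, cell_size):
    """Slope (degrees) and aspect (degrees clockwise from north) of a DEM grid.

    Assumes row 0 is the northern edge and columns increase eastward.
    Flat cells get an aspect of 0 here, which is only a placeholder value.
    """
    dz_drow, dz_dcol = np.gradient(dem.astype(float), cell_size)
    dz_dx = dz_dcol    # gradient toward the east
    dz_dy = -dz_drow   # gradient toward the north (rows increase southward)
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect is the downslope direction, opposite to the gradient vector.
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect
```

For example, a surface that rises toward the east at one metre per metre has a slope of 45 degrees and faces west (aspect 270 degrees).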
Further, in step two, the exploratory data analysis (EDA) of the data set is specifically: removing abnormal sample points, calculating the correlation coefficient matrix, performing significance tests, removing variables not significantly correlated with soil organic matter, and removing feature variables that are highly correlated with one another.
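The screening sequence can be sketched as below, using NumPy and SciPy's `scipy.stats.pearsonr`. The source gives no thresholds or outlier rule, so the z-score rule, the significance level `alpha`, the inter-variable limit `max_inter_r`, and the function name are all assumptions for illustration.

```python
import numpy as np
from scipy import stats

def screen_variables(X, y, names, alpha=0.05, max_inter_r=0.9, z_max=3.0):
    """Return the selected variable names and the boolean mask of kept samples."""
    # 1. Drop samples whose soil organic matter value is an outlier (assumed z-score rule).
    z = (y - y.mean()) / y.std()
    keep = np.abs(z) <= z_max
    X, y = X[keep], y[keep]

    # 2. Keep variables whose Pearson correlation with the target is significant.
    scored = []
    for j in range(X.shape[1]):
        r, p = stats.pearsonr(X[:, j], y)
        if p < alpha:
            scored.append((abs(r), j))

    # 3. Visit candidates from strongest to weakest and skip any that is
    #    highly correlated with a variable that has already been selected.
    selected = []
    for _, j in sorted(scored, reverse=True):
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < max_inter_r for k in selected):
            selected.append(j)
    return [names[j] for j in selected], keep
```

Keeping the stronger member of each highly correlated pair is one possible tie-breaking rule; the source does not say which member is retained.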
Further, in step three, the sample set is divided into a training set and a verification set and the corresponding soil organic matter inversion machine learning model is established as follows: the EDA-processed sample set is randomly divided into a training set and a verification set at a ratio of 7:3, and a machine learning method is trained on the training set to obtain the corresponding soil organic matter inversion machine learning model.
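A minimal sketch of the 7:3 random split and model training follows, using scikit-learn. Random forest is chosen because it is one of the methods the source names; the hyperparameters, the seed, and the function name are assumptions, and the training residuals are returned because the later residual interpolation needs them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def fit_som_model(X, y, seed=0):
    """Split samples 7:3 at random, fit a random forest, and return training residuals."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    # Hyperparameters below are illustrative defaults, not values from the source.
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    # Residual = measured value minus model prediction at each training sample.
    train_residuals = y_train - model.predict(X_train)
    return model, (X_train, y_train), (X_val, y_val), train_residuals
```

Applying `model.predict` to the feature values of every pixel in the research area then gives the preliminary soil organic matter map.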
Further, the machine learning method in step three includes random forests, artificial neural networks, support vector machines, and the like.
In step four, the principal component analysis of the model input variables is as follows: principal component analysis is carried out on the model input parameter set obtained in step two, the share of the data set's variance explained by each principal component is analyzed, and the principal components that capture the main information of the parameter set are selected.
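The component selection can be illustrated with a short NumPy sketch based on the singular value decomposition. The 85% cumulative-variance cut-off, the standardization of the inputs, and the function name are assumptions; the source only states that the components representing the main information are retained.

```python
import numpy as np

def principal_components(X, min_explained=0.85):
    """Return scores, explained-variance ratios, and loadings of the retained components."""
    # Standardize so that variables with large units do not dominate.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    # Smallest number of components whose cumulative share reaches the threshold.
    k = int(np.searchsorted(np.cumsum(explained), min_explained)) + 1
    k = min(k, len(s))
    scores = Xs @ Vt[:k].T
    return scores, explained[:k], Vt[:k]
```

The columns of `scores` are mutually uncorrelated, which is what makes them convenient covariates for the subsequent interpolation.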
In step four, the spatial distribution map of the soil organic matter residuals over the whole research area is predicted by the geostatistical method as follows: with the residual values as the primary variable and the principal components of the model parameter set as covariates, co-kriging interpolation is carried out to predict the spatial distribution map of the soil organic matter residuals over the whole research area.
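To show how residuals at sample points become a residual surface, the sketch below implements ordinary kriging with NumPy. This is deliberately a simpler stand-in: it is not co-kriging, it uses no covariates, and the exponential semivariogram with fixed `nugget`, `sill`, and `range_` values is an assumption rather than a model fitted to data. All names are hypothetical.

```python
import numpy as np

def exp_variogram(h, nugget, sill, range_):
    """Exponential semivariogram; gamma(0) is 0 by definition."""
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / range_))
    return np.where(h > 0, gamma, 0.0)

def ordinary_kriging(xy, values, xy_new, nugget=0.0, sill=1.0, range_=500.0):
    """Predict values at xy_new (m, 2) from samples at xy (n, 2)."""
    values = np.asarray(values, dtype=float)
    n = len(xy)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    # Kriging system: semivariogram matrix bordered by the unbiasedness constraint.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d, nugget, sill, range_)
    A[n, n] = 0.0
    d0 = np.linalg.norm(xy_new[:, None, :] - xy[None, :, :], axis=2)
    b = np.ones((n + 1, len(xy_new)))
    b[:n, :] = exp_variogram(d0, nugget, sill, range_).T
    w = np.linalg.solve(A, b)  # rows 0..n-1: weights; last row: Lagrange multiplier
    return w[:n].T @ values

# Hypothetical usage: final_map = ml_prediction_map + kriged_residual_map
```

Co-kriging extends the same linear system with cross-semivariograms between the residual and each principal component, so that the covariates also contribute weights.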
In step five, the prediction result is verified as follows: the measured soil organic matter values in the verification set are compared with the soil organic matter predicted by the machine learning model alone and by the combined machine learning and geostatistical model, and the prediction accuracy is evaluated with the coefficient of determination, the root mean square error, and the mean absolute error.
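The three accuracy measures can be computed as in the following NumPy sketch. The coefficient of determination is taken here as 1 minus the ratio of residual to total sum of squares, which is an assumption, since some studies report the squared correlation coefficient instead; the function name is hypothetical.

```python
import numpy as np

def accuracy_metrics(observed, predicted):
    """Coefficient of determination, root mean square error, and mean absolute error."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = observed - predicted
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
    }
```

Calling the function once with the machine learning predictions and once with the combined predictions for the verification set gives the comparison described above.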
It is another object of the present invention to provide a storage medium for receiving user input, the stored computer program causing an electronic device to execute the following steps:
Step one, acquiring DEM (digital elevation model), optical, and radar remote sensing data on the GEE (Google Earth Engine) remote sensing cloud platform, performing mathematical transformation and processing, and extracting terrain factors, spectral variables, vegetation indices, color indices, brightness indices, and texture indices as feature variables;
Step two, acquiring soil organic matter sample point data in the research area; extracting the feature variable values at the corresponding sampling points, carrying out exploratory data analysis on the data set, and screening out the model input variables;
Step three, dividing the sample set into a training set and a verification set, establishing the corresponding soil organic matter inversion machine learning model, and producing a preliminary soil organic matter map;
Step four, subtracting the predicted soil organic matter from the measured soil organic matter of the training set to obtain the residuals; carrying out principal component analysis on the model input variables, taking the residuals as the primary variable and the resulting principal components as covariates, and predicting the spatial distribution map of the soil organic matter residuals over the whole research area by a geostatistical method;
Step five, adding the soil organic matter map predicted by the machine learning model to the residual map obtained by the geostatistical method to obtain the final soil organic matter prediction map; and verifying the prediction result with the verification set data.
Combining all the technical schemes above, the advantages and positive effects of the invention are as follows: the method comprehensively considers the spatial autocorrelation of soil organic matter and its nonlinear relationship with environmental factors, which improves the prediction accuracy of soil organic matter; meanwhile, mapping soil organic matter at the scale of fragmented land parcels with high-spatial-resolution satellite remote sensing data and products, such as terrain, synthetic aperture radar, and multispectral data, reduces the ground sampling workload, improves prediction accuracy, and facilitates precision soil management.
Meanwhile, compared with the prior art, the invention has the following characteristics: it comprehensively considers the spatial autocorrelation of soil organic matter and its nonlinear relationship with environmental factors and combines the advantages of machine learning and geostatistical methods; the residuals predicted by geostatistics, supplemented by the principal components of the model parameters, compensate the preliminary machine learning prediction, which resolves the abrupt spatial changes in machine learning predictions of soil organic matter and improves on the accuracy of a single model;
it couples high-spatial-resolution satellite remote sensing data and products, such as terrain, synthetic aperture radar, and multispectral data, and mines feature variables that represent the spatial variation of soil organic matter, which improves the ability to explain that variation and achieves fine field-scale mapping of soil organic matter in both gentle and undulating terrain.
Drawings
Fig. 1 is a flowchart of the soil organic matter remote sensing mapping method combining machine learning and geostatistics provided by an embodiment of the invention.
Fig. 2 is a schematic diagram of the field-scale soil organic matter remote sensing mapping process provided by an embodiment of the invention.
Fig. 3 is a semivariogram with the residuals as the primary variable and the principal components of the feature parameters as covariates, provided by an embodiment of the invention.
Fig. 4 is a spatial distribution map of soil organic matter at the Large River Bay farm, Inner Mongolia, predicted by the random forest method.
Fig. 5 is a spatial distribution map of soil organic matter at the Large River Bay farm, Inner Mongolia, predicted by random forest combined with ordinary kriging.
Fig. 6 is a spatial distribution map of soil organic matter at the Large River Bay farm, Inner Mongolia, predicted by random forest combined with co-kriging.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
To address the problems in the prior art, the invention provides a soil organic matter remote sensing mapping method combining machine learning and geostatistics, which is described in detail below with reference to the accompanying drawings.
Those skilled in the art may also implement the soil organic matter remote sensing mapping method combining machine learning and geostatistics using other steps; the method shown in Fig. 1 is only one specific embodiment.
As shown in Fig. 1, the soil organic matter remote sensing mapping method combining machine learning and geostatistics provided by the embodiment of the invention comprises the following steps:
S101: Acquire remote sensing data on a remote sensing cloud platform, apply mathematical transformations and processing to the DEM and remote sensing data, and extract the model feature variables.
S102: Acquire soil organic matter sample point data in the study area; extract the remote sensing feature variable values at the sampling points, perform exploratory data analysis on the data set, and screen out the model input variables.
S103: Divide the sample set into a training set and a validation set, build the corresponding machine learning model for soil organic matter inversion, and produce a preliminary soil organic matter map.
S104: Subtract the predicted soil organic matter from the measured soil organic matter of the training samples to obtain the residuals; perform principal component analysis on the model input variables; then, with the residuals as the primary variable and the resulting principal components as covariates, predict the spatial distribution map of the soil organic matter residuals over the whole study area using a geostatistical method.
S105: Add the soil organic matter map predicted by the machine learning model to the residual map obtained by the geostatistical method to obtain the final soil organic matter prediction map, and validate the prediction with the validation set data.
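The residual compensation in S104 and S105 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: ordinary kriging (no covariates) stands in for co-kriging, the exponential semivariogram and its `nugget`, `sill`, and `range_m` values are hypothetical, and the sample points and the preliminary map are synthetic.

```python
import numpy as np

def exp_variogram(h, nugget=0.1, sill=1.0, range_m=300.0):
    """Exponential semivariogram (hypothetical parameters); gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / range_m))
    return np.where(h > 0.0, gamma, 0.0)

def ordinary_kriging(xy_obs, z_obs, xy_new, variogram=exp_variogram):
    """Predict z at xy_new from point observations by ordinary kriging."""
    n = len(xy_obs)
    d_obs = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)
    d_new = np.linalg.norm(xy_obs[:, None, :] - xy_new[None, :, :], axis=-1)
    # Kriging system; the extra row/column is the Lagrange multiplier that
    # forces the weights of every prediction to sum to 1.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d_obs)
    A[n, n] = 0.0
    b = np.ones((n + 1, len(xy_new)))
    b[:n, :] = variogram(d_new)
    weights = np.linalg.solve(A, b)[:n]          # shape (n_obs, n_new)
    return weights.T @ z_obs

# Synthetic stand-ins for the sample points and the machine learning output.
rng = np.random.default_rng(0)
xy_samples = rng.uniform(0.0, 1000.0, size=(30, 2))    # coordinates in metres
measured = rng.normal(25.0, 5.0, size=30)              # measured SOM, g/kg
ml_at_samples = measured - rng.normal(0.0, 2.0, 30)    # model output at samples
residual = measured - ml_at_samples                    # S104: measured minus predicted

gx, gy = np.meshgrid(np.arange(0.0, 1000.0, 100.0), np.arange(0.0, 1000.0, 100.0))
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])
ml_map = np.full(len(grid_xy), 25.0)                   # placeholder preliminary map

residual_map = ordinary_kriging(xy_samples, residual, grid_xy)
final_map = ml_map + residual_map                      # S105: add the two maps
```

Because the semivariogram is zero at zero distance, the interpolated residual reproduces the observed residual exactly at each sample point, so the final map matches the measured values there.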
In S101 of the embodiment, remote sensing data are acquired on the platform as follows: DEM, Sentinel-1, and Sentinel-2 remote sensing data are obtained on the Google Earth Engine platform. The DEM has an original spatial resolution of 30 m and is resampled to 10 m. The Sentinel-1 data comprise all images covering the study area in the study year; they include the backscattering coefficients in two polarization modes, VV and VH, have been preprocessed (orbit correction, GRD border noise removal, thermal noise removal, radiometric calibration, orthorectification, and so on), and have a spatial resolution of 10 m. The Sentinel-2 data likewise comprise all images covering the study area in the study year; they have been preprocessed (atmospheric correction and so on), the cloud cover of each image is evaluated, only cloud-free data are retained, and the reflectance data of all bands are resampled to 10 m.
In S101 of the embodiment, the mathematical transformations and processing applied to the DEM and remote sensing data are as follows:
Band combination and gray-level co-occurrence matrix calculation are applied to the Sentinel-2 optical remote sensing data, and 5 groups of feature variables are extracted: (1) surface reflectance spectral variables, comprising the reflectance of 10 bands: blue, green, red, red edge 1, red edge 2, red edge 3, red edge 4, near infrared, shortwave infrared 1, and shortwave infrared 2; (2) the vegetation index variable, namely the enhanced vegetation index (EVI) calculated from band reflectance, which reflects plant growth conditions; (3) the brightness index variable, an index of ground brightness calculated from the reflectance of the blue, green, and red bands; (4) color index variables, comprising a color index and a redness index, which characterize ground color; (5) texture index variables, obtained by computing the gray-level co-occurrence matrix of the EVI with an 11 × 11 window, which yields 10 texture variables that reflect surface roughness, the structure of ground features in the image, and their relationship to the surroundings.
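The index variables in groups (2) to (4) can be illustrated with a short sketch. The patent text does not give the formulas, so the definitions below are assumptions: the standard EVI formula, and commonly used forms of the brightness, color, and redness indices. The function name `spectral_indices` is hypothetical.

```python
import numpy as np

def spectral_indices(blue, green, red, nir):
    """Compute illustrative index variables from surface reflectance (0..1)."""
    blue, green, red, nir = (
        np.asarray(band, dtype=float) for band in (blue, green, red, nir)
    )
    # Enhanced vegetation index (standard coefficients).
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    # Brightness index from the three visible bands (assumed definition).
    brightness = np.sqrt((blue**2 + green**2 + red**2) / 3.0)
    # Color index and redness index (assumed definitions).
    color = (red - green) / (red + green)
    redness = red**2 / (blue * green**3)
    return {"evi": evi, "brightness": brightness, "color": color, "redness": redness}

# Example for a single pixel; arrays of any matching shape work the same way.
pixel = spectral_indices(blue=0.05, green=0.08, red=0.10, nir=0.30)
```

The same function can be applied to whole 10 m reflectance rasters, since all operations are element-wise.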
The VV and VH polarization data of the Sentinel-1 radar are combined and processed to obtain three groups of feature variables: (1) polarized backscattering coefficient variables, comprising the VV and VH polarizations; (2) index variables, obtained by subtracting and dividing the VV and VH polarizations; (3) texture variables, obtained by computing the gray-level co-occurrence matrix of each of the two polarizations with an 11 × 11 window, each yielding 10 texture variables.
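A minimal sketch of the radar index variables and of one texture measure follows. It is illustrative only: the number of gray levels, the single pixel offset, and the choice of contrast as the texture measure are assumptions, the patent does not list its 10 texture variables, and all function names are hypothetical.

```python
import numpy as np

def sar_features(vv, vh):
    """Backscatter variables plus the difference and quotient of VV and VH."""
    vv = np.asarray(vv, dtype=float)
    vh = np.asarray(vh, dtype=float)
    return {"vv": vv, "vh": vh, "vv_minus_vh": vv - vh, "vv_over_vh": vv / vh}

def quantize(img, levels=8):
    """Scale an image window to integer gray levels 0..levels-1."""
    img = np.asarray(img, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros(img.shape, dtype=int)
    q = np.floor((img - lo) / (hi - lo) * levels).astype(int)
    return np.clip(q, 0, levels - 1)

def glcm(q, levels=8, offset=(0, 1)):
    """Symmetric, normalized co-occurrence matrix for one non-negative offset."""
    dr, dc = offset
    rows, cols = q.shape
    first = q[: rows - dr, : cols - dc]      # reference pixels
    second = q[dr:, dc:]                     # neighbors at the given offset
    counts = np.zeros((levels, levels))
    np.add.at(counts, (first.ravel(), second.ravel()), 1.0)
    counts += counts.T                       # count each pair in both directions
    return counts / counts.sum()

def glcm_contrast(p):
    """Contrast: co-occurrence probabilities weighted by squared level difference."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

# An 11 x 11 checkerboard window: every horizontal neighbor differs by one level.
window = np.indices((11, 11)).sum(axis=0) % 2
contrast = glcm_contrast(glcm(quantize(window, levels=2), levels=2))
features = sar_features(vv=-10.0, vh=-17.0)
```

In a full workflow the 11 × 11 window would slide over every pixel of the VV, VH, or EVI image, producing one texture raster per measure.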
Three terrain factor feature variables are calculated and extracted from the DEM data: elevation, slope, and aspect.
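The terrain derivatives can be sketched with finite differences. This assumes a north-up raster (row index increasing southward), square 10 m cells, and aspect defined as the downslope direction in degrees clockwise from north; the function name `slope_aspect` is hypothetical, and the slope direction it returns is not meaningful on perfectly flat cells.

```python
import numpy as np

def slope_aspect(dem, cell_size=10.0):
    """Return slope angle and slope direction (aspect) in degrees from a DEM."""
    dem = np.asarray(dem, dtype=float)
    d_row, d_col = np.gradient(dem, cell_size)
    dz_east = d_col
    dz_north = -d_row            # rows grow southward in a north-up raster
    slope = np.degrees(np.arctan(np.hypot(dz_east, dz_north)))
    # The downslope vector is the negative gradient; azimuth is measured from north.
    aspect = np.mod(np.degrees(np.arctan2(-dz_east, -dz_north)), 360.0)
    return slope, aspect

# A plane that rises 10 m per 10 m cell toward the east: 45 degree slope facing west.
dem_east = np.tile(np.arange(5) * 10.0, (5, 1))
slope_e, aspect_e = slope_aspect(dem_east)
```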
In S102 of the embodiment, the exploratory data analysis (EDA) of the data set is as follows: the ground-measured soil organic matter values are analyzed, and samples that deviate strongly from the mean are removed as outliers based on the mean and standard deviation. The values of each feature variable are then extracted at the ground sampling points to form the training sample set, its correlation coefficient matrix is calculated, and a significance test is performed. Variables that are not significantly correlated with soil organic matter are removed first; the correlation coefficient matrix is then examined for highly intercorrelated feature variables, and among them the variables less correlated with soil organic matter are removed. The remaining variables form the feature variable set used as model input.
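The screening logic of the EDA step can be sketched as below. The thresholds are assumptions not stated in the source (3 standard deviations for outliers, a 0.05 significance level, 0.9 as the limit for intercorrelation), the significance test uses the Fisher z approximation rather than an exact t-test, and the function names and synthetic data are hypothetical.

```python
import math
import numpy as np

def outlier_mask(y, k=3.0):
    """True for samples within k sample standard deviations of the mean."""
    y = np.asarray(y, dtype=float)
    return np.abs(y - y.mean()) <= k * y.std(ddof=1)

def corr_p_value(r, n):
    """Two-sided p-value for a Pearson r (Fisher z normal approximation)."""
    r = max(min(r, 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

def screen_features(X, y, names, alpha=0.05, collinear=0.9):
    """Keep significant features, dropping the weaker one of each collinear pair."""
    n = len(y)
    r_y = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    significant = [j for j in range(X.shape[1]) if corr_p_value(r_y[j], n) < alpha]
    significant.sort(key=lambda j: -abs(r_y[j]))     # strongest predictors first
    selected = []
    for j in significant:
        if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) < collinear for s in selected):
            selected.append(j)
    return [names[j] for j in selected]

# Deterministic synthetic data: x1 drives SOM, x2 is a noisier copy of x1,
# and x3 is uncorrelated with SOM by symmetry.
t = np.arange(100.0) - 49.5
som = 25.0 + 0.1 * t + 0.5 * np.sin(t)
X = np.column_stack([t, t + 3.0 * np.cos(t), t**2])
kept = screen_features(X, som, ["x1", "x2", "x3"])
```

Here `kept` contains only `x1`: `x3` fails the significance test, and `x2` is dropped because it is highly correlated with `x1` while being less correlated with the target.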
In S103 of the embodiment, the sample set is divided into a training set and a validation set and the machine learning model for soil organic matter inversion is built as follows:
After exploratory data analysis, the sample set is randomly divided into a training set and a validation set at a ratio of 7:3. A random forest model is trained on the training set to obtain the soil organic matter inversion model, which is then applied to the selected feature variable set to produce a preliminary soil organic matter map of the whole study area.
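The split and training step can be sketched with scikit-learn. The data are synthetic, and the hyperparameters (`n_estimators=200`, fixed random seeds) are assumptions; the source does not state the random forest settings used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 200 samples, 6 screened feature variables, SOM in g/kg.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = 20.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Random 7:3 split into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Residuals at the training samples feed the geostatistical step (S104).
train_residual = y_train - model.predict(X_train)
# Predictions at the validation samples are used later for accuracy assessment.
val_pred = model.predict(X_val)
```

To produce the preliminary map, the fitted model would be applied to the feature values of every raster cell, reshaped into a `(n_cells, n_features)` array.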
In S104 of the embodiment, the spatial distribution map of the soil organic matter residuals over the whole study area is predicted as follows:
First, principal component analysis is performed on the feature variable set selected in S102, the share of the data set explained by each principal component is analyzed, and the first 5 principal components (which together explain more than 80%) are selected as covariates for the geostatistical analysis. The predicted soil organic matter is then subtracted from the measured soil organic matter of the training samples. With these residuals between the measured values and the machine learning predictions as the primary variable and the selected principal components of the feature variable set as covariates, the soil organic matter residual spatial distribution map of the whole study area is predicted.
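Selecting principal components by cumulative explained variance can be sketched with NumPy. Standardizing the variables before the decomposition is an assumption, and the function name `principal_components` and the toy data are hypothetical; in the embodiment this rule yields 5 components.

```python
import numpy as np

def principal_components(X, min_explained=0.80):
    """Return the scores of the leading components whose cumulative
    explained variance exceeds min_explained, and their variance ratios."""
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardize columns
    _, s, vt = np.linalg.svd(Xs, full_matrices=False)
    ratio = s**2 / np.sum(s**2)                          # explained variance ratio
    k = int(np.searchsorted(np.cumsum(ratio), min_explained, side="right")) + 1
    k = min(k, len(ratio))
    return Xs @ vt[:k].T, ratio[:k]

# Toy data: three perfectly correlated columns plus one uncorrelated column,
# so the first component explains 75% and the first two explain 100%.
t = np.arange(50.0) - 24.5
X_demo = np.column_stack([t, 2.0 * t + 1.0, -t, t**2])
scores, explained = principal_components(X_demo)
```

The returned scores, evaluated at the sample points and on the prediction grid, are what a co-kriging step would use as covariates.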
In S105 of the embodiment, the prediction is validated as follows: the 30% of samples randomly set aside from the full sample set serve as validation data for the soil organic matter predicted by random forest, by random forest combined with ordinary kriging, and by random forest combined with co-kriging. Model accuracy is evaluated with the coefficient of determination (R²), the mean absolute error (MAE), and the root mean square error (RMSE); the results are shown in Table 1. Both methods that combine machine learning with geostatistics predict more accurately than the machine learning algorithm alone. Random forest combined with co-kriging is the most accurate: relative to random forest alone, R² rises from 0.50 to 0.55, and MAE and RMSE fall by 7.25% and 9.45%, respectively. This shows that combining machine learning with geostatistics reduces the error of soil organic matter prediction and improves its accuracy.
The soil organic matter spatial distribution maps predicted by the three methods (Figs. 4 to 6) show that, when the residuals of the machine-learning-predicted soil organic matter at the sample points are compensated by the geostatistical method, the soil organic matter predicted by machine learning combined with geostatistics is spatially more continuous, whereas the prediction of the machine learning method alone shows comparatively larger abrupt spatial changes.
Table 1. Accuracy of soil organic matter predicted by the three methods
Model | R² | MAE (g/kg) | RMSE (g/kg) |
---|---|---|---|
Random forest | 0.50 | 4.00 | 4.87 |
Random forest combined with ordinary kriging | 0.53 | 3.83 | 4.59 |
Random forest combined with co-kriging | 0.55 | 3.71 | 4.41 |
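The three accuracy measures, and the relative reductions quoted from Table 1, can be reproduced with a few lines. The R² form used here (one minus the ratio of residual to total sum of squares) is an assumption, since the source does not say how its coefficient of determination was computed; the function names are hypothetical.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))))

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((np.asarray(y_true, float) - np.asarray(y_pred, float)) ** 2)))

def percent_reduction(baseline, improved):
    """Relative reduction of an error measure, in percent."""
    return 100.0 * (baseline - improved) / baseline

# Table 1: random forest alone versus random forest combined with co-kriging.
mae_drop = round(percent_reduction(4.00, 3.71), 2)    # 7.25
rmse_drop = round(percent_reduction(4.87, 4.41), 2)   # 9.45
```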
The machine learning model is not limited to the random forest, artificial neural network, and support vector machine described in the invention; deep learning and other machine learning methods not described here are also applicable to this patent. Likewise, the geostatistical method is not limited to ordinary kriging and co-kriging; it also includes interpolation methods such as regression kriging, inverse distance weighting (IDW), spline functions, universal kriging, and kriging with drift.
It should be noted that the embodiments of the present invention can be realized in hardware, software, or a combination of the two. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those of ordinary skill in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or embodied in processor control code, for example code provided on a carrier medium such as a magnetic disk, CD, or DVD-ROM, in a programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The device of the present invention and its modules may be implemented by hardware circuitry such as very-large-scale integrated circuits or gate arrays, by semiconductors such as logic chips and transistors, by programmable hardware devices such as field-programmable gate arrays and programmable logic devices, by software executed by various types of processors, or by a combination of such hardware circuitry and software, such as firmware.
The foregoing describes only specific embodiments of the present invention, and the scope of protection is not limited thereto. Any modification, equivalent replacement, improvement, or alternative made by a person skilled in the art within the technical scope disclosed herein and within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (10)
1. A soil organic matter remote sensing mapping method combining machine learning and geostatistics, characterized by comprising the following steps:
Step one, obtaining DEM, optical, and radar remote sensing data on the GEE remote sensing cloud platform, applying mathematical transformations and processing, and extracting terrain factors, spectral variables, vegetation indices, color indices, brightness indices, and texture indices as feature variables;
Step two, acquiring soil organic matter sample point data in the study area; extracting the feature variable values at the corresponding sampling points, performing exploratory data analysis on the data set, and screening out the model input variables;
Step three, dividing the sample set into a training set and a validation set, building the corresponding machine learning model for soil organic matter inversion, and producing a preliminary soil organic matter map;
Step four, subtracting the predicted soil organic matter from the measured soil organic matter of the training samples to obtain the residuals; performing principal component analysis on the model input variables; and, with the residuals as the primary variable and the resulting principal components of the feature variables as covariates, predicting the spatial distribution map of the soil organic matter residuals over the whole study area using a geostatistical method;
Step five, adding the soil organic matter map predicted by the machine learning model to the residual map obtained by the geostatistical method to obtain the final soil organic matter prediction map, and validating the prediction with the validation set data.
2. The soil organic matter remote sensing mapping method combining machine learning and geostatistics according to claim 1, wherein in step one the remote sensing data are acquired on the platform as follows: DEM, Sentinel-1, and Sentinel-2 remote sensing data are obtained on the Google Earth Engine platform.
3. The soil organic matter remote sensing mapping method combining machine learning and geostatistics according to claim 1, wherein in step one the mathematical transformation and processing of the remote sensing data are as follows: mathematical transformations such as band combination and gray-level co-occurrence matrix calculation are applied to the Sentinel-2 optical remote sensing data, and 5 groups of feature variables are extracted: surface reflectance spectral variables, vegetation index variables, brightness index variables, color index variables, and texture index variables; the annual maximum, annual minimum, annual mean, and annual standard deviation of the vegetation index, brightness index, and color index variables are also calculated;
combination calculation and gray-level co-occurrence matrix calculation are applied to the VV and VH polarization data of the Sentinel-1 radar, and 2 groups of feature variables are extracted: backscatter variables and texture variables; three terrain factor feature variables are extracted from the DEM data: elevation, slope, and aspect.
4. The soil organic matter remote sensing mapping method combining machine learning and geostatistics according to claim 1, wherein the exploratory data analysis (EDA) of the data set in step two comprises: removing sample outliers, calculating the correlation coefficient matrix, performing a significance test, removing variables not significantly correlated with soil organic matter, and removing highly intercorrelated feature variables.
5. The soil organic matter remote sensing mapping method combining machine learning and geostatistics according to claim 1, wherein in step three the sample set is divided and the corresponding machine learning model for soil organic matter inversion is built as follows: the EDA-processed sample set is randomly divided into a training set and a validation set at a ratio of 7:3, the training set is used with a machine learning method to obtain the corresponding soil organic matter inversion machine learning model, and a preliminary soil organic matter map is produced.
6. The soil organic matter remote sensing mapping method combining machine learning and geostatistics according to claim 4, wherein the machine learning method in step three comprises: random forest, artificial neural network, and support vector machine.
7. The soil organic matter remote sensing mapping method combining machine learning and geostatistics according to claim 1, wherein the principal component analysis of the model input variables in step four is as follows: principal component analysis is performed on the model input variable set obtained in step two, the share of the data set explained by each principal component is analyzed, and the principal components that represent the main information of the variable set are selected.
8. The soil organic matter remote sensing mapping method combining machine learning and geostatistics according to claim 1, wherein in step four the spatial distribution map of the soil organic matter residuals over the whole study area is predicted by the geostatistical method as follows: with the residuals as the primary variable and the principal components of the feature variable set as covariates, co-kriging interpolation is performed to predict the soil organic matter residual spatial distribution map of the whole study area.
9. The soil organic matter remote sensing mapping method combining machine learning and geostatistics according to claim 1, wherein the prediction is validated in step five as follows: the measured soil organic matter values in the validation set are compared with the soil organic matter predicted by the machine learning model and by the model combining machine learning with geostatistics, and the prediction accuracy is evaluated with the coefficient of determination, the root mean square error, and the mean absolute error.
10. A program storage medium that receives user input, the stored computer program causing an electronic device to execute the method of any one of claims 1-9, comprising the following steps:
Step one, acquiring remote sensing data on a remote sensing cloud platform, applying mathematical transformations and processing to the DEM and remote sensing data, and extracting the model feature variables;
Step two, acquiring soil organic matter sample point data in the study area; extracting the remote sensing feature variable values at the sampling points, performing exploratory data analysis on the data set, and screening out the model input variables;
Step three, dividing the sample set into a training set and a validation set, building the corresponding machine learning model for soil organic matter inversion, and producing a preliminary soil organic matter map;
Step four, subtracting the predicted soil organic matter from the measured soil organic matter of the training samples to obtain the residuals; performing principal component analysis on the model input variables; and, with the residuals as the primary variable and the resulting principal components of the feature variables as covariates, predicting the spatial distribution map of the soil organic matter residuals over the whole study area using a geostatistical method;
Step five, adding the soil organic matter map predicted by the machine learning model to the residual map obtained by the geostatistical method to obtain the final soil organic matter prediction map, and validating the prediction with the validation set data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111437930.XA CN114140591B (en) | 2021-11-29 | 2021-11-29 | Soil organic matter remote sensing mapping method combining machine learning and ground statistics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114140591A | 2022-03-04 |
CN114140591B | 2024-04-26 |
Family ID: 80389488
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114140591B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114693915A (en) * | 2022-03-29 | 2022-07-01 | 华中农业大学 | Soil organic matter content prediction method based on smart phone image |
CN116049768B (en) * | 2023-04-03 | 2023-07-11 | 中国科学院空天信息创新研究院 | Active and passive microwave soil moisture fusion algorithm based on generalized linear regression model |
CN117290910A (en) * | 2023-09-13 | 2023-12-26 | 爬山虎科技股份有限公司 | Automatic soil map compiling method based on spatial interpolation algorithm |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107860889A (en) * | 2017-09-22 | 2018-03-30 | 华南农业大学 | The Forecasting Methodology and equipment of the soil organism |
CN110046415A (en) * | 2019-04-08 | 2019-07-23 | 中国科学院南京地理与湖泊研究所 | A kind of soil organic matter content remote sensing dynamic playback method of space-time fining |
WO2021226976A1 (en) * | 2020-05-15 | 2021-11-18 | 安徽中科智能感知产业技术研究院有限责任公司 | Soil available nutrient inversion method based on deep neural network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2019398104A1 (en) * | 2018-12-11 | 2021-07-08 | Climate Llc | Mapping soil properties with satellite data using machine learning approaches |
Non-Patent Citations (2)
Title |
---|
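Inversion method for topsoil organic matter content using Zhuhai-1 hyperspectral remote sensing; Sun Haoran; Zhao Zhigen; Zhao Jiaxing; Chen Weiwei; Remote Sensing Information; 20200820(04); full text * |
Digital mapping of soil properties in arid areas integrating soil-environment relationships and machine learning; Zhang Zhenhua; Ding Jianli; Wang Jingzhe; Ge Xiangyu; Wang Jinjie; Tian Meiling; Zhao Qidong; Scientia Agricultura Sinica; 20200201(No. 03); full text * |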
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||