CN112446397A - Grass yield estimation method and device based on remote sensing and random forest and storage medium - Google Patents
- Publication number
- CN112446397A CN201910822293.4A
- Authority
- CN
- China
- Prior art keywords
- remote sensing
- data
- random forest
- sample
- sensing data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/55—Specular reflectivity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/02—Agriculture; Fishing; Mining
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N2021/1793—Remote sensing
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N2021/1793—Remote sensing
- G01N2021/1797—Remote sensing in landscape, e.g. crops
Abstract
The invention discloses a grass yield estimation method based on remote sensing data and a random forest algorithm, comprising the following steps: acquiring remote sensing data and preprocessing it; acquiring measured data at sample points in a grass-producing area; extracting the band values and vegetation indices at the sample point coordinates from the remote sensing data as sample data; establishing a random forest estimation model from the measured data and the sample data; and predicting grass yield using the preprocessed remote sensing data as the input vector of the random forest estimation model. A grass yield estimation device based on remote sensing data and a random forest algorithm and a readable storage medium are also disclosed. The method does not require checking assumptions such as normality or independence of the variables, nor does it need to account for collinearity among them; it is computationally efficient and accurate, tolerates outliers and noise well, and trains effectively on high-dimensional data such as hyperspectral remote sensing data.
Description
Technical Field
The invention relates to the technical field of satellite measurement and calculation, in particular to a grass yield estimation method and device based on remote sensing data and random forests and a storage medium.
Background
Grassland resources are an important component of the global terrestrial ecosystem and play a crucial role in the ecological environment. Monitoring grassland resources helps to understand the actual condition of grasslands, so that they can be developed and used rationally and the balance of the grassland ecosystem maintained. Grassland remote sensing estimation obtains surface information through a satellite sensor or a ground spectrometer and, guided by the theory of ground-object spectra, processes that information to identify grassland and its growth status, enabling monitoring of grassland area and growth and prediction of yield per unit area and total yield. Grassland assessment by remote sensing originated abroad; domestic research began later but has developed quickly. Early studies estimated grassland yield from AVHRR-NDVI data, exploring the relationship between the vegetation index NDVI and grass yield through regression models. As the technology developed, grass yield and grassland growth have been estimated by building various inversion models that combine ground spectral measurements or hyperspectral data and ground monitoring samples with data products such as Landsat and MODIS.
Sentinel-2 is a multispectral imaging satellite launched by the European Space Agency in June 2015. It carries a Multispectral Instrument (MSI) with a spatial resolution of up to 10 m and 13 spectral bands. It is used for land monitoring, providing images of vegetation, soil and water cover, inland waterways and coastal areas, and can also support emergency rescue services. Two satellites, Sentinel-2A and Sentinel-2B, have been launched so far; each has a revisit period of 10 days, and together they provide a revisit period of 5 days. However, because Sentinel-2 is a recently launched satellite, few studies have yet used Sentinel-2 data for remote sensing estimation in grassland areas.
Disclosure of Invention
In view of these shortcomings, the invention provides a grass yield estimation method based on remote sensing data and a random forest algorithm that estimates grass yield more accurately.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
a grass yield estimation method based on remote sensing data and a random forest algorithm comprises the following steps:
acquiring remote sensing data and preprocessing the remote sensing data;
acquiring measured data of sample points in a grass producing area;
obtaining a corresponding point wave band value and a vegetation index as sample data according to the remote sensing data and the sample point coordinates;
establishing a random forest estimation model according to the actually measured data of the sample points and the sample data;
and predicting the grass yield by using the preprocessed remote sensing data as an input vector of the random forest estimation model.
According to one aspect of the invention, the acquiring and preprocessing remote sensing data comprises:
selecting the required satellite remote sensing data according to the actual project requirements, the field sampling area and the sampling time, the data being essentially cloud-free and of high image quality;
resampling the image bands to the best available resolution of 10 m by bicubic convolution;
and performing band combination, image mosaicking and cropping on the data as required by the research area.
According to one aspect of the invention, the acquiring of measured data at sample points in the grass-producing area comprises: selecting, according to the growth pattern of the pasture, the period in which the pasture grows most vigorously to obtain the measured data.
According to one aspect of the invention, the obtaining of the band values and vegetation indices at the sample points as sample data comprises: extracting from the remote sensing data, at the longitude and latitude of each sample point, the band values, the enhanced vegetation index (EVI) and the normalized difference vegetation index (NDVI) as sample data.
According to one aspect of the invention, the obtaining of the band values and vegetation indices at the sample points as sample data comprises: splitting the bands of the research area, extracting all bands except the B10 cirrus band, and extracting the band values at the sample points as sample features; for the research area, cropping the images to the research area boundary, numbering the layers in band selection order as B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B11, B12, EVI, NDVI, and storing them as TIFF data.
According to one aspect of the invention, the establishing of the random forest estimation model according to the measured data of the sample points and the sample data comprises the following steps:
storing the measured fresh weight of total grass yield or of edible grass yield at the sample points, whichever is to be estimated, in the samples as the Y value, and taking the extracted band values and the EVI and NDVI values as the X values;
and building a random forest regression model with a machine learning library, the constructed model being denoted $\{h(X, \Theta_k),\ k = 1, \dots\}$, where $X$ is the input vector and $\{\Theta_k\}$ are independent and identically distributed random vectors.
According to one aspect of the invention, the modeling of the random forest regression model comprises the following steps:
carrying out normalization processing on an input sample data set;
dividing a sample data set into a training set and a test set;
for the training set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, resampling by the Bootstrap method to randomly generate $T$ training sets $S_1, S_2, \dots, S_T$;
generating a corresponding decision tree $C_1, C_2, \dots, C_T$ for each training set; before selecting an attribute at each non-leaf node, randomly drawing $m$ ($m < M$) attributes from all $M$ attributes as the splitting attribute set of the current node, and splitting the node on the best of these $m$ attributes;
forming a random forest from the generated decision trees, and applying each decision tree to a test set sample $x$ to obtain the predictions $C_1(x), C_2(x), \dots, C_T(x)$;
for the regression problem, taking the average of the predictions of these trees as the predicted value for test set sample $x$.
According to one aspect of the invention, taking the average of the tree predictions as the predicted value for test set sample $x$ in the regression problem comprises:
for any splitting feature $A$ and any split point $s$ that divides the data set into $D_1$ and $D_2$, finding the feature and split point that minimize the squared error within each of $D_1$ and $D_2$ and also minimize the sum of the two, expressed as:
$$\min_{A,s}\left[\min_{c_1}\sum_{x_i \in D_1(A,s)}(y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in D_2(A,s)}(y_i - c_2)^2\right]$$
where $c_1$ is the mean sample output of data set $D_1$ and $c_2$ is the mean sample output of data set $D_2$.
According to one aspect of the invention, the modeling of the random forest regression model includes determining the importance of variables, and specifically includes:
adding random noise to a variable in each decision tree and then checking how the out-of-bag error changes: if the error increases, the variable is important; otherwise it is not;
the calculation is:
$$VI_i = \frac{1}{T}\sum_{t=1}^{T}\left(E_{errOOB2}^{(t)} - E_{errOOB1}^{(t)}\right)$$
where $VI_i$ represents the importance of variable $i$; $E_{errOOB1}$ represents the out-of-bag (OOB) error, and $E_{errOOB2}$ represents the out-of-bag error recomputed after randomly adding noise to variable $i$ in all samples of the OOB data.
According to one aspect of the invention, establishing the random forest estimation model comprises evaluating the model using the coefficient of determination ($R^2$) and the mean squared error (MSE), specifically:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}, \qquad MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
where $y_i$ is the observed value, $\hat{y}_i$ is the value estimated by the model, $\bar{y}$ is the sample mean, and $n$ is the number of samples.
According to one aspect of the invention, the predicting the grass production by using the preprocessed remote sensing data as the input vector of the random forest estimation model comprises: storing the image wave band values into an array according to the format of a sample as an input vector by using the preprocessed remote sensing data; and predicting the grass yield of the research area by using a random forest estimation model, and outputting the predicted grass yield result as a TIFF image.
A grass yield estimation device based on remote sensing data and a random forest algorithm comprises:
a memory for storing a computer program;
a processor for implementing the steps of the method for estimating grass production based on remote sensing data and a random forest algorithm as described above when executing the computer program.
A readable storage medium having stored thereon a computer program which, when executed, carries out the steps of the method for estimating grass production based on remote sensing data and a random forest algorithm as described above.
The invention has the following advantages. The grass yield estimation method based on remote sensing data and a random forest algorithm comprises: acquiring remote sensing data and preprocessing it; acquiring measured data at sample points in a grass-producing area; extracting the band values and vegetation indices at the sample point coordinates from the remote sensing data as sample data; establishing a random forest estimation model from the measured data and the sample data; and predicting grass yield using the preprocessed remote sensing data as the input vector of the random forest estimation model. The method does not require checking assumptions such as normality or independence of the variables, nor does it need to account for collinearity among them; it is computationally efficient and accurate, tolerates outliers and noise well, and trains effectively on high-dimensional data such as hyperspectral remote sensing data. Overfitting is a major problem in machine learning; for a random forest, adding more trees to the forest does not in general cause the model to overfit, so its generalization ability is strong.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of a grass yield estimation method based on remote sensing data and a random forest algorithm according to the present invention;
FIG. 2 is a thematic map of total grass yield in the Haiyan study area according to the present invention;
FIG. 3 is a thematic map of edible grass yield in the Haiyan study area according to the present invention;
FIG. 4 is a thematic map of total grass yield in the Qilian study area according to the present invention;
FIG. 5 is a thematic map of edible grass yield in the Qilian study area according to the present invention;
FIG. 6 is a schematic diagram of a grass yield estimation apparatus according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, fig. 2, fig. 3, fig. 4 and fig. 5, a grass production estimation method based on remote sensing data and a random forest algorithm comprises the following steps:
step S1: acquiring remote sensing data and preprocessing the remote sensing data;
According to the actual project requirements, the field sampling area and the sampling time, the required Sentinel-2A or Sentinel-2B data are selected; the data should be essentially cloud-free and of high image quality. Sentinel-2 data obtained directly from the data sharing website of the European Space Agency (ESA) (https://scihub.copernicus.eu/dhus/) are Level-1C multispectral data: orthoimages that have undergone geometric correction, in the UTM/WGS-84 projected coordinate system. ESA also defines Sentinel-2 Level-2A data, which mainly contain bottom-of-atmosphere reflectance after radiometric calibration and atmospheric correction; Level-2A data must be produced by the user. The Sentinel-2 bands are listed below:
TABLE 1 Sentinel-2 satellite data partial parameter information
The raw data are radiometrically calibrated and atmospherically corrected using the Sen2Cor plug-in published by ESA. As Table 1 shows, the spatial resolution of the Sentinel-2 bands is not uniform, so the image bands are resampled to the best available resolution of 10 m by bicubic convolution, and the data are band-combined, mosaicked and cropped as required by the study area.
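The resampling step can be illustrated with a small NumPy sketch of separable cubic convolution (the Keys kernel with a = -0.5), which is one common definition of bicubic resampling. The function names, the integer upsampling factor and the edge handling (clamping to the border pixel) are assumptions made for illustration; the text does not state which software performs this step. Under these assumptions a 20 m band would use a factor of 2 and a 60 m band a factor of 6.

```python
import numpy as np

def keys_kernel(x, a=-0.5):
    """Cubic convolution kernel (Keys); a = -0.5 is a common choice."""
    x = np.abs(x)
    inner = (a + 2) * x**3 - (a + 3) * x**2 + 1              # |x| <= 1
    outer = a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a      # 1 < |x| < 2
    return np.where(x <= 1, inner, np.where(x < 2, outer, 0.0))

def upsample_axis(img, factor, axis):
    """Upsample one axis by an integer factor using four-tap cubic convolution."""
    img = np.moveaxis(np.asarray(img, dtype=float), axis, -1)
    n = img.shape[-1]
    # Centres of the output pixels expressed in input pixel coordinates.
    src = (np.arange(n * factor) + 0.5) / factor - 0.5
    base = np.floor(src).astype(int)
    out = np.zeros(img.shape[:-1] + (n * factor,))
    for k in (-1, 0, 1, 2):
        idx = np.clip(base + k, 0, n - 1)                    # clamp at the image border
        out += img[..., idx] * keys_kernel(src - (base + k))
    return np.moveaxis(out, -1, axis)

def bicubic_upsample(band, factor):
    """Apply the 1-D resampling along rows, then along columns."""
    return upsample_axis(upsample_axis(band, factor, axis=0), factor, axis=1)

# Hypothetical 20 m band resampled to a 10 m grid (factor 2).
band_20m = np.full((3, 4), 0.25)
band_10m = bicubic_upsample(band_20m, 2)
```

Because the kernel weights sum to one at every position, a constant image stays constant, which gives a quick sanity check for the implementation.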
Step S2: acquiring measured data of sample points in a grass producing area;
The measured data mainly comprise the pasture quadrat number, longitude and latitude, vegetation coverage, main plant species, fresh weight and air-dried weight of total grass yield, fresh weight and air-dried weight of edible grass yield, survey time, and similar information. According to the growth pattern of the pasture, sampling is preferably concentrated in July and August, when pasture growth is most vigorous. For the sake of estimation accuracy, the measured data should be distributed as uniformly as possible, and a sufficient number of quadrats should be measured for each grassland type in the study area.
Step S3: obtaining a corresponding point wave band value and a vegetation index as sample data according to the remote sensing data and the sample point coordinates;
A vegetation index is a measure of surface vegetation condition obtained by combining the spectral reflectances of different bands of a remote sensing image in a linear or nonlinear way. Based on previous research, the NDVI and EVI vegetation indices, which reflect surface vegetation cover well, are selected as features. At the longitude and latitude of each sample point, the band values, the enhanced vegetation index (EVI) and the normalized difference vegetation index (NDVI) are extracted with ArcGIS as sample data and stored in CSV format. The indices are computed as:
$$NDVI = \frac{\rho_{nir} - \rho_{red}}{\rho_{nir} + \rho_{red}}, \qquad EVI = 2.5 \times \frac{\rho_{nir} - \rho_{red}}{\rho_{nir} + 6\rho_{red} - 7.5\rho_{blue} + 1}$$
where $\rho_{nir}$ is the near-infrared band value, $\rho_{red}$ is the red band value, and $\rho_{blue}$ is the blue band value.
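As an illustration, the two indices can be computed per pixel with NumPy using the standard definitions NDVI = (NIR - red) / (NIR + red) and EVI = 2.5 (NIR - red) / (NIR + 6 red - 7.5 blue + 1). This is a minimal sketch: the function names are hypothetical, the inputs are assumed to be reflectances already scaled to the 0 to 1 range, and the choice of Sentinel-2 bands B8, B4 and B2 for near-infrared, red and blue is an assumption not stated in the text.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index from reflectance arrays."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

def evi(nir, red, blue):
    """Enhanced vegetation index with the usual coefficients (2.5, 6, 7.5, 1)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    blue = np.asarray(blue, dtype=float)
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

# Hypothetical reflectances for one vegetated pixel (assumed B8, B4, B2).
pixel_ndvi = ndvi(0.5, 0.1)
pixel_evi = evi(0.5, 0.1, 0.05)
```

Both functions accept scalars or whole band arrays, so the same code yields either a single sample feature or a full index layer.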
ENVI 5.3 is used to split the bands of the study area; all bands except the B10 cirrus band are extracted, and the band values at the sample points are extracted as sample features. For the study area, the images are cropped to the study area boundary, numbered in band selection order as B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B11, B12, EVI, NDVI, and stored as TIFF data.
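Extracting feature values at sample points amounts to converting each point's map coordinates to a row and column in the layer stack. The sketch below assumes a north-up raster whose upper-left corner and pixel size are known, and assumes the sample coordinates have already been projected into the raster's coordinate system; the function name and the numeric values are hypothetical.

```python
import numpy as np

# Layer order as listed in the text (B10 omitted).
FEATURES = ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8",
            "B8A", "B9", "B11", "B12", "EVI", "NDVI"]

def sample_stack(stack, origin_x, origin_y, pixel_size, xs, ys):
    """Return an (n_points, n_features) table of values under the points.

    stack: array of shape (n_features, rows, cols), north-up;
    origin_x, origin_y: map coordinates of the upper-left corner.
    """
    cols = np.floor((np.asarray(xs) - origin_x) / pixel_size).astype(int)
    rows = np.floor((origin_y - np.asarray(ys)) / pixel_size).astype(int)
    return stack[:, rows, cols].T

# Hypothetical 14-layer stack on a 3 x 4 grid of 10 m pixels.
stack = np.arange(14 * 3 * 4, dtype=float).reshape(14, 3, 4)
table = sample_stack(stack, 500000.0, 4100000.0, 10.0,
                     xs=[500015.0], ys=[4099995.0])
```

Each row of `table` then lines up with `FEATURES` and can be written out alongside the measured grass yield for that sample point.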
Step S4: establishing a random forest estimation model according to the actually measured data of the sample points and the sample data;
The fresh weight of total grass yield or of edible grass yield measured at the sample points, whichever is to be estimated, is stored in the samples as the Y value, and the extracted band values and the EVI and NDVI values are taken as the X values. The random forest regression model is built with the Python machine learning library scikit-learn (sklearn), which supports four classes of machine learning tasks (classification, regression, dimensionality reduction and clustering) and also includes modules for feature extraction, data processing and model evaluation.
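A minimal sketch of this model-building step with scikit-learn's `RandomForestRegressor` might look as follows. The data are synthetic placeholders standing in for the 14 features and the measured yield, and the hyperparameters (`n_estimators`, `max_features`, `random_state`) are illustrative assumptions, not values given in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the sample table: 200 points, 14 features
# (12 bands, EVI, NDVI). The response is an invented function of the
# last two columns plus noise, used only to make the example runnable.
rng = np.random.default_rng(0)
X = rng.random((200, 14))
y = 300.0 * X[:, 13] + 50.0 * X[:, 12] + rng.normal(0.0, 5.0, 200)

model = RandomForestRegressor(
    n_estimators=100,      # number of trees (assumed value)
    max_features="sqrt",   # features tried at each split (assumed value)
    oob_score=True,        # also compute the out-of-bag R^2
    random_state=0,
)
model.fit(X, y)

y_hat = model.predict(X[:5])     # predictions for the first five samples
oob_r2 = model.oob_score_        # out-of-bag estimate of R^2
```

Separate models would be fitted for total grass yield and for edible grass yield by swapping the `y` column.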
The specific modeling process comprises the following steps:
(I) Data normalization
The selected features have different dimensions and units, which can affect the final result. To remove this effect, the input sample data set is normalized so that all features lie on a comparable scale. MinMaxScaler is used here to scale each feature to a given range between a minimum and a maximum, typically [0, 1]:
$$X_{std} = \frac{x_i - x_{\min(axis=0)}}{x_{\max(axis=0)} - x_{\min(axis=0)}}, \qquad X_{scaled} = X_{std} \times (max - min) + min$$
where $x_i$ is the sample value at the point, $x_{\min(axis=0)}$ and $x_{\max(axis=0)}$ are the minimum and maximum sample values of the feature, and $max$ and $min$ are the upper and lower bounds of the given scaling range.
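The min-max scaling described here can be written out directly in NumPy, which makes the column-wise (axis=0) behaviour explicit. This is a sketch of the scaling rule rather than of scikit-learn's implementation; the function name and the guard for constant columns are assumptions.

```python
import numpy as np

def min_max_scale(X, feature_range=(0.0, 1.0)):
    """Scale each column of X linearly into feature_range."""
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    x_min = X.min(axis=0)                  # per-feature minimum
    x_max = X.max(axis=0)                  # per-feature maximum
    # Avoid division by zero for columns that hold a single value.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span * (hi - lo) + lo

scaled = min_max_scale([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

When the model is later applied to image pixels, the same per-feature minimum and maximum derived from the samples should be reused so that both inputs share one scale.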
(II) selecting sample training set and test set
Before modeling, the sample data set must be divided into a training set and a test set. The division is a trade-off: the smaller the test set, the less accurate the estimate of the model's generalization error. In practice, the ratio of training to test data is commonly 6:4, 7:3 or 8:2, depending on the size of the whole data set; for very large data sets, 9:1 or even 99:1 can be used.
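A random split at one of these ratios can be sketched with a shuffled index array. The function name, the 7:3 default and the fixed seed are assumptions chosen for illustration.

```python
import numpy as np

def split_indices(n_samples, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) for a random split of n_samples rows."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)             # shuffled row indices
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]

train_idx, test_idx = split_indices(100, test_fraction=0.3)
```

Indexing the feature table and the yield vector with the same two index arrays keeps each sample's X and Y values together.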
(III) RF (random forest) model establishment
RF is a modification of Bagging. The Bagging algorithm draws m sub-samples from the original data set by random sampling with replacement and trains m learners on them, which reduces the variance of the model; new data are then passed to the m learners, and for classification the final class is determined by their vote. RF changes this in two ways: first, when training each learner, instead of choosing the best split feature from all features, k features are drawn at random and the best of those k is used to split the node; second, CART decision trees are used as the base learners.
The RF regression model is denoted $\{h(X, \Theta_k),\ k = 1, \dots\}$, where $X$ is the input vector and $\{\Theta_k\}$ are independent and identically distributed random vectors.
The specific algorithm steps are as follows:
For the training data set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$:
(1) Resample by the Bootstrap method to randomly generate $T$ training sets $S_1, S_2, \dots, S_T$;
(2) Generate a corresponding decision tree $C_1, C_2, \dots, C_T$ for each training set. Before selecting an attribute at each non-leaf node, randomly draw $m$ ($m < M$) attributes from all $M$ attributes as the splitting attribute set of the current node, and split the node on the best of these $m$ attributes. Here $m$ controls the degree of randomness introduced: if $m = M$, the base decision tree is built in the same way as a conventional decision tree; if $m = 1$, a single attribute is chosen at random for the split; $m = \log_2 M$ is generally recommended;
(3) Form a random forest from the generated decision trees, and apply each decision tree to a test set sample $x$ to obtain the predictions $C_1(x), C_2(x), \dots, C_T(x)$;
(4) For the regression problem, the predicted value for test set sample $x$ is the average of the predictions of these trees.
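The four steps can be sketched explicitly by combining NumPy bootstrap sampling with scikit-learn's `DecisionTreeRegressor`, whose `max_features` argument restricts each split to a random subset of m features. The function names, the tree count and the integer truncation of log2 M are assumptions; this is a teaching sketch, not a replacement for a library random forest.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_forest(X, y, n_trees=50, m=None, seed=0):
    """Steps (1) and (2): bootstrap T training sets and grow one tree on each."""
    rng = np.random.default_rng(seed)
    n, M = X.shape
    if m is None:
        m = max(1, int(np.log2(M)))              # suggested m = log2(M), truncated
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)         # sample n rows with replacement
        tree = DecisionTreeRegressor(
            max_features=m,                      # m random candidate features per split
            random_state=int(rng.integers(1 << 31)),
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_forest(trees, X):
    """Steps (3) and (4): average the predictions of all trees."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)

# Synthetic data with 14 features, used only to exercise the functions.
rng = np.random.default_rng(1)
X_demo = rng.random((60, 14))
y_demo = 100.0 * X_demo[:, 13] + rng.normal(0.0, 1.0, 60)
forest = fit_forest(X_demo, y_demo, n_trees=10)
pred = predict_forest(forest, X_demo)
```

With M = 14 features, the default gives m = 3 candidate features per split.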
When a regression tree is built with CART, the criterion is minimum variance: for any splitting feature $A$ and any split point $s$ that divides the data set into $D_1$ and $D_2$, find the feature and split point that minimize the squared error within each of $D_1$ and $D_2$ and also minimize the sum of the two. The expression is:
$$\min_{A,s}\left[\min_{c_1}\sum_{x_i \in D_1(A,s)}(y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in D_2(A,s)}(y_i - c_2)^2\right]$$
where $c_1$ is the mean sample output of data set $D_1$ and $c_2$ is the mean sample output of data set $D_2$. The prediction of a CART tree is the mean of the samples in the leaf node, so the RF prediction is the mean of the predictions of all trees.
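The split search for a single node can be sketched as an exhaustive scan over features and candidate thresholds, scoring each split by the summed squared error around the two subset means. The function name and the use of midpoints between adjacent distinct values as candidate thresholds are assumptions for illustration.

```python
import numpy as np

def best_split(X, y):
    """Return (feature index, threshold, summed squared error) of the best split."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    best = (None, None, np.inf)
    for a in range(X.shape[1]):
        values = np.unique(X[:, a])                      # sorted distinct values
        for s in (values[:-1] + values[1:]) / 2.0:       # midpoints as split points
            left = y[X[:, a] <= s]                       # outputs in D1
            right = y[X[:, a] > s]                       # outputs in D2
            # c1 and c2 are the subset means, which minimize each inner sum.
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (a, s, sse)
    return best

feature, threshold, sse = best_split(
    [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]],
    [1.0, 1.0, 10.0, 10.0],
)
```

In this small example the first feature separates the two output levels exactly, so the best split is on feature 0 at 2.5 with zero error.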
The RF model judges the importance of a variable by adding random noise to that variable in each decision tree and then checking how the out-of-bag error changes: if the error increases, the variable is important; otherwise it is not. The calculation is:
$$VI_i = \frac{1}{T}\sum_{t=1}^{T}\left(E_{errOOB2}^{(t)} - E_{errOOB1}^{(t)}\right)$$
where $VI_i$ represents the importance of variable $i$; $E_{errOOB1}$ represents the out-of-bag (OOB) error, and $E_{errOOB2}$ represents the out-of-bag error recomputed after randomly adding noise to variable $i$ in all samples of the OOB data.
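The idea of scoring a variable by the increase in out-of-bag error after perturbing it can be sketched for a single tree, or any fitted predictor, as below. Random permutation of the column is used here as the noise, which is an assumption: the text says only that random noise is added. The function name and the stand-in predictor are hypothetical.

```python
import numpy as np

def oob_importance(predict, X_oob, y_oob, seed=0):
    """Increase in mean squared error when each feature is perturbed in turn."""
    rng = np.random.default_rng(seed)
    base = np.mean((y_oob - predict(X_oob)) ** 2)        # error before noise
    scores = np.empty(X_oob.shape[1])
    for i in range(X_oob.shape[1]):
        X_noisy = X_oob.copy()
        X_noisy[:, i] = rng.permutation(X_noisy[:, i])   # perturb variable i only
        noisy = np.mean((y_oob - predict(X_noisy)) ** 2) # error after noise
        scores[i] = noisy - base
    return scores

# Stand-in predictor that depends only on the first feature.
rng = np.random.default_rng(2)
X_oob = rng.random((50, 3))
y_oob = 3.0 * X_oob[:, 0]
scores = oob_importance(lambda X: 3.0 * X[:, 0], X_oob, y_oob)
```

Averaging these per-tree scores over all trees, each evaluated on its own out-of-bag samples, gives a forest-level importance.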
(IV) evaluation of model
The model is evaluated with the coefficient of determination ($R^2$) and the mean squared error (MSE). $R^2$ characterizes how much of the variation in the dependent variable the regression explains, that is, how well the model fits the observed values.
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}, \qquad MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
where $y_i$ is the observed value, $\hat{y}_i$ is the value estimated by the model, $\bar{y}$ is the sample mean, and $n$ is the number of samples.
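Both metrics are short to compute directly, using the usual definitions: MSE is the mean of the squared residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares about the sample mean. The function names below are hypothetical.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observations and model estimates."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

observed = [1.0, 2.0, 3.0, 4.0]
perfect = (mse(observed, observed), r2(observed, observed))
mean_only = (mse(observed, [2.5] * 4), r2(observed, [2.5] * 4))
```

A model that always predicts the sample mean scores R² = 0, which is a useful baseline when judging the fitted forest on the test set.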
Step S5: and predicting the grass yield by using the preprocessed remote sensing data as an input vector of the random forest estimation model.
The image band values of the preprocessed remote sensing data are stored in an array, in the same format as the samples, as the input vector. The RF model then predicts the grass yield of the study area, and the predicted grass yield is output as a TIFF image.
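Putting image pixels into the same format as the samples means reshaping the layer stack from (features, rows, columns) to one row per pixel, predicting, and reshaping back to a grid. The sketch below takes any `predict` callable; the function name, the NaN handling and the stand-in predictor are assumptions, and writing the result to TIFF is left out because it depends on a geospatial I/O library the text does not name.

```python
import numpy as np

def predict_raster(predict, stack, nodata=np.nan):
    """Apply a fitted model's predict function to every valid pixel of a stack."""
    n_features, rows, cols = stack.shape
    pixels = stack.reshape(n_features, rows * cols).T    # (n_pixels, n_features)
    valid = np.all(np.isfinite(pixels), axis=1)          # skip pixels with missing values
    out = np.full(rows * cols, nodata, dtype=float)
    if valid.any():
        out[valid] = predict(pixels[valid])
    return out.reshape(rows, cols)

# Hypothetical 14-layer stack with one missing pixel; the stand-in
# predictor simply sums the features of each pixel.
stack = np.ones((14, 2, 3))
stack[0, 0, 0] = np.nan
yield_map = predict_raster(lambda P: P.sum(axis=1), stack)
```

The feature order along the first axis has to match the column order used when the model was trained.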
In practical application, the following implementation data are included:
Two study areas were designated: the Haiyan study area and the Qilian study area.
The Haiyan study area lies at 100.708769-101.136171° E, 36.897002-37.188647° N, and its main grassland types are alpine meadow and temperate steppe. The Qilian study area lies at 100.495069-100.860690° E, 37.588238-37.801124° N, and its main grassland types are likewise alpine meadow and temperate steppe.
1) The Sentinel-2 data are screened according to the latitude and longitude of the study area and the coordinates of the sample points; the images are required to be essentially cloud-free and of high quality. In this example, because the sample points and the Haiyan study area fall in two adjacent Sentinel-2 scenes, the image data need to be stitched after image preprocessing.
Following step S4 of the embodiment, an RF regression model is established, with the total grass yield and the edible grass yield modeled separately. The grass yield is then estimated from the input image data as in step S5, the result is stored as a TIFF image, and a grass yield thematic map is produced from the result image. Fig. 2 shows the thematic map of total grass yield in the Haiyan study area; Fig. 3 shows the thematic map of edible grass yield in the Haiyan study area.
2) The Sentinel-2 data are screened according to the latitude and longitude of the study area and the coordinates of the sample points; the images are required to be essentially cloud-free and of high quality. In this example, because the sample points and the Qilian study area fall in two adjacent Sentinel-2 scenes, the image data need to be stitched and cropped after image preprocessing.
Following step S4 of the embodiment, an RF regression model is established, with the total grass yield and the edible grass yield modeled separately. The grass yield is then estimated from the input image data as in step S5, the result is stored as a TIFF image, and a grass yield thematic map is produced from the result image. Fig. 4 shows the thematic map of total grass yield in the Qilian study area; Fig. 5 shows the thematic map of edible grass yield in the Qilian study area.
The method does not require checking assumptions such as normality and independence of the variables, nor does collinearity among the variables need to be considered; it runs efficiently and gives accurate results. It has high accuracy, tolerates outliers and noise well, and trains and learns well on high-dimensional data such as hyperspectral remote sensing data. In addition, overfitting is a major problem in machine learning; for a random forest, as long as the forest contains enough trees, the model is not prone to overfitting and its generalization capability is strong.
Example Two
As shown in fig. 6, a grass yield estimation device based on remote sensing data and a random forest algorithm, the grass yield estimation device includes:
a memory 100 for storing a computer program;
a processor 200 for implementing the steps of the method for estimating grass production based on remote sensing data and random forest algorithm as described above when executing said computer program.
Example Three
A readable storage medium having stored thereon a computer program which, when executed, carries out the steps of the method for estimating grass production based on remote sensing data and a random forest algorithm as described above.
The advantages of implementing the invention are as follows. The grass yield estimation method based on remote sensing data and the random forest algorithm comprises the following steps: acquiring remote sensing data and preprocessing the remote sensing data; acquiring measured data of sample points in a grass-producing area; obtaining the band values and vegetation indices of the corresponding points as sample data according to the remote sensing data and the sample point coordinates; establishing a random forest estimation model according to the measured data of the sample points and the sample data; and predicting the grass yield using the preprocessed remote sensing data as the input vector of the random forest estimation model. The method does not require checking assumptions such as normality and independence of the variables, nor does collinearity among the variables need to be considered; it runs efficiently and gives accurate results. It has high accuracy, tolerates outliers and noise well, and trains and learns well on high-dimensional data such as hyperspectral remote sensing data. In addition, overfitting is a major problem in machine learning; for a random forest, as long as the forest contains enough trees, the model is not prone to overfitting and its generalization capability is strong.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention disclosed herein are intended to be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (13)
1. A grass yield estimation method based on remote sensing data and a random forest algorithm is characterized by comprising the following steps:
acquiring remote sensing data and preprocessing the remote sensing data;
acquiring measured data of sample points in a grass producing area;
obtaining a corresponding point wave band value and a vegetation index as sample data according to the remote sensing data and the sample point coordinates;
establishing a random forest estimation model according to the actually measured data of the sample points and the sample data;
and predicting the grass yield by using the preprocessed remote sensing data as an input vector of the random forest estimation model.
2. The method for estimating grass production based on remote sensing data and random forest algorithm according to claim 1, wherein the obtaining and preprocessing remote sensing data comprises:
selecting the required satellite remote sensing data according to the actual project requirements, the field sampling area and the sampling time, the data being required to be essentially cloud-free and of high image quality;
resampling the image bands to the best available resolution of 10 m using bicubic convolution;
and performing band combination, image stitching and cropping on the data as required by the study area.
3. The remote sensing data and random forest algorithm-based grass production estimation method of claim 1, wherein the obtaining measured data of sample points of a grass production area comprises: selecting, according to the growth pattern of the pasture, the period in which the pasture grows most vigorously to obtain the measured data.
4. The method for estimating grass yield based on remote sensing data and a random forest algorithm according to claim 1, wherein the obtaining of the corresponding point band value and the vegetation index as sample data according to the remote sensing data and the sample point coordinates comprises: extracting, according to the longitude and latitude coordinates of the sample points, the band values of the corresponding points, the enhanced vegetation index (EVI) and the normalized difference vegetation index (NDVI) from the remote sensing data as sample data.
5. The method for estimating grass yield based on remote sensing data and a random forest algorithm according to claim 4, wherein the obtaining of the corresponding point band value and the vegetation index as sample data according to the remote sensing data and the sample point coordinates comprises: splitting the bands for the study area, extracting all bands except the B10 cirrus band, and extracting the band values at the sample points as sample features; and, for the study area, cropping the images along the study area boundary, numbering the layers in band selection order as B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B11, B12, EVI, NDVI, and storing them as TIFF data.
6. The remote sensing data and random forest algorithm-based grass production estimation method according to claim 5, wherein the step of establishing a random forest estimation model according to the measured data of the sample points and the sample data comprises the following steps:
storing the measured total grass yield fresh weight or edible grass yield fresh weight of the sample points in the sample as the Y value, and taking the extracted band values and the EVI and NDVI values as the X values;
and modeling a random forest regression model using a machine learning library, wherein the constructed random forest regression model is denoted $\{h(X, \Theta_k),\ k = 1, \dots\}$, where X is the input vector and $\{\Theta_k\}$ are independent, identically distributed random vectors.
7. The remote sensing data and random forest algorithm-based grass production estimation method of claim 6, wherein the performing random forest regression model modeling comprises the steps of:
carrying out normalization processing on an input sample data set;
dividing a sample data set into a training set and a test set;
for the training set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, resampling using the Bootstrap method to randomly generate T training sets $S_1, S_2, \dots, S_T$;
generating a corresponding decision tree $C_1, C_2, \dots, C_T$ for each training set; before an attribute is selected at each non-leaf node, randomly drawing m (m < M) attributes from all M attributes as the splitting attribute set of the current node, and selecting the optimal splitting attribute among these m attributes to split the node;
forming a random forest from the generated decision trees, and running the test set sample X through each decision tree to obtain the prediction results $C_1(x), C_2(x), \dots, C_T(x)$;
since this is a regression problem, taking the average of the results of these trees as the predicted value for test set sample X.
8. The method of claim 7, wherein taking the average of the results of the trees as the predicted value of test set sample X based on the regression problem comprises:
for any splitting feature A and any split point s, dividing the data set D into the two subsets $D_1$ and $D_2$ on either side of s, and finding the feature and split point for which the squared error within each of $D_1$ and $D_2$ is minimized and the sum of the two squared errors is smallest, expressed as:

$$\min_{A,s}\left[\min_{c_1}\sum_{x_i\in D_1(A,s)}(y_i-c_1)^2+\min_{c_2}\sum_{x_i\in D_2(A,s)}(y_i-c_2)^2\right]$$

where $c_1$ is the mean output of the samples in $D_1$ and $c_2$ is the mean output of the samples in $D_2$.
9. The remote sensing data and random forest algorithm-based grass production estimation method of claim 7, wherein the performing random forest regression model modeling comprises determining variable importance, and specifically comprises:
adding random noise to the variable for each decision tree and then checking how the out-of-bag error changes: if the error increases, the variable is more important; otherwise it is less important;
the calculation method is as follows:
where the computed quantity represents the importance of variable i; $E_{errOOB1}$ is the out-of-bag (OOB) error; and $E_{errOOB2}$ is the out-of-bag error recalculated after noise has been randomly added to variable i in all samples of the OOB data.
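10. The remote sensing data and random forest algorithm-based grass production estimation method of claim 9, wherein the establishing a random forest estimation model comprises performing a model evaluation that uses the coefficient of determination ($R^2$) and the mean squared error (MSE), specifically including:

$$R^2 = 1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2},\qquad MSE=\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$$

where $y_i$ is the actual observed value, $\hat{y}_i$ is the value estimated by the model, $\bar{y}$ is the mean of the observed values, and n is the number of samples.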
11. A method for estimating grass production based on remote sensing data and a random forest algorithm according to any one of claims 1-10, wherein the predicting grass production using the preprocessed remote sensing data as input vectors to a random forest estimation model comprises: storing the image wave band values into an array according to the format of a sample as an input vector by using the preprocessed remote sensing data; and predicting the grass yield of the research area by using a random forest estimation model, and outputting the predicted grass yield result as a TIFF image.
12. A grass yield estimation device based on remote sensing data and a random forest algorithm is characterized by comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method for estimating grass production based on remote sensing data and random forest algorithm according to any one of claims 1 to 11 when executing said computer program.
13. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when executed, carries out the steps of the method of estimating grass production based on remote sensing data and a random forest algorithm according to any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910822293.4A CN112446397A (en) | 2019-09-02 | 2019-09-02 | Grass yield estimation method and device based on remote sensing and random forest and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112446397A true CN112446397A (en) | 2021-03-05 |
Family
ID=74735137
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910822293.4A Pending CN112446397A (en) | 2019-09-02 | 2019-09-02 | Grass yield estimation method and device based on remote sensing and random forest and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112446397A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||