Remote sensing forest biomass inversion method
Technical Field
The invention relates to the technical field of monitoring and pollution-free control of forest pests, in particular to a method for inverting remote sensing forest biomass.
Background
Accurate biomass estimation has important significance for forest resource monitoring, forest carbon reserve assessment and forest ecosystem research. Meanwhile, the information can also provide quantitative data support for forest sustainable management and comprehensive utilization of forest resources. The traditional biomass investigation method is time-consuming and labor-consuming, can only obtain limited information on points, and is difficult to be practically popularized in a large area; the remote sensing technology can accurately and quickly acquire forest parameters of all scales, and has good practical value and application prospect.
The Landsat series of satellites can acquire multispectral (optical) information of the forest on medium and large scales. Compared with the previous sensors such as TM, the latest Landsat 8OLI sensor (which is mounted on the Landsat 8 satellite launched in 2013, 2, 11 and by the United states space and aviation administration (NASA)) has a great improvement in the arrangement of wave bands and the sensitivity to vegetation. However, optical remote sensing is still difficult to penetrate through forest canopies to obtain vertical structure information of the forest canopies, and forest biomass information is easily saturated when the forest biomass information is obtained in an area with high forest coverage (vegetation is vigorous). LiDAR (Light Detection And Ranging) is an active remote sensing technology which is rapidly developed in recent years, laser pulses emitted by the LiDAR can penetrate through vegetation canopies to obtain three-dimensional structure And energy information of the vegetation canopies, and previous researches show that the LiDAR has great potential in accurately estimating biophysical And structural characteristics of different forest types.
In recent years, the research of inverting forest biomass is as follows: 1) Zheng et al 2004 published "Estimating aboveground biobased using land and sat7ETM + data across a managed land and southern Wisconsin, USA" on Remote Sensing of Environment, volume 35, which inverted the aboveground biomass information of northern coniferous forest, wisconsin, USA by means of a vegetation index such as NDVI extracted from ETM + (Landsat 7) images. 2) The ' correlation analysis between tropical forest vegetation biomass of different age groups and remote sensing geological data ' published in 2004 journal of the plant ecology bulletin the Yangzhou, ming et al ' for estimating the tropical forest vegetation biomass of the West double-Pana of Yunnan by using a TM (Landsat 5) image original wave band method. 3) Ferster et al, 2009, published "above-bound Large Tree Mass Estimation in a Coastal Format in British Columbia Using Plot-Level measurements and Industrial Tree Detection from LiDAR" on Canadian Journal of Remote Sensing, volume 35, by means of a small spot (diameter: 0.1-2 m) and the height of the point cloud and the density of the canopy extracted from the LiDAR data, the aboveground biomass of the temperate forest is inverted. 4) Saatchi et al 2011 published "Benchmark Map of Forest Carbon Stocks in thermoplastic Regions acids Three contacts" at "Proceedings of the National Academy of Sciences of the United States of America" at stage 12 by the method of measuring large spots (diameter: 52-90 m) LiDAR data, extracting characteristic variables related to tree height information, and inverting the biomass of tropical rainforests. However, the above methods only mine traditional "optical" and LiDAR data from a single angle, with low specificity, and shallow mining depths for feature variables (i.e., feature variables are not systematically grouped and extracted and screened from multiple angles), and are not able to accurately invert biomass.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the defects in the prior art, the invention provides a remote sensing forest biomass inversion method which has the characteristics of strong specificity, low cost, easiness in popularization and application and the like.
The technical scheme is as follows: in order to achieve the purpose of the invention, the invention adopts the technical scheme that:
a method for inverting remote sensing forest biomass comprises the following steps: on the basis of preprocessing remote sensing data, respectively extracting characteristic variables of vegetation canopies from LiDAR point cloud (containing three-dimensional space information of the canopies) and multispectral (containing spectral information of the upper surfaces of the canopies) data; and (3) screening the LiDAR point cloud and the multispectral characteristic variables through correlation analysis, and inverting the aboveground and underground biomass through a stepwise regression model by combining with the ground actual measurement biomass information.
The remote sensing forest biomass inversion method comprises the following steps:
1) OLI image preprocessing: firstly, carrying out radiometric calibration on an original image by means of radiometric calibration parameters of an OLI sensor; converting the original DN value into a pixel radiation brightness value; then, carrying out atmospheric correction on the image by using a FLAASH model so as to convert the radiation brightness value into the actual reflectivity of the earth surface; then, geometrically and finely correcting the image, selecting the same-name feature points, correcting by adopting a quadratic polynomial, controlling the correction error within 0.1 pixel, and resampling by adopting a nearest pixel method.
2) LiDAR data preprocessing:
a) Noise level estimation and data smoothing: firstly, converting original data into a frequency domain, and then taking a low value part with higher frequency as a judgment standard of a noise level; then, a Gaussian filter is selected for smoothing, because the Gaussian filter can effectively smooth data and keep the trend of the original curve to the maximum extent;
b) Gaussian fitting (decomposition) and waveform data point clouding: fitting the waveform data by adopting a nonlinear least square method on the basis of the assumption that the echo data is the accumulation of a plurality of Gaussian functions; then extracting discrete point clouds from the processed waveform data through a local maximum peak detection filtering algorithm, wherein the energy and amplitude information of a return signal is recorded in each discrete point;
c) Generating a digital terrain: the purpose of LiDAR data height normalization is to obtain a "true" vegetation height from which the terrain effect is removed, typically by subtracting the terrain height from the original LiDAR data height information; therefore, accurate generation of a Digital Terrain Model (DTM) is an important prerequisite for calculating the normalized vegetation height; firstly, extracting discrete point clouds from waveform data to classify the discrete point clouds, then carrying out Kraus filtering processing on a last echo to remove non-ground points, and finally using the filtered last echo data and generating a digital terrain model by means of natural proximity interpolation;
d) Then, normalizing the elevation of the vegetation echo point by utilizing a DEM (digital elevation model), namely converting the height value of the vegetation point into a height value relative to the ground, and cutting each sample land through coordinates of a lower left corner and an upper right corner; finally, extracting normalized point cloud data of the coordinate positions corresponding to the 55 sample plots by a GIS analysis tool;
3) OLI characteristic variable extraction: extracting 5 groups of characteristic variables (detailed in table 1), namely original single-waveband variables, waveband combination variables, information enhancement group variables, vegetation index variables and texture information variables, by performing waveband combination, tassel cap transformation, texture information extraction, principal component analysis, minimum noise separation transformation and various vegetation index transformations on the OLI image; wherein the texture analysis is performed on a first principal component of the principal component analysis;
4) Extracted LiDAR point cloud feature variables: the LiDAR point cloud characteristic variables are 4 groups of characteristic variables (detailed in a table 2) calculated on the basis of three-dimensional normalized LiDAR point cloud values, namely height variables, height percentile variables, canopy density variables and canopy coverage variables;
5) Screening characteristic variables: performing Pearson's correlation analysis on the extracted LiDAR characteristic variables and OLI characteristic variables and parameters to be predicted, and selecting the characteristic variables of which the absolute values of Pearson's correlation coefficients are higher than 0.2 as modeling candidate variables; the calculation method of the Pearson's correlation coefficient comprises the following steps:
in the formula, x i For a certain stand characteristic measured on the ground, y i For a certain variable of the LiDAR characteristics,is x i Is determined by the average value of (a),is y i Average value of (a);
6) Statistical analysis: and (3) establishing a multivariate regression model by taking biomass information obtained by ground actual measurement and summary as a dependent variable and characteristic variables extracted by a remote sensing method as independent variables. Using stepwise entry method (stepwise) and checking the coefficient of determination (R) 2 ) To select appropriate variables to enter the model; if there is an independent variable, the statistic F is too small and T test does not reach significant level (P value)>, 0.1), then removing; f value is large and T test reaches a significant level (P value)< 0.05) then get in; using a coefficient of determination (R) 2 ) Root Mean Square Error (RMSE), and relative Root Mean Square Error (RMSE) to evaluate the accuracy of the regression model;
in the formula, x i Is a certain forest stand characteristic measured on the ground,is x i Is determined by the average value of (a) of (b),a certain forest stand characteristic estimated by the model, wherein n is the number of sample plots;
as shown in the formula (3), rRMSE is the RMSE (root mean square error) and(mean of measured values).
The invention discloses a remote sensing forest biomass inversion method, which extracts characteristic variables of a vegetation canopy from point cloud (comprising three-dimensional space information of the canopy) and multispectral (comprising spectral information of the upper surface of the canopy) data respectively on the basis of preprocessing LiDAR and OLI data, and then inverts aboveground and underground biomass through a stepwise regression model by combining with actually measured biomass information on the ground on the basis of screening the characteristic variables, wherein the innovation points and the characteristics are as follows: 1) Fully mining the biological and physical characteristic information of the vegetation in the canopy three-dimensional space information and the spectral information of the upper surface of the canopy in two dimensions; 2) The extracted characteristic variables are screened by means of correlation analysis and used for a final inversion model, so that mechanism explanation and method transplantation are facilitated; 3) Since the OLI data is free data, the method also saves the cost of increasing the biomass inversion accuracy over multiple scales to a large extent.
Has the advantages that: compared with the prior art, the invention fully excavates vegetation biophysical characteristic information contained in the canopy three-dimensional space information and the spectral information dimension of the upper surface of the canopy three-dimensional space information by integrating the LiDAR point cloud and the OLI multispectral characteristic variables, screens the extracted characteristic variables by means of correlation analysis, and finally determines the inversion model. The method is beneficial to mechanism explanation and method transplantation; and because the OLI data are free data, the method also greatly saves the cost of increasing the biomass inversion accuracy on multiple scales. The early-stage experimental verification result shows that the optimized inversion model of the northern subtropical forest biomass (compared with a method based on a single multispectral data source) constructed by the method can determine the coefficient R of the model 2 The improvement is 3 to 24 percent; and forest biomass can be estimated with high precision (reducing the relative root mean square error rRMSE by 2-10%). The method can be applied to the fields of forestry investigation, forest resource monitoring, forest carbon reserve assessment, forest ecosystem research and the like, and provides quantitative data support for forest sustainable management and comprehensive utilization of forest resources.
Drawings
FIG. 1 is a plot of a true color aerial photograph and 55 sample plots of a study area;
FIG. 2 is a method technical roadmap for remote sensing forest biomass inversion;
FIG. 3 is a plot of Pearson's correlation coefficients for various characteristic variables with aboveground biomass;
FIG. 4 is a plot of Pearson's correlation coefficients for various characteristic variables with subsurface biomass;
FIG. 5 is a comparative scattergram of measured values of an OLI model-above ground biomass plot and predicted values of the model;
FIG. 6 is a scatter plot of OLI model-measured values of subterranean biomass plots versus predicted values of the model;
FIG. 7 is a comparative scatter plot of LiDAR model-aboveground biomass plot actual measurements versus model predicted values;
FIG. 8 is a scatter plot of LiDAR model-measured value of a subterranean biomass plot versus predicted value for the model;
FIG. 9 is a comparative scattergram of measured values and predicted values of a model for an integrated model-aboveground biomass plot;
FIG. 10 is a scatter plot comparing model-subterranean biomass sample plot measured values to model predicted values.
Detailed Description
The present invention will be further described with reference to the following examples.
Example 1
A method for inverting biomass of remote sensing forest is disclosed, the technical route is shown in figure 1, the method is a method for inverting biomass of remote sensing forest integrating LiDAR point cloud and OLI multispectral characteristic variable, on the basis of preprocessing remote sensing data, the characteristic variable of vegetation canopy is extracted from LiDAR point cloud (containing canopy three-dimensional space information) and multispectral (containing spectral information of upper surface of canopy) data respectively; and (3) screening the LiDAR point cloud and the multispectral characteristic variables through correlation analysis, and inverting the aboveground and underground biomass through a stepwise regression model by combining with the actually measured biomass information on the ground. The method comprises the following specific steps:
the research district is located in China-Yu mountain forest farm (120 DEG 42 '9.4' E,31 DEG 40 '4.1' N) of mature city in Jiangsu province, belongs to subtropical monsoon climate, has mild climate, annual average precipitation amount of 1054 mm and area of about 1103hm 2 The altitude is 20-261m. The corn poppy field belongs to northern subtropical secondary commingled forests, and the main forest types are coniferous forests, broad-leaved forests and commingled forests, wherein the main conifer species are Pinus massoniana (Pinus massoniana), cedar wood (Cunninghamia lancelolia), slash pine (Pinus elliottii), black pine (Pinus thunbergii) and the like; the main broad-leaved tree species are Quercus acutissima, sweet maple (Liquidambar formosan) and some evergreen broad-leaved tree species, such as Fagaceae (Fagaceae), lauraceae (Lauraceae) and Theaceae (Theaceae).
55 square plots (size: 30 x 30m, setting time: 7-8 months in 2012 and 8 months in 2013) are set in the range of the research area according to indexes such as forest type, age and site index in three types of survey history data (1995, 2012). The sample plot is divided into three types of coniferous forest (n = 13), broad-leaved forest (n = 16) and mixed forest (n = 26) according to the tree species composition ratio. During the investigation of the sample plot, for trees with a breast diameter greater than 5cm, the tree species, breast diameter (measured with girth), tree height and under-branch height (measured with a Vertex IV laser altimeter) and crown width (i.e. the projected distance in both main directions, measured with a tape) of a single tree are measured one by one, and the trees with a breast diameter less than 5cm and dead are counted, but not involved in the calculation of biomass. The southwest angular position is likewise determined using differential GPS (Trimble GeoXH6000Handheld GPS units) with an accuracy better than 0.5 m by receiving JSCORS wide area differential signal positioning (fig. 2).
Collecting relevant forest parameters of the sample plot scale according to the single-tree survey data, wherein the relevant forest parameters comprise the aboveground biomass and underground biomass (t.hm) per unit area on the sample plot scale -2 ). Biomass information Individual biomass was calculated by the equation of growth at different rates (following the near principle) and summarized for eachBiomass (W) per unit area of block plot A ) And underground biomass (W) B )。
And (4) preprocessing the OLI image. The 2-7 wave bands in the Landsat 8OLI image (band number 119/38) of 19/7 in 2013 are adopted for research, the spatial resolution of the image is 30m, the radiation resolution is 12bit, and the spectral range covers 11 wave bands. Firstly, the original image is radiometrically calibrated by means of radiometric calibration parameters of the OLI sensor. And converting the original DN value into a pixel radiation brightness value. And then carrying out atmospheric correction on the image by using a FLAASH model (atmospheric model: tropical; aerosol model: urban; aerosol inversion method: 2-Band (K-T); initial visibility: 40; and spectral response function: ldcm _ oli. Sli), thereby converting the radiation brightness value into the actual surface reflectivity. Then, geometric fine correction is carried out on the image, 40 homonymy ground object points are selected, quadratic polynomial is adopted for correction, the correction error is controlled within 0.1 pixel, and the nearest pixel method is adopted for resampling.
LiDAR data preprocessing. 1) Noise level estimation and data smoothing: the original data is first converted into the frequency domain, and the lower value part with higher frequency is used as the judgment standard of the noise level. And then, a Gaussian filter is selected for smoothing, because the Gaussian filter can keep the trend of the original curve to the maximum extent while effectively smoothing data. 2) Gaussian fitting (decomposition) and waveform data point clouding: the waveform data is fitted using a non-linear least squares method based on the assumption that the echo data is an accumulation of multiple gaussian functions. And then extracting discrete point clouds from the processed waveform data through a local maximum peak detection filtering algorithm, wherein the energy and amplitude information of the return signal is recorded in each discrete point. 3) Generating a digital terrain: the purpose of LiDAR data height normalization is to obtain a "true" vegetation height with terrain effects removed, typically using raw LiDAR data height information minus terrain height. Therefore, accurate generation of a Digital Terrain Model (DTM) is an important prerequisite for calculating the normalized vegetation height. Firstly, discrete point clouds extracted from waveform data are classified, then Kraus filtering processing is carried out on the last echo to remove non-ground points, and finally the filtered last echo data are used and a digital terrain model is generated by means of interpolation of a natural proximity method. And then, normalizing the elevation of the vegetation echo point by utilizing the DEM, namely converting the height value of the vegetation point into a height value relative to the ground, and cutting each sample land through coordinates of a lower left corner and an upper right corner. And finally, extracting the normalized point cloud data of the coordinate positions corresponding to the 55 sample plots by a GIS analysis tool.
By performing band combination, tassel cap transformation, texture information extraction, principal component analysis, minimum noise separation transformation and multiple vegetation index transformation on an OLI image, 5 groups (53 in total) of characteristic variables, 6 original single-band variables, 10 band combination variables, 10 information enhancement group variables, 18 vegetation index variables and 9 texture information variables are extracted, wherein the texture analysis is performed on a first principal component of the principal component analysis, the window size is 3 x 3, and the lag distance is 1 pixel. The meanings of the 53 variables and the calculation formula (see table 1). The LiDAR feature variables were calculated 4 sets (34) of feature variables based on three-dimensional normalized LiDAR point cloud values: 10 height variables, 13 height percentile variables, 10 canopy density variables, 1 canopy coverage variable. The meanings of 34 variables and the calculation formula (see table 2).
TABLE 1 OLI characteristic variables and descriptions
And carrying out Pearson correlation analysis on the extracted 87 (34 LiDAR characteristic variables and 53 OLI characteristic variables) variables and the parameters needing to be predicted, and selecting the characteristic variables of which the absolute values of Pearson's correlation coefficients are higher than 0.2 as modeling candidate variables. The calculation method of the Pearson's correlation coefficient comprises the following steps:
in the formula, x i For a certain stand characteristic measured on the ground, y i For a certain variable of the LiDAR characteristics,is x i Is determined by the average value of (a) of (b),is y i Average value of (d);
the study selected 34 candidate variables whose absolute values of their Pearson's correlation coefficients with the parameter to be predicted are shown in fig. 3 and 4. The 34 variables have high correlation coefficients and significant relation with the biomass Pearson's, which shows that the 34 variables have good linear relation with the biomass. These 34 feature variables are therefore used as final modeling candidate variables.
TABLE 2 LiDAR feature variables and descriptions
And (3) establishing a multivariate regression model by taking biomass information obtained by ground actual measurement and summary as a dependent variable and characteristic variables extracted by a remote sensing method as independent variables. Using a stepwise entry method (stepwise) and checking the coefficient of determination (R) 2 ) To select the appropriate variables to enter the model. If there is an independent variable, the statistic F is too small and T test does not reach significant level (P value)>, 0.1), then removing; f value is large and T test reaches a significant level (P value)<, 0.05) is entered. Adopt and decideConstant coefficient (R) 2 ) Root Mean Square Error (RMSE), and relative Root Mean Square Error (RMSE) to evaluate the accuracy of the regression model;
in the formula, x i Is a certain forest stand characteristic measured on the ground,is x i Is determined by the average value of (a),a certain forest stand characteristic estimated by the model, wherein n is the number of sample plots;
as shown in the formula (3), rRMSE is the ratio of the RMSE (root mean square error) to the RMSE(mean of measured values) percentage.
Firstly, an OLI biomass estimation model (OLI model for short) and a LiDAR biomass estimation model (LiDAR model for short) are respectively constructed by utilizing OLI characteristic variables and LiDAR characteristic variables, and then a comprehensive biomass estimation model (comprehensive model for short) is constructed based on the two characteristic variables. When the three models are constructed, the two conditions are divided equally for analysis, firstly, statistical analysis is carried out on all sample plots without distinction, and then the sample plots are divided into coniferous forests, broad-leaved forests and mixed forests according to tree species composition for analysis respectively. The accuracy of each model for the different forest types was evaluated as in table 3. Scatter based on measured values of aboveground and underground biomass plots for different forest types versus predicted values for the models (OLI model, liDAR model, and integrated model) are shown in fig. 5-10.
Table 3 model accuracy evaluation table for different forest types
Wherein R is 2 Determining coefficients for the model; RMSE is root mean square error; rmse is relative root mean square error.
The above results show that the inversion result of the integrated model is better than that of a single characteristic variable model (i.e., an OLI model and a LiDAR model), and the model fitting effect is improved as follows: r 2 The improvement is 3 to 24 percent; the estimation accuracy is improved as follows: rRMSE decreases by 2-10%.