CN114266961A - Stacking ensemble learning classification method for marsh vegetation integrating hyperspectral and multiband fully-polarized SAR images - Google Patents
- Publication number
- CN114266961A (application number CN202111461864.XA)
- Authority
- CN
- China
- Prior art keywords
- vegetation
- classification
- hyperspectral
- segmentation
- fully
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a stacking ensemble learning method for classifying marsh vegetation that integrates hyperspectral and multiband fully-polarized SAR images. The method integrates a hyperspectral image with fully-polarized SAR images of different frequency bands; performs multi-scale segmentation and selects variables by eliminating highly correlated variables and applying the Boruta algorithm, so as to construct a multidimensional variable data set; uses the Stacking algorithm to combine classification models with individually optimized parameters into a marsh vegetation identification and classification model; classifies the data to be classified with that model to obtain the marsh wetland vegetation classification result; and quantitatively evaluates the result with evaluation indexes. The invention combines the rich spectral information of the hyperspectral image with the ability of polarized SAR to penetrate the vegetation canopy, thereby achieving high-precision identification and classification of marsh vegetation.
Description
Technical Field
The invention relates to the technical field of plant classification, and in particular to a stacking ensemble learning method for classifying marsh vegetation that integrates hyperspectral and multiband fully-polarized SAR images.
Background
Wetlands are rich in biological species but difficult for people to enter, so traditional field surveys are time-consuming and labor-intensive; remote sensing is a far more efficient way to acquire wetland information. Hyperspectral image data has high spectral resolution but is easily affected by cloud and rain. Polarized SAR images are unaffected by cloud and rain, and SAR data in different frequency bands have different penetration capabilities, but SAR data lack rich spectral information. Previous work has mostly used a single segmentation scale, and the resulting poor segmentation lowers the classification accuracy. Because wetland landscapes are highly heterogeneous and complex in composition, classifying wetlands with a single classifier remains challenging.
For example, patent application 201910079097.2 discloses a plant identification method based on deep learning and plant taxonomy. Sample plant images are labeled with family, genus and species and input into a deep convolutional neural network for training. The loss function is set as the weighted sum of the cross-entropy losses of the family, genus and species labels, and the network weights are updated by stochastic gradient descent; once the network converges, training ends and the weights of each layer are fixed to obtain the trained deep convolutional neural network.
However, the above method requires a large amount of collected data, and wetland information is still at an exploratory stage, so its practical effect is currently limited. A feasible alternative is to combine the advantages of multiple models through a Stacking model to produce a better result; but when the base models of the Stacking model are not parameter-optimized, their advantages cannot be fully exploited. The scientific literature also contains many studies that classify wetland vegetation with machine learning algorithms based on remote sensing image data. However, these traditional methods either do not combine the high spectral resolution of hyperspectral data with the penetration capability of fully-polarized SAR data in different frequency bands, or use a single classifier without parameter optimization, so the classifier cannot perform at its best and wetland vegetation cannot be classified with high precision.
Disclosure of Invention
In order to solve the above problems, the invention aims to provide a stacking ensemble learning method for classifying marsh vegetation that integrates hyperspectral and multiband fully-polarized SAR images.
The invention also aims to provide such a method in which the backscattering coefficients and polarization decomposition parameters of fully-polarized SAR data in different frequency bands are extracted and integrated with the hyperspectral data to construct a multi-source data set. The method thereby combines the high spectral resolution of hyperspectral image data with the ability of polarized SAR images to penetrate the vegetation canopy and provide rich polarization information, and uses the Stacking algorithm to stack classifiers with individually optimized parameters into a marsh vegetation classification model, achieving high-precision classification of marsh vegetation.
The invention also aims to provide such a method in which highly correlated variables are removed by searching for the optimal correlation coefficient and variables are then selected with the Boruta algorithm. This reduces the redundancy of the input training data and improves the efficiency and accuracy of the classification model, and thus the efficiency and precision of wetland vegetation identification and classification.
The invention also aims to provide such a method which uses hyperspectral data with 32 bands, C-band fully-polarized SAR data and L-band fully-polarized SAR data as data sources, extracts backscattering coefficients and polarization decomposition parameters, and establishes a multidimensional data set, solving the problem of low identification and classification precision caused by insufficient data information. Using an inheritance-based multi-scale segmentation algorithm, the optimal segmentation parameters are found for the different vegetation types and threshold segmentation is carried out; the segmentation results are quantitatively evaluated with segmentation indexes to find the optimal segmentation result, which provides the best training data for the subsequent classification model and improves the precision of wetland vegetation identification and classification.
A stacking ensemble learning method for classifying marsh vegetation that integrates hyperspectral and multiband fully-polarized SAR images comprises: integrating a hyperspectral image with fully-polarized SAR images of different frequency bands; performing multi-scale segmentation, and selecting variables by eliminating highly correlated variables and applying the Boruta algorithm, to construct a multi-dimensional variable data set; using the Stacking algorithm to stack classification models with individually optimized parameters into a marsh vegetation identification and classification model; classifying the data to be classified with the model to obtain the marsh wetland vegetation classification result; and quantitatively evaluating the classification result with evaluation indexes.
Further, the method comprises the following steps:
Step (1): performing data preprocessing on the hyperspectral image, the C-band fully-polarized SAR image and the L-band fully-polarized SAR image;
radiometric calibration, atmospheric correction and orthorectification are performed on the hyperspectral image using ENVI 5.3; backscattering coefficients are extracted from the C-band and L-band fully-polarized SAR images; and both SAR images are decomposed in PolSARpro_v6.0 using several polarization decomposition methods, including coherent and incoherent polarization decompositions, to extract the polarization decomposition parameters;
further, the Krogager decomposition can be expressed as:
in which the absolute phase δ is an irrelevant parameter; δ_s and k_s describe the spherical scattering component; the phase parameter δ_s is the offset of the spherical scattering component relative to the dihedral and helical components, and θ is the azimuth (orientation) angle of the dihedral and helical components; k_s represents the contribution of the spherical component to the final scattering matrix [S]; and k_s, k_d and k_h are the weights of the spherical, dihedral and helical components, respectively;
the Huynen decomposition decomposes the Mueller matrix into the sum of a single-target scattering matrix and a distributed-target matrix and expresses it using the coherency matrix, so that the coherency matrix <[T3]> of a time-varying target can be expressed with real parameters having 9 degrees of freedom:
wherein the specific meanings of the parameters are as follows:
A0, the symmetry factor of the target;
B0-B, an asymmetry factor of the target;
B0+ B, the non-regularity factor of the target;
c, a configuration factor;
d, a measure of local curvature difference;
e, surface torsion;
f, helicity of the target;
g, coupling of symmetric and asymmetric portions;
h, directionality of the target.
When the Freeman-Durden three-component scattering mechanism model is used for polarization decomposition, the covariance matrix C3 of the total backscatter model can be expressed as:
in the formula, the polarization decomposition parameters f_V, f_D and f_S represent the weights of the volume scattering component, the secondary (double-bounce) scattering component and the odd (surface) scattering component, respectively;
the Yamaguchi four-component scattering model adds a fourth scattering component, equivalent to the scattering power of a helical scatterer, to the three-component scattering model, and the covariance matrix of the total backscattering model can be expressed as:
in the formula, the volume scattering component for 10log(|S_VV|^2/|S_HH|^2) > 2 dB is selected:
f_S, f_D, f_V and f_H, the four polarization decomposition parameters, correspond to the weights of the surface scattering, secondary scattering, volume scattering and helix scattering components, respectively;
Step (2): performing band stacking, georegistration, clipping and similar operations on the preprocessed images to construct a segmentation data set;
the hyperspectral image, the polarization decomposition parameters and the backscattering coefficients processed in step (1) are band-stacked, georegistered and clipped in ArcGIS 10.6 to construct a multi-source segmentation data set;
Step (3): performing inheritance-based multi-scale segmentation on the segmentation data set;
specifically, inheritance-based multi-scale segmentation is performed on the segmentation data set in eCognition Developer 9.4; the optimal segmentation scales and parameters are determined through repeated testing and adjustment; threshold segmentation is applied to each land class at its own segmentation scale; and the segmented land classes are finally merged into one multi-scale segmentation layer;
Step (4): quantitatively evaluating the segmentation result with segmentation evaluation indexes;
further, the following segmentation indexes are used to evaluate the segmentation quality:
V is the global variance, which measures the homogeneity within objects: the smaller V is, the better the internal homogeneity. In the formula, v_i is the standard deviation of segmented object i, a_i is the area of segmented object i, and n is the total number of segmented objects in the whole image area;
I is Moran's index, which evaluates the heterogeneity between segmented objects. In the formula, n is the total number of segmented objects; w_ij represents the adjacency relation between object Ai and object Bj, with w_ij = 1 if the two objects are adjacent and w_ij = 0 otherwise; and y_i is the spectral mean of object Ai. The lower I is, the lower the correlation between segmented objects, that is, the better the separability between image objects;
RMAS is the ratio of the absolute mean difference between an object and its neighborhood to the standard deviation of the object, and is used to evaluate the image segmentation scale. In the formula, L is the number of image band layers of the segmentation data set; ΔC_L is the absolute difference between a segmented object and its neighborhood mean at a single segmentation scale on band layer L; S_L is the standard deviation of the segmented object at a single segmentation scale on band layer L; C_Li is the gray value of pixel i on band layer L; C̄_L is the band mean on the single-scale segmentation layer; n is the number of pixels; m is the number of objects directly adjacent to the target object; d is the boundary length of the target object; and d_sj is the length of the common boundary between the target object and the j-th directly adjacent object;
Step (5): performing feature calculation and output on the segmentation result to construct a multi-dimensional feature data set;
further, 5 vegetation indices (CIgreen, CIreg, NDVI, RVI, GNDVI) and the normalized difference water index (NDWI) are calculated for the segmentation results in eCognition Developer 9.4. The gray-level co-occurrence matrix (GLCM), a statistical texture descriptor, has been shown to play an important role in improving wetland classification accuracy, so eight GLCM texture features commonly used to extract texture information from remote sensing images (homogeneity, contrast, dissimilarity, entropy, angular second moment, mean, correlation and standard deviation) are calculated as input texture variables. In addition, position features, spectral features, backscattering coefficients and polarization decomposition parameters are calculated and output, and a multi-dimensional feature data set is finally constructed. The vegetation indices are defined as follows:
CIgreen is the green chlorophyll index, where NIR represents the hyperspectral near-infrared band and Green represents the hyperspectral green band;
CIreg is the red-edge chlorophyll index, where REG represents the hyperspectral red-edge band;
NDVI is the normalized difference vegetation index, where RED represents the hyperspectral red band;
RVI is the ratio vegetation index;
GNDVI is the green normalized difference vegetation index;
NDWI is the normalized difference water index;
Step (6): performing dimension reduction and optimization on the feature data set by removing highly correlated variables and applying the Boruta algorithm;
the search range of the correlation coefficient threshold is set to 0-1 with a step of 0.05. For each threshold, variables whose correlation exceeds the threshold are removed, and the Boruta algorithm then performs variable selection on the remaining feature data set. The feature data set giving the highest model training accuracy is taken as the preferred feature data set, and the corresponding threshold is taken as the optimal correlation coefficient;
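The threshold sweep in step (6) can be sketched as follows. This is a minimal illustration in plain Python, assuming a greedy filter that keeps a variable only if its absolute Pearson correlation with every variable already kept does not exceed the threshold. The function names `pearson` and `drop_correlated`, the greedy order, the choice of 0.05 as the first threshold and the toy values are all assumptions of this sketch, not details stated in the text.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant variable: correlation undefined, treated as 0 here
    return sxy / sqrt(sxx * syy)

def drop_correlated(features, threshold):
    """Greedy filter (assumed): keep a variable only if |r| with every
    previously kept variable is at most the threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Candidate thresholds 0.05, 0.10, ..., 1.00 (step 0.05 as in the text).
thresholds = [round(0.05 * k, 2) for k in range(1, 21)]

# Toy feature table (hypothetical values): "b" is almost a multiple of "a".
toy = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.1],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
subsets = {t: drop_correlated(toy, t) for t in thresholds}
```

In the described workflow, each candidate subset would then go through Boruta selection and model training, and the threshold whose subset gives the highest training accuracy would be kept as the optimal correlation coefficient.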
further, the Boruta algorithm performs variable selection as follows:
1) randomly shuffling the values of each feature of the feature matrix X to obtain shadow features, and concatenating the shadow features with the original features to form a new feature matrix;
2) shuffling the added features in this way removes their correlation with the response;
3) calculating an importance score for each feature on the expanded feature set using a random forest;
4) for each real feature variable, statistically testing the difference between its importance and the maximum importance of all shadow features; real features whose importance is significantly higher than that of the shadow features are defined as "important", and real features whose importance is significantly lower are defined as "unimportant";
5) removing the shadow features and the features defined as "unimportant";
6) repeating the above process until all variables are designated as "important" or "unimportant".
Step (7): performing parameter tuning on the base classifiers using the preferred feature data set as training data;
further, the base classifiers used are RF, XGBoost and CatBoost, and the parameters of each classifier are tuned by grid search;
multi_logloss (multi-class logarithmic loss) is used as the evaluation index: the parameter values at which multi_logloss is smallest are selected as the optimal parameters, and the model with the optimal parameters is used as the optimized base model;
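The tuning rule in step (7), namely trying every parameter combination and keeping the one with the smallest multi-class log loss, can be sketched in plain Python. The helper names `multi_logloss` and `grid_search`, the scoring callback and the grid values below are hypothetical placeholders chosen for illustration; in practice the callback would train a base classifier and return its validation log loss.

```python
from itertools import product
from math import log

def multi_logloss(y_true, proba, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, proba):
        p = min(max(row[label], eps), 1.0 - eps)  # clip to avoid log(0)
        total -= log(p)
    return total / len(y_true)

def grid_search(param_grid, score_fn):
    """Evaluate every combination; return the one with the lowest score."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical stand-in for "train with these parameters and return the
# multi-class log loss on validation data".
def toy_score(params):
    return (params["ntree"] - 1500) ** 2 / 1e6 + (params["mtry"] - 4) ** 2

# Placeholder grid; the values are illustrative only.
grid = {"ntree": [500, 1000, 1500, 2000, 2500], "mtry": [2, 4, 6]}
best, score = grid_search(grid, toy_score)
```

An exhaustive grid is simple and reproducible, but its cost grows with the product of the grid sizes, so the step length of each parameter range directly controls the tuning time.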
and (8): stacking and integrating the optimized base classifier by using a Stacking algorithm to construct a marsh wetland vegetation identification model;
further, Stacking and integrating the three classifiers optimized in the step (7) as base models by using a Stacking algorithm, wherein the process of training the models by using the Stacking integration is as follows:
inputting:
preferred feature data set D { (x)1,y1),(x2,y2),…,(xm,ym)};
Primary learning algorithm delta1,δ2,δ3,…,δT;
Secondary learning algorithmδs;
Training process:
and (3) outputting:
H(x)=h′(h1(x),h2(x),…,hT(x))
h (x) a final optimized Stacking swamp vegetation stack ensemble learning classification model;
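A stacking ensemble of the form H(x) = h′(h1(x), …, hT(x)) can be sketched with scikit-learn's `StackingClassifier`. This is only an illustration under several assumptions: the XGBoost and CatBoost base models are replaced by scikit-learn's `GradientBoostingClassifier` and `ExtraTreesClassifier` so that the example needs a single library, the secondary learner h′ is assumed to be logistic regression, and the Iris data set stands in for the preferred feature data set.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data set (assumption): Iris instead of the wetland feature table.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Primary learners h_1..h_T (stand-ins for the tuned RF, XGBoost, CatBoost).
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
]

# Secondary learner h' is trained on out-of-fold class probabilities
# produced by the primary learners (5-fold cross-validation).
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
```

Training the secondary learner on out-of-fold predictions rather than on predictions for samples the primary learners have already seen is what keeps the secondary learner from simply trusting overfitted base models.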
Step (9): classifying with the constructed wetland vegetation classification model to obtain the wetland vegetation classification result;
the data to be classified are input into the constructed optimized classification model to obtain the classification and distribution of marsh wetland vegetation;
Step (10): evaluating the precision of the classification result;
verification samples are input and the corresponding evaluation indexes are calculated from a confusion matrix to evaluate the precision of the classification result, the precision evaluation comprising the following indexes:
UA is the user's accuracy: the ratio of the number of samples correctly classified as class-i marsh vegetation (the diagonal value) to the total number of samples that the classifier assigned to class i (the sum of the class-i row of the confusion matrix);
PA is the producer's accuracy: the ratio of the number of samples of the whole image that the classifier correctly assigned to class-i marsh vegetation (the diagonal value) to the total number of true class-i samples (the sum of the class-i column of the confusion matrix);
OA is the overall classification accuracy: the ratio of the number of correctly classified samples of all classes (the sum of the diagonal of the confusion matrix) to the total number of test samples, where M is the total number of samples;
the Kappa coefficient indicates how much better the classification result is than a random classification; X_ii denotes the diagonal elements of the confusion matrix, X_i+ the row sum of class i, and X_+i the column sum of class i.
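The four indexes can be computed from a confusion matrix as in the sketch below. It assumes the layout implied by the definitions above (rows are the classes assigned by the classifier, columns are the reference classes) and the standard Kappa expression (M·ΣX_ii − ΣX_i+·X_+i) / (M² − ΣX_i+·X_+i); the function name `accuracy_metrics` and the example matrix are hypothetical.

```python
def accuracy_metrics(cm):
    """UA, PA, OA and Kappa from a square confusion matrix.

    Assumed layout: cm[i][j] = samples assigned to class i by the
    classifier whose reference class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)                        # M
    row_sum = [sum(cm[i]) for i in range(k)]                   # X_i+
    col_sum = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # X_+i
    diag = [cm[i][i] for i in range(k)]                        # X_ii

    ua = [diag[i] / row_sum[i] for i in range(k)]  # user's accuracy
    pa = [diag[i] / col_sum[i] for i in range(k)]  # producer's accuracy
    oa = sum(diag) / total                         # overall accuracy
    chance = sum(r * c for r, c in zip(row_sum, col_sum))
    kappa = (total * sum(diag) - chance) / (total * total - chance)
    return ua, pa, oa, kappa

# Hypothetical two-class confusion matrix.
example = [[40, 10],
           [5, 45]]
ua, pa, oa, kappa = accuracy_metrics(example)
```

For this example the row sums are 50 and 50 and the column sums are 45 and 55, so UA is 40/50 and 45/50, PA is 40/45 and 45/55, OA is 85/100, and Kappa is (8500 − 5000) / (10000 − 5000).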
Compared with the prior art, the invention has the advantages that:
the method combines the rich spectral information of the hyperspectral image with the ability of polarized SAR images to penetrate the vegetation canopy to achieve high-precision identification and classification of marsh vegetation. Specifically, backscattering coefficients and polarization decomposition parameters are extracted from the fully-polarized SAR data and integrated with the hyperspectral data to construct a multi-dimensional segmentation data set, which increases the dimensionality of the classification information; the image data set is segmented with an inheritance-based multi-scale segmentation algorithm, which improves the segmentation precision; highly correlated variables are removed by setting a correlation coefficient threshold and variables are selected with the Boruta algorithm, which reduces data redundancy and improves the efficiency and precision of the classification model; and several optimized machine learning classification models are integrated through the Stacking algorithm to construct the marsh vegetation classification model, achieving high-precision identification and classification of marsh wetland vegetation.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
FIG. 2 shows the marsh vegetation classification results for the different data source schemes.
FIG. 3 is a heat map of the producer's accuracy and user's accuracy of each wetland type for the schemes integrating hyperspectral data with fully-polarized SAR data of different frequency bands.
FIG. 4 is a scatter-plot matrix of the correlations among the correlation coefficient, the number of preferred variables and the classification accuracy of each classifier in the different schemes.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in FIG. 1, the invention is implemented by integrating a hyperspectral image with fully-polarized SAR images of different frequency bands; performing multi-scale segmentation, and selecting variables by eliminating highly correlated variables and applying the Boruta algorithm, to construct a multidimensional variable data set; using the Stacking algorithm to stack classification models with individually optimized parameters into a marsh vegetation identification and classification model; classifying the data to be classified with the model to obtain the marsh wetland vegetation classification result; and quantitatively evaluating the classification result with evaluation indexes.
The implementation steps are described separately below for reference.
Step (1): performing data preprocessing on the hyperspectral image, the C-band fully-polarized SAR image and the L-band fully-polarized SAR image;
radiometric calibration, atmospheric correction and orthorectification are performed on the hyperspectral image using ENVI 5.3; backscattering coefficients are extracted from the C-band and L-band fully-polarized SAR images; and both SAR images are decomposed in PolSARpro_v6.0 using several polarization decomposition methods, including coherent and incoherent polarization decompositions, to extract the polarization decomposition parameters;
the Krogager decomposition can be expressed as:
in which the absolute phase δ is an irrelevant parameter; δ_s and k_s describe the spherical scattering component; the phase parameter δ_s is the offset of the spherical scattering component relative to the dihedral and helical components, and θ is the azimuth (orientation) angle of the dihedral and helical components; k_s represents the contribution of the spherical component to the final scattering matrix [S]; and k_s, k_d and k_h are the weights of the spherical, dihedral and helical components, respectively;
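For reference, the Krogager decomposition is commonly written in the following form in the polarimetric SAR literature. It is given here as an assumption about the intended expression, using the symbols defined above (δ, δ_s, θ, k_s, k_d, k_h); the explicit component matrices are the usual forms in the linear (H, V) basis.

```latex
[S] = e^{j\delta}\left\{ e^{j\delta_s} k_s\,[S]_{\mathrm{sphere}}
      + k_d\,[S]_{\mathrm{diplane}}(\theta)
      + k_h\,[S]_{\mathrm{helix}}(\theta) \right\}

[S]_{\mathrm{sphere}} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\quad
[S]_{\mathrm{diplane}}(\theta) = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix},\quad
[S]_{\mathrm{helix}}(\theta) = e^{\mp j2\theta} \begin{bmatrix} 1 & \pm j \\ \pm j & -1 \end{bmatrix}
```

The sign choice in the helix term distinguishes left-handed from right-handed helices.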
the Huynen decomposition decomposes the Mueller matrix into the sum of a single-target scattering matrix and a distributed-target matrix and expresses it using the coherency matrix, so that the coherency matrix <[T3]> of a time-varying target can be expressed with real parameters having 9 degrees of freedom:
wherein the specific meanings of the parameters are as follows:
A0, the symmetry factor of the target;
B0-B, an asymmetry factor of the target;
B0+ B, the non-regularity factor of the target;
c, a configuration factor;
d, a measure of local curvature difference;
e, surface torsion;
f, helicity of the target;
g, coupling of symmetric and asymmetric portions;
h, directionality of the target.
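For reference, the nine-parameter Huynen form of the coherency matrix is conventionally written as below. This is the standard textbook parameterization, given here as an assumption about the intended expression and written with the parameter names listed above (A0, B0, B, c, d, e, f, g, h).

```latex
\langle [T_3] \rangle =
\begin{bmatrix}
2A_0   & c - jd   & h + jg \\
c + jd & B_0 + B  & e + jf \\
h - jg & e - jf   & B_0 - B
\end{bmatrix}
```

The three diagonal entries carry the symmetry, irregularity and asymmetry factors, while the six remaining real parameters appear in the off-diagonal terms, which gives the 9 degrees of freedom.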
When the Freeman-Durden three-component scattering mechanism model is used for polarization decomposition, the covariance matrix C3 of the total backscatter model can be expressed as:
in the formula, the polarization decomposition parameters f_V, f_D and f_S represent the weights of the volume scattering component, the secondary (double-bounce) scattering component and the odd (surface) scattering component, respectively;
the Yamaguchi four-component scattering model adds a fourth scattering component, equivalent to the scattering power of a helical scatterer, to the three-component scattering model, and the covariance matrix of the total backscattering model can be expressed as:
in the formula, the volume scattering component for 10log(|S_VV|^2/|S_HH|^2) > 2 dB is selected:
f_S, f_D, f_V and f_H, the four polarization decomposition parameters, correspond to the weights of the surface scattering, secondary scattering, volume scattering and helix scattering components, respectively;
Step (2): performing band stacking, georegistration, clipping and similar operations on the preprocessed images to construct a segmentation data set;
the hyperspectral image, the polarization decomposition parameters and the backscattering coefficients processed in step (1) are band-stacked, georegistered and clipped in ArcGIS 10.6 to construct a multi-source segmentation data set;
Step (3): performing inheritance-based multi-scale segmentation on the segmentation data set;
inheritance-based multi-scale segmentation is performed on the segmentation data set in eCognition Developer 9.4; the optimal segmentation scales and parameters are determined through repeated testing and adjustment; threshold segmentation is applied to each land class at its own segmentation scale; and the segmented land classes are finally merged into one multi-scale segmentation layer;
Step (4): quantitatively evaluating the segmentation result with segmentation evaluation indexes;
an ideal image segmentation result satisfies two criteria: (1) each segmented object is internally homogeneous; and (2) each segmented object is clearly heterogeneous with respect to its neighboring objects. According to these criteria, the following segmentation indexes are used to evaluate the segmentation quality:
V is the global variance, which measures the homogeneity within objects: the smaller V is, the better the internal homogeneity. In the formula, v_i is the standard deviation of segmented object i, a_i is the area of segmented object i, and n is the total number of segmented objects in the whole image area;
I is Moran's index, which evaluates the heterogeneity between segmented objects. In the formula, n is the total number of segmented objects; w_ij represents the adjacency relation between object Ai and object Bj, with w_ij = 1 if the two objects are adjacent and w_ij = 0 otherwise; and y_i is the spectral mean of object Ai. The lower I is, the lower the correlation between segmented objects, that is, the better the separability between image objects;
RMAS is the ratio of the absolute mean difference between an object and its neighborhood to the standard deviation of the object, and is used to evaluate the image segmentation scale. In the formula, L is the number of image band layers of the segmentation data set; ΔC_L is the absolute difference between a segmented object and its neighborhood mean at a single segmentation scale on band layer L; S_L is the standard deviation of the segmented object at a single segmentation scale on band layer L; C_Li is the gray value of pixel i on band layer L; C̄_L is the band mean on the single-scale segmentation layer; n is the number of pixels; m is the number of objects directly adjacent to the target object; d is the boundary length of the target object; and d_sj is the length of the common boundary between the target object and the j-th directly adjacent object;
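The first two indexes can be illustrated with a short plain-Python sketch. It assumes the usual area-weighted form V = Σ a_i·v_i / Σ a_i and the usual Moran's I expression I = n·ΣΣ w_ij (y_i − ȳ)(y_j − ȳ) / (Σ (y_i − ȳ)² · ΣΣ w_ij), with ȳ taken as the mean of the object spectral means; these formulas, the function names and the toy objects are assumptions of the sketch. RMAS is not covered.

```python
def global_variance(areas, stds):
    """Area-weighted mean of per-object dispersion (assumed form of V)."""
    return sum(a * v for a, v in zip(areas, stds)) / sum(areas)

def morans_i(means, adjacency):
    """Moran's I over object spectral means.

    adjacency[i][j] is 1 if objects i and j share a boundary, else 0.
    """
    n = len(means)
    y_bar = sum(means) / n
    dev = [y - y_bar for y in means]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    w_sum = sum(adjacency[i][j] for i, j in pairs)
    cross = sum(adjacency[i][j] * dev[i] * dev[j] for i, j in pairs)
    return n * cross / (sum(d * d for d in dev) * w_sum)

# Four hypothetical objects in a row: 0-1, 1-2 and 2-3 are neighbors.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]

smooth = morans_i([1.0, 2.0, 3.0, 4.0], chain)       # neighbors are similar
alternating = morans_i([1.0, 4.0, 1.0, 4.0], chain)  # neighbors differ
v = global_variance([10.0, 30.0], [2.0, 4.0])
```

The alternating case yields a lower I than the smooth case, which matches the reading above: a lower I means neighboring objects are less correlated and therefore better separated.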
Step (5): performing feature calculation and output on the segmentation result to construct a multi-dimensional feature data set;
5 vegetation indices (CIgreen, CIreg, NDVI, RVI, GNDVI) and the normalized difference water index (NDWI) are calculated for the segmentation results in eCognition Developer 9.4. The gray-level co-occurrence matrix (GLCM), a statistical texture descriptor, has been shown to play an important role in improving wetland classification accuracy, so eight GLCM texture features commonly used to extract texture information from remote sensing images (homogeneity, contrast, dissimilarity, entropy, angular second moment, mean, correlation and standard deviation) are calculated as input texture variables. In addition, position features, spectral features, backscattering coefficients and polarization decomposition parameters are calculated and output, and a multi-dimensional feature data set is finally constructed. The vegetation indices are defined as follows:
CIgreen is the green chlorophyll index, where NIR represents the hyperspectral near-infrared band and Green represents the hyperspectral green band;
CIreg is the red-edge chlorophyll index, where REG represents the hyperspectral red-edge band;
NDVI is the normalized difference vegetation index, where RED represents the hyperspectral red band;
RVI is the ratio vegetation index;
GNDVI is the green normalized difference vegetation index;
NDWI is the normalized difference water index;
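The six indices can be computed per object from band reflectances as in the sketch below. The expressions are the commonly used definitions of these indices and are assumed here, including the green/NIR form of NDWI; the function name and the reflectance values are hypothetical.

```python
def vegetation_indices(nir, red, green, reg):
    """Commonly used definitions (assumed) of the six indices.

    Inputs are reflectances of the near-infrared, red, green and
    red-edge bands for one object or pixel.
    """
    return {
        "CIgreen": nir / green - 1.0,
        "CIreg": nir / reg - 1.0,
        "NDVI": (nir - red) / (nir + red),
        "RVI": nir / red,
        "GNDVI": (nir - green) / (nir + green),
        "NDWI": (green - nir) / (green + nir),
    }

# Hypothetical reflectances for a vegetated object.
idx = vegetation_indices(nir=0.5, red=0.1, green=0.2, reg=0.25)
```

With these definitions GNDVI and NDWI are exact negatives of each other, so only one of them adds information; this is the kind of redundancy that a correlation filter would remove.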
Step (6): performing dimension reduction and optimization on the feature data set by removing highly correlated variables and applying the Boruta algorithm;
the search range of the correlation coefficient threshold is set to 0-1 with a step of 0.05. For each threshold, variables whose correlation exceeds the threshold are removed, and the Boruta algorithm then performs variable selection on the remaining feature data set. The feature data set giving the highest model training accuracy is taken as the preferred feature data set, and the corresponding threshold is taken as the optimal correlation coefficient. The Boruta algorithm is a feature ranking and selection algorithm based on the random forest algorithm; its advantages are that it takes into account the fluctuation of the mean accuracy loss among the trees in the forest and can clearly decide whether a variable is important. The Boruta algorithm performs variable selection as follows:
1) randomly shuffling the values of each feature of the feature matrix X to obtain shadow features, and concatenating the shadow features with the original features to form a new feature matrix;
2) the shuffling removes any correlation between the added shadow features and the response;
3) calculating an importance score for each feature on the extended feature set using a random forest;
4) for each real feature, statistically testing the difference between its importance and the maximum importance of all shadow features; real features whose importance is significantly higher than that of the shadow features are defined as "important", and real features whose importance is significantly lower are defined as "unimportant";
5) removing the shadow features and the features defined as "unimportant";
6) repeating the above process until all variables are designated as "important" or "unimportant".
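By way of non-limiting illustration, the high-correlation elimination of step (6) and the shadow-feature construction of Boruta steps 1) and 2) can be sketched in plain Python. The greedy keep-first removal rule, the function names and the sample feature values are assumptions made for this sketch; the random forest importance scoring and the statistical test of steps 3) and 4) are not implemented here.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant column: treated as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns, threshold):
    """Keep a variable only if |r| <= threshold against every variable already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept


def add_shadow_features(columns, rng):
    """Append a shuffled copy (shadow feature) of every column."""
    extended = dict(columns)
    for name, values in columns.items():
        shadow = list(values)
        rng.shuffle(shadow)  # shuffling breaks any relation to the response
        extended["shadow_" + name] = shadow
    return extended


# Candidate thresholds: 0 to 1 in steps of 0.05, as in step (6).
thresholds = [round(0.05 * k, 2) for k in range(21)]

# Hypothetical object features; "GNDVI" is an exact multiple of "NDVI" here.
features = {
    "NDVI": [0.10, 0.30, 0.50, 0.70, 0.90],
    "GNDVI": [0.05, 0.15, 0.25, 0.35, 0.45],
    "entropy": [2.0, 0.5, 1.7, 0.4, 1.1],
}
kept = drop_correlated(features, threshold=0.95)
extended = add_shadow_features(kept, random.Random(0))
```

In a full run, each threshold in `thresholds` would yield one candidate feature set, Boruta would be applied to each, and the set giving the highest training accuracy would be retained.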
Step (7): performing parameter tuning on the base classifiers using the preferred feature data set as training data;
The method uses RF, XGBoost and CatBoost as base classifiers, and tunes the parameters of each classifier by grid search;
The RF classifier trains and predicts samples using multiple decision trees; it is resistant to overfitting, can process high-dimensional data, is simple to implement and trains quickly. The parameters optimized for the RF classifier are the node value mtry and the number of decision trees ntree, and the optimization range of mtry is set to 500-2500 with a step of 500;
The XGBoost classifier is an improvement of the boosting algorithm on the basis of GBDT, and its internal decision trees are regression trees; it is fast and effective, can process large-scale data, and supports user-defined loss functions. The parameters optimized for the XGBoost classifier are set as follows:
The CatBoost classifier is also an improved implementation within the GBDT algorithm framework; CatBoost addresses the problems of gradient bias and prediction shift, thereby reducing overfitting and improving the accuracy and generalization ability of the algorithm. The parameters optimized for the CatBoost classifier are set as follows:
multi_logloss is used as the evaluation index; the parameter values at which multi_logloss is minimal are selected as the optimal parameters, and the model with the optimal parameters is taken as the optimized base model;
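By way of non-limiting illustration, the selection rule of step (7) can be sketched as follows. The sketch assumes that multi_logloss denotes the usual multi-class logarithmic loss (the mean negative log-probability assigned to the true class); the candidate names and the predicted probabilities are hypothetical stand-ins for the outputs of real grid points.

```python
import math


def multi_logloss(y_true, probabilities, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probabilities):
        p = min(max(row[label], eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


def best_parameters(grid_results, y_true):
    """Return the grid point whose predictions give the smallest log loss."""
    return min(grid_results, key=lambda name: multi_logloss(y_true, grid_results[name]))


# Hypothetical validation labels for three vegetation classes (0, 1, 2)
# and class probabilities predicted at two hypothetical grid points.
y_val = [0, 1, 2]
grid_results = {
    "candidate_a": [[0.6, 0.2, 0.2], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]],
    "candidate_b": [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.2, 0.7]],
}
best = best_parameters(grid_results, y_val)
```

Here `candidate_b` is selected because it assigns higher probability to the true class of every validation sample.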
Step (8): stacking and integrating the optimized base classifiers using the Stacking algorithm to construct the marsh wetland vegetation identification model;
The three classifiers optimized in step (7) are stacked and integrated as base models using the Stacking algorithm, and the process of training the model with Stacking is as follows:
Input:
preferred feature data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$;
primary learning algorithms $\delta_1, \delta_2, \delta_3, \ldots, \delta_T$;
secondary learning algorithm $\delta_s$;
Training process:
Output:
$H(x) = h'(h_1(x), h_2(x), \ldots, h_T(x))$
where $H(x)$ is the final optimized Stacking marsh vegetation stacking ensemble learning classification model;
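By way of non-limiting illustration, one common form of Stacking training is sketched below: the primary learners produce out-of-fold predictions, and the secondary learner h' is trained on those predictions. The k-fold scheme, the toy nearest-centroid primary learners (stand-ins for RF, XGBoost and CatBoost), the lookup-table secondary learner, and the sample data and class labels are all assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def centroid_learner(feature_index):
    """Toy primary learner: nearest class centroid on a single feature."""
    def fit(X, y):
        sums, counts = defaultdict(float), Counter()
        for row, label in zip(X, y):
            sums[label] += row[feature_index]
            counts[label] += 1
        centroids = {label: sums[label] / counts[label] for label in counts}
        return lambda row: min(
            centroids, key=lambda c: abs(row[feature_index] - centroids[c]))
    return fit


def majority_table_learner(X, y):
    """Toy secondary learner: majority label for each tuple of primary outputs."""
    votes = defaultdict(Counter)
    for row, label in zip(X, y):
        votes[tuple(row)][label] += 1
    fallback = Counter(y).most_common(1)[0][0]

    def predict(row):
        key = tuple(row)
        return votes[key].most_common(1)[0][0] if key in votes else fallback
    return predict


def fit_stacking(X, y, base_fits, meta_fit, k=2):
    """Train H(x) = h'(h_1(x), ..., h_T(x)) from out-of-fold predictions."""
    meta_X = [None] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held_out = [i for i in range(len(X)) if i % k == fold]
        models = [fit([X[i] for i in train], [y[i] for i in train])
                  for fit in base_fits]
        for i in held_out:
            meta_X[i] = [h(X[i]) for h in models]  # predictions on unseen samples
    meta_model = meta_fit(meta_X, y)                # secondary learner h'
    base_models = [fit(X, y) for fit in base_fits]  # refit h_1..h_T on all of D
    return lambda row: meta_model([h(row) for h in base_models])


# Hypothetical data set D: three features per object, two vegetation classes.
X = [[0.10, 5.0, 2.0], [0.20, 5.5, 2.2], [0.15, 4.8, 1.9], [0.25, 5.2, 2.1],
     [0.80, 1.0, 3.0], [0.90, 1.5, 3.3], [0.85, 0.8, 2.9], [0.95, 1.2, 3.1]]
y = ["sedge"] * 4 + ["reed"] * 4
base_fits = [centroid_learner(0), centroid_learner(1), centroid_learner(2)]
H = fit_stacking(X, y, base_fits, majority_table_learner, k=2)
```

Training the secondary learner on out-of-fold predictions rather than on in-sample predictions keeps it from simply trusting primary learners that have memorized their own training samples.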
Step (9): classifying with the constructed wetland vegetation classification model to obtain the wetland vegetation classification result;
The data to be classified are input into the constructed optimized classification model to obtain the classification and distribution of marsh wetland vegetation;
Step (10): evaluating the accuracy of the classification result;
Verification samples are input, and the corresponding evaluation indices are calculated from a confusion matrix to evaluate the accuracy of the classification result; the indices are as follows:
UA is the user's accuracy: the ratio of the number of samples correctly classified as class-i marsh vegetation (the diagonal value) to the total number of samples assigned to class i by the classifier (the sum of the class-i row of the confusion matrix);
PA is the producer's accuracy: the ratio of the number of samples correctly classified as class-i marsh vegetation (the diagonal value) to the total number of reference samples of class i (the sum of the class-i column of the confusion matrix);
OA is the overall classification accuracy: the ratio of the number of correctly classified samples of all classes (the sum of the diagonal of the confusion matrix) to the total number of test samples, where M is the total number of samples;
the Kappa coefficient indicates how much better the classification result is than a random classification, where $X_{ii}$ denotes the diagonal elements of the confusion matrix, $X_{i+}$ denotes the row sum of class i, and $X_{+i}$ denotes the column sum of class i.
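By way of non-limiting illustration, the four indices can be computed from a confusion matrix as sketched below. The sketch assumes the standard formulas, namely UA_i = X_ii / X_i+, PA_i = X_ii / X_+i, OA = ΣX_ii / M and Kappa = (M·ΣX_ii − ΣX_i+·X_+i) / (M² − ΣX_i+·X_+i), with rows holding the classified class and columns the reference class; the function name and the sample matrix are hypothetical.

```python
def accuracy_metrics(matrix):
    """Return (UA, PA, OA, Kappa) for a square confusion matrix.

    Assumed convention: rows = classified class, columns = reference class.
    Every class must have a non-zero row sum and column sum.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)               # M
    diagonal = [matrix[i][i] for i in range(n)]           # X_ii
    row_sums = [sum(matrix[i]) for i in range(n)]         # X_i+
    col_sums = [sum(matrix[r][i] for r in range(n)) for i in range(n)]  # X_+i
    ua = [diagonal[i] / row_sums[i] for i in range(n)]    # user's accuracy
    pa = [diagonal[i] / col_sums[i] for i in range(n)]    # producer's accuracy
    oa = sum(diagonal) / total                            # overall accuracy
    chance = sum(r * c for r, c in zip(row_sums, col_sums))
    kappa = (total * sum(diagonal) - chance) / (total ** 2 - chance)
    return ua, pa, oa, kappa


# Hypothetical two-class confusion matrix built from 100 verification samples.
confusion = [[40, 10],
             [5, 45]]
ua, pa, oa, kappa = accuracy_metrics(confusion)
```

For this matrix the overall accuracy is 0.85 and the Kappa coefficient is 0.7, which shows how Kappa discounts the agreement that would be expected by chance.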
Fig. 2 shows the marsh vegetation classification results under the different data source schemes. Comparison of the results shows that integrating hyperspectral data with fully-polarized SAR data of different frequency bands gives a better classification and extraction effect: the land cover types are clearly distinguished and their distribution is accurate.
Fig. 3 is a heat map of the producer's accuracy and user's accuracy of each marsh land cover type for the scheme integrating hyperspectral data and fully-polarized SAR data of different frequency bands; both the producer's accuracy and the user's accuracy of each vegetation type are high.
Fig. 4 is a scatter-plot matrix of the correlation coefficient, the number of preferred variables and the classification accuracy of each classifier under the different schemes. As shown in the figure, the number of preferred variables for each vegetation type and the classification accuracy of each classifier are relatively high.
The method combines the rich spectral information of the hyperspectral image with the ability of the polarized SAR image to penetrate the vegetation canopy to achieve high-precision identification and classification of marsh vegetation. Backscattering coefficients and polarization decomposition parameters are extracted from the fully-polarized SAR data and integrated with the hyperspectral data to construct a multi-dimensional segmentation data set, which increases the dimensionality of the classification information. The image data set is segmented with an inheritance-based multi-scale segmentation algorithm, which improves the segmentation accuracy. Highly correlated variables are removed by setting a correlation-coefficient threshold, and variables are selected from the data set with the Boruta algorithm, which reduces data redundancy and improves the efficiency and accuracy of the classification model. Several optimized machine learning classification models are integrated by the Stacking algorithm to construct the marsh vegetation classification model, thereby achieving high-precision identification and classification of marsh wetland vegetation.
In summary, the advantages of the present invention are as follows:
1. By integrating hyperspectral images with fully-polarized SAR images of different frequency bands, the invention makes full use of the spectral resolution advantage of hyperspectral data and the ability of polarized SAR data of different frequency bands to penetrate and recognize vegetation structure, and overcomes the problem that the various marsh vegetation types are confused and cannot be accurately identified from a single data source.
2. By performing inheritance-based multi-scale threshold segmentation on the segmentation data set with the optimal segmentation parameters obtained from quantitative evaluation, the segmentation quality is improved and good training data are provided for subsequent model building.
3. By finding and setting the optimal correlation coefficient to eliminate highly correlated variables and using the Boruta algorithm for feature optimization, the problems of low model efficiency and low classification accuracy caused by data redundancy are overcome.
4. By optimizing the parameters of the multiple classifiers separately and stacking the classifiers together with the Stacking algorithm, the advantages of the individual classifiers are combined, and the ability of the model to classify marsh wetland vegetation with high landscape heterogeneity and complex composition is improved.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. A marsh vegetation stacking ensemble learning classification method integrating hyperspectral and multiband fully-polarized SAR images, characterized in that the hyperspectral image and the multiband fully-polarized SAR images are integrated; variable optimization is performed through multi-scale segmentation, elimination of highly correlated variables and the Boruta algorithm, and a multi-dimensional variable data set is constructed; classification models with different optimized parameters are stacked and integrated using the Stacking algorithm to construct a marsh vegetation identification and classification model; finally, the data to be classified are classified using the model to obtain a marsh wetland vegetation classification result, and the classification result is quantitatively evaluated using evaluation indices.
2. The marsh vegetation stacking ensemble learning classification method integrating hyperspectral and multiband fully-polarized SAR images according to claim 1, characterized in that the method further comprises the following steps:
step (1): performing data preprocessing on the hyperspectral image, the C-band fully-polarized SAR image and the L-band fully-polarized SAR image;
performing radiometric calibration, atmospheric correction and orthorectification on the hyperspectral image using ENVI 5.3; extracting the backscattering coefficients of the C-band and L-band fully-polarized SAR images; performing polarization decomposition on the C-band and L-band fully-polarized SAR images in PolSARpro_v6.0 using a plurality of polarization decomposition methods, including coherent and incoherent polarization decomposition; and extracting the polarization decomposition parameters;
step (2): performing band synthesis, georegistration, clipping and similar operations on the preprocessed images to construct a segmentation data set;
performing band synthesis, georegistration, clipping and similar operations in ArcGIS 10.6 on the hyperspectral image, the polarization decomposition parameters and the backscattering coefficients processed in step (1) to construct a multi-source segmentation data set;
step (3): performing inheritance-based multi-scale segmentation on the segmentation data set;
step (4): quantitatively evaluating the segmentation result using segmentation evaluation indices;
step (5): performing feature calculation and output on the segmentation result to construct a multi-dimensional feature data set;
step (6): reducing the dimensionality of, and optimizing, the feature data set by eliminating highly correlated variables and applying the Boruta algorithm;
setting the search range of the correlation-coefficient threshold to 0-1 with a step of 0.05, removing variables whose correlation coefficient exceeds the threshold, performing variable selection with the Boruta algorithm on the feature data set from which the highly correlated variables have been removed, taking the feature data set with the highest model training accuracy as the preferred feature data set, and taking the corresponding threshold as the optimal correlation coefficient;
step (7): performing parameter tuning on the base classifiers using the preferred feature data set as training data;
step (8): stacking and integrating the optimized base classifiers using the Stacking algorithm to construct the marsh wetland vegetation identification model;
step (9): classifying with the constructed wetland vegetation classification model to obtain the wetland vegetation classification result;
inputting the data to be classified into the constructed optimized classification model to obtain the classification and distribution of marsh wetland vegetation;
step (10): evaluating the accuracy of the classification result;
inputting verification samples and calculating the corresponding evaluation indices from a confusion matrix to evaluate the accuracy of the classification result, wherein the indices are as follows:
UA is the user's accuracy: the ratio of the number of samples correctly classified as class-i marsh vegetation (the diagonal value) to the total number of samples assigned to class i by the classifier (the sum of the class-i row of the confusion matrix);
PA is the producer's accuracy: the ratio of the number of samples correctly classified as class-i marsh vegetation (the diagonal value) to the total number of reference samples of class i (the sum of the class-i column of the confusion matrix);
OA is the overall classification accuracy: the ratio of the number of correctly classified samples of all classes (the sum of the diagonal of the confusion matrix) to the total number of test samples, where M is the total number of samples;
the Kappa coefficient indicates how much better the classification result is than a random classification, where $X_{ii}$ denotes the diagonal elements of the confusion matrix, $X_{i+}$ denotes the row sum of class i, and $X_{+i}$ denotes the column sum of class i.
3. The marsh vegetation stacking ensemble learning classification method integrating hyperspectral and multiband fully-polarized SAR images according to claim 2, characterized in that, in step (1), the Krogager decomposition can be expressed as:
where the absolute phase $\delta$ is an irrelevant parameter; $\delta_s$ and $k_s$ describe the spherical scattering component; the phase parameter $\delta_s$ is the offset of the spherical scattering component relative to the dihedral and helix components, and $\theta$ is the orientation angle of the dihedral and helix components; $k_s$, $k_d$ and $k_h$ are the weights of the spherical, dihedral and helix components, respectively, each representing the contribution of its component to the final scattering matrix $[S]$;
the Huynen decomposition decomposes the Mueller matrix into the sum of the scattering matrix of a single target and the matrix of a distributed target, and expresses this sum using the coherency matrix, so that the coherency matrix $\langle[T_3]\rangle$ of a time-varying target can be expressed with 9 real parameters (degrees of freedom) as:
where the parameters have the following meanings:
$A_0$: symmetry factor of the target;
$B_0 - B$: asymmetry factor of the target;
$B_0 + B$: irregularity factor of the target;
$C$: configuration factor;
$D$: measure of local curvature difference;
$E$: surface torsion;
$F$: helicity of the target;
$G$: coupling between the symmetric and asymmetric parts;
$H$: directionality of the target.
For polarization decomposition using the Freeman-Durden three-component scattering model, the covariance matrix $C_3$ of the total backscatter model can be expressed as:
where the polarization decomposition parameters $f_V$, $f_D$ and $f_S$ are the weights of the volume scattering, double-bounce (secondary) scattering and odd-bounce (surface) scattering components, respectively;
the Yamaguchi four-component scattering model adds to the three-component scattering model a fourth scattering component, equivalent to the scattering power of a helical scatterer, and the covariance matrix of the total backscatter model can be expressed as:
where the volume scattering component for $10\log(|S_{VV}|^2/|S_{HH}|^2) > 2$ dB is selected:
$f_S$, $f_D$, $f_V$ and $f_H$ are the four polarization decomposition parameters, corresponding to the weights of the surface, double-bounce (secondary), volume and helix scattering components, respectively.
4. The marsh vegetation stacking ensemble learning classification method integrating hyperspectral and multiband fully-polarized SAR images according to claim 3, characterized in that, in step (3), inheritance-based multi-scale segmentation is performed on the segmentation data set in eCognition Developer 9.4; the optimal segmentation scale and parameters are determined through repeated testing and adjustment; threshold segmentation is performed at a different segmentation scale for each land cover type; and finally the segmented land cover types are merged into one multi-scale segmentation layer.
5. The marsh vegetation stacking ensemble learning classification method integrating hyperspectral and multiband fully-polarized SAR images according to claim 4, characterized in that, in step (4), the following segmentation indices are used to evaluate the segmentation quality:
V is the global variance, which measures the homogeneity within objects; the smaller V is, the better the homogeneity within the objects, where $v_i$ is the standard deviation of segmented object i, $a_i$ is the area of segmented object i, and n is the total number of segmented objects in the whole image;
I is Moran's index, which evaluates the heterogeneity between segmented objects, where n is the total number of segmented objects; $w_{ij}$ represents the adjacency between object $A_i$ and object $B_j$, with $w_{ij} = 1$ if the two objects are adjacent and $w_{ij} = 0$ otherwise; $y_i$ is the spectral mean of object $A_i$; the lower I is, the lower the correlation between the segmented objects, i.e. the better the separability between image objects;
RMAS is the ratio of the absolute value of the mean difference between an object and its neighborhood to the standard deviation of the object, and is used to evaluate the image segmentation scale, where L is the number of image band layers of the segmentation data set; $\Delta C_L$ is the absolute value of the difference between the segmented object and its neighborhood mean at a single segmentation scale for band layer L; $S_L$ is the standard deviation of the segmented object at a single segmentation scale for band layer L; $C_{Li}$ is the gray value of pixel i in band layer L; $\bar{C}_L$ is the band mean on the single-scale segmentation layer; n is the number of pixels; m is the number of objects directly adjacent to the target object; d is the boundary length of the target object; and $d_{sj}$ is the length of the common boundary between the target object and the j-th directly adjacent object.
6. The marsh vegetation stacking ensemble learning classification method integrating hyperspectral and multiband fully-polarized SAR images according to claim 1, characterized in that, in step (5), five vegetation indices (CIgreen, CIreg, NDVI, RVI, GNDVI) and the normalized difference water index (NDWI) are calculated for the segmentation results in eCognition Developer 9.4; because the statistically based gray-level co-occurrence matrix (GLCM) has been shown to play an important role in improving wetland classification accuracy, eight texture features commonly used to extract texture information from remote sensing images (homogeneity, contrast, dissimilarity, entropy, angular second moment, mean, correlation and standard deviation) are calculated from the GLCM as input texture feature variables; in addition, position features, spectral features, backscattering coefficients and polarization decomposition parameters are calculated and output, and a multi-dimensional feature data set is finally constructed, wherein the vegetation indices are calculated as follows:
CIgreen is the green chlorophyll index, where NIR denotes the hyperspectral near-infrared band and Green denotes the hyperspectral green band;
CIreg is the red-edge chlorophyll index, where REG denotes the hyperspectral red-edge band;
NDVI is the normalized difference vegetation index, where RED denotes the hyperspectral red band;
RVI is the ratio vegetation index;
GNDVI is the green normalized difference vegetation index;
NDWI is the normalized difference water index.
7. The marsh vegetation stacking ensemble learning classification method integrating hyperspectral and multiband fully-polarized SAR images according to claim 1, characterized in that, in step (6), the Boruta variable selection procedure is as follows:
1) randomly shuffling the values of each feature of the feature matrix X to obtain shadow features, and concatenating the shadow features with the original features to form a new feature matrix;
2) the shuffling removes any correlation between the added shadow features and the response;
3) calculating an importance score for each feature on the extended feature set using a random forest;
4) for each real feature, statistically testing the difference between its importance and the maximum importance of all shadow features; real features whose importance is significantly higher than that of the shadow features are defined as "important", and real features whose importance is significantly lower are defined as "unimportant";
5) removing the shadow features and the features defined as "unimportant";
6) repeating the above process until all variables are designated as "important" or "unimportant".
8. The marsh vegetation stacking ensemble learning classification method integrating hyperspectral and multiband fully-polarized SAR images according to claim 1, characterized in that, in step (7), the base classifiers used are RF, XGBoost and CatBoost, and the parameters of each classifier are tuned by grid search;
multi_logloss is used as the evaluation index; the parameter values at which multi_logloss is minimal are selected as the optimal parameters, and the model with the optimal parameters is taken as the optimized base model.
9. The marsh vegetation stacking ensemble learning classification method integrating hyperspectral and multiband fully-polarized SAR images according to claim 8, characterized in that, in step (8), the three classifiers optimized in step (7) are stacked and integrated as base models using the Stacking algorithm, and the process of training the model with Stacking is as follows:
Input:
preferred feature data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$;
primary learning algorithms $\delta_1, \delta_2, \delta_3, \ldots, \delta_T$;
secondary learning algorithm $\delta_s$;
Training process:
Output:
$H(x) = h'(h_1(x), h_2(x), \ldots, h_T(x))$
where $H(x)$ is the final optimized Stacking marsh vegetation stacking ensemble learning classification model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111461864.XA CN114266961A (en) | 2021-12-02 | 2021-12-02 | Method for integrating, learning and classifying marsh vegetation stacks by integrating hyperspectral and multiband fully-polarized SAR images |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111461864.XA CN114266961A (en) | 2021-12-02 | 2021-12-02 | Method for integrating, learning and classifying marsh vegetation stacks by integrating hyperspectral and multiband fully-polarized SAR images |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114266961A (en) | 2022-04-01 |
Family
ID=80826277
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111461864.XA Pending CN114266961A (en) | 2021-12-02 | 2021-12-02 | Method for integrating, learning and classifying marsh vegetation stacks by integrating hyperspectral and multiband fully-polarized SAR images |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114266961A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115205693A (en) * | 2022-09-16 | 2022-10-18 | 中国石油大学(华东) | Multi-feature ensemble learning dual-polarization SAR image enteromorpha extracting method |
CN115862010A (en) * | 2022-09-09 | 2023-03-28 | 滁州学院 | High-resolution remote sensing image water body extraction method based on semantic segmentation model |
CN117029701B (en) * | 2023-10-09 | 2023-12-15 | 交通运输部天津水运工程科学研究所 | Coastal water area non-contact type oil spill monitoring method |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115862010A (en) * | 2022-09-09 | 2023-03-28 | 滁州学院 | High-resolution remote sensing image water body extraction method based on semantic segmentation model |
CN115862010B (en) * | 2022-09-09 | 2023-09-05 | 滁州学院 | High-resolution remote sensing image water body extraction method based on semantic segmentation model |
CN115205693A (en) * | 2022-09-16 | 2022-10-18 | 中国石油大学(华东) | Multi-feature ensemble learning dual-polarization SAR image enteromorpha extracting method |
CN115205693B (en) * | 2022-09-16 | 2022-12-02 | 中国石油大学(华东) | Method for extracting enteromorpha in multi-feature integrated learning dual-polarization SAR image |
CN117029701B (en) * | 2023-10-09 | 2023-12-15 | 交通运输部天津水运工程科学研究所 | Coastal water area non-contact type oil spill monitoring method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||