CN117372767A - Hyperspectral image tree classification method, device and storage medium - Google Patents
- Publication number
- CN117372767A (application CN202311366203.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- features
- hyperspectral image
- attention
- hyperspectral
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/54—Extraction of image or video features relating to texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/58—Extraction of image or video features relating to hyperspectral data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/188—Vegetation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/194—Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/10—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
Abstract
The invention discloses a hyperspectral image tree species classification method, device and storage medium, comprising the following steps: after multi-scale segmentation of the hyperspectral image, inputting it into a pre-trained CNN model for flattening to obtain a flattened hyperspectral image; extracting texture features and spectral features from the hyperspectral image; inputting the texture features and spectral features into a double-head attention network to obtain image features with attention, which serve as the input of a Transformer model, with position codes added to the input part to obtain image features with relative position information. The encoded features are then sent to a decoder to obtain the classification result. The distribution of various tree species can thus be obtained, tree species classification is realized, and monitoring and supervision are facilitated. Meanwhile, because the spectral information of different plants is highly similar and very difficult to distinguish with the naked eye, establishing automatic tree species identification is particularly important.
Description
Technical Field
The invention discloses a hyperspectral image tree classification method, a hyperspectral image tree classification device and a storage medium, and relates to the field of automatic control of hydraulic engineering construction.
Background
Fine classification of forest tree species based on hyperspectral remote sensing data mainly relies on spectral matching classification algorithms, spectral features and statistical analysis methods. The three most widely applied directions for fine identification of forest tree species from hyperspectral remote sensing are early traditional classification methods, multi-source remote sensing data collaboration methods, and deep learning-based methods.
However, conventional machine learning classification methods have significant limitations on hyperspectral images. Tree species classification with machine learning requires a data dimension-reduction step, yet a hyperspectral image contains a large number of continuous narrow bands that cannot be fully exploited after dimension reduction. Moreover, machine learning has reached a ceiling with current algorithms and classification accuracy, and is difficult to optimize from other angles;
in multi-source remote sensing data collaboration methods, LiDAR data are not easy to acquire: airborne data cover only small areas and cannot be collected at large scale, while spaceborne data generally use a large-footprint mode, making it difficult to meet the application requirements of fine forest tree species research;
deep learning-based methods require a large number of training samples and high-quality labelled data, which limits their practical application. Meanwhile, the large number of parameters may cause the model to overfit during training, resulting in insufficient generalization capability. Moreover, when extracting joint spatial-spectral features, the size of the spatial neighborhood, the size, structure and complexity of the network, and the size of the input data space all need to be considered, and these have a great influence on the classification capability of the model.
Disclosure of Invention
Aiming at the defects in the background art, the invention provides a hyperspectral image tree species classification method, device and storage medium; the scheme of combining multi-scale segmentation with a Transformer model obtains the distribution of various tree species and realizes tree species classification.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows: a hyperspectral image tree classification method comprises the following steps:
acquiring a hyperspectral image;
denoising, filtering and dimension reduction preprocessing are carried out on the hyperspectral image;
carrying out multi-scale segmentation on the preprocessed hyperspectral image;
inputting the segmented hyperspectral image into a pre-trained CNN model for flattening, and obtaining a flattened hyperspectral image;
extracting the flattened hyperspectral image through a gray level co-occurrence matrix to obtain texture features;
performing independent principal component analysis on the flattened hyperspectral image to obtain spectral features of different bands, and selecting the first m bands carrying the most spectral information as the extracted spectral features;
inputting the extracted texture features and spectrum features into a double-head attention network model to obtain output image features with double-head attention;
taking the image features with double-head attention as input of a Transformer model, and adding a position code to obtain the image features with relative position information;
and encoding the image features with the relative position information, decoding the encoded image features and the output of class embedding, up-sampling, classifying each pixel, and outputting a hyperspectral image tree classification result.
Further, the multi-scale segmentation of the preprocessed hyperspectral image specifically includes: judging the size of the change rate value of the local change of the homogeneity of the preprocessed hyperspectral image under different segmentation scale parameters, and segmenting the hyperspectral image according to the corresponding scale value when the local change rate value is the maximum.
Further, inputting the segmented hyperspectral image into a pretrained CNN model for flattening, and obtaining a flattened hyperspectral image:
the pre-trained CNN model includes: firstly, convolving the hyperspectral data with a 3D kernel for preliminary pre-training; then performing structure extraction and comparison on the true and false hyperspectral images of each input band through 3 convolution layers; and finally outputting, through a Leaky ReLU function, the probability that the input image is a true hyperspectral image;
the flattening method specifically comprises the following steps: the three-dimensional layer in the CNN model is converted to a one-dimensional vector using a "flattening layer".
Further, inputting the extracted texture features and spectral features into a double-head attention network model to obtain output image features with double-head attention specifically comprises the following steps: the texture features and spectral features are input into the double-head attention network, and three attention features are obtained through three 1×1 convolution layers with different weights; the first attention feature is transposed and multiplied by the second attention feature, and the result is input into a Softmax function to obtain an attention map; the obtained attention map is transposed and multiplied by the third attention feature matrix, and then passed through a 1×1 convolution layer to obtain the image features with double-head attention.
Further, the double-headed attention network model β employs the following formula:

β_i = exp(s_i) / Σ_{j=1}^{N} exp(s_j)

where N is the number of image features and s_i is the attention scoring function:

s_i = (W_f x)^T · (W_g x)

where x is the image attention feature extracted by the convolutional network, W_f and W_g are two weight matrices implemented by 1×1 convolution, and T denotes matrix transposition;

the image feature x_0 with double-head attention is:

x_0 = W_v ( Σ_i β_i h(x_i) ) + x_i

where h(x_i) = W_h x_i; W_h and W_v are two weight matrices; x_i is the input image feature, and β is the attention network model function.
Further, taking the image features with double-head attention as the input of a Transformer model and adding position codes to obtain the image features with relative position information specifically comprises:

x_0′ = x_0 + E_pos

where x_0′ is the image feature with relative position information and E_pos is the defined position vector.
Further, encoding the image features with relative position information, decoding the encoded image features together with the class-embedding output, up-sampling, classifying each pixel and outputting the hyperspectral image tree species classification result specifically employs:

Attention(Q, K, V) = SoftMax(Q·K^T / √d_k + B)·V

where B is the bias matrix indexed by each pixel-pair position in the relative position index matrix; SoftMax normalizes each row vector, i.e. after division by √d_k; d_k is the feature dimension; Q denotes the query, K the key and V the value, obtained through the weight matrices W_Q, W_K and W_V respectively. The final output is obtained by taking the dot product of the query and the key, normalizing it to obtain the weight of each value, and then multiplying and summing the weights with the values.
A hyperspectral image tree classification device comprises a processor and a storage medium; the storage medium is used for storing instructions; the processor is used for operating according to the instruction to execute the steps of the hyperspectral image tree classification method.
A storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the hyperspectral image tree species classification method described above.
The beneficial effects are as follows: the scheme of combining multi-scale segmentation with a Transformer model obtains the distribution of various tree species, realizes tree species classification, and facilitates monitoring and supervision. Meanwhile, because the spectral information of different plants is highly similar and very difficult to distinguish with the naked eye, establishing automatic tree species identification is particularly important.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The implementation of the technical solution is described in further detail below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
As shown in fig. 1, a hyperspectral image tree classification method includes:
Step S1, acquiring a hyperspectral image;
Step S2, denoising, filtering and dimension-reduction preprocessing are carried out on the hyperspectral image;
Step S3, carrying out multi-scale segmentation on the preprocessed hyperspectral image;
Step S4, inputting the segmented hyperspectral image into a pre-trained CNN model for flattening, and obtaining a flattened hyperspectral image;
Step S5, extracting texture features from the flattened hyperspectral image through a grey-level co-occurrence matrix;
Step S6, performing independent principal component analysis on the flattened hyperspectral image to obtain spectral features of different bands, and selecting the first m bands carrying the most spectral information as the extracted spectral features;
step S7, inputting the extracted texture features and spectrum features into a double-head attention network model to obtain output image features with double-head attention and fusing feature layers;
Step S8, taking the image features with attention as input of a Transformer model, and adding position codes to the input part to obtain the image features with relative position information;
Step S9, encoding the image features with relative position information by using an encoder, decoding the encoder output together with the class embedding by using a Mask Transformer, up-sampling, classifying each pixel, and outputting the hyperspectral image tree species classification result.
Step S2, performing denoising, filtering and dimension reduction preprocessing on the hyperspectral image, including:
Hyperspectral image denoising is the process of separating sparse noise S and Gaussian noise N from the three-dimensional tensor Y to obtain a clean image X; the approach adopted in this embodiment incorporates prior information about the unknown hyperspectral image and the mixed noise; within this framework, the most widely used denoising method is Robust Principal Component Analysis (RPCA), which can be written as the following optimization problem:

min_{X,S} ‖X‖_* + λ‖S‖_1   s.t.   ‖Y − X − S‖_F ≤ δ

where ‖·‖_* is the nuclear norm (favouring a low-rank clean image X), ‖·‖_1 promotes sparsity of S, and δ bounds the Gaussian noise energy.
the hyperspectral data of the unmanned aerial vehicle is filtered and smoothed by using a Savitzky Golay method, and the filtering method is a filtering method based on time domain local polynomial least squares fitting, and can filter noise and simultaneously keep the shape and the width of the signal.
Principal Component Analysis (PCA) is the most basic dimension reduction method of hyperspectral data, and plays an important role in hyperspectral data compression, decorrelation, denoising and feature extraction.
In the principal component analysis transformation of hyperspectral remote sensing data, each band is usually regarded as a vector. Assume the hyperspectral remote sensing data has p bands and the spatial dimension of the image is m × n. The specific processing flow is as follows:
Image vectorization: the input image data is expressed as X = (x_1, x_2, …, x_p)^T, where each x_i is an N × 1 column vector with N = m × n, obtained by expanding the image by rows or columns and concatenating it into a vector.
Vector centering: the mean vector of the vector set is subtracted from all vectors in the set, i.e. Y = X − E(X).
The covariance matrix Σ of the vector group Y is calculated.
The eigenvalue matrix Λ and eigenvector matrix A of the covariance matrix Σ are solved.
The principal component transform is performed: Z = A^T·Y.
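The five steps above can be sketched in NumPy as follows (a minimal illustration; the cube size and number of retained components are made-up values):

```python
import numpy as np

def pca_transform(cube, n_components):
    """Principal component transform of an (m, n, p) hyperspectral cube:
    vectorise, centre, eigendecompose the band covariance, then project."""
    m, n, p = cube.shape
    X = cube.reshape(-1, p).astype(float)   # image vectorisation: N x p, N = m*n
    Y = X - X.mean(axis=0)                  # vector centring: Y = X - E(X)
    cov = np.cov(Y, rowvar=False)           # p x p covariance matrix
    vals, vecs = np.linalg.eigh(cov)        # eigenvalues / eigenvectors
    order = np.argsort(vals)[::-1]          # sort by descending variance
    A = vecs[:, order[:n_components]]       # leading eigenvectors
    Z = Y @ A                               # principal component scores
    return Z.reshape(m, n, n_components)
```

The first output band then carries the largest variance, which is why the leading components are the natural candidates for the spectral features selected later.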
Step S3, carrying out multi-scale segmentation on the preprocessed hyperspectral image, wherein the step comprises the following steps:
considering that hyperspectral images are often affected by noise and weak edges, isolated regions can be easily generated using conventional region segmentation algorithms; therefore, the present embodiment adopts an improved algorithm, a multi-scale segmentation method; in this algorithm, a segmentation threshold needs to be determined, which will have a significant impact on the segmentation result; therefore, in this embodiment, the transformation matrix of the image is obtained through the adjacent differential transformation, and the segmentation threshold is obtained through the statistical information of the transformation matrix; the specific steps of the segmentation are as follows:
(1) Applying a gaussian filter to the image I to obtain a smoothed image I' =i×k, where K is a gaussian kernel function;
(2) The maximum gradient transformation is performed on the images I and I′ according to the following formulas, yielding the maximum gradient transformation matrices MGT(I) and MGT(I′):

MGT(I) = max(|MGT_i(I)|), i = 1, 2, 3, 4

MGT(I′) = max(|MGT_i(I′)|), i = 1, 2, 3, 4

(3) Segmentation thresholds λ and λ′ are obtained from the statistical information of MGT(I) and MGT(I′);
(4) The original image I and the smooth image I 'are respectively subjected to region growing segmentation, and the results are respectively stored in RG and RG';
(5) Integrating the segmentation results RG and RG' to obtain a final segmentation result Map;
the matrix Map is the result of dividing the image I, in which the black part is the target area (gray value is filled with 0) and the white part is the background area (gray value is filled with 1).
The purpose of step (1) is to achieve a multi-scale representation of the image: the original image I and the smoothed image I′ are representations of the image at different scales. I′ has eliminated some of the noise in I but has also weakened the edges of the original image, so its segmentation result RG′ effectively suppresses the influence of noise but performs poorly at region edges; the segmentation result RG of the original hyperspectral image is strongly affected by noise and segments poorly overall, but performs well at region edges. Thus, combining the two in step (5) combines their advantages.
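A sketch of the maximum gradient transform of step (2), under the assumption that MGT_i, i = 1..4 are the absolute differences along the horizontal, vertical and two diagonal neighbour directions (the text does not spell out the directional operators, so this is an illustrative reading):

```python
import numpy as np

def max_gradient_transform(img):
    """MGT(I) = max_i |MGT_i(I)|, i = 1..4, taking the four directions as
    horizontal, vertical and the two diagonals (edge-replicated border)."""
    img = np.asarray(img, float)
    h, w = img.shape
    pad = np.pad(img, 1, mode="edge")
    centre = pad[1:h + 1, 1:w + 1]
    shifts = [(0, 1), (1, 0), (1, 1), (1, -1)]  # the four neighbour directions
    grads = [np.abs(centre - pad[1 + dy:h + 1 + dy, 1 + dx:w + 1 + dx])
             for dy, dx in shifts]
    return np.max(grads, axis=0)
```

The thresholds λ and λ′ of step (3) could then be taken from statistics of this matrix, e.g. its mean plus a multiple of its standard deviation; the exact statistic is an assumption here, since the text only says "statistical information".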
S4, inputting the segmented hyperspectral image into a pre-trained CNN model for flattening, and obtaining a flattened hyperspectral image;
A "flattening layer" is used to convert the three-dimensional layers in the network into one-dimensional vectors to suit the input of the fully connected layers used for classification; for example, a 5×5×2 tensor is converted into a vector of size 50. The preceding convolution layers extract features from the input hyperspectral image; these features must now be classified. Classifying them with a Transformer model requires one-dimensional input, which is why the flattening layer is required.
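The 5×5×2 example above amounts to a single reshape, e.g. in NumPy:

```python
import numpy as np

# Output of the last convolution layer: a 5 x 5 x 2 feature tensor
feature_map = np.arange(50.0).reshape(5, 5, 2)

# "Flattening layer": the 3-D tensor becomes a 1-D vector of size 50,
# suitable as one-dimensional input for the subsequent classifier.
flat = feature_map.reshape(-1)
```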
In some embodiments, step S5, extracting texture features from the flattened hyperspectral image through a gray level co-occurrence matrix;
in a specific implementation, since the texture of vegetation in a hyperspectral image generally has no obvious directionality, the texture features are extracted as the average over the 4 directions 0°, 45°, 90° and 135°, with sliding window sizes of 3×3, 5×5, …, 31×31 and a step length of 1.
In the embodiment, a plurality of wave bands with high definition, low interference information and obvious ground characteristic information are selected for texture analysis, so that the difference of various ground characteristic image characteristics can be fully displayed; eight texture features are extracted, including mean, variance, homogeneity, contrast, difference, entropy, second moment and correlation; the appropriate texture window size is selected to achieve the highest overall classification accuracy of the tree species.
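A minimal NumPy sketch of the grey-level co-occurrence computation and the four-direction averaging described above (simplified: fixed grey-level count, whole-image window, and only three of the eight listed features):

```python
import numpy as np

def glcm(gray, dy, dx, levels):
    """Normalised co-occurrence matrix for one displacement (dy, dx)."""
    h, w = gray.shape
    ys = slice(max(0, -dy), min(h, h - dy))
    xs = slice(max(0, -dx), min(w, w - dx))
    a = gray[ys, xs]                                    # reference pixels
    b = gray[ys.start + dy:ys.stop + dy, xs.start + dx:xs.stop + dx]
    P = np.zeros((levels, levels))
    np.add.at(P, (a.ravel(), b.ravel()), 1.0)           # count pairs
    return P / P.sum()

def texture_features(gray, levels):
    """Mean contrast, homogeneity and entropy over 0/45/90/135 degrees."""
    dirs = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]         # the four directions
    i, j = np.indices((levels, levels))
    feats = []
    for dy, dx in dirs:
        P = glcm(gray, dy, dx, levels)
        contrast = np.sum(P * (i - j) ** 2)
        homogeneity = np.sum(P / (1.0 + (i - j) ** 2))
        nz = P[P > 0]
        entropy = -np.sum(nz * np.log(nz))
        feats.append((contrast, homogeneity, entropy))
    return np.mean(feats, axis=0)
```

A perfectly uniform region yields zero contrast, unit homogeneity and zero entropy, which is the intuition behind using these statistics to separate smooth canopies from rough ones.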
S6, performing independent principal component analysis on the flattened hyperspectral image to obtain spectral features of different bands, and selecting the first m (m = 5) bands carrying the most spectral information as the extracted spectral features;
step S7, inputting the texture features and the spectrum features extracted in the step S5 and the step S6 into a double-head attention network model to obtain output image features with double-head attention and fusing feature layers; comprising the following steps:
texture features and spectral features are input into the double-head attention network, and three attention features are obtained through three 1×1 convolution layers with different weights; the first attention feature is transposed and multiplied by the second attention feature, and the result is input into a Softmax function to obtain an attention map; the obtained attention map is transposed and multiplied by the third attention feature matrix, and then passed through a 1×1 convolution layer to obtain the final image feature with attention.
The attention network model β adopts the following formula:

β_i = exp(s_i) / Σ_{j=1}^{N} exp(s_j)

where N is the number of image features, and the attention scoring function s_i is calculated as:

s_i = (W_f x)^T · (W_g x)

where x is the image attention feature extracted by the convolutional network, W_f and W_g are two weight matrices implemented by 1×1 convolution, and T denotes matrix transposition;

the image feature x_0 with attention is calculated as:

x_0 = W_v ( Σ_i β_i h(x_i) ) + x_i

where h(x_i) = W_h x_i; W_h and W_v are two weight matrices implemented by 1×1 convolution; x_i is the input image feature, and β is the attention network model function.
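The formulas above can be sketched as follows. This is a hedged illustration, not the patent's implementation: the channels-by-positions feature layout, the head dimensions and the fusion of the two heads by a final matrix (standing in for a 1×1 convolution) are all assumptions:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(x, Wf, Wg, Wh, Wv):
    """One head: s = (Wf x)^T (Wg x); beta = softmax(s);
    output = Wv (h(x) beta^T) + x, with h(x) = Wh x."""
    s = (Wf @ x).T @ (Wg @ x)      # N x N attention scores
    beta = softmax(s, axis=-1)     # attention map, each row sums to 1
    h = Wh @ x                     # transformed features
    return Wv @ (h @ beta.T) + x   # attention-weighted sum + input feature

def dual_head_attention(x, heads, Wo):
    """Concatenate two heads and fuse with a 1x1-style matrix projection."""
    out = np.concatenate([attention_head(x, *w) for w in heads], axis=0)
    return Wo @ out
```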
Feature layer fusion uses the late-fusion approach: fusion is carried out on the prediction scores, i.e. several models are trained, each produces a prediction score, and the results of all models are fused to obtain the final prediction (detection results from different layers are combined to improve detection performance: detection starts on partially fused layers, multiple layers are detected, and the multiple detection results are fused at the end). In this line of research the features themselves are not fused; instead, predictions are made separately on the multi-scale features and the prediction results are then combined, as in the Single Shot MultiBox Detector (SSD) and the multi-scale CNN (MS-CNN).
Step S8, taking the image features with attention as input of the Transformer model and adding position codes to the input part to obtain the image features with relative position information:

x_0′ = x_0 + E_pos

where x_0′ is the image feature with relative position information and E_pos is the defined position vector.
After the position-coded information is added, the dimension remains equal to the final patch-embedding dimension.
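The addition x_0′ = x_0 + E_pos can be illustrated as below. The patent only defines E_pos as "a position vector" of the same dimension as the patch embedding; the fixed sinusoidal encoding used here is one common choice and is an assumption (the token count and even embedding dimension are also made up):

```python
import numpy as np

def add_position_encoding(tokens):
    """x0' = x0 + E_pos for an (n_tokens, d) embedding matrix.
    E_pos is a fixed sinusoidal code; d is assumed even for the
    sin/cos split, and the output dimension is unchanged."""
    n, d = tokens.shape
    pos = np.arange(n)[:, None]
    div = np.exp(-np.log(10000.0) * np.arange(0, d, 2) / d)
    E = np.zeros((n, d))
    E[:, 0::2] = np.sin(pos * div)   # even dimensions
    E[:, 1::2] = np.cos(pos * div)   # odd dimensions
    return tokens + E
```

Because every position receives a distinct vector, tokens at different pixels become distinguishable to the subsequent attention layers even when their content features are identical.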
Step S9, encoding the image features with relative position information using an encoder, decoding the encoder output together with the class embedding using a Mask Transformer, up-sampling, classifying each pixel, and outputting the hyperspectral image tree species classification result:

Attention(Q, K, V) = SoftMax(Q·K^T / √d_k + B)·V

where B is the bias matrix indexed by each pixel-pair position in the relative position index matrix; SoftMax normalizes each row vector, i.e. after division by √d_k; d_k is the feature dimension; Q denotes the query, K the key and V the value, obtained through the weight matrices W_Q, W_K and W_V respectively. The final output is obtained by taking the dot product of the query and the key, normalizing it to obtain the weight of each value, and then multiplying and summing the weights with the values.
A hyperspectral image tree classification device comprises a processor and a storage medium; the storage medium is used for storing instructions; the processor is used for operating according to the instruction to execute the steps of the hyperspectral image tree classification method.
A storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the hyperspectral image tree species classification method described above.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.
Claims (9)
1. A hyperspectral image tree classification method, characterized by comprising the following steps:
acquiring a hyperspectral image;
performing denoising, filtering and dimension-reduction preprocessing on the hyperspectral image;
carrying out multi-scale segmentation on the preprocessed hyperspectral image;
inputting the segmented hyperspectral image into a pre-trained CNN model for flattening, and obtaining a flattened hyperspectral image;
extracting texture features from the flattened hyperspectral image through a gray-level co-occurrence matrix;
performing independent principal component analysis on the flattened hyperspectral image to obtain spectral features of different wavebands, and selecting the spectral features of the first m wavebands carrying the most spectral information as the extracted spectral features;
inputting the extracted texture features and spectrum features into a double-head attention network model to obtain output image features with double-head attention;
taking the image features with double-head attention as the input of a Transformer model, and adding a position code to obtain the image features with relative position information;
and encoding the image features with the relative position information, decoding the encoded image features and the output of class embedding, up-sampling, classifying each pixel, and outputting a hyperspectral image tree classification result.
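The texture-feature step of claim 1 (gray-level co-occurrence matrix) can be sketched in plain NumPy; the quantisation levels, the single pixel offset, and the chosen Haralick statistics are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def glcm_features(band, levels=8, offset=(0, 1)):
    """Gray-level co-occurrence matrix features for one flattened band.

    band: 2-D array of integer gray levels in [0, levels).
    Returns contrast, energy and homogeneity computed from the
    normalised co-occurrence matrix for the given pixel offset.
    """
    dy, dx = offset
    glcm = np.zeros((levels, levels), dtype=float)
    h, w = band.shape
    for y in range(h - dy):
        for x in range(w - dx):
            glcm[band[y, x], band[y + dy, x + dx]] += 1
    glcm /= glcm.sum()                                   # joint probability matrix
    i, j = np.indices((levels, levels))
    contrast    = np.sum((i - j) ** 2 * glcm)            # local intensity variation
    energy      = np.sum(glcm ** 2)                      # texture uniformity
    homogeneity = np.sum(glcm / (1.0 + np.abs(i - j)))   # closeness to diagonal
    return contrast, energy, homogeneity

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
feats = glcm_features(img, levels=4)
```

A perfectly uniform band gives zero contrast and unit energy/homogeneity, which is a quick sanity check on the statistics.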
2. The hyperspectral image tree classification method according to claim 1, wherein the multi-scale segmentation of the preprocessed hyperspectral image specifically comprises: evaluating the rate of change of the local homogeneity of the preprocessed hyperspectral image under different segmentation scale parameters, and segmenting the hyperspectral image with the scale value at which the local rate of change is largest.
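The scale-selection criterion of claim 2 resembles the rate-of-change-of-local-variance (ROC-LV) approach to estimating an optimal segmentation scale; a minimal sketch under that assumption (the function name and the variance values are illustrative, not from the patent):

```python
import numpy as np

def best_scale(scales, local_variance):
    """Pick the segmentation scale where the rate of change of local
    variance peaks, i.e. where homogeneity changes most abruptly."""
    lv = np.asarray(local_variance, dtype=float)
    # Percent change of local variance between consecutive scale parameters.
    roc = (lv[1:] - lv[:-1]) / lv[:-1] * 100.0
    # The scale reached at the largest jump is taken as the segmentation scale.
    return scales[int(np.argmax(roc)) + 1], roc

scales = [10, 20, 30, 40, 50]
lv     = [1.0, 1.2, 2.0, 2.1, 2.15]   # illustrative local-variance values
scale, roc = best_scale(scales, lv)
```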
3. The hyperspectral image tree classification method according to claim 1, wherein the segmented hyperspectral image is input into a pretrained CNN model for flattening, and the flattened hyperspectral image is obtained:
the pre-trained CNN model includes: firstly, convoluting hyperspectral data with a 3D kernel, performing preliminary pre-training, performing structure extraction and comparison on true and false hyperspectral images of each input wave band through 3-layer convolution, and finally judging whether the input image is the probability of the true hyperspectral image or not through a leakage ReLU function;
the flattening method specifically comprises the following steps: the three-dimensional layer in the CNN model is converted to a one-dimensional vector using a "flattening layer".
4. The hyperspectral image tree classification method according to claim 1, wherein inputting the extracted texture features and spectral features into a double-head attention network model to obtain the output image features with double-head attention specifically comprises: inputting the texture features and the spectral features into the double-head attention network, and obtaining three attention features through 1×1 convolution layers with three different weights; transposing the first attention feature, multiplying it by the second attention feature, and inputting the result into a Softmax function to obtain an attention map; transposing the obtained attention map, multiplying it by the third attention feature matrix, and passing the result through a 1×1 convolution layer to obtain the image features with double-head attention.
5. The hyperspectral image tree classification method according to claim 4, wherein the double-head attention network model β uses the following formula:

β_i = exp(s_i) / Σ_{i=1}^{N} exp(s_i)

where N is the number of image features, and s_i is the attention scoring function;
s_i = (W_f x)^T (W_g x)

wherein x is the image attention feature extracted by the convolutional network, W_f and W_g are two weight matrices, implemented by 1×1 convolution; T denotes matrix transposition;
image feature x with double-headed attention 0 Is as follows:
wherein h (x i )=W h x i ;W h And W is v Is two weight matrices; x is x i Beta is the attention network model function for the input image features.
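The formulas of claim 5 (scores s = (W_f x)^T(W_g x), softmax-normalised β, then the weighted sum W_v Σ β_i h(x_i)) can be sketched in NumPy; the matrix shapes, and treating the 1×1 convolutions as plain matrix multiplications, are assumptions for illustration:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def double_head_attention(x, w_f, w_g, w_h, w_v):
    """Attention in the style of claim 5: s = (W_f x)^T (W_g x),
    beta = softmax over the N feature positions, output = W_v (h(x) @ beta)."""
    f, g, h = w_f @ x, w_g @ x, w_h @ x     # each (d, N); 1x1 convs as matmuls
    s = f.T @ g                             # (N, N) attention scores
    beta = softmax(s, axis=0)               # normalise over the N input positions
    return w_v @ (h @ beta)                 # (d, N) attended image features

rng = np.random.default_rng(2)
d, n = 4, 6                                 # channels, number of feature positions
x = rng.normal(size=(d, n))
out = double_head_attention(x, *(rng.normal(size=(d, d)) for _ in range(4)))
```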
6. The hyperspectral image tree classification method according to claim 5, wherein taking the image features with double-head attention as the input of a Transformer model and adding position codes to obtain the image features with relative position information comprises the following steps:

x_0' = x_0 + E_pos

wherein: x_0' is the image feature with relative position information, and E_pos is the learnable position vector.
7. The hyperspectral image tree classification method according to claim 1, wherein encoding the image features with relative position information, decoding the encoded image features together with the output of the class embedding, classifying each pixel after up-sampling, and outputting the hyperspectral image tree classification result specifically comprises the following steps:

Attention(Q, K, V) = SoftMax(QK^T / √d_k + B) V

wherein: B is the relative-position bias indexed, for each pixel position, from the relative position index matrix; SoftMax normalises each row vector after division by √d_k; d_k is the feature dimension; Q denotes the query, K the key and V the value; W_Q, W_K, W_V are the fully connected mappings corresponding to Q, K and V respectively; the final output is obtained by taking the dot product of the query and the key, normalising it to obtain the weight of each value, and multiplying the weights with the values and summing.
8. The hyperspectral image tree classification device is characterized by comprising a processor and a storage medium; the storage medium is used for storing instructions; the processor being operative according to the instructions to perform the steps of the method according to any one of claims 1 to 7.
9. A storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the method according to any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311366203.8A CN117372767A (en) | 2023-10-20 | 2023-10-20 | Hyperspectral image tree classification method, device and storage medium |
Publications (1)
Publication Number | Publication Date
---|---
CN117372767A | 2024-01-09
Family
ID=89407317
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311366203.8A Pending CN117372767A (en) | 2023-10-20 | 2023-10-20 | Hyperspectral image tree classification method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117372767A (en) |
2023-10-20: Application CN202311366203.8A filed in China (CN117372767A); status: Pending.
Similar Documents
Publication | Title
---|---
CN110321963B (en) | Hyperspectral image classification method based on fusion of multi-scale and multi-dimensional space spectrum features
CN110287869B (en) | High-resolution remote sensing image crop classification method based on deep learning
CN111259828B (en) | High-resolution remote sensing image multi-feature-based identification method
CN111914611B (en) | Urban green space high-resolution remote sensing monitoring method and system
CN111310666B (en) | High-resolution image ground feature identification and segmentation method based on texture features
CN110008948B (en) | Hyperspectral image target detection method based on variational self-coding network
CN107145836B (en) | Hyperspectral image classification method based on stacked boundary identification self-encoder
CN112200090B (en) | Hyperspectral image classification method based on cross-grouping space-spectral feature enhancement network
CN112308152B (en) | Hyperspectral image ground object classification method based on spectrum segmentation and homogeneous region detection
Ou et al. | A CNN framework with slow-fast band selection and feature fusion grouping for hyperspectral image change detection
CN112101271A (en) | Hyperspectral remote sensing image classification method and device
CN108229551B (en) | Hyperspectral remote sensing image classification method based on compact dictionary sparse representation
CN112733800B (en) | Remote sensing image road information extraction method and device based on convolutional neural network
CN112949416B (en) | Supervised hyperspectral multiscale graph volume integral classification method
CN112434571A (en) | Hyperspectral anomaly detection method based on attention self-coding network
CN114398948A (en) | Multispectral image change detection method based on space-spectrum combined attention network
CN112766223A (en) | Hyperspectral image target detection method based on sample mining and background reconstruction
CN115471675A (en) | Disguised object detection method based on frequency domain enhancement
CN117058558A (en) | Remote sensing image scene classification method based on evidence fusion multilayer depth convolution network
Long et al. | Dual self-attention Swin transformer for hyperspectral image super-resolution
CN109947960B (en) | Face multi-attribute joint estimation model construction method based on depth convolution
CN112446256A (en) | Vegetation type identification method based on deep ISA data fusion
CN112381144B (en) | Heterogeneous deep network method for non-European and Euclidean domain space spectrum feature learning
CN115984714B (en) | Cloud detection method based on dual-branch network model
CN116246171A (en) | Target detection method and device for air-spectrum multi-scale hyperspectral remote sensing image
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination