CN112464891A - Hyperspectral image classification method - Google Patents

Hyperspectral image classification method

Info

Publication number
CN112464891A
Authority
CN
China
Prior art keywords
hyperspectral image
information
spectral
hyperspectral
network
Prior art date
Legal status
Granted
Application number
CN202011468157.9A
Other languages
Chinese (zh)
Other versions
CN112464891B (en)
Inventor
梁联晖
李军
Current Assignee
Hunan University
Original Assignee
Hunan University
Priority date
Filing date
Publication date
Application filed by Hunan University
Priority to CN202011468157.9A
Publication of CN112464891A
Application granted
Publication of CN112464891B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/194Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a hyperspectral image classification method that combines the advantages of 3D Octave convolution and a Bi-RNN attention network. The method first uses 3D Octave convolution to obtain the spatial features of a hyperspectral image while reducing spatially redundant information, then uses the Bi-RNN spectral attention network to extract the spectral information of the image, fuses the spatial and spectral feature maps through a fully connected layer, and finally outputs the classification result through softmax. The method achieves accurate classification of hyperspectral remote sensing images with few training samples and accelerates the running speed of the model by adopting a parallel data processing mode.

Description

Hyperspectral image classification method
Technical Field
The invention belongs to the field of hyperspectral image processing within remote sensing, and particularly relates to a hyperspectral image classification method.
Background
Hyperspectral remote sensing is a technology spanning multiple disciplines such as computer science and geography. A hyperspectral imager images the same scene over narrow spectral intervals across different electromagnetic wavelength ranges, yielding spectral curves that reflect the spectral characteristics of ground objects. Data from hundreds of spectral bands are recorded at the same spatial resolution, forming a three-dimensional hyperspectral image rich in both spatial and spectral information. Each band of a hyperspectral image is a two-dimensional spatial image expressing the reflectance of surface objects at a single wavelength, and stacking the bands in order forms an approximately continuous spectral dimension. Each hyperspectral pixel is therefore characterized by such a spectral vector: every pixel is a continuous spectral curve that records the observed ground-object information in detail. Because hyperspectral images describe both the spectral and the spatial information of ground objects in detail, hyperspectral image classification has, with the development of classification technology, been widely applied in fields such as environmental monitoring, urban and rural planning, mineral exploitation, national defense construction, and precision agriculture.
Hyperspectral image classification methods can be roughly divided into three categories: methods based on spectral information alone, methods based on joint spatial-spectral features, and deep learning methods. The first category uses only the spectral-dimension information of the hyperspectral image and ignores the spatial correlation among pixels. The second category improves classification performance to some extent, but depends heavily on handcrafted features; the classification effect is mainly determined by low-level features, which cannot represent the complex content of a hyperspectral image, so performance remains limited. Compared with these two traditional shallow approaches, the third category has stronger representation and generalization capabilities and can extract deeper, more discriminative image features, yielding good classification results. However, although these methods achieve a good classification effect, models based on convolutional neural networks are accompanied by a large amount of redundant spatial-dimension information, which degrades model performance to some extent. Meanwhile, in deep learning, manually labeling hyperspectral remote sensing images consumes considerable manpower and material resources, so ready-made labeled samples are scarce. Therefore, learning the spatial and spectral characteristics of hyperspectral remote sensing images while reducing spatial information redundancy and using few training samples is of great significance for improving classification accuracy.
Existing model methods that use Octave convolution for hyperspectral image classification only address the reduction of redundant spatial feature information. For extracting the spectral information of the hyperspectral image, they rely either on the Octave convolution itself or on convolutional-neural-network-based methods. Both of these approaches treat the spectral information of the hyperspectral data as an unordered high-dimensional vector, which does not match the nature of spectral data and destroys the correlation among the spectra, so the extraction of spectral information is impaired and the spectral feature information cannot be extracted accurately.
Conversely, existing methods that use a Bi-RNN (bidirectional recurrent neural network) for hyperspectral classification cannot avoid a large amount of redundant spatial feature information, so the spatial-dimension information of the image cannot be extracted accurately and classification accuracy suffers. In addition, in existing Octave-convolution methods for hyperspectral image classification, the data stream is serial and cannot be processed in parallel.
For example, patent application CN 202010066659.2 discloses a hyperspectral remote sensing image classification method based on mixed 3-dimensional and 2-dimensional convolution, which includes: acquiring a hyperspectral remote sensing image to be classified; performing spectral dimensionality reduction with principal component analysis; arranging the spectral bands in the dimension-reduced image from high to low spectral information content, from the middle of the channel dimension toward both sides; weighting each spectral band according to its spectral information content; taking a cube of fixed spatial size around each pixel, extracting spectral-spatial features from the cube with 3-dimensional convolution, and fusing spectral information with 2-dimensional convolution to obtain the final feature map; extracting second-order information from the feature map with covariance pooling and outputting a feature vector; and inputting the feature vector into a three-layer fully connected network to obtain the predicted classification result. That method combines the advantages of 3-dimensional and 2-dimensional convolution and achieves accurate classification of hyperspectral remote sensing images with few training samples. However, step S2 of that invention requires dimension-reduction preprocessing of the spectrum, which makes the method and model relatively complicated. In addition, that method cannot avoid the problem of spatial information redundancy, and its ability to extract spectral information is insufficient.
Therefore, there is a need in the art for a new hyperspectral image classification method to solve the above problems.
Disclosure of Invention
The invention provides a hyperspectral image classification method based on 3D Octave convolution and a Bi-RNN attention network, which combines the advantages of both and achieves accurate classification of hyperspectral images with few training samples.
To this end, the invention provides a hyperspectral image classification method, wherein the hyperspectral image is a remote sensing image acquired by an aerial camera. The method is based on 3D Octave convolution and a Bi-RNN attention network, where Bi-RNN denotes a bidirectional recurrent neural network, and comprises the following steps:
step S1, acquiring a hyperspectral remote sensing image to be classified;
step S2, obtaining spatial feature information Z_O for the hyperspectral image by using 4 or more consecutive 3D Octave convolutions; the number of 3D Octave convolutions is preferably 4;
step S3, regarding the hyperspectral data output after step S1 as an ordered spectral vector and, in parallel with step S2, inputting the spectral sequence band by band into the bidirectional hidden layers, connecting the output state of the forward hidden layer and the output state of the reverse hidden layer through a concatenation function to obtain a vector g_n;
step S4, taking the connected bidirectional hidden layer output vector g_n as the input of the attention module; the probability weight W_i obtained by random initialization of the attention mechanism is multiplied by the vector g_n, the offset parameter b_i is added, and after the tanh activation function the attention weight parameter β is calculated by a softmax function;
step S5, multiplying the attention weight parameter β by the corresponding values of the vector g_n obtained in step S3 and then summing the products to obtain a new spectral information vector label y;
step S6, extracting the spatial feature information Z_O of the last fully connected layer of the 3D Octave convolution network in step S2 and combining it with the new spectral information vector label y obtained from the last fully connected layer of the Bi-RNN attention network in step S5 to form a new fully connected layer and output a feature vector;
and step S7, inputting the feature vector into a network of two or more fully connected layers, preferably 2-5 layers, more preferably 3 layers, and predicting the classification result through a softmax layer.
In a specific embodiment, step S2 includes:
let the size of the image used for the hyperspectral image classification be W × H × L;
reshaping the hyperspectral image classification data into X with size L × N, where N = W × H;
the hyperspectral data X serves as the input of the 3D Octave convolution network; assume the input data and output data of the Octave convolution network are X = {X^H, X^L} and Z = {Z^H, Z^L}, respectively, where H and L denote high-frequency and low-frequency information; that is, the input hyperspectral data X and the data Z output after processing by the 3D Octave convolution network can each be represented as the sum of the corresponding high-frequency and low-frequency information;
the Octave convolution model is built as follows:
ZH=ZH→H+ZL→Hand ZL=ZL→L+ZH→L
Wherein Z isH→H,ZL→LRepresenting the updating of hyperspectral image data information in high and low frequency, respectively, ZL→H,ZH→LRespectively representing the conversion of the hyperspectral image data information between low-frequency and high-frequency frequencies and between high-frequency and low-frequency frequencies;
high frequency characteristic information and low frequency for completing hyperspectral imageUpdating and converting the characteristic information, and assuming that the weight parameter corresponding to the Octave convolution model is W ═ WH,WL](ii) a Likewise, the weight parameter WHAnd WLAre respectively defined as WL=[WL→L,WH→L],WH=[WH→H,WL→H]Wherein W isH→H,WL→LIndicating the information update weight, W, within the corresponding frequencyH→L,WL→HRepresenting information conversion weights between corresponding frequencies;
from the above, Z^H and Z^L are obtained as:

Z^H = Σ(W^{H→H})^T X^H + up(Σ(W^{L→H})^T X^L)   (1)

Z^L = Σ(W^{L→L})^T X^L + Σ(W^{H→L})^T pool(X^H)   (2)

where T in formulas (1) and (2) denotes matrix transposition, up denotes the up-sampling operation, and pool denotes the average pooling operation;
calculating the Octave convolution network output Z, where the expression of Z is:

Z = [Z^L, Z^H]
  = [(Z^{L→L} + Z^{H→L}), (Z^{H→H} + Z^{L→H})]
  = [Σ(W^L)^T X, Σ(W^H)^T X]
  = [Σ(W^{L→L})^T X^L + Σ(W^{H→L})^T pool(X^H), Σ(W^{H→H})^T X^H + up(Σ(W^{L→H})^T X^L)].
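To make equations (1) and (2) concrete, the following NumPy sketch stands in for the 3D convolutions with plain matrix products; the tensor shapes, the factor-2 pooling and upsampling, and the random weights are illustrative assumptions, not the patented layout.

```python
import numpy as np

def avg_pool(x):        # halve the (flattened) spatial axis by mean pooling
    return x.reshape(x.shape[0] // 2, 2, -1).mean(axis=1)

def upsample(x):        # nearest-neighbour upsampling by a factor of 2
    return np.repeat(x, 2, axis=0)

def octave_step(XH, XL, W):
    """One Octave update; W holds the four weight groups of eqs. (1)-(2)."""
    ZH = XH @ W['H->H'] + upsample(XL @ W['L->H'])   # eq. (1)
    ZL = XL @ W['L->L'] + avg_pool(XH) @ W['H->L']   # eq. (2)
    return ZH, ZL

rng = np.random.default_rng(0)
XH = rng.normal(size=(16, 8))     # high-frequency features (full resolution)
XL = rng.normal(size=(8, 8))      # low-frequency features (half resolution)
W = {k: rng.normal(size=(8, 8)) * 0.1
     for k in ('H->H', 'L->H', 'L->L', 'H->L')}
ZH, ZL = octave_step(XH, XL, W)
print(ZH.shape, ZL.shape)         # (16, 8) (8, 8)
```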
in a specific embodiment, the step S3 includes:
let the hyperspectral input data X be an ordered spectral vector, X = (X_1, X_2, X_3, ..., X_n), and calculate the bidirectional hidden layer output h_n of the Bi-RNN network as follows:

h_n^→ = f(W^→ X_n + V^→ h_{n-1}^→)   (3)

h_n^← = f(W^← X_n + V^← h_{n+1}^←)   (4)

in formulas (3) and (4), n ranges over the spectral bands 1 to m; the coefficient matrices W^→, W^← act on the input of the current hidden layer, h_{n-1}^→ denotes the previous hidden state, and the reverse pass starts from the subsequent hidden state h_{n+1}^←; f is the nonlinear activation function of the hidden layer. Taking the output of the encoder as the input of the vector g_n, g_n is calculated as:

g_n = concat(h_n^→, h_n^←)   (5)

where concat() is the concatenation function between the forward hidden state function and the reverse hidden state function.
In a specific embodiment, the step S4 includes:
acquiring weight values of different spectral information, where the weight of the attention layer is calculated as follows:

e_{in} = tanh(W_i g_n + b_i)   (6)

β_{in} = softmax(W_i' e_{in} + b_i')   (7)

in formulas (6) and (7), W_i and W_i' are transformation matrices, b_i and b_i' are bias terms, and softmax() maps non-normalized output values to a probability distribution, constraining the output values to the (0, 1) interval; formula (6) is a one-layer neural network that rearranges the state vector space of the Bi-RNN, which the tanh activation then converts to e_{in} as a new hidden representation of h_n; formula (7) generates the attention weight β through the softmax layer, where β_{in} is one component of the attention weight parameter β, namely the i-th weight parameter; the importance of the input is measured by the correlation between e_{in}, an intermediate parameter, and another channel vector.
In a specific embodiment, the step S5 includes:
calculate the prediction tag y for pixel Xn
yn=U[gn,β] (8)
Where U () is the sum function of all state vectors weighted by the corresponding attention weights; a prediction label y of the pixel XnIs a component of the spectral information vector label y.
In a specific embodiment, the step S7 includes: inputting the feature vector into a 3-layer fully connected network comprising three fully connected layers, normalizing the first two of the three layers with Batch Normalization, activating them with the ReLU function and then applying the regularized Dropout method, and outputting the predicted classification result with Softmax in the last fully connected layer.
In a specific embodiment, the hyperspectral image classification method is implemented with a hyperspectral image classification system comprising a hyperspectral image module (1), a 3D Octave convolution network module (2), a Bi-RNN attention network module (3), a spatio-spectral feature fusion network module (4) and a classification image module (5); the step S1 is performed in the hyperspectral image module (1), the step S2 in the 3D Octave convolution network module (2), the steps S3 to S5 in the Bi-RNN attention network module (3), the step S6 in the spatio-spectral feature fusion network module (4), and the step S7 in the classification image module (5).
The invention has at least the following beneficial effects: the hyperspectral image classification method based on 3D Octave convolution and the Bi-RNN attention network provided by this application uses four 3D Octave convolutions to obtain spatial features of the hyperspectral image while reducing spatially redundant information, then uses the Bi-RNN spectral attention network to extract the spectral information of the hyperspectral image, enhancing the importance of spectral bands with higher spectral information content and improving classification accuracy with few training samples; it makes full use of the advantages of 3D Octave convolution and the Bi-RNN attention network to remarkably improve classification accuracy, and adopts a parallel data processing mode to accelerate the running speed of the model.
Drawings
FIG. 1 is a block diagram of a hyperspectral image classification method based on a 3D Octave convolution and a Bi-RNN attention network according to the invention.
Fig. 2 is a flowchart of the 3D Octave convolution of the present application.
FIG. 3 is a flow chart of the Bi-RNN attention network of the present application.
FIG. 4 is a classification map of different methods on the Pavia University dataset: (a) pseudo-color image, (b) ground-truth map, (c) SVM, (d) 2D-CNN, (e) ARNN, (f) SSAN, (g) 3DOC-SSAN, and (h) the method of the present invention.
FIG. 5 is a classification map of different methods on the Botswana dataset: (a) pseudo-color image, (b) ground-truth map, (c) SVM, (d) 2D-CNN, (e) ARNN, (f) SSAN, (g) 3DOC-SSAN, and (h) the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
In one embodiment, the hyperspectral image classification method based on 3D Octave convolution and the Bi-RNN attention network makes full use of the advantages of both and obtains highly accurate classification results with few training samples.
Specifically, as shown in fig. 1, the method for classifying hyperspectral remote sensing images with 3D Octave convolution and a Bi-RNN attention network in this embodiment includes the following steps:
and step S1, acquiring the hyperspectral remote sensing image to be classified.
Step S2: using Z_1, Z_2, Z_3 and Z_4, these four 3D Octave convolutions obtain spatial features of the hyperspectral image while reducing spatially redundant information, as shown in fig. 2. In this example, spatial feature information is obtained by each 3D Octave convolution; the specific steps are as follows (see step S2).
In an embodiment, the 3D Octave convolution obtains spatial features of the hyperspectral image as follows:
low frequency signal X in 1 st 3D Octave convolved input data XLSet to 0;
calculating the 1 st 3D Octave convolution network output Z1,Z1The expression of (a) is as follows:
Z1=[Z1 L,Z1 H]
=[(0+Z1 H→L),(Z1 H→H+0)]
=[∑(W1 H→L)Tpool(XH),∑(W1 H→H)TXH]
input of 2 nd 3D Octave convolutionData X2Is Z1Wherein Z is1 H→HDenotes the high frequency part, Z1 H→LRepresenting the low frequency part.
Calculating the 2nd 3D Octave convolution network output Z_2, whose expression is:

Z_2 = [Z_2^L, Z_2^H]
    = [(Z_2^{L→L} + Z_2^{H→L}), (Z_2^{H→H} + Z_2^{L→H})]
    = [Σ(W_2^L)^T Z_1, Σ(W_2^H)^T Z_1]
    = [Σ(W_2^{L→L})^T Z_1^L + Σ(W_2^{H→L})^T pool(Z_1^H), Σ(W_2^{H→H})^T Z_1^H + up(Σ(W_2^{L→H})^T Z_1^L)]
redundant information of the characteristic diagram of the hyperspectral image is reduced, and important characteristics are reserved.
The high-frequency feature map Z_2^H is down-sampled using pooling, and the result is merged with the low-frequency feature map Z_2^L into a new feature map Z_pool.

The input data X_3 of the 3rd 3D Octave convolution is Z_pool, with the low-frequency part set to 0.
Calculating the 3rd 3D Octave convolution network output Z_3, whose expression is:

Z_3 = [Z_3^L, Z_3^H]
    = [(0 + Z_3^{H→L}), (Z_3^{H→H} + 0)]
    = [Σ(W_3^L)^T Z_pool, Σ(W_3^H)^T Z_pool]
    = [Σ(W_3^{H→L})^T pool(Z_pool^H), Σ(W_3^{H→H})^T Z_pool^H]
input data X of 4 th 3D Octave convolution4Is Z3Wherein Z is3 H→HDenotes the high frequency part, Z1 H→LRepresenting the low frequency part.
Calculating the 4th 3D Octave convolution network output Z_4, whose expression is:

Z_4 = [Z_4^L, Z_4^H]
    = [(Z_4^{L→L} + Z_4^{H→L}), (Z_4^{H→H} + Z_4^{L→H})]
    = [Σ(W_4^L)^T Z_3, Σ(W_4^H)^T Z_3]
    = [Σ(W_4^{L→L})^T Z_3^L + Σ(W_4^{H→L})^T pool(Z_3^H), Σ(W_4^{H→H})^T Z_3^H + up(Σ(W_4^{L→H})^T Z_3^L)]
ensuring the integrity of the information, and matching the low-frequency characteristic diagram Z4 LFused to Z after upsampling4 HIn (b) to obtain ZO
The 3D Octave convolution structure is set to be a 4-layer convolution structure, the sizes of convolution kernels of the four-layer convolution structure are all set to be 5 multiplied by 3, and the number of the convolution kernels is respectively set to be 24, 48, 24 and 1.
The purpose of the 3D Octave convolution method is to reduce spatially redundant information while preserving the inherent spectral-dimension information of the hyperspectral image. In fact, 3D Octave convolution is a multi-frequency feature representation method that stores the high-frequency and low-frequency maps in different groups and stores and processes the low-frequency part of the feature map with low-dimensional vectors; since the low-frequency component is redundant, the redundancy can be reduced by lowering the resolution of the low-frequency features. Thus the following inference can be drawn: after the 3D Octave convolutions, the spatially redundant information of the hyperspectral image is greatly reduced, which has an important influence on the subsequent classification of the hyperspectral image.
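The four-convolution data flow described above can be sketched as follows, echoing the single-step sketch given earlier. The octave_layer helper is a random-weight stand-in for one 3D Octave convolution, the concatenation used to form Z_pool is one possible reading of the "merge" described above, and all tensor sizes are illustrative assumptions; only the channel counts 24, 48, 24 and 1 follow the text.

```python
import numpy as np

def avg_pool(x):   # halve the (flattened) spatial axis by mean pooling
    return x.reshape(x.shape[0] // 2, 2, -1).mean(axis=1)

def upsample(x):   # nearest-neighbour upsampling by 2
    return np.repeat(x, 2, axis=0)

def octave_layer(XH, XL, d_out, rng):
    """Random-weight stand-in for one 3D Octave convolution."""
    d_in = XH.shape[1]
    W = {k: rng.normal(size=(d_in, d_out)) * 0.1
         for k in ('H->H', 'L->H', 'L->L', 'H->L')}
    ZH = XH @ W['H->H'] + upsample(XL @ W['L->H'])
    ZL = XL @ W['L->L'] + avg_pool(XH) @ W['H->L']
    return ZH, ZL

rng = np.random.default_rng(3)
XH = rng.normal(size=(16, 8))                               # high-frequency input
Z1H, Z1L = octave_layer(XH, np.zeros((8, 8)), 24, rng)      # conv 1: X^L = 0
Z2H, Z2L = octave_layer(Z1H, Z1L, 48, rng)                  # conv 2
Zpool = np.concatenate([avg_pool(Z2H), Z2L], axis=1)        # merge into Z_pool
Z3H, Z3L = octave_layer(Zpool,
                        np.zeros((Zpool.shape[0] // 2, Zpool.shape[1])),
                        24, rng)                            # conv 3: low part = 0
Z4H, Z4L = octave_layer(Z3H, Z3L, 1, rng)                   # conv 4
Z_O = Z4H + upsample(Z4L)                                   # fuse low into high
print(Z_O.shape)                                            # (8, 1)
```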
Step S3: the hyperspectral data output after step S1 is regarded as an ordered spectral vector. In parallel with step S2, the spectral sequence is input band by band into the bidirectional hidden layers of the Bi-RNN network, and the state output by the forward hidden layer and the state output by the reverse hidden layer are connected by a concatenation function to obtain the vector g_n;
let the hyperspectral input data X be an ordered spectral vector, X = (X_1, X_2, X_3, ..., X_n), and calculate the bidirectional hidden layer output h_n of the Bi-RNN network as in formulas (3) and (4):

h_n^→ = f(W^→ X_n + V^→ h_{n-1}^→)

h_n^← = f(W^← X_n + V^← h_{n+1}^←)

where n ranges over the spectral bands 1 to m, the coefficient matrices act on the input of the current hidden layer, h_{n-1}^→ denotes the previous hidden state, the reverse pass starts from the subsequent hidden state h_{n+1}^←, and f is the nonlinear activation function of the hidden layer. The output of the encoder is taken as the input of the vector g_n, calculated as:

g_n = concat(h_n^→, h_n^←)

where concat() is the concatenation function between the forward hidden state function and the reverse hidden state function.
The Bi-RNN uses a bidirectional GRU hidden layer; the spectral sequence is input band by band, and two hidden layers running in opposite directions are connected to a single output, so that both the preceding and the following spectral information in the hyperspectral spectral sequence can be processed.
Step S4: the output vector g_n of the connected bidirectional hidden layers serves as the input of the attention module. The probability weight W_i obtained by random initialization of the attention mechanism is multiplied by the vector g_n, the offset parameter b_i is added, and after the tanh activation function the attention weight parameter β is calculated by a softmax function, as shown in fig. 3.
The weight values of different spectral information are acquired, and the weight of the spectral attention layer is calculated as follows:

e_{in} = tanh(W_i g_n + b_i)

β_{in} = softmax(W_i' e_{in} + b_i')

where W_i and W_i' are transformation matrices and b_i and b_i' are bias terms, while softmax() maps non-normalized output values to a probability distribution, constraining the output values to the (0, 1) interval. The tanh activation converts the result to e_{in} as a new hidden representation of h_n. The attention weight β is generated by the softmax layer.
Step S5: the attention weight parameter β is multiplied by the corresponding values of the vector g_n obtained in step S3, and the products are summed to obtain a new spectral information vector label y. Specifically, the prediction label y_n of pixel X_n is calculated as:

y_n = U[g_n, β]

where U() is the sum function of all state vectors weighted by the corresponding attention weights.
In practice, a spectral curve is not a flat line of constant value but a continuous curve with peaks and valleys. Thus some important spectral channels should carry greater weight, while minor spectral segments should be given less weight. The additional attention weights can enhance the spectral correlation between spectral channels and have a powerful ability to capture context information in the sequence.
To assign appropriate weighting parameters to each spectral channel, highlight and distinguish valid features, obtain more relevant and noticeable information, and attenuate information that is not conducive to classification, a Bi-RNN attention network is introduced, so that the model can capture the correlation between internal spectral channels and classify better, making the trained model more accurate.
Step S6: the spatial feature information Z_O of the last fully connected layer of the 3D Octave convolution network in step S2 is extracted and combined with the new spectral information vector label y from the last fully connected layer of the Bi-RNN attention network in step S5 to form a new fully connected layer that outputs the feature vector.
Step S7: the feature vector is input into a 3-layer fully connected network comprising three fully connected layers; the first two of the three layers are normalized with Batch Normalization and then activated with the ReLU function; to prevent overfitting, the first two layers use the regularized Dropout method, and the last fully connected layer outputs the predicted classification result using Softmax.
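As an illustration of steps S6 and S7, here is a hedged tf.keras sketch of the fusion and the three-layer fully connected head. The layer widths (256, 128), the input feature sizes, and the 9-class output (matching the Pavia University experiments below) are assumptions for illustration, since the patent does not fix these sizes here; the 0.6 dropout rate echoes the best value reported in Table 4.

```python
import tensorflow as tf
from tensorflow.keras import layers

def classification_head(z_spatial, y_spectral, num_classes=9):
    x = layers.Concatenate()([z_spatial, y_spectral])   # step S6: feature fusion
    for units in (256, 128):                            # first two FC layers
        x = layers.Dense(units)(x)
        x = layers.BatchNormalization()(x)              # Batch Normalization
        x = layers.Activation('relu')(x)                # ReLU activation
        x = layers.Dropout(0.6)(x)                      # regularized Dropout
    return layers.Dense(num_classes, activation='softmax')(x)  # step S7 output

z_in = tf.keras.Input(shape=(128,))   # spatial feature vector Z_O (size assumed)
y_in = tf.keras.Input(shape=(64,))    # spectral information vector y (size assumed)
model = tf.keras.Model([z_in, y_in], classification_head(z_in, y_in))
model.summary()
```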
In this embodiment, four 3D Octave convolutions are used to extract the spatial information of the hyperspectral image, reducing the redundancy of spatial information; meanwhile, a Bi-RNN attention network extracts the spectral information of the hyperspectral image, with the attention network enhancing the importance of spectral bands carrying more spectral information, which improves classification accuracy with few training samples. The advantages of 3D Octave convolution and the Bi-RNN attention network are fully used to obtain highly accurate classification results under few training samples, and a parallel data processing mode accelerates the running speed of the model.
Example 1
The experimental hardware platform is a high-performance computer configured with an Intel Core i9-9900K @ 3.60 GHz (eight cores), 32 GB of memory, and an Nvidia GeForce RTX 2080Ti (11 GB) graphics card. The software platform is Python 3.6.0 and TensorFlow 1.14 under Windows 10.
First, dividing experimental data and samples
To evaluate the classification effect of the proposed method, the Pavia University dataset and the Botswana dataset were selected to verify the performance of the proposed method.
The Pavia University dataset is remote sensing image data acquired by a reflective optics imaging spectrometer sensor over the University of Pavia in northern Italy. The image size is 610 × 340 pixels, with 115 original spectral bands in the range 430-860 nm; after 12 noise bands are removed, the remaining 103 spectral bands are used for classification. The Pavia University dataset defines 9 semantic classes; the size of each class sample and the division into training and test samples are shown in Table 1.
The Botswana dataset was acquired by NASA's EO-1 satellite with the Hyperion imaging spectrometer on May 31, 2001. The image covers a 7.7 km long strip over the Okavango Delta in Botswana; the spatial resolution reaches 30 m and the spectral resolution 10 nm. The image originally comprises 242 bands; after the noise-affected bands are removed, the remaining 145 bands can be used for hyperspectral image classification. The image size is 1476 × 256, containing 14 different classes in total. The size of each class sample and the division into training and test samples are shown in Table 2.
The classification accuracy of the hyperspectral images is measured with three commonly used evaluation indices: overall accuracy (OA), average accuracy (AA), and the Kappa coefficient.
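For reference, here is a small sketch (an assumed implementation, not code from the patent) of how the three indices can be computed from a confusion matrix C, where C[i, j] counts samples of true class i predicted as class j.

```python
import numpy as np

def oa_aa_kappa(C):
    C = np.asarray(C, dtype=float)
    total = C.sum()
    oa = np.trace(C) / total                      # overall accuracy
    aa = np.mean(np.diag(C) / C.sum(axis=1))      # mean of per-class accuracies
    pe = (C.sum(axis=0) * C.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1 - pe)                  # chance-corrected agreement
    return oa, aa, kappa

C = np.array([[50,  2,  1],
              [ 3, 45,  2],
              [ 0,  4, 43]])
print(oa_aa_kappa(C))
```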
TABLE 1 training set and test set sample number for the Pavia University dataset
TABLE 2 training and test set sample numbers for the Botswana data set
Second, parameter setting
In the experiments, three parameters, namely the learning rate, the spatial size and the dropout rate, have a notable influence on the results. Taking the Pavia University dataset as an example, the experimental parameters are evaluated in detail below.
1) Learning rate: in the experiments we tested the effect of different learning rates. The learning rate determines how much of the assignment error is applied each time the model weights are updated. Too large a learning rate causes periodic oscillations in training, while too small a learning rate prevents the model from converging. Therefore the learning rate was selected from [0.01, 0.005, 0.001, 0.0007, 0.0005, 0.0003, 0.0001, 0.00007, 0.00005, 0.00003, 0.00001] for the experiments, and the results show that the classification effect is best at a learning rate of 0.0001.
2) Spatial size: the extraction of image spatial features depends heavily on the size of the spatial domain. A larger spatial input provides more opportunity to learn spatial features, but a larger spatial region also brings unnecessary information and the possibility of over-smoothing the image. Selecting an appropriate spatial size is therefore very important for improving classification performance. With the number of spectral channels fixed, the optimal learning rate of 0.0001, a batch size of 32, and 100 training iterations, the classification accuracy results for different spatial sizes are shown in Table 3.
As can be seen from Tables 3 and 4, the classification effect is best when the spatial size of the input data is 15 × 15 and the dropout rate is 0.6. To optimize classification performance, the experiments select this best dropout rate.
Likewise, on the Botswana dataset, to optimize classification performance the learning rate was set to 0.0001, the spatial size to 13 × 13, the batch size to 16, and the number of training iterations to 400.
TABLE 3 Classification accuracy under different spatial dimensions
TABLE 4 Classification accuracy at different dropout rates
Third, experimental results
To ensure the accuracy of the experimental results, each experiment was repeated 10 times and the average taken.

To verify the effectiveness and superiority of the method, the invention is compared in experiments with traditional methods and mainstream deep learning methods (SVM, 2D-CNN, ARNN, SSAN, 3DOC-SSAN). The results of the classification performance comparison of the different methods on the Pavia University dataset are shown in Table 5.
As the results in Table 5 show, on the Pavia University dataset the proposed method clearly outperforms the traditional SVM, and its OA, AA, and Kappa values are all higher than those of the other mainstream deep learning classification methods. The OA value is 9.50% higher than SVM, 0.70% higher than 2D-CNN, 1.74% higher than ARNN, 0.97% higher than SSAN, and 0.10% higher than 3DOC-SSAN. The AA value is 6.28% higher than SVM, 0.55% higher than 2D-CNN, 0.79% higher than ARNN, 0.66% higher than SSAN, and 0.09% higher than 3DOC-SSAN. The Kappa value is 11.02% higher than SVM, 1.61% higher than 2D-CNN, 1.47% higher than ARNN, 2.37% higher than SSAN, and 0.02% higher than 3DOC-SSAN. These three indices show that the proposed method outperforms the other methods in classification performance.
TABLE 5 Classification Performance of different methods on the Pavia University dataset
The classification maps of the different methods on the Pavia University dataset are shown in fig. 4. As can be seen, the final classification results of SVM, ARNN, SSAN and 2D-CNN all contain a large number of scattered speckles, and some regions are misclassified. The 3DOC-SSAN method classifies well, but a few speckles remain in the lower right and upper left corners. In the classification map of the proposed method, the ground objects are essentially classified completely and correctly, speckles are hardly visible, and the homogeneous areas are relatively smooth.
The results of the classification performance comparison of the different methods on the Botswana dataset are shown in Table 6, and the corresponding classification maps are shown in fig. 5.
As can be seen from Table 6, on the Botswana dataset the proposed method is more accurate than the other methods on all three indices, OA, AA and Kappa. The OA value is 8.88% higher than SVM, 1.65% higher than 2D-CNN, 2.77% higher than ARNN, 1.79% higher than SSAN, and 0.36% higher than 3DOC-SSAN. The AA value is 9.67% higher than SVM, 1.95% higher than 2D-CNN, 2.71% higher than ARNN, 1.69% higher than SSAN, and 0.34% higher than 3DOC-SSAN. The Kappa value is 7.57% higher than SVM, 1.81% higher than 2D-CNN, 3.01% higher than ARNN, 1.94% higher than SSAN, and 0.38% higher than 3DOC-SSAN. Classification accuracy reaches 100% for 11 of the classes; apart from Floodplain Grasses 1 at 97.38%, the remaining two classes also exceed 99.88%.
TABLE 6 Classification Performance of different methods on Botswana datasets
Meanwhile, as can be seen from Tables 5 and 6, the 3D Octave convolution classification method 3DOC-SSAN and the proposed method clearly outperform the 2D-CNN, ARNN and SSAN classification methods, which demonstrates that 3D Octave convolution has definite advantages in reducing spatially redundant information and improving classification performance. The proposed method also classifies better than the 3DOC-SSAN method, which does not include a Bi-RNN attention network, showing that the Bi-RNN attention network has definite advantages in enhancing the extraction of spectral feature information and benefits classification performance.
In addition, compared with the 3DOC-SSAN model, the model of the proposed method needs no extra spatial attention network module, so it is relatively simple, and it can be processed in parallel during training, running faster when parallel computation is used. The data flow of the 3DOC-SSAN method is serial: the hyperspectral data must first be input into the Octave convolution model for preprocessing, then fed separately into the spectral and spatial attention networks to extract the spatial-spectral features, after which the feature information is fused by a data fusion module and finally classified. The proposed method differs in that its data streams are parallel. Meanwhile, one run of the Bi-RNN attention network is about 3 times faster than one run of the 3D Octave convolution network. With parallel operation, both task-based and data-based parallel processing modes are applicable. After the 3D Octave convolution network finishes, the extracted spatial-spectral feature information can be fused directly into the network; no extra time is spent running spatial and spectral attention networks, which greatly reduces the running time of the model in comparison.
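As a rough illustration of the task-level parallelism claimed here, the two branches can be launched concurrently and joined only at the fusion step. The helper names and dummy computations below are assumptions, and a real implementation would rely on the deep learning framework's own parallel execution rather than Python threads.

```python
from concurrent.futures import ThreadPoolExecutor

def spatial_branch(cube):      # stands in for the 3D Octave convolution path
    return sum(cube) * 0.5     # dummy computation

def spectral_branch(cube):     # stands in for the Bi-RNN attention path
    return sum(cube) * 2.0     # dummy computation

cube = [1.0, 2.0, 3.0]         # placeholder for a hyperspectral pixel cube
with ThreadPoolExecutor(max_workers=2) as ex:
    f_spatial = ex.submit(spatial_branch, cube)
    f_spectral = ex.submit(spectral_branch, cube)
    fused = (f_spatial.result(), f_spectral.result())  # join at the fusion step
print(fused)
```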
Fourth, conclusion
To reduce the redundancy of spatial feature information, enhance the acquisition of spectral information, and improve the classification performance for hyperspectral images, the invention proposes a new model based on 3D Octave convolution and a Bi-RNN attention network. The model has a simple structure, requires no complex preprocessing or postprocessing of the hyperspectral image data, and can be trained end to end. Experiments show that its classification performance is clearly improved over traditional methods, and compared with current mainstream deep learning algorithms the proposed method extracts the spatial and spectral feature information more fully and classifies better.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (7)

1. A hyperspectral image classification method, the hyperspectral image being a remote sensing image captured by an aerial camera, characterized in that the hyperspectral image classification method is based on 3D Octave convolution and a Bi-RNN attention network, where Bi-RNN is a bidirectional recurrent neural network, and comprises the following steps:
step S1, acquiring a hyperspectral remote sensing image to be classified;
step S2, obtaining spatial feature information Z_O for the hyperspectral image by using 4 or more consecutive 3D Octave convolutions; the number of 3D Octave convolutions is preferably 4;
step S3, regarding the hyperspectral data output after step S1 as an ordered spectral vector and, in parallel with step S2, inputting the spectral sequence band by band into the bidirectional hidden layers, connecting the output state of the forward hidden layer and the output state of the reverse hidden layer through a concatenation function to obtain a vector g_n;
step S4, taking the connected bidirectional hidden layer output vector g_n as the input of the attention module; the probability weight W_i obtained by random initialization of the attention mechanism is multiplied by the vector g_n, the offset parameter b_i is added, and after the tanh activation function the attention weight parameter β is calculated by a softmax function;
step S5, multiplying the attention weight parameter β by the corresponding values of the vector g_n obtained in step S3 and then summing the products to obtain a new spectral information vector label y;
step S6, extracting the spatial feature information Z_O of the last fully connected layer of the 3D Octave convolution network in step S2 and combining it with the new spectral information vector label y obtained from the last fully connected layer of the Bi-RNN attention network in step S5 to form a new fully connected layer and output a feature vector;
and step S7, inputting the feature vector into a network of two or more fully connected layers, preferably 2-5 layers, more preferably 3 layers, and predicting the classification result through a softmax layer.
2. The hyperspectral image classification method according to claim 1, wherein step S2 comprises:
let the size of the image used for the hyperspectral image classification be W × H × L;
reshaping the hyperspectral image classification data into X with size L × N, where N = W × H;
the hyperspectral data X serves as the input of the 3D Octave convolution network; assume the input data and output data of the Octave convolution network are X = {X^H, X^L} and Z = {Z^H, Z^L}, respectively, where H and L denote high-frequency and low-frequency information; that is, the input hyperspectral data X and the data Z output after processing by the 3D Octave convolution network can each be represented as the sum of the corresponding high-frequency and low-frequency information;
the Octave convolution model is built as follows:
ZH=ZH→H+ZL→Hand ZL=ZL→L+ZH→L
Wherein Z isH→H,ZL→LRepresenting the updating of hyperspectral image data information in high and low frequency, respectively, ZL →H,ZH→LRespectively representing the conversion of the hyperspectral image data information between low-frequency and high-frequency frequencies and between high-frequency and low-frequency frequencies;
in order to update and convert the high-frequency characteristic information and the low-frequency characteristic information of the hyperspectral image, the weight parameter corresponding to the Octave convolution model is assumed to be W ═ WH,WL](ii) a Likewise, the weight parameter WHAnd WLAre respectively defined as WL=[WL→L,WH→L],WH=[WH →H,WL→H]Wherein W isH→H,WL→LIndicating the information update weight, W, within the corresponding frequencyH→L,WL→HRepresenting information between corresponding frequenciesConverting the weight;
from the above, Z^H and Z^L are obtained as:

Z^H = Σ(W^{H→H})^T X^H + up(Σ(W^{L→H})^T X^L)   (1)

Z^L = Σ(W^{L→L})^T X^L + Σ(W^{H→L})^T pool(X^H)   (2)

where T in formulas (1) and (2) denotes matrix transposition, up denotes the up-sampling operation, and pool denotes the average pooling operation;
calculating the Octave convolution network output Z, where the expression of Z is:

Z = [Z^L, Z^H]
  = [(Z^{L→L} + Z^{H→L}), (Z^{H→H} + Z^{L→H})]
  = [Σ(W^L)^T X, Σ(W^H)^T X]
  = [Σ(W^{L→L})^T X^L + Σ(W^{H→L})^T pool(X^H), Σ(W^{H→H})^T X^H + up(Σ(W^{L→H})^T X^L)].
3. the hyperspectral image classification method according to claim 1, wherein the step S3 comprises:
let the hyperspectral input data X be an ordered spectral vector, X = (X_1, X_2, X_3, ..., X_n), and calculate the bidirectional hidden layer output h_n of the Bi-RNN network as follows:

h_n^→ = f(W^→ X_n + V^→ h_{n-1}^→)   (3)

h_n^← = f(W^← X_n + V^← h_{n+1}^←)   (4)

in formulas (3) and (4), n ranges over the spectral bands 1 to m; the coefficient matrices W^→, W^← act on the input of the current hidden layer, h_{n-1}^→ denotes the previous hidden state, and the reverse pass starts from the subsequent hidden state h_{n+1}^←; f is the nonlinear activation function of the hidden layer; taking the output of the encoder as the input of the vector g_n, g_n is calculated as:

g_n = concat(h_n^→, h_n^←)   (5)

where concat() is the concatenation function between the forward hidden state function and the reverse hidden state function.
4. The hyperspectral image classification method according to claim 1, wherein the step S4 comprises:
acquiring weight values of different spectral information, where the weight of the attention layer is calculated as follows:

e_{in} = tanh(W_i g_n + b_i)   (6)

β_{in} = softmax(W_i' e_{in} + b_i')   (7)

in formulas (6) and (7), W_i and W_i' are transformation matrices, b_i and b_i' are bias terms, and softmax() maps non-normalized output values to a probability distribution, constraining the output values to the (0, 1) interval; formula (6) is a one-layer neural network that rearranges the state vector space of the Bi-RNN, which the tanh activation then converts to e_{in} as a new hidden representation of h_n; formula (7) generates the attention weight β through the softmax layer, where β_{in} is one component of the attention weight parameter β, namely the i-th weight parameter; the importance of the input is measured by the correlation between e_{in}, an intermediate parameter, and another channel vector.
5. The hyperspectral image classification method according to claim 1, wherein the step S5 comprises:
calculating the prediction label y_n of pixel X_n:

y_n = U[g_n, β]   (8)

where U() is the sum function of all state vectors weighted by the corresponding attention weights; the prediction label y_n of pixel X_n is a component of the spectral information vector label y.
6. The hyperspectral image classification method according to claim 1, wherein the step S7 comprises: inputting the feature vector into a 3-layer fully connected network comprising three fully connected layers, normalizing the first two of the three layers with Batch Normalization, activating them with the ReLU function and then applying the regularized Dropout method, and outputting the predicted classification result with Softmax in the last fully connected layer.
7. The hyperspectral image classification method according to any of claims 1 to 6, wherein the hyperspectral image classification method is performed using a hyperspectral image classification system comprising a hyperspectral image module (1), a 3D Octave convolution network module (2), a Bi-RNN attention network module (3), a spatio-spectral feature fusion network module (4) and a classification image module (5); the step S1 is performed in the hyperspectral image module (1), the step S2 in the 3D Octave convolution network module (2), the steps S3 to S5 in the Bi-RNN attention network module (3), the step S6 in the spatio-spectral feature fusion network module (4), and the step S7 in the classification image module (5).
CN202011468157.9A 2020-12-14 2020-12-14 Hyperspectral image classification method Active CN112464891B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011468157.9A CN112464891B (en) 2020-12-14 2020-12-14 Hyperspectral image classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011468157.9A CN112464891B (en) 2020-12-14 2020-12-14 Hyperspectral image classification method

Publications (2)

Publication Number Publication Date
CN112464891A true CN112464891A (en) 2021-03-09
CN112464891B CN112464891B (en) 2023-06-16

Family

ID=74803979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011468157.9A Active CN112464891B (en) 2020-12-14 2020-12-14 Hyperspectral image classification method

Country Status (1)

Country Link
CN (1) CN112464891B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887328A (en) * 2021-09-10 2022-01-04 天津理工大学 Method for extracting space-time characteristics of photonic crystal space transmission spectrum in parallel by ECA-CNN fusion dual-channel RNN
CN114220002A (en) * 2021-11-26 2022-03-22 通辽市气象台(通辽市气候生态环境监测中心) Method and system for monitoring invasion of foreign plants based on convolutional neural network
CN115979973A (en) * 2023-03-20 2023-04-18 湖南大学 Hyperspectral traditional Chinese medicinal material identification method based on dual-channel compression attention network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268195A1 (en) * 2016-01-27 2018-09-20 Shenzhen University Gabor cube feature selection-based classification method and system for hyperspectral remote sensing images
CN110516596A (en) * 2019-08-27 2019-11-29 西安电子科技大学 Empty spectrum attention hyperspectral image classification method based on Octave convolution
CN111507409A (en) * 2020-04-17 2020-08-07 中国人民解放军战略支援部队信息工程大学 Hyperspectral image classification method and device based on depth multi-view learning
CN111898662A (en) * 2020-07-20 2020-11-06 北京理工大学 Coastal wetland deep learning classification method, device, equipment and storage medium
CN111965116A (en) * 2020-07-21 2020-11-20 天津大学 Hyperspectrum-based airport gas detection system and method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268195A1 (en) * 2016-01-27 2018-09-20 Shenzhen University Gabor cube feature selection-based classification method and system for hyperspectral remote sensing images
CN110516596A (en) * 2019-08-27 2019-11-29 西安电子科技大学 Empty spectrum attention hyperspectral image classification method based on Octave convolution
CN111507409A (en) * 2020-04-17 2020-08-07 中国人民解放军战略支援部队信息工程大学 Hyperspectral image classification method and device based on depth multi-view learning
CN111898662A (en) * 2020-07-20 2020-11-06 北京理工大学 Coastal wetland deep learning classification method, device, equipment and storage medium
CN111965116A (en) * 2020-07-21 2020-11-20 天津大学 Hyperspectrum-based airport gas detection system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhou Yugu; Wang Ping; Gao Yinghui: "Remote sensing image classification method based on a visual bag-of-words model", Journal of Chongqing University of Technology (Natural Science), no. 05

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887328A (en) * 2021-09-10 2022-01-04 天津理工大学 Method for extracting space-time characteristics of photonic crystal space transmission spectrum in parallel by ECA-CNN fusion dual-channel RNN
CN114220002A (en) * 2021-11-26 2022-03-22 通辽市气象台(通辽市气候生态环境监测中心) Method and system for monitoring invasion of foreign plants based on convolutional neural network
CN114220002B (en) * 2021-11-26 2022-11-15 通辽市气象台(通辽市气候生态环境监测中心) Method and system for monitoring invasion of foreign plants based on convolutional neural network
CN115979973A (en) * 2023-03-20 2023-04-18 湖南大学 Hyperspectral traditional Chinese medicinal material identification method based on dual-channel compression attention network

Also Published As

Publication number Publication date
CN112464891B (en) 2023-06-16

Similar Documents

Publication Publication Date Title
CN107316013B (en) Hyperspectral image classification method based on NSCT (non-subsampled Contourlet transform) and DCNN (data-to-neural network)
Fu et al. DSAGAN: A generative adversarial network based on dual-stream attention mechanism for anatomical and functional image fusion
CN112464891A (en) Hyperspectral image classification method
Wang et al. Dual-channel capsule generation adversarial network for hyperspectral image classification
Li et al. Fast infrared and visible image fusion with structural decomposition
CN111738124A (en) Remote sensing image cloud detection method based on Gabor transformation and attention
CN108491849A (en) Hyperspectral image classification method based on three-dimensional dense connection convolutional neural networks
CN109598732B (en) Medical image segmentation method based on three-dimensional space weighting
Zhang et al. Symmetric all convolutional neural-network-based unsupervised feature extraction for hyperspectral images classification
Jiang et al. Hyperspectral image classification with CapsNet and Markov random fields
CN110443296B (en) Hyperspectral image classification-oriented data adaptive activation function learning method
CN116434069A (en) Remote sensing image change detection method based on local-global transducer network
CN115311508A (en) Single-frame image infrared dim target detection method based on depth U-type network
CN112733589B (en) Infrared image pedestrian detection method based on deep learning
US20180114109A1 (en) Deep convolutional neural networks with squashed filters
Lin et al. A frequency-domain convolutional neural network architecture based on the frequency-domain randomized offset rectified linear unit and frequency-domain chunk max pooling method
Wang et al. Spectral-spatial global graph reasoning for hyperspectral image classification
CN116482618B (en) Radar active interference identification method based on multi-loss characteristic self-calibration network
Luo et al. Infrared and visible image fusion based on VPDE model and VGG network
CN115909086A (en) SAR target detection and identification method based on multistage enhanced network
Kuang et al. A spectral-spatial attention aggregation network for hyperspectral imagery classification
CN115035408A (en) Unmanned aerial vehicle image tree species classification method based on transfer learning and attention mechanism
Tian et al. Object feedback and feature information retention for small object detection in intelligent transportation scenes
CN114463235A (en) Infrared and visible light image fusion method and device and storage medium
Chen et al. Infrared and visible image fusion using two-layer generative adversarial network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant