CN112464891B - Hyperspectral image classification method

Hyperspectral image classification method

Info

Publication number
CN112464891B
CN112464891B (application CN202011468157.9A)
Authority
CN
China
Prior art keywords
hyperspectral
network
information
vector
hyperspectral image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011468157.9A
Other languages
Chinese (zh)
Other versions
CN112464891A (en)
Inventor
Liang Lianhui (梁联晖)
Li Jun (李军)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202011468157.9A priority Critical patent/CN112464891B/en
Publication of CN112464891A publication Critical patent/CN112464891A/en
Application granted granted Critical
Publication of CN112464891B publication Critical patent/CN112464891B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/194 Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a hyperspectral image classification method that combines the advantages of 3D Octave convolution and a Bi-RNN attention network. It first uses 3D Octave convolution to acquire the spatial features of the hyperspectral image while reducing spatially redundant information, then uses a Bi-RNN spectral attention network to extract the spectral information of the hyperspectral image, fuses the spatial and spectral feature maps through a fully connected layer, and finally outputs the classification result through softmax. The method achieves accurate classification of hyperspectral remote sensing images with few training samples and accelerates the model by processing data in parallel.

Description

Hyperspectral image classification method
Technical Field
The invention belongs to the field of hyperspectral image processing within remote sensing, and particularly relates to a hyperspectral image classification method.
Background
Hyperspectral remote sensing is an interdisciplinary technology spanning computer science, geography and other fields. A hyperspectral imager images in narrow spectral intervals across different electromagnetic wavelength ranges to obtain spectral curves that can invert the spectral characteristics of ground objects. Data from hundreds of spectral bands are recorded at the same spatial resolution, forming a three-dimensional hyperspectral image containing a large amount of spatial and spectral information. A hyperspectral image uses two-dimensional spatial imaging to represent the reflectance of surface objects in a single band, and the reflectances of many bands, combined in order, form a multi-layered, approximately continuous spectral vector dimension. Each hyperspectral pixel is characterized by such a spectral vector: every pixel is a continuous spectral curve that records the observed ground-object information in detail. Because hyperspectral images describe both the spectral and the spatial information of ground objects in detail, hyperspectral image classification has, with the development of classification technology, been widely applied in environmental monitoring, urban and rural planning, mineral exploitation, national defense construction, precision agriculture and other fields.
Hyperspectral image classification methods can be roughly divided into classification methods based on spectral information, classification methods based on joint spatial-spectral features, and deep learning classification methods. The first class uses only the spectral-dimension information in the hyperspectral image and ignores the spatial correlation between pixels. The second class improves classification performance to some extent, but depends largely on hand-crafted features: the classification map is determined mainly by low-level features, which cannot represent the complex content of a hyperspectral image, so classification performance is limited. Compared with these two traditional shallow approaches, the third class has stronger characterization and generalization ability, can extract deeper image features, and obtains more discriminative features and hence good classification results. However, although these methods achieve better classification results, models based on convolutional neural networks carry a large amount of redundant spatial-dimension information, which to some extent seriously affects model performance. Meanwhile, in deep learning, manually labeling hyperspectral remote sensing images requires substantial manpower and material resources, so few labeled samples exist. Therefore, learning the spatial and spectral characteristics of hyperspectral remote sensing images while reducing spatial information redundancy and using few training samples, and thereby improving classification accuracy, is of great significance.
All current model methods that classify hyperspectral images with Octave convolution address only the reduction of redundant spatial feature information. For extracting the spectral information of hyperspectral images, they rely either on the Octave convolution itself or on convolutional-neural-network-based methods. Both approaches treat the spectral information of the hyperspectral data as an unordered high-dimensional vector for processing, which does not match the nature of spectral data and damages the correlation between spectra, so the extraction of spectral information is impaired and spectral feature information cannot be extracted accurately.
Meanwhile, existing methods that perform hyperspectral classification with a Bi-RNN (bidirectional recurrent neural network) cannot avoid a large amount of redundant spatial feature information, so the spatial-dimension information of the image cannot be extracted accurately and classification accuracy suffers. In addition, existing methods that classify hyperspectral images with Octave convolution have serial data flows and cannot be processed in parallel.
For example, patent application CN 202010066659.2 discloses a hyperspectral remote sensing image classification method based on mixed 3-dimensional and 2-dimensional convolution, which comprises: obtaining the hyperspectral remote sensing image to be classified; performing spectral dimensionality reduction with principal component analysis; arranging the spectral bands of the reduced image from the middle of the channel dimension outward in descending order of spectral information content; assigning each band a weight according to its spectral information content; taking a cube of fixed spatial size around each pixel, extracting spectral-spatial features from the cube with 3-dimensional convolution, and fusing the spectral information with 2-dimensional convolution to obtain the final feature map; extracting second-order information from the feature map by covariance pooling and outputting feature vectors; and feeding the feature vector into a three-layer fully connected network to obtain the predicted classification. That method combines the advantages of 3-dimensional and 2-dimensional convolution and achieves accurate classification of hyperspectral remote sensing images with few training samples. However, its step S2 requires dimensionality-reduction preprocessing of the spectrum, making the method and model relatively complex; moreover, it cannot avoid the problem of spatial information redundancy, and its ability to extract spectral information is not high enough.
Accordingly, there is a need in the art for a new hyperspectral image classification method to address the above-described problems.
Disclosure of Invention
The invention provides a hyperspectral image classification method based on 3D Octave convolution and a Bi-RNN attention network, which combines the advantages of the two and achieves accurate classification of hyperspectral images with few training samples.
To this end, the invention provides a hyperspectral image classification method, the hyperspectral image being a remote sensing image captured by an aerial camera. The method is based on 3D Octave convolution and a Bi-RNN attention network, where Bi-RNN denotes a bidirectional recurrent neural network, and comprises the following steps:
Step S1, acquiring the hyperspectral remote sensing image to be classified;
Step S2, using 4 or more consecutive 3D Octave convolutions to acquire the spatial feature information Z_O of the hyperspectral image; the number of 3D Octave convolutions is preferably 4;
Step S3, treating the hyperspectral data output after step S1 as an ordered spectral vector and, in parallel with step S2, inputting the spectral sequence band by band into the bidirectional hidden layer, and connecting the states output by the forward and reverse hidden layers through a concatenation function to obtain the vector g_n;
Step S4, taking the vector g_n output by the connected bidirectional hidden layers as the input of the attention module; multiplying the probability weight W_i, obtained by random initialization of the attention mechanism, with the vector g_n, adding a bias parameter b_i, applying the tanh activation function, and then computing the attention weight parameter β through a softmax function;
Step S5, multiplying the attention weight parameter β with the corresponding values of the vector g_n obtained in step S3 and summing the products to obtain a new spectral information vector label y;
Step S6, combining the spatial feature information Z_O extracted by the last fully connected layer of the 3D Octave convolutional network in step S2 with the new spectral information vector label y obtained by the last fully connected layer of the Bi-RNN attention network in step S5 to form a new fully connected layer and output a feature vector;
Step S7, inputting the feature vector into a network of two or more fully connected layers, the number of fully connected layers preferably being 2–5 and more preferably 3, and predicting the classification result through a softmax layer; an end-to-end sketch of these steps follows.
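The following minimal Python sketch mirrors the S1–S7 data flow with stand-in computations; the patch size, class count, helper names and the toy branch implementations are illustrative assumptions, not the patented network itself:

    # Minimal sketch of the S1-S7 flow (hypothetical shapes and helpers).
    import numpy as np

    def spatial_branch(cube):                  # S2: stand-in for the 3D Octave stack
        return cube.mean(axis=(0, 1))          # one feature per band (placeholder)

    def spectral_branch(cube):                 # S3-S5: stand-in for Bi-RNN attention
        spectrum = cube.reshape(-1, cube.shape[-1]).mean(axis=0)
        weights = np.exp(spectrum) / np.exp(spectrum).sum()  # toy attention weights
        return weights * spectrum              # re-weighted spectral vector

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    cube = np.random.rand(15, 15, 103)         # S1: one 15x15 patch, 103 bands (Pavia)
    z_o = spatial_branch(cube)                 # S2 (can run in parallel with S3-S5)
    y = spectral_branch(cube)                  # S3-S5
    fused = np.concatenate([z_o, y])           # S6: spatial-spectral feature fusion
    logits = np.random.rand(9, fused.size) @ fused  # S7: stand-in fully connected layers
    print(softmax(logits))                     # probabilities over 9 classes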
In a specific embodiment, step S2 includes:
let the size of the image used for hyperspectral image classification be w×h×l;
reshape the hyperspectral image classification data to X, of size l×n, where n = w×h;
taking the hyperspectral data X as the input of the 3D Octave convolutional network, assume the input and output data of the Octave convolutional network are X = {X^H, X^L} and Z = {Z^H, Z^L}, respectively, where H and L denote high-frequency and low-frequency information; that is, the input hyperspectral data X and the data Z output after processing by the 3D Octave convolutional network can each be expressed as the sum of corresponding high-frequency and low-frequency information;
the Octave convolution model is built as follows:
Z H =Z H→H +Z L→H and Z L =Z L→L +Z H→L
Wherein Z is H→H ,Z L→L Update of hyperspectral image data information in high frequency and low frequency, Z L→H ,Z H→L Representing the conversion of hyperspectral image data information between low frequency and high frequency and between high frequency and low frequency, respectively;
in order to complete the updating and conversion of the high-frequency characteristic information and the low-frequency characteristic information of the hyperspectral image, it is assumed that the weight parameter corresponding to the Octave convolution model is W= [ W ] H ,W L ]The method comprises the steps of carrying out a first treatment on the surface of the Likewise, the weight parameter W H And W is L Respectively defined as W L =[W L→L ,W H→L ],W H =[W H→H ,W L→H ]Wherein W is H→H ,W L→L Representing the information update weight, W, in the corresponding frequency H→L ,W L→H Information conversion weights between corresponding frequencies are represented;
from the above, the expressions for Z^H and Z^L are obtained as:

Z^H = Σ(W^{H→H})^T X^H + up(Σ(W^{L→H})^T X^L)   (1)
Z^L = Σ(W^{L→L})^T X^L + Σ(W^{H→L})^T pool(X^H)   (2)

where, in formulas (1) and (2), T denotes matrix transposition, up denotes the upsampling operation, and pool denotes the average pooling operation;
the output Z of the Octave convolutional network is then computed as:

Z = [Z^L, Z^H]
  = [(Z^{L→L} + Z^{H→L}), (Z^{H→H} + Z^{L→H})]
  = [Σ(W^L)^T X, Σ(W^H)^T X]
  = [Σ(W^{L→L})^T X^L + Σ(W^{H→L})^T pool(X^H), Σ(W^{H→H})^T X^H + up(Σ(W^{L→H})^T X^L)].
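As an illustration of formulas (1) and (2), the following numpy sketch performs one Octave step, simplifying the 3-D convolutions to 1×1 projections (plain matrix multiplications) with 2×2 average pooling and nearest-neighbour upsampling; the channel split and spatial sizes are assumptions:

    import numpy as np

    def pool2(x):
        # 2x2 average pooling over the two leading spatial axes
        h, w, c = x.shape
        return x[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2, c).mean(axis=(1, 3))

    def up2(x):
        # nearest-neighbour upsampling by 2 over the two leading spatial axes
        return x.repeat(2, axis=0).repeat(2, axis=1)

    def octave_step(x_h, x_l, w_hh, w_hl, w_lh, w_ll):
        # Eq. (1): Z^H = (W^{H->H})^T X^H + up((W^{L->H})^T X^L)
        z_h = x_h @ w_hh + up2(x_l @ w_lh)
        # Eq. (2): Z^L = (W^{L->L})^T X^L + (W^{H->L})^T pool(X^H)
        z_l = x_l @ w_ll + pool2(x_h) @ w_hl
        return z_h, z_l

    c_in, c_h, c_l = 8, 6, 2                   # hypothetical channel split
    x_h = np.random.rand(16, 16, c_in)          # high-frequency feature map
    x_l = np.random.rand(8, 8, c_in)            # low-frequency map at half resolution
    w_hh, w_hl = np.random.rand(c_in, c_h), np.random.rand(c_in, c_l)
    w_lh, w_ll = np.random.rand(c_in, c_h), np.random.rand(c_in, c_l)
    z_h, z_l = octave_step(x_h, x_l, w_hh, w_hl, w_lh, w_ll)
    print(z_h.shape, z_l.shape)                 # (16, 16, 6) (8, 8, 2)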
in a specific embodiment, the step S3 includes:
let the hyperspectral input data X be an ordered spectral vector, X = (X_1, X_2, X_3, ..., X_n), and compute the bidirectional hidden-layer output h_n of the Bi-RNN network as follows (superscripts f and b denote the forward and reverse directions):

h_n^f = f(W^f X_n + V^f h_{n-1}^f)   (3)
h_n^b = f(W^b X_n + V^b h_{n+1}^b)   (4)

in formulas (3) and (4), n ranges over the spectral bands 1–m; the coefficient matrices W^f and W^b act on the input of the current hidden layer, V^f carries the previous hidden state h_{n-1}, and V^b starts from the subsequent hidden state h_{n+1}; f is the nonlinear activation function of the hidden layer; the encoder output is taken as the input of the vector g_n, computed as:

g_n = concat(h_n^f, h_n^b)   (5)

where concat() is the concatenation function between the forward and reverse hidden-state outputs.
In a specific embodiment, the step S4 includes:
the weight values of different spectrum information are obtained, and the weight of the attention layer is calculated as follows:
e in =tanh(W i g n +b i ) (6)
β in =softmax(W i 'e in +b i ') (7)
in the formula (6) and the formula (7), W i And W is i ' is a transform matrix, b i And b i ' is the bias term and softmax () is the mapping of non-normalized output values to probability distributions, with the output values constrained to be within the (0, 1) interval; equation (6) is a neural network with the state vector space of Bi-RNNs rearranged and then tanh activated to convert it to e in As a new hidden representation of hn; formula (7) produces a concentration weight β through the softmax layer, said β in As a constituent of one of the attention weighting parameters β, i.e. the i-th weighting parameter, where we are based on e in Correlation with another channel vector measures the importance of the input, e in Is an intermediate parameter.
In a specific embodiment, the step S5 includes:
calculating the prediction label y_n for a pixel X:

y_n = U[g_n, β]   (8)

where U() is a summation function over all state vectors under the corresponding attention weights; the prediction label y_n of the pixel X is a component of the spectral information vector label y.
In a specific embodiment, step S7 includes: inputting the feature vector into a 3-layer fully connected network comprising three fully connected layers, where the first two fully connected layers are normalized with Batch_normal, activated by the relu function, and then regularized with the Dropout method, and the last fully connected layer outputs the predicted classification result using Softmax.
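A possible tf.keras realization of this classification head is sketched below; the hidden-layer widths 256 and 128 are our assumptions, since the patent does not specify them:

    import tensorflow as tf
    from tensorflow.keras import layers, models

    def classification_head(fused_dim, num_classes, drop=0.6):
        # Three fully connected layers: the first two use BatchNorm -> ReLU
        # -> Dropout, the last outputs class probabilities through softmax.
        return models.Sequential([
            layers.Dense(256, input_shape=(fused_dim,)),   # width 256 is assumed
            layers.BatchNormalization(),
            layers.Activation("relu"),
            layers.Dropout(drop),
            layers.Dense(128),                             # width 128 is assumed
            layers.BatchNormalization(),
            layers.Activation("relu"),
            layers.Dropout(drop),
            layers.Dense(num_classes, activation="softmax"),
        ])

    head = classification_head(fused_dim=64, num_classes=9)  # 9 classes for Pavia
    head.summary()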
In a specific embodiment, the hyperspectral image classification method is carried out using a hyperspectral image classification system comprising a hyperspectral image module (1), a 3D Octave convolutional network module (2), a Bi-RNN attention network module (3), a spatial-spectral feature fusion network module (4) and a classified image module (5); step S1 is completed in the hyperspectral image module (1), step S2 in the 3D Octave convolutional network module (2), steps S3–S5 in the Bi-RNN attention network module (3), step S6 in the spatial-spectral feature fusion network module (4), and step S7 in the classified image module (5).
The invention has at least the following beneficial effects: in the hyperspectral image classification method based on 3D Octave convolution and the Bi-RNN attention network, the four 3D Octave convolutions acquire the spatial features of the hyperspectral image while reducing spatially redundant information, and the Bi-RNN spectral attention network then extracts the spectral information of the hyperspectral image, enhancing the importance of spectral bands with higher information content. This improves classification accuracy with few training samples, fully exploits the advantages of 3D Octave convolution and the Bi-RNN attention network, markedly improves classification accuracy, and accelerates the model by processing data in parallel.
Drawings
FIG. 1 is a block diagram of a hyperspectral image classification method based on 3D Octave convolution and Bi-RNN attention network according to the present invention.
Fig. 2 is a flow chart of the 3D Octave convolution of the present application.
Fig. 3 is a flow chart of the Bi-RNN attention network of the present application.
FIG. 4 shows classification maps of the different methods on the Pavia University dataset: (a) pseudo-color map, (b) ground-truth map, (c) SVM, (d) 2D-CNN, (e) ARNN, (f) SSAN, (g) 3DOC-SSAN, (h) the method of the invention.
FIG. 5 shows classification maps of the different methods on the Botswana dataset: (a) pseudo-color map, (b) ground-truth map, (c) SVM, (d) 2D-CNN, (e) ARNN, (f) SSAN, (g) 3DOC-SSAN, (h) the method of the invention.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
In one embodiment, a hyperspectral image classification method based on 3D Octave convolution and a Bi-RNN attention network is provided; the method fully exploits the advantages of both to achieve highly accurate classification with few training samples.
Specifically, as shown in fig. 1, the hyperspectral remote sensing image classification method based on the 3D Octave convolution and Bi-RNN attention network in this embodiment includes the following steps:
and S1, acquiring hyperspectral remote sensing images to be classified.
By Z 1 、Z 2 、Z 3 And Z 4 These 4 3D Octave convolutions acquire spatial features for hyperspectral images while reducing spatial redundancy information as shown in fig. 2. In this example, the spatial feature information is acquired through each 3D Octave convolution, and the specific steps are as follows, see step S2.
In an embodiment, the procedure for obtaining spatial features for hyperspectral images by the provided 3D actave convolution is as follows:
low frequency signal X in input data X of 1 st 3D Octave convolution L Set to 0;
computing the 1 st 3D Octave convolutional network output Z 1 ,Z 1 The expression of (2) is as follows:
Z 1 =[Z 1 L ,Z 1 H ]
=[(0+Z 1 H→L ),(Z 1 H→H +0)]
=[∑(W 1 H→L ) T pool(X H ),∑(W 1 H→H ) T X H ]
input data X of the 2 nd 3D Octave convolution 2 Is Z 1 Wherein Z is 1 H→H Represents a high frequency part, Z 1 H→L Representing the low frequency part.
compute the output Z_2 of the 2nd 3D Octave convolutional network:

Z_2 = [Z_2^L, Z_2^H]
    = [(Z_2^{L→L} + Z_2^{H→L}), (Z_2^{H→H} + Z_2^{L→H})]
    = [Σ(W_2^L)^T Z_1, Σ(W_2^H)^T Z_1]
    = [Σ(W_2^{L→L})^T Z_1^L + Σ(W_2^{H→L})^T pool(Z_1^H), Σ(W_2^{H→H})^T Z_1^H + up(Σ(W_2^{L→H})^T Z_1^L)]
redundant information of the feature map of the hyperspectral image is reduced, and important features are reserved.
High frequency characteristic diagram Z by pooling operation 2 H Downsampling, and then combining the downsampled result with the low-frequency characteristic diagram Z 2 L Fused into a new feature map Z pool
Input data X of 3 rd 3D Octave convolution 3 Is Z pool The low frequency portion is set to 0.
compute the output Z_3 of the 3rd 3D Octave convolutional network:

Z_3 = [Z_3^L, Z_3^H]
    = [(0 + Z_3^{H→L}), (Z_3^{H→H} + 0)]
    = [Σ(W_3^L)^T Z_pool, Σ(W_3^H)^T Z_pool]
    = [Σ(W_3^{H→L})^T pool(Z_pool^H), Σ(W_3^{H→H})^T Z_pool^H]
input data X of 4 th 3D Octave convolution 4 Is Z 3 Wherein Z is 3 H→H Represents a high frequency part, Z 1 H→L Representing the low frequency part.
compute the output Z_4 of the 4th 3D Octave convolutional network:

Z_4 = [Z_4^L, Z_4^H]
    = [(Z_4^{L→L} + Z_4^{H→L}), (Z_4^{H→H} + Z_4^{L→H})]
    = [Σ(W_4^L)^T Z_3, Σ(W_4^H)^T Z_3]
    = [Σ(W_4^{L→L})^T Z_3^L + Σ(W_4^{H→L})^T pool(Z_3^H), Σ(W_4^{H→H})^T Z_3^H + up(Σ(W_4^{L→H})^T Z_3^L)]
ensuring the integrity of information, and mapping the low frequency characteristic diagram Z 4 L Fusion to Z after upsampling operation 4 H In (1) to obtain Z O
The 3D Octave convolution structure is set to be a 4-layer convolution structure, the convolution kernel sizes of the four-layer convolution structure are set to be 5 multiplied by 3, and the number of the convolution kernels is set to be 24, 48, 24 and 1 respectively.
The 3D Octave convolution method aims to reduce spatially redundant information while preserving the spectral-dimension information inherent to hyperspectral images. In fact, the 3D Octave convolution method is a multi-frequency feature representation method: high-frequency and low-frequency maps are stored in different groups, and the low-frequency parts of the feature map are stored and processed with low-dimensional vectors. Since the low-frequency components are redundant, redundancy can be reduced by lowering the resolution of the low-frequency features. It can therefore be deduced that after 3D Octave convolution the spatially redundant information of the hyperspectral image is greatly reduced, which has an important impact on the subsequent classification of the hyperspectral image.
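Continuing the numpy sketch given after the formulas of step S2 (octave_step, pool2 and up2 are defined there), the four-convolution flow with the zeroing and fusion steps described above might look like this; the 1×1 simplification and spatial sizes remain illustrative:

    import numpy as np

    x_h = np.random.rand(16, 16, 1)        # hyperspectral patch as the high-frequency input
    x_l = np.zeros((8, 8, 1))              # conv 1: low-frequency input set to 0

    def rand_w(c_in, c_out):               # random stand-in weights
        return np.random.rand(c_in, c_out)

    # conv 1 (low part zeroed) and conv 2 (full update/exchange), 24 then 48 kernels
    z_h, z_l = octave_step(x_h, x_l, rand_w(1, 24), rand_w(1, 24),
                           rand_w(1, 24), rand_w(1, 24))
    z_h, z_l = octave_step(z_h, z_l, rand_w(24, 48), rand_w(24, 48),
                           rand_w(24, 48), rand_w(24, 48))

    # pool the high-frequency map and fuse it with the low-frequency map -> Z_pool
    z_pool = pool2(z_h) + z_l

    # conv 3: Z_pool enters as the high-frequency part, low frequency set to 0 again
    z_h, z_l = octave_step(z_pool, np.zeros((4, 4, 48)), rand_w(48, 24),
                           rand_w(48, 24), rand_w(48, 24), rand_w(48, 24))
    # conv 4: full update/exchange down to a single kernel
    z_h, z_l = octave_step(z_h, z_l, rand_w(24, 1), rand_w(24, 1),
                           rand_w(24, 1), rand_w(24, 1))

    # fuse the upsampled low-frequency map into the high-frequency map -> Z_O
    z_o = z_h + up2(z_l)
    print(z_o.shape)                       # (8, 8, 1)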
Step S3: treat the hyperspectral data output from step S1 as an ordered spectral vector. In parallel with step S2, the spectral sequence is input band by band into the bidirectional hidden layer of the Bi-RNN network, and the states output by the forward and reverse hidden layers are connected through a concatenation function to obtain the vector g_n.
Let the hyperspectral input data X be an ordered spectral vector, X = (X_1, X_2, X_3, ..., X_n), and compute the bidirectional hidden-layer output h_n of the Bi-RNN network (superscripts f and b denote the forward and reverse directions):

h_n^f = f(W^f X_n + V^f h_{n-1}^f)
h_n^b = f(W^b X_n + V^b h_{n+1}^b)

where n ranges over the spectral bands 1–m; the coefficient matrices W^f and W^b act on the input of the current hidden layer, V^f carries the previous hidden state h_{n-1}, and V^b starts from the subsequent hidden state h_{n+1}; f is the nonlinear activation function of the hidden layer. The encoder output is taken as the input of the vector g_n, computed as:

g_n = concat(h_n^f, h_n^b)

where concat() is the concatenation function between the forward and reverse hidden-state outputs.
The Bi-RNN contains a hidden layer of a bidirectional GRU layer, which inputs the spectral sequence band by band; the two hidden layers running in opposite directions are connected to a single output, so that both the preceding and the following spectral information in the spectral sequence of the hyperspectral image can be processed.
Step S4: take the vector g_n output by the connected bidirectional hidden layers as the input of the attention module. Multiply the probability weight W_i, obtained by random initialization of the attention mechanism, with the vector g_n, add a bias parameter b_i, apply the tanh activation function, and then compute the attention weight parameter β through a softmax function, as shown in fig. 3.
The weights of the spectral attention layer, which give the weight values of the different spectral information, are computed as follows:

e_in = tanh(W_i g_n + b_i)
β_in = softmax(W_i' e_in + b_i')

where W_i and W_i' are transform matrices, b_i and b_i' are bias terms, and softmax() maps non-normalized output values to a probability distribution, constraining the output values to the (0, 1) interval. The tanh activation converts the rearranged state vector into e_in, a new hidden representation of h_n. The attention weight β is generated by the softmax layer.
Step S5: multiply the attention weight parameter β with the corresponding values of the vector g_n obtained in step S3 and sum the products to obtain a new spectral information vector label y. This comprises:
calculating the prediction label y_n for a pixel X:

y_n = U[g_n, β]

where U() is a summation function over all state vectors under the corresponding attention weights.
In practice, a spectral curve is not a constant straight line but a continuous curve with peaks and valleys. Therefore, some important spectral channels should receive larger weights, while minor spectral segments should be given smaller weights. The additional attention weights enhance the spectral correlation between spectral channels and are powerful for capturing context information in the sequence.
To assign appropriate weight parameters to each spectral channel, effective features are highlighted and distinguished, more relevant and notable information is obtained, and information unfavorable to classification is attenuated. Introducing the Bi-RNN attention network lets the model capture the correlation between internal spectral channels and classify better, making the trained model more accurate.
Step S6: combine the spatial feature information Z_O extracted by the last fully connected layer of the 3D Octave convolutional network in step S2 with the new spectral information vector label y from the last fully connected layer of the Bi-RNN attention network in step S5 to form a new fully connected layer, and output a feature vector.
Step S7: input the feature vector into a 3-layer fully connected network comprising three fully connected layers; the first two are activated by the relu function after batch_normal normalization, and, to prevent overfitting, the first two of the three fully connected layers use the regularized Dropout method, while the last fully connected layer outputs the predicted classification result using Softmax.
In this embodiment, four 3D Octave convolutions extract the spatial information of the hyperspectral image, reducing spatial redundancy, while the Bi-RNN attention network extracts the spectral information, with attention enhancing the importance of bands carrying more spectral information; this improves classification accuracy with few training samples. The advantages of 3D Octave convolution and the Bi-RNN attention network are fully exploited to obtain highly accurate classification with few training samples, and parallel data processing accelerates the model.
Example 1
The experimental hardware platform is a high-performance computer configured as follows: an Intel Core i9-9900K @ 3.60 GHz (eight cores) with 32 GB of memory, and an Nvidia GeForce RTX 2080 Ti (11 GB) graphics card. The software platform is Python 3.6.0 and TensorFlow 1.14 under Windows 10.
1. Experimental data and sample partitioning
To evaluate the classification effect of the proposed method, the Pavia University dataset and the Botswana dataset were selected to verify its performance.
The Pavia University dataset is remote sensing image data acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor over the University of Pavia in northern Italy. Its size is 610×340 pixels, with 115 original spectral bands in the 430–860 nm range; 12 noisy bands are removed and the remaining 103 spectral bands are used for classification. Nine semantic categories are defined in the Pavia University dataset; the size of each category sample and the division into training and test samples are shown in table 1.
The Botswana dataset was acquired by NASA with the Hyperion imaging spectrometer aboard the EO-1 satellite on May 31, 2001. The image covers a 7.7 km long strip in the Okavango Delta, Botswana, with a spatial resolution of 30 m and a spectral resolution of 10 nm. The image originally comprises 242 bands; after removing the noise-affected bands, the remaining 145 bands can be used for hyperspectral image classification. The image size is 1476×256, and 14 different classes are contained in total. The size of each class sample and the division into training and test samples are shown in table 2.
Classification accuracy is measured with three commonly used evaluation indices: overall accuracy (OA), average accuracy (AA), and the Kappa coefficient.
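For reference, all three indices can be computed from a confusion matrix, as in this short sketch (the 3-class matrix is a toy example):

    import numpy as np

    def metrics(confusion):
        # confusion[i, j]: number of samples of true class i predicted as class j
        n = confusion.sum()
        oa = np.trace(confusion) / n                         # overall accuracy
        aa = np.mean(np.diag(confusion) / confusion.sum(1))  # average per-class accuracy
        pe = (confusion.sum(0) * confusion.sum(1)).sum() / n**2
        kappa = (oa - pe) / (1 - pe)                         # Cohen's kappa
        return oa, aa, kappa

    cm = np.array([[50, 2, 0],
                   [3, 45, 2],
                   [0, 1, 47]])                              # toy 3-class confusion matrix
    print(metrics(cm))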
Table 1 training set and test set sample numbers for Pavia University dataset
[Table 1 is provided as an image in the original document.]
Table 2 training set and test set sample numbers for Botswana dataset
[Table 2 is provided as an image in the original document.]
2. Parameter setting
In the experiments, three parameters have a significant influence on the results: the learning rate, the spatial size, and the dropout rate. Here we take the Pavia University dataset as an example to evaluate the experimental parameters in detail.
1) Learning rate: we tested the effect of different learning rates. The learning rate determines the learning process and how much error is assigned each time the model weights are updated. Too large a learning rate causes periodic oscillations in training; too small a learning rate prevents the model from converging. The learning rate was therefore tested at each value in [0.01, 0.005, 0.001, 0.0007, 0.0005, 0.0003, 0.0001, 0.00007, 0.00005, 0.00003, 0.00001]; the results show that the classification effect is best at a learning rate of 0.0001.
2) Spatial size: the extraction of image spatial features depends heavily on the size of the spatial-domain region. A larger spatial input provides more opportunity to learn spatial features, but the larger the spatial region, the more unnecessary information it contains, and there is potential for image over-correction. Selecting an appropriate spatial size is therefore very important for classification performance. With the number of spectral channels fixed, the optimal learning rate, a batch size of 32, and 100 training iterations, the classification accuracies under different spatial sizes are shown in table 3.
As tables 3 and 4 show, the classification effect is best when the spatial size of the input data is 15×15 and the dropout rate is 0.6. To optimize classification performance, the best dropout rate was chosen for this experiment.
Likewise, on the Botswana dataset, the learning rate was set to 0.0001, the spatial size to 13×13, and the batch size to 16, with 400 training iterations for optimal classification performance.
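For convenience, the reported per-dataset settings can be collected into one configuration; the key names are our own, and the dropout rate for Botswana is omitted because the text does not state it:

    # Hyperparameters reported in the experiments (key names are our own).
    CONFIG = {
        "PaviaUniversity": {"learning_rate": 1e-4, "patch_size": 15,
                            "batch_size": 32, "train_iterations": 100,
                            "dropout": 0.6},
        "Botswana":        {"learning_rate": 1e-4, "patch_size": 13,
                            "batch_size": 16, "train_iterations": 400},
    }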
TABLE 3 classification accuracy at different spatial dimensions
[Table 3 is provided as an image in the original document.]
TABLE 4 classification accuracy at different dropout rates
[Table 4 is provided as an image in the original document.]
3. Experimental results
To ensure the reliability of the experimental results, each experiment was repeated 10 times and the results averaged.
To verify the effectiveness and superiority of the proposed method, it is experimentally compared with traditional methods and mainstream deep learning methods (SVM, 2D-CNN, ARNN, SSAN, 3DOC-SSAN). The classification performance of the different methods on the Pavia University dataset is shown in table 5.
As the results in table 5 show, the proposed method clearly outperforms the traditional SVM on the Pavia University dataset, and its OA, AA and Kappa values are all higher than those of the other mainstream deep learning classification methods. The OA value is 9.50% higher than SVM, 0.70% higher than 2D-CNN, 1.74% higher than ARNN, 0.97% higher than SSAN, and 0.10% higher than the 3DOC-SSAN classification method. The AA value is 6.28% higher than SVM, 0.55% higher than 2D-CNN, 0.79% higher than ARNN, 0.66% higher than SSAN, and 0.09% higher than 3DOC-SSAN. The Kappa value is 11.02% higher than SVM, 1.61% higher than 2D-CNN, 1.47% higher than ARNN, 2.37% higher than SSAN, and 0.02% higher than 3DOC-SSAN. All three indices show that the proposed method outperforms the other methods in classification performance.
TABLE 5 classification performance of different methods on the Pavia University dataset
[Table 5 is provided as an image in the original document.]
The classification maps of the different methods on the Pavia University dataset are shown in fig. 4. As can be seen, the final classification results of SVM, ARNN, SSAN and 2D-CNN all contain many noisy spots, and some regions are misclassified. The 3DOC-SSAN method classifies well, but a few spots remain in the lower-right and upper-left corners. In the classification map of the method of the invention, the ground objects are almost completely classified, spots are hardly visible, and homogeneous regions are relatively smooth.
The classification performance of the different methods on the Botswana dataset is shown in table 6, and the corresponding classification maps in fig. 5.
As table 6 shows, on the Botswana dataset the accuracy of the method of the invention is higher than that of the other methods on all three indices (OA, AA and Kappa). The OA value is 8.88% higher than SVM, 1.65% higher than 2D-CNN, 2.77% higher than ARNN, 1.79% higher than SSAN, and 0.36% higher than the 3DOC-SSAN classification method. The AA value is 9.67% higher than SVM, 1.95% higher than 2D-CNN, 2.71% higher than ARNN, 1.69% higher than SSAN, and 0.34% higher than 3DOC-SSAN. The Kappa value is 7.57% higher than SVM, 1.81% higher than 2D-CNN, 3.01% higher than ARNN, 1.94% higher than SSAN, and 0.38% higher than 3DOC-SSAN. The classification accuracy of the method reaches 100% in 11 categories; apart from 97.38% on floodplain grassland 1, the accuracy of the remaining two categories exceeds 99.88%.
Table 6 classification performance of different methods on Botswana dataset
[Table 6 is provided as an image in the original document.]
Meanwhile, as tables 5 and 6 show, the 3D Octave convolution classification methods (3DOC-SSAN and the method of the invention) clearly outperform the 2D-CNN, ARNN and SSAN classification methods, demonstrating that 3D Octave convolution has definite advantages in reducing spatially redundant information and improving classification performance. The method of the invention also classifies better than the 3DOC-SSAN method, which has no Bi-RNN attention network, demonstrating that the Bi-RNN attention network strengthens the extraction of spectral feature information and benefits classification performance.
In addition, compared with the 3DOC-SSAN model, the method of the invention needs no additional spatial attention network module, so the model is simpler, and its training can be processed in parallel, which is faster. The data flow of the 3DOC-SSAN method is serial: hyperspectral data are input into the Octave convolution model for preprocessing, then fed into the spectral and spatial attention networks for spatial-spectral feature extraction, then fused by a data fusion module, and finally classified. The method of the invention differs in that its data streams are parallel. Moreover, a single run of the Bi-RNN attention network is roughly 3 times faster than the 3D Octave convolutional network. With parallel computation, the method suits both task-based and data-based parallel processing. Once the 3D Octave convolutional network has finished, the method can inject the spatial-spectral feature information directly into the fusion network without the extra time of running spatial and spectral attention networks, greatly reducing the model's running time.
4. Conclusion(s)
To reduce the redundancy of spatial feature information, enhance the acquisition of spectral information, and improve the classification performance of hyperspectral images, the invention proposes a new model based on 3D Octave convolution and a Bi-RNN attention network. The model has a simple structure, requires no complex pre- or post-processing of the hyperspectral image data, and can be trained end to end. Experiments show that classification performance is clearly improved over traditional methods, and compared with current mainstream deep learning algorithms the proposed method extracts spatial and spectral feature information more fully and classifies better.
The technical features of the above-described embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above-described embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples merely represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims (7)

1. A hyperspectral image classification method, characterized by being based on 3D Octave convolution and a Bi-RNN attention network, wherein the Bi-RNN is a bidirectional recurrent neural network, the method comprising the following steps:
Step S1, acquiring the hyperspectral remote sensing image to be classified;
Step S2, using 4 or more consecutive 3D Octave convolutions to acquire the spatial feature information Z_O of the hyperspectral image;
Step S3, treating the hyperspectral data output after step S1 as an ordered spectral vector and, in parallel with step S2, inputting the spectral sequence band by band into the bidirectional hidden layer, and connecting the states output by the forward and reverse hidden layers through a concatenation function to obtain the vector g_n;
Step S4, taking the vector g_n output by the connected bidirectional hidden layers as the input of the attention module; multiplying the probability weight W_i, obtained by random initialization of the attention mechanism, with the vector g_n, adding a bias parameter b_i, applying the tanh activation function, and then computing the attention weight parameter β through a softmax function;
Step S5, multiplying the attention weight parameter β with the corresponding values of the vector g_n obtained in step S3 and summing the products to obtain a new spectral information vector label y;
Step S6, combining the spatial feature information Z_O extracted by the last fully connected layer of the 3D Octave convolutional network in step S2 with the new spectral information vector label y obtained by the last fully connected layer of the Bi-RNN attention network in step S5 to form a new fully connected layer and output a feature vector;
Step S7, inputting the feature vector into a network of two or more fully connected layers and predicting the classification result through a softmax layer;
and wherein step S2 comprises:
setting the size of the image used for hyperspectral image classification to w×h×l;
reshaping the hyperspectral image classification data to X, of size l×n, where n = w×h;
taking the hyperspectral data X as the input of the 3D Octave convolutional network, and assuming the input and output data of the Octave convolutional network are X = {X^H, X^L} and Z = {Z^H, Z^L}, respectively, where H and L denote high-frequency and low-frequency information; that is, the input hyperspectral data X and the data Z output after processing by the 3D Octave convolutional network can each be expressed as the sum of corresponding high-frequency and low-frequency information;
the Octave convolution model is built as follows:
Z^H = Z^{H→H} + Z^{L→H} and Z^L = Z^{L→L} + Z^{H→L}
where Z^{H→H}, Z^{L→L} denote updates of the hyperspectral image data information within the high and low frequencies, and Z^{L→H}, Z^{H→L} denote conversions of the information from low to high frequency and from high to low frequency, respectively;
to complete the updating and conversion of the high- and low-frequency feature information of the hyperspectral image, it is assumed that the weight parameter of the Octave convolution model is W = [W^H, W^L]; likewise, the weight parameters W^H and W^L are defined as W^L = [W^{L→L}, W^{H→L}] and W^H = [W^{H→H}, W^{L→H}], respectively, where W^{H→H}, W^{L→L} denote information-update weights within the corresponding frequency and W^{H→L}, W^{L→H} denote information-conversion weights between frequencies;
from the above, the expressions for Z^H and Z^L are obtained as:

Z^H = Σ(W^{H→H})^T X^H + up(Σ(W^{L→H})^T X^L)   (1)
Z^L = Σ(W^{L→L})^T X^L + Σ(W^{H→L})^T pool(X^H)   (2)

where, in formulas (1) and (2), T denotes matrix transposition, up denotes the upsampling operation, and pool denotes the average pooling operation;
the output Z of the Octave convolutional network is computed as:

Z = [Z^L, Z^H]
  = [(Z^{L→L} + Z^{H→L}), (Z^{H→H} + Z^{L→H})]
  = [Σ(W^L)^T X, Σ(W^H)^T X]
  = [Σ(W^{L→L})^T X^L + Σ(W^{H→L})^T pool(X^H), Σ(W^{H→H})^T X^H + up(Σ(W^{L→H})^T X^L)].
2. the hyperspectral image classification method of claim 1 wherein the number of layers of the fully connected layer network in step S7 is 2-5.
3. The hyperspectral image classification method as claimed in claim 1, wherein the step S3 comprises:
letting the hyperspectral input data X be an ordered spectral vector, X = (X_1, X_2, X_3, ..., X_n), and computing the bidirectional hidden-layer output h_n of the Bi-RNN network as follows (superscripts f and b denote the forward and reverse directions):

h_n^f = f(W^f X_n + V^f h_{n-1}^f)   (3)
h_n^b = f(W^b X_n + V^b h_{n+1}^b)   (4)

in formulas (3) and (4), n ranges over the spectral bands 1–m; the coefficient matrices W^f and W^b act on the input of the current hidden layer, V^f carries the previous hidden state h_{n-1}, and V^b starts from the subsequent hidden state h_{n+1}; f is the nonlinear activation function of the hidden layer; the encoder output is taken as the input of the vector g_n, computed as:

g_n = concat(h_n^f, h_n^b)   (5)

where concat() is the concatenation function between the forward and reverse hidden-state outputs.
4. The hyperspectral image classification method as claimed in claim 1, wherein the step S4 comprises:
the weights of the attention layer, which give the weight values of the different spectral information, are computed as follows:

e_in = tanh(W_i g_n + b_i)   (6)
β_in = softmax(W_i' e_in + b_i')   (7)

in formulas (6) and (7), W_i and W_i' are transform matrices, b_i and b_i' are bias terms, and softmax() maps non-normalized output values to a probability distribution, constraining the output values to the (0, 1) interval; formula (6) is a neural network that rearranges the state-vector space of the Bi-RNN and applies tanh activation to convert it into e_in, a new hidden representation of h_n; formula (7) produces the attention weight β through the softmax layer, with β_in being the i-th component of the attention weight parameter β; here the importance of the input is measured by the correlation of e_in with another channel vector, and e_in is an intermediate parameter.
5. The hyperspectral image classification method as claimed in claim 1, wherein the step S5 comprises:
calculating the prediction label y_n for a pixel X:

y_n = U[g_n, β]   (8)

where U() is a summation function over all state vectors under the corresponding attention weights; the prediction label y_n of the pixel X is a component of the spectral information vector label y.
6. The hyperspectral image classification method as claimed in claim 1, wherein step S7 comprises: inputting the feature vector into a 3-layer fully connected network comprising three fully connected layers, wherein the first two of the three fully connected layers are normalized with Batch_normal, activated by the relu function, and then regularized with the Dropout method, and the last fully connected layer outputs the predicted classification result using Softmax.
7. The hyperspectral image classification method as claimed in any one of claims 1 to 6, wherein the hyperspectral image classification method is performed using a hyperspectral image classification system comprising a hyperspectral image module (1), a 3D Octave convolutional network module (2), a Bi-RNN attention network module (3), a spatial-spectral feature fusion network module (4) and a classified image module (5), wherein step S1 is completed in the hyperspectral image module (1), step S2 in the 3D Octave convolutional network module (2), steps S3 to S5 in the Bi-RNN attention network module (3), step S6 in the spatial-spectral feature fusion network module (4), and step S7 in the classified image module (5).
CN202011468157.9A 2020-12-14 2020-12-14 Hyperspectral image classification method Active CN112464891B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011468157.9A CN112464891B (en) 2020-12-14 2020-12-14 Hyperspectral image classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011468157.9A CN112464891B (en) 2020-12-14 2020-12-14 Hyperspectral image classification method

Publications (2)

Publication Number Publication Date
CN112464891A CN112464891A (en) 2021-03-09
CN112464891B true CN112464891B (en) 2023-06-16

Family

ID=74803979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011468157.9A Active CN112464891B (en) 2020-12-14 2020-12-14 Hyperspectral image classification method

Country Status (1)

Country Link
CN (1) CN112464891B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887328A (en) * 2021-09-10 2022-01-04 Tianjin University of Technology Method for extracting space-time characteristics of photonic crystal space transmission spectrum in parallel by ECA-CNN fusion dual-channel RNN
CN114220002B (en) * 2021-11-26 2022-11-15 Tongliao Meteorological Observatory (Tongliao Climate and Ecological Environment Monitoring Center) Method and system for monitoring invasion of foreign plants based on convolutional neural network
CN115979973B (en) * 2023-03-20 2023-06-16 Hunan University Hyperspectral Chinese herbal medicine identification method based on dual-channel compressed attention network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516596A (en) * 2019-08-27 2019-11-29 Xidian University Spatial-spectral attention hyperspectral image classification method based on Octave convolution
CN111507409A (en) * 2020-04-17 2020-08-07 PLA Strategic Support Force Information Engineering University Hyperspectral image classification method and device based on depth multi-view learning
CN111898662A (en) * 2020-07-20 2020-11-06 Beijing Institute of Technology Coastal wetland deep learning classification method, device, equipment and storage medium
CN111965116A (en) * 2020-07-21 2020-11-20 Tianjin University Hyperspectrum-based airport gas detection system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740799B * 2016-01-27 2018-02-16 Shenzhen University Hyperspectral remote sensing image classification method and system based on three-dimensional Gabor feature selection

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516596A (en) * 2019-08-27 2019-11-29 Xidian University Spatial-spectral attention hyperspectral image classification method based on Octave convolution
CN111507409A (en) * 2020-04-17 2020-08-07 PLA Strategic Support Force Information Engineering University Hyperspectral image classification method and device based on depth multi-view learning
CN111898662A (en) * 2020-07-20 2020-11-06 Beijing Institute of Technology Coastal wetland deep learning classification method, device, equipment and storage medium
CN111965116A (en) * 2020-07-21 2020-11-20 Tianjin University Hyperspectrum-based airport gas detection system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Remote sensing image classification method based on a bag-of-visual-words model; Zhou Yugu; Wang Ping; Gao Yinghui; Journal of Chongqing University of Technology (Natural Science), No. 05; full text *

Also Published As

Publication number Publication date
CN112464891A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN112464891B (en) Hyperspectral image classification method
CN107316013B (en) Hyperspectral image classification method based on NSCT (non-subsampled Contourlet transform) and DCNN (data-to-neural network)
Cai et al. BS-Nets: An end-to-end framework for band selection of hyperspectral image
CN110335261B (en) CT lymph node detection system based on space-time circulation attention mechanism
Wang et al. Dual-channel capsule generation adversarial network for hyperspectral image classification
CN109766858A (en) Three-dimensional convolution neural network hyperspectral image classification method combined with bilateral filtering
CN107609642A (en) Computing device and method
Zhang et al. Symmetric all convolutional neural-network-based unsupervised feature extraction for hyperspectral images classification
CN106339753A (en) Method for effectively enhancing robustness of convolutional neural network
CN110619352A (en) Typical infrared target classification method based on deep convolutional neural network
CN113191489B (en) Training method of binary neural network model, image processing method and device
Li et al. Improved generative adversarial networks with reconstruction loss
CN111814685A (en) Hyperspectral image classification method based on double-branch convolution self-encoder
Pan et al. SQAD: Spatial-spectral quasi-attention recurrent network for hyperspectral image denoising
Shi et al. Hyperspectral image classification based on expansion convolution network
Cai et al. Densely connected convolutional extreme learning machine for hyperspectral image classification
CN116434069A (en) Remote sensing image change detection method based on local-global transducer network
Zhang et al. Small object detection in UAV image based on improved YOLOv5
US20180114109A1 (en) Deep convolutional neural networks with squashed filters
Kamal et al. Generative adversarial learning for improved data efficiency in underwater target classification
Zhang et al. Infrared and visible image fusion with entropy-based adaptive fusion module and mask-guided convolutional neural network
CN112733589B (en) Infrared image pedestrian detection method based on deep learning
Sortino et al. Transformer-based image generation from scene graphs
Chen et al. Mutual information-based dropout: Learning deep relevant feature representation architectures
Wang et al. Hybrid network model based on 3D convolutional neural network and scalable graph convolutional network for hyperspectral image classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant