CN112052758B - Hyperspectral image classification method based on attention mechanism and cyclic neural network - Google Patents


Info

Publication number
CN112052758B
Authority
CN
China
Prior art keywords
neural network
network
hyperspectral image
cyclic neural
sub
Prior art date
Legal status
Active
Application number
CN202010860393.9A
Other languages
Chinese (zh)
Other versions
CN112052758A (en)
Inventor
冯婕
赵宁
焦李成
张向荣
尚荣华
王蓉芳
刘若辰
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN202010860393.9A
Publication of CN112052758A
Application granted
Publication of CN112052758B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/194 Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Abstract

The invention discloses a hyperspectral image classification method based on an attention mechanism and a recurrent neural network, which mainly addresses the low classification accuracy caused by insufficient hyperspectral image feature extraction in the prior art. The implementation steps are: (1) construct a recurrent neural network; (2) generate a training set; (3) train the recurrent neural network; (4) classify the hyperspectral image to be classified. The invention uses an attention proposal module and a processing module to obtain the local region of each input spatial pixel block that needs attention, and uses the recurrent neural network to extract joint spatial-spectral features of the hyperspectral image to be classified to obtain the classification result. It achieves high hyperspectral image classification accuracy and can be used for ground-object target recognition in fields such as resource exploration, forest cover monitoring and disaster monitoring.

Description

Hyperspectral image classification method based on attention mechanism and cyclic neural network
Technical Field
The invention belongs to the technical field of image processing, and further relates to a hyperspectral image classification method based on an attention mechanism and a recurrent neural network in the technical field of hyperspectral image classification. The method can be used to classify ground-object targets in hyperspectral images and to identify ground-object targets in fields such as resource exploration, forest cover monitoring and disaster monitoring.
Background
With the development of remote sensing science and imaging technology, the application fields of hyperspectral remote sensing are becoming ever wider. Hyperspectral data add a spectral dimension to ordinary two-dimensional image data and can be regarded as a three-dimensional data cube. A hyperspectral remote sensing image records approximately continuous spectral information of target ground objects over a large number of bands in the ultraviolet, visible, near-infrared and mid-infrared ranges, which is of great value for accurately identifying ground objects. Hyperspectral remote sensing images have been widely used in urban environmental monitoring, surface soil monitoring, geological exploration, disaster assessment, agricultural yield estimation, crop analysis and so on. In recent years, many hyperspectral image classification methods have been proposed: traditional methods based on K-nearest neighbors, support vector machines and the like, as well as deep learning methods based on recurrent neural networks and convolutional neural networks, have all achieved good results. However, the field still faces the following problems. First, hyperspectral images are characterized by large spectral differences within the same class of pixels and small differences between different classes, which makes correct judgment difficult for traditional classifiers. Second, hyperspectral images contain rich spatial and spectral information, and traditional classification methods have difficulty fully extracting highly discriminative features from both kinds of information and fusing them for classification, so the classification accuracy is not high.
Lichao Mou et al., in the paper "Deep Recurrent Neural Networks for Hyperspectral Image Classification" (IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(7): 3639-3655), proposed a hyperspectral image classification method based on a deep recurrent neural network. The method regards a hyperspectral image as data containing sequential information, treating the images of different bands as a temporal sequence. It constructs a feature vector for each single pixel, feeds the band values of that feature vector one by one into the corresponding recurrent network modules, and finally obtains the class probabilities of the pixel through a fully connected layer and a softmax layer, thereby classifying the hyperspectral image pixel by pixel. Unlike a traditional feedforward neural network, a recurrent neural network can memorize the information of previous network layers and apply it to the computation of the current layer, so it is good at processing sequential signals, and using it to classify hyperspectral images yields good results. However, the method still has drawbacks: the designed recurrent neural network is deep and has many training parameters; the spatial correlation and similarity between each pixel and its neighboring pixels are ignored, so spatial information is not effectively used; the fusion of spectral and spatial information is insufficient; discriminative feature extraction is insufficient; and the classification accuracy is therefore not high.
Beihang University (Beijing University of Aeronautics and Astronautics) proposed a hyperspectral image classification method based on deep learning in its patent application (application number 201710052345.5, publication number CN106845418A). The method first uses a nonlinear self-encoding network to reduce the dimensionality of the hyperspectral image. In the dimension-reduced image, the data cube formed by the neighborhood of each labeled pixel is used as a sample and input into a convolutional neural network, the label of the pixel is used as the expected output, and the convolutional neural network is trained; finally, the trained convolutional neural network is applied to every pixel of the hyperspectral image to obtain the classification result. Although this method preserves the nonlinear information of the samples, it still has drawbacks: the network has too many training parameters, the number of samples is too small relative to the number of parameters, the training time is long, and the classification speed is low.
Disclosure of Invention
Aiming at the shortcomings of the prior art, the invention provides a hyperspectral image classification method based on an attention mechanism and a recurrent neural network, in order to solve the problems of insufficient fusion of the spectral and spatial information of hyperspectral images, insufficient extraction of discriminative features, too many network training parameters, low classification accuracy and low classification speed in the prior art.
The idea for achieving the purpose of the invention is as follows. The prior art relies too heavily on recurrent-neural-network classification models for hyperspectral images without considering the inherent characteristics of such images: it uses the spectral vector of continuous bands to simulate the sequential input that a recurrent neural network expects, takes only the spectral vector of a single pixel as the network input, suffers from spectral information redundancy, and builds recurrent neural networks with many layers. In addition, the prior art ignores the spatial information of the neighborhood of a single pixel, so the fusion of spatial and spectral information is insufficient and both the classification accuracy and the classification speed for hyperspectral images are low. In the invention, the attention proposal module captures the local region of the input spatial pixel block that needs attention, which satisfies the sequential order required by the input of the recurrent neural network, helps to better mine high-level semantic information that is easy to classify, reduces the amount of sample data required to train the network, allows the network to converge faster, and improves both the accuracy and the speed of hyperspectral image classification.
The specific steps for realizing the invention are as follows:
(1) Constructing a recurrent neural network:
(1a) Building a recurrent neural network formed by cascading three sub-networks with different structures:
the first sub-network and the second sub-network of the recurrent neural network have the same structure, consisting in order of: the 1st convolution layer, the 1st pooling layer, the 2nd convolution layer, the 2nd pooling layer, an attention proposal module, and a processing module; the attention proposal module is formed by connecting two fully connected layers in series, and the processing module is realized by a mask calculation, a pixel-level multiplication function and a bilinear interpolation function; the parameters of each layer are set as follows: the numbers of convolution kernels of the 1st and 2nd convolution layers are set to 16 and 32 respectively, the convolution kernel sizes are set to 3×3, the strides are set to 1, and the padding is set to 1 pixel; the filter sizes of the 1st and 2nd pooling layers are set to 2×2, and their strides are set to 2; the numbers of nodes of the two fully connected layers of the attention proposal module are set to 128 and 3 respectively;
the structure of the third sub-network of the recurrent neural network is: the 1st convolution layer → the 1st pooling layer → the 2nd convolution layer → the 2nd pooling layer → the 3rd convolution layer → the 3rd pooling layer → the 4th convolution layer → the 4th pooling layer → the 1st fully connected layer; the parameters of each layer are set as follows: the numbers of convolution kernels of the 1st, 2nd, 3rd and 4th convolution layers are set to 64, 128, 256 and 512 respectively, the convolution kernel sizes are set to 3×3, the strides are set to 1, and the padding is set to 1 pixel; the filter sizes of the 1st, 2nd, 3rd and 4th pooling layers are set to 2×2, and their strides are set to 2; the number of nodes of the 1st fully connected layer is set to the number of classes of the hyperspectral image to be classified;
(2) Generating a training set:
(2a) Inputting a hyperspectral image and a corresponding label image;
(2b) Selecting, with a principal component analysis method, 5 bands that retain 99% of the information of the hyperspectral image to form a 5-band hyperspectral image, and normalizing it to between 0 and 1 to obtain a normalized hyperspectral image;
(2c) Taking, on the normalized hyperspectral image, a spatial pixel block of neighborhood size 61×61 pixels centered on each labeled pixel, and forming all the spatial pixel blocks into a spatial pixel block set;
(2d) Randomly selecting 5% of spatial pixel blocks from the spatial pixel block set to form a training set;
(3) Training a recurrent neural network:
inputting all the spatial pixel blocks of the training set into the first sub-network of the recurrent neural network, obtaining the square region of interest that needs attention with the attention proposal module, and obtaining the set of input spatial pixel blocks of the second sub-network of the recurrent neural network with the processing module; inputting this set into the attention proposal module and processing module of the second sub-network to obtain the set of input spatial pixel blocks of the third sub-network of the recurrent neural network, which outputs the predicted label of each spatial pixel block; calculating the loss between the predicted labels and the true labels of all spatial pixel blocks with a cross-entropy loss function, and updating all parameters of the recurrent neural network with a gradient descent algorithm until the recurrent neural network converges, obtaining the trained recurrent neural network;
(4) Classifying hyperspectral images to be classified:
Processing the hyperspectral image to be classified with the same method as in step (2) to obtain the set of all spatial pixel blocks, inputting all the spatial pixel blocks into the trained recurrent neural network, and outputting the predicted label of each spatial pixel block.
Compared with the prior art, the invention has the following advantages:
First, the attention-based recurrent neural network constructed by the invention can effectively capture, through its attention proposal module, the local region of the input that needs attention. This satisfies the sequential order of the recurrent neural network, helps it mine high-level semantic information that is easy to classify, and allows the recurrent neural network to extract joint spatial-spectral features of the hyperspectral image to be classified. It thus overcomes the problems of the prior art, which relies too heavily on recurrent-neural-network classification models, simulates sequential information with the spectral vector of a single pixel or of continuous bands without considering the inherent characteristics of hyperspectral images, fails to use spatial information effectively, fuses spectral and spatial information insufficiently, extracts discriminative features insufficiently, and therefore achieves low classification accuracy. The invention makes full use of the joint spatial-spectral features and sequential information of the hyperspectral image and improves the classification accuracy.
Second, the attention-based recurrent neural network constructed by the invention has fewer layers and fewer training parameters, which reduces the amount of sample data required for training, allows the network to converge faster, and improves the classification speed. This overcomes the problems of the prior art, in which many network layers and training parameters require many samples, lengthen training, and slow down classification.
Drawings
FIG. 1 is a flow chart of the present invention;
Fig. 2 shows the simulation results of the invention: Fig. 2(a) is a pseudo-color image of the hyperspectral image to be classified that is used in the simulation experiment; Fig. 2(b) is the manually labeled ground-truth map of that image; Fig. 2(c) is the classification map obtained with the prior art; and Fig. 2(d) is the classification map obtained with the invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The specific steps of the present invention will be described in further detail with reference to fig. 1.
Step 1: constructing the recurrent neural network.
Build a recurrent neural network formed by cascading three sub-networks with different structures.
The first sub-network and the second sub-network of the recurrent neural network have the same structure, consisting in order of: the 1st convolution layer, the 1st pooling layer, the 2nd convolution layer, the 2nd pooling layer, an attention proposal module, and a processing module; the attention proposal module is formed by connecting two fully connected layers in series, and the processing module is realized by a mask calculation, a pixel-level multiplication function and a bilinear interpolation function; the parameters of each layer are set as follows: the numbers of convolution kernels of the 1st and 2nd convolution layers are set to 16 and 32 respectively, the convolution kernel sizes are set to 3×3, the strides are set to 1, and the padding is set to 1 pixel; the filter sizes of the 1st and 2nd pooling layers are set to 2×2, and their strides are set to 2; the numbers of nodes of the two fully connected layers of the attention proposal module are set to 128 and 3 respectively;
the structure of the third sub-network of the recurrent neural network is: the 1st convolution layer → the 1st pooling layer → the 2nd convolution layer → the 2nd pooling layer → the 3rd convolution layer → the 3rd pooling layer → the 4th convolution layer → the 4th pooling layer → the 1st fully connected layer; the parameters of each layer are set as follows: the numbers of convolution kernels of the 1st, 2nd, 3rd and 4th convolution layers are set to 64, 128, 256 and 512 respectively, the convolution kernel sizes are set to 3×3, the strides are set to 1, and the padding is set to 1 pixel; the filter sizes of the 1st, 2nd, 3rd and 4th pooling layers are set to 2×2, and their strides are set to 2; the number of nodes of the 1st fully connected layer is set to the number of classes of the hyperspectral image to be classified;
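For illustration, the following is a minimal PyTorch sketch of the three cascaded sub-networks described above. It is an editorial example, not the patent's own code: the class names, the ReLU activations, the sigmoid on the attention proposal output, and the use of nn.LazyLinear for the final fully connected layer are assumptions the patent does not specify (it fixes only the layer counts, kernel numbers and sizes, strides, padding, pooling sizes, and fully connected node counts).

```python
import torch
import torch.nn as nn


class AttentionProposal(nn.Module):
    """Attention proposal module: two fully connected layers with 128 and 3 nodes,
    predicting (t_x, t_y, t_l), i.e. the center and half-width of the region of interest."""

    def __init__(self, in_features):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features, 128),
            nn.ReLU(),                 # activation functions are an assumption
            nn.Linear(128, 3),
            nn.Sigmoid(),              # keeps (t_x, t_y, t_l) in [0, 1] (assumption)
        )

    def forward(self, feat):
        return self.fc(feat)


class SubNetwork12(nn.Module):
    """Sub-networks 1 and 2: conv(16, 3x3) -> pool -> conv(32, 3x3) -> pool -> attention proposal."""

    def __init__(self, in_bands=5, patch_size=61):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_bands, 16, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        side = patch_size // 4         # 61 -> 30 -> 15 after the two 2x2 poolings
        self.apn = AttentionProposal(32 * side * side)

    def forward(self, x):
        return self.apn(self.features(x))   # (B, 3): region-of-interest parameters


class SubNetwork3(nn.Module):
    """Sub-network 3: four conv/pool stages (64, 128, 256, 512 kernels) and one
    fully connected layer with as many nodes as there are land-cover classes."""

    def __init__(self, in_bands=5, num_classes=16):
        super().__init__()
        blocks, channels = [], [in_bands, 64, 128, 256, 512]
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
                       nn.ReLU(),
                       nn.MaxPool2d(kernel_size=2, stride=2)]
        self.features = nn.Sequential(*blocks)
        self.classifier = nn.Sequential(nn.Flatten(),
                                        nn.LazyLinear(num_classes))  # size inferred on first forward

    def forward(self, x):
        return self.classifier(self.features(x))
```

A full model would chain two instances of SubNetwork12 with the processing module sketched after the bilinear-interpolation formula below, feeding the final upsampled crop into SubNetwork3.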
The mask calculation in the processing module is realized by the following formula:

$$M_i(x,y)=\begin{cases}1, & t_x^{tl}\le x\le t_x^{br}\ \text{and}\ t_y^{tl}\le y\le t_y^{br}\\ 0, & \text{otherwise}\end{cases}$$

wherein $M_i$ denotes the mask of the $i$-th sub-network of the recurrent neural network, $i=1,2$; $x$ and $y$ denote the abscissa and ordinate of the mask $M_i$ and take values from 0 up to the width and the height of the hyperspectral image to be classified, respectively; $t_x^{tl}$ and $t_y^{tl}$ denote the abscissa and ordinate of the upper-left corner of the square region of interest of the $i$-th sub-network; $t_x^{br}$ and $t_y^{br}$ denote the abscissa and ordinate of the lower-right corner of the square region of interest of the $i$-th sub-network; and $t_x$, $t_y$ and $t_l$ denote the center-point coordinates and half the width of the square region of interest output by the attention proposal module of the $i$-th sub-network of the recurrent neural network, so that $t_x^{tl}=t_x-t_l$, $t_y^{tl}=t_y-t_l$, $t_x^{br}=t_x+t_l$ and $t_y^{br}=t_y+t_l$.
The pixel-level multiplication function in the processing module is as follows:
$$X_i^{att}=X_i\odot M_i$$

wherein $X_i^{att}$ denotes the square region-of-interest block obtained by the $i$-th sub-network of the recurrent neural network, $X_i$ denotes the input spatial pixel block of the $i$-th sub-network of the recurrent neural network, and $\odot$ denotes pixel-level (element-wise) multiplication.
The bilinear interpolation function in the processing module is as follows:
$$X_j(p,q)=\sum_{\alpha=0}^{1}\sum_{\beta=0}^{1}\left|1-\alpha-\left\{\frac{p}{\lambda}\right\}\right|\,\left|1-\beta-\left\{\frac{q}{\lambda}\right\}\right|\,X_i^{att}(m,n)$$

wherein $X_j(p,q)$ denotes the pixel value at coordinates $(p,q)$ of the input spatial pixel block of the $j$-th sub-network of the recurrent neural network, $j=2,3$ (i.e. $j=i+1$); $p$ and $q$ take values from 0 up to the width and the height of the hyperspectral image to be classified, respectively; $\sum$ denotes the summation operation; $\alpha,\beta\in\{0,1\}$; $\lambda$ denotes the upsampling factor, equal in magnitude to the quotient of the width of $X_i$ and the width of $X_i^{att}$; $X_i^{att}(m,n)$ denotes the pixel value of $X_i^{att}$ at coordinates $(m,n)$, with $m=\left[p/\lambda\right]+\alpha$ and $n=\left[q/\lambda\right]+\beta$; and $\{\cdot\}$ and $[\cdot]$ denote the operations of taking the fractional part and the integer part, respectively.
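As an editorial sketch of the processing module just described, the function below builds the mask $M_i$, applies the pixel-level multiplication, crops the attended square and resizes it by bilinear interpolation for the next sub-network. The hard box crop and the torch.nn.functional.interpolate call are simplifications of the formulas above and illustrate the forward computation only; a differentiable (for example sigmoid-based) mask would be needed for gradients to reach the attention proposal module.

```python
import torch
import torch.nn.functional as F


def processing_module(x, t, out_size=61):
    """Forward-only sketch of the processing module.

    x : (B, C, H, W) input spatial pixel blocks X_i of the current sub-network
    t : (B, 3)       attention proposal output, normalized (t_x, t_y, t_l)
    Returns the input spatial pixel blocks of the next sub-network.
    """
    B, C, H, W = x.shape
    ys = torch.arange(H, device=x.device, dtype=x.dtype).view(1, H, 1)
    xs = torch.arange(W, device=x.device, dtype=x.dtype).view(1, 1, W)
    tx = (t[:, 0] * W).view(-1, 1, 1)
    ty = (t[:, 1] * H).view(-1, 1, 1)
    tl = (t[:, 2] * min(H, W) / 2).view(-1, 1, 1)      # half-width of the square region

    # Mask M_i: 1 inside the square [t_x - t_l, t_x + t_l] x [t_y - t_l, t_y + t_l], 0 outside
    mask = ((xs >= tx - tl) & (xs <= tx + tl) &
            (ys >= ty - tl) & (ys <= ty + tl)).to(x.dtype)
    attended = x * mask.unsqueeze(1)                    # pixel-level multiplication: X_i^att = X_i * M_i

    # Crop each attended square and upsample it by bilinear interpolation
    outputs = []
    for b in range(B):
        x0 = int(torch.clamp(tx[b, 0, 0] - tl[b, 0, 0], 0, W - 1))
        y0 = int(torch.clamp(ty[b, 0, 0] - tl[b, 0, 0], 0, H - 1))
        x1 = max(int(torch.clamp(tx[b, 0, 0] + tl[b, 0, 0], 1, W)), x0 + 1)
        y1 = max(int(torch.clamp(ty[b, 0, 0] + tl[b, 0, 0], 1, H)), y0 + 1)
        crop = attended[b:b + 1, :, y0:y1, x0:x1]
        outputs.append(F.interpolate(crop, size=(out_size, out_size),
                                     mode="bilinear", align_corners=False))
    return torch.cat(outputs, dim=0)
```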
Step 2: generating the training set.
(2a) Inputting a hyperspectral image and the corresponding label image; in this example the input is the Indian Pines hyperspectral image, whose height and width are both 145 pixels, with 200 bands and 16 classes;
(2b) Selecting, with a principal component analysis method, 5 bands that retain 99% of the information of the hyperspectral image to form a 5-band hyperspectral image, and normalizing it to between 0 and 1 to obtain a normalized hyperspectral image;
the normalization formula is as follows:
$$R=\frac{I-\min(I)}{\max(I)-\min(I)}$$
wherein R represents a hyperspectral image after normalization processing, I represents a hyperspectral image before normalization processing, and max (·) and min (·) represent operations of taking the maximum value and the minimum value respectively.
(2c) Taking, on the normalized hyperspectral image, a spatial pixel block of neighborhood size 61×61 pixels centered on each labeled pixel, and forming all the spatial pixel blocks into a spatial pixel block set;
(2d) Randomly selecting 5% of the spatial pixel blocks from the spatial pixel block set to form a training set; in this example the training set contains 512 spatial pixel blocks;
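The training-set construction of this step can be sketched as follows. This is an illustrative example under stated assumptions: scikit-learn's PCA stands in for the principal component analysis, the global min-max normalization and reflect-padding at the image border are editorial choices, and the array layouts are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA


def build_training_set(cube, labels, n_components=5, patch=61, train_ratio=0.05, seed=0):
    """Steps (2a)-(2d): PCA to 5 bands, normalization to [0, 1], 61x61 spatial
    pixel blocks centered on every labeled pixel, and a 5% random training split.

    cube   : (H, W, B) hyperspectral image
    labels : (H, W) label image, 0 = unlabeled
    """
    H, W, B = cube.shape
    reduced = PCA(n_components=n_components).fit_transform(cube.reshape(-1, B))
    reduced = reduced.reshape(H, W, n_components)
    reduced = (reduced - reduced.min()) / (reduced.max() - reduced.min())   # normalize to [0, 1]

    r = patch // 2
    padded = np.pad(reduced, ((r, r), (r, r), (0, 0)), mode="reflect")      # border handling: assumption
    rows, cols = np.nonzero(labels)
    blocks = np.stack([padded[i:i + patch, j:j + patch, :]
                       for i, j in zip(rows, cols)]).astype(np.float32)
    y = labels[rows, cols].astype(np.int64) - 1                             # classes 1..C -> 0..C-1

    rng = np.random.default_rng(seed)
    train_idx = rng.permutation(len(blocks))[: int(train_ratio * len(blocks))]
    return blocks[train_idx], y[train_idx], blocks, y, rows, cols
```

For the Indian Pines image used in the example, this yields roughly ten thousand labeled blocks, about 5% of which (the example above reports 512) form the training set.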
Step 3: training the recurrent neural network.
Inputting all the spatial pixel blocks of the training set into the recurrent neural network, obtaining the input spatial pixel blocks of the second sub-network with the attention proposal module and the processing module of the first sub-network, and inputting them into the attention proposal module and processing module of the second sub-network to obtain the input spatial pixel blocks of the third sub-network, which outputs the predicted label of each spatial pixel block; calculating the loss between the predicted labels and the true labels of all spatial pixel blocks with the cross-entropy loss function, and updating all parameters of the recurrent neural network with a gradient descent algorithm until the recurrent neural network converges, obtaining the trained recurrent neural network;
the cross entropy loss is calculated from the following formula:
$$L=-\sum_{g=1}^{h} y_g\cdot\ln f_g$$

wherein $L$ denotes the cross-entropy loss; $h$ denotes the total number of spatial pixel blocks in the training set; $\sum$ denotes the summation operation; $y_g$ denotes the true label vector of the $g$-th spatial pixel block; $\ln$ denotes the logarithm with the natural constant $e$ as its base; and $f_g$ denotes the predicted label vector output by the recurrent neural network for the $g$-th spatial pixel block.
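A minimal training loop matching this step (cross-entropy loss plus gradient descent) might look as follows. Here `model` is assumed to be a module that chains the two attention sub-networks, the processing module and the third sub-network; the optimizer choice, learning rate, epoch count and full-batch update are illustrative assumptions, since the patent specifies only the loss function and gradient descent.

```python
import torch
import torch.nn as nn


def train_network(model, train_blocks, train_labels, epochs=200, lr=0.001, device="cpu"):
    """Step 3 sketch: gradient descent on the cross-entropy loss."""
    model = model.to(device).train()
    x = torch.as_tensor(train_blocks).permute(0, 3, 1, 2).to(device)   # (N, 5, 61, 61)
    y = torch.as_tensor(train_labels).to(device)

    criterion = nn.CrossEntropyLoss()    # combines softmax with the ln(.) in the formula above
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(x), y)    # predicted label vectors f_g vs. true labels y_g
        loss.backward()
        optimizer.step()
    return model
```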
Step 4: classifying the hyperspectral image to be classified.
Processing the hyperspectral image to be classified with the same method as in step 2 to obtain the set of all spatial pixel blocks, inputting all the spatial pixel blocks into the trained recurrent neural network, and outputting the predicted label of each spatial pixel block.
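The classification step then reduces to running every spatial pixel block through the trained network. The sketch below reuses the hypothetical helpers above and writes each predicted label back to the center pixel of its block to form a classification map.

```python
import numpy as np
import torch


@torch.no_grad()
def classify_image(model, all_blocks, rows, cols, image_shape, batch_size=256, device="cpu"):
    """Step 4 sketch: predict a label for every spatial pixel block and assemble
    the labels into a classification map (0 marks unlabeled pixels)."""
    model = model.to(device).eval()
    class_map = np.zeros(image_shape, dtype=np.int64)
    for start in range(0, len(all_blocks), batch_size):
        x = torch.as_tensor(all_blocks[start:start + batch_size]).permute(0, 3, 1, 2).to(device)
        preds = model(x).argmax(dim=1).cpu().numpy() + 1    # back to class labels 1..C
        class_map[rows[start:start + batch_size], cols[start:start + batch_size]] = preds
    return class_map
```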
The effects of the present invention are further described below in conjunction with simulation experiments:
1. Simulation experiment conditions:
the hardware platform of the simulation experiment of the invention is: the processor is Intel i7 7820x CPU, the main frequency is 3.6GHz, and the memory is 64GB.
The software platform of the simulation experiment of the invention is: windows 10 operating system and python 3.6.
The input image used in the simulation experiment of the invention is the Indian Pines hyperspectral image. The data were acquired over the Indian Pines test site in northwestern Indiana, USA, in June 1992; the image size is 145×145 pixels with 200 bands, it contains 16 classes of ground objects in total, and the image format is .mat.
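A .mat file of this kind can be loaded as in the short snippet below and handed to the training-set sketch given earlier; the file names and variable keys shown are hypothetical, as the patent does not state them.

```python
import scipy.io as sio

# Hypothetical file and key names; adjust them to the actual Indian Pines .mat files at hand.
cube = sio.loadmat("Indian_pines_corrected.mat")["indian_pines_corrected"]   # (145, 145, 200)
labels = sio.loadmat("Indian_pines_gt.mat")["indian_pines_gt"]               # (145, 145), 16 classes
```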
2. Simulation content and result analysis:
the simulation experiment of the invention adopts the deep cyclic neural network DRNN classification method of the invention and the prior art to respectively classify the input Inndian Pines hyperspectral images to obtain a classification result graph.
In the simulation experiment, the adopted prior art refers to:
the prior art deep-loop neural network DRNN classification method is a hyperspectral image classification method, which is proposed by Mou Lichao et al in 'Deep Recurrent Neural Networks for Hyperspectral Image Classification, IEEE Transactions on Geoscience & Remote Sensing,55 (7): 3639-3655, 2017', and is called as the deep-loop neural network DRNN classification method for short.
The effects of the present invention are further described below in conjunction with the simulation diagram of fig. 2.
Fig. 2(a) is a pseudo-color image composed of the 50th, 27th and 17th bands of the Indian Pines hyperspectral image. Fig. 2(b) is the manually labeled ground-truth map of the input Indian Pines hyperspectral image. Fig. 2(c) is the classification map of the Indian Pines hyperspectral image obtained with the prior-art DRNN classification method. Fig. 2(d) is the classification map of the Indian Pines hyperspectral image obtained with the method of the invention.
As can be seen from Fig. 2(c), compared with the classification result of the invention, the result of the prior-art DRNN method contains more noise points and has poorer edge smoothness, mainly because that method extracts only the spectral features of the hyperspectral image pixels and not their spatial features, which leads to low classification accuracy.
As can be seen from Fig. 2(d), compared with the result of the prior-art DRNN method, the classification result of the invention has fewer noise points and better region consistency and edge smoothness, showing that the classification effect of the invention is better than that of the prior-art DRNN classification method.
The classification results of the two methods were evaluated with three indexes: the classification accuracy of each class, the overall accuracy OA, and the average accuracy AA.
Table 1. Quantitative comparison of the classification results of the invention and the prior art in the simulation experiment

[The body of Table 1 is reproduced only as an image in the original publication; the overall accuracy (OA) and average accuracy (AA) values it reports are quoted in the text below.]
The overall accuracy OA, the average accuracy AA, and the classification accuracy of each of the 16 ground-object classes were calculated with the following formulas, and all results were listed in Table 1:

$$\mathrm{OA}=\frac{\text{number of correctly classified samples}}{\text{total number of samples}}$$

$$\mathrm{AA}=\frac{1}{16}\sum_{c=1}^{16}\mathrm{ACC}_c$$

$$\mathrm{ACC}_c=\frac{\text{number of correctly classified samples of class }c}{\text{total number of samples of class }c}$$

wherein $\mathrm{ACC}_c$ denotes the classification accuracy of the $c$-th class.
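For reference, the three evaluation indexes can be computed from predicted and true labels with the standard definitions above, as in this short sketch.

```python
import numpy as np


def evaluation_indexes(y_true, y_pred, num_classes=16):
    """Overall accuracy (OA), average accuracy (AA) and per-class accuracy."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                            # confusion matrix: rows = true, columns = predicted
    per_class = np.diag(cm) / cm.sum(axis=1)     # correct samples of a class / samples of that class
    oa = np.diag(cm).sum() / cm.sum()            # correctly classified samples / all samples
    aa = per_class.mean()                        # mean of the per-class accuracies
    return oa, aa, per_class
```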
As Table 1 shows, the overall accuracy OA of the invention is 96.5% and the average accuracy AA is 95.2%; both indexes are higher than those of the prior-art method, which shows that the invention achieves higher hyperspectral image classification accuracy.
The above simulation experiments show that the method of the invention, using the constructed recurrent neural network with an attention mechanism, effectively captures through the attention proposal module the local region of the input spatial pixel block that needs attention, which helps to mine high-level semantic information that is easy to classify; the recurrent neural network fully extracts the joint spatial-spectral features of the hyperspectral image to be classified, improving the classification accuracy, and it also converges faster, improving the classification speed. The method thus solves the problems of the prior art, namely insufficient fusion of spectral and spatial information, insufficient extraction of discriminative features, low classification accuracy, too many network training parameters, long training time and low classification speed, and is a very practical hyperspectral image classification method.

Claims (6)

1. A hyperspectral image classification method based on an attention mechanism and a recurrent neural network, characterized in that a recurrent neural network is constructed and trained, an attention proposal module and a processing module are used to obtain the local region of interest of each input spatial pixel block, and the recurrent neural network is used to extract joint spatial-spectral features of the hyperspectral image to be classified, the method specifically comprising the following steps:
(1) Constructing a recurrent neural network:
(1a) Building a recurrent neural network formed by cascading three sub-networks with different structures:
the first sub-network and the second sub-network of the recurrent neural network have the same structure, consisting in order of: the 1st convolution layer, the 1st pooling layer, the 2nd convolution layer, the 2nd pooling layer, an attention proposal module, and a processing module; the attention proposal module is formed by connecting two fully connected layers in series, and the processing module is realized by mask processing, a pixel-level multiplication function and a bilinear interpolation function; the parameters of each layer are set as follows: the numbers of convolution kernels of the 1st and 2nd convolution layers are set to 16 and 32 respectively, the convolution kernel sizes are set to 3×3, the strides are set to 1, and the padding is set to 1 pixel; the filter sizes of the 1st and 2nd pooling layers are set to 2×2, and their strides are set to 2; the numbers of nodes of the two fully connected layers of the attention proposal module are set to 128 and 3 respectively;
the structure of the third sub-network of the recurrent neural network is: the 1st convolution layer → the 1st pooling layer → the 2nd convolution layer → the 2nd pooling layer → the 3rd convolution layer → the 3rd pooling layer → the 4th convolution layer → the 4th pooling layer → the 1st fully connected layer; the parameters of each layer are set as follows: the numbers of convolution kernels of the 1st, 2nd, 3rd and 4th convolution layers are set to 64, 128, 256 and 512 respectively, the convolution kernel sizes are set to 3×3, the strides are set to 1, and the padding is set to 1 pixel; the filter sizes of the 1st, 2nd, 3rd and 4th pooling layers are set to 2×2, and their strides are set to 2; the number of nodes of the 1st fully connected layer is set to the number of classes of the hyperspectral image to be classified;
(2) Generating a training set:
(2a) Inputting a hyperspectral image and a corresponding label image;
(2b) Selecting, with a principal component analysis method, 5 bands that retain 99% of the information of the hyperspectral image to form a 5-band hyperspectral image, and normalizing it to between 0 and 1 to obtain a normalized hyperspectral image;
(2c) Taking, on the normalized hyperspectral image, a spatial pixel block of neighborhood size 61×61 pixels centered on each labeled pixel, and forming all the spatial pixel blocks into a spatial pixel block set;
(2d) Randomly selecting 5% of spatial pixel blocks from the spatial pixel block set to form a training set;
(3) Training a recurrent neural network:
inputting all the spatial pixel blocks of the training set into the recurrent neural network, obtaining the input spatial pixel blocks of the second sub-network with the attention proposal module and the processing module, and inputting them into the attention proposal module and processing module of the second sub-network to obtain the input spatial pixel blocks of the third sub-network, which outputs the predicted label of each spatial pixel block; calculating the loss between the predicted labels and the true labels of all spatial pixel blocks with a cross-entropy loss function, and updating all parameters of the recurrent neural network with a gradient descent algorithm until the recurrent neural network converges, obtaining the trained recurrent neural network;
(4) Classifying hyperspectral images to be classified:
Processing the hyperspectral image to be classified with the same method as in step (2) to obtain the set of all spatial pixel blocks, inputting all the spatial pixel blocks into the trained recurrent neural network, and outputting the predicted label of each spatial pixel block.
2. The method of classifying hyperspectral images based on an attention mechanism and a recurrent neural network as claimed in claim 1, wherein the mask processing in step (1a) is implemented by the following formula:

$$M_i(x,y)=\begin{cases}1, & t_x^{tl}\le x\le t_x^{br}\ \text{and}\ t_y^{tl}\le y\le t_y^{br}\\ 0, & \text{otherwise}\end{cases}$$

wherein $M_i$ denotes the mask of the $i$-th sub-network of the recurrent neural network, $i=1,2$; $x$ and $y$ denote the abscissa and ordinate of the mask $M_i$ and take values from 0 up to the width and the height of the hyperspectral image to be classified, respectively; $t_x^{tl}$ and $t_y^{tl}$ denote the abscissa and ordinate of the upper-left corner of the square region of interest of the $i$-th sub-network; $t_x^{br}$ and $t_y^{br}$ denote the abscissa and ordinate of the lower-right corner of the square region of interest of the $i$-th sub-network; and $t_x$, $t_y$ and $t_l$ denote the center-point coordinates and half the width of the square region of interest output by the attention proposal module of the $i$-th sub-network of the recurrent neural network, so that $t_x^{tl}=t_x-t_l$, $t_y^{tl}=t_y-t_l$, $t_x^{br}=t_x+t_l$ and $t_y^{br}=t_y+t_l$.
3. The method of classifying hyperspectral images based on an attention mechanism and a recurrent neural network as claimed in claim 1, wherein the pixel-level multiplication function in step (1a) is as follows:

$$X_i^{att}=X_i\odot M_i$$

wherein $X_i^{att}$ denotes the square region-of-interest block obtained by the $i$-th sub-network of the recurrent neural network, $X_i$ denotes the input spatial pixel block of the $i$-th sub-network of the recurrent neural network, and $\odot$ denotes pixel-level (element-wise) multiplication.
4. The method of classifying hyperspectral images based on an attention mechanism and a recurrent neural network as claimed in claim 1, wherein the bilinear interpolation function in step (1a) is as follows:

$$X_j(p,q)=\sum_{\alpha=0}^{1}\sum_{\beta=0}^{1}\left|1-\alpha-\left\{\frac{p}{\lambda}\right\}\right|\,\left|1-\beta-\left\{\frac{q}{\lambda}\right\}\right|\,X_i^{att}(m,n)$$

wherein $X_j(p,q)$ denotes the pixel value at coordinates $(p,q)$ of the input spatial pixel block of the $j$-th sub-network of the recurrent neural network, $j=2,3$ (i.e. $j=i+1$); $p$ and $q$ take values from 0 up to the width and the height of the hyperspectral image to be classified, respectively; $\sum$ denotes the summation operation; $\alpha,\beta\in\{0,1\}$; $\lambda$ denotes the upsampling factor, equal in magnitude to the quotient of the width of $X_i$ and the width of $X_i^{att}$; $X_i^{att}(m,n)$ denotes the pixel value of $X_i^{att}$ at coordinates $(m,n)$, with $m=\left[p/\lambda\right]+\alpha$ and $n=\left[q/\lambda\right]+\beta$; and $\{\cdot\}$ and $[\cdot]$ denote the operations of taking the fractional part and the integer part, respectively.
5. The method of classifying hyperspectral images based on an attention mechanism and a recurrent neural network as claimed in claim 1, wherein the normalization formula in step (2b) is as follows:

$$R=\frac{I-\min(I)}{\max(I)-\min(I)}$$
wherein R represents a hyperspectral image after normalization processing, I represents a hyperspectral image before normalization processing, and max (·) and min (·) represent operations of taking the maximum value and the minimum value respectively.
6. The method for classifying hyperspectral images based on an attention mechanism and a recurrent neural network according to claim 1, wherein: the cross entropy loss in step (3) is calculated from the following formula:
$$L=-\sum_{g=1}^{h} y_g\cdot\ln f_g$$

wherein $L$ denotes the cross-entropy loss; $h$ denotes the total number of spatial pixel blocks in the training set; $\sum$ denotes the summation operation; $y_g$ denotes the true label vector of the $g$-th spatial pixel block; $\ln$ denotes the logarithm with the natural constant $e$ as its base; and $f_g$ denotes the predicted label vector output by the recurrent neural network for the $g$-th spatial pixel block.
CN202010860393.9A 2020-08-25 2020-08-25 Hyperspectral image classification method based on attention mechanism and cyclic neural network Active CN112052758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010860393.9A CN112052758B (en) 2020-08-25 2020-08-25 Hyperspectral image classification method based on attention mechanism and cyclic neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010860393.9A CN112052758B (en) 2020-08-25 2020-08-25 Hyperspectral image classification method based on attention mechanism and cyclic neural network

Publications (2)

Publication Number Publication Date
CN112052758A CN112052758A (en) 2020-12-08
CN112052758B true CN112052758B (en) 2023-05-23

Family

ID=73599647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010860393.9A Active CN112052758B (en) 2020-08-25 2020-08-25 Hyperspectral image classification method based on attention mechanism and cyclic neural network

Country Status (1)

Country Link
CN (1) CN112052758B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733725B (en) * 2021-01-12 2023-09-22 西安电子科技大学 Hyperspectral image change detection method based on multistage cyclic convolution self-coding network
CN113469078B (en) * 2021-07-07 2023-07-04 西安电子科技大学 Hyperspectral image classification method based on automatic design of long and short-term memory network
CN116863341B (en) * 2023-08-24 2024-01-26 中国农业科学院农业资源与农业区划研究所 Crop classification and identification method and system based on time sequence satellite remote sensing image

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018045626A1 (en) * 2016-09-07 2018-03-15 深圳大学 Super-pixel level information fusion-based hyperspectral image classification method and system
WO2018120740A1 (en) * 2016-12-29 2018-07-05 深圳光启合众科技有限公司 Picture classification method, device and robot
CN108090447A (en) * 2017-12-19 2018-05-29 青岛理工大学 Hyperspectral image classification method and device under double branch's deep structures
CN108460342A (en) * 2018-02-05 2018-08-28 西安电子科技大学 Hyperspectral image classification method based on convolution net and Recognition with Recurrent Neural Network
CN110738229A (en) * 2018-07-20 2020-01-31 杭州海康威视数字技术股份有限公司 fine-grained image classification method and device and electronic equipment
CN109784347A (en) * 2018-12-17 2019-05-21 西北工业大学 Image classification method based on multiple dimensioned dense convolutional neural networks and spectrum attention mechanism
CN111562977A (en) * 2019-02-14 2020-08-21 上海寒武纪信息科技有限公司 Neural network model splitting method, device, storage medium and computer system
CN109993220A (en) * 2019-03-23 2019-07-09 西安电子科技大学 Multi-source Remote Sensing Images Classification method based on two-way attention fused neural network
CN110084159A (en) * 2019-04-15 2019-08-02 西安电子科技大学 Hyperspectral image classification method based on the multistage empty spectrum information CNN of joint
CN110321963A (en) * 2019-07-09 2019-10-11 西安电子科技大学 Based on the hyperspectral image classification method for merging multiple dimensioned multidimensional sky spectrum signature
CN110516596A (en) * 2019-08-27 2019-11-29 西安电子科技大学 Empty spectrum attention hyperspectral image classification method based on Octave convolution
CN110598029A (en) * 2019-09-06 2019-12-20 西安电子科技大学 Fine-grained image classification method based on attention transfer mechanism
CN111046967A (en) * 2019-12-18 2020-04-21 江苏科技大学 Underwater image classification method based on convolutional neural network and attention mechanism
CN111191736A (en) * 2020-01-05 2020-05-22 西安电子科技大学 Hyperspectral image classification method based on depth feature cross fusion
CN111274869A (en) * 2020-01-07 2020-06-12 中国地质大学(武汉) Method for classifying hyperspectral images based on parallel attention mechanism residual error network
CN111353531A (en) * 2020-02-25 2020-06-30 西安电子科技大学 Hyperspectral image classification method based on singular value decomposition and spatial spectral domain attention mechanism

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Deep Recurrent Neural Networks for Hyperspectral Image Classification; Lichao Mou et al.; IEEE Transactions on Geoscience and Remote Sensing; 2017-07-31; vol. 55, no. 7; 3639-3655 *
Spectral-Spatial Classification of Hyperspectral Image based on a Joint Attention Network; Erting Pan et al.; IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium; 2019-11-14; 413-416 *
Remote sensing image registration and multi-resolution fusion classification based on a dual-branch deep neural network (in Chinese); Zhu Hao; China Doctoral Dissertations Full-text Database, Engineering Science and Technology II; 2020-02-15; vol. 2020, no. 2; C028-5 *
Research on hyperspectral image classification based on stacked Wasserstein autoencoder and hybrid generative adversarial network (in Chinese); Ye Shaohui; China Master's Theses Full-text Database, Engineering Science and Technology II; 2020-03-15; vol. 2020, no. 3; C028-132 *

Also Published As

Publication number Publication date
CN112052758A (en) 2020-12-08

Similar Documents

Publication Publication Date Title
CN110084159B (en) Hyperspectral image classification method based on combined multistage spatial spectrum information CNN
CN110321963B (en) Hyperspectral image classification method based on fusion of multi-scale and multi-dimensional space spectrum features
CN109145992B (en) Hyperspectral image classification method for cooperatively generating countermeasure network and spatial spectrum combination
CN112052758B (en) Hyperspectral image classification method based on attention mechanism and cyclic neural network
CN110728192B (en) High-resolution remote sensing image classification method based on novel characteristic pyramid depth network
CN108460342A (en) Hyperspectral image classification method based on convolution net and Recognition with Recurrent Neural Network
CN103258324B (en) Based on the method for detecting change of remote sensing image that controlled kernel regression and super-pixel are split
CN107247930A (en) SAR image object detection method based on CNN and Selective Attention Mechanism
CN103208011B (en) Based on average drifting and the hyperspectral image space-spectral domain classification method organizing sparse coding
CN105335975B (en) Polarization SAR image segmentation method based on low-rank decomposition and statistics with histogram
CN111639587B (en) Hyperspectral image classification method based on multi-scale spectrum space convolution neural network
CN104239902A (en) Hyper-spectral image classification method based on non-local similarity and sparse coding
CN107944470A (en) SAR image sorting technique based on profile ripple FCN CRF
CN106846322A (en) Based on the SAR image segmentation method that curve wave filter and convolutional coding structure learn
CN112115795B (en) Hyperspectral image classification method based on Triple GAN
CN111027497A (en) Weak and small target rapid detection method based on high-resolution optical remote sensing image
CN104268561B (en) High spectrum image solution mixing method based on structure priori low-rank representation
CN109034213B (en) Hyperspectral image classification method and system based on correlation entropy principle
Ma et al. A multi-scale progressive collaborative attention network for remote sensing fusion classification
Li et al. An object-oriented CNN model based on improved superpixel segmentation for high-resolution remote sensing image classification
CN115240072A (en) Hyperspectral multi-class change detection method based on multidirectional multi-scale spectrum-space residual convolution neural network
Ahmad et al. Hybrid dense network with attention mechanism for hyperspectral image classification
Guo et al. Dual-concentrated network with morphological features for tree species classification using hyperspectral image
CN110674848A (en) High-dimensional data joint clustering method combining sparse representation and bipartite graph segmentation
CN112381144B (en) Heterogeneous deep network method for non-European and Euclidean domain space spectrum feature learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant