CN112836773A - Hyperspectral image classification method based on global attention residual error network - Google Patents

Hyperspectral image classification method based on global attention residual error network

Info

Publication number
CN112836773A (application CN202110376903.XA)
Authority
CN
China
Prior art keywords
convolution, residual network, module, global
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110376903.XA
Other languages
Chinese (zh)
Other versions
CN112836773B (en)
Inventor
高红民
张亦严
陈忠昊
曹雪莹
李臣明
Current Assignee
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN202110376903.XA priority Critical patent/CN112836773B/en
Publication of CN112836773A publication Critical patent/CN112836773A/en
Application granted granted Critical
Publication of CN112836773B publication Critical patent/CN112836773B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/2453 Classification techniques relating to the decision surface: non-linear, e.g. polynomial classifier
    • G06F18/253 Fusion techniques of extracted features
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y02A40/10 Adaptation technologies in agriculture


Abstract

The invention discloses a hyperspectral image classification method based on a global attention residual network, comprising the following steps: constructing an overall network comprising a multi-scale feature extraction network, a global attention module, and an improved residual network module; performing multi-scale feature extraction to obtain the hierarchical features of the hyperspectral image; constructing, with the global attention module, the spatial and spectral dependencies of global pixels through the combination of a spatial attention module and a spectral attention module; fusing the improved residual network module with the global attention module to form a novel global attention residual network; and sending the output, after global pooling, into a classifier for final classification, and outputting the result. By introducing multi-scale receptive fields and the global attention module, the method obtains rich spatial-spectral features simultaneously; the improved residual network alleviates the vanishing-gradient problem and accelerates network convergence, thereby improving classification accuracy and ensuring a good, stable classification effect.

Description

Hyperspectral image classification method based on global attention residual error network
Technical Field
The invention belongs to the technical field of hyperspectral remote sensing image processing, and particularly relates to a hyperspectral image classification method based on a global attention residual network.
Background
Hyperspectral images (HSIs) have in recent years found significant applications in remote sensing fields such as target detection, agricultural monitoring, and ocean security. Unlike a conventional two-dimensional digital image, a hyperspectral image is a three-dimensional data cube consisting of two spatial dimensions and one spectral dimension. The spectral bands contain abundant ground-feature information, so feature selection and feature extraction are particularly important for the classification of hyperspectral pixels. Conventional methods for classifying hyperspectral images include K-nearest neighbor (K-NN) [6], the extreme learning machine (ELM), the support vector machine (SVM), and the like, all of which classify with the aid of spectral information. K-nearest neighbor is the simplest classifier in the field of machine learning. The extreme learning machine is a fast learning algorithm: for a neural network with a single hidden layer, ELM randomly initializes the input weights and biases and solves for the corresponding output weights. The support vector machine seeks, through training on small-sample data, the optimal balance between model complexity and learning ability, and is one of the most widely used machine learning algorithms. In addition, principal component analysis (PCA), which reduces noise interference by selecting and compressing spectral bands, can retain important features and is particularly suitable for hyperspectral image classification.
Image classification based on deep learning has become increasingly popular with researchers in recent years. The basic idea of deep-learning image classification is to use simple linear transformations and activation functions, densely connected across multiple layers of neurons, to obtain nonlinear features, extract abstract features from the raw image data, and form specific network weights through training on a large amount of raw data, thereby improving classification accuracy. The spectral dimension of a hyperspectral image reaches dozens or even hundreds of bands. Chen et al. first introduced deep learning into hyperspectral image classification, proposing a model based on a stacked autoencoder (SAE) that extracts information from the raw data with an autoencoder and finally classifies with an SVM; however, that method uses only spectral information and neglects spatial information. To further exploit the spatial features of hyperspectral images, algorithms based on the convolutional neural network (CNN) were proposed. Owing to its excellent image-characterization ability, the CNN has made great breakthroughs in computer vision and has been successfully applied to hyperspectral image classification. Makantasis et al. designed a 2D-convolution-based neural network model that packs each center pixel into a fixed-size block padded with surrounding pixels, feeds the block into the network for spatial feature extraction, and finally sends it to a multilayer perceptron for classification. However, more and more research shows that classification using a single dimension alone rarely achieves the expected effect, so researchers now focus on experiments combining spatial and spectral information.
As feature extraction deepens, the neural network inevitably grows deeper, the phenomena of vanishing gradients and network degradation become more serious as training proceeds, and classification accuracy no longer rises and may even fall. The convolution kernel sizes in a convolutional neural network are mostly fixed parameters, so the scope of the receptive field is limited: generally speaking, a large receptive field cannot capture the fine-grained structure of an image, a small receptive field cannot cover its coarse-grained structure, and a fixed receptive field size cannot adapt to images with different ground-feature characteristics. Moreover, the spectral redundancy of hyperspectral images limits the feature-extraction capability of the neural network, and the key to improving performance is to selectively screen out the more critical features from the many complex features of the hyperspectral image.
Disclosure of Invention
Purpose of the invention: in order to solve problems of insufficient use of image features, a limited receptive-field scope, and redundant, hard-to-converge network structures in prior-art hyperspectral remote sensing image classification algorithms, a hyperspectral image classification method based on a global attention residual network is provided. A multi-scale global attention residual network (GSSARN) is designed: rich spatial-spectral features are obtained simultaneously by introducing multi-scale receptive fields and a global attention module, while an improved residual network alleviates the vanishing-gradient problem and accelerates network convergence.
Technical scheme: in order to achieve the above object, the present invention provides a hyperspectral image classification method based on a global attention residual network, comprising the following steps:
S1: constructing an overall network comprising a multi-scale feature extraction network, a global attention module, and an improved residual network module, the global attention module comprising a spatial attention module and a spectral attention module;
S2: performing multi-scale feature extraction on the initialized and preprocessed hyperspectral data with the multi-scale feature extraction network, using convolution kernels of different sizes to extract the hierarchical features of the hyperspectral image;
S3: constructing, with the global attention module, the spatial and spectral dependencies of global pixels through the combination of the spatial attention module and the spectral attention module;
S4: fusing the improved residual network module with the global attention module to form a novel global attention residual network;
S5: deepening the network and refining feature extraction with the global attention residual network, then sending the output, after global pooling, into a Softmax classifier for final classification, and outputting the result.
Further, the multi-scale feature extraction process in step S2 is as follows:
A1: performing multi-scale feature extraction by convolving the hyperspectral image with convolution kernels of different sizes;
A2: performing a concatenate splicing operation on the convolved feature maps.
Further, the operation mechanism of the spatial attention module in step S3 comprises squeeze and excitation operations, specifically:
First, the number of feature maps of the original hyperspectral cube is increased through convolution, activation, and batch normalization; reducing the convolution kernel size allows more representative features of the hyperspectral image to be extracted, which completes the squeeze operation. Convolution is then performed again with the activation function changed to Sigmoid, which completes the excitation operation. A pixel-level dot product is then taken between the convolved feature map q and the preceding original block, assigning the obtained weights to the spatial position of each pixel. The final convolution operation and weight assignment are given by

q = F_sq(U) = W⊛U

and

Û = F_ex(q, U) = σ(q)⊙U,

where W is the convolution kernel and the convolution bias b is initialized to 0 and does not participate in the operation; F_sq denotes the squeeze process and F_ex the excitation process; U is the original feature map; q represents the linear combination of the feature map U at each spatial position along the spectral channels; σ is the Sigmoid activation function; and ⊙ denotes the pixel-level dot product.
Further, the operation mechanism of the spectral attention module in step S3 is as follows: the spectral attention module obtains global information through matrix multiplication of the original data with the convolved features and redistributes the weights among pixels; finally, through channel-wise dependency transformation and feature fusion, the weights are assigned to the pixels.
Further, the operation process of the spectral attention module in step S3 is as follows:
First, the input HSI cube is sent through a convolution into the convolutional neural network, compressing the number of channels from c to 1; the height and width are then fused in the spatial dimension through Reshape and exchanged with the spectral dimension to complete the transpose operation, i.e., the feature size changes from (h, w, 1) to (1, h×w). Next, the height and width of the original HSI cube are merged, i.e., the feature size changes from (h, w, c) to (h×w, c). After a Softmax activation, the convolved feature of shape (1, h×w) is matrix-multiplied with the transformed HSI cube of shape (h×w, c) to obtain a feature of shape (1, 1, c); at this point both spatial dimensions h and w are compressed to 1, and only the spectral dimension c remains.
Replacing the fully connected C×C layer with the two-layer convolution reduces the parameters to roughly 2×C×C/r, where r is a scaling coefficient and C/r represents the hidden feature dimension of the bottleneck transformation.
Further, in step S3, the spatial attention module and the spectral attention module are combined to obtain a spatial-spectral joint global attention mechanism, given by

U_Spe_Spa = λ·U_Spe + (1 − λ)·U_Spa,

where λ is a trainable parameter.
Further, the improved residual network modules in step S4 are divided into a starting residual block, a middle residual block, and an ending residual block.
Further, the learning process of the spatial-spectral joint global attention network comprises the following sub-steps:
B1: the spatial attention module extracts features from shallow to deep through three convolution operations; BN and ReLU are added after the first convolution to form the first feature extraction layer;
B2: BN and ReLU are added after the second convolution to form the second feature extraction layer;
B3: a third convolution is performed, with the Sigmoid activation function used for weight assignment;
B4: the feature map obtained by the third convolution is combined at the pixel level with the feature map input to the spatial attention module, assigning weights to all spatial pixels;
B5: for the convolution of the spectral module, the input cube X ∈ R^(h×w×c) is first sent into the convolutional neural network, and the number of channels is compressed from c to 1;
B6: a Reshape operation is then performed, and the height and width of the spatial dimension are exchanged with the spectral dimension to complete the transpose, giving Y′ ∈ R^(1×(h·w));
B7: the height and width of the original cube X ∈ R^(h×w×c) are merged to give X′ ∈ R^((h·w)×c);
B8: after a Softmax operation, the convolved feature map Y′ is matrix-multiplied with X′ to obtain a feature map of shape R^(1×1×c); the spatial dimensions h and w are compressed to 1 and only the spectral dimension c remains, giving the global attention weights of the spectral dimension;
B9: finally, the spatial attention module and the spectral attention module are combined to obtain the pixel-level spatial-spectral joint global attention weights.
Further, the learning process of the improved residual network module comprises the following sub-steps:
C1: in the starting residual block, after a convolution followed by BN and ReLU operations, convolution is performed again; the result then passes through only a BN layer before being connected directly to the input feature map;
C2: in the middle residual block, the network is first activated through BN and ReLU, two convolutional layers are then applied, and the result is passed through another BN layer before being added to the input feature layer;
C3: in the ending residual block, BN first sparsifies the network parameters and two convolutional layers follow; unlike the middle residual block, the ending residual block performs the feature connection first and then the BN and ReLU operations to obtain the final output.
According to the method, first, to remedy the insensitivity of a fixed receptive field to the fine-grained structure of an image, the original image blocks are pre-processed by convolution with different numbers of kernels of sizes 1×1, 3×3, and 5×5, and the resulting feature maps are spliced by a Concatenate operation. The number of convolution kernels determines the number of feature maps; multi-scale convolutional fusion extracts ground-feature characteristics of different sizes and effectively compensates for the limitation a single kernel size places on the spatial field of view.
Second, the fused feature maps are sent sequentially into GR (GSSAT_ResBlock) modules, each consisting of a residual network and a global attention module. According to its position, the residual network is divided into a Starting ResBlock, a Middle ResBlock, and an Ending ResBlock, embedded in the GR1, GR2, and GR3 modules respectively. The aim is to make full use of the abstract features extracted by the preceding network for feature fusion, enhancing the distinguishability between pixels and alleviating the vanishing-gradient problem encountered by the neural network during backpropagation. The global attention module (GSSAT_Block) is formed by combining a spatial attention (Spa_AT) module and a spectral attention (Spe_AT) module according to a proportionality coefficient λ; through weight assignment and feature fusion, it improves the network's ability to discriminate edge pixels and raises the overall classification accuracy. Finally, the feature map extracted by the global attention residual network is sent, after a global pooling operation, into a Softmax classifier to obtain the final classification result.
Beneficial effects: compared with the prior art, the method obtains rich spatial-spectral features simultaneously by introducing multi-scale receptive fields and a global attention module, while the improved residual network alleviates the vanishing-gradient problem and accelerates network convergence, thereby improving classification accuracy and ensuring a good, stable classification effect.
Drawings
FIG. 1 is a diagram of the network model of the present invention;
FIG. 2 is a model diagram of the spatial attention module;
FIG. 3 is a model diagram of the spectral attention module;
FIG. 4 is a model diagram of the spatial-spectral joint attention module;
FIG. 5 is a diagram of the improved residual network model;
FIG. 6 shows the Indian Pines classification results, wherein (a) is the ground-truth map, (b) the DFFTCN classification result, (c) the HybridSN model result, (d) the SATN model result, (e) the SSRN model result, and (f) the result of the GSSARN model of the present invention;
FIG. 7 shows the University of Pavia classification results, wherein (a) is the ground-truth map, (b) the DFFTCN classification result, (c) the HybridSN model result, (d) the SATN model result, (e) the SSRN model result, and (f) the result of the GSSARN model of the present invention;
FIG. 8 shows the Salinas classification results, wherein (a) is the ground-truth map, (b) the DFFTCN classification result, (c) the HybridSN model result, (d) the SATN model result, (e) the SSRN model result, and (f) the result of the GSSARN model of the present invention.
Detailed Description
The present invention is further illustrated by the following figures and specific examples, which are to be understood as illustrative only and not as limiting the scope of the invention, which is to be given the full breadth of the appended claims and any and all equivalent modifications thereof which may occur to those skilled in the art upon reading the present specification.
The invention provides a hyperspectral image classification method based on a global attention residual network. First an overall network is constructed, comprising a multi-scale feature extraction network, a global attention module, and an improved residual network module, as shown in FIG. 1. The multi-scale feature extraction network extracts the hierarchical features of the hyperspectral image through three convolution kernels of different sizes; the global attention module then constructs the spatial and spectral dependencies of global pixels through the combination of the spatial attention module and the spectral attention module; feature extraction is then refined and the network deepened through the combination of the global attention module and the improved residual network.
Based on the above method, the embodiment applies the above method to hyperspectral image classification, and the specific process is as follows:
step 1: all parameters in the original network are initialized to satisfy a gaussian distribution with a mean of 0 and a variance of 0.1.
Step 2: raw hyperspectral image
Figure BDA0003011402990000061
Wherein h, w, c are height, width and spectral dimension of the hyperspectral data, respectively.
And step 3: preprocessing original hyperspectral data, and filling surrounding pixel points with a current pixel as a center to pack the hyperspectral data into HIS Cube with a fixed size.
Step 4: perform multi-scale feature extraction to capture ground features of different sizes in the hyperspectral remote sensing image; combining convolution kernels of different sizes grasps both the coarse-grained and the detailed features of the image more comprehensively.
The specific learning process of the multi-scale feature extraction network comprises the following substeps:
step 4.1: and (3) performing multi-scale feature extraction, performing convolution operation on the hyperspectral images by using convolution kernels of three different sizes (1 × 1), (3 × 3) and (5 × 5), wherein the number of corresponding convolution kernels is 64, 32 and 32 respectively.
Step 4.. 2: and performing concatate splicing operation on the convolved feature map, wherein the number of channels of the feature map is 128.
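The multi-scale extraction of steps 4.1 and 4.2 can be sketched in NumPy as follows; the naive `conv2d` helper and its random weights are illustrative stand-ins for trained convolution layers, not the patent's implementation.

```python
import numpy as np

def conv2d(x, kernels):
    """'Same'-padded 2D convolution: x is (h, w, c_in), kernels is
    (k, k, c_in, c_out). A naive loop implementation for illustration."""
    k = kernels.shape[0]
    pad = k // 2
    h, w, c_in = x.shape
    c_out = kernels.shape[3]
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((h, w, c_out))
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + k, j:j + k, :]   # (k, k, c_in) window
            out[i, j] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
cube = rng.standard_normal((9, 9, 30))        # toy HSI patch (h, w, c)

# Three branches with kernel sizes 1x1, 3x3, 5x5 and 64/32/32 kernels,
# as in step 4.1; the weights here are random placeholders.
branches = [conv2d(cube, rng.standard_normal((k, k, 30, n)) * 0.01)
            for k, n in [(1, 64), (3, 32), (5, 32)]]

fused = np.concatenate(branches, axis=-1)     # Concatenate step 4.2
print(fused.shape)                            # (9, 9, 128)
```

The channel counts add up (64 + 32 + 32 = 128), matching the 128-channel feature map described in step 4.2.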
Step 5: the global attention network combining spatial and spectral attention deepens feature extraction, enhances the discrimination between pixels, and improves the classification of hyperspectral edge pixels.
Step 6: combining the improved residual network with the global attention module effectively alleviates vanishing gradients during training and accelerates network convergence.
Step 7: the residual network and the global attention module are fused to form a novel global attention residual network; according to its position in the overall network, the residual network is divided into a starting, a middle, and an ending residual block.
Step 8: the output produced by stacking three global attention residual networks is sent, after global pooling, into a Softmax classifier for final classification, and the result is output.
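Step 8's global pooling and Softmax classification can be sketched as follows; the feature map, the weight matrix, and the class count (16, as in the Indian Pines dataset) are placeholders, not trained values.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(5)
n_classes = 16                               # e.g. Indian Pines has 16 classes
features = rng.standard_normal((9, 9, 128))  # output of the last GR module

pooled = features.mean(axis=(0, 1))          # global average pooling -> (128,)
W = rng.standard_normal((128, n_classes)) * 0.1
probs = softmax(pooled @ W)                  # Softmax classifier

pred = int(np.argmax(probs))                 # predicted land-cover class
print(probs.sum(), pred)
```

Global pooling collapses the spatial dimensions before classification, so the classifier's parameter count does not depend on the patch size.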
The spatial attention module, the spectral attention module, the spatial-spectral attention module, and the improved residual network are explained in detail in turn as follows:
1. Spatial attention module
FIG. 2 is a schematic diagram of the spatial attention module, which performs a squeeze-and-excitation-like operation. First, the number of feature maps of the original hyperspectral cube (HSI cube) is increased through two layers of convolution, activation, and batch normalization (BN); the first layer uses 3×3 convolution kernels and the second layer 1×1 kernels. Reducing the kernel size allows more representative features of the hyperspectral image to be extracted; this step completes the squeeze part of the squeeze-and-excitation analogue and facilitates the subsequent weight distribution and assignment. Convolution is then performed again with the activation function changed to Sigmoid; since the Sigmoid function maps any real number into the interval (0, 1), it serves well for weight distribution, completing the squeeze-and-excitation operation.
A pixel-level dot product is then taken between the convolved feature map q and the preceding original block, assigning the obtained weights to the spatial position of each pixel. The final convolution operation and weight assignment are thus given by

q = F_sq(U) = W⊛U

and

Û = F_ex(q, U) = σ(q)⊙U,

where W is the convolution kernel and the convolution bias b is initialized to 0 and does not participate in the operation; F_sq denotes the squeeze process and F_ex the excitation process; U is the original feature map; q represents the linear combination of the feature map U at each spatial position along the spectral channels; σ is the Sigmoid activation function; and ⊙ denotes the pixel-level dot product.
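A minimal NumPy sketch of the squeeze-and-excitation-style spatial attention described above; the single 1×1 "convolution" is a per-pixel linear combination with random placeholder weights and zero bias, as in the formula, rather than the module's full two-layer stack.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
U = rng.standard_normal((9, 9, 128))   # feature map from the preceding network

# Squeeze: a 1x1 convolution collapses the spectral channels into a single
# spatial map q (a per-pixel linear combination along the channels).
W = rng.standard_normal(128) * 0.1
q = U @ W                              # (9, 9): F_sq, one value per pixel

# Excitation: Sigmoid maps q into (0, 1) to act as per-pixel weights,
# which are broadcast-multiplied onto every channel (the pixel-level
# dot product with the original block).
U_spa = sigmoid(q)[..., None] * U      # (9, 9, 128)

print(U_spa.shape)
```

Because every Sigmoid weight lies in (0, 1), each pixel of the output is a damped copy of the input, with spatially informative pixels damped least.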
2. Spectral attention module
FIG. 3 is a model diagram of the spectral attention module. Similar in spirit to the spatial attention module, the spectral attention module compresses the spatial dimensions to improve feature selection in the spectral dimension. This embodiment designs a novel contextual semantic model so that the network, while remaining lightweight, integrates global information into the feature map and obtains the interdependencies among the pixels at all geographic positions of the image, thereby widening the network's global view.
First, the input HSI cube is sent through a convolution with 1×1 kernels into the convolutional neural network, compressing the number of channels from c to 1; the height (h) and width (w) are then fused in the spatial dimension through Reshape and exchanged with the spectral dimension to complete the Transpose operation, i.e., the feature size changes from (h, w, 1) to (1, h×w). Next, the height (h) and width (w) of the original HSI cube are merged, i.e., the feature size changes from (h, w, c) to (h×w, c). After a Softmax activation, the convolved feature of shape (1, h×w) is matrix-multiplied with the transformed HSI cube of shape (h×w, c) to obtain a feature map of shape (1, 1, c); at this point both spatial dimensions h and w are compressed to 1, and only the spectral dimension c remains.
Next comes the bottleneck transformation module, which is usually implemented with two fully connected layers [19]. However, fully connected operations greatly increase the network's parameter count, making the network redundant and harder to train and converge, which conflicts with the goal of a lightweight network. For this purpose, a two-layer 1 × 1 convolutional neural network is designed to replace the redundant fully connected operation; the parameter count drops significantly, from C × C for the fully connected layers to 2 × C × C/r for the convolutions, where r is a scaling coefficient and C/r is the hidden feature dimension of the bottleneck transformation. In this embodiment r is set to 16. Because the two-layer convolution makes the neural network harder to converge, the model adds Layer Normalization (LN) and a ReLU activation function between the two convolutions to help the network converge faster.
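The parameter saving of the bottleneck can be checked with a short calculation; this sketch assumes the intended count is 2 × C × C/r for the two 1 × 1 convolutions (weights only, bias ignored):

```python
def fc_params(C):
    # a fully connected C x C transform
    return C * C

def bottleneck_conv_params(C, r=16):
    # two 1x1 convolutions: C -> C/r followed by C/r -> C
    hidden = C // r
    return C * hidden + hidden * C    # = 2 * C * C / r

C = 128
print(fc_params(C), bottleneck_conv_params(C))   # prints: 16384 2048
```

With r = 16, the convolutional bottleneck needs only one-eighth of the fully connected parameters.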
In summary, the spectral attention module obtains global information by matrix-multiplying the original data with the convolved features, redistributing the weight values among the pixels; finally, the weights are assigned to all pixels through channel-wise dependency conversion and feature fusion, improving the classification performance of the network.
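The pipeline above (1 × 1 convolution → Reshape/Transpose → Softmax → matrix multiplication) can be sketched in NumPy; the kernel values and sizes here are illustrative, not the patent's trained parameters:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
h, w, C = 5, 5, 16
X = rng.normal(size=(h, w, C))             # input HSI cube

k = rng.normal(size=(C,))                  # 1x1 conv kernel, channels C -> 1
Y = X @ k                                  # (h, w)

Y_prime = softmax(Y.reshape(1, h * w))     # (1, h*w): attention over positions
X_prime = X.reshape(h * w, C)              # (h*w, C): merged spatial dims

context = Y_prime @ X_prime                # (1, C): global spectral descriptor
assert context.shape == (1, C)
```

The (1, C) result is the global spectral weight vector that the bottleneck transform then refines before it is broadcast back over the pixels.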
3. Spatial-spectral attention module
Fig. 4 is a model diagram of the spatial-spectral attention module, which combines spatial attention and spectral attention; the resulting spatial-spectral attention mechanism is as follows:
U_Spe_Spa = λ · U_Spe + (1 − λ) · U_Spa
the lambda is a trainable parameter, and by controlling the change of the lambda, the neural network can learn the spatial channel weight and the spectral channel weight of different proportionality coefficients, and different hyperspectral image data can be flexibly adjusted to obtain the optimal classification effect.
The specific learning process of the space-spectrum joint global attention network comprises the following sub-steps:
Step 5.1: the spatial attention module extracts features from shallow to deep through three convolution operations. The first convolution has 32 kernels of size (5,5), with BN and ReLU added, forming the first-layer feature extraction network.
Step 5.2: the second convolution has 64 kernels of size (3,3), with BN and ReLU added, forming the second-layer feature extraction network.
Step 5.3: the third convolution has 128 kernels of size (1,1); the activation function is Sigmoid, which performs the weight assignment.
Step 5.4: the feature map obtained from the third convolution and the feature map input to the spatial attention module are added at the pixel level, assigning the weight values to all spatial pixels.
Step 5.5: for the convolution of the spectral module, the input Cube X ∈ R^(h×w×C) is first fed into a convolutional neural network with a convolution kernel size of 1 × 1, compressing the number of channels from C to 1.
Step 5.6: a Reshape operation then fuses the height (h) and width (w) in the spatial dimension, which is interchanged with the spectral dimension to complete the transpose operation, giving Y' ∈ R^(1×(h·w)).
Step 5.7: for the original Cube X ∈ R^(h×w×C), the height (h) and width (w) are multiplied and merged, giving X' ∈ R^((h·w)×C).
Step 5.8: after a Softmax operation on the convolved feature map Y', a matrix multiplication with X' yields a feature map of shape R^(1×1×C). At this point the spatial dimensions h and w are compressed to 1 and only the spectral dimension c remains, giving the global attention weight value of the spectral dimension.
Step 5.9: finally, the spatial attention module and the spectral attention module are combined to obtain the spatial-spectral joint global attention weight value for each pixel.
4. Improved residual error network
A traditional residual error network mitigates vanishing gradients and reduces overfitting. To obtain more robust spatial-spectral features, this method modifies the structure of the traditional residual error network accordingly, producing the improved residual error network model shown in fig. 5. Referring to fig. 5, the residual blocks are divided, according to their position from front to back in the network, into a starting residual block (Starting block), a middle residual block (Middle block), and an ending residual block (Ending block). Residual blocks at different positions accelerate the flow of information, which benefits back-propagation and speeds up network convergence.
In the starting residual block, the original data, after multi-scale convolution preprocessing, first passes through a convolution, Batch Normalization (BN), and a ReLU activation function, shrinking the feature map while obtaining deeper detail features; a second convolution and batch normalization follow, after which the preprocessed original input is added back, forming the starting residual block.
Before reaching the middle residual block, the network has already passed through a complete GSSAT-ResBlock and completed coarse-grained feature extraction through the combination of the starting residual block and the global attention module; however, the network is still not deep enough at this point, and detail feature extraction remains deficient. To refine the feature map, the middle residual block changes the kernel size and stride of the convolution. Unlike the starting residual block, the middle residual block first enters a BN layer to sparsify the network parameters and accelerate convergence, then refines the feature map through a full set of convolution, BN, and ReLU operations; finally, one more convolution is applied and the result is added to the original input via a skip connection to form the complete middle residual block.
After the first two GSSAT-ResBlock modules, the network has extracted fairly refined features, but as depth increases it begins to become redundant and back-propagation becomes difficult. In the ending residual block, after the output of the preceding network is skip-connected with the block's branch, one final BN and ReLU operation is applied; the last BN sparsifies the network parameters into a normal distribution with mean 0 and variance 1, giving the input a stable distribution, which benefits back-propagation of the parameters and greatly improves the convergence speed of the network.
The learning process of the improved residual error network comprises the following sub-steps:
Step 6.1: the starting residual module first applies a convolution with 64 kernels of size (1,1); after BN and ReLU operations, a second convolution with 128 kernels of size (3,3) follows; then, after one further BN operation only, the result is feature-connected directly with the input feature map.
Step 6.2: the middle residual module first activates the sparsified network through BN and ReLU, then passes through two convolution layers with 64 and 128 kernels of sizes (1,1) and (3,3) respectively, and after another BN layer is added to the input feature layer.
Step 6.3: the ending residual module likewise first sparsifies the network parameters with a BN layer to ease back-propagation, then applies two convolution layers with 64 and 128 kernels of sizes (1,1) and (3,3); unlike the middle residual module, the ending residual block performs the feature connection first and then the BN and ReLU operations to obtain the final output.
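The three operation orderings in steps 6.1–6.3 can be sketched in NumPy; 1 × 1 convolutions are reduced to channel matmuls and BN to a toy per-channel standardization, so this shows only the ordering of operations, not the real layers:

```python
import numpy as np

rng = np.random.default_rng(2)

def conv(x, W):                 # 1x1 convolution as a channel matmul
    return x @ W

def bn(x):                      # toy batch normalization (standardize channels)
    mu = x.mean(axis=(0, 1), keepdims=True)
    sd = x.std(axis=(0, 1), keepdims=True) + 1e-5
    return (x - mu) / sd

def relu(x):
    return np.maximum(x, 0.0)

h, w, C = 4, 4, 8
x = rng.normal(size=(h, w, C))
W1 = rng.normal(size=(C, C)) * 0.1
W2 = rng.normal(size=(C, C)) * 0.1

# Starting block: conv -> BN -> ReLU -> conv -> BN, then add the input
start = bn(conv(relu(bn(conv(x, W1))), W2)) + x
# Middle block: BN -> ReLU first (pre-activation), two convs, BN, then add
middle = bn(conv(conv(relu(bn(x)), W1), W2)) + x
# Ending block: BN first, two convs, add the input, then a final BN -> ReLU
end = relu(bn(conv(conv(bn(x), W1), W2) + x))

assert start.shape == middle.shape == end.shape == (h, w, C)
```

The only difference among the three blocks is where BN/ReLU sit relative to the skip connection, which is exactly what the three steps describe.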
In order to verify the effect of the method of the present invention, based on the above technical solution, the present embodiment performs a simulation experiment, and specific results and analysis are as follows:
1. Experimental images
In this embodiment, the proposed global attention residual network is tested on three benchmark datasets, IP (Indian Pines), PU (University of Pavia), and SA (Salinas), to verify the validity and reliability of the GSSRN model.
The Indian Pines (IP) dataset was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over northwestern Indiana, USA, and has a spatial size of 145 × 145 pixels. The AVIRIS imaging spectrometer covers the wavelength range 0.4–2.5 μm, imaging the ground objects in 220 contiguous bands; the 200 bands remaining after 20 bands are excluded are generally used as the subject of study. The spatial resolution of the image is about 20 m; only 10249 pixels contain ground objects, comprising 16 land-cover classes in total.
The University of Pavia (PU) dataset. The PU data are images of the city of Pavia in northern Italy acquired by the German airborne Reflective Optics System Imaging Spectrometer (ROSIS-03). The image is 610 × 340 pixels in spatial size, the imager covers wavelengths of 0.43–0.86 μm, and the spatial resolution of the image is 1.3 m. In the experiment, 12 bands affected by strong noise and water-vapor interference are removed, and the image composed of the remaining 103 spectral bands is generally used; only 42776 pixels contain ground objects, comprising 9 classes such as roads, trees, and roofs.
The Salinas (SA) dataset. The SA dataset was also captured by the AVIRIS imaging spectrometer, over the Salinas Valley in California, USA. Unlike Indian Pines, its spatial resolution reaches 3.7 m. Similarly, 20 bands affected by water-vapor interference are removed in the experiment, and the remaining 204 bands are retained. The image is 512 × 217 pixels, of which 54129 pixels are usable for classification, comprising 16 classes of ground objects in total.
Table 1. Gray-scale maps and ground-object class information for the IP, PU, and SA datasets
Figure BDA0003011402990000101
Figure BDA0003011402990000111
2. Experimental methods and related parameter settings
The experimental computer is configured with an AMD Ryzen 5 3600X CPU and an NVIDIA GeForce RTX 2060 Super GPU; the programming language is Python, the environment is Jupyter Notebook, and the deep learning framework is Keras.
For evaluation, three indexes are selected: overall accuracy (OA), average accuracy (AA), and the Kappa coefficient (KA). All reported values are averaged over 10 experimental runs.
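The three indexes can be computed from a confusion matrix; a minimal NumPy sketch with an arbitrary 2-class matrix (not data from these experiments):

```python
import numpy as np

def metrics(cm):
    """OA, AA, and Kappa from a confusion matrix (rows: true, cols: predicted)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    oa = np.trace(cm) / n                              # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))         # mean per-class accuracy
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n ** 2    # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

oa, aa, kappa = metrics([[45, 5],
                         [10, 40]])
assert abs(oa - 0.85) < 1e-9 and abs(kappa - 0.7) < 1e-9
```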
For the division of training and test sets, 10%, 5%, and 1% of samples are randomly selected from the IP, PU, and SA datasets respectively as training samples, and the remaining 90%, 95%, and 99% are used as test samples. In the experiments on the 3 datasets, the batch size is set to 32, the back-propagation algorithm is Adam (Adaptive moment estimation), the initial learning rate is 0.0001, and the number of iterations is set to 300; L2 regularization is added to all convolution operations, and Dropout random discarding is adopted at the end to reduce overfitting of the network.
3. Comparison of Experimental results
Table 2. IP dataset test results
Figure BDA0003011402990000112
Figure BDA0003011402990000121
Table 3. PU dataset test results
Figure BDA0003011402990000122
Table 4. SA dataset test results
Figure BDA0003011402990000123
Figure BDA0003011402990000131
In the comparison experiment on the IP dataset, 10% of samples are randomly selected for training and the remaining 90% serve as test samples. The classification results of the different models on the IP dataset are shown in fig. 6 and table 2. It can be seen that SATN has the worst classification result, with a large amount of noise. The other 4 models all exceed 98% accuracy: HybridSN combines spatial and spectral features well through 3D convolution, and DFFTN uses a bilaterally fused convolutional network to achieve detailed classification of hyperspectral images. The proposed GSSARN model obtains the highest classification accuracy; by splicing global attention with the residual error network, it extracts more refined hyperspectral image features and therefore classifies edge pixels better.
In the comparison experiments on the PU and SA data, 5% and 2% of samples are randomly selected as training samples respectively, with the remaining 95% and 98% used as test samples. Tables 3 and 4 and figs. 7 and 8 show the classification results and the visual effect maps of each model on the PU and SA datasets respectively.

Claims (9)

1. A hyperspectral image classification method based on a global attention residual error network is characterized by comprising the following steps:
s1: constructing an integral network, wherein the integral network comprises a multi-scale feature extraction network, a global attention module and an improved residual error network module, and the global attention module comprises a spatial attention module and a spectral attention module;
s2: performing multi-scale feature extraction on the initialized and preprocessed hyperspectral data by using a multi-scale feature extraction network through convolution kernels with different sizes, and extracting layered features of the hyperspectral image;
s3: the global attention module is used for constructing a spatial and spectral dependency relationship of a global pixel point through the combination of the spatial attention module and the spectral attention module;
s4: fusing the improved residual error network module and the global attention module to form a novel global attention residual error network;
s5: deepening the network layer number through a global attention residual error network, refining feature extraction, sending output results into a classifier through global pooling for final classification, and outputting the results.
2. The method for classifying hyperspectral images based on a global attention residual error network according to claim 1, wherein the multi-scale feature extraction in the step S2 is as follows:
a1: performing multi-scale feature extraction, and performing convolution operation on the hyperspectral image by using convolution kernels with different sizes;
a2: and carrying out concatate splicing operation on the convolved feature map.
3. The hyperspectral image classification method based on global attention residual error network according to claim 1, wherein the operation mechanism of the spatial attention module in the step S3 comprises operations of squeezing and activating, specifically:
firstly, the number of feature maps of an original hyperspectral cube is increased through convolution, activation and batch standardization, more representative features of a hyperspectral image can be extracted by reducing the size of a convolution kernel, and extrusion operation is completed; then, convolution is carried out again, the activation function is changed into Sigmoid, and activation operation is completed; performing pixel level dot product operation on the feature map q after convolution and the preamble original block, and distributing the obtained weight values to respective spatial positions of all pixels; the formula for the last convolution operation and weight value acquisition:
Figure FDA0003011402980000011
wherein
Figure FDA0003011402980000012
wherein W is the convolution kernel, the convolution bias b is initialized to 0 and does not participate in the convolution operation, F_sq denotes the squeeze process, F_ex denotes the excitation (activation) process, U is the original feature map, q denotes the linear activation combination of the feature map U at each spatial position along the spectral channel, σ is the Sigmoid activation function, and ⊗ is the pixel-wise corresponding dot product.
4. The hyperspectral image classification method based on global attention residual error network according to claim 1, wherein the operation mechanism of the spectral attention module in the step S3 is as follows: the spectrum attention module obtains global information through matrix multiplication of original data and the convolved features, weight values among pixel points are redistributed, and finally the weight values are distributed to the pixel points through channel-wise dependency relationship conversion and feature fusion.
5. The hyperspectral image classification method based on the global attention residual error network according to claim 4, wherein the operation process of the spectral attention module in the step S3 is as follows:
firstly, the input HSI Cube is fed through convolution into a convolutional neural network, compressing the number of channels from C to 1; the spatial dimensions, height and width, are fused by Reshape and then interchanged with the spectral dimension to complete the transpose operation, i.e. the feature shape changes from (h, w, 1) to (1, h × w); next, the height and width of the original HSI Cube are merged, i.e. the feature shape changes from (h, w, c) to (h × w, c); the convolved feature of shape (1, h × w), after a Softmax activation operation, is matrix-multiplied with the transformed HSI Cube of shape (h × w, c) to obtain a feature map of shape (1, 1, c); at this point both spatial dimensions h and w are compressed to 1 and only the spectral dimension c remains;
the parameter count changes from C × C for the fully connected layers to 2 × C × C/r for the convolution network, where r is a scaling coefficient and C/r denotes the hidden feature dimension of the bottleneck transformation.
6. The hyperspectral image classification method based on global attention residual error network according to claim 1 is characterized in that in the step S3, the spatial attention module and the spectral attention module are combined to obtain a spatial-spectral joint global attention network mechanism, and the formula is as follows:
U_Spe_Spa = λ · U_Spe + (1 − λ) · U_Spa
where λ is a trainable parameter.
7. The hyperspectral image classification method based on global attention residual error network according to claim 1 is characterized in that the residual error network modules improved in the step S4 are divided into a starting residual error network, a middle residual error network and an ending residual error network.
8. The method for classifying hyperspectral images based on a global attention residual error network according to claim 6, wherein the learning process of the spatio-spectral joint global attention network comprises the following sub-steps:
b1: the space attention module is used for extracting features from shallow to deep through three times of convolution operation; adding BN and ReLU to the first convolution to form a first layer of feature extraction network;
b2: adding BN and ReLU to the second convolution to form a second layer of feature extraction network;
b3: performing convolution for the third time, wherein sigmoid is adopted by an activation function to perform weight distribution;
b4: adding pixel points of the feature graph obtained by the third convolution and the feature graph input into the space attention module in a pixel point level manner, and distributing weighted values to all space pixel points;
b5: for convolution of spectral modules, the input Cube is first convolved
Figure FDA0003011402980000021
Sending the signal into a convolutional neural network, and compressing the channel number from C to 1;
b6: and performing Reshape operation again, and exchanging the height and width in the spatial dimension with the spectral dimension to finish transposition operation:
Figure FDA0003011402980000031
B7:for the original
Figure FDA0003011402980000032
The height and the width are combined to become
Figure FDA0003011402980000033
B8: after the feature diagram Y ' after the convolution is subjected to Softmax operation, matrix multiplication operation is carried out on the feature diagram Y ' and the X ' to obtain the feature diagram with the shape of
Figure FDA0003011402980000034
Compressing the space dimensions h and w to be 1, and only remaining the spectrum dimension c to obtain a global attention weight value of the spectrum dimension;
b9: and finally, combining the space attention module and the spectrum attention module to obtain a weighted value of the pixel point space-spectrum combined global attention.
9. The hyperspectral image classification method based on global attention residual error network according to claim 7 is characterized in that the learning process of the improved residual error network module comprises the following sub-steps:
C1: the starting residual module first applies a convolution, then after BN and ReLU operations applies a second convolution, and after one further BN operation only is feature-connected directly with the input feature map;
C2: the middle residual module activates the sparsified network through BN and ReLU, then passes through two convolution layers, and after another BN layer is added to the input feature layer;
C3: the ending residual module likewise first sparsifies the network parameters with a BN layer, then passes through two convolution layers; unlike the middle residual module, the ending residual block performs the feature connection first and then the BN and ReLU operations to obtain the final output.
CN202110376903.XA 2021-04-08 2021-04-08 Hyperspectral image classification method based on global attention residual error network Active CN112836773B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110376903.XA CN112836773B (en) 2021-04-08 2021-04-08 Hyperspectral image classification method based on global attention residual error network


Publications (2)

Publication Number Publication Date
CN112836773A true CN112836773A (en) 2021-05-25
CN112836773B CN112836773B (en) 2022-09-02

Family

ID=75929771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110376903.XA Active CN112836773B (en) 2021-04-08 2021-04-08 Hyperspectral image classification method based on global attention residual error network

Country Status (1)

Country Link
CN (1) CN112836773B (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914907A (en) * 2020-07-13 2020-11-10 河海大学 Hyperspectral image classification method based on deep learning space-spectrum combined network
CN112580480A (en) * 2020-12-14 2021-03-30 河海大学 Hyperspectral remote sensing image classification method and device


Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362409A (en) * 2021-05-28 2021-09-07 北京百度网讯科技有限公司 Image coloring method and device, image coloring model training method and device, electronic equipment and storage medium
CN113362409B (en) * 2021-05-28 2023-10-31 北京百度网讯科技有限公司 Image coloring and model training method and device, electronic equipment and storage medium
CN113361485A (en) * 2021-07-08 2021-09-07 齐齐哈尔大学 Hyperspectral image classification method based on spectral space attention fusion and deformable convolution residual error network
CN113869347A (en) * 2021-07-20 2021-12-31 西安理工大学 Fine-grained classification method for severe weather image
CN113792757A (en) * 2021-08-18 2021-12-14 吉林大学 Oscillogram classification method based on multi-scale attention residual error network
CN113792757B (en) * 2021-08-18 2023-12-08 吉林大学 Waveform diagram classification method based on multi-scale attention residual error network
CN113705526B (en) * 2021-09-07 2022-03-04 安徽大学 Hyperspectral remote sensing image classification method
CN113705526A (en) * 2021-09-07 2021-11-26 安徽大学 Hyperspectral remote sensing image classification method
CN114418003A (en) * 2022-01-20 2022-04-29 北京科技大学 Double-image identification and classification method based on attention mechanism and multi-size information extraction
CN114418003B (en) * 2022-01-20 2022-09-16 北京科技大学 Double-image recognition and classification method based on attention mechanism and multi-size information extraction
CN114462596A (en) * 2022-02-10 2022-05-10 黑龙江省农业科学院 Disease and insect pest monitoring method and system for industrial hemp growth period
CN114758203A (en) * 2022-03-31 2022-07-15 长江三峡技术经济发展有限公司 Residual dense visual transformation method and system for hyperspectral image classification
CN114758203B (en) * 2022-03-31 2023-01-10 长江三峡技术经济发展有限公司 Residual intensive visual transformation method and system for hyperspectral image classification
CN114821164A (en) * 2022-04-13 2022-07-29 北京工业大学 Hyperspectral image classification method based on twin network
CN114821164B (en) * 2022-04-13 2024-06-14 北京工业大学 Hyperspectral image classification method based on twin network
CN114943859A (en) * 2022-05-05 2022-08-26 兰州理工大学 Task correlation metric learning method and device for small sample image classification
CN114943859B (en) * 2022-05-05 2023-06-20 兰州理工大学 Task related metric learning method and device for small sample image classification
CN114998725A (en) * 2022-05-17 2022-09-02 北京理工大学 Hyperspectral image classification method based on adaptive spatial spectrum attention kernel generation network
CN115205614A (en) * 2022-05-20 2022-10-18 钟家兴 Ore X-ray image identification method for intelligent manufacturing
CN115205614B (en) * 2022-05-20 2023-12-22 深圳市沃锐图像技术有限公司 Ore X-ray image identification method for intelligent manufacturing
CN114821449A (en) * 2022-06-27 2022-07-29 松立控股集团股份有限公司 License plate image processing method based on attention mechanism
CN116310572A (en) * 2023-03-23 2023-06-23 齐齐哈尔大学 Pyramid multi-scale convolution and self-attention combined hyperspectral image classification method
CN116310572B (en) * 2023-03-23 2024-01-23 齐齐哈尔大学 Pyramid multi-scale convolution and self-attention combined hyperspectral image classification method
CN116128735A (en) * 2023-04-17 2023-05-16 中国工程物理研究院电子工程研究所 Multispectral image demosaicing structure and method based on densely connected residual error network
CN117893816A (en) * 2024-01-18 2024-04-16 安徽大学 Hyperspectral image classification method of hierarchical residual spectrum space convolution network

Also Published As

Publication number Publication date
CN112836773B (en) 2022-09-02

Similar Documents

Publication Publication Date Title
CN112836773B (en) Hyperspectral image classification method based on global attention residual error network
Li et al. Deep feature fusion via two-stream convolutional neural network for hyperspectral image classification
Jiao et al. Deep fully convolutional network-based spatial distribution prediction for hyperspectral image classification
CN110363215B (en) Method for converting SAR image into optical image based on generating type countermeasure network
CN108182456B (en) Target detection model based on deep learning and training method thereof
Lu et al. 3-D channel and spatial attention based multiscale spatial–spectral residual network for hyperspectral image classification
Liu et al. A spectral grouping and attention-driven residual dense network for hyperspectral image super-resolution
CN108491849B (en) Hyperspectral image classification method based on three-dimensional dense connection convolution neural network
CN113486851B (en) Hyperspectral image classification method based on double-branch spectrum multi-scale attention network
CN113361485B (en) Hyperspectral image classification method based on spectrum space attention fusion and deformable convolution residual error network
CN112819910B (en) Hyperspectral image reconstruction method based on double-ghost attention mechanism network
CN110533077B (en) Shape-adaptive convolution deep neural network method for hyperspectral image classification
CN109035267B (en) Image target matting method based on deep learning
CN113673590B (en) Rain removing method, system and medium based on multi-scale hourglass dense connection network
CN111639719B (en) Footprint image retrieval method based on space-time motion and feature fusion
CN110020693B (en) Polarimetric SAR image classification method based on feature attention and feature improvement network
CN115222994A (en) Hyperspectral image classification method based on hybrid spectrum network and multi-head self-attention mechanism
CN111914909B (en) Hyperspectral change detection method based on space-spectrum combined three-direction convolution network
CN113920043A (en) Double-current remote sensing image fusion method based on residual channel attention mechanism
CN111160392A (en) Hyperspectral classification method based on wavelet width learning system
Ge et al. Adaptive hash attention and lower triangular network for hyperspectral image classification
CN114937202A (en) Double-current Swin transform remote sensing scene classification method
CN113673556A (en) Hyperspectral image classification method based on multi-scale dense convolution network
CN117058558A (en) Remote sensing image scene classification method based on evidence fusion multilayer depth convolution network
CN116843975A (en) Hyperspectral image classification method combined with spatial pyramid attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant