CN115019178A - Hyperspectral image classification method based on large kernel convolution attention - Google Patents

Hyperspectral image classification method based on large kernel convolution attention

Info

Publication number
CN115019178A
Authority
CN
China
Prior art keywords
convolution
image
spectrum
spatial
hyperspectral
Prior art date
Legal status
Pending
Application number
CN202210883826.1A
Other languages
Chinese (zh)
Inventor
孙根云 (Sun Genyun)
王凯 (Wang Kai)
陈勇 (Chen Yong)
董震 (Dong Zhen)
Current Assignee
Qingdao Xingke Ruisheng Information Technology Co ltd
Original Assignee
Qingdao Xingke Ruisheng Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Qingdao Xingke Ruisheng Information Technology Co., Ltd.
Priority: CN202210883826.1A
Publication: CN115019178A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/194 Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/58 Extraction of image or video features relating to hyperspectral data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A 40/10 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of hyperspectral remote sensing imagery, and in particular to a hyperspectral image classification method based on large-kernel convolution attention. The method comprises the steps of model parameter setting, hyperspectral image preprocessing and data-set construction, model pre-training, joint mining and processing of spatial-spectral features, and classification based on the joint spatial-spectral features.

Description

Hyperspectral image classification method based on large kernel convolution attention
Technical Field
The invention relates to the technical field of hyperspectral remote sensing imagery, and in particular to a hyperspectral image classification method based on large-kernel convolution attention.
Background
With the development of sensor technology, hyperspectral remote sensing images play an important role in fields such as environmental monitoring, land and resources survey and evaluation, and urban planning. In these applications, the classification accuracy of the remote sensing image directly determines how effectively the hyperspectral data can be used. Although the rich spectral features of hyperspectral images describe ground-cover information more accurately, land cover is complex and the phenomenon of spectral heterogeneity is severe, so high-accuracy image classification is difficult to achieve using spectral features alone.
The spatial features of hyperspectral images can strongly influence classification, so mining high-quality spatial and spectral features from hyperspectral images to achieve high-accuracy classification has become a key problem in hyperspectral image applications.
MAP, 2D-SSA and superpixel segmentation are common hyperspectral spatial feature extraction methods and have achieved good results in hyperspectral image classification. Because hyperspectral spatial and spectral features complement each other, algorithms for joint spatial-spectral processing have developed continuously. Algorithms that combine spatial and spectral information mainly take the neighbouring-pixel information extracted by spatial operators such as Gabor filters and Markov Random Fields (MRFs) as additional feature channels and then classify with a spectral feature processing method. This mode of handling spectral and spatial information separately exploits hyperspectral spatial information but does not conform to the integrated spatial-spectral nature of hyperspectral data. Later, to solve this problem, researchers proposed 3D hyperspectral processing methods such as 3D scattering wavelets and 3D Gabor filters, effectively improving the mining of spatial-spectral information. However, these conventional methods extract only shallow features, and it is difficult for them to mine and exploit the deep features inherent in hyperspectral data.
Deep Convolutional Neural Networks (CNNs) can exploit their local connectivity to elegantly integrate spectral features with spatial context from HSI data while mining deep hyperspectral features. Many researchers use CNNs to mine the spatial-spectral information of HSIs; the networks fall into two structures, double-branch and single-branch. Double-branch networks improve the efficiency of spatial and spectral feature mining through separate sub-networks, but this does not conform to the unified spatial-spectral nature of hyperspectral data, so they inherently ignore the spatial-spectral correlation information of the HSI. Single-branch networks take a hyperspectral image patch as input and can effectively mine spatial-spectral correlation information. However, whether single- or double-branch, these networks treat all input features equally, making it difficult to extract the key spatial and spectral features effectively.
Inspired by the human visual system, researchers in computer vision developed the attention mechanism. By weighting features, it prompts the network to focus on key information and suppress noise. Because of this property, the attention mechanism has also gained wide adoption in the hyperspectral field. However, existing attention networks for hyperspectral data are merely different combinations of spectral attention and spatial attention, and still do not conform to the unified spatial-spectral nature of hyperspectral images.
Therefore, a hyperspectral image classification method based on large-kernel convolution attention is needed: one that effectively exploits hyperspectral spatial-spectral correlation features and fills this research gap. By constructing a spatial-spectral weight map through large-kernel convolution attention and weighting the three-dimensional hyperspectral image patch, the integrated spatial-spectral structure of the image is effectively protected, and high-quality spatial-spectral correlation features are finally obtained to yield a high-accuracy hyperspectral classification result.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a hyperspectral image classification method based on large-kernel convolution attention. It effectively exploits hyperspectral spatial-spectral correlation features and fills this research gap: a spatial-spectral weight map is constructed through large-kernel convolution attention and used to weight the three-dimensional hyperspectral image patch, which effectively protects the integrated spatial-spectral structure of the image and finally yields high-quality spatial-spectral correlation features and a high-accuracy hyperspectral classification result.
In order to achieve the above object, the present invention provides a hyperspectral image classification method based on large kernel convolution attention, which comprises the following steps:
S1, setting the model parameters:
S1-1: select the original hyperspectral image (HSI) to be classified, determine its total number of bands C and total number of classes S, and determine the image patch size img_size and the number of training samples;
S1-2: set the size of the image input to the spatial-spectral attention feature extractor according to the data determined in S1-1; determine the number D of VAN processing modules, the number of times n_i (1 ≤ i ≤ D) that each VAN module applies Large Kernel Attention (LKA) and the Convolutional Feed-Forward block (CFF), and the number of image patch channels c_i processed by each module;
S1-3: determine the network learning rate, the number of optimization iterations, and the model optimizer (Adam);
S1-4: set the number of elements of the one-dimensional vector output by the feature classifier according to the total number of classes determined in S1-1;
S2, hyperspectral image preprocessing and data-set construction:
S2-1: normalize the hyperspectral image, scaling the image radiance values to the range 0-1;
S2-2: randomly select training sample points according to the training sample proportion set in S1-1, then cut out an image patch of size img_size centred on each sample point; the remaining samples serve as test samples for evaluating model accuracy;
S2-3: to improve model robustness, expand the samples by mirroring, rotation, and adding salt-and-pepper noise;
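The preprocessing and augmentation steps S2-1 to S2-3 can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation: the function names are this sketch's own, and border handling for patches near the image edge is omitted.

```python
import numpy as np

def normalize(cube):
    # S2-1: min-max normalize a hyperspectral cube (H, W, C) to [0, 1]
    lo, hi = cube.min(), cube.max()
    return (cube - lo) / (hi - lo)

def extract_patch(cube, row, col, img_size):
    # S2-2: cut an img_size x img_size patch centred on a sample pixel.
    # Assumes the centre is far enough from the border (padding omitted).
    r = img_size // 2
    return cube[row - r:row + r + 1, col - r:col + r + 1, :]

def salt_and_pepper(patch, amount=0.05, seed=0):
    # S2-3: augmentation by forcing a random fraction of pixels
    # to 0 (pepper) or 1 (salt) across all bands
    rng = np.random.default_rng(seed)
    noisy = patch.copy()
    h, w, _ = noisy.shape
    n = int(amount * h * w)
    rows = rng.integers(0, h, n)
    cols = rng.integers(0, w, n)
    noisy[rows, cols, :] = rng.integers(0, 2, size=(n, 1)).astype(noisy.dtype)
    return noisy
```

Mirroring and rotation, the other two augmentations named in S2-3, correspond to `np.flip` and `np.rot90` applied to the spatial axes of the patch.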
S3, model pre-training:
S3-1: construct the network model based on the parameters set in S1-2;
S3-2: train the model with the parameters determined in S1-3 and, after training, save the model parameters corresponding to the best result as the model pre-training weights;
S4, jointly mining and processing the spatial-spectral features:
S4-1: construct the network model based on the parameters set in S1-2 and load the model pre-training weights;
S4-2: process the image patch with a 2D convolution that keeps the spatial size of the hyperspectral patch unchanged and changes the number of channels to c_i;
S4-3: jointly mine the spatial and spectral information of the image patch using LKA;
S4-4: fuse the spatial- and spectral-dimension information of the image patch using CFF;
S4-5: return to S4-3 until the number of iterations equals n_i;
S4-6: normalize the image data with layer normalization to suppress overfitting caused by the small data scale after processing;
S4-7: return to S4-2 until the number of iterations equals the number D of VAN processing modules, then output the joint spatial-spectral features;
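The layer normalization of S4-6 can be sketched in NumPy as below. This is an illustrative sketch under the usual definition of layer normalization; the learnable scale and shift parameters a trained network would carry are omitted.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # S4-6: normalize each spatial position of a (H, W, C) feature map
    # to zero mean and unit variance across its channels
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)
```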
S5, classification based on the joint spatial-spectral features:
S5-1: compress the spatial-spectral correlation features with global average pooling and output a one-dimensional vector T;
S5-2: transform the size of T with an MLP to obtain a one-dimensional vector class; the index of the largest element of class is the land-cover type of the centre pixel of the image patch.
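The classifier head of S5 can be sketched as follows. For brevity this assumed sketch reduces the MLP of S5-2 to a single linear layer; the pooling, projection, and argmax steps match the text above.

```python
import numpy as np

def classify(features, weight, bias):
    # S5-1: global average pooling compresses (H, W, C) features
    # into a one-dimensional C-vector T
    t = features.mean(axis=(0, 1))
    # S5-2: project T to S class scores (weight has shape (S, C));
    # a full MLP would stack several such layers with nonlinearities
    scores = weight @ t + bias
    # the index of the largest score is the predicted land-cover
    # class of the patch's centre pixel
    return int(np.argmax(scores)), scores
```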
S4-3 specifically comprises:
S4-3-1: denote the feature map obtained in step S4-2 as A ∈ R^(w×w×c_i), where w×w is the spatial size of the feature map and c_i its number of channels;
S4-3-2: preprocess the feature map with a 1×1 convolution and the GELU activation function to obtain feature map B; the process can be expressed as B = GELU(Conv_1×1(A));
S4-3-3: feed B into the LKA to complete the spatial-spectral attention operation; the large-kernel convolution is realized step by step with a depthwise convolution, a depthwise dilated convolution, and a pointwise (1×1) convolution, yielding the three-dimensional spatial-spectral weight map W_spe-spa = Conv_1×1(DW_d(DW(B))); each weight in W_spe-spa is independent of, yet related to, all pixels within the large-kernel receptive field;
S4-3-4: multiply B and W_spe-spa element-wise to obtain the weighted feature map C; the process can be expressed as C = B ⊗ W_spe-spa, where ⊗ denotes element-wise multiplication, which reinforces the spectral blocks that provide more information while suppressing unnecessary ones;
S4-3-5: to mine the spectral and spatial information further, apply a 1×1 convolution to complete the weighted fusion of the spatial and spectral information;
S4-3-6: apply a residual connection to obtain the LKA output D: D = A + Conv_1×1(C).
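Steps S4-3-1 to S4-3-6 can be sketched in NumPy as below. The kernel sizes are assumptions (a 5×5 depthwise kernel followed by a 7×7 depthwise kernel with dilation 3, the common VAN-style decomposition of a large kernel); the patent specifies the decomposition D = A + Conv_1×1(B ⊗ Conv_1×1(DW_d(DW(B)))) but not the sizes.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def conv1x1(x, w):
    # pointwise convolution: (H, W, C_in) x (C_in, C_out) -> (H, W, C_out)
    return np.tensordot(x, w, axes=([2], [0]))

def depthwise_conv(x, k, dilation=1):
    # per-channel 2D convolution with 'same' zero padding;
    # x: (H, W, C), k: (kh, kw, C) -- one kernel per channel
    kh, kw, _ = k.shape
    ph, pw = dilation * (kh // 2), dilation * (kw // 2)
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    h, w, _ = x.shape
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i * dilation:i * dilation + h,
                                j * dilation:j * dilation + w, :]
    return out

def lka_block(a, w_pre, k_dw, k_dwd, w_att, w_out, dilation=3):
    # S4-3-2: B = GELU(Conv_1x1(A))
    b = gelu(conv1x1(a, w_pre))
    # S4-3-3: W_spe-spa = Conv_1x1(DW_d(DW(B)))
    w_spe_spa = conv1x1(depthwise_conv(depthwise_conv(b, k_dw),
                                       k_dwd, dilation), w_att)
    # S4-3-4: element-wise spatial-spectral weighting, C = B (x) W_spe-spa
    c = b * w_spe_spa
    # S4-3-5 / S4-3-6: 1x1 fusion plus residual connection, D = A + Conv_1x1(C)
    return a + conv1x1(c, w_out)
```

In a trained network the weight tensors would be learned; here they are just function arguments so the data flow of the formulas can be checked.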
S4-4 specifically comprises:
Unlike the attention-weighted spatial-spectral feature enhancement of LKA, CFF adaptively mines salient spatial and spectral features with a combination of convolutions:
S4-4-1: expand the channels of feature map D with a 1×1 convolution;
S4-4-2: enhance the channel features with a depthwise convolution and process the introduced nonlinearity with the GELU activation function;
S4-4-3: finally, restore the channel count and fuse the spatial and spectral features with a 1×1 convolution; the process can be expressed as E = Conv_1×1(GELU(DW(Conv_1×1(D)))), where E ∈ R^(w×w×c_i), Conv_1×1(·) is a 2D convolution with kernel size 1, and DW(·) is a depthwise convolution with kernel size 3×3, padding 1, and c_i groups.
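The CFF formula E = Conv_1×1(GELU(DW(Conv_1×1(D)))) can be sketched in NumPy as below (the helper functions are repeated so the sketch is self-contained; the expansion ratio is an assumption, as the patent does not state it):

```python
import numpy as np

def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def conv1x1(x, w):
    # (H, W, C_in) x (C_in, C_out) -> (H, W, C_out)
    return np.tensordot(x, w, axes=([2], [0]))

def depthwise3x3(x, k):
    # per-channel 3x3 convolution with padding 1 (groups = channel count),
    # matching DW(.) as defined in S4-4-3; k: (3, 3, C)
    h, w, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * xp[i:i + h, j:j + w, :]
    return out

def cff_block(d, w_expand, k_dw, w_reduce):
    x = conv1x1(d, w_expand)          # S4-4-1: channel expansion
    x = gelu(depthwise3x3(x, k_dw))   # S4-4-2: depthwise 3x3 + GELU
    return conv1x1(x, w_reduce)       # S4-4-3: restore channels, fuse features
```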
Compared with the prior art, the invention uses a large-kernel-convolution attention network that conforms to the unified spatial-spectral nature of hyperspectral images: the complete hyperspectral image patch is used as input, spatial and spectral features are jointly mined and extracted, and a simple yet effective feature classifier fully exploits the extracted features. This solves problems such as model overfitting and low classification accuracy caused by insufficient mining of hyperspectral remote sensing image features.
Drawings
Fig. 1 is a diagram of the overall structure of the invention.
Fig. 2 is a schematic diagram of a flow of jointly mining HSI spatial and spectral information by LKA according to the present invention.
FIG. 3 is a schematic diagram of CFF fusion and mining of HSI spatial and spectral information in accordance with the present invention.
FIG. 4 is a diagram illustrating the number of training sets and validation sets for each category according to an embodiment of the present invention.
FIG. 5 is a diagram comparing the classification performance of an embodiment of the present invention with mainstream algorithms.
Detailed Description
The invention will now be further described with reference to the accompanying drawings.
As shown in figs. 1 to 5, the invention provides a hyperspectral image classification method based on large-kernel convolution attention that follows steps S1 to S5 exactly as set out in the disclosure above; the concrete embodiment below illustrates the parameter values used.
Embodiment:
the method is suitable for processing the hyperspectral remote sensing images of Indian Pine areas acquired by an AVIRIS (aircraft Visible Imaging spectrometer) sensor. Raw HSI data the database contains 145 x 145 pixels with a resolution of 20m per pixel and a wavelength range of 400-2500nm, containing 220 spectral bands. After removing 20 noise and water absorption bands from the entire data set, the spectral bands were reduced to 200, which became a 145 × 145 × 200 data cube. The surface feature types include 16 identifiable surface features (mainly different crops). The data set contains 10366 tags in total, with the remaining unlabeled portions being considered as background.
As shown in fig. 1, 2 and 3, the present invention provides a hyperspectral image classification method based on large kernel convolution attention, which includes the following steps:
step 1, setting model parameters:
1) Select the original hyperspectral image (HSI) to be classified; determine the total number of bands C = 200 and the total number of classes S = 16; set the image patch size img_size to 13 × 13; the numbers of training and test samples are shown in fig. 4.
2) Set the size of the image input to the spatial-spectral attention feature extractor according to the data determined in 1); set the number of VAN processing modules D = 4; set the number of LKA/CFF repetitions in each VAN module to
n_1 = 3, n_2 = 4, n_3 = 6, n_4 = 3,
and the number of image patch channels processed by each module to
c_1 = 64, c_2 = 128, c_3 = 256, c_4 = 512.
3) Set the network learning rate to 0.005, the number of optimization iterations to 50, and the model optimizer to Adam;
4) Set the number S of elements of the one-dimensional vector output by the feature classifier to 16, according to the number of classes determined in 1).
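The parameter values of this step can be collected into a single configuration, sketched below; the key names are this sketch's own, not terms from the patent.

```python
# Embodiment hyperparameters from Step 1 (key names are illustrative)
config = {
    "bands_C": 200,
    "classes_S": 16,
    "img_size": 13,
    "van_modules_D": 4,
    "lka_cff_repeats": [3, 4, 6, 3],   # n_1 .. n_4
    "channels": [64, 128, 256, 512],   # c_1 .. c_4
    "learning_rate": 0.005,
    "iterations": 50,
    "optimizer": "Adam",
}

# the per-module lists must match the module count D
assert len(config["lka_cff_repeats"]) == config["van_modules_D"]
assert len(config["channels"]) == config["van_modules_D"]
```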
Step 2, hyperspectral image preprocessing and data-set construction:
1) Normalize the hyperspectral image, scaling the image radiance values to 0-1;
2) Randomly select training sample points according to the training sample proportion set in step 1, then cut out an image patch of size img_size centred on each sample point; the remaining samples serve as test samples for evaluating model accuracy;
3) To improve model robustness, expand the samples by mirroring, rotation, adding salt-and-pepper noise, and similar methods.
Step 3, model pre-training:
1) Construct the network model based on the parameters set in step 1, item 2);
2) Train the model with the parameters determined in step 1, item 3), and after training save the model parameters corresponding to the best result as the model pre-training weights.
Step 4, jointly mining and processing spatial spectral characteristics:
1) Construct the network model based on the parameters set in step 1, item 2), and load the model pre-training weights;
2) Process the image patch with a 2D convolution that keeps the spatial size of the hyperspectral patch unchanged and changes the number of channels to c_i, obtaining a feature map A ∈ R^(w×w×c_i), where w × w = 13 × 13 is the spatial size of the feature map and c_i its number of channels.
3) Jointly mine the spatial and spectral information of the image patch with LKA, as follows:
First, preprocess the feature map with a 1×1 convolution and the GELU activation function to obtain feature map B:
B = GELU(Conv_1×1(A))
B is then fed into the LKA to complete the spatial-spectral attention. Because the large convolution kernel has many parameters, the large-kernel convolution is realized step by step with a depthwise convolution, a depthwise dilated convolution, and a pointwise (1×1) convolution, yielding the three-dimensional spatial-spectral weight map:
W_spe-spa = Conv_1×1(DW_d(DW(B)))
Each weight in W_spe-spa is independent of, but related to, all pixels in the large-kernel receptive field. B and W_spe-spa are then multiplied element-wise to obtain the weighted feature map C:
C = B ⊗ W_spe-spa
where ⊗ denotes element-wise multiplication; C focuses on spectral blocks that provide more information while suppressing unnecessary ones. To mine the spectral and spatial information further, a 1×1 convolution completes the weighted fusion of the spatial and spectral information. Finally, a residual connection gives the LKA output D:
D = A + Conv_1×1(C)
4) Fuse the spatial- and spectral-dimension information of the image patch with CFF, as follows:
First expand the channels of feature map D with a 1×1 convolution, then enhance the channel features with a depthwise convolution and process the introduced nonlinearity with the GELU activation function, and finally restore the channel count and fuse the spatial and spectral features with a 1×1 convolution:
E = Conv_1×1(GELU(DW(Conv_1×1(D))))
where E ∈ R^(w×w×c_i), Conv_1×1(·) is a 2D convolution with kernel size 1, and DW(·) is a depthwise convolution with kernel size 3×3, padding 1, and c_i groups.
5) Return to step 3) until the number of iterations equals n_i;
6) Normalize the image data with layer normalization to suppress overfitting caused by the small data scale after processing;
7) Return to step 2) until the number of iterations equals the number D of VAN processing modules, and output the joint spatial-spectral features. At this point the features are considered to characterize the deep spatial-spectral properties and to support excellent classification results.
And 5, classification based on the spatial spectrum joint characteristics:
1) performing information compression on the space spectrum correlation characteristics by utilizing global average pooling, and outputting a one-dimensional vector T;
2) transforming the size of T with an MLP to obtain a one-dimensional vector class; the index of the largest element of class is the ground-object category of the central pixel of the image patch.
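The classification head above can be sketched as follows; the single ReLU hidden layer is an assumption, since the text specifies only an MLP:

```python
import numpy as np

def classify(feat, w1, b1, w2, b2):
    # global average pooling compresses the (h, w, c) joint features to a vector T
    t = feat.mean(axis=(0, 1))
    # a small MLP head maps T to class scores (hidden-layer ReLU is an assumption)
    hidden = np.maximum(t @ w1 + b1, 0.0)
    logits = hidden @ w2 + b2
    # the index of the largest element is the predicted ground-object class
    return int(np.argmax(logits))
```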
To verify its performance, the proposed method is quantitatively compared with mainstream hyperspectral classification algorithms. Fig. 5 reports the per-class accuracy, overall accuracy (OA), average accuracy (AA), and kappa coefficient of the proposed method and four hyperspectral algorithms: 3DCNN, SSRN (Spectral-Spatial Residual Network), DBMA (Double-Branch Multi-Attention Mechanism Network), and DBDA (Double-Branch Dual-Attention Mechanism Network). The proposed classification algorithm performs best.
The above is only a preferred embodiment of the present invention, intended to help in understanding the method and its core idea; the protection scope of the invention is not limited to the above embodiment, and all technical solutions embodying the idea of the invention fall within its protection scope. It should be noted that modifications and embellishments made by those skilled in the art without departing from the principle of the invention are also considered to be within its protection scope.
Overall, the invention addresses the model overfitting and low classification accuracy caused in the prior art by insufficient mining of hyperspectral remote sensing image features and by the mismatch between existing attention networks and the joint spatial-spectral structure of hyperspectral images. It constructs a spatial-spectral weight map through large-kernel convolution attention and weights the three-dimensional hyperspectral image block accordingly, thereby preserving the joint spatial-spectral structure of the image, effectively exploiting the hyperspectral spatial-spectral correlation, and finally obtaining high-quality spatial-spectral joint features and a high-accuracy hyperspectral classification result.

Claims (3)

1. A hyperspectral image classification method based on large kernel convolution attention is characterized by comprising the following steps:
s1, setting model parameters:
S1-1: selecting the original hyperspectral images (HSIs) to be classified, determining the total number of bands C and the total number of classes S of the HSIs, and determining the image patch size img_size and the number of training samples;
S1-2: setting the size of the image input to the spectral-spatial attention feature extractor according to the data determined in S1-1, determining the number D of VAN processing modules, the number n_i of times each VAN processing module processes the image with Large Kernel Attention (LKA) and the convolutional feed-forward (CFF) module, where 1 ≤ i ≤ D, and the number c_i of image-block channels processed by each processing module;
S1-3: determining a network learning rate, an optimization iteration number and a model optimizer Adam;
s1-4: setting the number of one-dimensional vector elements output by the feature classifier according to the determined category total number in the S1-1;
s2, hyperspectral image preprocessing and data set production:
s2-1: normalizing the hyperspectral image, and adjusting the image radiation value to 0-1;
S2-2: randomly selecting training sample points based on the training-sample proportion set in S1-1, then segmenting the image into patches centered on the sample points with img_size as the window size, and taking the remaining samples as test samples for evaluating model accuracy;
S2-3: in order to improve model stability, expanding the samples by mirroring, rotation, and the addition of salt-and-pepper noise;
s3, model pre-training:
s3-1: constructing a network model based on the parameters set by the S1-2;
s3-2: performing model training based on the parameters determined in the step S1-3, and storing model parameters corresponding to the optimal results as model pre-training weights after the training is completed;
s4, jointly mining and processing spatial spectral features:
s4-1: constructing a network model based on the parameters set in the S1-2, and loading model pre-training weights;
S4-2: processing the image patch with a 2D convolution that keeps the spatial size of the hyperspectral patch unchanged and changes the number of channels to c_i;
S4-3: jointly mining the image patch space and spectrum information by using LKA;
S4-4: fusing the spatial- and spectral-dimension information of the image patch using the CFF;
S4-5: returning to S4-3 until the number of iterations equals n_i;
S4-6: normalizing the data with layer normalization to suppress the overfitting caused by the small scale of the processed image data;
S4-7: returning to S4-2 until the number of iterations equals the number D of VAN processing modules, and outputting the spatial-spectral joint features;
s5, classification based on space spectrum joint features:
S5-1: compressing the spatial-spectral joint features with global average pooling and outputting a one-dimensional vector T;
S5-2: transforming the size of T with an MLP to obtain a one-dimensional vector class; the index of the largest element of class is the ground-object category of the central pixel of the image patch.
2. The hyperspectral image classification method based on large-kernel convolution attention according to claim 1, wherein the S4-3 is specifically:
S4-3-1: denoting the feature map obtained in S4-2 as A ∈ ℝ^(w×w×c_i), where w × w represents the spatial size of the feature map and c_i the number of channels;
S4-3-2: preprocessing the feature map with a 1 × 1 convolution and the GELU activation function to obtain a feature map B, which can be expressed as: B = GELU(Conv1×1(A));
S4-3-3: feeding B into the LKA to perform the spatial-spectral attention operation, realizing the large-kernel convolution stepwise with a depthwise convolution, a depthwise dilated convolution, and a pointwise (1 × 1) convolution to obtain a three-dimensional spatial-spectral weight map W_spe-spa, which can be represented as: W_spe-spa = Conv1×1(DW_d(DW(B))); each weight in W_spe-spa exists independently and is related to all pixels within the receptive field of the large convolution kernel;
S4-3-4: multiplying B and W_spe-spa element-wise to obtain the weighted feature map C, which can be expressed as: C = B ⊗ W_spe-spa, where ⊗ denotes element-wise multiplication; this reinforces spectral blocks that provide more information while suppressing unnecessary ones;
S4-3-5: in order to further mine the spectral and spatial information, performing a 1 × 1 convolution to complete the weighted fusion of the spatial and spectral information;
S4-3-6: performing a residual connection to obtain the LKA output D: D = A + Conv1×1(C).
3. The hyperspectral image classification method based on large-kernel convolution attention according to claim 1, wherein the S4-4 is specifically:
unlike the attention-weighted spatial-spectral feature enhancement of the LKA, the CFF adaptively mines salient spatial and spectral features through a combination of convolutions;
s4-4-1: performing channel expansion on the feature map D by using 1 × 1 convolution;
S4-4-2: enhancing the per-channel features with a depthwise convolution and processing the introduced nonlinear features with a GELU activation function;
S4-4-3: finally, restoring the channel count and fusing the spatial and spectral features with a 1 × 1 convolution, which can be expressed as:
E = Conv1×1(GELU(DW(Conv1×1(D))))
where E ∈ ℝ^(w×w×c_i), Conv1×1(·) is a 2D convolution with kernel size 1, and DW(·) is a depthwise convolution with kernel size 3 × 3, padding 1, and c_i groups.
CN202210883826.1A 2022-07-26 2022-07-26 Hyperspectral image classification method based on large kernel convolution attention Pending CN115019178A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210883826.1A CN115019178A (en) 2022-07-26 2022-07-26 Hyperspectral image classification method based on large kernel convolution attention


Publications (1)

Publication Number Publication Date
CN115019178A true CN115019178A (en) 2022-09-06

Family

ID=83081368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210883826.1A Pending CN115019178A (en) 2022-07-26 2022-07-26 Hyperspectral image classification method based on large kernel convolution attention

Country Status (1)

Country Link
CN (1) CN115019178A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721243A (en) * 2023-08-11 2023-09-08 自然资源部第一海洋研究所 Deep learning atmosphere correction method and system based on spatial spectrum feature constraint
CN116721243B (en) * 2023-08-11 2023-11-28 自然资源部第一海洋研究所 Deep learning atmosphere correction method and system based on spatial spectrum feature constraint
CN117934473A (en) * 2024-03-22 2024-04-26 成都信息工程大学 Highway tunnel apparent crack detection method based on deep learning
CN117934473B (en) * 2024-03-22 2024-05-28 成都信息工程大学 Highway tunnel apparent crack detection method based on deep learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination