CN116977723A - Hyperspectral image classification method based on space-spectrum hybrid self-attention mechanism - Google Patents
- Publication number: CN116977723A (application CN202310902900.4A)
- Authority: CN (China)
- Prior art keywords: hyperspectral, layer, attention, self, block
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/764 — Image or video recognition using pattern recognition or machine learning, using classification, e.g. of video objects
- G06N3/045 — Neural network architectures: combinations of networks
- G06N3/0464 — Neural network architectures: convolutional networks [CNN, ConvNet]
- G06N3/08 — Neural networks: learning methods
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections
- G06V10/58 — Extraction of image or video features relating to hyperspectral data
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82 — Image or video recognition using neural networks
- G06V20/194 — Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
Abstract
The invention discloses a hyperspectral image classification method based on a spatial-spectral hybrid self-attention mechanism, comprising the following steps. Determining a dataset: selecting several publicly available hyperspectral image datasets. Preprocessing hyperspectral data: extracting sample blocks from the hyperspectral dataset and dividing them into a training set, a verification set and a test set. Network construction: building a hyperspectral image classification network based on the spatial-spectral hybrid self-attention mechanism. Network training: feeding hyperspectral training samples into the constructed network in batches, and verifying the current classification performance of the network on the hyperspectral verification sample set after each batch of training is completed. Sample classification: inputting the hyperspectral test samples into the trained hyperspectral image classification network based on the spatial-spectral hybrid self-attention mechanism to obtain the classification result. The method makes full use of the inherent characteristics of hyperspectral images, classifies them with high accuracy, and can be used in the field of detecting ground-object types in hyperspectral images.
Description
Technical Field
The invention relates to a hyperspectral image classification method based on a spatial-spectral hybrid self-attention mechanism, in particular to remote sensing technology, and belongs to the technical field of image processing.
Background
A hyperspectral image is data obtained by imaging the ground with a hyperspectral imager carried on a satellite or other space-based platform. Unlike the three-channel images common in daily life, a hyperspectral image provides far richer spectral information, making up for the limited spectral resolution of conventional image data. Hyperspectral remote sensing is at the leading edge of current remote sensing technology and has great application potential; hyperspectral images are widely applied, with unique advantages, in fields such as agriculture, geological exploration, environmental monitoring and medical imaging. The goal of hyperspectral image classification is to assign the pixels in a hyperspectral image to different ground-object categories. It is a challenging task, because hyperspectral image data is high-dimensional, large in volume and complex in feature space, and conventional image classification algorithms often have difficulty processing such data.
Traditional hyperspectral image classification methods generally rely on manual feature extraction followed by classifier training. This approach requires designing feature extraction algorithms based on expert prior knowledge, then using a classifier to classify the extracted features. However, it suffers from several problems. First, manual feature extraction algorithms require expertise and experience, and the extracted features often cannot adequately reflect the essential characteristics of the image. Second, the large amount of noise and redundant information contained in hyperspectral images may degrade the classification effect. Finally, interference from human factors and insufficient feature extraction greatly limit the accuracy of the classifier.
In recent years, deep learning-based methods have made remarkable progress in hyperspectral image classification. Deep learning models can automatically learn feature representations of an image, avoiding the problems of manual feature extraction algorithms, and therefore have broad application prospects in hyperspectral image classification. Among deep learning models, the convolutional neural network (CNN) is widely used: it can automatically extract features from an image with good performance and computational efficiency. However, owing to the particular nature of hyperspectral image data, conventional 2D CNN and 3D CNN methods have certain limitations in classification effect and computational efficiency. A 2D CNN can only extract the spatial features of the image and cannot fully exploit the spectral information; a 3D CNN can extract spatial and spectral features simultaneously, but because of the high dimensionality of hyperspectral image data it is too computationally expensive. It is therefore of great importance to study a hyperspectral image classification algorithm that is computationally efficient and makes full use of both spatial and spectral information.
Attention mechanisms play an important role in deep learning. An attention mechanism weights and distributes attention over the input data so that the model can better focus on important information, improving its performance. Attention not only improves model accuracy but also helps in understanding the model's decision process, enhancing interpretability. Self-attention is a variant of the attention mechanism that operates on the feature vector of each position in the input sequence. When computing the weight indicating the importance of a position, the self-attention mechanism takes a weighted average over the feature vectors of all positions in the sequence; it is therefore better at quickly capturing the correlations inside a sequence, improving model performance.
Based on the background, the invention provides a simple and effective hyperspectral image classification method for realizing rapid and accurate hyperspectral pixel classification.
Disclosure of Invention
The invention provides a hyperspectral image classification method based on a spatial-spectral hybrid self-attention mechanism, which can fully exploit the respective advantages of self-attention and convolutional neural networks, effectively capture long- and short-range information in hyperspectral images and fully fuse it, realizing rapid and accurate hyperspectral image classification.
The technical scheme of the invention is as follows: a hyperspectral image classification method based on a space-spectrum hybrid self-attention mechanism comprises the following specific steps:
step1, preparing several publicly available hyperspectral image datasets for network training;
step2, preprocessing the hyperspectral image: extracting a hyperspectral image block centered on each pixel, and dividing the hyperspectral image blocks into mutually disjoint hyperspectral training, verification and test sample sets;
step3, constructing a hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism, wherein the whole network consists of an attention main branch and two hyperspectral local information extraction blocks; the attention main branch comprises four hyperspectral channel attention modules and four space-spectrum mixed self-attention modules; finally, connecting the hyperspectral local information extraction block to a main branch;
step4, training a hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism by using a hyperspectral training sample set, verifying the trained network by using a hyperspectral verification sample set after each batch of training is completed, and checking the state and convergence condition of the current method;
step5, inputting the hyperspectral test sample set into a trained hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism to obtain class labels of each pixel in the test sample, and completing hyperspectral image classification.
As a further scheme of the invention, the Step2 comprises the following specific steps:
step2.1, a hyperspectral image has three-dimensional properties, and its data are expressed as S ∈ R^(H×W×C); pixels of size n with pixel value 0 are filled in around the peripheral edges of the original hyperspectral image, and hyperspectral image blocks are extracted from the padded image;
step2.2, classifying the hyperspectral image block into a class set to which the image block belongs according to the class of the central pixel of the hyperspectral image block;
step2.3, from each category, selecting hyperspectral image blocks as the training set in proportions that differ with dataset size, then selecting hyperspectral image blocks in the same proportion as the verification set, and finally taking the remaining hyperspectral image blocks in each category set as the test set.
As a further scheme of the invention, the specific steps of Step3 are as follows:
step3.1, constructing a hyperspectral local information extraction block formed by connecting two-dimensional convolution layers, three-dimensional convolution layers, normalization layers and ReLU activation function layers in series;
step3.2, constructing a main attention branch consisting of 4 hyperspectral channel attention modules and 4 spatial-spectral hybrid self-attention modules.
As a further scheme of the invention, the specific steps of the step3.2 are as follows:
step3.2.1, constructing a hyperspectral channel attention module consisting of one compression-excitation (squeeze-and-excitation) layer and one two-dimensional convolution layer;
step3.2.2, building a spatial-spectral hybrid self-attention module consisting of 1 spatial self-attention block and 1 spectral self-attention block, wherein the spatial self-attention block consists of 2 linear normalization layers, 1 spatial self-attention layer and 1 multi-layer perception layer, and the spectral self-attention block consists of 2 linear normalization layers, 1 spectral self-attention layer and 1 multi-layer perception layer.
As a further aspect of the present invention, in step3.1, the hyperspectral local information extraction block includes 2 two-dimensional convolution blocks and 3 three-dimensional convolution blocks, and the convolution block structure is fixed as: convolution layer → normalization layer → activation function layer; the structure of the hyperspectral local information extraction block is: first two-dimensional convolution block → first three-dimensional convolution block → second three-dimensional convolution block → third three-dimensional convolution block → second two-dimensional convolution block;
the convolution kernel sizes of the two-dimensional convolution layers are set to 1×1, the convolution kernel sizes of the first and third three-dimensional convolution layers are set to 1×1×3, the convolution kernel size of the second three-dimensional convolution layer is set to 1×1×7, and the activation function of each activation layer is set to the ReLU activation function.
As a further aspect of the present invention, in the step3.2, the main attention path is composed of 4 hyperspectral channel attention modules and 4 spatial-spectral hybrid self-attention modules, and the structure is fixed as follows: first hyperspectral channel attention module→first spatial-spectral mixed self-attention module→second hyperspectral channel attention module→second spatial-spectral mixed self-attention module→third hyperspectral channel attention module→third spatial-spectral mixed self-attention module→fourth hyperspectral channel attention module→fourth spatial-spectral mixed self-attention module;
the hyperspectral channel attention module consists of one compression-excitation (squeeze-and-excitation) layer and one two-dimensional convolution layer, with the structure fixed as: two-dimensional convolution layer → compression-excitation layer; the convolution kernel size of the two-dimensional convolution layer is set to 1×1;
the spatial-spectral hybrid self-attention module consists of 1 spatial self-attention block and 1 spectral self-attention block, and the structure is fixed as follows: spectral self-attention block→spatial self-attention block; wherein:
the spatial self-attention block is formed by connecting 2 linear normalization layers, 1 spatial self-attention layer and 1 multi-layer perception layer in series, with the specific structure: first linear normalization layer → spatial self-attention layer → second linear normalization layer → multi-layer perception layer; the input feature map of the spatial self-attention block is added element-wise, through a skip connection, to the output feature map of the spatial self-attention layer, and that sum is added element-wise, through a skip connection, to the output feature map of the multi-layer perception layer;
the spectral self-attention block consists of 2 linear normalization layers, 1 spectral self-attention layer and 1 multi-layer perception layer, connected in the same way as the spatial self-attention block, with the specific structure: first linear normalization layer → spectral self-attention layer → second linear normalization layer → multi-layer perception layer.
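To make the spatial/spectral distinction concrete, the following is a minimal pure-Python sketch (illustrative, not the patented implementation): one and the same scaled dot-product self-attention, applied over pixels as tokens, acts as spatial self-attention; applied over spectral bands as tokens (i.e. on the transposed feature sequence), it acts as spectral self-attention. The identity Q/K/V projections are a simplifying assumption; the actual layers would use learned projections.

```python
import math

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of feature vectors X
    (a list of tokens, each a list of d features). Identity projections are
    used for Q, K and V to keep the sketch minimal."""
    d = len(X[0])
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in X]
              for qi in X]
    A = [softmax(row) for row in scores]  # each row of weights sums to 1
    return [[sum(a * v[j] for a, v in zip(Ai, X)) for j in range(d)] for Ai in A]

def transpose(X):
    return [list(col) for col in zip(*X)]

def spatial_attention(X):
    # pixels are the tokens: attend across spatial positions
    return self_attention(X)

def spectral_attention(X):
    # bands are the tokens: attend across spectral channels of the transposed map
    return transpose(self_attention(transpose(X)))
```

Stacking the two, each behind its own normalization and multi-layer perception layers as described above, yields the hybrid module's long-range dependencies along both dimensions.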
As a further aspect of the present invention, in Step 4:
updating the model parameters with a stochastic gradient descent algorithm, and computing the loss value with a cross-entropy function, expressed as:

L = -(1/N) · Σ_{n=1}^{N} Σ_{c=1}^{C} w_c · y_{n,c} · log(x_{n,c})

where x represents the input, y the label value, L the total loss, w_c the class weight of class c, C the total number of classes, N the batch size, x_{n,c} the predicted probability that observation sample x_n belongs to class c, and y_{n,c} the label vector element, which takes 1 if the true class of sample x_n equals c and 0 otherwise.
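As a concrete reading of the weighted cross-entropy above, the sketch below computes the same quantity directly from predicted probabilities. In practice a framework's built-in weighted cross-entropy routine would be used; the function name here is illustrative.

```python
import math

def weighted_cross_entropy(probs, labels, class_weights):
    """L = -(1/N) * sum_n sum_c w_c * y_{n,c} * log(x_{n,c}).
    probs[n][c] is the predicted probability for sample n and class c;
    labels[n] is the true class index, so y_{n,c} is 1 only at that index."""
    N = len(probs)
    total = 0.0
    for n, p in enumerate(probs):
        c = labels[n]
        total += class_weights[c] * math.log(p[c])
    return -total / N
```

Raising w_c for a rare class increases that class's contribution to the loss, which is the point of the per-class weights in the formula.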
The hyperspectral image classification network based on the spatial-spectral hybrid self-attention mechanism consists of an attention main branch and two hyperspectral local information extraction blocks, the latter connected in parallel with the main branch. A hyperspectral image block is input simultaneously into the attention main branch and the first hyperspectral local information extraction block. In the main branch, the image block passes in turn through the first hyperspectral channel attention module, the first spatial-spectral mixed self-attention module, the second hyperspectral channel attention module and the second spatial-spectral mixed self-attention module; the output feature map of the first hyperspectral local information extraction block is then added element-wise to the output feature map of the second spatial-spectral mixed self-attention module. The resulting feature map is input into the third hyperspectral channel attention module and the second hyperspectral local information extraction block. In the main branch, the feature map passes in turn through the third hyperspectral channel attention module, the third spatial-spectral mixed self-attention module, the fourth hyperspectral channel attention module and the fourth spatial-spectral mixed self-attention module. Finally, the output feature map of the second hyperspectral local information extraction block is added element-wise to the output feature map of the fourth spatial-spectral mixed self-attention module, and the resulting hyperspectral image block feature map is used for classification.
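The dataflow just described — two stages of (channel attention → mixed self-attention) × 2 on the main branch, each summed element-wise with a parallel local information extraction block — can be sketched as a composition of callables. This is a topology sketch only; the exact input wiring of the second local block is an assumption based on the text, and real modules would act on feature maps rather than flat vectors.

```python
def classification_network(x, channel_attn, mixed_attn, local_blocks, classify):
    """Topology sketch: channel_attn and mixed_attn are lists of four modules
    each, local_blocks a list of two parallel local-information blocks, and
    classify the final classification head. All modules map vector -> vector."""
    def add(a, b):  # element-wise skip addition of two feature vectors
        return [u + v for u, v in zip(a, b)]

    # Stage 1: modules 1-2 of each type on the main branch, plus local block 1.
    h = mixed_attn[1](channel_attn[1](mixed_attn[0](channel_attn[0](x))))
    h = add(h, local_blocks[0](x))
    # Stage 2: modules 3-4, plus local block 2 fed with the stage-1 sum.
    out = mixed_attn[3](channel_attn[3](mixed_attn[2](channel_attn[2](h))))
    out = add(out, local_blocks[1](h))
    return classify(out)
```

With identity modules, each stage simply doubles the input, which makes the two element-wise skip additions easy to verify.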
The beneficial effects of the invention are as follows:
1. The proposed hyperspectral image classification network based on the spatial-spectral hybrid self-attention mechanism can fully exploit the respective advantages of self-attention and convolutional neural networks, effectively capturing long- and short-range information in the hyperspectral image and fully fusing it.
2. To exploit the inherent three-dimensional structure of the hyperspectral image, a brand-new spatial-spectral hybrid self-attention module is designed; according to the characteristics of the hyperspectral image, the self-attention mechanism can be applied in both the spectral and spatial dimensions, so that long-range dependencies along the two dimensions of the hyperspectral image can be extracted separately. The spatial-spectral mixed self-attention module can extract spatial local information while taking into account the prior information of pixel spatial positions in the hyperspectral image, overcoming the shortcoming that self-attention classification methods can only establish dependencies uniformly over all pixels.
3. The conceptually simple yet powerful hyperspectral local information extraction block proposed by the invention can extract local spectral information while preserving, to a certain extent, the spatial information of the original image and transmitting the spatial features from shallow to deep layers, further improving the classification effect.
Drawings
FIG. 1 is a flow chart of the present method;
FIG. 2 is a diagram of a hyperspectral channel attention module in the present method;
FIG. 3 is a schematic representation of a spatial-spectral hybrid self-attention module in the present method;
FIG. 4 is a graph of the classification results on the Indian Pines dataset obtained with the present method and two existing state-of-the-art methods;
FIG. 5 is a graph of the classification results on the Salinas Valley dataset obtained with the present method and two existing state-of-the-art methods.
Detailed Description
Embodiments and effects of the present invention are further described below with reference to the accompanying drawings.
Example 1, referring to fig. 1, a method for classifying hyperspectral images based on a spatial-spectral hybrid self-attention mechanism, the method comprises the following specific steps:
step1, preparing several publicly available hyperspectral image datasets for network training;
step2, preprocessing the hyperspectral image: extracting a hyperspectral image block centered on each pixel, and dividing the hyperspectral image blocks into mutually disjoint hyperspectral training, verification and test sample sets;
step3, constructing a hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism, wherein the whole network consists of an attention main branch and two hyperspectral local information extraction blocks; the attention main branch comprises four hyperspectral channel attention modules and four space-spectrum mixed self-attention modules; finally, connecting the hyperspectral local information extraction block to a main branch;
step4, training a hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism by using a hyperspectral training sample set, verifying the trained network by using a hyperspectral verification sample set after each batch of training is completed, and checking the state and convergence condition of the current method;
step5, inputting the hyperspectral test sample set into a trained hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism to obtain class labels of each pixel in the test sample, and completing hyperspectral image classification.
Further, the Step2 specifically comprises the following steps:
step2.1, a hyperspectral image has three-dimensional properties, and its data can be expressed as S ∈ R^(H×W×C), where H and W represent the height and width of the hyperspectral image in space and C represents the number of hyperspectral image channels. Pixels of size n with pixel value 0 are filled in around the peripheral edges of the original hyperspectral image, and hyperspectral image blocks are extracted from the padded image; their data can be expressed as X ∈ R^(P×P×C), where P represents the size of the extracted hyperspectral image block, i.e. a hyperspectral image block of spatial size (2n+1)×(2n+1) with C channels is selected centered on each original pixel point. This example takes n = 5, though it is not limited thereto.
Step2.2, classifying the hyperspectral image block into a class set to which the image block belongs according to the class of the central pixel of the hyperspectral image block;
step2.3, from each category, selecting hyperspectral image blocks as the training set in a proportion that depends on the dataset size (0.01 for a large dataset, 0.03 for a small dataset), then selecting hyperspectral image blocks in the same proportion as the verification set, and finally taking the remaining hyperspectral image blocks in each category set as the test set.
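A minimal sketch of Step2 under the stated settings (zero padding of width n, one (2n+1)×(2n+1) block per pixel, small disjoint proportional splits). The function names are illustrative, and real code would operate on arrays rather than nested lists.

```python
import random

def extract_patches(image, n):
    """Zero-pad a hyperspectral cube (H x W x C nested lists) by n pixels on
    each spatial edge, then cut one (2n+1) x (2n+1) x C block per pixel, with
    the original pixel at the block center."""
    H, W, C = len(image), len(image[0]), len(image[0][0])
    zero = [0.0] * C
    size = 2 * n + 1
    padded = [[zero] * (W + 2 * n) for _ in range(n)]
    padded += [[zero] * n + row + [zero] * n for row in image]
    padded += [[zero] * (W + 2 * n) for _ in range(n)]
    patches = []
    for i in range(H):
        for j in range(W):
            patches.append([r[j:j + size] for r in padded[i:i + size]])
    return patches

def split_samples(samples, train_ratio, val_ratio, seed=0):
    """Disjoint train/verification split at the given proportions; all
    remaining samples become the test set."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_tr = max(1, int(len(samples) * train_ratio))
    n_va = max(1, int(len(samples) * val_ratio))
    train = [samples[i] for i in idx[:n_tr]]
    val = [samples[i] for i in idx[n_tr:n_tr + n_va]]
    test = [samples[i] for i in idx[n_tr + n_va:]]
    return train, val, test
```

With the embodiment's n = 5, each pixel yields an 11×11×C block; with the small-dataset ratio 0.03, a 100-block category contributes 3 training, 3 verification and 94 test blocks.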
Further, the Step3 specifically comprises the following steps:
step3.1, constructing a hyperspectral local information extraction block;
the hyperspectral local information extraction block is formed by connecting two-dimensional convolution layers, three-dimensional convolution layers, normalization layers and ReLU activation function layers in series, wherein:
the hyperspectral local information extraction block comprises 2 two-dimensional convolution blocks and 3 three-dimensional convolution blocks, and the convolution block structure is fixed as: convolution layer → normalization layer → activation function layer; the structure of the hyperspectral local information extraction block is: first two-dimensional convolution block → first three-dimensional convolution block → second three-dimensional convolution block → third three-dimensional convolution block → second two-dimensional convolution block;
the convolution kernel sizes of the two-dimensional convolution layers are set to 1×1, the convolution kernel sizes of the first and third three-dimensional convolution layers are set to 1×1×3, the convolution kernel size of the second three-dimensional convolution layer is set to 1×1×7, and the activation function of each activation layer is set to the ReLU activation function.
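Because all the kernels above are 1×1 spatially (1×1, 1×1×3, 1×1×7, 1×1×3), each layer only mixes values along the spectral axis at a fixed pixel, which is how the block extracts local spectral information while leaving spatial information intact. A sketch of that per-pixel spectral operation as a same-padded 1-D convolution (illustrative, not the patented code):

```python
def conv1d_spectral(spectrum, kernel, bias=0.0):
    """'Same'-padded 1-D convolution along the spectral axis: the operation a
    1x1xk three-dimensional convolution applies independently at every pixel.
    Zero padding of k // 2 keeps the number of bands unchanged."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(spectrum) + [0.0] * pad
    return [bias + sum(kernel[i] * padded[j + i] for i in range(k))
            for j in range(len(spectrum))]

def relu(xs):
    """The ReLU activation applied after each normalized convolution."""
    return [max(0.0, v) for v in xs]
```

A 1×1×3 kernel mixes each band with its two neighbors; the 1×1×7 kernel in the middle widens that spectral receptive field.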
Step3.2, constructing a main attention branch;
the attention main branch is composed of 4 hyperspectral channel attention modules and 4 spatial-spectral mixed self-attention modules, with the structure fixed as: first hyperspectral channel attention module → first spatial-spectral mixed self-attention module → second hyperspectral channel attention module → second spatial-spectral mixed self-attention module → third hyperspectral channel attention module → third spatial-spectral mixed self-attention module → fourth hyperspectral channel attention module → fourth spatial-spectral mixed self-attention module.
Further, the step3.2 specifically comprises the following steps:
step3.2.1, build hyperspectral channel attention module
Referring to fig. 2, the hyperspectral channel attention module is composed of 1 two-dimensional convolution layer and 1 compression excitation layer, and the structure is fixed as follows: two-dimensional convolution layer→compression excitation layer; wherein the convolution kernel size of the two-dimensional convolution layer is set to 1×1; the compression of the feature map in the compression excitation layer is completed through global average pooling, expressed as follows:

Z_c = F_sq(U_c) = (1/(W×H)) Σ_{i=1}^{W} Σ_{j=1}^{H} U_c(i, j)

wherein U_c represents each channel of the input feature map, Z_c represents the global average value of each channel, and W and H represent the length and width of the input feature map;
the excitation profile portion in the compressed excitation layer is represented as follows:
S = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z))
where S represents the weight vector, δ represents the Relu activation function, σ represents the sigmoid activation function, W_1 ∈ R^{(C/r)×C} and W_2 ∈ R^{C×(C/r)} represent the weight matrices of the two fully connected layers, and r determines the number of hidden-layer nodes in the middle layer; r is 16 in this embodiment.
The final output X is the channel-wise product of S and U, and its data can be represented as X ∈ R^{P×P×D}, where P is the size of the output feature map and D is the number of convolution kernels of the two-dimensional convolution layer.
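The squeeze and excitation computation of the compression excitation layer can be sketched in numpy as follows. The function name `channel_attention` is hypothetical, the feature map is taken as (H, W, C), and the weight matrices are supplied by the caller; this is a sketch of the standard squeeze-and-excitation computation, not the patented network code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(U, W1, W2):
    """Squeeze-and-excitation over a feature map U of shape (H, W, C).

    W1: (C//r, C) and W2: (C, C//r) are the two fully connected
    weight matrices; r is the reduction ratio (16 in the embodiment).
    """
    z = U.mean(axis=(0, 1))                  # squeeze: global average pool -> (C,)
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0))  # excitation: FC -> ReLU -> FC -> sigmoid
    return U * s                             # reweight each channel by its scalar weight
```

With all-zero weights the sigmoid outputs 0.5, so every channel is simply halved; with learned weights each channel is scaled by its data-dependent importance.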
Step3.2.2, build spatial-spectral hybrid self-attention module
Referring to fig. 3, the spatial-spectral hybrid self-attention module consists of 1 spatial self-attention block and 1 spectral self-attention block, and the structure is fixed as follows: spectral self-attention block→spatial self-attention block; assume that the feature map sequence obtained from the hyperspectral channel attention module is X ∈ R^{P×C}, wherein P represents the total number of pixels in the feature map and C represents the dimension of the input feature map, wherein:
the spatial self-attention block consists of 2 linear normalization layers, 1 spatial self-attention layer and 1 multi-layer perception layer; the spatial self-attention layer uniformly divides the input feature map X in a non-overlapping manner; assuming the number of divided blocks is N_d and each block contains P_d pixels, the total number of pixels in the input feature map is P = N_d × P_d. The divided feature maps are multiplied by 3 different learnable weight matrices W_q, W_k, W_v to obtain three different vectors Q, K, V, respectively, and Q, K, V are then divided into h parts along the spectral dimension, h being the number of heads of the multi-head attention:
Q = {Q_1, Q_2, …, Q_i, …, Q_h}
K = {K_1, K_2, …, K_i, …, K_h}
V = {V_1, V_2, …, V_i, …, V_h}
Each Q_i is used to perform a dot-product operation with the transpose of the corresponding K_i to obtain the degree of similarity between Q_i and K_i. Since the value of Q_i K_i^T increases as the initial dimension d increases, it needs to be divided by √d_k to control the gradient vanishing problem. The result is then normalized by the softmax function to obtain an attention matrix, which is finally multiplied with V_i to obtain the single-head output:

X_i = A(Q_i, K_i, V_i) = softmax(Q_i K_i^T / √d_k) V_i
The output results of the multiple heads are spliced to obtain the output of a single divided feature block:

A(Q, K, V) = Concat(X_1, X_2, …, X_i, …, X_h) W_o
where W_o is the output projection matrix; the global spatial self-attention layer applies the above multi-head attention within each of the N_d divided blocks in parallel and merges the results along the pixel dimension.
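The scaled dot-product multi-head attention described above can be sketched in numpy on a single divided feature block of shape (P, C). The name `multi_head_attention` and the caller-supplied weight matrices are assumptions of this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, h):
    """Multi-head self-attention on X of shape (P, C); h must divide C."""
    P, C = X.shape
    d_k = C // h
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for i in range(h):  # split Q, K, V along the spectral (channel) dimension
        Qi = Q[:, i * d_k:(i + 1) * d_k]
        Ki = K[:, i * d_k:(i + 1) * d_k]
        Vi = V[:, i * d_k:(i + 1) * d_k]
        A = softmax(Qi @ Ki.T / np.sqrt(d_k))  # scaled dot product, row-normalized
        heads.append(A @ Vi)                   # single-head output X_i
    return np.concatenate(heads, axis=1) @ Wo  # concat heads, project with W_o
```

Each softmax row sums to 1, so every output pixel is a convex combination of the value vectors of all pixels in the block.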
the multi-layer perception layer comprises a linear projection implemented by 2 fully connected layers and 1 GELU activation function, and can be expressed as:
MLP(X)=FC 2 (GELU(FC 1 (X)))
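The formula MLP(X) = FC2(GELU(FC1(X))) can be sketched in numpy as follows; the tanh approximation of GELU and the function names are assumptions of this sketch.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(X, W1, b1, W2, b2):
    """MLP(X) = FC2(GELU(FC1(X))) for X of shape (P, C)."""
    return gelu(X @ W1 + b1) @ W2 + b2
```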
the spatial self-attention block is connected in series, and the specific structure is as follows: first linear normalization layer→spatial self-attention layer→second linear normalization layer→multi-layer perception layer; the input feature map of the spatial self-attention block is added element-wise, through a skip connection, to the output feature map of the spatial self-attention layer, and this sum is added element-wise, through a skip connection, to the output feature map of the multi-layer perception layer; the mathematical formula of the process is expressed as follows:

X̂ = A_spatial(LN_1(X)) + X
Output = MLP(LN_2(X̂)) + X̂

wherein X̂ and Output represent the output features of the spatial self-attention layer and the multi-layer perception layer, respectively.
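The normalization-attention-MLP wiring with its two skip connections can be sketched in numpy with placeholder callables for the attention and MLP sublayers; names here are illustrative, not from the patent.

```python
import numpy as np

def layer_norm(X, eps=1e-5):
    """Normalize each row (pixel token) to zero mean and unit variance."""
    mu = X.mean(axis=-1, keepdims=True)
    var = X.var(axis=-1, keepdims=True)
    return (X - mu) / np.sqrt(var + eps)

def attention_block(X, attn, mlp):
    """LN -> attention -> skip add, then LN -> MLP -> skip add."""
    X_hat = attn(layer_norm(X)) + X        # first skip connection
    return mlp(layer_norm(X_hat)) + X_hat  # second skip connection
```

If both sublayers output zero, the block reduces to the identity, which is exactly what the skip connections guarantee.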
The spectral self-attention block consists of 2 linear normalization layers, 1 spectral self-attention layer and 1 multi-layer perception layer; its connection mode is the same as that of the spatial self-attention block, and the specific structure is: first linear normalization layer→spectral self-attention layer→second linear normalization layer→multi-layer perception layer. The spectral self-attention block transposes the input feature map before sending it into the self-attention layer, the number of heads of the self-attention layer is set to 1, and the spectral self-attention layer can be expressed as:
A_spectral(Q, K, V) = A(Q_i, K_i, V_i)
wherein Q_i, K_i, V_i ∈ R^{C×P} are the Q, K, V obtained from the transposed feature map;
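The transpose trick of the spectral self-attention block can be shown in a few lines: transposing X from (P, C) to (C, P) makes each spectral channel a token, so the same attention machinery models inter-band relations. The callable `attn` stands in for any self-attention layer; the sketch and its names are illustrative.

```python
import numpy as np

def spectral_attention(X, attn):
    """Apply an attention callable along the spectral axis by transposing.

    X has shape (P, C). Transposing to (C, P) turns each channel into
    a token for attn; the result is transposed back to (P, C).
    """
    return attn(X.T).T
```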
the hyperspectral image classification network based on the space-spectrum hybrid self-attention mechanism consists of a main attention branch and two hyperspectral local information extraction blocks, wherein the hyperspectral local information extraction blocks are connected in parallel with the main attention branch. The hyperspectral image block is simultaneously input into the main attention branch and the first hyperspectral local information extraction block. In the main branch, the image block sequentially passes through the first hyperspectral channel attention module, the first space-spectrum mixed self-attention module, the second hyperspectral channel attention module and the second space-spectrum mixed self-attention module; at this point, the output feature map of the first hyperspectral local information extraction block and the output feature map of the second space-spectrum mixed self-attention module are added element-wise. The summed feature map is then simultaneously input into the third hyperspectral channel attention module and the second hyperspectral local information extraction block. In the main branch, the feature map sequentially passes through the third hyperspectral channel attention module, the third space-spectrum mixed self-attention module, the fourth hyperspectral channel attention module and the fourth space-spectrum mixed self-attention module. Finally, the output feature map of the second hyperspectral local information extraction block and the output feature map of the fourth space-spectrum mixed self-attention module are added element-wise, and the resulting hyperspectral image block feature map is used for classification.
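The data flow just described can be summarized with placeholder callables standing in for the eight attention modules and two local extraction blocks; `classify` and its interface are assumptions of this sketch, and the reading that the element-wise sum feeds both the third channel attention module and the second local block follows the description above.

```python
def classify(patch, channel_attn, mixed_attn, local_blocks):
    """Data flow of the network: 4 channel-attention + 4 mixed
    self-attention stages on the main branch, with two parallel local
    information extraction blocks merged by element-wise addition.

    channel_attn and mixed_attn: lists of 4 callables each;
    local_blocks: list of 2 callables. All are placeholders here.
    """
    x = patch
    for i in range(2):                # first two attention stages
        x = mixed_attn[i](channel_attn[i](x))
    x = x + local_blocks[0](patch)    # merge first local branch
    mid = x                           # sum feeds both paths below
    for i in range(2, 4):             # last two attention stages
        x = mixed_attn[i](channel_attn[i](x))
    x = x + local_blocks[1](mid)      # merge second local branch
    return x                          # feature map used for classification
```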
Further, in Step 4:
updating model parameters by using a stochastic gradient descent algorithm, and calculating the loss value by a cross entropy function, expressed as follows:

L = -(1/N) Σ_{n=1}^{N} Σ_{c=1}^{C} w_c y_{n,c} log(x_{n,c})
where x represents the input, y represents the label value, L represents the total loss, w_c is the class weight of class c, C is the total number of classes, N is the batch size, x_{n,c} represents the predicted probability that observation sample x_n belongs to class c, and y_{n,c} represents the label vector element, which takes 1 if the true class of sample x_n equals c and 0 otherwise.
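The weighted cross-entropy loss can be sketched in numpy as follows; `weighted_cross_entropy` is a hypothetical name, and the averaging over N (rather than over the sum of sample weights, as some frameworks do) follows the formula as written above.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, w):
    """L = -(1/N) * sum_n sum_c w_c * y_{n,c} * log(x_{n,c}).

    probs: (N, C) predicted class probabilities;
    labels: (N,) true class indices; w: (C,) per-class weights.
    """
    N, C = probs.shape
    y = np.eye(C)[labels]                 # one-hot label vectors y_{n,c}
    return -(w * y * np.log(probs)).sum() / N
```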
Further, in this embodiment, the learning rate of the network is set to 0.008, the batch size is 64, and the network is trained for 100 iteration rounds to obtain the bias parameters and the weight file of the final network.
The hardware platform of the simulation experiment of this embodiment is: a CPU of model 12th Gen Intel(R) Core(TM) i9-12900KF and an NVIDIA GeForce RTX 3090 GPU with 24 GB memory; with the Ubuntu 20.04.5 LTS operating system, the configured virtual environment includes Python 3.9.16, PyTorch 1.13.1, CUDA 11.6, etc.
Further, the Step5 specifically comprises the following steps:
sending the hyperspectral test sample set into the hyperspectral classification network after training, and calculating three general evaluation indexes: the overall classification accuracy (OA), the average accuracy (AA) and the Kappa coefficient (K); the larger these three indexes, the better the classification effect.
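The three evaluation indexes can be computed from a confusion matrix as sketched below; `evaluation_indexes` is a hypothetical name, and the definitions (OA as trace over total, AA as the mean of per-class accuracies, Kappa as chance-corrected agreement) are the standard ones for these metrics.

```python
import numpy as np

def evaluation_indexes(y_true, y_pred, num_classes):
    """Overall accuracy (OA), average accuracy (AA) and Kappa coefficient."""
    M = np.zeros((num_classes, num_classes))
    for t, p in zip(y_true, y_pred):
        M[t, p] += 1                               # confusion matrix
    n = M.sum()
    oa = np.trace(M) / n                           # overall accuracy
    aa = np.mean(np.diag(M) / M.sum(axis=1))       # mean of per-class accuracies
    pe = (M.sum(axis=0) @ M.sum(axis=1)) / n**2    # expected chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

A perfect prediction yields OA = AA = Kappa = 1; Kappa discounts agreement that could arise from class-frequency chance alone.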
To evaluate the effectiveness of the present method, two existing advanced methods, SSFTT and GAHT, were used to classify the ground object targets in two common hyperspectral datasets, Indian Pines and Salinas Valley.
The SSFTT method refers to: the hyperspectral classification method proposed by Sun et al. in "Spectral-spatial feature tokenization transformer for hyperspectral image classification", abbreviated as SSFTT;
the GAHT method refers to: the hyperspectral classification method proposed by Mei S et al. in "Hyperspectral image classification using group-aware hierarchical transformer", abbreviated as GAHT;
table 1 comparison of classification results for three networks under two data sets
Experiments on two mainstream data sets show that, compared with advanced hyperspectral image classification methods, the present method is more powerful in performance and can more accurately predict the pixel sample classes of hyperspectral images.
FIG. 4(a) is a graph of classification results on the Indian Pines dataset using the SSFTT method;
FIG. 4(b) is a graph of classification results on the Indian Pines dataset using the GAHT method;
FIG. 4(c) is a graph of classification results on the Indian Pines dataset using the present method;
FIG. 5(a) is a graph of classification results on the Salinas Valley dataset using the SSFTT method;
FIG. 5(b) is a graph of classification results on the Salinas Valley dataset using the GAHT method;
FIG. 5(c) is a graph of classification results on the Salinas Valley dataset using the present method;
as can be seen from the figures, the present method has the fewest misclassified pixels and obtains a good classification effect in noisy classes and boundary areas; especially on the Indian Pines dataset, whose pixel distribution is relatively uneven, it still shows strong classification capability compared with the other methods.
While the present invention has been described in detail with reference to the drawings, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.
Claims (7)
1. A hyperspectral image classification method based on a space-spectrum hybrid self-attention mechanism, characterized in that the method comprises the following steps:
step1, preparing a plurality of publicly available hyperspectral image data sets for network training;
step2, preprocessing a hyperspectral image, extracting a hyperspectral image block by taking each pixel as a central point, and dividing the hyperspectral image blocks into a hyperspectral training sample set, a hyperspectral verification sample set and a hyperspectral test sample set which do not overlap;
step3, constructing a hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism, wherein the whole network consists of an attention main branch and two hyperspectral local information extraction blocks; the attention main branch comprises four hyperspectral channel attention modules and four space-spectrum mixed self-attention modules; finally, connecting the hyperspectral local information extraction block to a main branch;
step4, training a hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism by using a hyperspectral training sample set, verifying the trained network by using a hyperspectral verification sample set after each batch of training is completed, and checking the state and convergence condition of the current method;
step5, inputting the hyperspectral test sample set into a trained hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism to obtain class labels of each pixel in the test sample, and completing hyperspectral image classification.
2. The method for classifying hyperspectral images based on a spatial-spectral mixed self-attention mechanism as recited in claim 1, wherein: the specific steps of Step2 are as follows:
step2.1, the hyperspectral image has three-dimensional properties and its data are expressed as S ∈ R^{H×W×C}; padding the peripheral edges of the original hyperspectral image with pixels of width n and pixel value 0; and extracting hyperspectral image blocks from the padded image;
step2.2, classifying the hyperspectral image block into a class set to which the image block belongs according to the class of the central pixel of the hyperspectral image block;
step2.3, selecting hyperspectral image blocks from each category according to different proportions for data sets of different sizes to serve as a training set, then selecting hyperspectral image blocks with the same proportions as a verification set, and finally taking the remaining hyperspectral image blocks in each category set as a test set.
3. The method for classifying hyperspectral images based on a spatial-spectral mixed self-attention mechanism as recited in claim 1, wherein: the specific steps of Step3 are as follows:
step3.1, constructing a hyperspectral local information extraction block formed by connecting a two-dimensional convolution layer, a three-dimensional convolution layer, a normalization layer and a Relu activation function layer in series;
step3.2, constructing a main attention branch consisting of 4 hyperspectral channel attention modules and 4 spatial-spectral hybrid self-attention modules.
4. A method of classifying hyperspectral images based on a spatio-spectral mixed self-attention mechanism as claimed in claim 3, characterized in that: the specific steps of the step3.2 are as follows:
step3.2.1, constructing a hyperspectral channel attention module consisting of 1 two-dimensional convolution layer and 1 compression excitation layer;
step3.2.2, building a spatial-spectral hybrid self-attention module consisting of 1 spatial self-attention block and 1 spectral self-attention block, wherein the spatial self-attention block consists of 2 linear normalization layers, 1 spatial self-attention layer and 1 multi-layer perception layer, and the spectral self-attention block consists of 2 linear normalization layers, 1 spectral self-attention layer and 1 multi-layer perception layer.
5. A method of classifying hyperspectral images based on a spatio-spectral mixed self-attention mechanism as claimed in claim 3, characterized in that: in step3.1, the hyperspectral local information extraction block includes 2 two-dimensional convolution blocks and 3 three-dimensional convolution blocks, and the convolution block structure is fixed as follows: convolution layer→normalization layer→activation function layer; the structure of the hyperspectral local information extraction block is as follows: first two-dimensional convolution block→first three-dimensional convolution block→second three-dimensional convolution block→third three-dimensional convolution block→second two-dimensional convolution block;
the convolution kernel sizes of the two-dimensional convolution layers are set to 1×1, the convolution kernel sizes of the first three-dimensional convolution layer and the third three-dimensional convolution layer are set to 1×1×3, the convolution kernel size of the second three-dimensional convolution layer is set to 1×1×7, and the activation function of each activation layer is set to a Relu activation function.
6. A method of classifying hyperspectral images based on a spatio-spectral mixed self-attention mechanism as claimed in claim 3, characterized in that: in step3.2, the main attention branch is composed of 4 hyperspectral channel attention modules and 4 space-spectrum mixed self-attention modules, and the structure is fixed as follows: first hyperspectral channel attention module→first spatial-spectral mixed self-attention module→second hyperspectral channel attention module→second spatial-spectral mixed self-attention module→third hyperspectral channel attention module→third spatial-spectral mixed self-attention module→fourth hyperspectral channel attention module→fourth spatial-spectral mixed self-attention module;
the hyperspectral channel attention module consists of 1 two-dimensional convolution layer and 1 compression excitation layer, and the structure is fixed as follows: two-dimensional convolution layer→compression excitation layer; wherein the convolution kernel size of the two-dimensional convolution layer is set to 1×1;
the spatial-spectral hybrid self-attention module consists of 1 spatial self-attention block and 1 spectral self-attention block, and the structure is fixed as follows: spectral self-attention block→spatial self-attention block; wherein:
the spatial self-attention block is formed by connecting 2 linear normalization layers, 1 spatial self-attention layer and 1 multi-layer perception layer in series, with the specific structure: first linear normalization layer→spatial self-attention layer→second linear normalization layer→multi-layer perception layer; the input feature map of the spatial self-attention block is added element-wise, through a skip connection, to the output feature map of the spatial self-attention layer, and this sum is added element-wise, through a skip connection, to the output feature map of the multi-layer perception layer;
the spectral self-attention block consists of 2 linear normalization layers, 1 spectral self-attention layer and 1 multi-layer perception layer; its connection mode is the same as that of the spatial self-attention block, and the specific structure is: first linear normalization layer→spectral self-attention layer→second linear normalization layer→multi-layer perception layer.
7. The method for classifying hyperspectral images based on a spatial-spectral mixed self-attention mechanism as recited in claim 1, wherein: in Step 4:
updating model parameters by using a stochastic gradient descent algorithm, and calculating the loss value by a cross entropy function, expressed as follows:

L = -(1/N) Σ_{n=1}^{N} Σ_{c=1}^{C} w_c y_{n,c} log(x_{n,c})
where x represents the input, y represents the label value, L represents the total loss, w_c is the class weight of class c, C is the total number of classes, N is the batch size, x_{n,c} represents the predicted probability that observation sample x_n belongs to class c, and y_{n,c} represents the label vector element.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310902900.4A CN116977723A (en) | 2023-07-21 | 2023-07-21 | Hyperspectral image classification method based on space-spectrum hybrid self-attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116977723A true CN116977723A (en) | 2023-10-31 |
Family
ID=88470656
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310902900.4A Pending CN116977723A (en) | 2023-07-21 | 2023-07-21 | Hyperspectral image classification method based on space-spectrum hybrid self-attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116977723A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118141356A (en) * | 2024-04-30 | 2024-06-07 | 天津工业大学 | Depth ADMM unfolding EIT imaging method based on model driving |
CN118230023A (en) * | 2024-02-19 | 2024-06-21 | 南京信息工程大学 | Hyperspectral image classification method and system based on secondary training |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Juefei-Xu et al. | Local binary convolutional neural networks | |
CN110135267B (en) | Large-scene SAR image fine target detection method | |
CN110348399B (en) | Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network | |
CN108491849B (en) | Hyperspectral image classification method based on three-dimensional dense connection convolution neural network | |
CN110084159A (en) | Hyperspectral image classification method based on the multistage empty spectrum information CNN of joint | |
CN114821164B (en) | Hyperspectral image classification method based on twin network | |
CN113095409B (en) | Hyperspectral image classification method based on attention mechanism and weight sharing | |
CN111695467A (en) | Spatial spectrum full convolution hyperspectral image classification method based on superpixel sample expansion | |
CN113486851B (en) | Hyperspectral image classification method based on double-branch spectrum multi-scale attention network | |
CN107145836B (en) | Hyperspectral image classification method based on stacked boundary identification self-encoder | |
Yang et al. | Dual-channel densenet for hyperspectral image classification | |
CN109145992A (en) | Cooperation generates confrontation network and sky composes united hyperspectral image classification method | |
CN110717553A (en) | Traffic contraband identification method based on self-attenuation weight and multiple local constraints | |
CN111814685B (en) | Hyperspectral image classification method based on double-branch convolution self-encoder | |
CN116977723A (en) | Hyperspectral image classification method based on space-spectrum hybrid self-attention mechanism | |
CN109344698A (en) | EO-1 hyperion band selection method based on separable convolution sum hard threshold function | |
CN113705580B (en) | Hyperspectral image classification method based on deep migration learning | |
CN108229551B (en) | Hyperspectral remote sensing image classification method based on compact dictionary sparse representation | |
CN109190511B (en) | Hyperspectral classification method based on local and structural constraint low-rank representation | |
CN108460391A (en) | Based on the unsupervised feature extracting method of high spectrum image for generating confrontation network | |
CN111814607A (en) | Deep learning model suitable for small sample hyperspectral image classification | |
CN108734199A (en) | High spectrum image robust classification method based on segmentation depth characteristic and low-rank representation | |
CN110598594A (en) | Hyperspectral classification method based on space spectrum self-adaptive bidirectional long-time and short-time memory model | |
CN108268890A (en) | A kind of hyperspectral image classification method | |
CN107451562A (en) | A kind of band selection method based on Chaotic Binary gravitation search algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |