CN116977723A - Hyperspectral image classification method based on space-spectrum hybrid self-attention mechanism - Google Patents
- Publication number: CN116977723A (application CN202310902900.4A)
- Authority: CN (China)
- Prior art keywords: hyperspectral, layer, attention, self, block
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/764 — Image or video recognition using pattern recognition or machine learning, using classification, e.g. of video objects
- G06N3/045 — Neural network architectures: combinations of networks
- G06N3/0464 — Neural network architectures: convolutional networks [CNN, ConvNet]
- G06N3/08 — Neural networks: learning methods
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections
- G06V10/58 — Extraction of image or video features relating to hyperspectral data
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82 — Image or video recognition using neural networks
- G06V20/194 — Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
Abstract
The invention discloses a hyperspectral image classification method based on a spatial-spectral hybrid self-attention mechanism, comprising the following steps. Determining a dataset: selecting several publicly available hyperspectral image datasets. Preprocessing hyperspectral data: extracting sample blocks from the hyperspectral dataset and dividing them into a training set, a verification set and a test set. Network construction: building a hyperspectral image classification network based on the spatial-spectral hybrid self-attention mechanism. Network training: feeding hyperspectral training samples into the constructed network in batches, and verifying the current classification performance of the network on the hyperspectral verification sample set after each batch of training is completed. Sample classification: inputting the hyperspectral test samples into the trained hyperspectral image classification network based on the spatial-spectral hybrid self-attention mechanism to obtain the classification result. The method makes full use of the inherent characteristics of hyperspectral images, classifies them with high accuracy, and can be used in the field of detecting ground-object types in hyperspectral images.
Description
Technical Field
The invention relates to a hyperspectral image classification method based on a spatial-spectral hybrid self-attention mechanism, in particular to remote sensing technology, and belongs to the technical field of image processing.
Background
A hyperspectral image is data obtained by imaging the ground with a hyperspectral imager carried on a satellite or other space-based platform. Unlike the three-channel images common in daily life, a hyperspectral image provides far richer spectral information, making up for the limited spectral resolution of conventional image data. Hyperspectral remote sensing is at the leading edge of current remote sensing technology and has great application potential; hyperspectral images are widely applied, with unique advantages, in fields such as agriculture, geological exploration, environmental monitoring and medical imaging. The goal of hyperspectral image classification is to assign the pixels in a hyperspectral image to different ground-object categories. It is a challenging task, because hyperspectral image data is high-dimensional, large in volume and complex in feature space, and conventional image classification algorithms often have difficulty processing such data.
Traditional hyperspectral image classification methods generally rely on manual feature extraction followed by classifier training. This approach requires designing feature extraction algorithms based on expert prior knowledge, then using a classifier to classify the extracted features. However, it suffers from several problems. First, manual feature extraction algorithms require expertise and experience, and the extracted features often cannot adequately reflect the essential characteristics of the image. Second, the large amount of noise and redundant information contained in hyperspectral images may degrade the classification effect. Finally, interference from human factors and insufficient feature extraction greatly limit the accuracy of the classifier.
In recent years, deep learning-based methods have made remarkable progress in hyperspectral image classification. Deep learning models can automatically learn feature representations of an image, avoiding the problems of manual feature extraction algorithms, and therefore have broad application prospects in hyperspectral image classification. Among deep learning models, the convolutional neural network (CNN) is widely used: it can automatically extract features from an image with good performance and computational efficiency. However, owing to the particular nature of hyperspectral image data, conventional 2D CNN and 3D CNN methods have certain limitations in classification effect and computational efficiency. A 2D CNN can only extract the spatial features of the image and cannot fully exploit the spectral information; a 3D CNN can extract spatial and spectral features simultaneously, but because of the high dimensionality of hyperspectral image data it is too computationally expensive. It is therefore of great importance to study a hyperspectral image classification algorithm that is computationally efficient and makes full use of both spatial and spectral information.
Attention mechanisms play an important role in deep learning. An attention mechanism weights and distributes attention over the input data so that the model can better focus on important information, improving its performance. Attention not only improves model accuracy but also helps in understanding the model's decision process, enhancing interpretability. Self-attention is a variant of the attention mechanism that operates on the feature vector of each position in the input sequence. When computing the weight indicating the importance of a position, the self-attention mechanism takes a weighted average over the feature vectors of all positions in the sequence; it is therefore better at quickly capturing the correlations inside a sequence, improving model performance.
Based on the background, the invention provides a simple and effective hyperspectral image classification method for realizing rapid and accurate hyperspectral pixel classification.
Disclosure of Invention
The invention provides a hyperspectral image classification method based on a spatial-spectral hybrid self-attention mechanism, which can fully exploit the respective advantages of self-attention and convolutional neural networks, effectively capture long- and short-range information in hyperspectral images and fully fuse it, realizing rapid and accurate hyperspectral image classification.
The technical scheme of the invention is as follows: a hyperspectral image classification method based on a space-spectrum hybrid self-attention mechanism comprises the following specific steps:
step1, preparing several publicly available hyperspectral image datasets for network training;
step2, preprocessing the hyperspectral image: extracting a hyperspectral image block centered on each pixel, and dividing the hyperspectral image blocks into mutually disjoint hyperspectral training, verification and test sample sets;
step3, constructing a hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism, wherein the whole network consists of an attention main branch and two hyperspectral local information extraction blocks; the attention main branch comprises four hyperspectral channel attention modules and four space-spectrum mixed self-attention modules; finally, connecting the hyperspectral local information extraction block to a main branch;
step4, training a hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism by using a hyperspectral training sample set, verifying the trained network by using a hyperspectral verification sample set after each batch of training is completed, and checking the state and convergence condition of the current method;
step5, inputting the hyperspectral test sample set into a trained hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism to obtain class labels of each pixel in the test sample, and completing hyperspectral image classification.
As a further scheme of the invention, the Step2 comprises the following specific steps:
step2.1, a hyperspectral image has three-dimensional properties, and its data are expressed as S ∈ R^(H×W×C); pixels of size n with pixel value 0 are filled in around the peripheral edges of the original hyperspectral image, and hyperspectral image blocks are extracted from the padded image;
step2.2, classifying the hyperspectral image block into a class set to which the image block belongs according to the class of the central pixel of the hyperspectral image block;
step2.3, from each category, selecting hyperspectral image blocks as the training set in proportions that differ with dataset size, then selecting hyperspectral image blocks in the same proportion as the verification set, and finally taking the remaining hyperspectral image blocks in each category set as the test set.
As a further scheme of the invention, the specific steps of Step3 are as follows:
step3.1, constructing a hyperspectral local information extraction block formed by connecting two-dimensional convolution layers, three-dimensional convolution layers, normalization layers and ReLU activation function layers in series;
step3.2, constructing a main attention branch consisting of 4 hyperspectral channel attention modules and 4 spatial-spectral hybrid self-attention modules.
As a further scheme of the invention, the specific steps of the step3.2 are as follows:
step3.2.1, constructing a hyperspectral channel attention module consisting of one compression-excitation (squeeze-and-excitation) layer and one two-dimensional convolution layer;
step3.2.2, building a spatial-spectral hybrid self-attention module consisting of 1 spatial self-attention block and 1 spectral self-attention block, wherein the spatial self-attention block consists of 2 linear normalization layers, 1 spatial self-attention layer and 1 multi-layer perception layer, and the spectral self-attention block consists of 2 linear normalization layers, 1 spectral self-attention layer and 1 multi-layer perception layer.
As a further aspect of the present invention, in step3.1, the hyperspectral local information extraction block includes 2 two-dimensional convolution blocks and 3 three-dimensional convolution blocks, and the convolution block structure is fixed as: convolution layer → normalization layer → activation function layer; the structure of the hyperspectral local information extraction block is: first two-dimensional convolution block → first three-dimensional convolution block → second three-dimensional convolution block → third three-dimensional convolution block → second two-dimensional convolution block;
the convolution kernel sizes of the two-dimensional convolution layers are set to 1×1, the convolution kernel sizes of the first and third three-dimensional convolution layers are set to 1×1×3, the convolution kernel size of the second three-dimensional convolution layer is set to 1×1×7, and the activation function of each activation layer is set to the ReLU activation function.
As a further aspect of the present invention, in the step3.2, the main attention path is composed of 4 hyperspectral channel attention modules and 4 spatial-spectral hybrid self-attention modules, and the structure is fixed as follows: first hyperspectral channel attention module→first spatial-spectral mixed self-attention module→second hyperspectral channel attention module→second spatial-spectral mixed self-attention module→third hyperspectral channel attention module→third spatial-spectral mixed self-attention module→fourth hyperspectral channel attention module→fourth spatial-spectral mixed self-attention module;
the hyperspectral channel attention module consists of one compression-excitation (squeeze-and-excitation) layer and one two-dimensional convolution layer, with the structure fixed as: two-dimensional convolution layer → compression-excitation layer; the convolution kernel size of the two-dimensional convolution layer is set to 1×1;
the spatial-spectral hybrid self-attention module consists of 1 spatial self-attention block and 1 spectral self-attention block, and the structure is fixed as follows: spectral self-attention block→spatial self-attention block; wherein:
the spatial self-attention block is formed by connecting 2 linear normalization layers, 1 spatial self-attention layer and 1 multi-layer perception layer in series, with the specific structure: first linear normalization layer → spatial self-attention layer → second linear normalization layer → multi-layer perception layer; the input feature map of the spatial self-attention block is added element-wise, through a skip connection, to the output feature map of the spatial self-attention layer, and that sum is added element-wise, through a skip connection, to the output feature map of the multi-layer perception layer;
the spectral self-attention block consists of 2 linear normalization layers, 1 spectral self-attention layer and 1 multi-layer perception layer, connected in the same way as the spatial self-attention block, with the specific structure: first linear normalization layer → spectral self-attention layer → second linear normalization layer → multi-layer perception layer.
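To make the spatial/spectral distinction concrete, the following is a minimal pure-Python sketch (illustrative, not the patented implementation): one and the same scaled dot-product self-attention, applied over pixels as tokens, acts as spatial self-attention; applied over spectral bands as tokens (i.e. on the transposed feature sequence), it acts as spectral self-attention. The identity Q/K/V projections are a simplifying assumption; the actual layers would use learned projections.

```python
import math

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of feature vectors X
    (a list of tokens, each a list of d features). Identity projections are
    used for Q, K and V to keep the sketch minimal."""
    d = len(X[0])
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in X]
              for qi in X]
    A = [softmax(row) for row in scores]  # each row of weights sums to 1
    return [[sum(a * v[j] for a, v in zip(Ai, X)) for j in range(d)] for Ai in A]

def transpose(X):
    return [list(col) for col in zip(*X)]

def spatial_attention(X):
    # pixels are the tokens: attend across spatial positions
    return self_attention(X)

def spectral_attention(X):
    # bands are the tokens: attend across spectral channels of the transposed map
    return transpose(self_attention(transpose(X)))
```

Stacking the two, each behind its own normalization and multi-layer perception layers as described above, yields the hybrid module's long-range dependencies along both dimensions.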
As a further aspect of the present invention, in Step 4:
updating the model parameters with a stochastic gradient descent algorithm, and computing the loss value with a cross-entropy function, expressed as:

L = -(1/N) · Σ_{n=1}^{N} Σ_{c=1}^{C} w_c · y_{n,c} · log(x_{n,c})

where x represents the input, y the label value, L the total loss, w_c the class weight of class c, C the total number of classes, N the batch size, x_{n,c} the predicted probability that observation sample x_n belongs to class c, and y_{n,c} the label vector element, which takes 1 if the true class of sample x_n equals c and 0 otherwise.
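As a concrete reading of the weighted cross-entropy above, the sketch below computes the same quantity directly from predicted probabilities. In practice a framework's built-in weighted cross-entropy routine would be used; the function name here is illustrative.

```python
import math

def weighted_cross_entropy(probs, labels, class_weights):
    """L = -(1/N) * sum_n sum_c w_c * y_{n,c} * log(x_{n,c}).
    probs[n][c] is the predicted probability for sample n and class c;
    labels[n] is the true class index, so y_{n,c} is 1 only at that index."""
    N = len(probs)
    total = 0.0
    for n, p in enumerate(probs):
        c = labels[n]
        total += class_weights[c] * math.log(p[c])
    return -total / N
```

Raising w_c for a rare class increases that class's contribution to the loss, which is the point of the per-class weights in the formula.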
The hyperspectral image classification network based on the spatial-spectral hybrid self-attention mechanism consists of an attention main branch and two hyperspectral local information extraction blocks, the latter connected in parallel with the main branch. A hyperspectral image block is input simultaneously into the attention main branch and the first hyperspectral local information extraction block. In the main branch, the image block passes in turn through the first hyperspectral channel attention module, the first spatial-spectral mixed self-attention module, the second hyperspectral channel attention module and the second spatial-spectral mixed self-attention module; the output feature map of the first hyperspectral local information extraction block is then added element-wise to the output feature map of the second spatial-spectral mixed self-attention module. The resulting feature map is input into the third hyperspectral channel attention module and the second hyperspectral local information extraction block. In the main branch, the feature map passes in turn through the third hyperspectral channel attention module, the third spatial-spectral mixed self-attention module, the fourth hyperspectral channel attention module and the fourth spatial-spectral mixed self-attention module. Finally, the output feature map of the second hyperspectral local information extraction block is added element-wise to the output feature map of the fourth spatial-spectral mixed self-attention module, and the resulting hyperspectral image block feature map is used for classification.
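The dataflow just described — two stages of (channel attention → mixed self-attention) × 2 on the main branch, each summed element-wise with a parallel local information extraction block — can be sketched as a composition of callables. This is a topology sketch only; the exact input wiring of the second local block is an assumption based on the text, and real modules would act on feature maps rather than flat vectors.

```python
def classification_network(x, channel_attn, mixed_attn, local_blocks, classify):
    """Topology sketch: channel_attn and mixed_attn are lists of four modules
    each, local_blocks a list of two parallel local-information blocks, and
    classify the final classification head. All modules map vector -> vector."""
    def add(a, b):  # element-wise skip addition of two feature vectors
        return [u + v for u, v in zip(a, b)]

    # Stage 1: modules 1-2 of each type on the main branch, plus local block 1.
    h = mixed_attn[1](channel_attn[1](mixed_attn[0](channel_attn[0](x))))
    h = add(h, local_blocks[0](x))
    # Stage 2: modules 3-4, plus local block 2 fed with the stage-1 sum.
    out = mixed_attn[3](channel_attn[3](mixed_attn[2](channel_attn[2](h))))
    out = add(out, local_blocks[1](h))
    return classify(out)
```

With identity modules, each stage simply doubles the input, which makes the two element-wise skip additions easy to verify.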
The beneficial effects of the invention are as follows:
1. The proposed hyperspectral image classification network based on the spatial-spectral hybrid self-attention mechanism can fully exploit the respective advantages of self-attention and convolutional neural networks, effectively capturing long- and short-range information in the hyperspectral image and fully fusing it.
2. To exploit the inherent three-dimensional structure of the hyperspectral image, a brand-new spatial-spectral hybrid self-attention module is designed; according to the characteristics of the hyperspectral image, the self-attention mechanism can be applied in both the spectral and spatial dimensions, so that long-range dependencies along the two dimensions of the hyperspectral image can be extracted separately. The spatial-spectral mixed self-attention module can extract spatial local information while taking into account the prior information of pixel spatial positions in the hyperspectral image, overcoming the shortcoming that self-attention classification methods can only establish dependencies uniformly over all pixels.
3. The conceptually simple yet powerful hyperspectral local information extraction block proposed by the invention can extract local spectral information while preserving, to a certain extent, the spatial information of the original image and transmitting the spatial features from shallow to deep layers, further improving the classification effect.
Drawings
FIG. 1 is a flow chart of the present method;
FIG. 2 is a diagram of a hyperspectral channel attention module in the present method;
FIG. 3 is a schematic representation of a spatial-spectral hybrid self-attention module in the present method;
FIG. 4 is a graph of the classification results on the Indian Pines dataset obtained with the present method and two existing state-of-the-art methods;
FIG. 5 is a graph of the classification results on the Salinas Valley dataset obtained with the present method and two existing state-of-the-art methods.
Detailed Description
Embodiments and effects of the present invention are further described below with reference to the accompanying drawings.
Example 1, referring to fig. 1, a method for classifying hyperspectral images based on a spatial-spectral hybrid self-attention mechanism, the method comprises the following specific steps:
step1, preparing several publicly available hyperspectral image datasets for network training;
step2, preprocessing the hyperspectral image: extracting a hyperspectral image block centered on each pixel, and dividing the hyperspectral image blocks into mutually disjoint hyperspectral training, verification and test sample sets;
step3, constructing a hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism, wherein the whole network consists of an attention main branch and two hyperspectral local information extraction blocks; the attention main branch comprises four hyperspectral channel attention modules and four space-spectrum mixed self-attention modules; finally, connecting the hyperspectral local information extraction block to a main branch;
step4, training a hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism by using a hyperspectral training sample set, verifying the trained network by using a hyperspectral verification sample set after each batch of training is completed, and checking the state and convergence condition of the current method;
step5, inputting the hyperspectral test sample set into a trained hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism to obtain class labels of each pixel in the test sample, and completing hyperspectral image classification.
Further, the Step2 specifically comprises the following steps:
step2.1, a hyperspectral image has three-dimensional properties, and its data can be expressed as S ∈ R^(H×W×C), where H and W represent the height and width of the hyperspectral image in space and C represents the number of hyperspectral image channels. Pixels of size n with pixel value 0 are filled in around the peripheral edges of the original hyperspectral image, and hyperspectral image blocks are extracted from the padded image; their data can be expressed as X ∈ R^(P×P×C), where P represents the size of the extracted hyperspectral image block, i.e. a hyperspectral image block of spatial size (2n+1)×(2n+1) with C channels is selected centered on each original pixel point. This example takes n = 5, though it is not limited thereto.
Step2.2, classifying the hyperspectral image block into a class set to which the image block belongs according to the class of the central pixel of the hyperspectral image block;
step2.3, from each category, selecting hyperspectral image blocks as the training set in a proportion that depends on the dataset size (0.01 for a large dataset, 0.03 for a small dataset), then selecting hyperspectral image blocks in the same proportion as the verification set, and finally taking the remaining hyperspectral image blocks in each category set as the test set.
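A minimal sketch of Step2 under the stated settings (zero padding of width n, one (2n+1)×(2n+1) block per pixel, small disjoint proportional splits). The function names are illustrative, and real code would operate on arrays rather than nested lists.

```python
import random

def extract_patches(image, n):
    """Zero-pad a hyperspectral cube (H x W x C nested lists) by n pixels on
    each spatial edge, then cut one (2n+1) x (2n+1) x C block per pixel, with
    the original pixel at the block center."""
    H, W, C = len(image), len(image[0]), len(image[0][0])
    zero = [0.0] * C
    size = 2 * n + 1
    padded = [[zero] * (W + 2 * n) for _ in range(n)]
    padded += [[zero] * n + row + [zero] * n for row in image]
    padded += [[zero] * (W + 2 * n) for _ in range(n)]
    patches = []
    for i in range(H):
        for j in range(W):
            patches.append([r[j:j + size] for r in padded[i:i + size]])
    return patches

def split_samples(samples, train_ratio, val_ratio, seed=0):
    """Disjoint train/verification split at the given proportions; all
    remaining samples become the test set."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_tr = max(1, int(len(samples) * train_ratio))
    n_va = max(1, int(len(samples) * val_ratio))
    train = [samples[i] for i in idx[:n_tr]]
    val = [samples[i] for i in idx[n_tr:n_tr + n_va]]
    test = [samples[i] for i in idx[n_tr + n_va:]]
    return train, val, test
```

With the embodiment's n = 5, each pixel yields an 11×11×C block; with the small-dataset ratio 0.03, a 100-block category contributes 3 training, 3 verification and 94 test blocks.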
Further, the Step3 specifically comprises the following steps:
step3.1, constructing a hyperspectral local information extraction block;
the hyperspectral local information extraction block is formed by connecting two-dimensional convolution layers, three-dimensional convolution layers, normalization layers and ReLU activation function layers in series, wherein:
the hyperspectral local information extraction block comprises 2 two-dimensional convolution blocks and 3 three-dimensional convolution blocks, and the convolution block structure is fixed as: convolution layer → normalization layer → activation function layer; the structure of the hyperspectral local information extraction block is: first two-dimensional convolution block → first three-dimensional convolution block → second three-dimensional convolution block → third three-dimensional convolution block → second two-dimensional convolution block;
the convolution kernel sizes of the two-dimensional convolution layers are set to 1×1, the convolution kernel sizes of the first and third three-dimensional convolution layers are set to 1×1×3, the convolution kernel size of the second three-dimensional convolution layer is set to 1×1×7, and the activation function of each activation layer is set to the ReLU activation function.
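Because all the kernels above are 1×1 spatially (1×1, 1×1×3, 1×1×7, 1×1×3), each layer only mixes values along the spectral axis at a fixed pixel, which is how the block extracts local spectral information while leaving spatial information intact. A sketch of that per-pixel spectral operation as a same-padded 1-D convolution (illustrative, not the patented code):

```python
def conv1d_spectral(spectrum, kernel, bias=0.0):
    """'Same'-padded 1-D convolution along the spectral axis: the operation a
    1x1xk three-dimensional convolution applies independently at every pixel.
    Zero padding of k // 2 keeps the number of bands unchanged."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(spectrum) + [0.0] * pad
    return [bias + sum(kernel[i] * padded[j + i] for i in range(k))
            for j in range(len(spectrum))]

def relu(xs):
    """The ReLU activation applied after each normalized convolution."""
    return [max(0.0, v) for v in xs]
```

A 1×1×3 kernel mixes each band with its two neighbors; the 1×1×7 kernel in the middle widens that spectral receptive field.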
Step3.2, constructing a main attention branch;
the attention main branch is composed of 4 hyperspectral channel attention modules and 4 spatial-spectral mixed self-attention modules, with the structure fixed as: first hyperspectral channel attention module → first spatial-spectral mixed self-attention module → second hyperspectral channel attention module → second spatial-spectral mixed self-attention module → third hyperspectral channel attention module → third spatial-spectral mixed self-attention module → fourth hyperspectral channel attention module → fourth spatial-spectral mixed self-attention module.
Further, the step3.2 specifically comprises the following steps:
step3.2.1, build hyperspectral channel attention module
Referring to fig. 2, the hyperspectral channel attention module is composed of 1 two-dimensional convolution layer and 1 compression excitation layer, and the structure is fixed as follows: two-dimensional convolution layer→compression excitation layer; wherein the convolution kernel size of the two-dimensional convolution layer is set to 1×1; the compression of the feature map in the compression excitation layer is completed through global average pooling, expressed as follows:

Z_c = F_sq(U_c) = (1/(W×H)) Σ_{i=1}^{W} Σ_{j=1}^{H} U_c(i, j)

wherein U_c represents each channel of the input feature map, Z_c represents the global average value of each channel, and W and H represent the length and width of the input feature map;
the excitation profile portion in the compressed excitation layer is represented as follows:
S = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z))
where S represents the weight vector, δ represents the Relu activation function, σ represents the sigmoid activation function, W_1 ∈ R^{(C/r)×C} and W_2 ∈ R^{C×(C/r)} represent the weight matrices of the two fully connected layers, and r determines the number of hidden-layer nodes in the middle layer; r is 16 in this embodiment.
The final output X is the channel-wise product of S and U, and its data can be represented as X ∈ R^{P×P×D}, where P is the size of the output feature map and D is the number of convolution kernels of the two-dimensional convolution layer.
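The squeeze and excitation computation of the compression excitation layer can be sketched in numpy as follows. The function name `channel_attention` is hypothetical, the feature map is taken as (H, W, C), and the weight matrices are supplied by the caller; this is a sketch of the standard squeeze-and-excitation computation, not the patented network code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(U, W1, W2):
    """Squeeze-and-excitation over a feature map U of shape (H, W, C).

    W1: (C//r, C) and W2: (C, C//r) are the two fully connected
    weight matrices; r is the reduction ratio (16 in the embodiment).
    """
    z = U.mean(axis=(0, 1))                  # squeeze: global average pool -> (C,)
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0))  # excitation: FC -> ReLU -> FC -> sigmoid
    return U * s                             # reweight each channel by its scalar weight
```

With all-zero weights the sigmoid outputs 0.5, so every channel is simply halved; with learned weights each channel is scaled by its data-dependent importance.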
Step3.2.2, build spatial-spectral hybrid self-attention module
Referring to fig. 3, the spatial-spectral hybrid self-attention module consists of 1 spatial self-attention block and 1 spectral self-attention block, and the structure is fixed as follows: spectral self-attention block→spatial self-attention block; assume that the feature map sequence obtained from the hyperspectral channel attention module is X ∈ R^{P×C}, wherein P represents the total number of pixels in the feature map and C represents the dimension of the input feature map, wherein:
the spatial self-attention block consists of 2 linear normalization layers, 1 spatial self-attention layer and 1 multi-layer perception layer; the spatial self-attention layer uniformly divides the input feature map X in a non-overlapping manner; assuming the number of divided blocks is N_d and each block contains P_d pixels, the total number of pixels in the input feature map is P = N_d × P_d. The divided feature maps are multiplied by 3 different learnable weight matrices W_q, W_k, W_v to obtain three different vectors Q, K, V, respectively, and Q, K, V are then divided into h parts along the spectral dimension, h being the number of heads of the multi-head attention:
Q = {Q_1, Q_2, …, Q_i, …, Q_h}
K = {K_1, K_2, …, K_i, …, K_h}
V = {V_1, V_2, …, V_i, …, V_h}
Each Q_i is used to perform a dot-product operation with the transpose of the corresponding K_i to obtain the degree of similarity between Q_i and K_i. Since the value of Q_i K_i^T increases as the initial dimension d increases, it needs to be divided by √d_k to control the gradient vanishing problem. The result is then normalized by the softmax function to obtain an attention matrix, which is finally multiplied with V_i to obtain the single-head output:

X_i = A(Q_i, K_i, V_i) = softmax(Q_i K_i^T / √d_k) V_i
The output results of the multiple heads are spliced to obtain the output of a single divided feature block:

A(Q, K, V) = Concat(X_1, X_2, …, X_i, …, X_h) W_o
where W_o is the output projection matrix; the global spatial self-attention layer applies the above multi-head attention within each of the N_d divided blocks in parallel and merges the results along the pixel dimension.
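The scaled dot-product multi-head attention described above can be sketched in numpy on a single divided feature block of shape (P, C). The name `multi_head_attention` and the caller-supplied weight matrices are assumptions of this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, h):
    """Multi-head self-attention on X of shape (P, C); h must divide C."""
    P, C = X.shape
    d_k = C // h
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for i in range(h):  # split Q, K, V along the spectral (channel) dimension
        Qi = Q[:, i * d_k:(i + 1) * d_k]
        Ki = K[:, i * d_k:(i + 1) * d_k]
        Vi = V[:, i * d_k:(i + 1) * d_k]
        A = softmax(Qi @ Ki.T / np.sqrt(d_k))  # scaled dot product, row-normalized
        heads.append(A @ Vi)                   # single-head output X_i
    return np.concatenate(heads, axis=1) @ Wo  # concat heads, project with W_o
```

Each softmax row sums to 1, so every output pixel is a convex combination of the value vectors of all pixels in the block.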
the multi-layer perception layer comprises a linear projection implemented by 2 fully connected layers and 1 GELU activation function, and can be expressed as:
MLP(X)=FC 2 (GELU(FC 1 (X)))
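The formula MLP(X) = FC2(GELU(FC1(X))) can be sketched in numpy as follows; the tanh approximation of GELU and the function names are assumptions of this sketch.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(X, W1, b1, W2, b2):
    """MLP(X) = FC2(GELU(FC1(X))) for X of shape (P, C)."""
    return gelu(X @ W1 + b1) @ W2 + b2
```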
the spatial self-attention block is connected in series, and the specific structure is as follows: first linear normalization layer→spatial self-attention layer→second linear normalization layer→multi-layer perception layer; the input feature map of the spatial self-attention block is added element-wise, through a skip connection, to the output feature map of the spatial self-attention layer, and this sum is added element-wise, through a skip connection, to the output feature map of the multi-layer perception layer; the mathematical formula of the process is expressed as follows:

X̂ = A_spatial(LN_1(X)) + X
Output = MLP(LN_2(X̂)) + X̂

wherein X̂ and Output represent the output features of the spatial self-attention layer and the multi-layer perception layer, respectively.
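The normalization-attention-MLP wiring with its two skip connections can be sketched in numpy with placeholder callables for the attention and MLP sublayers; names here are illustrative, not from the patent.

```python
import numpy as np

def layer_norm(X, eps=1e-5):
    """Normalize each row (pixel token) to zero mean and unit variance."""
    mu = X.mean(axis=-1, keepdims=True)
    var = X.var(axis=-1, keepdims=True)
    return (X - mu) / np.sqrt(var + eps)

def attention_block(X, attn, mlp):
    """LN -> attention -> skip add, then LN -> MLP -> skip add."""
    X_hat = attn(layer_norm(X)) + X        # first skip connection
    return mlp(layer_norm(X_hat)) + X_hat  # second skip connection
```

If both sublayers output zero, the block reduces to the identity, which is exactly what the skip connections guarantee.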
The spectral self-attention block consists of 2 linear normalization layers, 1 spectral self-attention layer and 1 multi-layer perception layer; its connection mode is the same as that of the spatial self-attention block, and the specific structure is: first linear normalization layer→spectral self-attention layer→second linear normalization layer→multi-layer perception layer. The spectral self-attention block transposes the input feature map before sending it into the self-attention layer, the number of heads of the self-attention layer is set to 1, and the spectral self-attention layer can be expressed as:
A_spectral(Q, K, V) = A(Q_i, K_i, V_i)
wherein Q_i, K_i, V_i ∈ R^{C×P} are the Q, K, V obtained from the transposed feature map;
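The transpose trick of the spectral self-attention block can be shown in a few lines: transposing X from (P, C) to (C, P) makes each spectral channel a token, so the same attention machinery models inter-band relations. The callable `attn` stands in for any self-attention layer; the sketch and its names are illustrative.

```python
import numpy as np

def spectral_attention(X, attn):
    """Apply an attention callable along the spectral axis by transposing.

    X has shape (P, C). Transposing to (C, P) turns each channel into
    a token for attn; the result is transposed back to (P, C).
    """
    return attn(X.T).T
```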
the hyperspectral image classification network based on the space-spectrum hybrid self-attention mechanism consists of a main attention branch and two hyperspectral local information extraction blocks, wherein the hyperspectral local information extraction blocks are connected in parallel with the main attention branch. The hyperspectral image block is simultaneously input into the main attention branch and the first hyperspectral local information extraction block. In the main branch, the image block sequentially passes through the first hyperspectral channel attention module, the first space-spectrum mixed self-attention module, the second hyperspectral channel attention module and the second space-spectrum mixed self-attention module; at this point, the output feature map of the first hyperspectral local information extraction block and the output feature map of the second space-spectrum mixed self-attention module are added element-wise. The summed feature map is then simultaneously input into the third hyperspectral channel attention module and the second hyperspectral local information extraction block. In the main branch, the feature map sequentially passes through the third hyperspectral channel attention module, the third space-spectrum mixed self-attention module, the fourth hyperspectral channel attention module and the fourth space-spectrum mixed self-attention module. Finally, the output feature map of the second hyperspectral local information extraction block and the output feature map of the fourth space-spectrum mixed self-attention module are added element-wise, and the resulting hyperspectral image block feature map is used for classification.
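The data flow just described can be summarized with placeholder callables standing in for the eight attention modules and two local extraction blocks; `classify` and its interface are assumptions of this sketch, and the reading that the element-wise sum feeds both the third channel attention module and the second local block follows the description above.

```python
def classify(patch, channel_attn, mixed_attn, local_blocks):
    """Data flow of the network: 4 channel-attention + 4 mixed
    self-attention stages on the main branch, with two parallel local
    information extraction blocks merged by element-wise addition.

    channel_attn and mixed_attn: lists of 4 callables each;
    local_blocks: list of 2 callables. All are placeholders here.
    """
    x = patch
    for i in range(2):                # first two attention stages
        x = mixed_attn[i](channel_attn[i](x))
    x = x + local_blocks[0](patch)    # merge first local branch
    mid = x                           # sum feeds both paths below
    for i in range(2, 4):             # last two attention stages
        x = mixed_attn[i](channel_attn[i](x))
    x = x + local_blocks[1](mid)      # merge second local branch
    return x                          # feature map used for classification
```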
Further, in Step 4:
updating model parameters by using a stochastic gradient descent algorithm, and calculating the loss value by a cross entropy function, expressed as follows:

L = -(1/N) Σ_{n=1}^{N} Σ_{c=1}^{C} w_c y_{n,c} log(x_{n,c})
where x represents the input, y represents the label value, L represents the total loss, w_c is the class weight of class c, C is the total number of classes, N is the batch size, x_{n,c} represents the predicted probability that observation sample x_n belongs to class c, and y_{n,c} represents the label vector element, which takes 1 if the true class of sample x_n equals c and 0 otherwise.
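The weighted cross-entropy loss can be sketched in numpy as follows; `weighted_cross_entropy` is a hypothetical name, and the averaging over N (rather than over the sum of sample weights, as some frameworks do) follows the formula as written above.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, w):
    """L = -(1/N) * sum_n sum_c w_c * y_{n,c} * log(x_{n,c}).

    probs: (N, C) predicted class probabilities;
    labels: (N,) true class indices; w: (C,) per-class weights.
    """
    N, C = probs.shape
    y = np.eye(C)[labels]                 # one-hot label vectors y_{n,c}
    return -(w * y * np.log(probs)).sum() / N
```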
Further, in this embodiment, the learning rate of the network is set to 0.008, the batch size is 64, and the network is trained for 100 iteration rounds to obtain the bias parameters and the weight file of the final network.
The hardware platform of the simulation experiment of this embodiment is: a CPU of model 12th Gen Intel(R) Core(TM) i9-12900KF and an NVIDIA GeForce RTX 3090 GPU with 24 GB memory; with the Ubuntu 20.04.5 LTS operating system, the configured virtual environment includes Python 3.9.16, PyTorch 1.13.1, CUDA 11.6, etc.
Further, the Step5 specifically comprises the following steps:
sending the hyperspectral test sample set into the hyperspectral classification network after training, and calculating three general evaluation indexes: the overall classification accuracy (OA), the average accuracy (AA) and the Kappa coefficient (K); the larger these three indexes, the better the classification effect.
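The three evaluation indexes can be computed from a confusion matrix as sketched below; `evaluation_indexes` is a hypothetical name, and the definitions (OA as trace over total, AA as the mean of per-class accuracies, Kappa as chance-corrected agreement) are the standard ones for these metrics.

```python
import numpy as np

def evaluation_indexes(y_true, y_pred, num_classes):
    """Overall accuracy (OA), average accuracy (AA) and Kappa coefficient."""
    M = np.zeros((num_classes, num_classes))
    for t, p in zip(y_true, y_pred):
        M[t, p] += 1                               # confusion matrix
    n = M.sum()
    oa = np.trace(M) / n                           # overall accuracy
    aa = np.mean(np.diag(M) / M.sum(axis=1))       # mean of per-class accuracies
    pe = (M.sum(axis=0) @ M.sum(axis=1)) / n**2    # expected chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

A perfect prediction yields OA = AA = Kappa = 1; Kappa discounts agreement that could arise from class-frequency chance alone.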
To evaluate the effectiveness of the present method, two existing advanced methods, SSFTT and GAHT, were used to classify the ground object targets in two common hyperspectral datasets, Indian Pines and Salinas Valley.
The SSFTT method refers to: the hyperspectral classification method proposed by Sun et al. in "Spectral-spatial feature tokenization transformer for hyperspectral image classification", abbreviated as SSFTT;
the GAHT method refers to: the hyperspectral classification method proposed by Mei S et al. in "Hyperspectral image classification using group-aware hierarchical transformer", abbreviated as GAHT;
table 1 comparison of classification results for three networks under two data sets
Experiments on two mainstream data sets show that, compared with advanced hyperspectral image classification methods, the present method is more powerful in performance and can more accurately predict the pixel sample classes of hyperspectral images.
FIG. 4(a) is a graph of classification results on the Indian Pines dataset using the SSFTT method;
FIG. 4(b) is a graph of classification results on the Indian Pines dataset using the GAHT method;
FIG. 4(c) is a graph of classification results on the Indian Pines dataset using the present method;
FIG. 5(a) is a graph of classification results on the Salinas Valley dataset using the SSFTT method;
FIG. 5(b) is a graph of classification results on the Salinas Valley dataset using the GAHT method;
FIG. 5(c) is a graph of classification results on the Salinas Valley dataset using the present method;
as can be seen from the figures, the present method has the fewest misclassified pixels and obtains a good classification effect in noisy classes and boundary areas; especially on the Indian Pines dataset, whose pixel distribution is relatively uneven, it still shows strong classification capability compared with the other methods.
While the present invention has been described in detail with reference to the drawings, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.
Claims (7)
1. A hyperspectral image classification method based on a space-spectrum hybrid self-attention mechanism, characterized in that the method comprises the following steps:
step1, preparing a plurality of publicly available hyperspectral image data sets for network training;
step2, preprocessing a hyperspectral image, extracting a hyperspectral image block by taking each pixel as a central point, and dividing the hyperspectral image blocks into a hyperspectral training sample set, a hyperspectral verification sample set and a hyperspectral test sample set which do not overlap;
step3, constructing a hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism, wherein the whole network consists of an attention main branch and two hyperspectral local information extraction blocks; the attention main branch comprises four hyperspectral channel attention modules and four space-spectrum mixed self-attention modules; finally, connecting the hyperspectral local information extraction block to a main branch;
step4, training a hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism by using a hyperspectral training sample set, verifying the trained network by using a hyperspectral verification sample set after each batch of training is completed, and checking the state and convergence condition of the current method;
step5, inputting the hyperspectral test sample set into a trained hyperspectral image classification network based on a space-spectrum hybrid self-attention mechanism to obtain class labels of each pixel in the test sample, and completing hyperspectral image classification.
2. The method for classifying hyperspectral images based on a spatial-spectral mixed self-attention mechanism as recited in claim 1, wherein: the specific steps of Step2 are as follows:
step2.1, the hyperspectral image has three-dimensional properties and its data are expressed as S ∈ R^{H×W×C}; padding the peripheral edges of the original hyperspectral image with pixels of width n and pixel value 0; and extracting hyperspectral image blocks from the padded image;
step2.2, classifying the hyperspectral image block into a class set to which the image block belongs according to the class of the central pixel of the hyperspectral image block;
step2.3, selecting hyperspectral image blocks from each category according to different proportions for data sets of different sizes to serve as a training set, then selecting hyperspectral image blocks with the same proportions as a verification set, and finally taking the remaining hyperspectral image blocks in each category set as a test set.
3. The method for classifying hyperspectral images based on a spatial-spectral mixed self-attention mechanism as recited in claim 1, wherein: the specific steps of Step3 are as follows:
step3.1, constructing a hyperspectral local information extraction block formed by connecting a two-dimensional convolution layer, a three-dimensional convolution layer, a normalization layer and a Relu activation function layer in series;
step3.2, constructing a main attention branch consisting of 4 hyperspectral channel attention modules and 4 spatial-spectral hybrid self-attention modules.
4. A method of classifying hyperspectral images based on a spatio-spectral mixed self-attention mechanism as claimed in claim 3, characterized in that: the specific steps of the step3.2 are as follows:
step3.2.1, constructing a hyperspectral channel attention module consisting of 1 two-dimensional convolution layer and 1 compression excitation layer;
step3.2.2, building a spatial-spectral hybrid self-attention module consisting of 1 spatial self-attention block and 1 spectral self-attention block, wherein the spatial self-attention block consists of 2 linear normalization layers, 1 spatial self-attention layer and 1 multi-layer perception layer, and the spectral self-attention block consists of 2 linear normalization layers, 1 spectral self-attention layer and 1 multi-layer perception layer.
5. A method of classifying hyperspectral images based on a spatio-spectral mixed self-attention mechanism as claimed in claim 3, characterized in that: in step3.1, the hyperspectral local information extraction block includes 2 two-dimensional convolution blocks and 3 three-dimensional convolution blocks, and the convolution block structure is fixed as follows: convolution layer→normalization layer→activation function layer; the structure of the hyperspectral local information extraction block is as follows: first two-dimensional convolution block→first three-dimensional convolution block→second three-dimensional convolution block→third three-dimensional convolution block→second two-dimensional convolution block;
the convolution kernel sizes of the two-dimensional convolution layers are set to 1×1, the convolution kernel sizes of the first three-dimensional convolution layer and the third three-dimensional convolution layer are set to 1×1×3, the convolution kernel size of the second three-dimensional convolution layer is set to 1×1×7, and the activation function of each activation layer is set to a Relu activation function.
6. A method of classifying hyperspectral images based on a spatio-spectral mixed self-attention mechanism as claimed in claim 3, characterized in that: in step3.2, the main attention branch is composed of 4 hyperspectral channel attention modules and 4 space-spectrum mixed self-attention modules, and the structure is fixed as follows: first hyperspectral channel attention module→first spatial-spectral mixed self-attention module→second hyperspectral channel attention module→second spatial-spectral mixed self-attention module→third hyperspectral channel attention module→third spatial-spectral mixed self-attention module→fourth hyperspectral channel attention module→fourth spatial-spectral mixed self-attention module;
the hyperspectral channel attention module consists of 1 two-dimensional convolution layer and 1 compression excitation layer, and the structure is fixed as follows: two-dimensional convolution layer→compression excitation layer; wherein the convolution kernel size of the two-dimensional convolution layer is set to 1×1;
the spatial-spectral hybrid self-attention module consists of 1 spatial self-attention block and 1 spectral self-attention block, and the structure is fixed as follows: spectral self-attention block→spatial self-attention block; wherein:
the spatial self-attention block is formed by connecting 2 linear normalization layers, 1 spatial self-attention layer and 1 multi-layer perception layer in series, with the specific structure: first linear normalization layer→spatial self-attention layer→second linear normalization layer→multi-layer perception layer; the input feature map of the spatial self-attention block is added element-wise, through a skip connection, to the output feature map of the spatial self-attention layer, and this sum is added element-wise, through a skip connection, to the output feature map of the multi-layer perception layer;
the spectral self-attention block consists of 2 linear normalization layers, 1 spectral self-attention layer and 1 multi-layer perception layer; its connection mode is the same as that of the spatial self-attention block, and the specific structure is: first linear normalization layer→spectral self-attention layer→second linear normalization layer→multi-layer perception layer.
7. The method for classifying hyperspectral images based on a spatial-spectral mixed self-attention mechanism as recited in claim 1, wherein: in Step 4:
updating model parameters by using a stochastic gradient descent algorithm, and calculating the loss value by a cross entropy function, expressed as follows:

L = -(1/N) Σ_{n=1}^{N} Σ_{c=1}^{C} w_c y_{n,c} log(x_{n,c})
where x represents the input, y represents the label value, L represents the total loss, w_c is the class weight of class c, C is the total number of classes, N is the batch size, x_{n,c} represents the predicted probability that observation sample x_n belongs to class c, and y_{n,c} represents the label vector element.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310902900.4A CN116977723A (en) | 2023-07-21 | 2023-07-21 | Hyperspectral image classification method based on space-spectrum hybrid self-attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116977723A true CN116977723A (en) | 2023-10-31 |
Family
ID=88470656
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310902900.4A Pending CN116977723A (en) | 2023-07-21 | 2023-07-21 | Hyperspectral image classification method based on space-spectrum hybrid self-attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116977723A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118141356A (en) * | 2024-04-30 | 2024-06-07 | 天津工业大学 | Depth ADMM unfolding EIT imaging method based on model driving |
CN118230023A (en) * | 2024-02-19 | 2024-06-21 | 南京信息工程大学 | Hyperspectral image classification method and system based on secondary training |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Juefei-Xu et al. | Local binary convolutional neural networks | |
CN110135267B (en) | Large-scene SAR image fine target detection method | |
CN110348399B (en) | Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network | |
CN108491849B (en) | Hyperspectral image classification method based on three-dimensional dense connection convolution neural network | |
CN110084159A (en) | Hyperspectral image classification method based on the multistage empty spectrum information CNN of joint | |
CN114821164B (en) | Hyperspectral image classification method based on twin network | |
CN113095409B (en) | Hyperspectral image classification method based on attention mechanism and weight sharing | |
CN111695467A (en) | Spatial spectrum full convolution hyperspectral image classification method based on superpixel sample expansion | |
CN113486851B (en) | Hyperspectral image classification method based on double-branch spectrum multi-scale attention network | |
CN107145836B (en) | Hyperspectral image classification method based on stacked boundary identification self-encoder | |
Yang et al. | Dual-channel densenet for hyperspectral image classification | |
CN109145992A (en) | Cooperation generates confrontation network and sky composes united hyperspectral image classification method | |
CN110717553A (en) | Traffic contraband identification method based on self-attenuation weight and multiple local constraints | |
CN111814685B (en) | Hyperspectral image classification method based on double-branch convolution self-encoder | |
CN116977723A (en) | Hyperspectral image classification method based on space-spectrum hybrid self-attention mechanism | |
CN109344698A (en) | EO-1 hyperion band selection method based on separable convolution sum hard threshold function | |
CN113705580B (en) | Hyperspectral image classification method based on deep migration learning | |
CN108229551B (en) | Hyperspectral remote sensing image classification method based on compact dictionary sparse representation | |
CN109190511B (en) | Hyperspectral classification method based on local and structural constraint low-rank representation | |
CN108460391A (en) | Based on the unsupervised feature extracting method of high spectrum image for generating confrontation network | |
CN111814607A (en) | Deep learning model suitable for small sample hyperspectral image classification | |
CN108734199A (en) | High spectrum image robust classification method based on segmentation depth characteristic and low-rank representation | |
CN110598594A (en) | Hyperspectral classification method based on space spectrum self-adaptive bidirectional long-time and short-time memory model | |
CN108268890A (en) | A kind of hyperspectral image classification method | |
CN107451562A (en) | A kind of band selection method based on Chaotic Binary gravitation search algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |