CN111563520B

CN111563520B - Hyperspectral image classification method based on space-spectrum combined attention mechanism

Info

Publication number: CN111563520B
Application number: CN202010044989.1A
Authority: CN
Inventors: 尹继豪; 李磊; 刘雨晨; 黄浦; 王麒雄
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2020-01-16
Filing date: 2020-01-16
Publication date: 2023-01-13
Anticipated expiration: 2040-01-16
Also published as: CN111563520A

Abstract

The algorithm of the present invention is dedicated to solving the problem of insufficient performance of the traditional convolutional neural network for fine-grained image classification tasks represented by hyperspectral images, and proposes a hyperspectral image classification algorithm based on the space-spectral joint attention mechanism. The network can effectively capture the global features of the image, and adaptively focus on the spatial local features with large differences between similar images; at the same time, it can evaluate the contribution of different bands to the task, so that the neural network can pay more attention to the spectral bands with large contributions and extract the image spectrum. Local difference features improve the classification accuracy of hyperspectral images, and have a wide range of applications in the field of fine-grained image classification represented by hyperspectral images.

Description

A Hyperspectral Image Classification Method Based on Spatial-Spectral Joint Attention Mechanism

技术领域technical field

本发明是一种基于空间-光谱联合注意力机制的高光谱图像分类方法，通过将该方法提供的空间-光谱联合注意力机制模块嵌入卷积神经网络中，可有效地捕捉图像全局特征，并自适应地关注图像之间差异性较大的局部区域，大幅度提升卷积神经网络的特征提取和表达能力，实现对高光谱图像分类。该方法可用于遥感图像处理领域。The present invention is a hyperspectral image classification method based on the space-spectrum joint attention mechanism. By embedding the space-spectrum joint attention mechanism module provided by the method into the convolutional neural network, the global features of the image can be effectively captured, and Adaptively focus on local areas with large differences between images, greatly improve the feature extraction and expression capabilities of convolutional neural networks, and realize the classification of hyperspectral images. This method can be used in the field of remote sensing image processing.

背景技术Background technique

高光谱遥感技术是二十世纪末人类在机载观测系统和星载观测系统领域最重要的技术突破之一，高光谱图像克服了传统单波段、多光谱遥感在波段范围、波段数量以及精细地面目标观测、识别等方面局限性，在遥感对地观测领域有其独特优势。高光谱图像分类在实际中是一项重要且有意义的任务，具体来说，高光谱图像分类是根据不同光谱特征或空间特征对给定图像进行识别并标记图像中每个像素点种类的任务。Hyperspectral remote sensing technology is one of the most important technological breakthroughs in the field of airborne observation systems and spaceborne observation systems at the end of the 20th century. Hyperspectral images overcome traditional single-band and multi-spectral remote sensing in terms of band range, number of bands, and fine ground targets. It has its unique advantages in the field of remote sensing and earth observation due to limitations in observation and identification. Hyperspectral image classification is an important and meaningful task in practice. Specifically, hyperspectral image classification is the task of identifying a given image and labeling each pixel in the image according to different spectral features or spatial features. .

与普通图像分类任务相比，高光谱图像由于光谱域的“维数灾难”和“同谱异物”特点，使得分类任务变得更加困难。在这种情况下，传统单纯依靠光谱信息的高光谱图像分类算法性能有限，基于联合空间-光谱信息的分类算法一直是近年来的研究热点。Compared with ordinary image classification tasks, hyperspectral images make classification tasks more difficult due to the "curse of dimensionality" and "same spectrum foreign objects" in the spectral domain. In this case, traditional hyperspectral image classification algorithms relying solely on spectral information have limited performance, and classification algorithms based on joint spatial-spectral information have been a research hotspot in recent years.

自2012年以来，以卷积神经网络(Convolution Neural Network,CNN) 为代表的深度学习技术异军突起在计算机视觉任务中取得了巨大成就。卷积神经网络非常适用于处理图像空间域信息，并在普通图像分类任务中取得了巨大成功，卷积神经网络最早于2016年用于高光谱图像分类任务。随后，各种用于高光谱图像分类任务的卷积神经网络算法层出不穷，但这些算法由于卷积网络“感受野”大小有限难以提取到图像全局特征。更糟糕的是，由于高光谱图像数据的特殊性，在分类之前需要将高光谱数据进行预处理，即以每个像素为中心分割成立方体(一般大小取27*27)，并以中间像素标签为每个立方体分类标签，使得相近而异类的像素立方体在空间特征非常相似，通常称这种整体空间特征冗余，局部特征有微小差异的图像为细粒度图像。而传统卷积神经网络处理这种具有空间冗余特征的细粒度图像能力很弱，严重制约着卷积神经网络在高光谱等细腻度图像分类任务中性能进一步提高。Since 2012, the deep learning technology represented by Convolution Neural Network (CNN) has sprung up and made great achievements in computer vision tasks. Convolutional neural networks are very suitable for processing image spatial domain information and have achieved great success in common image classification tasks. Convolutional neural networks were first used in hyperspectral image classification tasks in 2016. Subsequently, various convolutional neural network algorithms for hyperspectral image classification tasks emerged in an endless stream, but these algorithms are difficult to extract global image features due to the limited size of the convolutional network "receptive field". To make matters worse, due to the particularity of hyperspectral image data, it is necessary to preprocess the hyperspectral data before classification, that is, divide each pixel into cubes (the general size is 27*27), and use the middle pixel label Classify labels for each cube, so that similar and heterogeneous pixel cubes are very similar in spatial characteristics, and images with redundant overall spatial characteristics and small differences in local characteristics are usually called fine-grained images. However, the ability of traditional convolutional neural networks to process such fine-grained images with spatial redundancy features is very weak, which seriously restricts the further improvement of the performance of convolutional neural networks in fine-grained image classification tasks such as hyperspectral.

另外，不同于普通图像，高光谱图像有着非常丰富的光谱信息，大部分传统分类算法认为不同光谱波段对算法任务贡献一样，但实际上由于光照、大气等物理因素影响，导致有些波段趋于噪声化，对当前任务基本没有贡献，甚至造成干扰。In addition, unlike ordinary images, hyperspectral images have very rich spectral information. Most traditional classification algorithms believe that different spectral bands contribute equally to algorithm tasks, but in fact, due to physical factors such as illumination and atmosphere, some bands tend to be noisy. It basically does not contribute to the current task, and even causes interference.

基于此，设计一种能够有效捕捉图像全局特征，自适应地聚焦相似细腻度图像之间差异性较大空间局域特征；同时评估不同波段对任务贡献，使得神经网络更多地关注贡献大的光谱波段，提取图像光谱局部差异特征，提升高光谱图像分类精度，是一个非常值得研究的问题。Based on this, we design a method that can effectively capture the global features of the image, and adaptively focus on the spatial local features with large differences between images of similar fineness; at the same time, evaluate the contribution of different bands to the task, so that the neural network can pay more attention to the large contribution. Spectral bands, extracting image spectral local difference features, and improving the classification accuracy of hyperspectral images are very worthy of research.

发明内容Contents of the invention

本算法致力于解决传统卷积神经网络对以高光谱图像为代表的细腻度图像分类性能不足问题，我们提出一种基于空间-光谱联合注意力机制的高光谱图像分类方法，配合卷积神经网络可以有效地捕捉图像全局特征，自适应地聚焦相似图像之间差异性较大空间局域特征；同时评估不同波段对任务贡献，使得神经网络更多地关注贡献大的光谱波段，提取图像局部差异特征，提升高光谱图像分类精度，在高光谱等细腻度图像分类领域具有广泛的应用。This algorithm is dedicated to solving the problem of insufficient classification performance of traditional convolutional neural networks for fine-grained images represented by hyperspectral images. We propose a hyperspectral image classification method based on a spatial-spectral joint attention mechanism, combined with convolutional neural networks. It can effectively capture the global features of the image, and adaptively focus on the spatial local features with large differences between similar images; at the same time, it can evaluate the contribution of different bands to the task, so that the neural network can pay more attention to the spectral bands with large contributions, and extract the local differences in the image Features, improve the accuracy of hyperspectral image classification, and have a wide range of applications in the field of hyperspectral and other fine-grained image classification.

本发明算法提出了空间-光谱联合注意力机制模块，具备以下三个优势：The algorithm of the present invention proposes a spatial-spectral joint attention mechanism module, which has the following three advantages:

(1)算法可移植性强，可以随意地嵌入现有各种卷积神经网络中。(1) The algorithm is highly portable and can be embedded in various existing convolutional neural networks at will.

(2)算法通用性好，根据任务需求，灵活选择注意力机制模块。例如面对没有光谱特征的普通细腻度图像分类任务时，灵活选用空间注意力机制模块即可。(2) The algorithm has good versatility, and the attention mechanism module can be flexibly selected according to the task requirements. For example, when facing ordinary fine-grained image classification tasks without spectral features, the spatial attention mechanism module can be flexibly selected.

(3)算法性能强大，可以有效提升卷积神经网络性能；(3) The performance of the algorithm is powerful, which can effectively improve the performance of the convolutional neural network;

附图说明Description of drawings

图1为空间-光谱联合注意力机制模块图；Figure 1 is a block diagram of the spatial-spectral joint attention mechanism;

图2为嵌入空间-光谱联合注意力机制模块卷积神经网络的三种结构框图；Fig. 2 is three structural block diagrams of embedded space-spectral joint attention mechanism module convolutional neural network;

图3为不同算法在高光谱数据集实验结果对比。注：实验中将空间-光谱联合注意力机制模块简称为Joint Spatial-Spectral Attention Module,简称JSSAM，将采用串联嵌入方式的卷积神经网络称为CNN-JSSAM-A，将采用并联嵌入方式的卷积神经网络称为CNN-JSSAM-B，将采用串并联嵌入方式的卷积神经网络称为CNN-JSSAM-C。以Indian Pine数据为实验中高光谱数据集，并取10％为训练集，以上各种CNN的网络参数和层数全部保持一致，区别在于是否嵌入空间-光谱联合注意力机制模块JSSAM。Figure 3 is a comparison of experimental results of different algorithms in hyperspectral datasets. Note: In the experiment, the spatial-spectral joint attention mechanism module is referred to as the Joint Spatial-Spectral Attention Module, referred to as JSSAM, and the convolutional neural network using the serial embedding method is called CNN-JSSAM-A, and the convolutional neural network using the parallel embedding method is called CNN-JSSAM-A. The product neural network is called CNN-JSSAM-B, and the convolutional neural network using serial-parallel embedding is called CNN-JSSAM-C. Taking the Indian Pine data as the hyperspectral data set in the experiment, and taking 10% as the training set, the network parameters and the number of layers of the above CNNs are all consistent, the difference is whether to embed the spatial-spectral joint attention mechanism module JSSAM.

具体实施方式detailed description

如附图1所示，空间-光谱联合注意力机制模块主要由三个子模块组成：空间注意力分数提取子模块、光谱注意力分数提取子模块和注意力分数分配子模块。其中，空间注意力分数提取子模块主要提取空间中任意两个像素间相似性特征，获取空间注意力分数图；光谱注意力分数提取子模块主要提取不同光谱波段中相关依赖性，获取光谱波段的注意力分数图；注意力分数分配分支将分别提取的空间注意力分数和光谱注意力分数分配到原来的特征空间中，获得包含不同空间域、不同波段注意力特征的注意力分数立方体。As shown in Figure 1, the spatial-spectral joint attention mechanism module is mainly composed of three submodules: the spatial attention score extraction submodule, the spectral attention score extraction submodule and the attention score distribution submodule. Among them, the spatial attention score extraction sub-module mainly extracts the similarity features between any two pixels in the space, and obtains the spatial attention score map; the spectral attention score extraction sub-module mainly extracts the correlation dependence in different spectral bands, and obtains the Attention score map; the attention score assignment branch assigns the extracted spatial attention scores and spectral attention scores to the original feature space, and obtains an attention score cube containing attention features of different spatial domains and different bands.

(1)空间注意力分数提取子模块(1) Spatial attention score extraction sub-module

将输入网络的高光谱立方体用下面X表示：The hyperspectral cube that will be fed into the network is denoted by X below:

其中，H是输入高光谱立方体的长度；where H is the length of the input hyperspectral cube;

W是输入高光谱立方体的宽度；W is the width of the input hyperspectral cube;

C是输入高光谱立方体的光谱维数；C is the spectral dimension of the input hyperspectral cube;

并且N＝H×W；And N=H×W;

步骤一：将根据公式(1)输入图像X分别映射到嵌入光谱特征空间中，以得到两个新特征图θ(X)和φ(X)；Step 1: Map the input image X according to formula (1) into the embedded spectral feature space to obtain two new feature maps θ(X) and φ(X);

其中，i和j是特征图中像素的编号；Among them, i and j are the numbers of pixels in the feature map;

和

是线性映射矩阵，它们都是神经网络中可以学习的参数；

and

is a linear mapping matrix, they are all parameters that can be learned in the neural network;

D是映射到嵌入光谱空间中新特征图θ(X)和φ(X)的光谱维数；D is the spectral dimensionality mapped to new feature maps θ(X) and φ(X) in the embedded spectral space;

步骤二：利用嵌入空间内的高斯函数计算任意两个像素相似性s_ij，获得空间注意力分数图S，具体计算过程图公式(2)和图1所示：Step 2: Use the Gaussian function in the embedded space to calculate the similarity s _ij of any two pixels, and obtain the spatial attention score map S. The specific calculation process is shown in formula (2) and Figure 1:

其中，s_ij表示第i个和第j个像素之间的相似性；where s _ij represents the similarity between the i-th and j-th pixels;

程序中，W_θ和W_φ是可学习的网络参数，采用1*1卷积层实现；公式(2) 中先对_θ(xi)进行转置得到θ(x_i)^T，再将θ(x_i)^T与φ(x_j)进行矩阵乘法运算，最后再利用神经网络softmax层进行归一化操作。In the program, W _θ and W _φ are learnable network parameters, implemented by 1*1 convolutional layer; in formula (2), first transpose _θ(xi) to get θ( _xi ) ^T , and then convert θ( x _i ) ^T and φ(x _j ) perform matrix multiplication, and finally use the softmax layer of the neural network for normalization.

(2)光谱注意力分数提取子模块(2) Spectral attention score extraction sub-module

将输入网络的高光谱立方体用下面X表示The hyperspectral cube that will be input to the network is represented by X below

步骤一：将根据公式(4)输入图像X分别映射到嵌入空间特征空间中，以得到两个新特征图υ(X)和ω(X)；Step 1: Map the input image X according to the formula (4) into the embedding space feature space to obtain two new feature maps υ(X) and ω(X);

其中，i和j是特征图对应光谱波段的编号；Among them, i and j are the numbers of the spectral bands corresponding to the feature map;

W_υ和W_ω是线性映射矩阵，它们都是神经网络中可以学习的参数；W _υ and W _ω are linear mapping matrices, which are parameters that can be learned in neural networks;

步骤二：利用嵌入空间内的高斯函数计算任意两个光谱波段对应特征图的相似性q_ij，获得空间注意力分数图Q，具体计算过程图公式(5)和图1 所示：Step 2: Use the Gaussian function embedded in the space to calculate the similarity q _ij of the feature maps corresponding to any two spectral bands, and obtain the spatial attention score map Q. The specific calculation process is shown in formula (5) and Figure 1:

其中，q_ij表示第i个和第j个光谱波段对应特征图之间的相似性；Among them, q _ij represents the similarity between the feature maps corresponding to the i-th and j-th spectral bands;

程序中，W_υ和W_ω是可学习的网络参数，采用3*3Depth-wise卷积层实现；公式(5)中先对υ(x_i)进行转置得到υ(x_i)^T，再将υ(x_i)^T与ω(x_i)进行矩阵乘法运算，最后再利用神经网络softmax层进行归一化操作。In the program, W _υ and W _ω are learnable network parameters, which are realized by 3*3Depth-wise convolutional layer; in formula (5), first transpose υ( _xi ) to get υ( _xi ) ^T , and then Perform matrix multiplication between υ( _xi ) ^T and ω( _xi ), and finally use the softmax layer of the neural network for normalization.

(3)注意力分数分配子模块(3) Attention score allocation sub-module

注意力分数分配子模块的主要作用是将注意力分数分配分支将分别提取的空间注意力分数和光谱注意力分数分配到原来的特征空间中，获得包含不同空间域、不同波段注意力特征的注意力分数立方体。The main function of the attention score allocation sub-module is to assign the attention score allocation branch to the extracted spatial attention score and spectral attention score into the original feature space, and obtain attention features that include different spatial domains and different bands. Force Fraction Cube.

输入图像X如下表示：The input image X is represented as follows:

步骤一：为了确保注意力机制模块可以根据任务需求，自适应地聚焦于特征图的局域空间和局部光谱波段，首先在特征空间内进行映射，得到一个全新特征图

如公式(7)；程序中，公式(7)是利用3*3 卷积层实现，其中W_ζ是3*3卷积核参数。Step 1: In order to ensure that the attention mechanism module can adaptively focus on the local space and local spectral bands of the feature map according to the task requirements, first map in the feature space to obtain a new feature map

Such as formula (7); in the program, formula (7) is implemented using a 3*3 convolutional layer, where W _ζ is a 3*3 convolution kernel parameter.

A＝S·ζ(X)·Q (8)A＝S·ζ(X)·Q (8)

步骤二：通过公式(8)将空间注意力分数S和光谱注意力分数Q分配到原来的特征空间中，获得注意力机制分数立方体A。Step 2: Distribute the spatial attention score S and the spectral attention score Q into the original feature space through formula (8), and obtain the attention mechanism score cube A.

另外，算法还设计了一套空间-光谱联合注意力机制模块与卷积神经网络嵌入方式，主要有以下三种嵌入方式：In addition, the algorithm also designs a set of spatial-spectral joint attention mechanism modules and convolutional neural network embedding methods. There are mainly three embedding methods as follows:

(1)串联嵌入方式(1) Embedded in series

(2)并联嵌入方式(2) Parallel embedded mode

(3)串并联嵌入方式(3) Embedded in series and parallel

详细见图2为嵌入空间-光谱联合注意力机制模块卷积神经网络的三种结构框图。See Figure 2 for details, which are three structural block diagrams of the convolutional neural network embedded in the spatial-spectral joint attention mechanism module.

Claims

1. A hyperspectral image classification method based on the spatial-spectral joint attention mechanism is mainly composed of two parts: the spatial-spectral joint attention mechanism module and the embedded convolutional neural network method:

1. The spatial-spectral joint attention mechanism module consists of three submodules, the spatial attention score extraction submodule, the spectral attention score extraction submodule and the attention score allocation submodule; among them, the spatial attention score extraction branch extracts the spatial The similarity feature between any two pixels obtains the spatial attention score map; the spectral attention score extraction branch extracts the correlation dependence in different spectral bands, and obtains the attention score map of the spectral band; then extracts the attention score distribution sub-module separately The obtained spatial attention score map and spectral attention score map are assigned to the original feature space pixel by pixel and spectrum, and the attention score cube containing different pixel points and different band attention features is obtained; the details are as follows:

(1) Spatial attention score extraction sub-module

Step 1: Map the input image X to the embedded spectral feature space to obtain two new feature maps θ(X) and φ(X);

Step 2: Use the Gaussian function in the embedded space to calculate the similarity s _ij of any two pixels, obtain the spatial attention score map S, and finally use the softmax layer of the neural network to perform normalization;

(2) Spectral attention score extraction sub-module

Step 3: Map the input image X to the embedding space feature space to obtain two new feature maps u(X) and ω(X);

Step 4: Use the Gaussian function embedded in the space to calculate the similarity q _ij of the feature maps corresponding to any two spectral bands, and obtain the spectral attention score map Q. In the experiment, a 3*3 layered (Depth-wise) convolutional layer is used Realize; finally use the neural network Softmax layer to perform normalization operation;

(3) Attention score allocation sub-module

The role of the attention score assignment sub-module is to assign the extracted spatial attention scores and spectral attention scores to the original feature space, and obtain attention score cubes containing different spatial domains and different band attention features;

Step 5: In order to ensure that the attention mechanism module can adaptively focus on the local space and local spectral bands of the feature map according to the task requirements, first map in the feature space, and use 3*3 convolution operation to obtain a new feature map

Step 6: Assign the spatial attention score S and the spectral attention score Q to the original feature space, and obtain the attention mechanism score cube A

2. There are three ways to embed the spatial-spectral joint attention mechanism module into the convolutional neural network:

(1) Embedded in series;

(2) Parallel embedded mode;

(3) Embedded in series and parallel.