CN115661635A - Hyperspectral image reconstruction method based on Transformer fusion convolutional neural network - Google Patents

Hyperspectral image reconstruction method based on Transformer fusion convolutional neural network Download PDF

Info

Publication number
CN115661635A
Authority
CN
China
Prior art keywords
cnn
transformer
hyperspectral image
image reconstruction
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211161991.2A
Other languages
Chinese (zh)
Inventor
徐萌
彭焱鑫
贾森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202211161991.2A priority Critical patent/CN115661635A/en
Publication of CN115661635A publication Critical patent/CN115661635A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Abstract

The invention discloses a hyperspectral image reconstruction method based on a Transformer-fused convolutional neural network, which comprises the following steps: constructing a CNN-Transformer model in which a Transformer and a convolutional neural network (CNN) are fused, wherein the CNN-Transformer model comprises a first convolutional layer, a CNN-Transformer encoder, a second convolutional layer, a CNN-Transformer decoder and a third convolutional layer which are sequentially connected; and inputting an RGB image to be acquired into the CNN-Transformer model, and obtaining a hyperspectral image corresponding to the RGB image based on the CNN-Transformer model. The invention also discloses a hyperspectral image reconstruction device and a storage medium. By using this model for hyperspectral image reconstruction, the method has wider applicability and needs no additional auxiliary information; at the same time, spatial information and spectral information are effectively extracted, the efficiency of hyperspectral image reconstruction is improved, and a better spectral super-resolution effect can be achieved with fewer floating point operations and parameters.

Description

Hyperspectral image reconstruction method based on Transformer fusion convolutional neural network
Technical Field
The invention relates to the technical field of hyperspectral image reconstruction, in particular to a method, equipment and a storage medium for hyperspectral image reconstruction based on a Transformer fused convolutional neural network.
Background
A hyperspectral image (Hyperspectral Image) refers to a spectral image with a spectral resolution on the order of 10^(-2)λ. With the development of science and technology, hyperspectral images are widely used in earth observation applications such as resource exploration, precision agriculture and disaster monitoring. However, hyperspectral images suffer from two problems. The first is low spatial resolution: a hyperspectral image contains both spatial information and spectral information, but an imaging spectrometer cannot simultaneously acquire high spatial resolution and high spectral resolution, so hyperspectral imaging trades spatial resolution for spectral resolution. The second is that hyperspectral imaging equipment is expensive and places certain professional requirements on the personnel who operate it. These two problems greatly limit the application of hyperspectral images, so a low-cost and efficient method for acquiring hyperspectral images is urgently needed.
In order to solve the problems of low spatial resolution and high acquisition cost of hyperspectral images, researchers have turned to studying how to reconstruct hyperspectral images from RGB images. These methods can be summarized into two broad categories: conventional methods and deep learning methods. A representative conventional method is RSC (RGB to Spectral Conversion), a classic hyperspectral image reconstruction algorithm that performs spectral reconstruction using a linear functional relationship between the RGB image and the hyperspectral image; however, the algorithm ignores the nonlinear relationship between the two, so its reconstruction quality is poor. In recent years, deep learning based methods have shown great advantages in fields such as computer vision and image processing. Training a deep learning network usually requires a large amount of data, and with the continual growth of hyperspectral datasets, spectral reconstruction by deep learning has gradually become mainstream. A representative deep learning method is HSCNN (Hyperspectral Convolutional Neural Network), which upsamples the RGB image in the spectral dimension by simple interpolation and learns an end-to-end mapping from a large number of upsampled/ground-truth hyperspectral image pairs; the mapping is expressed as a deep convolutional neural network that takes the spectrally upsampled image as input and outputs an enhanced hyperspectral image.
However, the upsampling operation in HSCNN requires knowledge of an explicit spectral response function corresponding to the integration of the hyperspectral radiance into RGB values; therefore, HSCNN fails when the spectral response function is unknown or difficult to obtain. Among other deep learning methods: a technical scheme based on generative adversarial networks is only suitable for the Gaofen-5 (GF-5) dataset and places certain requirements on the experimental data; a technical scheme based on combining a spectral response function with a convolutional neural network depends on that spectral response function and fails for datasets without one; other algorithms based on convolutional neural networks mainly extract spatial information between the RGB image and the hyperspectral image and have limited capability to extract spectral information.
Therefore, a technical scheme for reconstructing a hyperspectral image, which has wider applicability, does not need additional auxiliary information, and has efficient spatial information and spectral information extraction capability, needs to be provided.
Disclosure of Invention
The invention mainly aims to provide a hyperspectral image reconstruction method based on a Transformer fusion convolutional neural network, and aims to solve the technical problems that the existing hyperspectral image reconstruction technology is not wide in applicability, needs additional auxiliary information, and does not have efficient spatial information and spectral information extraction capability.
In order to achieve the purpose, the invention provides a hyperspectral image reconstruction method based on a Transformer fusion convolutional neural network, which comprises the following steps of:
constructing a CNN-Transformer model fusing a Transformer and a Convolutional Neural Network (CNN), wherein the CNN-Transformer model comprises a first convolutional layer, a CNN-Transformer encoder, a second convolutional layer, a CNN-Transformer decoder and a third convolutional layer which are sequentially connected;
inputting an RGB image to be collected into the CNN-Transformer model, and obtaining a hyperspectral image corresponding to the RGB image based on the CNN-Transformer model.
Preferably, the step of obtaining a hyperspectral image corresponding to the RGB image based on the CNN-Transformer model includes:
acquiring a first feature mapping corresponding to the RGB image based on the first convolution layer;
inputting the first feature mapping into the CNN-Transformer encoder to generate a first jump connection, and acquiring a second feature mapping corresponding to the RGB image based on the CNN-Transformer encoder, the second convolutional layer, the CNN-Transformer decoder and the third convolutional layer which are connected in sequence;
and performing element-by-element addition on the first feature map and the second feature map based on the first jump connection to obtain the hyperspectral image.
Preferably, the step of obtaining a second feature map corresponding to the RGB image based on the CNN-Transformer encoder, the second convolution layer, the CNN-Transformer decoder, and the third convolution layer which are sequentially connected includes:
acquiring a high-dimensional feature mapping corresponding to the RGB image based on the CNN-Transformer encoder and the second convolution layer which are sequentially connected, and inputting the high-dimensional feature mapping into the CNN-Transformer decoder;
and inputting the output result of the CNN-Transformer decoder into the third convolution layer, and acquiring the second feature mapping based on the third convolution layer.
Preferably, the step of obtaining the high-dimensional feature map corresponding to the RGB image based on the sequentially connected CNN-Transformer encoder and the second convolution layer includes:
acquiring deep spatial and spectral features corresponding to the RGB image based on the CNN-Transformer encoder, and generating a second jump connection;
inputting the deep spatial and spectral features into the second convolution layer, and acquiring a third feature mapping corresponding to the RGB image based on the second convolution layer;
and cascade-splicing the deep spatial and spectral features with the third feature mapping based on the second jump connection to obtain the high-dimensional feature mapping.
Preferably, the CNN-Transformer Encoder includes n encoding modules Encoder, and the CNN-Transformer Decoder includes n decoding modules Decoder, where n is any integer greater than or equal to 1.
Preferably, each of the encoding modules Encoder includes a 3 × 3 convolutional layer and a convolutional spectral self-attention module connected in sequence.
Preferably, each of the decoding modules Decoder includes a spectral down-sampling layer, a convolution spectral self-attention module and a 3 × 3 convolution layer connected in sequence.
Preferably, the hyperspectral image reconstruction method further comprises:
each encoding module Encoder generates a corresponding jump connection when obtaining an output result;
and cascade-splicing the output result of the (n-k)th encoding module Encoder with the output result of the (k-1)th decoding module Decoder based on the jump connection corresponding to the (n-k)th encoding module Encoder, so as to obtain the input value of the kth decoding module Decoder, wherein k is any integer greater than or equal to 1 and less than or equal to n.
In addition, to achieve the above object, the present invention also provides a hyperspectral image reconstruction apparatus including: the hyperspectral image reconstruction method comprises a memory, a processor and a hyperspectral image reconstruction program stored on the memory and capable of running on the processor, wherein the hyperspectral image reconstruction program realizes the steps of the hyperspectral image reconstruction method when being executed by the processor.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium, wherein the readable storage medium stores a hyperspectral image reconstruction program, and the hyperspectral image reconstruction program when executed by a processor implements the steps of the hyperspectral image reconstruction method as described above.
The hyperspectral image reconstruction method based on the Transformer-fused convolutional neural network comprises: constructing a CNN-Transformer model in which the Transformer and the convolutional neural network CNN are fused, the CNN-Transformer model comprising a first convolutional layer, a CNN-Transformer encoder, a second convolutional layer, a CNN-Transformer decoder and a third convolutional layer which are sequentially connected; and inputting an RGB image to be acquired into the CNN-Transformer model, and obtaining a hyperspectral image corresponding to the RGB image based on the CNN-Transformer model. By using this model for hyperspectral image reconstruction, the method has wider applicability and needs no additional auxiliary information; at the same time, spatial information and spectral information are effectively extracted, the efficiency of hyperspectral image reconstruction is improved, and a better spectral super-resolution effect can be achieved with fewer floating point operations and parameters.
Drawings
FIG. 1 is a schematic structural diagram of a hyperspectral image reconstruction device in a hardware operating environment according to an embodiment of the invention;
FIG. 2 is a schematic flow chart of a hyperspectral image reconstruction method based on a Transformer fusion convolutional neural network according to a first embodiment of the invention;
FIG. 3 is a schematic diagram of the network structure of the CNN-Transformer model in each embodiment of the hyperspectral image reconstruction method based on a Transformer fusion convolutional neural network of the invention;
FIG. 4 is a schematic diagram of the network structure of the convolutional spectral self-attention module in an embodiment of the hyperspectral image reconstruction method based on a Transformer fusion convolutional neural network of the present invention;
FIG. 5 is a schematic diagram of the network structure of convolutional spectral self-attention when N = 1 in an embodiment of the hyperspectral image reconstruction method based on a Transformer fused convolutional neural network according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a hyperspectral image reconstruction device in a hardware operating environment according to an embodiment of the present invention.
The terminal of the embodiment of the invention can be a PC, and can also be a mobile terminal device with a display function, such as a smart phone, a tablet computer, a portable computer and the like.
As shown in fig. 1, the hyperspectral image reconstruction apparatus may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, a communication bus 1002. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the hyperspectral image reconstruction device may further include a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like. Such as light sensors, motion sensors, and other sensors, will not be described in detail herein.
It will be appreciated by those skilled in the art that the terminal structure shown in FIG. 1 does not constitute a limitation of the hyperspectral image reconstruction apparatus and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in fig. 1, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a hyperspectral image reconstruction program.
In the hyperspectral image reconstruction device shown in fig. 1, the network interface 1004 is mainly used for connecting with a background server and communicating data with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to invoke a hyperspectral image reconstruction procedure stored in the memory 1005.
In this embodiment, the hyperspectral image reconstruction apparatus includes: the hyperspectral image reconstruction method comprises a memory 1005, a processor 1001 and a hyperspectral image reconstruction program stored on the memory 1005 and capable of running on the processor 1001, wherein when the processor 1001 calls the hyperspectral image reconstruction program stored in the memory 1005, the steps of the hyperspectral image reconstruction method in each of the following embodiments are executed.
The invention also provides a hyperspectral image reconstruction method based on a Transformer fusion convolutional neural network, and referring to fig. 2, fig. 2 is a flow diagram of a hyperspectral image reconstruction method according to a first embodiment of the invention.
In this embodiment, the method includes the steps of:
step S101, constructing a CNN-Transformer model fused by a Transformer and a Convolutional Neural Network (CNN), wherein the CNN-Transformer model comprises a first convolutional layer, a CNN-Transformer encoder, a second convolutional layer, a CNN-Transformer decoder and a third convolutional layer which are sequentially connected;
it should be noted that the Transformer is a deep learning model based on the self-attention mechanism, and mainly consists of two parts, namely Encoders (Encoders) and Decoders (Decoders), and the Transformer can improve the training speed of the model by using the attention mechanism, and has excellent precision and performance because of being suitable for parallelization calculation and the complexity of the model itself.
In this embodiment, in order to perform hyperspectral image reconstruction, first, a CNN-Transformer model in which a Transformer and a convolutional neural network CNN are fused needs to be constructed, where the CNN-Transformer model includes a first convolutional layer, a CNN-Transformer encoder, a second convolutional layer, a CNN-Transformer decoder, and a third convolutional layer, which are connected in sequence.
Specifically, in order to express the relationship between an RGB image and a hyperspectral image, let X ∈ R^(3×H×W) denote the RGB image to be acquired and Y ∈ R^(C×H×W) denote the hyperspectral image reconstructed from the RGB image, where C is the number of channels of the hyperspectral image and W and H are its width and height respectively. The relationship between the RGB image to be acquired and the reconstructed hyperspectral image can then be represented by the following formula:
AY = X
where A ∈ R^(3×C) denotes the transfer matrix from the hyperspectral image to the RGB image. Recovering the hyperspectral image from the RGB image is a severely ill-posed problem, especially when the number of bands to be recovered is large. Therefore, a CNN-Transformer model that can solve the above problem needs to be constructed for reconstructing the hyperspectral image. Referring to fig. 3, the CNN-Transformer model consists mainly of a CNN-Transformer encoder and a CNN-Transformer decoder; in addition, a 3 × 3 convolutional layer before the CNN-Transformer encoder serves as the first convolutional layer, a 3 × 3 convolutional layer between the CNN-Transformer encoder and the CNN-Transformer decoder serves as the second convolutional layer, and a 3 × 3 convolutional layer after the CNN-Transformer decoder serves as the third convolutional layer. The RGB image to be acquired is subsequently input into the CNN-Transformer model, and the reconstructed hyperspectral image can be obtained from the model.
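As a small numerical illustration of the relation AY = X, the following NumPy snippet projects a hyperspectral cube onto three RGB channels with a stand-in transfer matrix; the band count and the random A are purely illustrative and do not represent a real sensor response:

    import numpy as np

    C, H, W = 31, 64, 64          # illustrative band count and spatial size
    Y = np.random.rand(C, H, W)   # hyperspectral cube to be recovered
    A = np.random.rand(3, C)      # stand-in transfer matrix (hyperspectral -> RGB)

    X = np.einsum('sc,chw->shw', A, Y)   # AY = X: per-pixel spectral projection, shape (3, H, W)

The reconstruction task is the inverse direction, estimating the C-band Y from the 3-band X, which is underdetermined whenever C > 3.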
And S102, inputting an RGB image to be acquired into the CNN-Transformer model, and acquiring a hyperspectral image corresponding to the RGB image based on the CNN-Transformer model.
It should be noted that when training a deep neural network, the performance of the model may decrease as the depth of the network increases; this is called the degradation problem, and its causes may include: a. overfitting; b. vanishing and/or exploding gradients. To alleviate the vanishing gradient problem, jump connections are generally adopted; a jump connection skips certain layers in the neural network and feeds the output of one layer as input to a later layer.
In this embodiment, after the RGB image to be acquired is input into the constructed CNN-Transformer model, the hyperspectral image reconstructed from the RGB image is finally obtained according to the CNN-Transformer model. For example, after the RGB image enters the model, the 3 × 3 first convolutional layer first extracts shallow spatial and spectral feature information; before this feature information enters the CNN-Transformer encoder, a first jump connection is generated to store the original low-frequency information of the RGB image. The CNN-Transformer encoder then outputs deep spatial and spectral feature information, generating a second jump connection; this output is fed into the 3 × 3 second convolutional layer for further feature extraction, and the resulting feature mapping is cascade-spliced with the encoder output via the second jump connection to obtain a high-dimensional feature mapping. The high-dimensional feature mapping is input into the CNN-Transformer decoder, whose output is fed into the 3 × 3 third convolutional layer; finally, the output of the third convolutional layer is added element-by-element to the output of the first jump connection to obtain the reconstructed hyperspectral image.
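As a structural summary of the data flow just described, the following PyTorch sketch wires the five parts together. It is a minimal sketch under stated assumptions: the encoder and decoder internals are pluggable stubs, the 1 × 1 fusion convolution after the cascade splicing is our own addition to keep channel counts consistent, and all class and variable names are illustrative rather than taken from the patent:

    import torch
    import torch.nn as nn

    class CNNTransformerSketch(nn.Module):
        # conv1 -> encoder -> conv2 -> (concat) -> decoder -> conv3 -> (+ first jump)
        def __init__(self, out_ch=31, encoder=None, decoder=None):
            super().__init__()
            c = out_ch
            self.conv1 = nn.Conv2d(3, c, 3, padding=1)    # first 3x3 conv: shallow features
            self.encoder = encoder or nn.Identity()       # stub for the CNN-Transformer encoder
            self.conv2 = nn.Conv2d(c, c, 3, padding=1)    # second (transition) 3x3 conv
            self.fuse = nn.Conv2d(2 * c, c, 1)            # reduce spliced channels (assumption)
            self.decoder = decoder or nn.Identity()       # stub for the CNN-Transformer decoder
            self.conv3 = nn.Conv2d(c, c, 3, padding=1)    # third 3x3 conv

        def forward(self, rgb):
            f1 = self.conv1(rgb)                          # source of the first jump connection
            deep = self.encoder(f1)                       # deep spatial/spectral features (second jump)
            f3 = self.conv2(deep)                         # feature mapping by the second conv
            high = self.fuse(torch.cat([deep, f3], 1))    # cascade splicing -> high-dimensional features
            f2 = self.conv3(self.decoder(high))           # decode, then third conv
            return f1 + f2                                # element-wise addition via the first jump

With these stubs, a (B, 3, H, W) input yields a (B, out_ch, H, W) output, matching the element-by-element addition described above.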
In this embodiment, a CNN-Transformer model in which a Transformer and a convolutional neural network CNN are fused is constructed, the CNN-Transformer model comprising a first convolutional layer, a CNN-Transformer encoder, a second convolutional layer, a CNN-Transformer decoder and a third convolutional layer which are sequentially connected; an RGB image to be acquired is input into the CNN-Transformer model, and a hyperspectral image corresponding to the RGB image is obtained based on the CNN-Transformer model. Using this model for hyperspectral image reconstruction gives wider applicability without additional auxiliary information, effectively extracts spatial information and spectral information, improves the efficiency of hyperspectral image reconstruction, and achieves a better spectral super-resolution effect with fewer floating point operations and parameters.
Based on the first embodiment, a second embodiment of the hyperspectral image reconstruction method of the invention is provided. In this embodiment, in step S102, the method for obtaining a hyperspectral image corresponding to the RGB image based on the CNN-Transformer model includes:
step S201, acquiring a first feature map corresponding to the RGB image based on the first convolution layer;
step S202, inputting the first feature mapping into the CNN-Transformer encoder, generating a first jump connection, and acquiring a second feature mapping corresponding to the RGB image based on the CNN-Transformer encoder, the second convolutional layer, the CNN-Transformer decoder and the third convolutional layer which are connected in sequence;
step S203, performing element-by-element addition on the first feature map and the second feature map based on the first jump connection to obtain the hyperspectral image.
In this embodiment, referring to fig. 3, after an RGB image to be acquired is input into a CNN-Transformer model, a first feature map corresponding to the RGB image is obtained according to a first convolution layer, then the first feature map is input into a CNN-Transformer encoder, a first jump connection is generated, a second feature map corresponding to the RGB image is obtained according to the CNN-Transformer encoder, a second convolution layer, a CNN-Transformer decoder, and a third convolution layer which are sequentially connected, and finally, the first feature map and the second feature map are added element by element according to the first jump connection, so that a reconstructed hyperspectral image is obtained.
Specifically, the first convolution layer is a 3 × 3 convolution layer that serves as the input mapping: it extracts shallow spatial and spectral feature information from the RGB image to be acquired, and this shallow feature information is used as the first feature mapping. After the first feature mapping enters the encoder, a first jump connection is generated to store the original low-frequency information of the RGB image, and the second feature mapping is then output through the sequentially connected CNN-Transformer encoder, second convolutional layer, CNN-Transformer decoder and third convolutional layer.
Optionally, in step S202, a method for obtaining a second feature map corresponding to the RGB image based on the CNN-Transformer encoder, the second convolutional layer, the CNN-Transformer decoder, and the third convolutional layer, which are connected in sequence, specifically includes:
step S301, acquiring a high-dimensional feature map corresponding to the RGB image based on the sequentially connected CNN-Transformer encoder and the second convolution layer, and inputting the high-dimensional feature map into the CNN-Transformer decoder;
step S302, inputting the output result of the CNN-Transformer decoder into the third convolutional layer, and acquiring the second feature map based on the third convolutional layer.
In this embodiment, referring to fig. 3, after a first feature map output by a first convolutional layer is input to a CNN-Transformer encoder and a skip connection is established, a high-dimensional feature map corresponding to an RGB image is output according to the CNN-Transformer encoder and a second convolutional layer which are connected in sequence, the high-dimensional feature map is input to a CNN-Transformer decoder, an output result of the CNN-Transformer decoder is input to a third convolutional layer, and a second feature map is output according to the third convolutional layer.
Specifically, the second convolutional layer and the third convolutional layer are also 3 × 3 convolutional layers. The second convolutional layer serves as a transition layer between the CNN-Transformer encoder and the CNN-Transformer decoder and performs feature mapping on the output result of the CNN-Transformer encoder. The CNN-Transformer encoder generates a jump connection when it outputs, so its output result can be cascade-spliced with the feature mapping output by the second convolutional layer to obtain the high-dimensional feature mapping, which is then input into the CNN-Transformer decoder. The third convolutional layer performs feature mapping on the output result of the CNN-Transformer decoder to obtain the decoded second feature mapping.
In this embodiment, a high-dimensional feature mapping corresponding to the RGB image is obtained based on the sequentially connected CNN-Transformer encoder and second convolutional layer, and the high-dimensional feature mapping is input into the CNN-Transformer decoder; the output result of the CNN-Transformer decoder is input into the third convolutional layer, and the second feature mapping is obtained based on the third convolutional layer. The CNN-Transformer model can thus effectively extract spatial information and spectral information, the efficiency of hyperspectral image reconstruction is improved, and a better spectral super-resolution effect is achieved with fewer floating point operations and parameters.
Optionally, in step S301, a method for obtaining a high-dimensional feature map corresponding to the RGB image based on the sequentially connected CNN-Transformer encoder and the second convolution layer includes:
Step S401, acquiring deep spatial and spectral features corresponding to the RGB image based on the CNN-Transformer encoder, and generating a second jump connection;
Step S402, inputting the deep spatial and spectral features into the second convolution layer, and acquiring a third feature mapping corresponding to the RGB image based on the second convolution layer;
Step S403, cascade-splicing the deep spatial and spectral features with the third feature mapping based on the second jump connection to obtain the high-dimensional feature mapping.
In this embodiment, referring to fig. 3, the deep spatial and spectral features corresponding to the RGB image are obtained by the CNN-Transformer encoder, and a second jump connection is generated when the encoder outputs these features. The deep spatial and spectral features are then input into the second convolutional layer, a third feature mapping is output by the second convolutional layer, and the deep spatial and spectral features are cascade-spliced with the third feature mapping through the second jump connection to obtain the high-dimensional feature mapping, which serves as the input of the subsequent CNN-Transformer decoder.
Specifically, the CNN-Transformer encoder is used to extract the deep spatial and spectral information of the RGB image; its output is the deep spatial and spectral features corresponding to the RGB image, and the generated second jump connection retains these features to prevent the deep spatial and spectral information from being lost.
In this embodiment, the deep spatial and spectral features corresponding to the RGB image are obtained based on the CNN-Transformer encoder, and a second jump connection is generated; the deep spatial and spectral features are input into the second convolutional layer, and a third feature mapping corresponding to the RGB image is obtained based on the second convolutional layer; the deep spatial and spectral features are cascade-spliced with the third feature mapping based on the second jump connection to obtain the high-dimensional feature mapping. The jump connection retains the deep spatial and spectral information, prevents information loss, and yields the high-dimensional feature mapping subsequently input into the CNN-Transformer decoder, so that the CNN-Transformer model can effectively extract spatial information and spectral information and the efficiency of hyperspectral image reconstruction is improved.
In this embodiment, a first feature mapping corresponding to the RGB image is obtained based on the first convolutional layer; the first feature mapping is input into the CNN-Transformer encoder, a first jump connection is generated, and a second feature mapping corresponding to the RGB image is obtained based on the sequentially connected CNN-Transformer encoder, second convolutional layer, CNN-Transformer decoder and third convolutional layer; the first feature mapping and the second feature mapping are added element-by-element based on the first jump connection to obtain the hyperspectral image. Performing hyperspectral image reconstruction by combining the convolutional neural network and the Transformer, with jump connections retaining deep spatial and spectral information to prevent information loss, gives wider applicability without additional auxiliary information, effectively extracts spatial and spectral information, improves the efficiency of hyperspectral image reconstruction, and achieves a better spectral super-resolution effect with less memory and fewer parameters.
Based on the first embodiment, a third embodiment of the hyperspectral image reconstruction method according to the present invention is proposed. In this embodiment, the CNN-Transformer Encoder includes n encoding modules Encoder, and the CNN-Transformer Decoder includes n decoding modules Decoder, where n is any integer greater than or equal to 1.
In this embodiment, the number of channels of the CNN-Transformer model may be changed according to the number of channels of the input hyperspectral image; therefore n is set to be an arbitrary integer greater than or equal to 1. The CNN-Transformer Encoder is composed of n Encoders and the CNN-Transformer Decoder is composed of n Decoders, with n corresponding to the number of channels of the input hyperspectral image. Here each Encoder denotes one encoding module in the CNN-Transformer Encoder and each Decoder denotes one decoding module in the CNN-Transformer Decoder, which will not be repeated below.
Specifically, the CNN-Transformer Encoder comprises n Encoders. The output of the 1st encoding module has size H × W × C_1, and the output of the nth Encoder has size H × W × C_n; with k an arbitrary integer greater than or equal to 1 and less than or equal to n, the output of the kth Encoder has size H × W × C_k. Similarly, the CNN-Transformer Decoder comprises n Decoders, in one-to-one correspondence with the Encoders in the CNN-Transformer Encoder, and is used to decode the deep spatial and spectral information extracted by those Encoders. The output of the 1st Decoder has size H × W × C_n, the output of the nth Decoder has size H × W × C_1, and the output of the kth Decoder has size H × W × C_(n-k).
Optionally, each of the encoding modules Encoder includes a 3 × 3 convolutional layer and a convolutional spectral self-attention module connected in sequence.
It should be noted that the convolutional spectral self-attention module applies CSSA (Convolutional Spectral Self-Attention). The input tensor is expanded from H × W × C to HW/s² × k²C by the Unfold function (with kernel size k and stride s), and the resulting feature mapping is reshaped to HW/s² × C × k². A 3 × 3 grouped convolution is then used to compute the mappings of Query (Q), Key (K) and Value (V), giving: Q = Conv_Q(X), K = Conv_K(X), V = Conv_V(X), where Conv_Q, Conv_K and Conv_V denote the nonlinear mapping weights of Q, K and V respectively. Q, K and V are then divided into N heads along the spectral dimension, such that Q = [Q_1, Q_2, ..., Q_N], K = [K_1, K_2, ..., K_N], V = [V_1, V_2, ..., V_N]; after division, the size of Q, K and V is N × HW/s² × C/N × k². Referring to fig. 5, fig. 5 is a schematic diagram of the network structure of convolutional spectral self-attention when N = 1. Next, to make the model more stable during training, the L2 norms of Q and K are computed and used to normalize them, and the covariance attention matrix is computed over the spectral dimension by taking the dot product of the normalized Q with the transpose of K: (N × HW/s² × C/N × k²) · (N × HW/s² × k² × C/N), generating an N × C/N × C/N covariance attention matrix. This attention matrix is multiplied by a learnable spatial-spectral weight matrix W_s, and an attention score matrix is obtained through the Softmax function. The attention score matrix is multiplied by V, the output of this multiplication is reshaped to HW/s² × k² × C, and the result is restored by the Fold function and added to a positional encoding to obtain the final output, where the positional encoding can be learned with a convolutional neural network.
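To make the tensor bookkeeping above concrete, the following PyTorch sketch implements a simplified variant of the spectral self-attention just described. It is a minimal illustration rather than the patented module: the Unfold/Fold patch step and the grouped 3 × 3 projections are replaced by 1 × 1 convolutions, the learnable spatial-spectral weight is reduced to a per-head scale, and all names are our own:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SpectralSelfAttention(nn.Module):
        # covariance attention over the spectral (channel) dimension, N heads
        def __init__(self, channels, heads=4):
            super().__init__()
            self.heads = heads
            self.to_q = nn.Conv2d(channels, channels, 1, bias=False)
            self.to_k = nn.Conv2d(channels, channels, 1, bias=False)
            self.to_v = nn.Conv2d(channels, channels, 1, bias=False)
            self.scale = nn.Parameter(torch.ones(heads, 1, 1))   # stand-in for the learnable weight
            # positional encoding learned by a depthwise convolution
            self.pos = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)

        def forward(self, x):
            b, c, h, w = x.shape
            split = lambda t: t.reshape(b, self.heads, c // self.heads, h * w)
            q, k, v = split(self.to_q(x)), split(self.to_k(x)), split(self.to_v(x))
            q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)  # L2 norm for training stability
            attn = (q @ k.transpose(-2, -1)) * self.scale          # (B, N, C/N, C/N) covariance
            out = attn.softmax(dim=-1) @ v                         # apply attention scores to V
            return out.reshape(b, c, h, w) + self.pos(x)           # add positional encoding

Note that the attention matrix is C/N × C/N rather than pixel × pixel, so the cost grows with the number of channels instead of the number of spatial positions, which is what makes spectral attention attractive for hyperspectral reconstruction.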
In this embodiment, each Encoder in the CNN-Transformer Encoder has the same composition and includes a 3 × 3 convolutional layer and a convolutional spectral self-attention module connected in sequence, and each Encoder is used to extract deep spatial and spectral features.
Specifically, the input of the Encoder first undergoes feature mapping through the 3 × 3 convolutional layer and is then input into the convolutional spectral self-attention module. Referring to fig. 4, fig. 4 is a schematic diagram of the network structure of the convolutional spectral self-attention module. In this module, a jump connection is first added to prevent information loss; the input value is then normalized in the channel direction by a Layer Norm layer, deep spatial and spectral information is extracted by the CSSA, and this information is summed with the output of the jump connection to obtain a spatial-spectral feature, from which a new jump connection is generated. The spatial-spectral feature is then further processed by a Layer Norm layer, a GELU layer and a 1 × 1 convolutional layer. In order to learn the weight of each channel, an ECA (Efficient Channel Attention) module is applied: the ECA first generates a jump connection and performs spatial pooling on its input; the pooled vector is passed through a one-dimensional convolution and a Sigmoid activation function to obtain the final channel weight coefficient vector, which is multiplied with the jump-connection input so that each channel of the output is reweighted. Finally, the output of the GELU layer and the 1 × 1 convolutional layer (after the ECA) is summed with the jump connection to obtain the final high-dimensional feature, which serves as the deep spatial and spectral feature output.
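The block structure just described can be sketched as follows, reusing the SpectralSelfAttention class above. The GroupNorm stand-in for Layer Norm, the ECA kernel size, and the exact placement of the ECA after the 1 × 1 convolutions are assumptions made to keep the example runnable:

    class ECA(nn.Module):
        # Efficient Channel Attention: spatial pooling -> 1-D conv -> sigmoid weights
        def __init__(self, k=3):
            super().__init__()
            self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)

        def forward(self, x):
            b, c, _, _ = x.shape
            w = x.mean(dim=(2, 3)).unsqueeze(1)               # (B, 1, C) pooled descriptor
            w = torch.sigmoid(self.conv(w)).view(b, c, 1, 1)  # per-channel weight coefficients
            return x * w                                      # reweight the jump-connection input

    class CSSAB(nn.Module):
        # LayerNorm -> CSSA -> residual, then LayerNorm -> 1x1 convs/GELU -> ECA -> residual
        def __init__(self, channels, heads=4):
            super().__init__()
            self.norm1 = nn.GroupNorm(1, channels)            # channel-direction norm stand-in
            self.attn = SpectralSelfAttention(channels, heads)
            self.norm2 = nn.GroupNorm(1, channels)
            self.mlp = nn.Sequential(
                nn.Conv2d(channels, channels, 1), nn.GELU(),
                nn.Conv2d(channels, channels, 1), ECA())

        def forward(self, x):
            x = x + self.attn(self.norm1(x))                  # first inner jump connection
            return x + self.mlp(self.norm2(x))                # second inner jump connection

An encoding module is then simply a 3 × 3 convolution followed by a CSSAB, as stated above.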
In this embodiment, each encoding module Encoder comprises a 3 × 3 convolutional layer and a convolutional spectral self-attention module connected in sequence. The method has wider applicability and can effectively extract spatial information and spectral information without additional auxiliary information, thereby improving the efficiency of the final hyperspectral image reconstruction.
Optionally, each of the decoding modules Decoder includes a spectral down-sampling layer, a convolutional spectral self-attention module and a 3 × 3 convolutional layer connected in sequence.
In this embodiment, each Decoder in the CNN-Transformer Decoder has the same composition, including a spectral down-sampling layer, a convolutional spectral self-attention module and a 3 × 3 convolutional layer connected in sequence, and each Decoder is used to decode the deep spatial and spectral features output by an Encoder in the CNN-Transformer Encoder.
Specifically, the convolutional spectral self-attention module in the Decoder is the same as that in the Encoder, and the spectral down-sampling layer can be understood as a pooling layer that reduces the amount of computation, prevents over-fitting and increases the receptive field, so that the subsequent convolutional spectral self-attention module and 3 × 3 convolutional layer can learn more global information.
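Under the same assumptions, one decoding module (and, for symmetry, one encoding module) can be sketched as below; the spectral down-sampling is modeled as a channel-reducing 1 × 1 convolution, which is one plausible reading of the pooling described above:

    class EncoderBlock(nn.Module):
        # 3x3 conv followed by the convolutional spectral self-attention block
        def __init__(self, in_ch, out_ch, heads=4):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
            self.cssab = CSSAB(out_ch, heads=heads)

        def forward(self, x):
            return self.cssab(self.conv(x))

    class DecoderBlock(nn.Module):
        # spectral down-sampling -> CSSAB -> 3x3 conv (down-sampling is an assumed stand-in)
        def __init__(self, in_ch, out_ch, heads=4):
            super().__init__()
            self.down = nn.Conv2d(in_ch, out_ch, 1)
            self.cssab = CSSAB(out_ch, heads=heads)
            self.conv = nn.Conv2d(out_ch, out_ch, 3, padding=1)

        def forward(self, x):
            return self.conv(self.cssab(self.down(x)))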
In this embodiment, the Decoder comprises a spectral down-sampling layer, a convolution spectral self-attention module and a 3 × 3 convolution layer connected in sequence. The method has wider applicability and can effectively extract spatial information and spectral information without additional auxiliary information, thereby improving the efficiency of the final hyperspectral image reconstruction.
In this embodiment, the CNN-Transformer Encoder includes n encoding modules Encoder, and the CNN-Transformer Decoder includes n decoding modules Decoder, where n is any integer greater than or equal to 1. The number of channels of the CNN-Transformer model can be changed according to the number of channels of the input hyperspectral image, so the model has wider applicability, needs no additional auxiliary information, and can achieve a better spectral super-resolution effect with less memory and fewer parameters.
Based on the third embodiment, a fourth embodiment of the hyperspectral image reconstruction method of the invention is provided, in this embodiment, the hyperspectral image reconstruction method further includes:
step S801, generating corresponding jump connection when each coding module Encoder obtains an output result;
Step S802, cascade-splicing the output result of the (n-k)th encoding module Encoder with the output result of the (k-1)th decoding module Decoder based on the jump connection corresponding to the (n-k)th encoding module Encoder, so as to obtain the input value of the kth decoding module Decoder, wherein k is any integer greater than or equal to 1 and less than or equal to n.
In this embodiment, there are n Encoders in the CNN-Transformer Encoder, and each Encoder generates its corresponding jump connection when outputting a result; there are likewise n Decoders in the CNN-Transformer Decoder, where the input value of the kth Decoder is the cascade splicing of the output result of the (k-1)th Decoder with the output of the jump connection corresponding to the (n-k)th Encoder, and k is any integer greater than or equal to 1 and less than or equal to n.
Specifically, the jump connection corresponding to each Encoder is used to retain the deep spatial and spectral information extracted by that Encoder; refer to fig. 3 for the network structure of the CNN-Transformer model. The kth Encoder can therefore be expressed as:
F_E^k = CSSAB(Conv_3x3(F_E^(k-1))), k = 1, 2, ..., n
where F_E^0 denotes the input of the first Encoder, F_E^(k-1) denotes the input value of the kth encoding module, F_E^k denotes the output value of the kth Encoder, and CSSAB stands for the convolutional spectral self-attention block.
The kth Decoder may be expressed as:
F_D^k = Conv_3x3(CSSAB(Down([F_D^(k-1), F_E^(n-k+1)]))), k = 1, 2, ..., n
where F_D^0 denotes the input of the first Decoder (the high-dimensional feature mapping), [·, ·] denotes cascade splicing, Down denotes the spectral down-sampling layer, F_D^(k-1) and F_D^k denote the input and output values of the kth Decoder, and F_E^(n-k+1) denotes the output value of the (n-k+1)th Encoder, i.e., the output of the corresponding jump connection; in particular, for k = 1 the spliced Encoder output is F_E^n, the output of the nth Encoder.
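Read together, the two formulas describe a U-shaped wiring between the n Encoders and the n Decoders. A compact sketch of that loop is given below; it omits the transition second convolutional layer for brevity, assumes the EncoderBlock/DecoderBlock sketches above, and leaves the channel bookkeeping of the concatenations to the caller:

    import torch

    def encode_decode(x, encoders, decoders):
        # run the Encoders, storing each output as a jump connection
        skips = []
        for enc in encoders:                    # F_E^k = CSSAB(Conv3x3(F_E^(k-1)))
            x = enc(x)
            skips.append(x)
        n = len(decoders)
        for k, dec in enumerate(decoders, start=1):
            # the kth Decoder input is the previous output cascade-spliced
            # with F_E^(n-k+1); skips[n - k] is that Encoder output (0-indexed)
            x = dec(torch.cat([x, skips[n - k]], dim=1))
        return x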
In this embodiment, each encoding module Encoder generates a corresponding jump connection when it obtains an output result; based on the jump connection corresponding to the (n-k)th encoding module Encoder, the output result of the (n-k)th encoding module Encoder is cascade-spliced with the output result of the (k-1)th decoding module Decoder to obtain the input value of the kth decoding module Decoder, where k is any integer greater than or equal to 1 and less than or equal to n. Using jump connections to retain information avoids degradation of model performance and alleviates the vanishing gradient problem, so that spatial information and spectral information can be effectively extracted and the efficiency of the final hyperspectral image reconstruction is improved.
In addition, an embodiment of the present invention further provides a hyperspectral image reconstruction device, where the hyperspectral image reconstruction device includes: the hyperspectral image reconstruction method comprises a memory, a processor and a hyperspectral image reconstruction program stored on the memory and capable of running on the processor, wherein the hyperspectral image reconstruction program realizes the steps of the hyperspectral image reconstruction method when being executed by the processor.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where a hyperspectral image reconstruction program is stored, and when being executed by a processor, the hyperspectral image reconstruction program implements the steps of the hyperspectral image reconstruction method described above.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system comprising that element.
The above-mentioned serial numbers of the embodiments of the present invention are only for description, and do not represent the advantages and disadvantages of the embodiments.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A hyperspectral image reconstruction method based on a Transformer fusion convolutional neural network is characterized by comprising the following steps of:
constructing a CNN-Transformer model fused by a Transformer and a Convolutional Neural Network (CNN), wherein the CNN-Transformer model comprises a first convolutional layer, a CNN-Transformer encoder, a second convolutional layer, a CNN-Transformer decoder and a third convolutional layer which are sequentially connected;
inputting an RGB image to be collected into the CNN-Transformer model, and obtaining a hyperspectral image corresponding to the RGB image based on the CNN-Transformer model.
2. The hyperspectral image reconstruction method according to claim 1, wherein the step of obtaining the hyperspectral image corresponding to the RGB image based on the CNN-Transformer model comprises:
acquiring a first feature mapping corresponding to the RGB image based on the first convolution layer;
inputting the first feature mapping into the CNN-Transformer encoder to generate a first jump connection, and acquiring a second feature mapping corresponding to the RGB image based on the CNN-Transformer encoder, the second convolution layer, the CNN-Transformer decoder and the third convolution layer which are connected in sequence;
and performing element-by-element addition on the first feature map and the second feature map based on the first jump connection to obtain the hyperspectral image.
3. The hyperspectral image reconstruction method according to claim 2, wherein the step of obtaining the second feature maps corresponding to the RGB images based on the CNN-Transformer encoder, the second convolutional layer, the CNN-Transformer decoder and the third convolutional layer which are connected in sequence comprises:
acquiring a high-dimensional feature mapping corresponding to the RGB image based on the CNN-Transformer encoder and the second convolution layer which are sequentially connected, and inputting the high-dimensional feature mapping into the CNN-Transformer decoder;
and inputting the output result of the CNN-Transformer decoder into the third convolutional layer, and acquiring the second feature mapping based on the third convolutional layer.
4. The hyperspectral image reconstruction method according to claim 3, wherein the step of obtaining the corresponding high-dimensional feature map of the RGB image based on the CNN-Transformer encoder and the second convolutional layer which are connected in sequence comprises:
acquiring deep spatial and spectral features corresponding to the RGB image based on the CNN-Transformer encoder, and generating a second jump connection;
inputting the deep spatial and spectral features into the second convolution layer, and acquiring a third feature mapping corresponding to the RGB image based on the second convolution layer;
and cascade-splicing the deep spatial and spectral features with the third feature mapping based on the second jump connection to obtain the high-dimensional feature mapping.
5. The hyperspectral image reconstruction method of claim 1, wherein the CNN-Transformer Encoder comprises n encoding modules Encoder and the CNN-Transformer Decoder comprises n decoding modules Decoder, wherein n is any integer greater than or equal to 1.
6. The hyperspectral image reconstruction method of claim 5, wherein each of the encoding modules Encoder comprises a 3 x 3 convolutional layer and a convolutional spectral self-attention module connected in sequence.
7. The hyperspectral image reconstruction method of claim 5, wherein each decoding module Decoder comprises a spectral down-sampling layer, a convolved spectral self-attention module and a 3 x 3 convolution layer connected in sequence.
8. The hyperspectral image reconstruction method according to any of claims 5 to 7, further comprising:
generating a corresponding jump connection when each encoding module Encoder obtains an output result;
and cascade-splicing the output result of the (n-k)th encoding module Encoder with the output result of the (k-1)th decoding module Decoder based on the jump connection corresponding to the (n-k)th encoding module Encoder to obtain the input value of the kth decoding module Decoder, wherein k is any integer greater than or equal to 1 and less than or equal to n.
9. A hyperspectral image reconstruction apparatus, characterized by comprising: a memory, a processor and a hyperspectral image reconstruction program stored on the memory and executable on the processor, the hyperspectral image reconstruction program when executed by the processor implementing the steps of the hyperspectral image reconstruction method according to any of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the readable storage medium has stored thereon a hyperspectral image reconstruction program which, when executed by a processor, implements the steps of the hyperspectral image reconstruction method according to any of the claims 1 to 8.
CN202211161991.2A 2022-09-22 2022-09-22 Hyperspectral image reconstruction method based on Transformer fusion convolutional neural network Pending CN115661635A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211161991.2A CN115661635A (en) 2022-09-22 2022-09-22 Hyperspectral image reconstruction method based on Transformer fusion convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211161991.2A CN115661635A (en) 2022-09-22 2022-09-22 Hyperspectral image reconstruction method based on Transformer fusion convolutional neural network

Publications (1)

Publication Number Publication Date
CN115661635A true CN115661635A (en) 2023-01-31

Family

ID=84985892

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211161991.2A Pending CN115661635A (en) 2022-09-22 2022-09-22 Hyperspectral image reconstruction method based on Transformer fusion convolutional neural network

Country Status (1)

Country Link
CN (1) CN115661635A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091842A (en) * 2023-02-23 2023-05-09 中国人民解放军军事科学院系统工程研究院 Vision Transformer model structure optimization system, method and medium
CN116091842B (en) * 2023-02-23 2023-10-27 中国人民解放军军事科学院系统工程研究院 Vision Transformer model structure optimization system, method and medium
CN116452420A (en) * 2023-04-11 2023-07-18 南京审计大学 Hyper-spectral image super-resolution method based on fusion of Transformer and CNN (CNN) group
CN116452420B (en) * 2023-04-11 2024-02-02 南京审计大学 Hyper-spectral image super-resolution method based on fusion of Transformer and CNN (CNN) group

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination