CN114092327A - Hyperspectral image super-resolution method by utilizing heterogeneous knowledge distillation - Google Patents
- Publication number: CN114092327A (application CN202111288667.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T3/4046 — Scaling of whole images or parts thereof using neural networks
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06T2207/10032 — Satellite or aerial image; Remote sensing
- G06T2207/10036 — Multispectral image; Hyperspectral image
Abstract
The invention provides a hyperspectral image super-resolution method using heterogeneous knowledge distillation. Given a low-resolution hyperspectral image input I_LR ∈ R^(L×H×W), the method performs shallow feature extraction, nonlinear mapping through distillation-oriented double-branch (DODB) modules, and upsampling, finally outputting a high-resolution hyperspectral image I_SR ∈ R^(L×sH×sW). Heterogeneous knowledge distillation is used to improve model performance: the distillation acts between the 2D features of the two models, transferring the heterogeneous knowledge distillation problem to a fusion problem inside the SHSR model; the transmitted information serves as feedback that refines the features of each spectral band, and the features are divided into a distilled part and a retained part. The method obtains better performance both quantitatively and qualitatively and reconstructs hyperspectral images of relatively high quality.
Description
Technical Field
The invention belongs to the technical field of image super-resolution, and particularly relates to a hyperspectral image super-resolution method by utilizing heterogeneous knowledge distillation.
Background
A hyperspectral imaging sensor receives light of different wavelengths reflected by an object to obtain a hyperspectral image with many spectral bands. Unlike a grayscale or RGB image, each pixel of a hyperspectral image therefore contains continuous spectral bands numbering from tens to thousands. The abundance of spectral information makes hyperspectral images extremely useful in many computer-vision and remote-sensing tasks, such as image classification, anomaly detection and medical diagnostics. However, due to hardware limitations, the spatial resolution of hyperspectral images is relatively low, and it is difficult to improve the hardware systems. Therefore, super-resolution (SR), a post-processing technique, is widely used to reconstruct a high-spatial-resolution hyperspectral image from a low-resolution (LR) version. One class of classical hyperspectral super-resolution methods is fusion-based methods (FHSR), which require a high-resolution multispectral image (HR-MSI), such as an RGB or panchromatic (PAN) image, and fuse information from both sources. The main drawback of fusion-based methods is that it is difficult, and in some cases impossible, to collect well-registered high-resolution multispectral images. Another approach is single hyperspectral image super-resolution (SHSR), which uses only the information in the low-resolution hyperspectral image. However, since there is no complementary spatial information, such models depend heavily on hand-designed priors, such as low-rank and sparsity assumptions. With the advent of deep learning, single hyperspectral image super-resolution models based on convolutional neural networks have made great progress, but the lack of spatial detail still limits their capability. Furthermore, they do not take full advantage of expensive, well-aligned hyperspectral-multispectral pairs, such as those in public datasets.
Disclosure of Invention
In order to solve the above problems, the invention provides a hyperspectral image super-resolution method using heterogeneous knowledge distillation and designs a distillation-oriented double-branch network (DODN); a new hybrid 2D/3D convolution module, the distillation-oriented double-branch block (DODB), is proposed, and the information of the high-resolution multispectral image HR-MSI is transferred to the single hyperspectral image super-resolution model SHSR through knowledge distillation to improve model performance.
The invention is realized by the following scheme:
a hyperspectral image super-resolution method using heterogeneous knowledge distillation, which specifically comprises the following steps:
step one: given a low-resolution hyperspectral image input I_LR ∈ R^(L×H×W), where L, H and W denote the number of spectral bands, the height and the width of the input image, respectively;
step two: shallow feature extraction is performed on the given image input, and the image information is fed into a 2D processing branch and a 3D processing branch; the 3D processing branch applies 3D convolution to extract the spatial-spectral information of the low-resolution hyperspectral input image, yielding shallow 3D features F_0^(3D); the 2D processing branch applies 2D convolution to obtain shallow 2D features F_0^(2D);
Step three: will be provided withAndsent to a distillation oriented double branch module DODB, using a cascaded DODB: hDODBGenerating a non-linear mapping; and discarding 2D features at the kth, i.e. last DODB moduleSimultaneously obtaining 3D characteristics of Kth DODB moduleObtaining shallow 3D featuresAdding;
step four: heterogeneous knowledge distillation and loss function calculation are performed; distillation is applied to half of each 2D feature, i.e. for each 2D feature F_k^(2D) of a DODB, the first C/2 channels serve as the output part for distillation and the remaining channels as the retained part; finally, a high-resolution hyperspectral image I_SR ∈ R^(L×sH×sW) is output through an upsampler, where s is the scale factor.
Further, in step two,
in the 3D processing branch, the low-resolution hyperspectral image input I_LR is reshaped to size 1 × L × H × W and passed through a 3 × 3 × 3 3D convolution to obtain the shallow 3D features F_0^(3D):
F_0^(3D) = H_3D(I_LR),
where H_3D denotes the 3 × 3 × 3 3D convolution;
in the 2D processing branch, the low-resolution hyperspectral image input I_LR is upsampled to L × sH × sW to match the spatial resolution of the spectral super-resolution (SSR) model input, where s is the scale factor, and a 3 × 3 2D convolution then produces the shallow 2D features F_0^(2D):
F_0^(2D) = H_2D(I_LR↑),
where I_LR↑ denotes the upsampled input and H_2D the 3 × 3 2D convolution;
further, in step three,
for the k-th DODB module (k = 1, ..., K):
(F_k^(3D), F_k^(2D)) = H_DODB(F_(k-1)^(3D), F_(k-1)^(2D));
a transposed 3D convolution followed by a 1 × 1 × 1 3D convolution is used as the upsampler; before passing through the upsampler, F_K^(3D) is added to F_0^(3D) (a long residual connection) to improve the robustness of the model;
the distillation-oriented double-branch module DODB consists of a 3D module, a 2D module and a feedback fusion module;
wherein the 3D module and the 2D module, each built on the residual-block structure, respectively extract low-resolution 3D features F_k^(3D) ∈ R^(B×C'×L×H×W) and high-resolution 2D features F_k^(2D) ∈ R^(B×C×sH×sW), where C' and C denote the channel numbers of the 3D and 2D features respectively, and B denotes the batch size;
further, in a feedback fusion module of the DODB, the 3D features are firstly up-sampled to the same size as the 2D features and fused, and then down-sampled to the original size of the 3D features;
after upsampling the 3D features, the 2D features and the 3D features are fused band by band: using the 2D features as feedback information, the 3D features are corrected per spectral band to obtain high-resolution 3D features; the upsampled 3D features are first separated into L spectral bands F_l (l = 1, ..., L),
where the size of each F_l is B × C' × sH × sW; the 2D features are concatenated with each spectral band separately, and a 2D convolution generates the fused features:
F'_l = H_fuse([F_l, F_k^(2D)]);
the same 2D convolution is shared across all bands; the fused features are reshaped to B × 1 × C' × sH × sW and stacked along the spectral dimension to obtain new 3D features of the same size as the upsampled 3D features;
in the downsampling process, cascaded 3 × 3 × 3 convolutions are used to downsample the fused 3D features;
the final output of the distillation-oriented double-branch module DODB is then the downsampled fused 3D features together with the 2D features F_k^(2D).
further, in the fourth step,
the total loss function includes the reconstruction loss and the distillation loss; the L1 loss is chosen as the reconstruction loss:
L_rec = (1/N) Σ_(i=1)^(N) ||I_SR^(i) − I_HR^(i)||_1,
where N denotes the number of samples, and I_SR^(i) and I_HR^(i) are the i-th reconstructed image and the corresponding real high-resolution image, respectively;
for the distillation loss, the L1 norm is used to measure the distance between features from the SHSR model and features from the SSR model:
L_output = Σ_(j=1)^(S) ||G_j(F_j^(SHSR)) − F_j^(SSR)||_1,
where S denotes the number of features used in the distillation, F_j^(SHSR) and F_j^(SSR) are the features at the j-th distillation position of the SHSR model and the SSR model, and G_j is a transformation (a 1 × 1 convolution) that makes the channel numbers of the corresponding features equal;
the total loss function is then:
L_total = L_rec + λ·L_output (14)
where λ is a hyper-parameter for balancing the two parts, set to 0.05 in practical experiments.
The invention has the following beneficial effects:
The invention provides a new double-branch single-hyperspectral image super-resolution model and a new module for effectively combining 2D convolution and 3D convolution, wherein the model comprises the following components:
the 3D branch extracts the spatial-spectral information of the low-resolution hyperspectral input image through 3D convolution; the 2D branch is designed to be similar to a spectral super-resolution model and receives the information transferred from it;
in each block, the 3D features are split along the spectral dimension and corrected band by band by the 2D features in a feedback manner; applying distillation to only half of the channels of the 2D features helps reduce negative transfer, a technique referred to as semi-distillation;
the invention is the first to utilize privileged information from the spectral super-resolution task and designs a model for heterogeneous knowledge distillation; the introduction of a long residual connection makes the model more robust.
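The semi-distillation described above amounts to a simple channel split on each 2D feature map: only the first C/2 channels are exposed to the distillation loss. A minimal NumPy sketch (all shapes and variable names are illustrative assumptions, not the patented code):

```python
import numpy as np

B, C, sH, sW = 2, 16, 8, 8
f2 = np.random.default_rng(2).random((B, C, sH, sW))   # a 2D feature map of one DODB

# Semi-distillation: the first C/2 channels form the output part that the
# distillation loss acts on; the remaining channels are retained and kept
# free of the teacher's influence, which is intended to limit negative transfer.
distilled, retained = f2[:, :C // 2], f2[:, C // 2:]
assert distilled.shape == (B, C // 2, sH, sW)
assert retained.shape == (B, C // 2, sH, sW)
```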
Drawings
FIG. 1 is a schematic diagram of a DODN in accordance with the present invention, the upper half being a DODN network and the lower half being an AWAN SSR model;
FIG. 2 is a DODB schematic of the present invention;
FIG. 3 is a graph of the reconstruction and absolute error for the 630 nm band of the integer_state image in the CAVE dataset;
FIG. 4 is a graph of the reconstruction and absolute error for the 1st band of the ARAD_0463 image in the NTIRE 2020 dataset.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments; all other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
A hyperspectral image super-resolution method using heterogeneous knowledge distillation specifically comprises the following steps:
step two: shallow feature extraction is performed on the given image input, and the image information is fed into a 2D processing branch and a 3D processing branch; the 3D processing branch applies 3D convolution to extract the spatial-spectral information of the low-resolution hyperspectral input image, yielding shallow 3D features F_0^(3D); the 2D processing branch applies 2D convolution to obtain shallow 2D features F_0^(2D);
Step three: will be provided withAndsent to a distillation oriented double branch module DODB, using a cascaded DODB: hDODBGenerating a non-linear mapping; and discarding 2D features at the kth, i.e. last DODB moduleSimultaneously obtaining 3D characteristics of Kth DODB moduleObtaining shallow 3D featuresAdding;
step four: heterogeneous knowledge distillation and loss function calculation are performed; distillation is applied to half of each 2D feature, i.e. for each 2D feature F_k^(2D) of a DODB, the first C/2 channels serve as the output part for distillation and the remaining channels as the retained part; finally, a high-resolution hyperspectral image I_SR ∈ R^(L×sH×sW) is output through an upsampler, where s is the scale factor.
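The four steps above can be sketched at the level of tensor shapes. The following NumPy sketch replaces every learned convolution with a shape-preserving stand-in (channel broadcasts and nearest-neighbour upsampling); it only illustrates how an L×H×W input flows to an L×sH×sW output, not the patented network itself:

```python
import numpy as np

rng = np.random.default_rng(0)
L, H, W, s, K = 31, 24, 24, 4, 4       # bands, height, width, scale factor, number of DODBs
Cp, C = 8, 16                          # channel counts C' (3D branch) and C (2D branch)

i_lr = rng.random((L, H, W))           # step one: low-resolution HSI in R^(L x H x W)

# Step two: shallow features. The 3x3x3 and 3x3 convolutions are replaced by
# shape-preserving stand-ins (channel broadcast / nearest-neighbour upsampling).
f3_0 = np.broadcast_to(i_lr[None], (Cp, L, H, W)).copy()                  # shallow 3D features
i_up = np.kron(i_lr, np.ones((s, s)))                                     # L x sH x sW
f2_0 = np.broadcast_to(i_up.mean(axis=0)[None], (C, s * H, s * W)).copy() # shallow 2D features

# Step three: K cascaded DODB modules (identity stand-ins here); the 2D output
# of the last module is discarded, and the 3D output is added to the shallow features.
f3, f2 = f3_0, f2_0
for _ in range(K):
    f3, f2 = f3 + 0.0, f2 + 0.0        # placeholder for H_DODB
f3 = f3 + f3_0                         # long residual connection

# Step four: upsampler stand-in, producing I_SR in R^(L x sH x sW)
i_sr = np.kron(f3.mean(axis=0), np.ones((s, s)))
assert i_sr.shape == (L, s * H, s * W)
```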
In step two,
in the 3D processing branch, the low-resolution hyperspectral image input I_LR is reshaped to size 1 × L × H × W and passed through a 3 × 3 × 3 3D convolution to obtain the shallow 3D features F_0^(3D):
F_0^(3D) = H_3D(I_LR),
where H_3D denotes the 3 × 3 × 3 3D convolution;
in the 2D processing branch, the low-resolution hyperspectral image input I_LR is upsampled to L × sH × sW to match the spatial resolution of the spectral super-resolution (SSR) model input, where s is the scale factor, and a 3 × 3 2D convolution then produces the shallow 2D features F_0^(2D):
F_0^(2D) = H_2D(I_LR↑),
where I_LR↑ denotes the upsampled input and H_2D the 3 × 3 2D convolution;
in step three,
for the k-th DODB module (k = 1, ..., K):
(F_k^(3D), F_k^(2D)) = H_DODB(F_(k-1)^(3D), F_(k-1)^(2D));
a transposed 3D convolution followed by a 1 × 1 × 1 3D convolution is used as the upsampler; before passing through the upsampler, F_K^(3D) is added to F_0^(3D) (a long residual connection) to improve the robustness of the model;
the distillation-oriented double-branch module DODB consists of a 3D module, a 2D module and a feedback fusion module; the addition of the 2D branch makes part of the DODB similar to the 2D SSR model; a feedback mechanism fuses the 2D and 3D features band by band, because each band of the HSI is sensitive only to part of the photographed scene (the light energy is confined to a particular wavelength range), while the RGB image contains spatial information of the entire scene.
Wherein the 3D module and the 2D module, each built on the residual-block structure, respectively extract low-resolution 3D features F_k^(3D) ∈ R^(B×C'×L×H×W) and high-resolution 2D features F_k^(2D) ∈ R^(B×C×sH×sW), where C' and C denote the channel numbers of the 3D and 2D features respectively, and B denotes the batch size; pseudo-3D convolutions replace ordinary 3D convolutions to reduce the computational complexity;
in the feedback fusion module of the DODB, the 3D features are first upsampled to the same size as the 2D features and fused, then downsampled back to the original 3D feature size; fusing in the high-resolution space allows the 2D and 3D features to be fused band by band while preserving as much detail as possible in the 2D features. The 2D features receive information transferred from the high-resolution RGB image, so they contain all the spatial details at a lower capacity, while each spectral band of the 3D features can be viewed as a limited view of the high-resolution RGB image with richer spectral information.
After upsampling the 3D features, the 2D features and the 3D features are fused band by band: using the 2D features as feedback information, the 3D features are corrected per spectral band to obtain high-resolution 3D features; the upsampled 3D features are separated into L spectral bands F_l (l = 1, ..., L) along the spectral dimension,
where the size of each F_l is B × C' × sH × sW; the 2D features are concatenated with each spectral band separately, and a 2D convolution generates the fused features:
F'_l = H_fuse([F_l, F_k^(2D)]);
the same 2D convolution is shared across all spectral bands, so the 2D features act as feedback information that refines each spectral band of the 3D features; the fused features are reshaped to B × 1 × C' × sH × sW and stacked along the spectral dimension to obtain new 3D features of the same size as the upsampled 3D features;
in the downsampling process, cascaded 3 × 3 × 3 convolutions are used to downsample the fused 3D features; these 3D convolutions further extract spectral-spatial correlations;
the final output of the distillation-oriented double-branch module DODB is then the downsampled fused 3D features together with the 2D features F_k^(2D).
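The band-by-band feedback fusion described above can be sketched in NumPy as follows. The shared 2D convolution is replaced by a fixed 1×1 averaging stand-in; `fuse_band` and all shapes are illustrative assumptions, not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
B, Cp, C, L, sH, sW = 2, 4, 6, 5, 8, 8    # batch, C', C, bands, high-res H and W

f3_up = rng.random((B, Cp, L, sH, sW))    # 3D features after upsampling
f2 = rng.random((B, C, sH, sW))           # high-resolution 2D features (feedback)

def fuse_band(band, feedback):
    """Stand-in for the shared 2D convolution H_fuse: concatenate the 2D
    feedback features with one spectral band and project back to C' channels."""
    cat = np.concatenate([band, feedback], axis=1)   # B x (C'+C) x sH x sW
    w = np.ones((Cp, Cp + C)) / (Cp + C)             # fixed 1x1 'conv' weights
    return np.einsum('oc,bchw->bohw', w, cat)        # B x C' x sH x sW

# Separate the 3D features into L spectral bands, refine each with the same
# shared 2D convolution, then stack the results back into a 3D feature volume.
bands = [f3_up[:, :, l] for l in range(L)]               # each: B x C' x sH x sW
fused = [fuse_band(b, f2)[:, :, None] for b in bands]    # each: B x C' x 1 x sH x sW
f3_new = np.concatenate(fused, axis=2)                   # B x C' x L x sH x sW
assert f3_new.shape == f3_up.shape
```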
in the fourth step, (the step is used for improving the performance of the model in the training process and is not applied in the reasoning process)
the total loss function includes the reconstruction loss and the distillation loss; following the mainstream of the field, the L1 loss is chosen as the reconstruction loss:
L_rec = (1/N) Σ_(i=1)^(N) ||I_SR^(i) − I_HR^(i)||_1,
where N denotes the number of samples, and I_SR^(i) and I_HR^(i) are the i-th reconstructed image and the corresponding real high-resolution image, respectively;
for the distillation loss, the L1 norm is used to measure the distance between features from the SHSR model and features from the SSR model:
L_output = Σ_(j=1)^(S) ||G_j(F_j^(SHSR)) − F_j^(SSR)||_1,
where S denotes the number of features used in the distillation, F_j^(SHSR) and F_j^(SSR) are the features at the j-th distillation position of the SHSR model and the SSR model, and G_j is a transformation (a 1 × 1 convolution) that makes the channel numbers of the corresponding features equal;
the total loss function is then:
L_total = L_rec + λ·L_output (14)
where λ is a hyper-parameter for balancing the two parts, set to 0.05 in practical experiments.
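A minimal NumPy sketch of the total loss in equation (14), with random tensors in place of real network features and a fixed averaging matrix as a stand-in for the 1×1 channel-matching convolution G_j (all names and shapes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
N, L, sH, sW = 2, 5, 8, 8
lam = 0.05                                   # the balancing hyper-parameter λ

sr = rng.random((N, L, sH, sW))              # reconstructed images I_SR
hr = rng.random((N, L, sH, sW))              # real high-resolution images

# Reconstruction loss: mean L1 distance over the N samples.
l_rec = np.mean([np.abs(sr[i] - hr[i]).mean() for i in range(N)])

# Distillation loss: L1 distance between SHSR (student) features and SSR
# (teacher) features after the channel-matching transform G_j; the 1x1
# convolution is replaced here by a fixed averaging matrix.
S, c_student, c_teacher = 3, 8, 12
f_student = [rng.random((c_student, sH, sW)) for _ in range(S)]
f_teacher = [rng.random((c_teacher, sH, sW)) for _ in range(S)]
g = np.ones((c_teacher, c_student)) / c_student          # stand-in for G_j
l_output = sum(np.abs(np.einsum('oc,chw->ohw', g, fs) - ft).mean()
               for fs, ft in zip(f_student, f_teacher))

l_total = l_rec + lam * l_output             # equation (14)
assert l_total >= l_rec                      # the distillation term is non-negative
```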
CAVE: the CAVE dataset was collected by a cooled CCD camera at wavelengths ranging from 400 nm to 710 nm, divided into 31 bands. Its 32 images fall into five parts: genuine and fake, skin and hair, paintings, food and beverages, and objects; each image is 512 × 512 in size.
NTIRE 2020: hyperspectral image processing has long lacked large-scale datasets. Recently, the NTIRE spectral super-resolution challenge provided a dataset containing 510 hyperspectral images, one of the largest to date. Each image is 512 × 482 in size, also with 31 bands. Only the clean-track data are used, and since the ground truth of the test set is not accessible, comparisons are made on the validation set; thus 450 images are used for training and 10 images for testing.
For the CAVE dataset, 20 images are selected as the training set, and the remaining 12 images are used for testing. Each image is cropped into blocks of size 96 × 96 with a 48-pixel overlap between blocks, and the scale factor is set to 4; 10% of the blocks are randomly selected as the validation set. Bicubic downsampling is then used to generate the low-resolution inputs. The cropping and downsampling operations are applied to the hyperspectral and RGB images simultaneously to obtain well-registered HSI-RGB image pairs. Data augmentation is performed on the training samples using 90°, 180° and 270° rotations, vertical and horizontal flips, and combinations thereof. For the NTIRE2020 dataset, the only difference is that the image blocks are randomly cropped, with the number of blocks per image fixed at 24.
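The cropping and augmentation pipeline described above can be sketched as follows for one 512 × 512 CAVE image (`crop_patches` is an illustrative helper, not from the patent; the 48-pixel overlap corresponds to a stride of 48 for 96 × 96 blocks):

```python
import numpy as np

hsi = np.random.default_rng(4).random((31, 512, 512))   # one CAVE image (31 bands)

def crop_patches(img, size=96, stride=48):
    """Crop overlapping spatial patches (96x96 blocks, 48-pixel overlap)."""
    _, h, w = img.shape
    return [img[:, y:y + size, x:x + size]
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]

patches = crop_patches(hsi)
assert all(p.shape == (31, 96, 96) for p in patches)

# Eight-fold augmentation of one patch: 0/90/180/270-degree rotations in the
# spatial plane, plus a vertical flip of each (rotations + flips + combinations).
p = patches[0]
augmented = [np.rot90(p, k, axes=(1, 2)) for k in range(4)]
augmented += [np.flip(a, axis=1) for a in augmented]
assert len(augmented) == 8
```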
The DODN contains 4 DODBs, with C' = C = 64. AWAN is chosen as the SSR model, with 8 modules and 200 channels. Since the AWAN model and the DODN have different numbers of modules, the outputs of modules 2, 4, 6 and 8 of the AWAN are used for distillation together with the outputs of all four blocks of the DODN. The Adam optimizer is used with β1 = 0.9 and β2 = 0.999; the learning rate is initialized to 10^-4 and gradually decayed to 10^-5. The batch size is 12, and the model of the invention is trained for 200 epochs. The SSR model and the DODN are optimized alternately: in each mini-batch, the model with the smaller fixed error acts as the teacher, while the other updates its parameters. In practical experiments, a difference in the convergence speed of the two models was found to leave the SHSR model under-trained; therefore an officially pre-trained AWAN is used and fine-tuned at a smaller learning rate (10^-5).
Six widely used metrics assess the quality of the reconstructed images: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), cross-correlation coefficient (CC), spectral angle mapper (SAM), root mean square error (RMSE) and the dimensionless global error (ERGAS). PSNR and SSIM are computed as the average over all spectral bands. SAM is a common index for measuring the spectral difference between two hyperspectral images, and CC and ERGAS are widely used in hyperspectral image fusion; the remaining metrics quantitatively measure image recovery quality. The ideal values of these indices are +∞ for PSNR, 1 for SSIM and CC, and 0 for SAM, RMSE and ERGAS;
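Two of these metrics, band-averaged PSNR and SAM, can be computed as in the following NumPy sketch (the function names and the convention of averaging the spectral angle over all pixels are illustrative assumptions):

```python
import numpy as np

def psnr(x, y, peak=1.0):
    """Peak signal-to-noise ratio in dB, averaged over all spectral bands."""
    mse = np.mean((x - y) ** 2, axis=(1, 2))        # per-band mean squared error
    return np.mean(10 * np.log10(peak ** 2 / mse))

def sam(x, y, eps=1e-8):
    """Spectral angle (radians) between per-pixel spectra, averaged over pixels."""
    num = np.sum(x * y, axis=0)
    den = np.linalg.norm(x, axis=0) * np.linalg.norm(y, axis=0) + eps
    return np.mean(np.arccos(np.clip(num / den, -1.0, 1.0)))

rng = np.random.default_rng(5)
gt = rng.random((31, 64, 64))                       # a 31-band reference image
noisy = np.clip(gt + 0.01 * rng.standard_normal(gt.shape), 0, 1)

assert sam(gt, gt) < 1e-3      # identical spectra give (near-)zero angle
assert psnr(gt, noisy) > 30    # a small perturbation gives a high PSNR
```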
compared with the prior six SHSR methods, the method comprises Bicubic, EDSR, MCNet, ERCSR, SFCSR and ASFS; and two common data sets CAVE and ntie 2020 are used to verify the validity of the proposed DODN.
CAVE dataset: table 1 shows the quantitative comparison of the advanced SHSR method on CAVE datasets for different scale factors. It is clear that the process of the invention is superior to other processes in all respects. In all of these algorithms, EDSR is a classical model of single natural image super-resolution with pure 2D convolution, MCNet and ERCSR are 2D/3D hybrid convolution neural networks, and SFCSR is a sequence model. ASFS uses adjacent bands to independently reconstruct the central band in turn. Experimental results show that 2D CNN (like EDSR) can restore spatial detail at a reasonably good level, but poor results of SAM show that it causes severe spectral distortion. The 2D/3D hybrid models score relatively low on SSIM and CC, indicating that they cannot generate sufficient spatial detail. Furthermore, the sequence model and the 2D/3D mixture model have similar values on the SAM. However, by introducing knowledge from the SSR model, the model of the invention achieves a significant improvement in PSNR (+0.3dB) compared to the sub-optimal algorithm (MCNet) and also considerably reduces SAM (-0.1 rad), i.e., knowledge distillation improves both spatial and spectral reconstruction. As shown in fig. 3, the method of the present invention has a low absolute error, especially in the area containing rich texture. In addition, the model of the invention well reconstructs the edges of the color blocks on the right side frame, which shows that the model of the invention has strong capability of extracting the spatial correlation.
Table 1: result comparison of SHSR method on CAVE dataset
NTIRE2020 dataset: experiments on the NTIRE2020 dataset reveal the performance of existing SHSR methods on large-scale data. The quantitative results are summarized in Table 2. Surprisingly, EDSR performs best among all existing models except the method of the invention. This may be because EDSR has a more general architecture that leverages the rich data, which also suggests that what limits EDSR on small HSI datasets may not be its network prior but the lack of data. In contrast, single-band-output models that take adjacent bands as input, such as SFCSR and ASFS, converge prematurely with poor results. Table 2 shows that the model of the invention outperforms the other methods on all indices, demonstrating the effectiveness of the proposed algorithm. In particular, the method of the invention improves PSNR and SAM by +0.37 dB and −0.13 rad respectively over the second-best method, indicating that both spatial detail and spectrum are enhanced. The reconstructed spectral bands and the corresponding absolute errors are visualized in FIG. 4; the errors of the invention are smaller overall, especially in regions containing rich texture. For example, the leaf veins in the red rectangle are recovered well, which the other methods fail to achieve.
Table 2: result comparison of SHSR method on NTIRE2020 dataset
The goal of heterogeneous distillation is to transmit spatial and spectral information from the SSR model so that the SHSR model can gain knowledge from the other perspective and utilize information from both tasks simultaneously. Since the SSR model is a 2D convolutional network, all its spatial and spectral information is embedded in 2D features, which raises the problem of distilling heterogeneous knowledge into the 3D features of the SHSR model. The solution of the invention is to add a 2D branch to the model, isolating the 3D SHSR features from the 2D SSR features and shifting the task of distilling between 3D and 2D features to distilling between 2D features. The two-dimensional branch of the model is designed to be similar to the SSR model, which narrows the gap between the two models and thereby reduces the difficulty of heterogeneous knowledge distillation. The information of the two views is combined by feedback fusion, in which the 2D features refine the 3D features band by band; in this way, spatial detail from the HR RGB image is introduced and the information of one band is not contaminated by other bands. Table 3 compares the performance of the model with and without knowledge distillation on the CAVE dataset. In the model without knowledge distillation, only the L1 loss is used for training and all hyper-parameters are kept the same as in the original model. It can be observed that knowledge distillation significantly improves PSNR while doing little harm to SAM, confirming that the spatial details of the HR-MSI are effectively transferred to the model of the invention.
One possible reason for the slight increase in SAM may be the limited ability of the 2D SSR features to represent full spectral information, resulting in negative transfer during the knowledge distillation process.
Table 3: ablation analysis of knowledge distillation
The hyperspectral image super-resolution method by utilizing heterogeneous knowledge distillation, which is provided by the invention, is described in detail, the principle and the implementation mode of the invention are explained, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in specific embodiments and application ranges, and in summary, the content of the present specification should not be construed as a limitation to the present invention.
Claims (6)
1. A hyperspectral image super-resolution method utilizing heterogeneous knowledge distillation, characterized by comprising the following steps:
step one: a low-resolution hyperspectral image input ILR ∈ R^(L×H×W) is given, wherein L, H and W represent the number of spectral bands, the height and the width of the input image, respectively;
step two: shallow feature extraction is carried out on the given image input, and the image information is sent to a 2D processing branch and a 3D processing branch respectively; the 3D processing branch is processed by 3D convolution to extract the spatial-spectral information of the low-resolution hyperspectral input image and obtain shallow 3D features, and the 2D processing branch is processed by 2D convolution to obtain shallow 2D features;
step three: the shallow 3D features and shallow 2D features are sent to the distillation-oriented double-branch module DODB, and a cascade of DODB modules, HDODB, generates the non-linear mapping; the 2D features are discarded at the K-th, i.e. the last, DODB module, while the 3D features output by the K-th DODB module are obtained and added to the shallow 3D features;
step four: heterogeneous knowledge distillation and loss calculation are performed; distillation is carried out on half of the channels of the 2D features, i.e. for each 2D feature of a DODB, the first C/2 channels are used as the output part for distillation and the remaining channels are used as the retention part; finally, a high-resolution hyperspectral image ISR ∈ R^(L×sH×sW) is output through an up-sampler, where s is the scale factor.
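The half-channel split of step four can be sketched in code. This is a hypothetical illustration, not the patented implementation; the tensor shapes and names are assumptions.

```python
# Hypothetical sketch of step four's channel split: for each DODB 2D
# feature map, the first C/2 channels are the output part used for
# distillation and the rest are the retention part.
import torch

def split_for_distillation(feat_2d: torch.Tensor):
    """feat_2d: (B, C, sH, sW) 2D feature map from a DODB module."""
    c = feat_2d.shape[1]
    distill_part = feat_2d[:, : c // 2]   # compared against teacher features
    retain_part = feat_2d[:, c // 2 :]    # kept by the student branch
    return distill_part, retain_part

x = torch.randn(2, 64, 32, 32)            # assumed B=2, C=64 feature map
distill_part, retain_part = split_for_distillation(x)
```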
2. The method of claim 1, wherein, in step two:
in the 3D processing branch, the low-resolution hyperspectral image input ILR is expanded to the size 1 × L × H × W and then passed through a 3 × 3 × 3 3D convolution to obtain the shallow 3D features;
in the 2D processing branch, the low-resolution hyperspectral image input ILR is up-sampled to L × sH × sW to match the spatial resolution of the spectral super-resolution (SSR) model input, and then passed through a 3 × 3 2D convolution to obtain the shallow 2D features, where s is the scale factor;
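The two shallow branches of step two can be sketched as follows. The band count, spatial size, channel widths, and interpolation mode are assumptions for illustration only.

```python
# Hypothetical sketch of the shallow 3D and 2D processing branches.
import torch
import torch.nn as nn
import torch.nn.functional as F

L, H, W, s = 31, 16, 16, 2           # assumed band count, spatial size, scale
i_lr = torch.randn(1, L, H, W)       # low-resolution hyperspectral input

# 3D branch: view the input as 1 x L x H x W and apply a 3x3x3 convolution
conv3d = nn.Conv3d(1, 16, kernel_size=3, padding=1)
f3d_0 = conv3d(i_lr.unsqueeze(1))    # shallow 3D features (1, 16, L, H, W)

# 2D branch: up-sample to L x sH x sW to match the SSR input, then 3x3 conv
i_up = F.interpolate(i_lr, scale_factor=s, mode="bicubic",
                     align_corners=False)
conv2d = nn.Conv2d(L, 64, kernel_size=3, padding=1)
f2d_0 = conv2d(i_up)                 # shallow 2D features (1, 64, sH, sW)
```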
4. The method of claim 3, further comprising:
the distillation-oriented double-branch module DODB consists of a 3D module, a 2D module and a feedback fusion module;
wherein the 3D module and the 2D module, each built on the structure of a residual block, respectively extract low-resolution 3D features and high-resolution 2D features, where C′ and C represent the numbers of channels of the 3D features and the 2D features respectively, and B represents the batch size;
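The residual-block structure of the 3D and 2D modules can be sketched as below. The layer widths and the exact block composition are assumptions; the patent only specifies that both modules follow a residual-block structure.

```python
# Hypothetical residual-block sketches for the DODB 3D and 2D modules.
import torch
import torch.nn as nn

class ResBlock3D(nn.Module):
    """3D module: extracts low-resolution 3D features (B, C', L, H, W)."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(c, c, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)          # residual connection

class ResBlock2D(nn.Module):
    """2D module: extracts high-resolution 2D features (B, C, sH, sW)."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)          # residual connection

f3d = ResBlock3D(8)(torch.randn(1, 8, 5, 8, 8))    # (B, C', L, H, W)
f2d = ResBlock2D(16)(torch.randn(1, 16, 16, 16))   # (B, C, sH, sW)
```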
5. the method of claim 4, further comprising:
in the feedback fusion module of the DODB, the 3D features are first up-sampled to the same size as the 2D features and fused with them, and then down-sampled back to the original size of the 3D features;
after up-sampling the 3D features, the 2D features and the 3D features are fused band by band: the 2D features are used as feedback information to correct the 3D features per spectral band, yielding high-resolution 3D features;
wherein Fl, the l-th spectral band of the up-sampled 3D features, has size B × C′ × sH × sW; the 2D features are concatenated with each spectral band separately, and a 2D convolution generates the fused features;
the same 2D convolution is applied to all bands; the fused features are expanded to B × 1 × C′ × sH × sW and then stacked along the band dimension to obtain new 3D features of the same size as the up-sampled 3D features;
in the downsampling process, cascaded 3 × 3 × 3 convolution pairs are usedCarrying out down-sampling;
the final output of the distillation oriented double branch module DODB is then:
6. The method of claim 5, wherein, in step four:
the total loss function comprises the reconstruction loss and the distillation loss; the L1 loss is chosen as the reconstruction loss:
Lrec = (1/N) Σi ‖Îi − Ii‖1
wherein N represents the number of samples, and Îi and Ii are the i-th spectral band of the reconstructed image and of the real high-resolution image, respectively;
for the distillation loss, the L1 norm is used to measure the distance between the features from the SHSR model and the features from the SSR model:
Loutput = (1/S) Σj ‖Gj(Fj_SHSR) − Fj_SSR‖1
wherein S represents the number of features used in the distillation, Fj_SHSR and Fj_SSR are the features of the j-th layer of the SHSR model and the SSR model respectively, and Gj is a transformation, namely a 1 × 1 convolution, used to ensure that the two corresponding features have the same number of channels;
the total loss function is then:
Ltotal = Lrec + λLoutput (14)
where λ is a hyper-parameter for balancing the two parts, set to 0.05 in practical experiments.
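Equation (14) can be sketched numerically as below. The feature shapes and channel counts are assumptions; only λ = 0.05, the L1 losses, and the 1 × 1 channel-matching convolution Gj come from the claims.

```python
# Hypothetical sketch of the total loss: reconstruction + λ * distillation.
import torch
import torch.nn as nn
import torch.nn.functional as F

lam = 0.05                              # λ from the experiments
sr = torch.randn(1, 31, 64, 64)         # reconstructed HR hyperspectral image
hr = torch.randn(1, 31, 64, 64)         # ground-truth HR image
l_rec = F.l1_loss(sr, hr)               # L1 reconstruction loss

student = torch.randn(1, 32, 16, 16)    # j-th feature of the SHSR 2D branch
teacher = torch.randn(1, 64, 16, 16)    # j-th feature of the SSR teacher
g_j = nn.Conv2d(32, 64, kernel_size=1)  # G_j: 1x1 conv matching channels
l_out = F.l1_loss(g_j(student), teacher)

l_total = l_rec + lam * l_out           # equation (14)
```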
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111288667.2A CN114092327B (en) | 2021-11-02 | 2021-11-02 | Hyperspectral image super-resolution method utilizing heterogeneous knowledge distillation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114092327A true CN114092327A (en) | 2022-02-25 |
CN114092327B CN114092327B (en) | 2024-06-07 |
Family
ID=80298625
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113870109A (en) * | 2021-09-07 | 2021-12-31 | 西安理工大学 | Step-by-step spectrum super-resolution method based on spectrum back projection residual |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB1457758A (en) * | 1972-11-29 | 1976-12-08 | Schlumberger Ltd | Methods for producing signals representative of parameters used in evaluating earth formations |
CN110111276A (en) * | 2019-04-29 | 2019-08-09 | 西安理工大学 | Based on sky-spectrum information deep exploitation target in hyperspectral remotely sensed image super-resolution method |
AU2020100200A4 (en) * | 2020-02-08 | 2020-06-11 | Huang, Shuying DR | Content-guide Residual Network for Image Super-Resolution |
CN113222823A (en) * | 2021-06-02 | 2021-08-06 | 国网湖南省电力有限公司 | Hyperspectral image super-resolution method based on mixed attention network fusion |
CN113240580A (en) * | 2021-04-09 | 2021-08-10 | 暨南大学 | Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation |
WO2021185225A1 (en) * | 2020-03-16 | 2021-09-23 | 徐州工程学院 | Image super-resolution reconstruction method employing adaptive adjustment |
Non-Patent Citations (1)
Title |
---|
DONG Xiaohui; GAO Ge; CHEN Liang; HAN Zhen; JIANG Junjun: "Data-driven local feature transformation for noisy face hallucination", Journal of Computer Applications, no. 12, 10 December 2014 (2014-12-10) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |