CN108830796B - Hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss


Info

Publication number
CN108830796B
CN108830796B (application number CN201810639042.8A)
Authority
CN
China
Prior art keywords
image
gradient
value
network
hyperspectral
Prior art date
Legal status
Expired - Fee Related
Application number
CN201810639042.8A
Other languages
Chinese (zh)
Other versions
CN108830796A (en)
Inventor
王敏全
丁溢洋
尚赵伟
秦安勇
赵林畅
Current Assignee
Chongqing University
Original Assignee
Chongqing University
Priority date
Filing date
Publication date
Application filed by Chongqing University
Priority to CN201810639042.8A
Publication of CN108830796A
Application granted
Publication of CN108830796B
Status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss, and belongs to the field of image super-resolution reconstruction. The method comprises the following steps: S1: obtaining a hyperspectral image; S2: dividing the hyperspectral images into a training set and a test set; S3: inputting the training set into a spectral-spatial combined neural network, and training it with the joint loss of the spatial domain and the gradient domain; S4: passing the test set through the neural network to obtain the final reconstruction result. Compared with the prior art, the method adopted by the invention has a lighter network structure, higher reconstruction quality, and stronger noise resistance.

Description

Hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss
Technical Field
The invention belongs to the field of image super-resolution reconstruction, and relates to a hyperspectral image super-resolution reconstruction method based on a spectrum-space combination network and gradient domain loss.
Background
Hyperspectral images have low spatial resolution and are prone to mixed end members, which causes spectral distortion and destroys the spatial and spectral consistency of the end members. According to the input information, hyperspectral image super-resolution reconstruction can be roughly divided into two types: reconstruction methods based on image fusion (with an auxiliary image) and reconstruction methods without an auxiliary image. Fusion-based methods use an RGB, Panchromatic (PAN), or Multispectral (MS) image as an auxiliary input and jointly constrain the reconstruction with spatial and spectral information, so as to unmix end members and reduce spectral distortion. In RGB-assisted reconstruction, the hyperspectral super-resolution problem is treated as an optimization problem with sparse, non-negative constraints on the LR image and the corresponding RGB image. Akhtar et al. use a sparse coding strategy to reconstruct the hyperspectral image, fully exploiting the spatial structure as well as the non-negativity and sparsity of the signal. Dong et al. propose Non-Negative Structured Sparse Representation (NNSR) to explore the spatial correlation of sparse image representations, reconstructing the HR image from the hyperspectral LR image and the corresponding RGB image. Based on the end-member consistency of a hyperspectral image and a PAN image of the same scene, Zhao et al. train an overcomplete dictionary pair from the hyperspectral image and the corresponding PAN image, and use non-local similarity to express the mapping between the HR and LR images. Because a PAN image is a gray-scale image formed by mixing visible-light bands, its spectral resolution is low, and spectral distortion easily arises when it is fused with a hyperspectral image. Compared with PAN images, MS images have lower spatial resolution but rich spectral information, with several to more than ten bands, from which the color information of ground objects can be obtained. Zhang et al. propose a wavelet-based Bayesian fusion framework that uses a Gaussian mixture model as a prior to constrain the wavelet coefficients, fusing the hyperspectral LR image with a high-resolution MS image to finally obtain the hyperspectral HR image. Wei et al. obtain an overcomplete dictionary by training on observed images, fuse hyperspectral and MS images with a sparsity-regularized optimization framework, and solve it by alternately optimizing the projected target image and the coefficient coding. In practical applications, however, it is difficult to obtain MS images with matching spectral coverage of the same scene, and differences in wavelength range can degrade the quality of the hyperspectral reconstruction. Reconstruction methods without an auxiliary image have therefore received much attention because of their simple acquisition requirements.
Reconstruction without an auxiliary image uses only the hyperspectral LR image and constrains the reconstruction with the combined spatial and spectral information of the hyperspectral image. Xing et al. learn a dictionary from hyperspectral images and use the Beta-Bernoulli process to improve the self-consistency of the dictionary, but the computational complexity is high. Akgun et al. model a hyperspectral image as a plurality of convex sets containing high-dimensional features, use POCS (Projection Onto Convex Sets) to obtain the solution space of the HR image, and add prior-information constraints, so the edge and detail information of the image is well maintained, but the solution depends too heavily on the initial value and is not unique. Zhang et al. propose a hyperspectral image super-resolution reconstruction algorithm based on MAP (Maximum A Posteriori), adding prior knowledge to guarantee the uniqueness of the solution, dividing the bands of the hyperspectral image into three groups, and using principal component analysis to reduce the computation and remove redundant information. In recent years, deep learning has been widely applied to hyperspectral image reconstruction. Yuan et al. draw on the idea of transfer learning and use the mapping between LR and HR natural images to help reconstruct the hyperspectral HR image, but the method does not consider the correlation between the spectral bands of the hyperspectral image. The 3D-FCNN extracts spectral-spatial features with three-dimensional convolution, and hyperspectral images collected by the same detector can be reconstructed without retraining. Li et al. learn the differences between the spectral bands of hyperspectral images with a deep convolutional network, initialize the HR image by combining these differences with the LR image, and iteratively update the HR image until convergence by using an IBP algorithm to minimize the simulation error between the LR image and the one obtained by back-projecting the HR image.
Generally speaking, the prior art has the following problems in hyperspectral image super-resolution:
1) time complexity and reconstruction quality cannot both be achieved at the same time;
2) existing methods cannot capture the spectral information of the hyperspectral image to realize spectral-spatial combination, so the reconstruction quality is poor.
Disclosure of Invention
In view of the above, the present invention provides a hyperspectral image super-resolution reconstruction method based on a spectral-spatial combined network and gradient domain loss, in which all the spectral information of the hyperspectral image is used as input to the neural network, the correlations between different spectral bands are fully exploited by pseudo-3D convolution and residual learning to improve the reconstruction, and the detail quality of the reconstructed image is improved by a loss function combining gradient-domain and spatial-domain losses. Compared with the prior art, the method adopted by the invention obtains a clearer reconstruction under the same space, time, and other conditions, and is superior to the prior art.
In order to achieve the purpose, the invention provides the following technical scheme:
a hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss comprises the following steps:
s1: obtaining a hyperspectral image;
s2: dividing the hyperspectral images into a training set and a test set;
s3: inputting the training set into a neural network with spectrum and space combination, and training by utilizing the joint loss of a space domain and a gradient domain;
s4: and (5) passing the test set through a neural network to obtain a final reconstruction result.
Further, in step S3, the spectral-spatial combined neural network SSRNet is a convolutional neural network based on deep learning; it converts the HR image reconstruction task into fitting the residual between the HR image and the LR image, and extracts spectral-spatial features with pseudo-three-dimensional convolution, thereby improving both spatial and spectral resolution. The SSRNet uses a loss function combining the pixel-domain Charbonnier loss with a gradient-domain loss to improve the quality of the reconstructed image, and adopts a multi-scale training mode to complete the image reconstruction task under various sampling factors.
Further, the SSRNet structure includes 14 convolutional layers, and every convolutional layer except the last is followed by an activation function layer;
1) deep residual learning: A. a skip connection is added between the network input and output, so that the network learns the residual between the high/low-resolution images; the network parameter weights are then relatively sparse, which accelerates convergence; B. residual blocks are used in the network, reducing the risk of gradient vanishing or explosion as the network deepens;
2) introducing a Pseudo-three-dimensional Convolution (P3D) residual block;
the number of residual blocks can be modified according to the available computing resources; increasing the number of residual blocks can improve reconstruction quality but increases computational resource consumption;
for a 3x3x3 three-dimensional convolution kernel, P3D replaces it with a 3x1x1 one-dimensional spectral convolution kernel and a 1x3x3 two-dimensional spatial convolution kernel (a minimal sketch is given after this list); compared with a two-dimensional convolutional neural network of the same depth, the pseudo-three-dimensional convolutional neural network effectively reduces the number of parameters and the model size; meanwhile, when designing the residual block, 1x1x1 bottleneck layers are added at the head and tail ends to reduce the sizes of the input and output feature maps of the two-dimensional spatial convolution kernel and the one-dimensional spectral convolution kernel, which lowers the computational cost and makes it convenient to increase the network depth;
3) the BN layer is removed from the pseudo-three-dimensional convolution residual module, improving network performance;
the main function of the BN layer is to prevent gradients from vanishing or exploding, but the invention removes it when designing the network structure, mainly for two reasons: first, a BN layer usually requires a large batch size, yet because hyperspectral image data are large and high-dimensional, a large batch size sharply increases computational resource consumption, so a small batch size is adopted when designing the network structure, which is unsuitable for a BN layer; second, while standardizing features, the BN layer also reduces the range flexibility of the network, i.e., the range of variation the network can represent or respond to; therefore, for image super-resolution reconstruction, removing the BN layer does not reduce the image reconstruction quality, but improves network performance and reduces GPU memory usage.
4) All activation functions in the network are ReLU, which accelerates convergence and prevents gradient explosion and vanishing.
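To make point 2) concrete, the following minimal sketch (in PyTorch, an assumption of this sketch rather than part of the invention; the channel count of 16 is taken from the embodiment described below) contrasts a full 3x3x3 three-dimensional convolution with its P3D factorization into a 3x1x1 spectral kernel followed by a 1x3x3 spatial kernel, and prints the resulting parameter counts:

import torch
import torch.nn as nn

channels = 16

# Full three-dimensional convolution with a 3x3x3 kernel.
conv3d_full = nn.Conv3d(channels, channels, kernel_size=(3, 3, 3), padding=(1, 1, 1))

# P3D factorization: a one-dimensional spectral convolution followed by a
# two-dimensional spatial convolution (both expressed as Conv3d modules).
conv_spectral = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0))
conv_spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3), padding=(0, 1, 1))

def n_params(*mods):
    return sum(p.numel() for m in mods for p in m.parameters())

print("full 3x3x3:", n_params(conv3d_full))                         # 6928 parameters
print("P3D 3x1x1 + 1x3x3:", n_params(conv_spectral, conv_spatial))  # 3104 parameters

# The factorized pair covers the same 3x3x3 neighbourhood with fewer weights,
# which is the parameter and model-size reduction referred to above.
x = torch.randn(1, channels, 31, 32, 32)  # (batch, channels, bands, height, width)
assert conv_spatial(conv_spectral(x)).shape == conv3d_full(x).shape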
Further, in step S3, the loss function is the objective function optimized by the neural network and evaluates the difference between the predicted value and the true value; a well-chosen loss function improves the network convergence speed and the quality of the prediction, while a poor one reduces the overall performance of the network;
for super-resolution reconstruction methods based on deep learning, the most common loss function is the MSE, minimized between pixel pairs, and minimizing MSE directly raises the PSNR value. Although MSE is simple to optimize, the network then returns the average of several possible images, and outliers increase the MSE sharply, so the reconstructed image usually lacks high-frequency content, its texture is over-smoothed, and the visual effect is unnatural. The invention adopts a joint loss function combining the Charbonnier loss with a gradient-domain loss, which is more robust and preserves texture.
1) Charbonnier loss function
$$L_{\mathrm{Char}}(\hat{X},X)=\frac{1}{m}\sum_{i=1}^{m}\rho\left(\hat{X}^{(i)}-X^{(i)}\right),\qquad \rho(x)=\sqrt{x^{2}+\varepsilon^{2}}$$

where $m$ is the number of samples, $\hat{X}^{(i)}$ denotes the pixel value of the ith pixel of the prediction, $X^{(i)}$ denotes the pixel value of the ith pixel of the true value, and $\varepsilon$ acts as a threshold on the difference between each pixel pair of the LR and HR images so as to reduce the interference of outliers; in the invention, the parameter $\varepsilon$ is 0.001.
2) gradient domain loss function
Firstly, the LR and HR images need to be transferred to the gradient domain by an image gradient algorithm, and the image gradient is calculated by the following formula:
horizontal direction:
dx(i,j)=I(i+1,j)-I(i,j)
vertical direction:
dy(i,j)=I(i,j+1)-I(i,j)
where I is the image pixel value and (i, j) are the coordinates of the pixel; the Charbonnier loss is computed on the resulting gradient-domain feature maps, finally giving the joint loss function:
$$L=\frac{1}{m}\sum_{i=1}^{m}\left[\rho\left(\hat{X}^{(i)}-X^{(i)}\right)+\alpha_{1}\,\rho\left(d_{x}\hat{X}^{(i)}-d_{x}X^{(i)}\right)+\alpha_{2}\,\rho\left(d_{y}\hat{X}^{(i)}-d_{y}X^{(i)}\right)\right]$$

where $\alpha_{1}$ and $\alpha_{2}$ are the weighting coefficients of the gradient losses in the horizontal and vertical directions respectively, both set to 0.5 in the invention; $d_{x}\hat{X}^{(i)}$ denotes the gradient value of the ith pixel of the prediction in the x direction, $d_{y}\hat{X}^{(i)}$ the gradient value in the y direction, and $d_{x}X^{(i)}$ and $d_{y}X^{(i)}$ the corresponding gradient values of the true value.
The invention has the following beneficial effects: aiming at the difficulty the prior art has in jointly exploiting spectral and spatial-domain information, and at its poor texture quality in reconstruction, the invention provides a hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss. Compared with the prior art, the method adopted by the invention obtains a clearer reconstruction under the same space, time, and other conditions. The network structure is lighter, with higher reconstruction quality and stronger noise resistance.
Drawings
In order to make the object, technical scheme, and beneficial effects of the invention clearer, the following drawings are provided for explanation:
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram of the SSRNet network architecture according to the present invention;
FIG. 3 is a super-resolution reconstruction comparison of the hyperspectral image Flowers (sampling factor s = 2);
FIG. 4 is a super-resolution reconstruction comparison of the hyperspectral image Flowers (sampling factor s = 3);
FIG. 5 is a super-resolution reconstruction comparison of the hyperspectral image Flowers (sampling factor s = 4);
FIG. 6 is a super-resolution reconstruction comparison of the hyperspectral image Img1 (sampling factor s = 2);
FIG. 7 is a super-resolution reconstruction comparison of the hyperspectral image Img1 (sampling factor s = 3);
FIG. 8 is a super-resolution reconstruction comparison of the hyperspectral image Img1 (sampling factor s = 4);
reference numerals: (a) original image, (b) Bicubic, (c) SRCNN, (d) VDSR, (e) DRCN, (f) EDSR, (g) 3D-FCNN, (h) the proposed SSRNet.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
FIG. 1 is a flow chart of a hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss according to the invention; referring to fig. 1, a hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss comprises the following steps:
s1: obtaining a hyperspectral image;
s2: dividing the hyperspectral images into a training set and a test set;
s3: inputting the training set into a neural network with spectrum and space combination, and training by utilizing the joint loss of a space domain and a gradient domain;
s4: and (5) passing the test set through a neural network to obtain a final reconstruction result.
FIG. 2 shows the structure of the neural network used in steps S3 and S4. The spectral-spatial combined neural network is constructed as follows:
the purpose of the image super-resolution reconstruction task is to learn a mapping relationship from a low-resolution image to a high-resolution image, which can be expressed as the following formula:
y=f(x)
where y denotes a high-resolution image, x denotes a low-resolution image, and f denotes a mapping relation to be learned. In deep learning, such a mapping relationship is often represented by a convolutional neural network having a plurality of convolutional layers.
SSRNet is the convolutional neural network designed by the present invention. It includes 14 convolutional layers, and every convolutional layer except CONV_5 and CONV_6 is followed by an activation function layer. The core ideas and construction details of the network design are as follows:
1) Exploiting the advantages of deep residual learning
a. A skip connection is added between the network input and output, so that the network learns the residual between the high/low-resolution images; the network parameter weights are then relatively sparse, which accelerates convergence; b. residual blocks are used in the network, reducing the risk of gradient vanishing or explosion as the network deepens.
2) CONV_1 is a one-dimensional spectral convolution with a kernel size of 1x1x7 and 64 kernels, followed by a ReLU activation function; it is mainly used to learn inter-spectral features.
3) A Pseudo three-dimensional Convolution (P3D) residual block was introduced.
CONV_2 is a bottleneck layer with a 1x1x1 kernel, 16 kernels, and a ReLU activation function. It reduces the sizes of the input and output feature maps of the two-dimensional spatial convolution kernel and the one-dimensional spectral convolution kernel, lowering the computational cost and making it convenient to increase the network depth. CONV_3 and CONV_4 form the core of the pseudo-three-dimensional convolution, replacing a 3x3x3 three-dimensional convolution kernel with a 3x1x1 one-dimensional spectral convolution kernel and a 1x3x3 two-dimensional spatial convolution kernel. CONV_3 and CONV_4 use the ReLU activation function, each with 16 kernels. The pseudo-three-dimensional convolution learns the spectral-spatial features of the hyperspectral image and, compared with a two-dimensional convolutional neural network of the same depth, effectively reduces the number of parameters and the model size. CONV_5 is a bottleneck layer with a 1x1x1 kernel and 32 kernels.
4) The BN layer is removed when designing the pseudo-three-dimensional convolution residual module. A BN layer is often placed after the convolutional layer in a residual block to normalize its output. Its main function is to prevent gradients from vanishing or exploding, but we remove it when designing the network structure, mainly for two reasons: a. a BN layer usually requires a large batch size, yet because hyperspectral image data are large and high-dimensional, a large batch size sharply increases computational resource consumption, so a small batch size is adopted when designing the network structure, which is unsuitable for a BN layer; b. while standardizing features, the BN layer also reduces the range flexibility of the network, i.e., the range of variation the network can represent or respond to. Therefore, for image super-resolution reconstruction, removing the BN layer does not reduce the image reconstruction quality, but improves network performance and reduces GPU memory usage.
5) The network stacks three pseudo-three-dimensional convolution residual blocks designed as above; the number of residual blocks can be modified according to the available computing resources. Increasing the number of residual blocks can improve reconstruction quality but increases computational resource consumption.
6) CONV_6 is the last layer of the network, a convolutional layer with a 1x1x1 kernel and a single kernel, which performs the final mapping to obtain the reconstructed image.
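Putting the six layer groups together, the following is a minimal PyTorch sketch of the SSRNet structure described above. The patent specifies kernel sizes and kernel counts but not every wiring detail; the (bands, height, width) axis ordering, the 1x1x1 projection on the inner skip connection where a residual block changes channel width, and the use of a bicubically pre-upsampled LR cube as input are assumptions of this sketch. Counting CONV_1, three blocks of four convolutions, and CONV_6 gives the 14 convolutional layers mentioned above.

import torch
import torch.nn as nn

class P3DResidualBlock(nn.Module):
    # CONV_2..CONV_5: bottleneck -> spectral conv -> spatial conv -> bottleneck.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, 16, kernel_size=1), nn.ReLU(inplace=True),              # CONV_2
            nn.Conv3d(16, 16, (3, 1, 1), padding=(1, 0, 0)), nn.ReLU(inplace=True),  # CONV_3 (spectral)
            nn.Conv3d(16, 16, (1, 3, 3), padding=(0, 1, 1)), nn.ReLU(inplace=True),  # CONV_4 (spatial)
            nn.Conv3d(16, out_ch, kernel_size=1),                                    # CONV_5 (no activation)
        )
        # Assumed: a 1x1x1 projection so the inner skip also works across a width change.
        self.skip = nn.Conv3d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return self.body(x) + self.skip(x)

class SSRNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(                                   # CONV_1: 1D spectral conv
            nn.Conv3d(1, 64, (7, 1, 1), padding=(3, 0, 0)), nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(                                 # three P3D residual blocks
            P3DResidualBlock(64, 32), P3DResidualBlock(32, 32), P3DResidualBlock(32, 32))
        self.tail = nn.Conv3d(32, 1, kernel_size=1)                  # CONV_6 (no activation)

    def forward(self, x):
        # Global skip connection: the network fits the residual between the HR cube
        # and the pre-upsampled LR cube, then adds the input back.
        return x + self.tail(self.blocks(self.head(x)))

# Input: a pre-upsampled LR cube of shape (batch, 1, bands, height, width).
y = SSRNet()(torch.randn(2, 1, 31, 32, 32))
print(y.shape)  # torch.Size([2, 1, 31, 32, 32])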
Wherein the combined loss of spatial and gradient domains involved in step S3 may be expressed as follows:
the loss function is an objective function of neural network optimization, and aims to evaluate the difference between a predicted value and a true value, and the excellent loss function can improve the network convergence speed and the quality of the predicted value, otherwise, the overall performance of the network is reduced.
For the super-resolution reconstruction method based on deep learning, the most common loss function is MSE based on the minimization between pixel pairs, and the minimization of MSE can intuitively promote the PSNR value. Although the MSE optimization is simple, the network returns the average value of a plurality of possible images, and the MSE value is increased sharply when abnormal points exist, so that the reconstructed image usually lacks high-frequency content, and the texture information is too smooth and the visual effect is not natural. The method adopts a joint loss function combining Charbonier with gradient domain, which has better robustness and has texture retention.
1) Charbonnier loss function
$$L_{\mathrm{Char}}(\hat{X},X)=\frac{1}{m}\sum_{i=1}^{m}\rho\left(\hat{X}^{(i)}-X^{(i)}\right),\qquad \rho(x)=\sqrt{x^{2}+\varepsilon^{2}}$$

where $m$ is the number of samples, $\hat{X}^{(i)}$ denotes the pixel value of the ith pixel of the prediction, $X^{(i)}$ denotes the pixel value of the ith pixel of the true value, and $\varepsilon$ acts as a threshold on the difference between each pixel pair of the LR and HR images so as to reduce the interference of outliers; in the invention, the parameter $\varepsilon$ is 0.001.
2) gradient domain loss function
Firstly, the LR and HR images need to be transferred to the gradient domain by an image gradient algorithm, and the image gradient is calculated by the following formula:
horizontal direction:
dx(i,j)=I(i+1,j)-I(i,j)
vertical direction:
dy(i,j)=I(i,j+1)-I(i,j)
where I is the image pixel value and (i, j) are the coordinates of the pixel; the Charbonnier loss is computed on the resulting gradient-domain feature maps, finally giving the joint loss function:
$$L=\frac{1}{m}\sum_{i=1}^{m}\left[\rho\left(\hat{X}^{(i)}-X^{(i)}\right)+\alpha_{1}\,\rho\left(d_{x}\hat{X}^{(i)}-d_{x}X^{(i)}\right)+\alpha_{2}\,\rho\left(d_{y}\hat{X}^{(i)}-d_{y}X^{(i)}\right)\right]$$

where $\alpha_{1}$ and $\alpha_{2}$ are the weighting coefficients of the gradient losses in the horizontal and vertical directions respectively, both set to 0.5 in the invention; $d_{x}\hat{X}^{(i)}$ denotes the gradient value of the ith pixel of the prediction in the x direction, $d_{y}\hat{X}^{(i)}$ the gradient value in the y direction, and $d_{x}X^{(i)}$ and $d_{y}X^{(i)}$ the corresponding gradient values of the true value.
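The joint loss above can be sketched directly in code (PyTorch again, an assumption of this sketch; so is the mapping of the tensor's last two axes to the horizontal and vertical image directions):

import torch

def charbonnier(pred, target, eps=1e-3):
    # rho(x) = sqrt(x^2 + eps^2), averaged over all pixels; eps = 0.001 as above.
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

def joint_loss(pred, target, alpha1=0.5, alpha2=0.5):
    # Forward differences along the two spatial axes, matching
    # dx(i,j) = I(i+1,j) - I(i,j) and dy(i,j) = I(i,j+1) - I(i,j).
    dx = lambda t: t[..., 1:, :] - t[..., :-1, :]
    dy = lambda t: t[..., :, 1:] - t[..., :, :-1]
    return (charbonnier(pred, target)
            + alpha1 * charbonnier(dx(pred), dx(target))
            + alpha2 * charbonnier(dy(pred), dy(target)))

# Usage on a reconstructed cube and its ground truth:
pred = torch.rand(2, 1, 31, 32, 32)
truth = torch.rand(2, 1, 31, 32, 32)
print(joint_loss(pred, truth))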
Example:
1. experimental data
The CAVE dataset contains hyperspectral data of 32 different objects; each object has 31 images from different wavelength bands, each image is 512x512 in size, and the bands range from 400nm to 700nm (10nm apart). The Harvard dataset contains 50 real-world outdoor/indoor images under daylight and 27 images under artificial or mixed illumination; each data block is 1392x1040x31 in size, with the 31 bands evenly distributed between 420nm and 720nm. For each Harvard sample, the experiment crops it to 1024x1024x31 to facilitate computation.
In order to compare the quality of the reconstructed image, the average value of the wave band range of 400nm-500nm (or 420nm-520nm) is used as the B channel of the color image, the average value of 500nm-600nm (or 520nm-620nm) is used as the G channel of the color image, and the average value of 600nm-700nm (or 620nm-720nm) is used as the R channel of the color image in the experiment.
The experimental test set comprises 17 data items from different objects: 7 from the CAVE dataset (Balloons, Chart and stuffed toy, Faces, Flowers, Jelly beans, Oil painting, Real and fake apples) and 10 from the Harvard dataset (Img1, Img2, Img3, Img4, Img5, Img6, Imga1, Imga2, Imga3, Imga4). The data of the remaining 92 objects are used as the training set, which is randomly cropped into 32x32x31 patches, yielding approximately 50000 patches. In the experiment, the original image serves as the real high-resolution image, and Bicubic downsampling produces the corresponding low-resolution image.
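This data preparation can be sketched as follows, assuming each cube is held as a (bands, height, width) tensor and letting PyTorch's bicubic interpolation stand in for the Bicubic downsampling used in the experiment:

import torch
import torch.nn.functional as F

def random_patches(cube, n, size=32):
    # cube: (bands, H, W) tensor; returns n random (bands, size, size) patches.
    bands, H, W = cube.shape
    patches = []
    for _ in range(n):
        top = torch.randint(0, H - size + 1, (1,)).item()
        left = torch.randint(0, W - size + 1, (1,)).item()
        patches.append(cube[:, top:top + size, left:left + size])
    return torch.stack(patches)

def bicubic_lr(hr_patches, s):
    # Band-wise bicubic downsampling by sampling factor s; the bands axis is
    # treated as the channel axis so each band is resized independently.
    return F.interpolate(hr_patches, scale_factor=1.0 / s,
                         mode="bicubic", align_corners=False)

hr = random_patches(torch.rand(31, 512, 512), n=4)  # stand-in for a CAVE cube
lr = bicubic_lr(hr, s=2)
print(hr.shape, lr.shape)  # torch.Size([4, 31, 32, 32]) torch.Size([4, 31, 16, 16])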
2. Introduction to comparative methods
In the experiment, SRCNN, VDSR, DRCN, EDSR, and 3D-FCNN are compared with the proposed SSRNet on various performance measures. SRCNN, VDSR, DRCN, and EDSR are single-frame super-resolution reconstruction methods proposed since 2016, and 3D-FCNN is a deep-learning-based hyperspectral image super-resolution reconstruction method proposed in 2017.
3. Introduction of evaluation index
Image Quality Assessment (IQA) can generally be classified into subjective assessment and objective assessment. Subjective evaluation intuitively reflects human visual perception, but it is subject to human factors: the eye has difficulty judging subtle differences between pictures, and evaluations differ greatly between observers, so the assessment of picture Fidelity is biased. Objective evaluation compares the amount of information or the degree of similarity between the original and reconstructed images through an algorithm, thereby evaluating the effectiveness of the reconstruction algorithm. This experiment combines subjective and objective evaluation to comprehensively measure the reconstruction quality of the hyperspectral images. Three objective evaluation indices are introduced: peak signal-to-noise ratio, mean structural similarity, and spectral angle similarity.
The Peak Signal-to-Noise Ratio (PSNR) describes the signal-to-noise ratio of the hyperspectral image; the larger its value, the closer the reconstructed image is to the original image. Its unit is dB, and its formula is as follows:
$$\mathrm{PSNR}=10\log_{10}\frac{\mathrm{MAX}^{2}}{\mathrm{MSE}},\qquad \mathrm{MSE}=\frac{1}{B\,M\,N}\sum_{k=1}^{B}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(\hat{X}_{k}(i,j)-X_{k}(i,j)\right)^{2}$$

where B is the number of spectral bands of the hyperspectral image, MxN is the number of pixels in each spectral band, MAX denotes the maximum pixel value in the hyperspectral image, and MSE is the mean square error between the hyperspectral reconstructed HR image $\hat{X}$ and the original HR image $X$.
The Mean Structural Similarity (MSSIM) measures the mean of the structural similarities over all spectral bands of the hyperspectral reconstructed HR image and the original HR image; the larger its value, the more similar the reconstructed and original HR images and the better the reconstruction quality. Whereas PSNR focuses on the mean square error (gray-level information) of the image, MSSIM focuses on its structural information. The mathematical expression for MSSIM is as follows:
$$\mathrm{MSSIM}(\hat{X},X)=\frac{1}{B}\sum_{i=1}^{B}\mathrm{SSIM}\left(\hat{X}_{i},X_{i}\right),\qquad \mathrm{SSIM}\left(\hat{X}_{i},X_{i}\right)=\frac{\left(2\mu_{\hat{X}_{i}}\mu_{X_{i}}+c_{1}\right)\left(2\sigma_{\hat{X}_{i}X_{i}}+c_{2}\right)}{\left(\mu_{\hat{X}_{i}}^{2}+\mu_{X_{i}}^{2}+c_{1}\right)\left(\sigma_{\hat{X}_{i}}^{2}+\sigma_{X_{i}}^{2}+c_{2}\right)}$$

where $\hat{X}_{i}$ and $X_{i}$ are the hyperspectral reconstructed HR image and the original HR image in the ith spectral band, $\mu_{\hat{X}_{i}}$ and $\mu_{X_{i}}$ are their means, $\sigma_{\hat{X}_{i}}^{2}$ and $\sigma_{X_{i}}^{2}$ their variances, and $\sigma_{\hat{X}_{i}X_{i}}$ the covariance between them; the constants $c_{1}$ and $c_{2}$ take the values 0.0001 and 0.0009 respectively.
The Spectral Angle Similarity (SAM) judges image similarity by measuring the angle between the spectra of the hyperspectral reconstructed HR image and the original HR image; the smaller its value, the closer the two are and the better the quality of the reconstructed image. The mathematical expression for SAM is as follows:
$$\mathrm{SAM}=\arccos\frac{\left\langle \hat{x}(i,j),x(i,j)\right\rangle}{\left\|\hat{x}(i,j)\right\|_{2}\left\|x(i,j)\right\|_{2}}$$

where $\hat{x}(i,j)$ and $x(i,j)$ are the spectral vectors of the pixel with coordinates (i, j) in the hyperspectral reconstructed HR image and the original HR image respectively, and $\langle\cdot,\cdot\rangle$ denotes the dot product between them.
Higher PSNR and MSSIM values indicate a better hyperspectral image super-resolution reconstruction, and a lower SAM indicates that the spectral information of the reconstructed image is better restored.
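The three indices can be computed directly from the formulas above. The following NumPy sketch implements them as written; it uses the global per-band statistics of the MSSIM formula rather than a sliding window (an assumption of this sketch) and returns SAM in radians:

import numpy as np

def psnr(pred, truth, max_val=None):
    # pred, truth: (bands, H, W) arrays; MSE is taken over all B*M*N pixels.
    max_val = truth.max() if max_val is None else max_val
    mse = np.mean((pred - truth) ** 2)
    return 10 * np.log10(max_val ** 2 / mse)

def mssim(pred, truth, c1=0.0001, c2=0.0009):
    # Mean of the per-band structural similarities.
    vals = []
    for xi_hat, xi in zip(pred, truth):
        mu1, mu2 = xi_hat.mean(), xi.mean()
        var1, var2 = xi_hat.var(), xi.var()
        cov = np.mean((xi_hat - mu1) * (xi - mu2))
        vals.append(((2 * mu1 * mu2 + c1) * (2 * cov + c2))
                    / ((mu1 ** 2 + mu2 ** 2 + c1) * (var1 + var2 + c2)))
    return float(np.mean(vals))

def sam(pred, truth, eps=1e-12):
    # Mean angle between the spectral vectors at each pixel.
    dot = np.sum(pred * truth, axis=0)
    norms = np.linalg.norm(pred, axis=0) * np.linalg.norm(truth, axis=0)
    return float(np.mean(np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0))))

pred = np.random.rand(31, 64, 64)
truth = np.random.rand(31, 64, 64)
print(psnr(pred, truth), mssim(pred, truth), sam(pred, truth))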
4. Experiment of
In the experiment, reconstruction is first performed on the 7 hyperspectral images of the CAVE dataset; the comparison results for Flowers under 2x, 3x, and 4x sampling factors are shown in FIGS. 3, 4, and 5 respectively, where (a) original image, (b) Bicubic, (c) SRCNN, (d) VDSR, (e) DRCN, (f) EDSR, (g) 3D-FCNN, (h) the proposed SSRNet. In each figure, a local region of the reconstructed image is boxed, enlarged, and placed at the upper-left corner of the whole image to allow a more precise visual judgment. The comparison shows that the recovery quality of SRCNN is only better than Bicubic. VDSR uses the residual idea, yet serious blur still appears at the flower center, petals, and similar regions, and detail and texture information is poorly recovered. DRCN and EDSR reconstruct petal edges well, but detail blur is still obvious, particularly in the over-smoothed flower-center region. In comparison, 3D-FCNN and the proposed SSRNet are clearly better in visual effect and produce images with clear textures and sharp details.
Next, the experiment reconstructs the 10 hyperspectral images of the Harvard dataset; the reconstructions of Img1 under 2x, 3x, and 4x sampling factors are given in FIGS. 6, 7, and 8 respectively, where (a) original image, (b) Bicubic, (c) SRCNN, (d) VDSR, (e) DRCN, (f) EDSR, (g) 3D-FCNN, (h) the proposed SSRNet. Similarly, to give a better visual impression, a local region of the reconstructed image is outlined, enlarged, and placed at the lower-right corner of the whole image. It is easy to see that both 3D-FCNN and SSRNet recover image texture (e.g., brick) far better than the other methods; however, at a sampling factor of 4, 3D-FCNN still blurs the edges of the roof tiles.
Compared with single-frame super-resolution reconstruction methods such as SRCNN, VDSR, DRCN, and EDSR, 3D-FCNN and SSRNet use three-dimensional convolution, recovering not only the spatial information of the image but also its spectral information, which greatly improves the visual effect. Compared with the five baseline methods, the method provided by the invention obtains a better reconstruction: as the local enlargements show, the images it reconstructs have sharp edges and rich detail, effectively suppress edge artifacts, and sometimes even look sharper than the original image. This is because SSRNet not only effectively uses the spectral-spatial information of the hyperspectral image, but also converts the HR image reconstruction task into fitting the residual between the HR and LR images, so that the network can concentrate its learning on high-frequency information such as detail textures.
Table 1 gives the results of the objective indices PSNR, MSSIM and SAM for different methods under two data sets and different sampling factors.
Table 1. Reconstruction results of each method on the CAVE and Harvard datasets under different sampling factors
As can be seen from Table 1, the PSNR and MSSIM of all methods improve over Bicubic, but single-frame methods such as SRCNN and VDSR are sometimes inferior to Bicubic on the spectral index SAM. DRCN uses a recursive convolutional neural network to reduce the overfitting caused by an overly deep network, but its reconstruction is still unsatisfactory. Among the single-frame methods, EDSR upsamples the LR image at the last stage of the network rather than interpolating it in advance, introducing no extra noise, and its PSNR and MSSIM are superior to SRCNN, VDSR, DRCN, and similar algorithms. Compared with the single-frame methods, 3D-FCNN and SSRNet greatly reduce SAM and related indices, which shows that three-dimensional convolution can effectively exploit inter-spectral information for better reconstruction.
The SSRNet provided by the invention is clearly superior to the other algorithms, including 3D-FCNN. On the CAVE and Harvard datasets, the PSNR, MSSIM, and SAM of images reconstructed by SSRNet are all the best values compared with SRCNN, VDSR, DRCN, EDSR, and 3D-FCNN. SSRNet can thus well address the detail loss, edge blur, and unclear texture of hyperspectral LR images. The experimental data show that SSRNet can learn the low-level information of a hyperspectral image and effectively reconstruct details and textures using its local features and global structure, achieving a better reconstruction in both subjective visual evaluation and objective evaluation.
Finally, it is noted that the above preferred embodiments illustrate rather than limit the invention; although the invention has been described in detail with reference to them, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the invention as defined by the appended claims.

Claims (2)

1. A hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss is characterized by comprising the following steps:
s1: obtaining a hyperspectral image;
s2: dividing the hyperspectral images into a training set and a test set;
s3: inputting the training set into a neural network with spectrum and space combination, and training by utilizing the joint loss of a space domain and a gradient domain; the spectrum-space combined neural network SSRNet is a convolutional neural network based on deep learning, converts an HR image reconstruction task into fitting of residual errors between an HR image and an LR image, and extracts spectrum-space characteristics by using pseudo three-dimensional convolution, so that the improvement of spatial resolution and spectrum resolution is realized; the SSRNet uses a loss function combining a pixel domain Charbonier loss function and a gradient domain loss to improve the quality of a reconstructed image, and adopts a multi-scale training mode to complete an image reconstruction task under various sampling factors;
the SSRNet structure includes:
1) deep residual learning: a skip connection is added between the network input and output, so that the network learns the residual between the high/low-resolution images;
2) introducing a Pseudo-three-dimensional Convolution (P3D) residual block, wherein CONV_2 is a bottleneck layer with a 1x1x1 convolution kernel and 16 kernels; CONV_3 adopts a 3x1x1 one-dimensional spectral convolution kernel and CONV_4 a 1x3x3 two-dimensional spatial convolution kernel; CONV_5 is a bottleneck layer with a 1x1x1 convolution kernel;
the number of the residual blocks is modified according to different computing resources;
3) a BN layer is removed from the pseudo three-dimensional convolution residual module, and the network performance is improved;
4) the activation functions in the network are all ReLU, which accelerates convergence and prevents gradient explosion and vanishing;
S4: passing the test set through the neural network to obtain the final reconstruction result.
2. The hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss according to claim 1, wherein in step S3 the loss function is the objective function optimized by the neural network and is used to evaluate the difference between the predicted value and the true value; a well-chosen loss function improves the network convergence speed and the quality of the prediction, while a poor one reduces the overall performance of the network;
1) Charbonnier loss function
$$L_{\mathrm{Char}}(\hat{X},X)=\frac{1}{m}\sum_{i=1}^{m}\rho\left(\hat{X}^{(i)}-X^{(i)}\right),\qquad \rho(x)=\sqrt{x^{2}+\varepsilon^{2}}$$

where $m$ is the number of samples, $\hat{X}^{(i)}$ denotes the pixel value of the ith pixel of the prediction, $X^{(i)}$ denotes the pixel value of the ith pixel of the true value, and $\varepsilon$ acts as a threshold on the difference between each pixel pair of the LR and HR images so as to reduce the interference of outliers;
2) gradient domain loss function
Firstly, the LR and HR images need to be transferred to the gradient domain by an image gradient algorithm, and the image gradient is calculated by the following formula:
horizontal direction:
dx(i,j)=I(i+1,j)-I(i,j)
vertical direction:
dy(i,j)=I(i,j+1)-I(i,j)
where I is the image pixel value and (i, j) are the coordinates of the pixel; the Charbonnier loss is computed on the resulting gradient-domain feature maps, finally giving the joint loss function:
$$L=\frac{1}{m}\sum_{i=1}^{m}\left[\rho\left(\hat{X}^{(i)}-X^{(i)}\right)+\alpha_{1}\,\rho\left(d_{x}\hat{X}^{(i)}-d_{x}X^{(i)}\right)+\alpha_{2}\,\rho\left(d_{y}\hat{X}^{(i)}-d_{y}X^{(i)}\right)\right]$$

where $\hat{X}$ represents the predicted value and $X$ the true value; $\alpha_{1}$ and $\alpha_{2}$ are the weighting coefficients of the gradient losses in the horizontal and vertical directions respectively; $d_{x}\hat{X}^{(i)}$ denotes the gradient value of the ith pixel of the prediction in the x direction, $d_{y}\hat{X}^{(i)}$ the gradient value in the y direction, and $d_{x}X^{(i)}$ and $d_{y}X^{(i)}$ the corresponding gradient values of the true value.
CN201810639042.8A 2018-06-20 2018-06-20 Hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss Expired - Fee Related CN108830796B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810639042.8A CN108830796B (en) 2018-06-20 2018-06-20 Hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810639042.8A CN108830796B (en) 2018-06-20 2018-06-20 Hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss

Publications (2)

Publication Number Publication Date
CN108830796A CN108830796A (en) 2018-11-16
CN108830796B true CN108830796B (en) 2021-02-02

Family

ID=64143026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810639042.8A Expired - Fee Related CN108830796B (en) 2018-06-20 2018-06-20 Hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss

Country Status (1)

Country Link
CN (1) CN108830796B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753996B (en) * 2018-12-17 2022-05-10 西北工业大学 Hyperspectral image classification method based on three-dimensional lightweight depth network
CN109584164B (en) * 2018-12-18 2023-05-26 华中科技大学 Medical image super-resolution three-dimensional reconstruction method based on two-dimensional image transfer learning
CN109886870B (en) * 2018-12-29 2023-03-03 西北大学 Remote sensing image fusion method based on dual-channel neural network
CN109741407A (en) * 2019-01-09 2019-05-10 北京理工大学 A kind of high quality reconstructing method of the spectrum imaging system based on convolutional neural networks
CN109949219B (en) * 2019-01-12 2021-03-26 深圳先进技术研究院 Reconstruction method, device and equipment of super-resolution image
CN109903255A (en) * 2019-03-04 2019-06-18 北京工业大学 A kind of high spectrum image Super-Resolution method based on 3D convolutional neural networks
CN109886898B (en) * 2019-03-05 2020-10-02 北京理工大学 Imaging method of spectral imaging system based on optimization heuristic neural network
CN109697697B (en) * 2019-03-05 2020-10-16 北京理工大学 Reconstruction method of spectral imaging system based on optimization heuristic neural network
CN110441315B (en) * 2019-08-02 2022-08-05 英特尔产品(成都)有限公司 Electronic component testing apparatus and method
CN111192193B (en) * 2019-11-26 2022-02-01 西安电子科技大学 Hyperspectral single-image super-resolution method based on 1-dimensional-2-dimensional convolution neural network
CN111127573B (en) * 2019-12-12 2022-06-03 首都师范大学 Wide-spectrum hyperspectral image reconstruction method based on deep learning
CN111062403B (en) * 2019-12-26 2022-11-22 哈尔滨工业大学 Hyperspectral remote sensing data depth spectral feature extraction method based on one-dimensional group convolution neural network
CN115221932A (en) * 2021-04-19 2022-10-21 上海与光彩芯科技有限公司 Spectrum recovery method and device based on neural network and electronic equipment
CN113222823B (en) * 2021-06-02 2022-04-15 国网湖南省电力有限公司 Hyperspectral image super-resolution method based on mixed attention network fusion
CN113628111B (en) * 2021-07-28 2024-04-12 西安理工大学 Hyperspectral image super-resolution method based on gradient information constraint
WO2023155032A1 (en) * 2022-02-15 2023-08-24 华为技术有限公司 Image processing method and image processing apparatus
WO2023167465A1 (en) * 2022-03-03 2023-09-07 Samsung Electronics Co., Ltd. Method and system for reducing complexity of a processing pipeline using feature-augmented training
CN115235628B (en) * 2022-05-17 2023-12-01 中国科学院上海技术物理研究所 Spectrum reconstruction method and device, spectrometer, storage medium and electronic equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9064476B2 (en) * 2008-10-04 2015-06-23 Microsoft Technology Licensing, Llc Image super-resolution using gradient profile prior
CN103530860B (en) * 2013-09-26 2017-05-17 天津大学 Adaptive autoregressive model-based hyper-spectral imagery super-resolution method
CN104050653B (en) * 2014-07-07 2017-01-25 西安电子科技大学 Hyperspectral image super-resolution method based on non-negative structure sparse
CN106780338B (en) * 2016-12-27 2020-06-09 南京理工大学 Rapid super-resolution reconstruction method based on anisotropy
CN106683067B (en) * 2017-01-20 2020-06-23 福建帝视信息科技有限公司 Deep learning super-resolution reconstruction method based on residual sub-images
CN107204010B (en) * 2017-04-28 2019-11-19 中国科学院计算技术研究所 A kind of monocular image depth estimation method and system
CN107301372A (en) * 2017-05-11 2017-10-27 中国科学院西安光学精密机械研究所 High spectrum image super-resolution method based on transfer learning

Also Published As

Publication number Publication date
CN108830796A (en) 2018-11-16

Similar Documents

Publication Publication Date Title
CN108830796B (en) Hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss
CN110570353B (en) Super-resolution reconstruction method for generating single image of countermeasure network by dense connection
CN111709902B (en) Infrared and visible light image fusion method based on self-attention mechanism
CN110119780B (en) Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network
CN110533620B (en) Hyperspectral and full-color image fusion method based on AAE extraction spatial features
CN107123089B (en) Remote sensing image super-resolution reconstruction method and system based on depth convolution network
WO2021056969A1 (en) Super-resolution image reconstruction method and device
Yang et al. Wavelet u-net and the chromatic adaptation transform for single image dehazing
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
Luo et al. Pansharpening via unsupervised convolutional neural networks
CN111080567B (en) Remote sensing image fusion method and system based on multi-scale dynamic convolutional neural network
CN111161360B (en) Image defogging method of end-to-end network based on Retinex theory
CN112837224A (en) Super-resolution image reconstruction method based on convolutional neural network
CN108288256A (en) A kind of multispectral mosaic image restored method
CN116485934A (en) Infrared image colorization method based on CNN and ViT
CN112163998A (en) Single-image super-resolution analysis method matched with natural degradation conditions
CN116645569A (en) Infrared image colorization method and system based on generation countermeasure network
CN115641391A (en) Infrared image colorizing method based on dense residual error and double-flow attention
CN115760814A (en) Remote sensing image fusion method and system based on double-coupling deep neural network
Wang et al. No-reference stereoscopic image quality assessment using quaternion wavelet transform and heterogeneous ensemble learning
Zhou et al. PAN-guided band-aware multi-spectral feature enhancement for pan-sharpening
CN109559278B (en) Super resolution image reconstruction method and system based on multiple features study
CN113344804B (en) Training method of low-light image enhancement model and low-light image enhancement method
CN114067018A (en) Infrared image colorization method for generating countermeasure network based on expansion residual error
CN117495718A (en) Multi-scale self-adaptive remote sensing image defogging method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210202