CN112699929B - Deep network multi-source spectral image fusion method for multi-supervision recursive learning - Google Patents
Publication number: CN112699929B (application CN202011568917.3A)
Authority: CN (China)
Legal status: Active
Abstract
The invention discloses a deep network multi-source spectral image fusion method for multi-supervised recursive learning, which comprises the following steps: adopting recursive learning to form recursive residual sub-networks, and adding the output and the input of each recursive residual sub-network to serve as the input of the next recursive residual sub-network; the network consists of a pre-super-resolution module and a fusion module, where the pre-super-resolution module realizes automatic learning of the up-sampling interpolation, and the pre-super-resolution image and the multispectral image are spliced as the input of the fusion module; establishing the pre-super-resolution module and the fusion module by stacking a plurality of recursive residual sub-networks; adopting a multi-supervised learning mode, in which the low-level, intermediate-level and high-level features are spliced and convolved to form intermediate fusion images at all levels; and taking the L1 norm and the spectral angle as two measures of the loss function, establishing a joint loss function between the intermediate fusion images at all levels and the real images, and performing end-to-end network training. Simulation experiment results prove the effectiveness of the invention on multi-source spectral image fusion.
Description
Technical Field
The invention relates to the field of hyperspectral image fusion, in particular to a depth network multisource spectral image fusion method for multi-supervised recursive learning.
Background
In recent years, deep learning has become a research focus in the field of artificial intelligence, gaining wide attention in both theory and industry and finding numerous applications in pattern recognition, computer vision, natural language processing, and other fields. A deep learning model is generally a neural network with a multi-layer structure: multi-level feature extraction is performed on data through the multiple nonlinear transformations of the multi-layer network, hierarchical features from low level to high level are learned automatically, and the abstraction degree of the features increases with the number of layers. Compared with traditional shallow machine learning models, deep learning models extract more comprehensive features and overcome the dependence of hand-designed features on personal experience.
Deep learning is also widely used in the field of hyperspectral image fusion, and a large number of hyperspectral image fusion models based on convolutional neural networks have been proposed, such as the PNN model [Masi G, Cozzolino D, Verdoliva L, et al. Pansharpening by convolutional neural networks [J]. Remote Sensing, 2016, 8(7): 594]. The PNN network has only a three-layer structure, and its performance cannot be improved by simply increasing the number of layers, because the model becomes difficult to train effectively as the network deepens. Residual connection can solve the problems that arise when the network is too deep, such as gradient explosion and gradient vanishing. At present, researchers have proposed many network models based on residual connection, such as [Yuan Q, Wei Y, Meng X, et al. A multiscale and multidepth convolutional neural network for remote sensing image pan-sharpening [J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2018, 11(3): 978-989]. Most of the existing network models stack a low-resolution hyperspectral image and the corresponding auxiliary source image as the network input, and realize image fusion in the subsequent feature extraction and mapping. Recently, a dual-channel neural network model [Shao Z, Cai J. Remote sensing image fusion with deep convolutional neural network [J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2018, 11(5): 1656-1669] was proposed; this method extracts spectral features and spatial features from the low-resolution hyperspectral image and the auxiliary source image respectively, then stacks the extracted features and completes feature fusion through a convolutional layer, achieving good results.
However, most of the models are shallow in depth, and the powerful feature extraction capability and nonlinear representation capability of the deep network structure cannot be fully utilized.
Disclosure of Invention
The invention aims to provide a depth network multi-source spectral image fusion method for multi-supervised recursive learning.
The technical solution for realizing the purpose of the invention is as follows: a depth network multi-source spectral image fusion method for multi-supervised recursive learning comprises the following steps:
step one, adopting recursive learning to form recursive residual sub-networks, and adding the output and the input of each recursive residual sub-network to be used as the input of the next recursive residual sub-network;
secondly, the whole network model is composed of a pre-super-resolution module and a fusion module, the pre-super-resolution module realizes automatic learning of up-sampling interpolation, and a pre-super-resolution image and a multispectral image are spliced to be input as the fusion module;
thirdly, establishing a pre-super-resolution module and a fusion module by adopting a plurality of recursive residual sub-network stacking methods;
fourthly, adopting a multi-supervised learning mode, and forming intermediate fusion images of all levels by splicing and convoluting the characteristics of the low level, the intermediate level and the high level;
and fifthly, taking the L1 norm and the spectrum angle as two measures of the loss function, establishing a combined loss function by the intermediate fusion images and the real images at all levels, and performing end-to-end network training.
Compared with the prior art, the invention has the remarkable advantages that: (1) The recursive learning is used for constructing the network, so that the problem of overlarge network parameter scale when the deep network is applied to the field of hyperspectral images is solved, and a relatively light deep learning network can be formed; (2) The image up-sampling automatic learning is realized through the pre-super-resolution module, the spatial details of the auxiliary source image can be better fused, and the spectral distortion caused by the traditional artificial interpolation (such as bicubic interpolation) is reduced; (3) Dense connection is used in the fusion stage, and low, medium and high-level feature information is effectively utilized to realize feature multiplexing; (4) The multi-supervision end-to-end training network is used, the problem that the low-level network cannot be effectively trained when the network depth is too large is solved, meanwhile, the middle fusion image of each level is connected with the next level, the features are extracted step by step for fidelity constraint, and the multi-scale spectral feature fidelity capability is effectively enhanced; (5) The method can be applied to fusion and resolution enhancement of multispectral and hyperspectral images, can also be applied to fusion and enhancement of panchromatic and hyperspectral images, and has wide application value in multisource remote sensing fusion, ground feature classification and identification and high-resolution environment monitoring.
The present invention is described in further detail below with reference to the attached drawing figures.
Drawings
FIG. 1 is a block diagram of the process of the present invention.
Fig. 2 is a network structure diagram of a simulation experiment.
FIG. 3 is a graph of the test results of the present invention on a Cave dataset.
FIG. 4 is a graph of the results of the testing of the present invention on a Harvard data set.
Detailed Description
The invention provides a deep network multi-source spectral image fusion method for multi-supervised recursive learning. The method repeatedly uses one residual block to form a recursive residual sub-network, avoiding the training difficulty and performance degradation that introducing excessive parameters would cause. Meanwhile, the method realizes automatic learning of image up-sampling through the pre-super-resolution module, which better fuses the spatial details of the auxiliary source image and reduces the spectral distortion caused by traditional manual interpolation (such as bicubic interpolation). In addition, the method trains the network in a multi-level supervision mode and adopts dense connection in the fusion stage, so that the low-level and middle-level features can be effectively trained while jointly forming the final fusion image together with the high-level features. The method is an end-to-end multi-supervised neural network model with a simple input and output structure, requiring no pre-processing or post-processing procedures; simulation experiments on the Cave and Harvard data sets show that the model is highly robust and can be widely applied in the engineering field. The implementation of the present invention is described in detail below with reference to FIG. 1, and includes the following steps:
in the first step, recursive learning is adopted, namely, the multi-layer networks share one residual block to form recursive residual sub-networks, and the output and the input of each recursive residual sub-network are added to be used as the input of the next recursive residual sub-network. Recording the input hyperspectral image as X ∈ Rh×w×CH, w, C represent the height, width and number of channels of X, respectively. The multispectral image is Y ∈ RH×W×cH, W, c respectively represent height, width and channel number of Y. The ith residual block input is ResiOutput is asI is more than or equal to 1 and less than or equal to n, then:
wherein m and n respectively represent residual block number and total residual block number of the super-resolution module, sigma represents activation function and operatorWhich represents the operation of a convolution with the original,represents the convolution kernel parameters, k is the kernel size, u represents the number of input and output channels, bi,1∈R1×u、bi,2∈R1×uRepresenting an offset term, Fi(. Cndot.) denotes the ith residual block. Let the ith recursive subnetwork be Gi(. To) with input as Reci and output asThen there are:
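The recursive residual sub-network of this step can be sketched in a few lines. The sketch below is a minimal NumPy illustration under stated assumptions — channel-wise linear maps stand in for the k×k convolutions, and the recursion depth T = 3 is an arbitrary choice — showing the two defining properties: the same residual block F is reused inside a sub-network, and each sub-network's output plus its input feeds the next sub-network.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def make_residual_block(u, seed=0):
    """Residual block F(.) whose weights are created once, then shared."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (u, u)); b1 = np.zeros(u)
    W2 = rng.normal(0.0, 0.1, (u, u)); b2 = np.zeros(u)
    def F(x):  # x: (H, W, u); channel-wise matmuls stand in for convolutions
        return relu(relu(x @ W1 + b1) @ W2 + b2)
    return F

def recursive_subnetwork(F, x, T=3):
    """G(.): apply the *same* residual block T times, then add the input."""
    h = x
    for _ in range(T):
        h = F(h)
    return h + x  # output + input becomes the next sub-network's input

x = np.random.default_rng(1).normal(size=(8, 8, 16))
F = make_residual_block(16)
y1 = recursive_subnetwork(F, x)    # first recursive sub-network
y2 = recursive_subnetwork(F, y1)   # fed with the previous output + input
print(y1.shape, y2.shape)          # (8, 8, 16) (8, 8, 16)
```

Because the block weights are created once and reused, the parameter count stays constant however deep the recursion goes, which is what keeps the network lightweight.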
In the second step, the network is composed of a pre-super-resolution module and a fusion module. The pre-super-resolution module realizes automatic learning of the up-sampling interpolation, reducing the spectral distortion caused by traditional manual interpolation (such as bicubic interpolation), and the pre-super-resolution image and the multispectral image are spliced as the input of the fusion module. The pre-super-resolution module is P(·), the fusion module is Q(·), the pre-super-resolution image is Z_pre ∈ R^(H×W×C), and the fusion module input is Z_in ∈ R^(H×W×(C+c)); then:

Z_pre = P(X)
Z_in = [Z_pre, Y]

wherein [·,·] denotes the splicing (concatenation) operation, and X and Y denote the hyperspectral and multispectral images, respectively.
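The splicing that forms the fusion-module input can be illustrated as follows. In this sketch, nearest-neighbour up-sampling is only a stand-in for the learned module P(·), and the sizes (h = w = 8, C = 31, c = 3, scale factor 8) are illustrative assumptions chosen to match the band counts used in the experiments.

```python
import numpy as np

def pre_super_resolve(X, scale):
    """Stand-in for the learned pre-super-resolution module P(.).
    Nearest-neighbour up-sampling; the invention learns this mapping instead."""
    return X.repeat(scale, axis=0).repeat(scale, axis=1)

h, w, C = 8, 8, 31                 # low-resolution hyperspectral image X
scale = 8
H, W, c = h * scale, w * scale, 3  # high-resolution multispectral image Y

X = np.random.rand(h, w, C)
Y = np.random.rand(H, W, c)

Z_pre = pre_super_resolve(X, scale)         # Z_pre in R^(H x W x C)
Z_in = np.concatenate([Z_pre, Y], axis=-1)  # [Z_pre, Y]: C + c channels
print(Z_in.shape)                           # (64, 64, 34)
```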
In the third step, the pre-super-resolution module and the fusion module are established by stacking a plurality of recursive residual sub-networks. Record the output feature of the i-th recursive sub-network of the pre-super-resolution module as Fe_i(x) ∈ R^(h×w×u), 1 ≤ i ≤ m, and the output feature of the j-th recursive sub-network of the fusion module as Fe_j(y) ∈ R^(H×W×u), m+1 ≤ j ≤ n, where x = X and y = Z_in denote the module inputs; then:

P(x) = R_m(Fe_m(x), U)
Q(y) = R_{n−m}(Fe_n(y), U)

wherein G_1, …, G_m, G_{m+1}, …, G_n are the recursive sub-networks, the operator ⊗ denotes the convolution operation, σ is the activation function, u is the number of recursive sub-network output channels, W ∈ R^(k×k×u) are convolution kernel parameters, b_l ∈ R^(1×C), b_1 ∈ R^(1×u), b_2 ∈ R^(1×C) and b_3 ∈ R^(1×u) are bias terms, and R_l(x, U) denotes a convolution operation.
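The stacking of recursive sub-networks can be shown schematically. Everything in this sketch is an illustrative assumption: the helper names, the simplified residual form h ↦ relu(hW) + h, and a channel-wise linear map standing in for the final reconstruction convolution R(·, U).

```python
import numpy as np

rng = np.random.default_rng(0)
u, C = 8, 31  # feature channels / output spectral bands (illustrative)

def make_subnet():
    """One simplified recursive residual sub-network G(.)."""
    W = rng.normal(0.0, 0.1, (u, u))
    return lambda h: np.maximum(h @ W, 0.0) + h  # residual skip included

def stack_subnetworks(x, subnets, W_out):
    """Feed x through G_1..G_m in sequence, then reconstruct to C channels."""
    h = x
    for G in subnets:
        h = G(h)
    return h @ W_out  # stand-in for the final convolution R(., U)

subnets = [make_subnet() for _ in range(3)]
x = rng.normal(size=(8, 8, u))
W_out = rng.normal(0.0, 0.1, (u, C))
out = stack_subnetworks(x, subnets, W_out)
print(out.shape)  # (8, 8, 31)
```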
In the fourth step, a multi-supervised learning mode is adopted, and the low-level, intermediate-level and high-level features are spliced and convolved to form the intermediate fusion images of all levels. Record the output feature of the i-th recursive sub-network in the fusion stage as Fe_i ∈ R^(H×W×u), m+1 ≤ i ≤ n, and the corresponding output intermediate fusion image as Z_i; then:

Z_i = σ(W ⊗ [Fe_{m+1}, …, Fe_i, Z_pre] + b)

wherein [·,·] denotes the splicing operation and Z_pre denotes the pre-super-resolution image.
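One multi-supervision head can be sketched as below. The helper name and the exact set of spliced tensors are assumptions for illustration; a channel-wise linear map stands in for the convolutional layer that maps the spliced features back to C spectral bands.

```python
import numpy as np

def intermediate_fusion(features, Z_pre, seed=0):
    """Splice the feature maps gathered so far with the pre-super-resolution
    image, then map back to C channels (1x1-convolution analogue)."""
    rng = np.random.default_rng(seed)
    stacked = np.concatenate(features + [Z_pre], axis=-1)
    C = Z_pre.shape[-1]
    W = rng.normal(0.0, 0.1, (stacked.shape[-1], C))
    return stacked @ W  # intermediate fusion image Z_i in R^(H x W x C)

H, W_, u, C = 16, 16, 8, 31
Z_pre = np.random.rand(H, W_, C)
feats = [np.random.rand(H, W_, u) for _ in range(3)]  # low/mid/high features
Z_i = intermediate_fusion(feats, Z_pre)
print(Z_i.shape)  # (16, 16, 31)
```

Each such Z_i is compared against the real image by the loss in the next step, which is what lets gradients reach the low-level and middle-level sub-networks directly.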
In the fifth step, the L1 norm and the spectral angle are taken as the two measures of the loss function, i.e. the spatial information and the spectral information are considered simultaneously; a joint loss function is established between the intermediate fusion images of all levels and the real high-resolution hyperspectral image, and end-to-end network training is performed. Record the network output image as Z_pred ∈ R^(H×W×C) and the corresponding real image as Z_true ∈ R^(H×W×C), and denote the loss function as Loss(Z_pred, Z_true); then:

Loss(Z_pred, Z_true) = L1Loss(Z_pred, Z_true) + SamLoss(Z_pred, Z_true)

wherein L1Loss and SamLoss respectively denote the L1 loss function and the spectral angle loss function:

L1Loss(Z_pred, Z_true) = (1/(HWC)) Σ |Z_pred − Z_true|
SamLoss(Z_pred, Z_true) = (1/(HW)) Σ_{i,j} arccos( ⟨z_pred^(i,j), z_true^(i,j)⟩ / (‖z_pred^(i,j)‖·‖z_true^(i,j)‖) )

where z_pred^(i,j) and z_true^(i,j) denote the spectral vectors of the predicted image and the real image at position (i, j), respectively.

If the total loss function is Loss_total, then:

Loss_total = Loss(Z_n, Z_true) + α Σ_{i=m+1}^{n−1} Loss(Z_i, Z_true)

wherein α ∈ (0, 1) is a balance parameter weighting the supervision on the intermediate fusion images against that on the final output.
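The joint L1-plus-spectral-angle loss can be written directly in NumPy. This sketch follows the standard definitions of the two measures; the averaging conventions are an assumption of the sketch, since the exact normalization is not fixed here.

```python
import numpy as np

def l1_loss(pred, true):
    """Mean absolute error over all pixels and bands."""
    return np.abs(pred - true).mean()

def sam_loss(pred, true, eps=1e-8):
    """Mean spectral angle (radians) between per-pixel spectral vectors."""
    dot = (pred * true).sum(axis=-1)
    norm = np.linalg.norm(pred, axis=-1) * np.linalg.norm(true, axis=-1)
    cos = np.clip(dot / (norm + eps), -1.0, 1.0)
    return np.arccos(cos).mean()

def joint_loss(pred, true):
    """Loss = L1Loss + SamLoss, balancing spatial and spectral fidelity."""
    return l1_loss(pred, true) + sam_loss(pred, true)

Z_true = np.random.rand(16, 16, 31)
noisy = Z_true + 0.1 * np.random.rand(16, 16, 31)
print(joint_loss(Z_true, Z_true) < 1e-3, joint_loss(noisy, Z_true) > 0)  # True True
```

The `eps` and `clip` guards keep `arccos` numerically safe when a pixel's spectra are identical or near zero.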
The network has the characteristics of few parameters and deep layer number, can learn deep features step by step, realizes combined fidelity constraint of learning features at all levels, and effectively enhances the fidelity capability of multi-scale spectral features. The method can be effectively applied to the fusion and resolution enhancement of high-resolution multispectral and low-resolution hyperspectral images, can also be applied to the fusion and enhancement of panchromatic and hyperspectral images, and has wide application value in multisource remote sensing fusion, ground feature classification identification and high-resolution environmental monitoring.
The effect of the present invention can be further illustrated by the following simulation experiments:
simulation conditions
The simulation experiment adopts two groups of hyperspectral data sets, namely the Cave data set and the Harvard data set. The Cave data set comprises 32 indoor hyperspectral images; each image comprises 31 wave bands, the wavelength range is from 400 nm to 700 nm, and the image resolution is 512 × 512. The Harvard data set contains 50 indoor and outdoor images under daylight conditions, with 31 wave bands, a wavelength range of 420 nm to 720 nm, and an image resolution of 1392 × 1040. For the Cave data set, the first 20 images were selected as the training set and the last 12 images as the test set. For the Harvard data set, the first 30 images were selected as the training set and the last 20 images as the test set. Since the data sets contain no real reference images, the Wald protocol (R. Carla, L. Santurri, B. Aiazzi, and S. Baronti, "Full-scale assessment of pansharpening through polynomial fitting of multiscale measures," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 12, pp. 6344-6355, 2015.) is used to generate training data: each image is filtered with a 5 × 5 Gaussian kernel with mean 0 and standard deviation 2, then down-sampled 8 times to generate a low-spatial-resolution hyperspectral image. The high-spatial-resolution multispectral image is generated according to the IKONOS camera spectral response, and the original image is used as the real image. The training image blocks are 64 × 64 in size, and the image cropping interval is 16. The simulation experiments were all completed with Python 3.6 + PyTorch under the Windows 10 operating system, and the network architecture used in the experiments is shown in FIG. 2.
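The Wald-protocol degradation just described (5 × 5 Gaussian kernel with standard deviation 2, followed by 8-fold down-sampling) can be sketched as follows; the reflect padding and the direct-convolution loop are implementation choices of this sketch, not requirements of the protocol.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=2.0):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def degrade(img, scale=8, size=5, sigma=2.0):
    """Per-band Gaussian filtering, then scale-fold down-sampling."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    H, W, _ = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    blurred = np.zeros_like(img)
    for dy in range(size):            # direct convolution via shifted sums
        for dx in range(size):
            blurred += k[dy, dx] * padded[dy:dy + H, dx:dx + W, :]
    return blurred[::scale, ::scale, :]

hr = np.random.rand(64, 64, 31)  # reference ("real") hyperspectral image
lr = degrade(hr)                 # simulated low-resolution network input
print(lr.shape)                  # (8, 8, 31)
```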
Analysis of simulation experiment results
Fig. 3 and Fig. 4 show the simulation experiment results of the method of the present invention on the Cave and Harvard data sets, respectively, for band 20. In each figure, (a) shows the real image, and (b), (c), (d) and (e) respectively show the result error maps of RSIFNN, PNN, MSDCNN and the present method; to make the comparison more intuitive, the errors are amplified 10 times. Intuitively, the present method yields a smaller image fusion error. To further quantify model performance, PSNR (peak signal-to-noise ratio), SAM (spectral angle), SSIM (structural similarity), ERGAS (relative global dimensional error) and RMSE (root mean square error) were used as image evaluation indices, as shown in Tables 1 and 2.
Table 1. Cave data set evaluation index results
Table 2. Harvard data set evaluation index results
The results show that every index of the present method greatly exceeds those of the other three classical models, demonstrating the effectiveness of the method.
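Three of the scalar indices used in Tables 1 and 2 (PSNR, SAM, RMSE) can be computed as follows; SSIM and ERGAS are omitted for brevity, and the peak value of 1.0 in PSNR assumes images normalized to [0, 1].

```python
import numpy as np

def rmse(pred, true):
    """Root mean square error (lower is better)."""
    return np.sqrt(((pred - true) ** 2).mean())

def psnr(pred, true, peak=1.0):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    return 10.0 * np.log10(peak**2 / ((pred - true) ** 2).mean())

def sam_degrees(pred, true, eps=1e-8):
    """Mean spectral angle in degrees (lower is better)."""
    dot = (pred * true).sum(axis=-1)
    norm = np.linalg.norm(pred, axis=-1) * np.linalg.norm(true, axis=-1)
    return np.degrees(np.arccos(np.clip(dot / (norm + eps), -1.0, 1.0))).mean()

true = np.random.rand(32, 32, 31)
pred = np.clip(true + 0.01 * np.random.randn(32, 32, 31), 0.0, 1.0)
print(rmse(pred, true) < 0.02, psnr(pred, true) > 30.0, sam_degrees(pred, true) < 5.0)
```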
Claims (6)
1. A depth network multi-source spectral image fusion method for multi-supervision recursive learning is characterized by comprising the following steps:
step one, adopting recursive learning to form recursive residual sub-networks, and adding the output and the input of each recursive residual sub-network to be used as the input of the next recursive residual sub-network;
secondly, the whole network model is composed of a pre-super-resolution module and a fusion module, the pre-super-resolution module realizes automatic learning of up-sampling interpolation, and a pre-super-resolution image and a multispectral image are spliced to be input as the fusion module;
thirdly, establishing a pre-super-resolution module and a fusion module by adopting a plurality of recursive residual sub-network stacking methods;
fourthly, adopting a multi-supervised learning mode, and forming intermediate fusion images of all levels by splicing and convoluting the characteristics of the low level, the intermediate level and the high level;
and fifthly, taking the L1 norm and the spectrum angle as two measures of the loss function, establishing a combined loss function by the intermediate fusion images and the real images at all levels, and performing end-to-end network training.
2. The deep network multi-source spectral image fusion method for multi-supervised recursive learning according to claim 1, wherein in the first step, recursive learning is adopted, i.e. the multi-layer network shares one residual block to form recursive residual sub-networks, and the output and the input of each recursive residual sub-network are added as the input of the next recursive residual sub-network; record the input hyperspectral image as X ∈ R^(h×w×C), where h, w and C respectively denote the height, width and number of channels of X; the multispectral image is Y ∈ R^(H×W×c), where H, W and c respectively denote the height, width and number of channels of Y; the input of the i-th residual block is Res_i and its output is Res_i^out; then:

Res_i^out = F_i(Res_i) = σ(W_{i,2} ⊗ σ(W_{i,1} ⊗ Res_i + b_{i,1}) + b_{i,2})

wherein m and n respectively denote the number of residual blocks of the pre-super-resolution module and the total number of residual blocks, σ denotes the activation function, the operator ⊗ denotes the convolution operation, W_{i,1}, W_{i,2} ∈ R^(k×k×u) denote the convolution kernel parameters, k is the kernel size, u denotes the number of input and output channels, b_{i,1} ∈ R^(1×u) and b_{i,2} ∈ R^(1×u) denote bias terms, and F_i(·) denotes the i-th residual block; let the i-th recursive sub-network be G_i(·), with input Rec_i and output Rec_i^out; then:

Rec_i^out = G_i(Rec_i) = F_i(F_i(⋯F_i(Rec_i)⋯)) + Rec_i
3. The deep network multi-source spectral image fusion method for multi-supervised recursive learning according to claim 1, wherein in the second step, the network is composed of a pre-super-resolution module and a fusion module, the pre-super-resolution module realizes automatic learning of the up-sampling interpolation, and the pre-super-resolution image and the multispectral image are spliced as the input of the fusion module; the pre-super-resolution module is P(·), the fusion module is Q(·), the pre-super-resolution image is Z_pre ∈ R^(H×W×C), and the fusion module input is Z_in ∈ R^(H×W×(C+c)); then:

Z_pre = P(X)
Z_in = [Z_pre, Y]

wherein [·,·] denotes the splicing operation, and X and Y denote the hyperspectral and multispectral images, respectively.
4. The deep network multi-source spectral image fusion method for multi-supervised recursive learning according to claim 1, wherein in the third step, the pre-super-resolution module and the fusion module are established by stacking a plurality of recursive residual sub-networks; record the output feature of the i-th recursive sub-network of the pre-super-resolution module as Fe_i(x) ∈ R^(h×w×u), 1 ≤ i ≤ m, and the output feature of the j-th recursive sub-network of the fusion module as Fe_j(y) ∈ R^(H×W×u), m+1 ≤ j ≤ n, where x = X and y = Z_in denote the module inputs; then:

P(x) = R_m(Fe_m(x), U)
Q(y) = R_{n−m}(Fe_n(y), U)

wherein G_1, …, G_m, G_{m+1}, …, G_n are the recursive sub-networks, the operator ⊗ denotes the convolution operation, σ is the activation function, u is the number of recursive sub-network input and output channels, W_1, W_2, W_3 ∈ R^(k×k×u) are convolution kernel parameters, b_l ∈ R^(1×C), b_1 ∈ R^(1×u), b_2 ∈ R^(1×C) and b_3 ∈ R^(1×u) are bias terms, and R_l(x, U) denotes a convolution operation.
5. The deep network multi-source spectral image fusion method for multi-supervised recursive learning according to claim 1, wherein in the fourth step, a multi-supervised learning mode is adopted, and the low-level, intermediate-level and high-level features are spliced and passed through convolutional layers to form the intermediate fusion images of all levels; record the output feature of the i-th recursive sub-network in the fusion stage as Fe_i ∈ R^(H×W×u), m+1 ≤ i ≤ n, and the corresponding output intermediate fusion image as Z_i; then:

Z_i = σ(W ⊗ [Fe_{m+1}, …, Fe_i, Z_pre] + b)

wherein [·,·] denotes the splicing operation and Z_pre denotes the pre-super-resolution image.
6. The deep network multi-source spectral image fusion method for multi-supervised recursive learning according to claim 1, wherein in the fifth step, the L1 norm and the spectral angle are taken as the two measures of the loss function, a joint loss function is established between the intermediate fusion images of all levels and the real high-resolution hyperspectral image, and end-to-end network training is performed; record the network output image as Z_pred ∈ R^(H×W×C) and the corresponding real image as Z_true ∈ R^(H×W×C), and denote the loss function as Loss(Z_pred, Z_true); then:

Loss(Z_pred, Z_true) = L1Loss(Z_pred, Z_true) + SamLoss(Z_pred, Z_true)

wherein L1Loss and SamLoss respectively denote the L1 loss function and the spectral angle loss function, and z_pred^(i,j) and z_true^(i,j) denote the spectral vectors of the predicted image and the real image at position (i, j), respectively;

if the total loss function is Loss_total, then:

Loss_total = Loss(Z_n, Z_true) + α Σ_{i=m+1}^{n−1} Loss(Z_i, Z_true)

wherein α ∈ (0, 1) is a balance parameter.
Priority Application (1)
- CN202011568917.3A — priority date 2020-12-25, filing date 2020-12-25: "Deep network multi-source spectral image fusion method for multi-supervision recursive learning"

Publications (2)
- CN112699929A — published 2021-04-23
- CN112699929B — granted 2022-11-01