CN114821261A - Image fusion algorithm - Google Patents

Image fusion algorithm

Info

Publication number
CN114821261A
CN114821261A
Authority
CN
China
Prior art keywords
image
fusion
detail
information
network architecture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210555218.8A
Other languages
Chinese (zh)
Inventor
董张玉
许道礼
彭鹏
张晋
汪燕
杨智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Geological Survey Institute
Hefei University of Technology
Original Assignee
Anhui Geological Survey Institute
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Geological Survey Institute, Hefei University of Technology filed Critical Anhui Geological Survey Institute
Priority to CN202210555218.8A priority Critical patent/CN114821261A/en
Publication of CN114821261A publication Critical patent/CN114821261A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image fusion algorithm. In terms of the overall network architecture, the invention uses a two-branch network to divide the network into a detail promotion branch and a spatial information preservation branch, so that the spectral information of the MS image is preserved in the fused image while the spatial detail information in the fused image is obviously enhanced. In terms of the residual module, a multi-scale residual fusion module improves the multi-scale representation capability of the network at a finer-grained level, enlarges the receptive field of each network layer, and strengthens the feature extraction capability of the algorithm. In terms of the decoder, the extracted multi-scale features are first upsampled in a nested connection manner and then fully fused across scales through skip connections, so that the spatial detail expression capability of the fusion result is stronger.

Description

Image fusion algorithm
Technical Field
The invention relates to the technical field of image fusion, in particular to an algorithm for image fusion.
Background
Synthetic aperture radar (SAR) is a high-resolution microwave imaging radar that can obtain images similar to optical images under poor meteorological conditions; the SAR sensor offers high definition, all-weather operation, strong penetrating power, and the ability to capture rich spatial detail information. However, the information in a high-spatial-resolution SAR image mainly arises from backscattering of the target object, which is easily influenced by the sensor wavelength, the polarization mode of the image, and the reflection angle, so that the same target object can take different appearances, hindering correct interpretation of the SAR image. A multispectral (MS) image can represent the real information of a ground-object target; compared with the SAR image, it carries a larger amount of information about the target object and is easier to interpret visually, but because the imaging of the multispectral satellite sensor is affected by weather and illumination, part of the spatial detail information in the MS image is easily lost. Owing to the different characteristics of SAR and MS images, the information they contain is highly complementary. Therefore, preserving the spectral information of the MS image while injecting the spatial detail information of the SAR image into the fusion result allows the fused image to be better applied in related fields such as military affairs, disaster assessment, and target recognition.
A great deal of research on image fusion has been carried out by scholars at home and abroad. Conventional fusion methods fall roughly into the following three categories. 1) Component substitution (CS) methods replace the spatial detail information of the multispectral image with that of the high-spatial-resolution SAR image and then perform an inverse transformation to obtain the fusion result. Common component substitution methods include the IHS (Intensity-Hue-Saturation) transformation, principal component analysis (PCA), and high-pass filtering. These methods are easy to implement and computationally efficient, but because the large difference between the imaging band ranges of the SAR and MS images is not considered, the fused image suffers from spectral distortion. 2) Multi-resolution analysis (MRA) methods first obtain the spatial detail information in the SAR image using multi-resolution decompositions such as the Wavelet transform and the Laplacian pyramid, and then obtain the final result by weighted fusion and reconstruction of the multi-resolution representation coefficients. Common methods include the wavelet transform, the non-subsampled contourlet transform (NSCT), and methods based on the Shearlet transform. For example, fusing SAR and MS images with a Shearlet-transform method obviously improves the retention of spatial detail and spectral information in the fused image, although edge distortion remains; fusing a high-resolution image carrying spectral information with the spatial detail of the SAR image using the NSCT method also clearly improves the fusion result. Compared with CS methods, MRA methods have higher computational complexity; they can reduce the noise in the SAR image and the spectral distortion, but the accuracy of image registration affects the quality of the fused image, causing edge distortion and image aliasing. 3) Hybrid fusion methods make full use of the advantages of different fusion methods and can reduce spatial and spectral distortion while keeping algorithm complexity low. For example, Alparone et al. propose a SAR and MS image fusion scheme combining IHS and multi-resolution fusion, jointly exploiting the advantage of IHS fusion in preserving spatial detail information and the ability of multi-resolution analysis to preserve the spectral information of the image; this scheme obtains better spatial detail and spectral information, but the CS part can only use fusion methods that realize a spatial transformation (such as PCA and IHS). Compared with conventional methods, convolutional neural networks (CNNs), with their excellent feature extraction and feature representation capabilities, have been applied to various image fusion research fields and have achieved remarkable performance, which has prompted researchers to explore applying CNNs to the fusion of SAR and multispectral images.
Masi et al. propose a pan-sharpening fusion algorithm based on the super-resolution convolutional neural network (SRCNN), whose fusion results are clearly better than those of traditional algorithms; Zhong et al. propose a CNN-based pan-sharpening method that uses a CNN to enhance the spatial detail information in the multispectral image and then applies the Gram-Schmidt (GS) transformation to fuse the enhanced image; Yang et al. propose a two-branch network fusion framework that preserves the image spectrum while improving the spatial information, with fusion results clearly improved in both subjective vision and objective image-quality evaluation indexes; Yuan et al. introduce multi-scale feature extraction and residual learning into the basic convolutional neural network (CNN) architecture, fully exploiting the strong nonlinear characterization capability of deep neural networks and further improving the fusion result; Li et al. show that a network based on nested connections can retain a large amount of the feature information of the input data from a multi-scale perspective in infrared and visible image fusion, but a finer-grained network structure is not used to acquire features of different scales in the image. Based on the above research, the invention introduces a spatial detail information fusion scheme of encoder - feature fusion layer - decoder, proposes a Double-branch Multi-scale Residual-fusion Nested-connections Network (DMRN-Net) that increases the network scale at a finer granularity, uses it to improve the fusion of SAR and multispectral images, and compares and analyzes it against other algorithms.
Disclosure of Invention
The invention aims to make up for the defects of the prior art and provides an image fusion algorithm.
The invention is realized by the following technical scheme:
an image fusion algorithm realizes the fusion of an SAR image and an MS image through a dual-branch multi-scale residual fusion nested connection network architecture; the dual-branch multi-scale residual fusion nested connection network architecture comprises a detail promotion branch network architecture and a spectrum maintenance branch network architecture, and specifically comprises the following steps:
(1) acquiring high-frequency information in the SAR image and the MS image;
(2) in the detail promotion branch network architecture, the acquired high-frequency information is passed in sequence through a multi-depth feature extraction layer, a multi-scale residual fusion network layer and a nested connection decoder to reconstruct the image, obtaining a reconstructed image;
(3) in the spectrum preservation branch network architecture, the MS image is upsampled by a factor of three; the upsampled MS image is then fused with the reconstructed image obtained in step (2), so that both the MS spectral information and the detail information of the reconstructed image are injected into the fused image, yielding the final image fusion result.
The specific content of step (1) is as follows: the SAR image and the MS image are passed through a high-pass filter to obtain their high-frequency information, the high-frequency information of the MS image is upsampled by a factor of three to the same resolution as the SAR image, and the output channel values of the SAR and MS branches are increased to 60 through 1×1, (1,60) and 1×1, (3,60) convolution blocks, respectively.
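For illustration, a minimal TensorFlow sketch of this input-preparation step is given below. The choice of high-pass filter (here "image minus an average-blurred copy"), the ReLU activations and the bilinear resampling are assumptions made for the sketch; the text does not fix these details.

import tensorflow as tf
from tensorflow.keras import layers

def high_pass(img, kernel_size=5):
    # High-frequency component taken as "image minus low-pass version";
    # the filter itself is an assumption, a simple average blur is used here.
    low = tf.nn.avg_pool2d(img, ksize=kernel_size, strides=1, padding="SAME")
    return img - low

# 1x1 convolution blocks "1x1,(1,60)" and "1x1,(3,60)" expanding to 60 channels
sar_conv = layers.Conv2D(60, 1, padding="same", activation="relu")
ms_conv = layers.Conv2D(60, 1, padding="same", activation="relu")

def prepare_inputs(sar, ms):
    """sar: (B, H, W, 1) SAR patch; ms: (B, H/3, W/3, 3) MS patch."""
    sar_hp = high_pass(sar)                                   # S_hp
    ms_hp = high_pass(ms)
    # upsample the MS high-frequency information by a factor of three
    ms_hp_up = tf.image.resize(ms_hp, size=(sar.shape[1], sar.shape[2]),
                               method="bilinear")
    return sar_conv(sar_hp), ms_conv(ms_hp_up)                # both 60 channels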
In step (2), image feature information is output at different depths of the multi-depth feature extraction layer; the feature information from the different depths is fused in the multi-scale residual fusion network layer, and a decoder based on nested connections is used to reconstruct the image.
The multi-scale residual fusion network layer replaces the n-channel feature extractors with s groups of smaller feature extractors, each group using k channels (n = s × k); the smaller feature extractors are connected hierarchically in a residual-like manner, enlarging the range of scales that the output features can represent. The input feature information is divided into s groups; first, each group of feature extractors extracts features from its input, then the output features of the previous group and the input features of the current group are sent together to the next group of feature extractors, and this step is repeated until all the input feature information has been processed; finally, the features from each group are concatenated and sent to another group of convolutional layers for feature information fusion;
the working principle of the multi-scale residual error fusion network layer is shown as the formula (1): let x i Represents input information, where i ∈ {1,2 i () Representing a 3x3 convolution, then the output y i Comprises the following steps:
Figure BDA0003654667220000041
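As an illustration of formula (1), the following sketch implements the grouped, residual-like feature extraction as a Keras layer. The ReLU activations and the 1×1 convolution used to fuse the concatenated groups are assumptions; the text only states that the concatenated features are sent to another group of convolutional layers.

import tensorflow as tf
from tensorflow.keras import layers

class MultiScaleResidualFusion(layers.Layer):
    """Res2Net-style split of an n-channel feature map into s groups of k
    channels (n = s x k) with hierarchical residual-like connections."""

    def __init__(self, channels=60, scales=4, **kwargs):
        super().__init__(**kwargs)
        assert channels % scales == 0, "n must equal s x k"
        self.scales = scales
        # one 3x3 convolution K_i per group, except the first (y_1 = x_1)
        self.convs = [layers.Conv2D(channels // scales, 3, padding="same",
                                    activation="relu")
                      for _ in range(scales - 1)]
        self.fuse = layers.Conv2D(channels, 1, padding="same", activation="relu")

    def call(self, x):
        groups = tf.split(x, self.scales, axis=-1)     # x_1 ... x_s
        outputs = [groups[0]]                           # y_1 = x_1
        for i in range(1, self.scales):
            inp = groups[i] if i == 1 else groups[i] + outputs[-1]
            outputs.append(self.convs[i - 1](inp))      # y_i = K_i(x_i + y_{i-1})
        return self.fuse(tf.concat(outputs, axis=-1))   # fuse concatenated groups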
the nested concatenated decoder has two convolutional layers, each convolutional layer with a convolutional kernel of 3 × 3; in each row, the convolution blocks are connected by short connections; aiming at the output of different levels in the multi-scale residual fusion network layer, the nested connection decoder samples the characteristic information to the same scale in an up-sampling mode, and fully fuses the multi-scale image characteristics.
The spectrum preservation branch network architecture is realized according to the following principle:

F = F_hp + ↑MS    (2)

where F is the fused image, F_hp is the output image of the detail promotion branch network architecture, and ↑MS is the MS image after three-fold upsampling.
The loss function L_total used by the dual-branch multi-scale residual fusion nested connection network architecture comprises a spectral loss function L_spectral and a detail loss function L_detail, as shown in formula (3):
L_total = L_spectral + λL_detail    (3)
where λ is the weighting parameter between the spectral loss function L_spectral and the detail loss function L_detail;
the spectral loss function L_spectral is the L2 norm between the fused image F produced by the detail promotion branch and spectrum preservation branch architectures and the reference image GT, as shown in formula (4):

L_spectral = (1/N) · Σ_{i=1}^{N} ‖F^(i) − GT^(i)‖₂²    (4)
where N is the number of training image pairs in each batch, GT^(i) denotes the i-th reference original MS image, and F^(i) denotes the i-th fused image produced by the detail promotion branch and spectrum preservation branch architectures;
the detail loss function L_detail is the L2 norm between the fusion result F_hp output by the detail promotion branch network architecture and the high-frequency information S_hp of the SAR image, as shown in formula (5):

L_detail = (1/N) · Σ_{i=1}^{N} ‖F_hp^(i) − S_hp^(i)‖₂²    (5)
where F_hp^(i) denotes the detail information output by the detail promotion branch for the i-th pair of SAR and multispectral images, and S_hp^(i) denotes the high-frequency information of the i-th SAR image.
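For illustration, the combined loss of formulas (3)-(5) can be sketched as follows, assuming the squared L2 norm is averaged over the N image pairs of the batch (the exact normalisation is not spelled out in the text).

import tensorflow as tf

def total_loss(fused, gt, fused_hp, sar_hp, lam=0.5):
    """L_total = L_spectral + lambda * L_detail (formulas (3)-(5)).

    fused    : F, output of the two-branch network
    gt       : reference MS image GT
    fused_hp : F_hp, detail image output by the detail promotion branch
    sar_hp   : S_hp, high-frequency information of the SAR image
    lam      : lambda; 0.5 is the value chosen in the experiments
    """
    # squared L2 norm per image pair, averaged over the batch
    l_spectral = tf.reduce_mean(
        tf.reduce_sum(tf.square(fused - gt), axis=[1, 2, 3]))
    l_detail = tf.reduce_mean(
        tf.reduce_sum(tf.square(fused_hp - sar_hp), axis=[1, 2, 3]))
    return l_spectral + lam * l_detail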
The invention has the advantages that: 1) in the aspect of the whole network architecture, the invention divides the network into a detail promoting branch and a spatial information maintaining branch by using a double-branch network, thereby not only maintaining the spectral information of the MS image in the fused image, but also obviously enhancing the spatial detail information in the fused image;
2) in terms of the residual module, the multi-scale residual fusion module improves the multi-scale representation capability of the network at a finer-grained level, enlarges the receptive field of each network layer, and strengthens the feature extraction capability of the algorithm;
3) in the aspect of a decoder, the extracted multi-scale features are firstly subjected to up-sampling in a nested connection mode, and then the features of different scales are fully fused in a skip connection mode, so that the spatial detail expression capability of a fusion result is stronger;
4) the algorithm obtains better fusion results in both subjective judgment and objective evaluation: the correlation coefficient among the objective evaluation indexes reaches 0.9936 and the peak signal-to-noise ratio reaches 32.0170. Compared with other algorithms, the algorithm increases the network scale at a finer granularity while preserving spectral information, thereby improving the fusion effect of SAR and multispectral images, enriching the detail information in the fusion result and further increasing the spatial detail information of the images, which verifies that the algorithm has important reference value in the field of image fusion.
Drawings
FIG. 1 is a general architecture diagram of the algorithm of the present invention;
FIG. 2 is a diagram of a network architecture of a detail lift branch;
FIG. 3 is a diagram comparing the RFN block and the proposed MRFN block (scale s = 4) (FIG. 3a is a diagram of the RFN block; FIG. 3b is a diagram of the MRFN block);
FIG. 4 is a diagram of a decoder architecture;
FIG. 5 is a first set of experimental data graphs (5a is a reference MS image; 5b is an MS image; 5c is an SAR image);
FIG. 6 is a second set of experimental data (6a is a reference MS image; 6b is an MS image; 6c is an SAR image);
FIG. 7 is a graph of the loss function during the training phase (7a is a graph of the total loss function; 7b is a graph of the spectral loss function; 7c is a graph of the detail loss function);
FIG. 8 is a graph of the loss function in the verification stage (8a is a graph of the total loss function; 8b is a graph of the spectral loss function; 8c is a graph of the detail loss function);
FIG. 9 shows a first set of data fusion results (9a is IHS data fusion result, 9b is NSCT data fusion result, 9c is Wavelet data fusion result, 9d is TCNN data fusion result, 9e is DRN-Net data fusion result, and 9f is data fusion result of the present invention);
FIG. 10 shows the second set of data fusion results (10a is the IHS data fusion result, 10b is the NSCT data fusion result, 10c is the Wavelet data fusion result, 10d is the TCNN data fusion result, 10e is the DRN-Net data fusion result, and 10f is the data fusion result of the present invention).
Detailed Description
In the field of synthetic aperture radar (SAR) and multispectral (MS) image fusion, fusion methods based on deep learning have clearly improved the fusion effect, but existing fusion schemes mainly scale up the network model by increasing the number of convolutional layers and do not increase the scale of the network within the convolutional layers themselves, which would improve the algorithm's ability to extract spatial detail features of different scales and make the detail information in the fusion result richer. To solve this problem, the invention designs a Double-branch Multi-scale Residual-fusion Nested-connections Network (DMRN-Net) to realize the fusion of SAR and MS images. In DMRN-Net, the fusion task is divided into spectrum preservation and detail promotion: in the spectrum preservation branch, the upsampled MS image is fused with the output of the detail promotion branch, injecting the spectral information of the MS image into the fused image; in the detail promotion branch, the high-frequency information of the SAR and MS images is processed by a multi-depth feature extraction layer, a multi-scale residual fusion network layer and a nested connection decoder to obtain a reconstructed image, and the reconstructed detail information is finally injected into the upsampled MS image to obtain the fusion result. Comparison experiments between the DMRN-Net algorithm, traditional algorithms and a common two-branch network show that the algorithm obtains better fusion results in both subjective judgment and objective evaluation, with a correlation coefficient of 0.9936 and a peak signal-to-noise ratio of 32.0170 among the objective evaluation indexes; compared with other algorithms, it further increases the spatial detail information of the image while preserving the spectral information, verifying that the algorithm has important reference value in the field of image fusion. The details are as follows:
1. network architecture
The imaging principles and spectral bands of SAR and multispectral images differ greatly, the correlation between their image information is weak and local negative correlation may occur, so common component substitution methods easily produce color distortion in the fused image. On the basis of the PanNet and two-branch convolutional neural network architectures, the invention combines a multi-depth encoder, a multi-scale residual fusion network and a nested connection decoder into a network architecture for fusing SAR and MS images based on dual-branch multi-scale residual fusion nested connections, dividing the fusion process of SAR and multispectral images into two aspects, spatial detail promotion and spectrum preservation, as shown in FIG. 1.
1.1 detail promotion branch network architecture
Following the residual fusion nested connection network structure of Li et al., a multi-scale residual fusion nested connection network structure is used in the detail promotion branch, as shown in FIG. 2. The network structure comprises three parts: a multi-depth feature extraction module (left), a multi-scale residual fusion network (MRFN) and a nested connection decoder (Decoder). A convolution layer labeled "n × n, (x, y)" has a kernel size of n × n, an input channel value of x and an output channel value of y.
First, the SAR and MS images are passed through a high-pass filter to obtain their high-frequency information, the MS high-frequency information is upsampled by a factor of three to the same resolution as the SAR image, and the output channel values of the SAR and MS branches are increased to 60 through the 1×1, (1,60) and 1×1, (3,60) convolution blocks, respectively. Second, the multi-depth feature extraction module outputs image feature information at different depths of the module, and the feature information obtained at different depths undergoes feature fusion in the multi-scale residual fusion module. Using the multi-depth feature extraction module allows both the shallow and the deep features of the image to be taken into account; this feature extraction scheme makes full use of features at different depths of the image, which is more convenient for reconstructing the fused image. Finally, the image is reconstructed with a decoder network based on nested connections, so that the multi-depth features of the image can be fully utilized.
As shown in FIG. 2, S_hp and ↑MS_hp denote, respectively, the high-frequency information of the SAR image after high-pass filtering and of the MS image after high-pass filtering and upsampling, and F_hp denotes the output of the detail promotion branch, i.e. its fused image. "MRFN_m" denotes a multi-scale residual fusion network; four MRFN networks are constructed in this network structure, using the same architecture but different weights.
1.2 Multi-Scale residual fusion network (MRFN)
Multi-scale feature representation is of great significance in the field of image processing: it allows a vision system to perceive information at different scales, which helps in understanding scene components and task objects. Multi-scale features are therefore widely used in deep learning, but current methods mainly rely on layering as the way of representing multi-scale features. Based on the concept of residual blocks and combined with the multi-scale residual block structure Res2Net, the MRFN is a simpler and more efficient multi-scale processing method; unlike the traditional way of enhancing the multi-scale representation capability of a network by stacking more CNN convolutional layers, it improves the multi-scale representation capability at a lower structural level. Unlike parallel work that enhances multi-scale capability using features of different resolutions, the multi-scale capability of Res2Net refers to adding multiple available receptive fields within each layer of the convolutional network. To achieve this, the n-channel feature extractor (Conv 4 in FIG. 3a) is replaced (the section between x1-x4 and y1-y4 in FIG. 3b) by s groups of smaller extractors, each group using k channels (n = s × k), and the smaller feature extractors are connected together hierarchically in a residual-like fashion to increase the range of scales that the output features can represent. In contrast to the conventional RFN module, the MRFN divides the input feature information into s groups; first, each group of feature extractors extracts features from its input, then the output features of the previous group and the input features of the current group are sent together to the next group of feature extractors, and this step is repeated until all the input feature information has been processed. Finally, the features from each group are concatenated and sent to another group of convolutional layers for feature information fusion. Because the input feature information can be converted into output feature information along many possible paths, the MRFN yields many equivalent feature scales through the combined effect of the multiple feature extractors. As shown in FIG. 3,
the two inputs in FIG. 3 denote the i-th-layer features extracted by the encoder network, where i ∈ {1,2,3,4} indexes the i-th MRFN network. "Conv n" denotes a 3×3 convolutional layer in the MRFN; in the residual fusion network, the outputs of "Conv1" and "Conv2" are spliced into the input of "Conv3", and "Conv8" is the first layer in the module used for fusion. Owing to the finer-grained multi-scale features and the residual network structure, the details and the salient structure of the image are preserved by the shallow and deep MRFN networks, respectively.
The working principle is shown in formula (1). Let x_i denote the input information, where i ∈ {1, 2, ..., s} with s = 4, and let K_i(·) denote a 3×3 convolution; the output y_i is then:

y_i = x_i, i = 1;  y_i = K_i(x_i), i = 2;  y_i = K_i(x_i + y_{i-1}), 2 < i ≤ s    (1)
1.3 nested connection decoder network
A decoder network based on the nested connection architecture is shown in FIG. 4. Compared with UNet++, the network structure is simplified for the image fusion task, so that reconstruction from the extracted image features is simpler and more effective.
In FIG. 4, the inputs denote the multi-scale feature information obtained through the MRFN networks, and "DCB" denotes a decoder convolution block, which has two convolutional layers, each with a 3×3 convolution kernel. Within each row, the convolution blocks are connected by short connections, similar to a dense-block structure. For the outputs of the different levels of the MRFN networks, the nested connection decoder upsamples the feature information to the same scale, so that the multi-scale image features can be fully fused.
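A rough sketch of this decoder is given below. It is a simplified single-row version (one DCB per MRFN level, with dense short connections within the row) rather than the full nested topology of FIG. 4; the channel width, the bilinear upsampling and the final reconstruction convolution are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

class DCB(layers.Layer):
    """Decoder convolution block: two 3x3 convolutional layers."""
    def __init__(self, channels=60, **kwargs):
        super().__init__(**kwargs)
        self.conv1 = layers.Conv2D(channels, 3, padding="same", activation="relu")
        self.conv2 = layers.Conv2D(channels, 3, padding="same", activation="relu")

    def call(self, x):
        return self.conv2(self.conv1(x))

def nested_decoder(features, out_channels=3):
    """features: feature maps from the four MRFN levels, ordered from the
    largest spatial size to the smallest."""
    target = tf.shape(features[0])[1:3]
    # bring every level to the same scale by upsampling
    same_scale = [features[0]] + [
        tf.image.resize(f, size=target, method="bilinear") for f in features[1:]]
    # one row of DCBs joined by short, dense-like connections
    row = []
    for f in same_scale:
        x = f if not row else tf.concat(row + [f], axis=-1)
        row.append(DCB()(x))
    # a final 3x3 convolution produces the reconstructed detail image F_hp
    return layers.Conv2D(out_channels, 3, padding="same")(tf.concat(row, axis=-1))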
1.4 spectrum preservation branch
First, the MS image is upsampled by a factor of three; then the upsampled MS image and the image reconstructed by the detail promotion branch are superposed, so that both the spectral information of the MS image and the detail information of the reconstructed image are injected into the fused image F, giving the final fusion result. The realization principle is:

F = F_hp + ↑MS    (2)
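A minimal sketch of formula (2), assuming bilinear interpolation for the three-fold upsampling:

import tensorflow as tf

def spectrum_preserving_fusion(ms, f_hp):
    """Formula (2): F = F_hp + upsampled MS.

    ms   : (B, h, w, 3) original multispectral image
    f_hp : (B, 3h, 3w, 3) detail image reconstructed by the detail branch
    """
    ms_up = tf.image.resize(ms, size=(3 * ms.shape[1], 3 * ms.shape[2]),
                            method="bilinear")   # three-fold upsampling of MS
    return f_hp + ms_up                          # inject spectrum and detail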
1.5 loss function
The loss function L_total used by the present network architecture comprises a spectral loss function L_spectral and a detail loss function L_detail, as shown in formula (3):

L_total = L_spectral + λL_detail    (3)

where λ is the weighting parameter between the spectral loss function L_spectral and the detail loss function L_detail.
The spectral loss function L_spectral is the L2 norm between the fused image F produced by the detail promotion branch and spectrum preservation branch and the reference image GT, as shown in formula (4):

L_spectral = (1/N) · Σ_{i=1}^{N} ‖F^(i) − GT^(i)‖₂²    (4)
where N is the number of training image pairs in each batch, GT^(i) denotes the i-th reference original MS image, and F^(i) denotes the i-th fused image produced by the detail promotion branch and the spectrum preservation branch.
Because the spatial resolution of the SAR image is higher than that of the MS image, the detail loss function L_detail is designed to improve the spatial resolution of the fused image, increase the detail information and impose the constraint of the SAR image on the fusion result; it is the L2 norm between the fusion result F_hp output by the detail promotion branch network and the high-frequency information S_hp of the SAR image, as shown in formula (5):

L_detail = (1/N) · Σ_{i=1}^{N} ‖F_hp^(i) − S_hp^(i)‖₂²    (5)
where F_hp^(i) denotes the detail information output by the detail promotion branch for the i-th pair of SAR and multispectral images, and S_hp^(i) denotes the high-frequency information of the i-th SAR image.
2 experiment and analysis of results
2.1 Experimental area and study data
Two study areas with rich land-cover types (FIG. 5 and FIG. 6) are selected as experimental data to verify the effectiveness of the algorithm. The first group covers Tongzhou Bay in Nantong City and includes seawater, buildings, cultivated land, roads and other land covers; it is mainly used to observe whether the algorithm's improvement in spatial detail is obvious. The second group covers a county forest park and the nearby lakes, including mountains, forests, lake water, bridges, residential quarters and other complex land covers, and is used to observe whether fusing images with many land-cover types affects the accuracy of the algorithm. The study data comprise Sentinel-1B IW-mode, GRD-level SAR data downloaded from the Copernicus Open Access Hub, and Landsat8_OLI_TIRS multispectral data downloaded from the Geospatial Data Cloud. In the experiment, SNAP is used to perform orbit correction, radiometric calibration, speckle filtering and terrain correction on the SAR image, and ENVI is used to perform radiometric calibration and atmospheric correction on the multispectral image; the preprocessed SAR and MS images are then registered, with a registration error within 0.5 pixel. In addition, because an ideal reference for fused SAR and MS images is lacking, Sentinel-2A data (10 m resolution) are used as the reference image so that the experimental results have a reference basis in terms of spectral and detail information. The satellite data parameters are shown in Table 1.
TABLE 1 satellite data parameters
(Table 1 is provided as an image in the original document.)
2.2 Experimental setup
The experimental environment is a Windows 10 64-bit operating system with a 2.5 GHz processor and an NVIDIA GeForce GTX 1650Ti graphics card, and the network architecture is built with TensorFlow in a Python 3.6 environment. The batch size is 100, the total number of iterations is 25000, Adam is used as the optimizer, the learning rate is set to 0.0001, and the momentum decay coefficient is set to 0.99. 10000 pairs of SAR and MS patches of 90 × 90 and 30 × 30 pixels are cropped from the preprocessed images; following an 8:2 ratio, 8000 pairs are used as the training data set and 2000 pairs as the validation data set, and 2 pairs of SAR and MS images of 900 × 900 and 300 × 300 pixels are cropped as the test set. Before network training, the data are preprocessed according to the Wald protocol [17]: the original MS image is used as the reference image GT, and the original SAR and MS images are degraded by a factor of three to obtain the SAR and MS input pair; GT (30 × 30 × 3), SAR (30 × 30 × 1) and MS (10 × 10 × 3) are then used together as the input of network training. The training time of the network is about 2.7 h.
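A sketch of how one training sample might be assembled under the Wald protocol as described; the factor-of-three degradation and the bilinear resampling are assumptions consistent with the 30 × 30 and 10 × 10 input patch sizes.

import tensorflow as tf

def wald_training_sample(sar_patch, ms_patch):
    """Original MS patch kept as reference GT; both inputs degraded by the
    resolution ratio (assumed to be 3).

    sar_patch : (90, 90, 1) preprocessed SAR patch
    ms_patch  : (30, 30, 3) preprocessed MS patch
    """
    gt = ms_patch                                                     # 30 x 30 x 3
    sar_in = tf.image.resize(sar_patch, (30, 30), method="bilinear")  # 30 x 30 x 1
    ms_in = tf.image.resize(ms_patch, (10, 10), method="bilinear")    # 10 x 10 x 3
    return {"sar": sar_in, "ms": ms_in, "gt": gt}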
2.3 parameter lambda experiment
The parameter λ that weights the detail loss L_detail determines whether the design of the loss function is reasonable. The detail loss during training and validation is therefore evaluated for different values of λ, observing how it fluctuates with the number of iterations (FIG. 7 and FIG. 8), in order to determine an optimal value of λ. As the parameter λ decreases, the total loss (FIGS. 7a and 8a) decreases. As shown in FIGS. 7b and 8b, when λ = 0.5, 10 or 100 the spectral loss of the image is small; as shown in FIGS. 7c and 8c, when λ = 0.5 or 1 the spatial detail loss is small. Taking both the spectral and the detail loss into account, the invention selects λ = 0.5.
To evaluate the image fusion effect, the following image fusion evaluation indexes are introduced: the correlation coefficient CC, the root mean square error RMSE, the spatial correlation coefficient SCC [19] (an index of the spatial-detail correlation between the SAR image and the fusion result), and mutual information MI, where MI_MF measures the similarity between the fusion result and the MS image and MI_SF measures the similarity between the fusion result and the SAR image. These objective indexes are used to evaluate how different values of λ affect the improvement of detail information and the preservation of spectral information in the fusion result. From Table 2 it can be seen that as λ decreases from large to small, MI_MF becomes lower, meaning that the similarity between the fusion result and the MS image decreases and the spectral information in the result is affected, while CC, RMSE, SCC and MI_SF become better as λ decreases. SCC is optimal when λ = 1, indicating that more spatial detail information of the SAR image is added to the fused image, consistent with the analysis of FIG. 7 and FIG. 8; when λ = 0.5, CC, RMSE and MI_SF are optimal and SCC is suboptimal, so that more spatial detail information can be added to the fusion result.
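For reference, the two simplest of these indexes can be computed as below; the exact per-band averaging and normalisation used in the original evaluation are not specified, so global (all-pixel, all-band) versions are shown.

import numpy as np

def correlation_coefficient(fused, reference):
    """CC: Pearson correlation between the fused and reference images,
    computed over all pixels and bands."""
    f = fused.astype(np.float64).ravel()
    r = reference.astype(np.float64).ravel()
    return float(np.corrcoef(f, r)[0, 1])

def rmse(fused, reference):
    """RMSE between the fused and reference images."""
    diff = fused.astype(np.float64) - reference.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))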
TABLE 2 evaluation index of different lambda value experimental results
(Table 2 is provided as an image in the original document.)
2.4 comparison of different fusion methods
To verify the advantages of the algorithm in spectrum preservation and detail promotion during the fusion of SAR and MS images, the algorithm is compared with the traditional algorithms IHS, Wavelet and NSCT and with the deep-learning algorithms TCNN (a two-branch convolutional neural network) and DRN-Net; the experimental parameters of the comparison methods are set exactly as in the original papers.
2.4.1 subjective evaluation
Comparing the fusion results with the original MS image after bilinear three-fold upsampling (FIG. 9), all six tested methods enhance the spatial detail information, but the magnitude of the enhancement differs slightly between methods. From the first group of data it can be seen that the spatial detail features of the IHS algorithm (FIG. 9a) and the NSCT algorithm (FIG. 9b) are significantly enhanced, but too much detail information is added to the fusion result, so that color distortion appears, which is especially severe in the water areas of the images. The Wavelet, TCNN and DRN-Net networks and the proposed algorithm all perform well in spectrum preservation, but the four algorithms still differ in how much they improve the details and in the overall fusion effect. The Wavelet fusion algorithm (FIG. 9c) improves the spatial detail information to a certain extent, but its fusion result shows obvious jagged textures and distorted edge features, which affects the later extraction of target structures and edges. Thanks to the strong computing power of the convolutional neural network, the fusion result of the TCNN algorithm (FIG. 9d) shows no edge distortion, unlike the traditional methods, and its image detail texture is slightly better than that of Wavelet, but the details of the building areas are blurred and the fusion is not natural. DRN-Net (FIG. 9e) adds a residual fusion network (RFN) on top of the two-branch convolutional neural network; compared with the proposed algorithm its fusion result adds slightly too much spatial detail, so that the fusion of ground-object edges is still not ideal. The proposed algorithm (FIG. 9f), a multi-scale residual fusion convolutional neural network based on the two-branch architecture (DMRN-Net), fully extracts image features at different depths of the SAR and MS images, uses the multi-scale residual fusion network to expand the depth of the network at a finer granularity, designs a more appropriate loss function combining spectrum and detail to supervise the training of the network, and adds the spatial information of the SAR image at different depths to the fusion result through nested connections; the fusion result shows no spectral distortion caused by adding too much SAR spatial detail information, no edge distortion and no texture blurring, and simultaneously enhances the spatial detail information of the image and the salient features of the target. The fusion results of the second group of experimental data (FIG. 10) are basically consistent with the above observations.
2.4.2 Objective evaluation
In addition to the spectral evaluation indexes CC and RMSE, the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are selected to evaluate the spatial detail quality, and the relative dimensionless global error (ERGAS) and the algorithm test time are used to compare the fusion results of the different algorithms (Tables 3 and 4). From the spectral evaluation indexes CC and RMSE it can be seen that the IHS and NSCT algorithms perform poorly, both causing color distortion of the fusion result during SAR and MS image fusion; the Wavelet algorithm adds excessive SAR spatial detail information into the fused image, causing edge distortion of the target objects; the TCNN and DRN-Net algorithms both have good spectrum retention capability, but there is still room for improvement. The IHS and NSCT algorithms also perform poorly on the PSNR index, showing that their fusion results contain more obvious image distortion; the PSNR of the Wavelet algorithm is further improved, but its fusion result is easily affected by the directional constraints of the wavelet transform and the edges of target ground objects are obviously jagged; TCNN and DRN-Net perform better in PSNR and SSIM and obviously enhance the detail information in the fusion result. From ERGAS, the index balancing image detail and spectrum, the IHS and NSCT algorithms perform poorly in both spectrum preservation and spatial detail improvement, which is consistent with the subjective evaluation; the Wavelet algorithm improves obviously and effectively corrects the color distortion seen in the IHS and NSCT algorithms; the TCNN and DRN-Net algorithms make full use of the strong computing power of the convolutional neural network and further improve the overall fusion effect. In terms of algorithm efficiency, the IHS algorithm takes the least time among the compared methods. Except for the residual fusion network, the RFN-based algorithm and the proposed MRFN-based algorithm use the same network structure and parameters; the comparison shows that the MRFN-based algorithm is further improved in spectrum retention and spatial detail, effectively improves the spatial detail information of the MS image, is more efficient than the Wavelet algorithm, and is suitable for the fusion processing of massive remote-sensing data.
TABLE 3 evaluation results of the first set of data
(Table 3 is provided as an image in the original document.)
TABLE 4 evaluation results of the second group of data
(Table 4 is provided as an image in the original document.)
3. Conclusion
To address the color distortion and blurred spatial details of traditional SAR and MS image fusion algorithms, the invention proposes, on the basis of a two-branch network, a two-branch convolutional neural network containing a multi-scale residual fusion network and a nested connection decoder for fusing SAR and MS images, and draws the following conclusions: 1) in terms of the overall network architecture, using a two-branch network to divide the network into a detail promotion branch and a spatial information preservation branch not only preserves the spectral information of the MS image in the fused image but also obviously enhances the spatial detail information in the fused image; 2) in terms of the residual module, the multi-scale residual fusion module improves the multi-scale representation capability of the network at a finer-grained level, enlarges the receptive field of each network layer and strengthens the feature extraction capability of the algorithm; 3) in terms of the decoder, the extracted multi-scale features are first upsampled in a nested connection manner and then fully fused across scales through skip connections, so that the spatial detail expression capability of the fusion result is stronger. However, the influence of different attention mechanisms on the image fusion result was not considered in the design of the algorithm; experiments on how much different attention mechanisms improve the algorithm can be carried out in later work to further improve the results.

Claims (8)

1. An algorithm for image fusion, characterized by: the fusion of the SAR image and the MS image is realized through a double-branch multi-scale residual fusion nested connection network architecture; the dual-branch multi-scale residual fusion nested connection network architecture comprises a detail promotion branch network architecture and a spectrum maintenance branch network architecture, and specifically comprises the following steps:
(1) acquiring high-frequency information in the SAR image and the MS image;
(2) in the detail promotion branch network architecture, the acquired high-frequency information is passed in sequence through a multi-depth feature extraction layer, a multi-scale residual fusion network layer and a nested connection decoder to reconstruct the image, obtaining a reconstructed image;
(3) in the spectrum preservation branch network architecture, the MS image is upsampled by a factor of three; the upsampled MS image is then fused with the reconstructed image obtained in step (2), so that both the MS spectral information and the detail information of the reconstructed image are injected into the fused image, yielding the final image fusion result.
2. An image fusion algorithm according to claim 1, characterized in that: the specific content of step (1) is as follows: the SAR image and the MS image are passed through a high-pass filter to obtain their high-frequency information, the high-frequency information of the MS image is upsampled by a factor of three to the same resolution as the SAR image, and the output channel values of the SAR and MS branches are increased to 60 through 1×1, (1,60) and 1×1, (3,60) convolution blocks, respectively.
3. An image fusion algorithm according to claim 1, characterized in that: in step (2), image feature information is output at different depths of the multi-depth feature extraction layer; the feature information from the different depths is fused in the multi-scale residual fusion network layer, and a decoder based on nested connections is used to reconstruct the image.
4. An image fusion algorithm according to claim 3, characterized in that: the multi-scale residual fusion network layer replaces the n-channel feature extractors with s groups of smaller feature extractors, each group using k channels (n = s × k); the smaller feature extractors are connected hierarchically in a residual-like manner, enlarging the range of scales that the output features can represent; the input feature information is divided into s groups; first, each group of feature extractors extracts features from its input, then the output features of the previous group and the input features of the current group are sent together to the next group of feature extractors, and this step is repeated until all the input feature information has been processed; finally, the features from each group are concatenated and sent to another group of convolutional layers for feature information fusion.
5. An image fusion algorithm according to claim 4, characterized in that: the working principle of the multi-scale residual fusion network layer is shown in formula (1). Let x_i denote the input information, where i ∈ {1, 2, ..., s} with s = 4, and let K_i(·) denote a 3×3 convolution; the output y_i is then:

y_i = x_i, i = 1;  y_i = K_i(x_i), i = 2;  y_i = K_i(x_i + y_{i-1}), 2 < i ≤ s    (1)
6. An image fusion algorithm according to claim 4, characterized in that: the nested connection decoder is built from decoder convolution blocks, each containing two convolutional layers with 3×3 convolution kernels; within each row, the convolution blocks are connected by short connections; for the outputs of the different levels of the multi-scale residual fusion network layer, the nested connection decoder upsamples the feature information to the same scale and fully fuses the multi-scale image features.
7. An image fusion algorithm according to claim 1, characterized in that: the spectrum preservation branch network architecture is realized according to the following principle:

F = F_hp + ↑MS    (2)

where F is the fused image, F_hp is the output image of the detail promotion branch network architecture, and ↑MS is the MS image after three-fold upsampling.
8. An image fusion algorithm according to claim 7, characterized in that: the loss function L_total used by the dual-branch multi-scale residual fusion nested connection network architecture comprises a spectral loss function L_spectral and a detail loss function L_detail, as shown in formula (3):

L_total = L_spectral + λL_detail    (3)

where λ is the weighting parameter between the spectral loss function L_spectral and the detail loss function L_detail;
the spectral loss function L_spectral is the L2 norm between the fused image F produced by the detail promotion branch and spectrum preservation branch architectures and the reference image GT, as shown in formula (4):

L_spectral = (1/N) · Σ_{i=1}^{N} ‖F^(i) − GT^(i)‖₂²    (4)
where N is the number of training image pairs in each batch, GT^(i) denotes the i-th reference original MS image, and F^(i) denotes the i-th fused image produced by the detail promotion branch and spectrum preservation branch architectures;
the detail loss function L_detail is the L2 norm between the fusion result F_hp output by the detail promotion branch network architecture and the high-frequency information S_hp of the SAR image, as shown in formula (5):

L_detail = (1/N) · Σ_{i=1}^{N} ‖F_hp^(i) − S_hp^(i)‖₂²    (5)
where F_hp^(i) denotes the detail information output by the detail promotion branch for the i-th pair of SAR and multispectral images, and S_hp^(i) denotes the high-frequency information of the i-th SAR image.
CN202210555218.8A 2022-05-20 2022-05-20 Image fusion algorithm Pending CN114821261A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210555218.8A CN114821261A (en) 2022-05-20 2022-05-20 Image fusion algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210555218.8A CN114821261A (en) 2022-05-20 2022-05-20 Image fusion algorithm

Publications (1)

Publication Number Publication Date
CN114821261A true CN114821261A (en) 2022-07-29

Family

ID=82518072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210555218.8A Pending CN114821261A (en) 2022-05-20 2022-05-20 Image fusion algorithm

Country Status (1)

Country Link
CN (1) CN114821261A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288657A (en) * 2020-11-16 2021-01-29 北京小米松果电子有限公司 Image processing method, image processing apparatus, and storage medium
CN115861070A (en) * 2022-12-14 2023-03-28 湖南凝服信息科技有限公司 Three-dimensional video fusion splicing method
CN117034778A (en) * 2023-08-28 2023-11-10 黑龙江省网络空间研究中心(黑龙江省信息安全测评中心、黑龙江省国防科学技术研究院) Method for inverting aboveground biomass based on hypershaper-transducer structure
CN117095265A (en) * 2023-09-04 2023-11-21 黑龙江省网络空间研究中心(黑龙江省信息安全测评中心、黑龙江省国防科学技术研究院) SAR image and optical image fusion method based on double-branch CNN
CN117036893A (en) * 2023-10-08 2023-11-10 南京航空航天大学 Image fusion method based on local cross-stage and rapid downsampling
CN117036893B (en) * 2023-10-08 2023-12-15 南京航空航天大学 Image fusion method based on local cross-stage and rapid downsampling
CN117437131A (en) * 2023-12-21 2024-01-23 珠海视新医用科技有限公司 Electronic staining method and device for endoscope image, equipment and storage medium
CN117437131B (en) * 2023-12-21 2024-03-26 珠海视新医用科技有限公司 Electronic staining method and device for endoscope image, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN114821261A (en) Image fusion algorithm
CN110533620B (en) Hyperspectral and full-color image fusion method based on AAE extraction spatial features
CN111080567B (en) Remote sensing image fusion method and system based on multi-scale dynamic convolutional neural network
CN110363215B (en) Method for converting SAR image into optical image based on generating type countermeasure network
CN110428387B (en) Hyperspectral and full-color image fusion method based on deep learning and matrix decomposition
CN110119780A (en) Based on the hyperspectral image super-resolution reconstruction method for generating confrontation network
CN112819737B (en) Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution
CN114119444B (en) Multi-source remote sensing image fusion method based on deep neural network
CN106384332A (en) Method for fusing unmanned aerial vehicle image and multispectral image based on Gram-Schmidt
CN111985543A (en) Construction method, classification method and system of hyperspectral image classification model
CN109859110A (en) The panchromatic sharpening method of high spectrum image of control convolutional neural networks is tieed up based on spectrum
CN116309070A (en) Super-resolution reconstruction method and device for hyperspectral remote sensing image and computer equipment
CN116630209A (en) SAR and visible light image fusion method based on cross-mixed attention
CN113327218A (en) Hyperspectral and full-color image fusion method based on cascade network
CN110032963A (en) The dynamic monitoring method of Spartina alterniflora's new life patch
CN107169946A (en) Image interfusion method based on non-negative sparse matrix Yu hypersphere color transformation
Wu et al. Smart city oriented remote sensing image fusion methods based on convolution sampling and spatial transformation
Chen et al. Neural classification of SPOT imagery through integration of intensity and fractal information
CN116189021B (en) Multi-branch intercrossing attention-enhanced unmanned aerial vehicle multispectral target detection method
CN116452872A (en) Forest scene tree classification method based on improved deep pavv3+
CN114511470B (en) Attention mechanism-based double-branch panchromatic sharpening method
CN112906645B (en) Sea ice target extraction method with SAR data and multispectral data fused
CN115082344A (en) Dual-branch network panchromatic sharpening method based on detail injection
Cui et al. Meta-TR: meta-attention spatial compressive imaging network with swin transformer
Qian et al. Effect of lossy vector quantization hyperspectral data compression on retrieval of red-edge indices

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination