CN111080567A - Remote sensing image fusion method and system based on multi-scale dynamic convolution neural network - Google Patents

Remote sensing image fusion method and system based on multi-scale dynamic convolution neural network

Info

Publication number
CN111080567A
Authority
CN
China
Prior art keywords
image
scale
convolution
filter
dynamic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911271164.7A
Other languages
Chinese (zh)
Other versions
CN111080567B (en)
Inventor
胡建文 (Hu Jianwen)
胡佩 (Hu Pei)
张辉 (Zhang Hui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha University of Science and Technology filed Critical Changsha University of Science and Technology
Priority to CN201911271164.7A
Publication of CN111080567A
Application granted
Publication of CN111080567B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10032 Satellite or aerial image; Remote sensing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Abstract

The invention discloses a remote sensing image fusion method and system based on a multi-scale dynamic convolutional neural network. In the method, a high-resolution panchromatic image and a low-resolution multispectral image are first passed through a multi-scale filter generation network to dynamically generate multi-scale filters, and multi-scale dynamic convolution is then performed between the filters and the panchromatic image. A weight generation network applies appropriate weights to the detail features obtained by the dynamic convolution; the weighted multi-scale detail features are passed through two convolutional layers to obtain the final detail image, and the detail image is added to the low-resolution multispectral image to obtain the fused image. By adopting multi-scale locally adaptive dynamic convolution, the invention can dynamically generate a locally adaptive filter at each pixel position for each input image, which enhances the adaptability of the network, improves its generalization capability, and achieves a good fusion effect; the results can be used for target detection, target recognition, and similar applications.

Description

Remote sensing image fusion method and system based on multi-scale dynamic convolution neural network
Technical Field
The invention belongs to the field of image processing, and particularly relates to a remote sensing image fusion method and system based on a multi-scale dynamic convolution neural network.
Background
To efficiently utilize and integrate images from different sensors, image fusion technology has developed rapidly and is widely applied in civilian, military, medical, and computer vision fields. Among image fusion techniques, the fusion of multispectral and panchromatic images is the most widely applied. Multispectral images contain rich spectral information but have low spatial resolution; panchromatic images are rich in spatial detail but lack spectral information. Fusing the two improves the spatial resolution of the multispectral image, making it better suited to applications such as the identification and detection of surface objects.
Traditional image fusion methods mainly fall into three categories: component substitution, multiresolution analysis, and sparse representation. Component substitution first applies a spatial transform to the low-resolution multispectral image, then replaces the structural component of the transformed image with the high-resolution panchromatic image, and finally applies the inverse transform to obtain a high-resolution multispectral image; this approach is simple and fast and recovers spatial details well, but introduces a degree of spectral distortion. The central idea of multiresolution analysis is to inject structural details of the panchromatic image, extracted by multiresolution analysis, into the multispectral image; its spectral preservation is generally better than that of component substitution, but the structural details of the fused image are often insufficient. Sparse representation methods mainly involve constructing or learning a dictionary and solving for sparse coefficients; their fusion results depend on the constructed dictionary, and the sparse solution is inefficient.
Existing image fusion methods are now mainly based on deep learning models, such as PNN, PanNet, and RSIFNN. The PNN method is described in Masi, Giuseppe, et al., "Pansharpening by convolutional neural networks," Remote Sensing 8.7 (2016): 594; the PanNet method is described in Yang, Junfeng, et al., "PanNet: A deep network architecture for pan-sharpening," Proceedings of the IEEE International Conference on Computer Vision, 2017; and the RSIFNN method is described in Shao, Zhenfeng, and Jiajun Cai, "Remote sensing image fusion with deep convolutional neural network," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 11.5 (2018). Although these methods markedly improve the fusion effect over traditional fusion methods, their networks are not strongly adaptive and their structures are relatively simple, so the spectra of the fused images show a degree of distortion and some structural details are lost. To give the network better generalization capability and to improve fusion performance, image fusion methods based on deep learning models still require continued improvement and innovation.
Disclosure of Invention
The invention provides a remote sensing image fusion method and system based on a multi-scale dynamic convolutional neural network, addressing the detail loss and spectral distortion in fused images that arise in conventional convolutional-neural-network-based remote sensing image fusion methods because the filters are fixed after training and therefore poorly adaptive.
A remote sensing image fusion method based on a multi-scale dynamic convolutional neural network comprises the following steps:
Step 1: using a multi-scale filter generation network to obtain multi-scale locally adaptive dynamic filters;
Step 2: performing multi-scale dynamic convolution between the multi-scale locally adaptive dynamic filters and the panchromatic image;
Step 3: obtaining weights of different scales by using a weight generation network, and multiplying the weights by the dynamic convolution results of the corresponding scales to obtain the detail features of the panchromatic image at different scales;
Step 4: reconstructing a fused image:
splicing the detail features of different scales along the channel dimension, and passing the spliced result through two convolutional layers of different sizes in sequence to obtain the final details; finally, adding the final details to the low-resolution multispectral image to obtain the fused image.
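To make the data flow of steps 1-4 concrete, the following is a minimal end-to-end sketch in TensorFlow; the function and argument names (filter_net, weight_nets, dynamic_convolution, and so on) are illustrative placeholders for the components detailed below, not names fixed by this disclosure.

```python
import tensorflow as tf

# Hypothetical top-level wiring of steps 1-4; dynamic_convolution and the
# sub-networks are sketched later in this description.
def fuse(lr_ms, pan, filter_net, weight_nets, recon_layers, M=16, scales=(3, 5, 7, 9)):
    x = tf.concat([lr_ms, pan], axis=-1)            # splice along the channel dimension
    branch_outs = filter_net(x)                     # step 1: one branch output per scale
    details = []
    for k, f, wnet in zip(scales, branch_outs, weight_nets):
        shape = tf.shape(f)
        f = tf.reshape(f, [shape[0], shape[1], shape[2], k * k, M])  # 4-D to 5-D
        d = dynamic_convolution(pan, f, k)          # step 2: multi-scale dynamic convolution
        details.append(wnet(d) * d)                 # step 3: adaptive per-scale weighting
    g = tf.concat(details, axis=-1)                 # step 4: splice the detail features
    for layer in recon_layers:                      # a 3 x 3 then a 1 x 1 convolutional layer
        g = layer(g)                                # final detail image
    return g + lr_ms                                # fused image
```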
Further, the multi-scale locally adaptive dynamic filters are obtained by splicing the low-resolution multispectral image and the panchromatic image along the channel dimension and inputting the spliced result into the multi-scale filter generation network;
the multi-scale filter generation network comprises 1 3 × 3 convolutional layer, 4 residual modules, 1 3 × 3 convolutional layer, and 1 1 × 1 convolutional layer connected in sequence, followed by 4 1 × 1 convolutional layers with different channel numbers and a matrix dimension transformation module;
each residual module comprises 1 3 × 3 convolutional layer and 1 1 × 1 convolutional layer; the data input to the residual module are processed sequentially by the 3 × 3 convolutional layer and the 1 × 1 convolutional layer, and the result is added to the module input to give the output data of the residual module;
the matrix dimension transformation module converts the dimensions of the matrix input to it from four dimensions to five dimensions;
the channel numbers of the 4 1 × 1 convolutional layers are C_s, where C_s = k_s × k_s × M and k_s denotes the size of the dynamic convolution filter at the s-th scale, s = 1, 2, 3, 4.
The multi-scale filter generation network dynamically generates a filter at each pixel position for each input image, which improves the adaptive capacity of the network; the residual modules in the network not only improve information flow between residual blocks but also effectively avoid the degradation and vanishing-gradient problems caused by increasing network depth. Ordinary filtering slides a single fixed filter across the image, whereas locally adaptive dynamic filtering adaptively applies different filters at different positions of the input image.
Multi-scale dynamic convolution is used to extract detail features at different scales, which improves the feature extraction capability of the network and yields more efficient and accurate feature maps.
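As a concrete illustration, the following TensorFlow/Keras sketch builds the multi-scale filter generation network described above; the internal channel width of 64 and the ReLU activations are assumptions, since the disclosure fixes only the layer types, kernel sizes, and the branch channel counts C_s = k_s × k_s × M.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_module(x, width=64):
    # 3 x 3 convolution followed by 1 x 1 convolution; the result is added
    # back to the module input (the residual connection).
    y = layers.Conv2D(width, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(width, 1, padding='same')(y)
    return layers.Add()([x, y])

def build_filter_generation_network(in_bands=5, M=16, scales=(3, 5, 7, 9), width=64):
    # Input: LR multispectral image spliced with the PAN image along channels,
    # e.g. 4 MS bands + 1 PAN band.
    inp = layers.Input(shape=(None, None, in_bands))
    x = layers.Conv2D(width, 3, padding='same', activation='relu')(inp)
    for _ in range(4):                                   # 4 residual modules
        x = residual_module(x, width)
    x = layers.Conv2D(width, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(width, 1, padding='same', activation='relu')(x)
    outs = []
    for k in scales:                                     # 4 branch 1 x 1 convolutions
        outs.append(layers.Conv2D(k * k * M, 1, padding='same')(x))  # (N, H, W, k*k*M)
    return tf.keras.Model(inp, outs)                     # reshaped to 5-D downstream
```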
Further, the matrix dimension transformation module is implemented with the reshape function of TensorFlow;
the matrix input to the matrix dimension transformation module has dimensions N × H × W × C_s, and the matrix output by the module has dimensions N × H × W × k_s² × M;
where N denotes the batch size of the input images, H and W denote the height and width of the images to be fused, and M is the set number of dynamic convolution filters.
The batch size is the number of images taken in one training step; for example, a batch size of 32 indicates that the network trains on 32 sets of low-resolution multispectral images and panchromatic images at a time, referred to herein as a batch of input images, i.e., a batch of low-resolution multispectral and panchromatic images. Batch size is a basic concept in convolutional neural networks. In addition, the low-resolution multispectral image, the panchromatic image, and the fused image have the same size.
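A minimal sketch of the matrix dimension transformation under the shapes given above; the concrete numbers are illustrative only.

```python
import tensorflow as tf

N, H, W, k, M = 32, 64, 64, 3, 16
branch_out = tf.random.normal([N, H, W, k * k * M])        # 4-D: N x H x W x C_s
dyn_filters = tf.reshape(branch_out, [N, H, W, k * k, M])  # 5-D: N x H x W x k_s^2 x M
print(dyn_filters.shape)                                   # (32, 64, 64, 9, 16)
```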
further, k iss×ksThe values of (a) are sequentially 3 × 3, 5 × 5, 7 × 7, and 9 × 9, and M is 16.
Further, the weight generation network comprises 1 3 × 3 convolutional layer, 1 dilated (atrous) convolution module, 1 3 × 3 convolutional layer, and 1 1 × 1 convolutional layer connected in sequence;
the dilated convolution module comprises 3 dilated convolutional layers with different dilation rates; the data input to the module are processed by the 3 dilated convolutional layers separately, and the processing results are spliced with the module input along the channel dimension to obtain the output data of the module.
The weight generation network generates adaptive weights to adjust the proportional relations among the feature maps within each scale's detail features; the dilated convolution used in the network enlarges the receptive field without increasing the number of parameters, so that each convolution output contains information from a larger range.
Further, the dilation rates of the 3 dilated convolutional layers are 1, 2, and 4, respectively, and the size of the dilated convolution filter is 3 × 3.
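The following TensorFlow/Keras sketch shows one way to realize the weight generation network with its dilated convolution module; the branch width of 32 and the ReLU activations are assumptions not fixed by the disclosure.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dilated_conv_module(x, width=32):
    # Three parallel 3 x 3 dilated convolutions (dilation rates 1, 2, 4);
    # their outputs are spliced with the module input along the channel dimension.
    branches = [layers.Conv2D(width, 3, padding='same', dilation_rate=d,
                              activation='relu')(x) for d in (1, 2, 4)]
    return layers.Concatenate()(branches + [x])

def build_weight_generation_network(M=16, width=32):
    inp = layers.Input(shape=(None, None, M))       # dynamic convolution result D(s)
    x = layers.Conv2D(width, 3, padding='same', activation='relu')(inp)
    x = dilated_conv_module(x, width)
    x = layers.Conv2D(width, 3, padding='same', activation='relu')(x)
    w = layers.Conv2D(M, 1, padding='same')(x)      # weights with the same shape as D(s)
    return tf.keras.Model(inp, w)
```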
Furthermore, the parameters of the multi-scale filter generation network, the weight generation network, and the convolutional layers used in reconstructing the fused image are determined by minimizing a loss function during training with a training set;
wherein the training set comprises a plurality of sets of low-resolution multispectral images, panchromatic images, and corresponding high-resolution multispectral images; the loss function refers to the error between the fused image obtained by the low-resolution multispectral image and the panchromatic image according to the steps 1-4 and the corresponding high-resolution multispectral image.
Further, the loss function is calculated using a mean square error.
Further, the loss function is calculated as a weighted sum of a mean square error and a spectral loss;
$$L_1 = \frac{1}{HWZ}\sum_{n=1}^{H}\sum_{m=1}^{W}\sum_{z=1}^{Z}\bigl(Y(n,m,z)-MS(n,m,z)\bigr)^{2}$$

L_2: the spectral loss (the original equation is rendered as an image and is not reproduced here);

$$L = L_1 + \lambda L_2$$
wherein L_1 denotes the mean square error, L_2 denotes the spectral loss, and L denotes the total loss function; Y denotes the fused image obtained from the low-resolution multispectral image and the panchromatic image in the training set according to steps 1-4; MS denotes the high-resolution multispectral image corresponding to Y in the training set; n, m, and z index the rows, columns, and spectral bands of the images (H, W, and Z being their numbers); and λ is a constant set to 0.8.
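A sketch of the total loss under the definitions above. L_1 is the stated mean square error; because the spectral-loss equation appears only as an image in the original, the L_2 term below uses a cosine-similarity (SAM-style) spectral penalty purely as an assumed stand-in, not as the patent's actual formula.

```python
import tensorflow as tf

def fusion_loss(y, ms, lam=0.8):
    # L1: mean square error between the fused image Y and the reference MS.
    l1 = tf.reduce_mean(tf.square(y - ms))
    # L2: ASSUMED spectral term, one minus the cosine similarity of the
    # per-pixel spectral vectors; the patent's exact formula is not reproduced.
    dot = tf.reduce_sum(y * ms, axis=-1)
    norm = tf.norm(y, axis=-1) * tf.norm(ms, axis=-1) + 1e-8
    l2 = tf.reduce_mean(1.0 - dot / norm)
    return l1 + lam * l2                      # L = L1 + lambda * L2, lambda = 0.8
```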
A remote sensing image fusion system based on a multi-scale dynamic convolutional neural network comprises:
a multi-scale locally adaptive dynamic filter construction unit, which uses the multi-scale filter generation network to obtain the multi-scale locally adaptive dynamic filters;
a multi-scale dynamic convolution unit, which performs multi-scale dynamic convolution between the multi-scale locally adaptive dynamic filters and the panchromatic image;
a detail feature adjusting unit, which obtains weights of different scales using the weight generation network and multiplies the weights by the dynamic convolution results of the corresponding scales to obtain the detail features of the panchromatic image at different scales;
a fused image reconstruction unit, which splices the detail features of different scales along the channel dimension, passes the spliced result through two convolutional layers of different sizes in sequence to obtain the final details, and finally adds the final details to the low-resolution multispectral image to obtain the fused image.
In the fusion system, each unit module processes image data according to the remote sensing image fusion method based on the multi-scale dynamic convolutional neural network described above.
Advantageous effects
Compared with the prior art, the remote sensing image fusion method and system based on the multi-scale dynamic convolution neural network provided by the invention have the following remarkable advantages:
1. The multi-scale filter generation network dynamically generates a filter at each pixel position for each input image, which improves the adaptive capacity of the network; the residual modules in the network not only improve information flow between residual blocks but also effectively avoid the degradation and vanishing-gradient problems caused by increasing network depth. The filters used by a traditional convolutional neural network are learned during training and remain unchanged during actual fusion, whereas the filters used by dynamic convolution adapt to each input during actual fusion, and locally adaptive dynamic filtering applies different filters at different positions of the input image.
2. Multi-scale dynamic convolution extracts details at different scales, which improves the feature extraction capability of the network and yields more accurate image details.
3. The weight generation network generates adaptive weights to adjust the proportional relations among the feature maps within each scale's detail features; the dilated convolution used in the network enlarges the receptive field without increasing the number of parameters, so that each convolution output contains information from a larger range.
Drawings
FIG. 1 is a schematic flow diagram of a process according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a main network structure in the method according to the embodiment of the present invention, in which fig. 2(a) is a diagram of a multi-scale filter generation network structure, and fig. 2(b) is a diagram of a weight generation network structure;
fig. 3 is a schematic diagram of fusion results of different fusion methods in a simulated image experiment on GeoEye1 satellite images, wherein fig. 3(a) is a low-resolution multispectral image, fig. 3(b) is a panchromatic image, fig. 3(c) is a high-resolution multispectral reference image, fig. 3(d) is an NMRA method fusion image, fig. 3(e) is a GLP method fusion image, fig. 3(f) is a PanNet method fusion image, fig. 3(g) is a Target-PNN method fusion image, fig. 3(h) is an RSIFNN method fusion image, and fig. 3(i) is a fusion image obtained by applying the method of the present invention;
fig. 4 is a schematic diagram of fusion results of different fusion methods in a simulated image experiment on a QuickBird satellite image, wherein fig. 4(a) is a low-resolution multispectral image, fig. 4(b) is a panchromatic image, fig. 4(c) is a high-resolution multispectral reference image, fig. 4(d) is an NMRA method fusion image, fig. 4(e) is a GLP method fusion image, fig. 4(f) is a PanNet method fusion image, fig. 4(g) is a Target-PNN method fusion image, fig. 4(h) is an RSIFNN method fusion image, and fig. 4(i) is a fusion image using the method according to the embodiment of the present invention;
fig. 5 is a schematic diagram of fusion results of different fusion methods on GeoEye1 satellite images in an actual image experiment, wherein fig. 5(a) is a low-resolution multispectral image, fig. 5(b) is a panchromatic image, fig. 5(c) is an NMRA method fusion image, fig. 5(d) is a GLP method fusion image, fig. 5(e) is a PanNet method fusion image, fig. 5(f) is a Target-PNN method fusion image, fig. 5(g) is an RSIFNN method fusion image, and fig. 5(h) is a fusion image obtained by applying the method of the present invention;
fig. 6 is a schematic diagram of fusion results of different fusion methods on QuickBird satellite images in an actual image experiment, where fig. 6(a) is a low-resolution multispectral image, fig. 6(b) is a panchromatic image, fig. 6(c) is an NMRA method fusion image, fig. 6(d) is a GLP method fusion image, fig. 6(e) is a PanNet method fusion image, fig. 6(f) is a Target-PNN method fusion image, fig. 6(g) is an RSIFNN method fusion image, and fig. 6(h) is a fusion image obtained by applying the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
As shown in fig. 1, the remote sensing image fusion method based on the multi-scale dynamic convolutional neural network according to the embodiment of the present invention includes the following specific implementation steps:
(1) The low-resolution multispectral image and the panchromatic image are spliced along the channel dimension and input into the multi-scale filter generation network to obtain the multi-scale locally adaptive filters. The multi-scale filter generation network structure is shown in fig. 2(a): it comprises a 3 × 3 convolutional layer, 4 residual modules, a 3 × 3 convolutional layer, and a 1 × 1 convolutional layer connected in sequence, followed by 4 1 × 1 convolutional layers with different channel numbers, where the channel numbers of the 4 layers are set according to the size and number of the filters. Finally, the reshape function of TensorFlow is used to transform the matrix dimensions of the 4 branch outputs, generating 16 filters at each of the sizes 3 × 3, 5 × 5, 7 × 7, and 9 × 9. Each residual module comprises a 3 × 3 convolutional layer and a 1 × 1 convolutional layer; the module input is processed by the two layers in sequence and added back to the input to give the module output, which improves information flow within the module and avoids the vanishing-gradient and degradation problems caused by increasing network depth;
(2) The generated filters of the 4 scales (3 × 3, 5 × 5, 7 × 7, and 9 × 9) are used to perform multi-scale dynamic convolution with the panchromatic image, extracting detail features of the panchromatic image at different scales. The filter of the local dynamic convolution is generated locally and dynamically for each pixel position of the input, and the single-scale dynamic convolution can be expressed as:
$$D^{s}_{v}(y,x) = \sum_{j=-\lfloor k/2\rfloor}^{\lfloor k/2\rfloor}\ \sum_{i=-\lfloor k/2\rfloor}^{\lfloor k/2\rfloor} F^{s}_{v,y,x}(j,i)\, X_{y+j,\,x+i}$$

where $X_{y+j,x+i}$ denotes the input image, $F^{s}_{v,y,x}$ denotes the locally generated filter, $s$ denotes the scale, $v$ denotes the channel index, $k$ denotes the size of the filter, $y$ and $x$ denote spatial coordinates, and $D^{s}_{v}(y,x)$ denotes the result of the dynamic convolution;
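The single-scale dynamic convolution above can be realized by gathering the k × k neighbourhood of every pixel and taking a weighted sum against the locally generated filters. The sketch below is one such realization, assuming 'SAME' zero padding at the image borders.

```python
import tensorflow as tf

def dynamic_convolution(pan, dyn_filters, k):
    """Per-pixel dynamic convolution of a panchromatic image.

    pan:         (N, H, W, 1) panchromatic image X
    dyn_filters: (N, H, W, k*k, M) locally generated filters F
    returns:     (N, H, W, M) dynamic convolution result D
    """
    # Flattened k x k neighbourhood of every pixel: (N, H, W, k*k)
    patches = tf.image.extract_patches(
        pan, sizes=[1, k, k, 1], strides=[1, 1, 1, 1],
        rates=[1, 1, 1, 1], padding='SAME')
    patches = tf.expand_dims(patches, axis=-1)        # (N, H, W, k*k, 1)
    # Weighted sum over the window: a different filter at every position and channel
    return tf.reduce_sum(patches * dyn_filters, axis=3)
```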
(3) The results D(1), D(2), D(3), and D(4) obtained by the dynamic convolutions are input separately into the weight generation network to generate weights with the same matrix size as D(1)-D(4); the weights are multiplied by D(1)-D(4) to obtain the detail features G(1), G(2), G(3), and G(4) at the 4 scales. The weight generation network structure is shown in fig. 2(b) and comprises a 3 × 3 convolutional layer, a dilated convolution module, a 3 × 3 convolutional layer, and a 1 × 1 convolutional layer connected in sequence; the dilated convolution module comprises 3 × 3 dilated convolutional layers with dilation rates of 1, 2, and 4; the input data are processed by the 3 dilated convolutional layers separately, and the results are spliced with the module input along the channel dimension to form the module output. The single-scale detail feature can be expressed as:
$$G^{s}_{v}(y,x) = W^{s}_{v}(y,x)\, D^{s}_{v}(y,x)$$

where $W^{s}_{v}$ denotes the weights generated by the weight generation network and $G^{s}_{v}$ denotes the obtained detail features;
(4) Reconstructing the fused image: the obtained G(1), G(2), G(3), and G(4) are first spliced along the channel dimension; the spliced result is then passed through a 3 × 3 convolutional layer and a 1 × 1 convolutional layer to obtain the final detail image G; finally, G is added to the low-resolution multispectral image to obtain the fused image.
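A minimal sketch of this reconstruction step, assuming 4 multispectral bands and an illustrative intermediate width of 32:

```python
import tensorflow as tf
from tensorflow.keras import layers

def reconstruct_fused_image(details, lr_ms, bands=4, width=32):
    # details: list of per-scale detail features G(1)..G(4); lr_ms: (N, H, W, bands)
    g = layers.Concatenate()(details)                           # splice along channels
    g = layers.Conv2D(width, 3, padding='same', activation='relu')(g)
    g = layers.Conv2D(bands, 1, padding='same')(g)              # final detail image G
    return layers.Add()([g, lr_ms])                             # fused image = G + LR MS
```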
Examples
The practical applicability and effectiveness of the proposed fusion method are tested with images from the GeoEye1 and QuickBird satellites, whose multispectral images comprise four bands: red, green, blue, and near-infrared. The embodiment of the invention includes a simulated image experiment and an actual image experiment; the images in the simulated experiment are obtained by degrading and down-sampling the actual images. The simulated low-resolution multispectral image also needs to be up-sampled; in the actual image experiment, the actual multispectral image is up-sampled and then fused with the actual panchromatic image.
The method of this embodiment is compared mainly with two traditional fusion methods, NMRA and GLP, and three deep learning methods, PanNet, RSIFNN, and Target-PNN.
Analysis of the simulated image experiments:
Figs. 3(a), 3(b) and 4(a), 4(b) are the low-resolution multispectral and panchromatic images of GeoEye1 and QuickBird, respectively; figs. 3(c) and 4(c) are the high-resolution multispectral reference images; figs. 3(d)-(h) and 4(d)-(h) are the fusion results of the comparison methods on the two satellites; and figs. 3(i) and 4(i) are the fusion results of the method proposed in this embodiment. Figs. 3(d)-(e) show some loss of detail compared with the high-resolution multispectral reference image, while fig. 3(i) shows that the spectra are well preserved by the method of this embodiment. The fused images in figs. 4(d)-(f) and 4(h) are blurred, with serious loss of detail, whereas fig. 4(i) shows that the method of this embodiment preserves detail well. The fused image of the proposed method differs little from the high-resolution multispectral reference image, with good spectral preservation and detail injection.
Analyzing the fused images gives an intuitive sense of the fusion results; the results are further evaluated with objective indices. The method of this embodiment is evaluated with five indices: Q (universal image quality index), SAM (spectral angle mapper), SCC (spatial correlation coefficient), ERGAS (relative dimensionless global error in synthesis), and Q4 (a quality index for 4-band image fusion). The optimal value of SAM and ERGAS is 0, and the optimal value of Q, SCC, and Q4 is 1. The objective evaluation indices of the fusion results of the different methods are given in tables 1 and 2; as the tables show, the indices of the method of this embodiment on the GeoEye1 and QuickBird satellites are superior to those of the other methods, and the correlation of its spectral and spatial information with the high-resolution multispectral reference image is far higher than that of the conventional fusion methods.
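For reference, the standard definitions of SAM and ERGAS, which the patent text does not reproduce, are:

$$\mathrm{SAM}(\mathbf{v},\mathbf{w}) = \arccos\frac{\langle \mathbf{v},\mathbf{w}\rangle}{\lVert\mathbf{v}\rVert\,\lVert\mathbf{w}\rVert}, \qquad \mathrm{ERGAS} = 100\,\frac{h}{l}\sqrt{\frac{1}{B}\sum_{b=1}^{B}\frac{\mathrm{RMSE}_b^{2}}{\mu_b^{2}}}$$

where $\mathbf{v}$ and $\mathbf{w}$ are the spectral vectors of the fused and reference images at a pixel, $h/l$ is the ratio of panchromatic to multispectral pixel size, $B$ is the number of bands, and $\mu_b$ is the mean of band $b$ of the reference image.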
Table 1. Objective evaluation indices of the GeoEye1 remote sensing image fusion results [table rendered as an image in the original]
Table 2. Objective evaluation indices of the QuickBird remote sensing image fusion results [table rendered as an image in the original]
Analysis of the actual image experiments:
Figs. 5(a), 5(b) and 6(a), 6(b) are the low-resolution multispectral and panchromatic images of GeoEye1 and QuickBird, respectively; figs. 5(c)-(g) and 6(c)-(g) are the fusion results of the comparison methods on the two satellites; and figs. 5(h) and 6(h) are the fusion results of the method described in this embodiment of the invention. Compared with the low-resolution multispectral image, the detail structures of figs. 5(c) and 5(d) are suboptimal, and the spectra of figs. 5(e)-(g) are not preserved as well as by the method of this embodiment. Figs. 6(d), 6(e), and 6(g) show significant loss of detail, and the spectra and details of the vegetation areas in figs. 6(c) and 6(f) are not as good as in fig. 6(h), produced by the method of the invention. The method of this embodiment injects spatial details well while maintaining spectral information, achieving a better fusion effect.
Because no high-resolution multispectral reference image exists in the actual fusion setting, the method of this embodiment is evaluated with no-reference objective indices. The main indices are QNR (quality with no reference), D_λ (degree of spectral distortion), and D_s (degree of spatial distortion). D_λ evaluates the spectral distortion of the fused image and D_s evaluates its loss of spatial detail; QNR is a comprehensive index combining D_λ and D_s. The optimal value of QNR is 1, and the optimal value of D_λ and D_s is 0. Tables 3 and 4 give the objective evaluation indices of the fusion results of the different methods on the two satellite images. In the experiments on the GeoEye1 and QuickBird satellites, the D_λ and D_s indices of the method of this embodiment are the smallest, and its comprehensive QNR index is the best, indicating better spectral preservation and spatial detail injection.
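For reference, the standard combination of these no-reference indices, not reproduced in the patent text, is:

$$\mathrm{QNR} = (1-D_{\lambda})^{\alpha}\,(1-D_{s})^{\beta}, \qquad \text{typically with } \alpha=\beta=1,$$

so QNR attains its optimal value of 1 exactly when both distortion indices equal 0.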
Table 3. Objective evaluation indices of the GeoEye1 remote sensing image fusion results [table rendered as an image in the original]
Table 4. Objective evaluation indices of the QuickBird remote sensing image fusion results [table rendered as an image in the original]
A remote sensing image fusion system based on a multi-scale dynamic convolutional neural network comprises:
a multi-scale locally adaptive dynamic filter construction unit, which uses the multi-scale filter generation network to obtain the multi-scale locally adaptive dynamic filters;
a multi-scale dynamic convolution unit, which performs multi-scale dynamic convolution between the multi-scale locally adaptive dynamic filters and the panchromatic image;
a detail feature adjusting unit, which obtains weights of different scales using the weight generation network and multiplies the weights by the dynamic convolution results of the corresponding scales to obtain the detail features of the panchromatic image at different scales;
a fused image reconstruction unit, which splices the detail features of different scales along the channel dimension, passes the spliced result through two convolutional layers of different sizes in sequence to obtain the final details, and finally adds the final details to the low-resolution multispectral image to obtain the fused image.
In the fusion system, each unit module processes image data according to the remote sensing image fusion method based on the multi-scale dynamic convolutional neural network described above.
It should be understood that the functional unit modules in the embodiments of the present invention may be integrated into one processing unit, or each unit module may exist physically alone, or two or more unit modules may be integrated into one unit module; they may be implemented in the form of hardware or software.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (10)

1. A remote sensing image fusion method based on a multi-scale dynamic convolutional neural network, characterized by comprising the following steps:
Step 1: using a multi-scale filter generation network to obtain multi-scale locally adaptive dynamic filters;
Step 2: performing multi-scale dynamic convolution between the multi-scale locally adaptive dynamic filters and the panchromatic image;
Step 3: obtaining weights of different scales by using a weight generation network, and multiplying the weights by the dynamic convolution results of the corresponding scales to obtain the detail features of the panchromatic image at different scales;
Step 4: reconstructing a fused image:
splicing the detail features of different scales along the channel dimension, and passing the spliced result through two convolutional layers of different sizes in sequence to obtain the final details; finally, adding the final details to the low-resolution multispectral image to obtain the fused image.
2. The method according to claim 1, wherein the multi-scale local adaptive dynamic filter is obtained by inputting a spliced result into a multi-scale filter generation network after splicing a low-resolution multispectral image and a panchromatic image in a channel dimension;
the multi-scale filter generation network comprises 1 3 × 3 convolutional layer, 4 residual modules, 1 3 × 3 convolutional layer, and 1 1 × 1 convolutional layer connected in sequence, followed by 4 1 × 1 convolutional layers with different channel numbers and a matrix dimension transformation module;
each residual module comprises 1 3 × 3 convolutional layer and 1 1 × 1 convolutional layer; the data input to the residual module are processed sequentially by the 3 × 3 convolutional layer and the 1 × 1 convolutional layer, and the result is added to the module input to give the output data of the residual module;
the matrix dimension transformation module is used for converting the matrix dimension input into the matrix dimension transformation module from four dimensions to five dimensions;
the channel numbers of the 4 1 × 1 convolutional layers are C_s, where C_s = k_s × k_s × M and k_s denotes the size of the dynamic convolution filter at the s-th scale, s = 1, 2, 3, 4.
3. The method of claim 2, wherein the matrix dimension transformation module is implemented using the reshape function of TensorFlow;
the matrix input to the matrix dimension transformation module has dimensions N × H × W × C_s, and the matrix output by the module has dimensions N × H × W × k_s² × M;
wherein N denotes the batch size of the input images, H and W denote the height and width of the images to be fused, and M is the set number of dynamic convolution filters.
4. The method of claim 2, wherein the k_s × k_s values are 3 × 3, 5 × 5, 7 × 7, and 9 × 9 in turn, and M is 16.
5. The method of claim 1, wherein the weight generation network comprises 1 3 × 3 convolutional layer, 1 dilated convolution module, 1 3 × 3 convolutional layer, and 1 1 × 1 convolutional layer connected in sequence;
the dilated convolution module comprises 3 dilated convolutional layers with different dilation rates; the data input to the dilated convolution module are processed by the 3 dilated convolutional layers separately, and the processing results are spliced with the module input along the channel dimension to obtain the output data of the module.
6. The method of claim 5, wherein the dilation rates of the 3 dilated convolutional layers are 1, 2, and 4, respectively, and the size of the dilated convolution filter is 3 × 3.
7. The method according to any one of claims 1 to 6, wherein the parameters of the multi-scale filter generation network, the weight generation network, and the convolutional layers used in reconstructing the fused image are determined by minimizing a loss function during training with a training set;
wherein the training set comprises a plurality of sets of low-resolution multispectral images, panchromatic images, and corresponding high-resolution multispectral images; the loss function refers to the error between the fused image obtained by the low-resolution multispectral image and the panchromatic image according to the steps 1-4 and the corresponding high-resolution multispectral image.
8. The method of claim 7, wherein the loss function is calculated using a mean square error.
9. The method of claim 7, wherein the loss function is computed as a weighted sum of a mean square error and a spectral loss;
$$L_1 = \frac{1}{HWZ}\sum_{n=1}^{H}\sum_{m=1}^{W}\sum_{z=1}^{Z}\bigl(Y(n,m,z)-MS(n,m,z)\bigr)^{2}$$

L_2: the spectral loss (the original equation is rendered as an image and is not reproduced here);

$$L = L_1 + \lambda L_2$$
wherein L_1 denotes the mean square error, L_2 denotes the spectral loss, and L denotes the total loss function; Y denotes the fused image obtained from the low-resolution multispectral image and the panchromatic image in the training set according to steps 1-4; MS denotes the high-resolution multispectral image corresponding to Y in the training set; n, m, and z index the rows, columns, and spectral bands of the images (H, W, and Z being their numbers); and λ is a constant set to 0.8.
10. A remote sensing image fusion system based on a multi-scale dynamic convolutional neural network, characterized by comprising:
a multi-scale locally adaptive dynamic filter construction unit, which uses a multi-scale filter generation network to obtain multi-scale locally adaptive dynamic filters;
a multi-scale dynamic convolution unit, which performs multi-scale dynamic convolution between the multi-scale locally adaptive dynamic filters and the panchromatic image;
a detail feature adjusting unit, which obtains weights of different scales using a weight generation network and multiplies the weights by the dynamic convolution results of the corresponding scales to obtain the detail features of the panchromatic image at different scales;
a fused image reconstruction unit, which splices the detail features of different scales along the channel dimension, passes the spliced result through two convolutional layers of different sizes in sequence to obtain the final details, and finally adds the final details to the low-resolution multispectral image to obtain the fused image.
CN201911271164.7A 2019-12-12 2019-12-12 Remote sensing image fusion method and system based on multi-scale dynamic convolutional neural network Active CN111080567B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911271164.7A CN111080567B (en) 2019-12-12 2019-12-12 Remote sensing image fusion method and system based on multi-scale dynamic convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911271164.7A CN111080567B (en) 2019-12-12 2019-12-12 Remote sensing image fusion method and system based on multi-scale dynamic convolutional neural network

Publications (2)

Publication Number Publication Date
CN111080567A true CN111080567A (en) 2020-04-28
CN111080567B CN111080567B (en) 2023-04-21

Family

ID=70313927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911271164.7A Active CN111080567B (en) 2019-12-12 2019-12-12 Remote sensing image fusion method and system based on multi-scale dynamic convolutional neural network

Country Status (1)

Country Link
CN (1) CN111080567B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680722A (en) * 2020-05-25 2020-09-18 腾讯科技(深圳)有限公司 Content identification method, device, equipment and readable storage medium
CN112465733A (en) * 2020-08-31 2021-03-09 长沙理工大学 Remote sensing image fusion method, device, medium and equipment based on semi-supervised learning
CN112733919A (en) * 2020-12-31 2021-04-30 山东师范大学 Image semantic segmentation method and system based on void convolution and multi-scale and multi-branch
CN112949754A (en) * 2021-03-29 2021-06-11 中国科学院合肥物质科学研究院 Text recognition data synthesis method based on image fusion
CN112990164A (en) * 2021-05-19 2021-06-18 湖南大学 Multispectral and panchromatic image combined registration and fuzzy kernel estimation method and system
CN112986210A (en) * 2021-02-10 2021-06-18 四川大学 Scale-adaptive microbial Raman spectrum detection method and system
CN113111885A (en) * 2021-04-14 2021-07-13 清华大学深圳国际研究生院 Dynamic resolution instance segmentation method and computer readable storage medium
CN113129247A (en) * 2021-04-21 2021-07-16 重庆邮电大学 Remote sensing image fusion method and medium based on self-adaptive multi-scale residual convolution
CN113128586A (en) * 2021-04-16 2021-07-16 重庆邮电大学 Spatial-temporal fusion method based on multi-scale mechanism and series expansion convolution remote sensing image
CN113191993A (en) * 2021-04-20 2021-07-30 山东师范大学 Panchromatic and multispectral image fusion method based on deep learning
CN113486908A (en) * 2021-07-13 2021-10-08 杭州海康威视数字技术股份有限公司 Target detection method and device, electronic equipment and readable storage medium
WO2021218037A1 (en) * 2020-04-29 2021-11-04 北京迈格威科技有限公司 Target detection method and apparatus, computer device and storage medium
CN113658054A (en) * 2021-07-06 2021-11-16 北京空间机电研究所 Infrared image splicing correction method based on temperature drift characteristic line approximation
CN113902650A (en) * 2021-12-07 2022-01-07 南湖实验室 Remote sensing image sharpening method based on parallel deep learning network architecture
CN114092813A (en) * 2021-11-25 2022-02-25 中国科学院空天信息创新研究院 Industrial park image extraction method, model, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5949914A (en) * 1997-03-17 1999-09-07 Space Imaging Lp Enhancing the resolution of multi-spectral image data with panchromatic image data using super resolution pan-sharpening
CN109272010A (en) * 2018-07-27 2019-01-25 吉林大学 Multi-scale Remote Sensing Image fusion method based on convolutional neural networks
CN109410164A (en) * 2018-11-14 2019-03-01 西北工业大学 The satellite PAN and multi-spectral image interfusion method of multiple dimensioned convolutional neural networks
CN109993717A (en) * 2018-11-14 2019-07-09 重庆邮电大学 A kind of remote sensing image fusion method of combination guiding filtering and IHS transformation
CN110428387A (en) * 2018-11-16 2019-11-08 西安电子科技大学 EO-1 hyperion and panchromatic image fusion method based on deep learning and matrix decomposition
CN110555820A (en) * 2019-08-28 2019-12-10 西北工业大学 Image fusion method based on convolutional neural network and dynamic guide filtering

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5949914A (en) * 1997-03-17 1999-09-07 Space Imaging Lp Enhancing the resolution of multi-spectral image data with panchromatic image data using super resolution pan-sharpening
CN109272010A (en) * 2018-07-27 2019-01-25 吉林大学 Multi-scale Remote Sensing Image fusion method based on convolutional neural networks
CN109410164A (en) * 2018-11-14 2019-03-01 西北工业大学 The satellite PAN and multi-spectral image interfusion method of multiple dimensioned convolutional neural networks
CN109993717A (en) * 2018-11-14 2019-07-09 重庆邮电大学 A kind of remote sensing image fusion method of combination guiding filtering and IHS transformation
CN110428387A (en) * 2018-11-16 2019-11-08 西安电子科技大学 EO-1 hyperion and panchromatic image fusion method based on deep learning and matrix decomposition
CN110555820A (en) * 2019-08-28 2019-12-10 西北工业大学 Image fusion method based on convolutional neural network and dynamic guide filtering

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BERT DE BRABANDERE et al.: "Dynamic Filter Networks" *
GIUSEPPE MASI et al.: "Pansharpening by Convolutional Neural Networks" *
QIANGQIANG YUAN et al.: "A Multiscale and Multidepth Convolutional Neural Network for Remote Sensing Imagery Pan-Sharpening" *
FANG Shuai et al.: "Pan-sharpening algorithm based on deep pyramid network" *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021218037A1 (en) * 2020-04-29 2021-11-04 北京迈格威科技有限公司 Target detection method and apparatus, computer device and storage medium
CN111680722A (en) * 2020-05-25 2020-09-18 腾讯科技(深圳)有限公司 Content identification method, device, equipment and readable storage medium
CN111680722B (en) * 2020-05-25 2022-09-16 腾讯科技(深圳)有限公司 Content identification method, device, equipment and readable storage medium
CN112465733A (en) * 2020-08-31 2021-03-09 长沙理工大学 Remote sensing image fusion method, device, medium and equipment based on semi-supervised learning
CN112733919A (en) * 2020-12-31 2021-04-30 山东师范大学 Image semantic segmentation method and system based on void convolution and multi-scale and multi-branch
CN112733919B (en) * 2020-12-31 2022-05-20 山东师范大学 Image semantic segmentation method and system based on void convolution and multi-scale and multi-branch
CN112986210A (en) * 2021-02-10 2021-06-18 四川大学 Scale-adaptive microbial Raman spectrum detection method and system
CN112949754A (en) * 2021-03-29 2021-06-11 中国科学院合肥物质科学研究院 Text recognition data synthesis method based on image fusion
CN112949754B (en) * 2021-03-29 2022-10-14 中国科学院合肥物质科学研究院 Text recognition data synthesis method based on image fusion
CN113111885A (en) * 2021-04-14 2021-07-13 清华大学深圳国际研究生院 Dynamic resolution instance segmentation method and computer readable storage medium
CN113111885B (en) * 2021-04-14 2022-11-29 清华大学深圳国际研究生院 Dynamic resolution instance segmentation method and computer readable storage medium
CN113128586B (en) * 2021-04-16 2022-08-23 重庆邮电大学 Spatial-temporal fusion method based on multi-scale mechanism and series expansion convolution remote sensing image
CN113128586A (en) * 2021-04-16 2021-07-16 重庆邮电大学 Spatial-temporal fusion method based on multi-scale mechanism and series expansion convolution remote sensing image
CN113191993B (en) * 2021-04-20 2022-11-04 山东师范大学 Panchromatic and multispectral image fusion method based on deep learning
CN113191993A (en) * 2021-04-20 2021-07-30 山东师范大学 Panchromatic and multispectral image fusion method based on deep learning
CN113129247A (en) * 2021-04-21 2021-07-16 重庆邮电大学 Remote sensing image fusion method and medium based on self-adaptive multi-scale residual convolution
CN113129247B (en) * 2021-04-21 2023-04-07 重庆邮电大学 Remote sensing image fusion method and medium based on self-adaptive multi-scale residual convolution
CN112990164B (en) * 2021-05-19 2021-07-27 湖南大学 Multispectral and panchromatic image combined registration and fuzzy kernel estimation method and system
CN112990164A (en) * 2021-05-19 2021-06-18 湖南大学 Multispectral and panchromatic image combined registration and fuzzy kernel estimation method and system
CN113658054A (en) * 2021-07-06 2021-11-16 北京空间机电研究所 Infrared image splicing correction method based on temperature drift characteristic line approximation
CN113658054B (en) * 2021-07-06 2024-03-29 北京空间机电研究所 Infrared image stitching correction method based on temperature drift characteristic line approximation
CN113486908A (en) * 2021-07-13 2021-10-08 杭州海康威视数字技术股份有限公司 Target detection method and device, electronic equipment and readable storage medium
CN113486908B (en) * 2021-07-13 2023-08-29 杭州海康威视数字技术股份有限公司 Target detection method, target detection device, electronic equipment and readable storage medium
CN114092813A (en) * 2021-11-25 2022-02-25 中国科学院空天信息创新研究院 Industrial park image extraction method, model, electronic equipment and storage medium
CN114092813B (en) * 2021-11-25 2022-08-05 中国科学院空天信息创新研究院 Industrial park image extraction method and system, electronic equipment and storage medium
CN113902650A (en) * 2021-12-07 2022-01-07 南湖实验室 Remote sensing image sharpening method based on parallel deep learning network architecture

Also Published As

Publication number Publication date
CN111080567B (en) 2023-04-21

Similar Documents

Publication Publication Date Title
CN111080567B (en) Remote sensing image fusion method and system based on multi-scale dynamic convolutional neural network
CN110119780B (en) Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network
CN109741256B (en) Image super-resolution reconstruction method based on sparse representation and deep learning
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN111127374B (en) Pan-sharing method based on multi-scale dense network
Lin et al. Hyperspectral image denoising via matrix factorization and deep prior regularization
CN112184554B (en) Remote sensing image fusion method based on residual mixed expansion convolution
CN114119444B (en) Multi-source remote sensing image fusion method based on deep neural network
CN109727207B (en) Hyperspectral image sharpening method based on spectrum prediction residual convolution neural network
CN113222823B (en) Hyperspectral image super-resolution method based on mixed attention network fusion
CN112819737B (en) Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution
US20220301114A1 (en) Noise Reconstruction For Image Denoising
CN113902622B (en) Spectrum super-resolution method based on depth priori joint attention
CN108734675A (en) Image recovery method based on mixing sparse prior model
CN115760814A (en) Remote sensing image fusion method and system based on double-coupling deep neural network
CN114266957A (en) Hyperspectral image super-resolution restoration method based on multi-degradation mode data augmentation
CN115512192A (en) Multispectral and hyperspectral image fusion method based on cross-scale octave convolution network
CN112163998A (en) Single-image super-resolution analysis method matched with natural degradation conditions
CN114511470B (en) Attention mechanism-based double-branch panchromatic sharpening method
CN115082344A (en) Dual-branch network panchromatic sharpening method based on detail injection
CN115861749A (en) Remote sensing image fusion method based on window cross attention
CN115578262A (en) Polarization image super-resolution reconstruction method based on AFAN model
CN115331104A (en) Crop planting information extraction method based on convolutional neural network
CN114677313A (en) Remote sensing image space spectrum fusion method and system for generating multi-confrontation network structure
Nie et al. Image restoration from patch-based compressed sensing measurement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant