CN112819737B - Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution - Google Patents

Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution

Info

Publication number
CN112819737B
Authority
CN
China
Prior art keywords
image
scale
multispectral
model
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110042742.0A
Other languages
Chinese (zh)
Other versions
CN112819737A (en)
Inventor
彭进业
付毅豪
张二磊
王珺
刘璐
俞凯
祝轩
赵万青
何林青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest University
Original Assignee
Northwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest University filed Critical Northwest University
Priority to CN202110042742.0A priority Critical patent/CN112819737B/en
Publication of CN112819737A publication Critical patent/CN112819737A/en
Application granted granted Critical
Publication of CN112819737B publication Critical patent/CN112819737B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4023Scaling of whole images or parts thereof, e.g. expanding or contracting based on decimating pixels or lines of pixels; based on inserting pixels or lines of pixels
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • G06T2207/10036Multispectral image; Hyperspectral image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a remote sensing image fusion method of a 3D convolution-based multi-scale attention depth convolution network, which fuses the high spectral resolution of a multispectral image with the high spatial resolution of a panchromatic image to obtain a multispectral image with both high spatial and high spectral resolution. A 3D multi-scale attention deep convolution network model (MSAC-Net) is designed on the U-Net network architecture from deep learning. To preserve the spectral resolution of the multispectral image, the model uses 3D convolution throughout to extract features along the spectral dimension; to capture more spatial detail, an attention mechanism is introduced at the model's skip connections to learn region details. In the decoding stage of the model, several reconstruction layers containing multi-scale spatial information are introduced to compute reconstruction results, which encourages the model to learn multi-scale representations at different layers and provides multi-level references for the final fusion result. The fusion quality of the final image is effectively improved.

Description

Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution
Technical Field
The invention belongs to the technical field of information, relates to an image processing technology, and particularly relates to a remote sensing image fusion method of a multi-scale attention depth convolution network based on 3D convolution.
Background
A remote sensing satellite can acquire a panchromatic (PAN) image of the same scene while capturing a multispectral (MS) image. The multispectral image is rich in spectral information but has low spatial resolution and poor definition, whereas the panchromatic image has high spatial resolution but low spectral resolution; the spatial and spectral resolutions of the two are in conflict. Fusing the advantages of the two to obtain a multispectral image with both high spatial and high spectral resolution is therefore a pressing need.
At present, deep learning is widely applied across research fields and offers new solutions for many of them. Within deep learning, 3D convolution has proven to be a very effective way of exploring volumetric data. Compared with 2D convolution, 3D convolution not only retains features in the spatial dimensions but also extracts features along the spectral dimension. This operation is more consistent with the imaging principle of spectral images, so 3D convolution opens a new way around the limitations of traditional 2D convolution. However, because the data required for applying 3D convolution is limited, it is not yet widely used in current multispectral pansharpening.
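To make the distinction concrete, the minimal sketch below (an illustration, not code from the patent; the tensor sizes and layer widths are arbitrary) contrasts a 3D convolution, which slides over height, width and the band dimension of a spectral cube, with a 2D convolution that folds the bands into input channels.

```python
# Sketch: 3D convolution keeps the band dimension of a spectral cube, 2D convolution flattens it.
import torch
import torch.nn as nn

ms_cube = torch.randn(1, 1, 4, 64, 64)         # (batch, channel=1, bands=4, H, W); sizes are illustrative

conv3d = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
feat3d = conv3d(ms_cube)                        # (1, 8, 4, 64, 64): band dimension preserved

conv2d = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, padding=1)
feat2d = conv2d(ms_cube.squeeze(1))             # (1, 8, 64, 64): bands merged into channels
print(feat3d.shape, feat2d.shape)
```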
To make full use of the intrinsic relationships among the pixels within a single waveband, conventional methods generally fuse observations at different scales jointly, so that the final fused image carries image features at those different scales. This approach has a drawback, however: because of the special cube structure of the multispectral image, using scale information, while enhancing the spatial detail features of the image, may cause loss of information in the spectral dimension and even spectral distortion.
Furthermore, the attention mechanism, inspired by the human perception system, has been proposed in recent years. Because it can allocate computing resources to the regions of interest, it is widely used in the field of image processing. Unfortunately, many of the attention mechanisms proposed so far cannot be applied directly to multispectral pansharpening; improper use of an attention mechanism blurs or distorts the final result, so that information in the spatial and spectral dimensions is lost and the geometric feature representation of the image is incomplete.
Disclosure of Invention
To make full use of the correlations among the pixels and bands of a multispectral image and of the high spatial resolution of a panchromatic image, to reduce the workload of image processing, and to improve the accuracy of image fusion, the invention aims to provide a remote sensing image fusion method based on a deep-learning 3D multi-scale attention depth convolution network (MSAC-Net). The method adopts 3D convolution; while the deep learning model preserves the spectral details of the multispectral image, an attention mechanism extracts spatial details from the panchromatic image, and the final result is learned together with several intermediate-scale results to obtain the required high-resolution multispectral image, thereby solving the problems of incomplete remote sensing image fusion, poor fusion quality and poor fusion effect in the prior art.
To accomplish this task, the invention adopts the following technical scheme, which comprises the following steps:
a remote sensing image fusion method of a multi-scale attention depth convolution network based on 3D convolution is characterized by comprising the following steps:
Step one: acquiring a pair consisting of a panchromatic image and a multispectral image of the same scene at the same angle as a sample in a test data set; acquiring multiple pairs of panchromatic and multispectral images of multiple scenes to obtain a training data set;
for a sample in the test data set, upsampling the multispectral image in the sample to the same size as the panchromatic image; then performing cascade copying on the panchromatic image to obtain a panchromatic image cube with the same number of bands as the multispectral image;
for the training data set, downsampling all panchromatic images in it to the same size as the multispectral images in the training data set; then performing the copy-cascade operation on the downsampled panchromatic images to obtain panchromatic image cubes with the same number of bands as the multispectral images; the copy-cascade operation means that the panchromatic image is first copied as many times as the number of bands of the multispectral image, and all copies are then concatenated along the band dimension to obtain a panchromatic image cube;
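A minimal sketch of this step-one preprocessing in PyTorch is given below, assuming an illustrative band count c = 4 and scale factor 4; the bicubic interpolation mode, the tensor layout (a 5-D tensor with the MS and PAN cubes stacked as two channels), and the helper name make_inputs are assumptions for illustration, not the patent's exact implementation.

```python
# Sketch: upsample the multispectral image, replicate the panchromatic image along the band
# dimension (copy-cascade), and stack the two cubes as the network input.
import torch
import torch.nn.functional as F

def make_inputs(ms, pan, scale=4):
    """ms: (B, c, h, w) multispectral; pan: (B, 1, H, W) panchromatic with H = h * scale."""
    c = ms.shape[1]
    # upsample the multispectral image to the panchromatic size (bicubic)
    ms_up = F.interpolate(ms, scale_factor=scale, mode='bicubic', align_corners=False)
    # copy the panchromatic image c times and concatenate along the band dimension
    pan_cube = pan.repeat(1, c, 1, 1)                     # (B, c, H, W)
    # stack the MS and PAN cubes as two "channels" of a 5-D tensor
    return torch.stack([ms_up, pan_cube], dim=1)          # (B, 2, c, H, W)

ms = torch.randn(1, 4, 64, 64)
pan = torch.randn(1, 1, 256, 256)
print(make_inputs(ms, pan).shape)   # torch.Size([1, 2, 4, 256, 256])
```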
Step two: inputting samples of the training data set into the MASC-Net model to obtain the fusion result Ŷ of the multispectral image and the panchromatic image cube, and reconstructing and outputting the intermediate images of the model with the reconstruction blocks in the model:

Ŷ_i = R(F^i)

wherein F^i is the feature map obtained by the model at the i-th layer scale, R(·) is the reconstruction block of the corresponding scale of the model, Ŷ_i is the intermediate-scale image obtained from the i-th layer reconstruction block, and Ŷ is the final image reconstructed by the model;
Step three: training the MASC-Net model of step two with the training data set using a stochastic gradient descent algorithm until convergence, so as to obtain a fusion model;
downsampling the reference image by bicubic interpolation to obtain an image Y_i of the size corresponding to each intermediate image:

Y_i = D(Y_{i-1}), i = 2, 3, …, k

wherein D(·) is a downsampling operation, Y_i is the simulated reference image at the i-th scale, Y_{i-1} is the simulated reference image at the (i−1)-th scale, and Y_1 is the reference image.
During training of the network with the stochastic gradient descent algorithm, the loss function is continuously optimized until convergence; the loss function of the model is:

Loss = l_1 + λ Σ_{i=2}^{I} l_i

wherein λ is the weight of the scale information, l_1 is the loss at the first scale, l_i is the loss at the i-th scale, and I is the number of scales.
Step four: and aiming at the full-color image and the multispectral image to be fused of a certain scene, obtaining a final fused image by utilizing the fused model obtained after training in the third step according to the up-sampling in the first step.
Further, the upsampling and downsampling processes and the copy concatenation of step one comprise:
Step 2.1, in the training data set, downsampling the original multispectral image and panchromatic image at an interval of p using an interpolation method; then upsampling the downsampled multispectral image by a factor of p with bicubic interpolation to obtain a low-resolution multispectral image of the same size as the downsampled panchromatic image.
Step 2.2, copying the panchromatic image downsampled in step 2.1 to obtain an image set whose number equals the number of spectral bands, and then concatenating in the spectral dimension to obtain an h × w × c data cube, that is:

P′(k) = P_HR, k = 1, 2, …, c

wherein c is the number of bands of the multispectral image, k denotes the k-th of the c bands, P_HR is the original full-color image, and P′(k) is the k-th band of the panchromatic image cube P′;
Step 2.3, for the panchromatic image in the test data set, performing the same downsampling operation and copy-cascade operation as in step 2.1 and step 2.2.
Further, the reconstruction block of step two includes:
Step 3.1, the low-level semantic features F_l^i of the i-th layer and the high-level features F_h^i of the corresponding layer are obtained through the MASC-Net model, and the grid attention A^i of the current layer is obtained by convolution; the formula is:

A^i = σ_1(W_l · F_l^i + W_h · F_h^i + b)

wherein W_l and W_h are weights, b is the offset, and σ_1 is the ReLU activation function;
Step 3.2, A^i is multiplied with F_l^i to obtain the corresponding high-level semantic information F̂^i; the formula is:

F̂^i = σ_2(A^i) ⊗ F_l^i

wherein σ_2 is the sigmoid activation function, ⊗ denotes feature-map-wise multiplication, and F̂^i is the high-level semantic information of the current layer;
Step 3.3, F̂^i and F_h^i are concatenated, and features are then extracted by convolution to obtain the high-level features F^i; the concatenation formula is:

F_cat^i = cat(F̂^i, F_h^i)

wherein cat(·) is the concatenation operation; the feature-extraction formula is:

F^i = MASC_i(F_cat^i)

wherein MASC_i(·) represents a convolution operation;
Step 3.4, the multispectral image at the i-th scale is reconstructed through the independent reconstruction block of the corresponding layer; the reconstruction formula is:

Ŷ_i = conv(F^i)

wherein conv(·) represents the convolution calculation.
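A minimal PyTorch sketch of a grid-attention gate and a per-scale reconstruction block in the spirit of steps 3.1–3.4 follows; the module names GridAttention3D and ReconstructionBlock3D, the kernel sizes and the channel widths are illustrative assumptions, and the attention formulation follows the generic attention-gate pattern rather than the patent's exact layer configuration.

```python
# Sketch: 3D grid attention over low-level (skip) and high-level (decoder) features,
# plus a per-scale reconstruction head mapping fused features back to a spectral cube.
import torch
import torch.nn as nn

class GridAttention3D(nn.Module):
    def __init__(self, ch_low, ch_high, ch_mid):
        super().__init__()
        self.w_l = nn.Conv3d(ch_low, ch_mid, kernel_size=1)    # weight applied to low-level features F_l
        self.w_h = nn.Conv3d(ch_high, ch_mid, kernel_size=1)   # weight applied to high-level features F_h
        self.psi = nn.Conv3d(ch_mid, 1, kernel_size=1)         # collapse to a single attention map A
        self.relu = nn.ReLU(inplace=True)                      # sigma_1
        self.sigmoid = nn.Sigmoid()                            # sigma_2

    def forward(self, f_low, f_high):
        # assumes f_low and f_high have already been brought to the same spatial size
        a = self.psi(self.relu(self.w_l(f_low) + self.w_h(f_high)))   # grid attention A^i
        return f_low * self.sigmoid(a)                                 # attended features F_hat^i

class ReconstructionBlock3D(nn.Module):
    """Per-scale head mapping fused features F^i back to a 1-channel multispectral cube."""
    def __init__(self, ch_in):
        super().__init__()
        self.conv = nn.Conv3d(ch_in, 1, kernel_size=3, padding=1)

    def forward(self, feat):
        return self.conv(feat)     # Y_hat_i = conv(F^i)
```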
Further, the loss function of the model in step three includes:
Step 4.1, a reference image Y_1 is obtained, and bicubic interpolation is used to reduce it to the size of the reconstructed image Ŷ_i at each scale; the formula is:

Y_i = D(Y_{i-1})

wherein D(·) is downsampling by bicubic interpolation, Y_i is the simulated reference image at the i-th scale, and Y_1 is the reference image;
Step 4.2, the model constructs the multi-scale loss from the intermediate reconstructed images and the downsampled reference images; the formula is:

l_i = ||Ŷ_i − Y_i||_1

wherein l_i is the l_1 loss at the i-th scale, Ŷ_i is the reconstructed image at the i-th scale, and Y_i is the reference image at the i-th scale;
Finally, according to steps 4.1 and 4.2, the loss function of the model is constructed:

Loss = l_1 + λ Σ_{i=2}^{I} l_i

wherein λ is the weight of the multi-scale loss, l_1 is the loss at the first scale, l_i is the loss at the i-th scale, and I is the number of scales;
Step 4.3, according to the loss function constructed in steps 4.1 and 4.2, the loss is continuously optimized until convergence while the network is trained with the stochastic gradient descent algorithm.
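A short sketch of this multi-scale l_1 loss is given below, assuming the reconstructions and the downsampled references are supplied as parallel lists from the finest scale onward; the weight value and the function name are illustrative assumptions.

```python
# Sketch: loss = l_1 + lambda * sum_{i>=2} l_i, with l_i the per-scale L1 loss.
import torch
import torch.nn.functional as F

def multiscale_loss(y_hats, refs, lam=0.1):
    """y_hats / refs: lists [scale 1 (finest), scale 2, ...]; lam is an illustrative weight."""
    l1 = F.l1_loss(y_hats[0], refs[0])
    aux = sum(F.l1_loss(p, r) for p, r in zip(y_hats[1:], refs[1:]))
    return l1 + lam * aux
```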
Compared with the prior art, the remote sensing image fusion method of the 3D convolution-based multi-scale attention depth convolution network has the following advantages:
1. Unlike other deep-learning methods, the method is based on 3D convolution: the spectral information in the multispectral image is preserved and spectral distortion is reduced by using 3D convolution.
2. The multi-scale information idea is adopted: the intermediate image at each scale is reconstructed and output with a multi-scale reconstruction block, and the final image is constrained with these intermediate images, so that the final image fuses the scale information of every scale and the spatial details at every scale of the multispectral image are effectively preserved.
3. A grid attention mechanism is adopted: regional features are extracted from the low-level semantic information in the model, and the regional detail features in the low-level semantic information are attended to and fused with the high-level semantic information, which effectively improves the spatial detail information of the fused image and the ability of the fusion model to preserve spatial information.
4. The relationship between the image features of the panchromatic and multispectral images and the image itself is fully considered during modeling and solving, so that the fusion is more comprehensive, effective and accurate.
Drawings
FIG. 1 is a framework diagram of the remote sensing image fusion method (MSAC-Net) of a deep-learning 3D multi-scale attention depth convolution network.
FIG. 2 shows the results of fusing IKONOS satellite images by different fusion methods in a simulation experiment; fig. 2 (a) is the up-sampled multispectral image, fig. 2 (b) is the panchromatic image, fig. 2 (c) is the reference image, fig. 2 (d) is the SR method fused image, fig. 2 (e) is the GS method fused image, fig. 2 (f) is the Indusion method fused image, fig. 2 (g) is the PNN method fused image, fig. 2 (h) is the PanNet method fused image, and fig. 2 (i) is the MASC-Net method fused image.
FIG. 3 shows the results of fusing Quickbird satellite images by different fusion methods in a simulation experiment; fig. 3 (a) is the up-sampled multispectral image, fig. 3 (b) is the panchromatic image, fig. 3 (c) is the reference image, fig. 3 (d) is the SR method fused image, fig. 3 (e) is the GS method fused image, fig. 3 (f) is the Indusion method fused image, fig. 3 (g) is the PNN method fused image, fig. 3 (h) is the PanNet method fused image, and fig. 3 (i) is the MASC-Net method fused image.
FIG. 4 shows the results of fusing IKONOS satellite images by different fusion methods in an actual experiment; fig. 4 (a) is the up-sampled multispectral image, fig. 4 (b) is the panchromatic image, fig. 4 (c) is the SR method fused image, fig. 4 (d) is the GS method fused image, fig. 4 (e) is the Indusion method fused image, fig. 4 (f) is the PNN method fused image, fig. 4 (g) is the PanNet method fused image, and fig. 4 (h) is the MASC-Net method fused image.
FIG. 5 shows the results of fusing Quickbird satellite images by different fusion methods in an actual experiment; fig. 5 (a) is the up-sampled multispectral image, fig. 5 (b) is the panchromatic image, fig. 5 (c) is the SR method fused image, fig. 5 (d) is the GS method fused image, fig. 5 (e) is the Indusion method fused image, fig. 5 (f) is the PNN method fused image, fig. 5 (g) is the PanNet method fused image, and fig. 5 (h) is the MASC-Net method fused image.
The invention is described in further detail below with reference to the figures and examples.
Detailed Description
The remote sensing image fusion method (MSAC-Net) based on the 3D convolution multi-scale attention depth convolution network, on the one hand, makes full use of the scale information of the image at each scale and learns the final fusion result together with each intermediate-scale image; on the other hand, in connecting the high-level and low-level semantics of the U-Net network, an attention gate mechanism (Attention Gate) is introduced, so that the feature maps learned by the network focus more on the local spatial details of the image and the pansharpening performance is improved; finally, 3D convolution is introduced, and its computational characteristics are used to extract information along the spectral dimension, thereby reducing the spectral distortion of the fused multispectral image.
Referring to fig. 1, the above remote sensing image fusion method (MASC-Net) based on the 3D convolution multi-scale attention depth convolution network specifically includes the following steps:
step 1, acquiring a pair of panchromatic images and multispectral images with the same scene and the same angle as a sample in a test data set; acquiring multiple pairs of panchromatic images and multispectral images of multiple scenes to obtain a training data set;
for a sample in the test data set, performing up-sampling processing on a multispectral image in the sample to reach the same size as a full-color image; then copying and cascading the full-color image to obtain a full-color image cube with the same wave band number as the multispectral image;
for the training data set, downsampling all images in it with a scaling factor p = 4; then upsampling the downsampled multispectral image to obtain the input multispectral image M′; and performing the copy cascade on the downsampled panchromatic image to obtain a panchromatic image cube P′ with the same number of bands as the multispectral image.
The upsampling and downsampling interpolation obtains an image larger or smaller than the original while preserving certain spatial characteristics. Specifically, in this embodiment, bicubic interpolation is used for upsampling the images in the training and test sets, and equidistant interpolation is used for downsampling; bicubic interpolation yields smoother image edges than other interpolation methods. Upsampling also aligns the data input to the model; for downsampling, the interval is taken as 4 and the multispectral image is scaled down accordingly.
The copy cascade simply copies the original image; all copies are then concatenated along the band dimension to form cube data P′ of the same size H × W × c as the original multispectral image. The upsampled multispectral image M′ and P′ are then concatenated in a fourth dimension to obtain the 2 × h × w × c input data X.
Step 2, the training samples X of the training data set are input into MASC-Net to obtain the fusion result Ŷ of the multispectral image and the panchromatic image cube.
The reconstruction-block process is, in particular:
Step 2.1, the low-level semantic features F_l^i of the i-th layer and the high-level features F_h^i of the corresponding layer are obtained through the MASC-Net model, and the grid attention A^i of the current layer is obtained by convolution; the formula is:

A^i = σ_1(W_l · F_l^i + W_h · F_h^i + b)

wherein W_l and W_h are weights, b is the offset, and σ_1 is the ReLU activation function;
Step 2.2, A^i is multiplied with F_l^i to obtain the corresponding high-level semantic information F̂^i; the formula is:

F̂^i = σ_2(A^i) ⊗ F_l^i

wherein σ_2 is the sigmoid activation function, ⊗ denotes feature-map-wise multiplication, and F̂^i is the high-level semantic information of the current layer;
Step 2.3, F̂^i and F_h^i are concatenated, and features are then extracted by convolution to obtain the high-level features F^i; the concatenation formula is:

F_cat^i = cat(F̂^i, F_h^i)

wherein cat(·) is the concatenation operation; the feature-extraction formula is:

F^i = MASC_i(F_cat^i)

wherein MASC_i(·) represents a convolution operation;
Step 2.4, the multispectral image at the i-th scale is reconstructed through the independent reconstruction block of the corresponding layer; the reconstruction formula is:

Ŷ_i = conv(F^i)

wherein conv(·) represents the convolution calculation.
Step 2.5, on the first scale, the feature map F^1 on the first scale is obtained through feature extraction, and the final result Ŷ is then obtained with the final reconstruction block of the model.
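The following is a compact, two-scale PyTorch skeleton of the kind of 3D multi-scale attention U-Net described above, intended only to make the data flow concrete; the class name TinyMSACNet, the channel widths, the pooling/upsampling choices and the simplified attention layer are assumptions and do not reproduce the patented five-level architecture.

```python
# Sketch: encoder-decoder with 3D convolutions, an attention-gated skip connection,
# and reconstruction heads at two scales (final image plus one intermediate image).
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv3d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class TinyMSACNet(nn.Module):
    def __init__(self, ch=8):
        super().__init__()
        self.enc1 = block(2, ch)                          # input: (B, 2, c, H, W), stacked MS' and PAN cube
        self.down = nn.MaxPool3d((1, 2, 2))               # pool space only, keep the band dimension
        self.enc2 = block(ch, ch * 2)
        self.up = nn.Upsample(scale_factor=(1, 2, 2), mode='trilinear', align_corners=False)
        self.att = nn.Sequential(nn.Conv3d(ch * 3, 1, 1), nn.Sigmoid())   # simplified grid attention
        self.dec1 = block(ch * 3, ch)
        self.head1 = nn.Conv3d(ch, 1, 3, padding=1)       # reconstruction at the finest scale (final image)
        self.head2 = nn.Conv3d(ch * 2, 1, 3, padding=1)   # intermediate-scale reconstruction

    def forward(self, x):
        f1 = self.enc1(x)                                 # low-level features F_l^1
        f2 = self.enc2(self.down(f1))                     # deeper features F^2
        y2 = self.head2(f2)                               # intermediate reconstruction Y_hat_2
        g = self.up(f2)                                   # high-level features brought to scale 1
        a = self.att(torch.cat([f1, g], dim=1))           # attention map A^1
        f = self.dec1(torch.cat([f1 * a, g], dim=1))      # fuse attended skip with decoder features
        y1 = self.head1(f)                                # final reconstruction Y_hat_1
        return y1, y2

out1, out2 = TinyMSACNet()(torch.randn(1, 2, 4, 64, 64))
print(out1.shape, out2.shape)   # (1, 1, 4, 64, 64) and (1, 1, 4, 32, 32)
```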
Step 3, the MASC-Net model of step 2 is trained with the training data set using the stochastic gradient descent algorithm, and the fusion result is corrected with the intermediate images obtained from the intermediate reconstruction layers, so as to obtain the fusion model. The specific steps are:
Step 3.1, the reference image Y_1 is obtained, and bicubic interpolation is used to reduce it to the size of the reconstructed image Ŷ_i at each scale; the formula is:

Y_i = D(Y_{i-1})

wherein D(·) is downsampling by bicubic interpolation, Y_i is the simulated reference image at the i-th scale, and Y_1 is the reference image.
Step 3.2, the model constructs the multi-scale loss from the intermediate reconstructed images and the downsampled reference images; the formula is:

l_i = ||Ŷ_i − Y_i||_1

wherein l_i is the l_1 loss at the i-th scale, Ŷ_i is the reconstructed image at the i-th scale, and Y_i is the reference image at the i-th scale.
Finally, according to steps 3.1 and 3.2, the loss function of the model is constructed:

Loss = l_1 + λ Σ_{i=2}^{I} l_i

wherein λ is the weight of the multi-scale loss, l_1 is the loss at the first scale, l_i is the loss at the i-th scale, and I is the number of scales.
Step 3.3, according to the loss function constructed in steps 3.1 and 3.2, the loss is continuously optimized until convergence while the network is trained with the stochastic gradient descent algorithm.
Specifically: in the training data set, m samples are drawn each time to form a mini-batch, where m = 32 is chosen, and stochastic gradient descent is then performed on these samples:

g = (1/m) ∇_θ Σ_{j=1}^{m} L(x^(j), y^(j); θ)

wherein L is the loss function of step 3 and m is the number of samples in the mini-batch; the model is then updated with the gradient descent method:

θ_i ← θ_i − α ∂L/∂θ_i

wherein ∂L/∂θ_i is the partial derivative of the loss function with respect to the parameter θ_i, and α is the learning rate set for the model.
After the above steps, the update can be simplified to:

θ ← θ − α g

wherein g is the gradient of the loss function with respect to the parameters; at each model update, a randomly drawn mini-batch is used to update the parameters.
Embodiment:
In this embodiment, remote sensing images from two satellites are used to verify the effectiveness of the proposed fusion algorithm. The spatial resolutions of the panchromatic and multispectral images captured by the IKONOS satellite are 1 m and 4 m, respectively; the spatial resolutions of the panchromatic and multispectral images provided by the QuickBird satellite are 0.7 m and 2.8 m, respectively. The multispectral images acquired by both satellites contain four bands: red, green, blue and near infrared. The panchromatic image size used in the experiments is 256 × 256 and the multispectral image size is 64 × 64.
In order to better evaluate the practicability of the remote sensing image fusion method (MASC-Net) based on the 3D convolution multi-scale attention depth convolution network of the present embodiment, the present embodiment provides two experiment types, which are a simulated image experiment and an actual image experiment, respectively, wherein the simulated image experiment reduces the spatial resolution of the panchromatic image and the multispectral image by 4 times at the same time, and uses the panchromatic image and the multispectral image as simulated image data to be fused, and uses the original multispectral image as a standard fusion result for reference, and the actual image experiment directly fuses the real images.
The remote sensing image fusion method (MASC-Net) based on the 3D convolution multi-scale attention depth convolution network provided in this embodiment is mainly compared with the following five widely used image fusion methods: the method comprises a sparse representation-based method SR, a component substitution-based method GS, a multi-resolution analysis-based method Indusion, and a deep learning-based method PNN and PanNet.
The network is trained with the PyTorch software package, for approximately 25000 iterations with the batch size set to 32; for the stochastic gradient descent algorithm, the weight decay is set to 10⁻³ and the momentum to 0.9; the MASC-Net network depth is set to 5.
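A minimal training-loop sketch with these settings (batch size 32, momentum 0.9, weight decay 10⁻³) is shown below; the placeholder model, the dummy data and the learning rate of 10⁻² are illustrative assumptions, not values reported in the patent.

```python
# Sketch: SGD training configuration matching the reported hyperparameters, on dummy data.
import torch
import torch.nn as nn

model = nn.Conv3d(2, 1, 3, padding=1)     # placeholder standing in for the fusion network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-3)

for step in range(25000):
    x = torch.randn(32, 2, 4, 64, 64)     # a mini-batch of 32 stacked MS/PAN cubes (dummy data)
    y = torch.randn(32, 1, 4, 64, 64)     # reference images (dummy data)
    loss = nn.functional.l1_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    break   # single illustrative iteration; remove to run the full 25 000 steps
```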
Analyzing the effect of the simulated image experiment:
FIG. 2 shows the IKONOS satellite simulation experiment results; FIGS. 2 (a) and (b) are the up-sampled multispectral image and the panchromatic image, respectively, FIG. 2 (c) is the reference image, FIGS. 2 (d)–(h) are the fused images of the five comparison methods, and FIG. 2 (i) is the fused image of the MASC-Net method (i.e., the present invention). Comparing the fused images with the reference image shows that all methods improve the spatial resolution of the original multispectral image, but SR shows a clear visual color deviation, the SR and PNN methods suffer severe spectral distortion, and the vegetation areas synthesized by Indusion and PanNet have overly sharp edges. As can be seen from FIG. 2 (i), the spatial resolution of the multispectral image is improved, the spectral information of the source image is better retained, and the resulting fused image is better and more natural.
FIG. 3 shows the Quickbird satellite simulation experiment results; FIGS. 3 (a) and (b) are the up-sampled multispectral image and the panchromatic image, respectively, FIG. 3 (c) is the reference image, FIGS. 3 (d)–(h) are the fused images of the five comparison methods, and FIG. 3 (i) is the fused image of the MASC-Net method (i.e., the present invention). As seen in FIGS. 3 (d) and (g), the colors of the SR and PNN fused images change greatly and show a significant spectral difference from the reference image, and FIGS. 3 (e), (f) and (h) show that the GS, Indusion and PanNet fused images differ considerably from the reference image in the bare-land area at the lower right of the image; the result of this embodiment, by contrast, differs little from the reference image in either spectral or spatial resolution.
Visual evaluation comparison can provide more visual understanding for the fusion result, but the most correct judgment is difficult to be given to the fusion result by purely depending on subjective evaluation, so that the fusion result needs to be evaluated together with objective indexes; in the embodiment, six objective evaluation indexes of CC, PSNR, Q4, SAM, SSIM and ERGAS are adopted to comprehensively evaluate the image; wherein CC represents a correlation coefficient, and the similarity degree of spectral and spatial information between the wave band images of the two images is evaluated from the aspect of statistical correlation; PSNR (peak signal-to-noise ratio) is an objective standard for evaluating images; q4 is an objective index for comprehensively evaluating the spatial quality and the spectral quality of the fused image, and the optimal value is 1; SAM represents global spectral distortion measurement, reflecting the color difference between the two images, with an optimal value of 0; the SSIM realizes the measurement of the similarity of the reference image and the structure of each wave band image in the fusion result through the comparison of the brightness, the contrast and the structure; ERGAS represents a global index of fusion image quality, and the optimal value is 0.
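As a concrete illustration of two of these indices, the sketch below computes SAM (in degrees) and ERGAS for a fused image against a reference, following their common definitions in the pansharpening literature with an assumed scale ratio of 4; it is not an implementation taken from the patent.

```python
# Sketch: SAM (mean spectral angle, degrees) and ERGAS (relative global synthesis error).
import numpy as np

def sam(ref, fused, eps=1e-8):
    """ref, fused: (H, W, B) arrays; returns the mean spectral angle in degrees (0 is ideal)."""
    dot = np.sum(ref * fused, axis=2)
    denom = np.linalg.norm(ref, axis=2) * np.linalg.norm(fused, axis=2) + eps
    return np.degrees(np.mean(np.arccos(np.clip(dot / denom, -1.0, 1.0))))

def ergas(ref, fused, scale=4):
    """ERGAS with resolution ratio 'scale'; lower is better, 0 is ideal."""
    rmse = np.sqrt(np.mean((ref - fused) ** 2, axis=(0, 1)))
    mean_ref = np.mean(ref, axis=(0, 1))
    return 100.0 / scale * np.sqrt(np.mean((rmse / mean_ref) ** 2))
```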
Tables 1 and 2 are objective indexes of results of different fusion methods in IKONOS and Quickbird satellite simulation image experiments respectively; as can be seen from tables 1 and 2, most objective indexes of the embodiment are superior to those of other methods, wherein the CC and Q4 values of the embodiment are much higher than those of other methods, that is, the method provided by the embodiment has the strongest correlation between the fused image and the reference image, and can well improve the spatial resolution of the multispectral image and maintain the spectral characteristics.
By integrating visual evaluation and objective index evaluation, the remote sensing image fusion method based on the 3D convolution multi-scale attention depth convolution network can well obtain a fusion image with high space and high spectral resolution.
Table 1: objective index of IKONOS satellite image simulation experiment fusion result
(Table 1 is reproduced as an image in the original publication.)
Table 2: objective index of fusion result of Quickbird satellite image simulation experiment
(Table 2 is reproduced as an image in the original publication.)
Analyzing the experimental effect of the actual image:
FIG. 4 is a diagram of IKONOS satellite practical experiment results; fig. 4 (a) and (b) are respectively an up-sampling multispectral image and a panchromatic image, fig. 4 (c) to (g) are respectively fusion images of five comparison methods, and fig. 4 (h) is a fusion image of an MSAC-Net method; it can be seen that fig. 4 (d) has little spectral distortion, fig. 4 (f) appears very blurred, and fig. 4 (e) and (g) have poor edge extraction; in general, the remote sensing image fusion method based on the 3D convolution multi-scale attention depth convolution network has the advantages of high fusion image spatial resolution, small spectral distortion and good overall visual effect.
FIG. 5 is a graph of results of a Quickbird satellite experiment; FIGS. 5 (a), (b) are an up-sampled multispectral image and a panchromatic image, respectively, FIGS. 5 (c) - (g) are fused images of five comparison methods, respectively, and FIG. 5 (h) is a fused image of MASC-Net method; FIG. 5 (c) the fused image is over-sharpened, the colors of the fused images in FIGS. 5 (d) and (f) are obviously changed, and the overall spatial resolution of the fused images in FIGS. 5 (e) and (g) is not high; as shown in fig. 5 (h), the remote sensing image fusion method based on the 3D convolution multi-scale attention depth convolution network of the present embodiment is adopted, and the obtained fused image has a clearer outline compared with other methods.
In the actual image experiments, since there is no reference image, the no-reference objective evaluation index QNR is adopted to evaluate the image fusion quality effectively and objectively; QNR measures the brightness, contrast and local correlation between the fused image and the original images, and comprises a spatial information loss index D_s and a spectral information loss index D_λ, where the optimal value of QNR is 1 and the optimal value of D_s and D_λ is 0.
Tables 3 and 4 are objective indexes of results of different fusion methods in IKONOS and Quickbird satellite actual image experiments respectively; as can be seen from tables 3 and 4, by using the remote sensing image fusion method based on the 3D convolution multi-scale attention depth convolution network of the present embodiment, the loss of spatial detail information generated in the fusion process is minimal, and although the spectral loss is slightly higher, the objective indicator QNR without reference for evaluation is optimal compared with all other methods.
In summary, the remote sensing image fusion method based on the 3D convolution multi-scale attention depth convolution network according to the embodiment greatly improves the spatial resolution of the fusion image while well retaining the spectral information of the multi-spectral image.
Table 3: objective index of IKONOS satellite image actual experiment fusion result
(Table 3 is reproduced as an image in the original publication.)
Table 4: objective index of practical experiment fusion result of Quickbird satellite image
(Table 4 is reproduced as an image in the original publication.)

Claims (2)

1. A remote sensing image fusion method of a multi-scale attention depth convolution network based on 3D convolution is characterized by comprising the following steps:
the method comprises the following steps: acquiring a pair of panchromatic images and multispectral images with the same scene and the same angle as a sample in a test data set; acquiring multiple pairs of panchromatic images and multispectral images of multiple scenes to obtain a training data set;
for a sample in the test data set, performing up-sampling processing on a multispectral image in the sample to reach the same size as a full-color image; then, carrying out cascade copying on the full-color image to obtain a full-color image cube with the same wave band number as the multispectral image;
for the training data set, down-sampling all panchromatic images in the training data set to reach the same size as the multispectral images in the training data set; then copying and cascading operation is carried out on the full-color image obtained after down sampling so as to obtain a full-color image cube with the same wave band number as the multispectral image; the copying cascade operation means that firstly copying the full-color image, the quantity of which is the number of wave bands of the multispectral image, and then cascading all the copied images on the wave band dimension to obtain a full-color image cube;
inputting the samples of the training data set into the 3D multi-scale attention depth convolution network model to obtain the fusion result Ŷ of the multispectral image and the panchromatic image cube, and reconstructing and outputting the intermediate images of the model with the reconstruction blocks in the model:

Ŷ_i = R(F^i)

wherein F^i is the feature map obtained by the model at the i-th scale, R(·) is the reconstruction block of the corresponding scale of the model, Ŷ_i is the intermediate-scale image obtained from the i-th layer reconstruction block, and Ŷ is the final image reconstructed by the model;
the reconstruction block comprises:
step 3.1, obtaining the low-level semantic features F_l^i of the i-th layer and the high-level features F_h^i of the corresponding layer through the 3D multi-scale attention depth convolution network model, and obtaining the grid attention A^i of the current layer by convolution, the formula being:

A^i = σ_1(W_l · F_l^i + W_h · F_h^i + b)

wherein W_l and W_h are weights, b is the offset, and σ_1 is the ReLU activation function;
step 3.2, multiplying A^i with F_l^i to obtain the corresponding high-level semantic information F̂^i, the formula being:

F̂^i = σ_2(A^i) ⊗ F_l^i

wherein σ_2 is the sigmoid activation function, ⊗ denotes feature-map-wise multiplication, and F̂^i is the high-level semantic information of the current layer;
step 3.3, concatenating F̂^i and F_h^i and then extracting features by convolution to obtain the high-level features F^i, the concatenation formula being:

F_cat^i = cat(F̂^i, F_h^i)

wherein cat(·) is the concatenation operation, and the feature-extraction formula being:

F^i = MASC_i(F_cat^i)

wherein MASC_i(·) represents a convolution operation;
step 3.4, reconstructing the multispectral image at the i-th scale through the independent reconstruction block of the corresponding layer, the reconstruction formula being:

Ŷ_i = conv(F^i)

wherein conv(·) represents the convolution calculation;
step three: training the 3D multi-scale attention depth convolution network model of step two with the training data set using a stochastic gradient descent algorithm until convergence, so as to obtain a fusion model;
downsampling the reference image by bicubic interpolation to obtain an image Y_j of the size corresponding to each intermediate image:

Y_j = D(Y_{j-1}), j = 2, 3, …, k

wherein D(·) is a downsampling operation, Y_j is the simulated reference image at the j-th scale, Y_{j-1} is the simulated reference image at the (j−1)-th scale, and Y_1 is the reference image;
continuously optimizing the loss function until convergence while the network is trained with the stochastic gradient descent algorithm, the loss function of the model being:

Loss = l_1 + λ Σ_{j=2}^{I} l_j

wherein λ is the weight of the scale information, l_1 is the loss at the first scale, l_j is the loss at the j-th scale, and I is the number of scales;
step four: for the panchromatic image and the multispectral image of a scene to be fused, after the upsampling of step one, obtaining the final fused image with the fusion model obtained after training in step three.
2. The method of claim 1, wherein the upsampling and downsampling processes and the copy concatenation of step one comprise the following steps:
step 2.1, in the training data set, downsampling the original multispectral image and panchromatic image at an interval of 4 using an interpolation method; then upsampling the downsampled multispectral image by a factor of 4 with bicubic interpolation to obtain a low-resolution multispectral image of the same size as the downsampled panchromatic image;
step 2.2, copying the panchromatic image downsampled in step 2.1 to obtain an image set whose number equals the number of spectral bands, and then concatenating in the spectral dimension to obtain a data cube of h × w × c, that is:

P′(k) = P_HR, k = 1, 2, …, c

wherein c is the number of bands of the multispectral image, k represents the k-th of the c bands, P_HR is the original full-color image, and P′(k) is the k-th band of the panchromatic image cube P′;
step 2.3, for the test data set, the full-color image is subjected to the same downsampling operation and copy cascade operation as in step 2.1 and step 2.2.
CN202110042742.0A 2021-01-13 2021-01-13 Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution Active CN112819737B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110042742.0A CN112819737B (en) 2021-01-13 2021-01-13 Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110042742.0A CN112819737B (en) 2021-01-13 2021-01-13 Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution

Publications (2)

Publication Number Publication Date
CN112819737A CN112819737A (en) 2021-05-18
CN112819737B true CN112819737B (en) 2023-04-07

Family

ID=75869142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110042742.0A Active CN112819737B (en) 2021-01-13 2021-01-13 Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution

Country Status (1)

Country Link
CN (1) CN112819737B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191325B (en) * 2021-05-24 2023-12-12 中国科学院深圳先进技术研究院 Image fusion method, system and application thereof
CN113421216B (en) * 2021-08-24 2021-11-12 湖南大学 Hyperspectral fusion calculation imaging method and system
CN113763299B (en) * 2021-08-26 2022-10-14 中国人民解放军军事科学院国防工程研究院工程防护研究所 Panchromatic and multispectral image fusion method and device and application thereof
CN113628152B (en) * 2021-09-15 2023-11-17 南京天巡遥感技术研究院有限公司 Dim light image enhancement method based on multi-scale feature selective fusion
CN113962913B (en) * 2021-09-26 2023-09-15 西北大学 Construction method of deep mutual learning framework integrating spectral space information
CN114511470B (en) * 2022-04-06 2022-07-08 中国科学院深圳先进技术研究院 Attention mechanism-based double-branch panchromatic sharpening method
CN115018748A (en) * 2022-06-06 2022-09-06 西北工业大学 Aerospace remote sensing image fusion method combining model structure reconstruction and attention mechanism

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886870A (en) * 2018-12-29 2019-06-14 西北大学 Remote sensing image fusion method based on binary channels neural network
CN111292259A (en) * 2020-01-14 2020-06-16 西安交通大学 Deep learning image denoising method integrating multi-scale and attention mechanism
CN111914917A (en) * 2020-07-22 2020-11-10 西安建筑科技大学 Target detection improved algorithm based on feature pyramid network and attention mechanism
WO2020237693A1 (en) * 2019-05-31 2020-12-03 华南理工大学 Multi-source sensing method and system for water surface unmanned equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198463B (en) * 2013-04-07 2014-08-27 北京航空航天大学 Spectrum image panchromatic sharpening method based on fusion of whole structure and space detail information
US11276151B2 (en) * 2019-06-27 2022-03-15 Retrace Labs Inpainting dental images with missing anatomy

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886870A (en) * 2018-12-29 2019-06-14 西北大学 Remote sensing image fusion method based on binary channels neural network
WO2020237693A1 (en) * 2019-05-31 2020-12-03 华南理工大学 Multi-source sensing method and system for water surface unmanned equipment
CN111292259A (en) * 2020-01-14 2020-06-16 西安交通大学 Deep learning image denoising method integrating multi-scale and attention mechanism
CN111914917A (en) * 2020-07-22 2020-11-10 西安建筑科技大学 Target detection improved algorithm based on feature pyramid network and attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Attention U-Net: Learning Where to Look for the Pancreas; Ozan Oktay et al.; arXiv; 2018-05-20; pp. 1-10 *
Remote sensing image fusion based on convolutional neural network super-resolution reconstruction; Xue Yang et al.; Journal of Guangxi Normal University (Natural Science Edition); 2018-04-15 (No. 02); pp. 37-45 *
Fine-grained image classification algorithm based on multi-scale feature fusion and repeated attention mechanism; He Kai et al.; Journal of Tianjin University (Science and Technology); 2020-09-02 (No. 10); pp. 91-99 *

Also Published As

Publication number Publication date
CN112819737A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN112819737B (en) Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution
CN109886870B (en) Remote sensing image fusion method based on dual-channel neural network
CN110533620B (en) Hyperspectral and full-color image fusion method based on AAE extraction spatial features
CN111127374B (en) Pan-sharing method based on multi-scale dense network
CN110119780B (en) Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network
Xie et al. Hyperspectral image super-resolution using deep feature matrix factorization
Deng et al. Machine learning in pansharpening: A benchmark, from shallow to deep networks
CN110363215B (en) Method for converting SAR image into optical image based on generating type countermeasure network
Li et al. Hyperspectral image super-resolution using deep convolutional neural network
Dong et al. RRSGAN: Reference-based super-resolution for remote sensing image
CN108830796B (en) Hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss
Hu et al. Pan-sharpening via multiscale dynamic convolutional neural network
CN111080567A (en) Remote sensing image fusion method and system based on multi-scale dynamic convolution neural network
CN104657962B (en) The Image Super-resolution Reconstruction method returned based on cascading linear
CN113327218B (en) Hyperspectral and full-color image fusion method based on cascade network
CN106251320A (en) Remote sensing image fusion method based on joint sparse Yu structure dictionary
CN102982520B (en) Robustness face super-resolution processing method based on contour inspection
CN105096286A (en) Method and device for fusing remote sensing image
Xiao et al. A dual-UNet with multistage details injection for hyperspectral image fusion
CN112801904B (en) Hybrid degraded image enhancement method based on convolutional neural network
CN114821261A (en) Image fusion algorithm
CN116309070A (en) Super-resolution reconstruction method and device for hyperspectral remote sensing image and computer equipment
CN113139902A (en) Hyperspectral image super-resolution reconstruction method and device and electronic equipment
Zhang et al. Attention-based tri-UNet for remote sensing image pan-sharpening
CN115512192A (en) Multispectral and hyperspectral image fusion method based on cross-scale octave convolution network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant