CN112819737B - Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution - Google Patents

Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution

Info

Publication number
CN112819737B
Authority
CN
China
Prior art keywords
image
scale
multispectral
model
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110042742.0A
Other languages
Chinese (zh)
Other versions
CN112819737A (en)
Inventor
彭进业
付毅豪
张二磊
王珺
刘璐
俞凯
祝轩
赵万青
何林青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest University
Original Assignee
Northwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest University filed Critical Northwest University
Priority to CN202110042742.0A priority Critical patent/CN112819737B/en
Publication of CN112819737A publication Critical patent/CN112819737A/en
Application granted granted Critical
Publication of CN112819737B publication Critical patent/CN112819737B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4023Scaling of whole images or parts thereof, e.g. expanding or contracting based on decimating pixels or lines of pixels; based on inserting pixels or lines of pixels
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • G06T2207/10036Multispectral image; Hyperspectral image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a remote sensing image fusion method of a 3D convolution-based multi-scale attention depth convolution network, which fuses the high spectral resolution of a multispectral image with the high spatial resolution of a panchromatic image to obtain a multispectral image with both high spatial and high spectral resolution. A 3D multi-scale attention deep convolution network model (MSAC-Net) is designed on the U-Net network architecture from deep learning. To preserve the spectral resolution of the multispectral image, the model uses 3D convolution throughout to extract features along the spectral dimension; to capture more spatial detail, an attention mechanism is introduced at the model's skip connections to learn region details. In the decoding stage of the model, several reconstruction layers containing multi-scale spatial information are introduced to compute reconstruction results, which encourages the model to learn multi-scale representations at different layers and provides multi-level references for the final fusion result. The fusion quality of the final image is effectively improved.

Description

Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution
Technical Field
The invention belongs to the technical field of information, relates to an image processing technology, and particularly relates to a remote sensing image fusion method of a multi-scale attention depth convolution network based on 3D convolution.
Background
A remote sensing satellite can acquire a panchromatic (PAN) image of the same scene while capturing a multispectral (MS) image. The multispectral image is rich in spectral information but has low spatial resolution and poor definition, whereas the panchromatic image has high spatial resolution but low spectral resolution; the spatial and spectral resolutions of the two are in conflict. Fusing the advantages of the two to obtain a multispectral image with both high spatial and high spectral resolution is therefore a pressing need.
At present, deep learning is widely applied across research fields and offers new solutions for many of them. Within deep learning, 3D convolution has proven to be a very effective way of exploring volumetric data. Compared with 2D convolution, 3D convolution not only retains features in the spatial dimensions but also extracts features along the spectral dimension. This operation is more consistent with the imaging principle of spectral images, so 3D convolution opens a new way around the limitations of traditional 2D convolution. However, because the data required for applying 3D convolution is limited, it is not yet widely used in current multispectral pansharpening.
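To make the distinction concrete, the minimal sketch below (an illustration, not code from the patent; the tensor sizes and layer widths are arbitrary) contrasts a 3D convolution, which slides over height, width and the band dimension of a spectral cube, with a 2D convolution that folds the bands into input channels.

```python
# Sketch: 3D convolution keeps the band dimension of a spectral cube, 2D convolution flattens it.
import torch
import torch.nn as nn

ms_cube = torch.randn(1, 1, 4, 64, 64)         # (batch, channel=1, bands=4, H, W); sizes are illustrative

conv3d = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
feat3d = conv3d(ms_cube)                        # (1, 8, 4, 64, 64): band dimension preserved

conv2d = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, padding=1)
feat2d = conv2d(ms_cube.squeeze(1))             # (1, 8, 64, 64): bands merged into channels
print(feat3d.shape, feat2d.shape)
```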
To make full use of the intrinsic relationships among the pixels within a single waveband, conventional methods generally fuse observations at different scales jointly, so that the final fused image carries image features at those different scales. This approach has a drawback, however: because of the special cube structure of the multispectral image, using scale information, while enhancing the spatial detail features of the image, may cause loss of information in the spectral dimension and even spectral distortion.
Furthermore, the attention mechanism, inspired by the human perception system, has been proposed in recent years. Because it can allocate computing resources to the regions of interest, it is widely used in the field of image processing. Unfortunately, many of the attention mechanisms proposed so far cannot be applied directly to multispectral pansharpening; improper use of an attention mechanism blurs or distorts the final result, so that information in the spatial and spectral dimensions is lost and the geometric feature representation of the image is incomplete.
Disclosure of Invention
To make full use of the correlations among the pixels and bands of a multispectral image and of the high spatial resolution of a panchromatic image, to reduce the workload of image processing, and to improve the accuracy of image fusion, the invention aims to provide a remote sensing image fusion method based on a deep-learning 3D multi-scale attention depth convolution network (MSAC-Net). The method adopts 3D convolution; while the deep learning model preserves the spectral details of the multispectral image, an attention mechanism extracts spatial details from the panchromatic image, and the final result is learned together with several intermediate-scale results to obtain the required high-resolution multispectral image, thereby solving the problems of incomplete remote sensing image fusion, poor fusion quality and poor fusion effect in the prior art.
To accomplish this task, the invention adopts the following technical scheme, which comprises the following steps:
a remote sensing image fusion method of a multi-scale attention depth convolution network based on 3D convolution is characterized by comprising the following steps:
Step one: acquiring a pair consisting of a panchromatic image and a multispectral image of the same scene at the same angle as a sample in a test data set; acquiring multiple pairs of panchromatic and multispectral images of multiple scenes to obtain a training data set;
for a sample in the test data set, upsampling the multispectral image in the sample to the same size as the panchromatic image; then performing cascade copying on the panchromatic image to obtain a panchromatic image cube with the same number of bands as the multispectral image;
for the training data set, downsampling all panchromatic images in it to the same size as the multispectral images in the training data set; then performing the copy-cascade operation on the downsampled panchromatic images to obtain panchromatic image cubes with the same number of bands as the multispectral images; the copy-cascade operation means that the panchromatic image is first copied as many times as the number of bands of the multispectral image, and all copies are then concatenated along the band dimension to obtain a panchromatic image cube;
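A minimal sketch of this step-one preprocessing in PyTorch is given below, assuming an illustrative band count c = 4 and scale factor 4; the bicubic interpolation mode, the tensor layout (a 5-D tensor with the MS and PAN cubes stacked as two channels), and the helper name make_inputs are assumptions for illustration, not the patent's exact implementation.

```python
# Sketch: upsample the multispectral image, replicate the panchromatic image along the band
# dimension (copy-cascade), and stack the two cubes as the network input.
import torch
import torch.nn.functional as F

def make_inputs(ms, pan, scale=4):
    """ms: (B, c, h, w) multispectral; pan: (B, 1, H, W) panchromatic with H = h * scale."""
    c = ms.shape[1]
    # upsample the multispectral image to the panchromatic size (bicubic)
    ms_up = F.interpolate(ms, scale_factor=scale, mode='bicubic', align_corners=False)
    # copy the panchromatic image c times and concatenate along the band dimension
    pan_cube = pan.repeat(1, c, 1, 1)                     # (B, c, H, W)
    # stack the MS and PAN cubes as two "channels" of a 5-D tensor
    return torch.stack([ms_up, pan_cube], dim=1)          # (B, 2, c, H, W)

ms = torch.randn(1, 4, 64, 64)
pan = torch.randn(1, 1, 256, 256)
print(make_inputs(ms, pan).shape)   # torch.Size([1, 2, 4, 256, 256])
```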
Step two: inputting samples of the training data set into the MASC-Net model to obtain the fusion result Ŷ of the multispectral image and the panchromatic image cube, and reconstructing and outputting the intermediate images of the model with the reconstruction blocks in the model:

Ŷ_i = R(F^i)

wherein F^i is the feature map obtained by the model at the i-th layer scale, R(·) is the reconstruction block of the corresponding scale of the model, Ŷ_i is the intermediate-scale image obtained from the i-th layer reconstruction block, and Ŷ is the final image reconstructed by the model;
Step three: training the MASC-Net model of step two with the training data set using a stochastic gradient descent algorithm until convergence, so as to obtain a fusion model;
downsampling the reference image by bicubic interpolation to obtain an image Y_i of the size corresponding to each intermediate image:

Y_i = D(Y_{i-1}), i = 2, 3, …, k

wherein D(·) is a downsampling operation, Y_i is the simulated reference image at the i-th scale, Y_{i-1} is the simulated reference image at the (i−1)-th scale, and Y_1 is the reference image.
During training of the network with the stochastic gradient descent algorithm, the loss function is continuously optimized until convergence; the loss function of the model is:

Loss = l_1 + λ Σ_{i=2}^{I} l_i

wherein λ is the weight of the scale information, l_1 is the loss at the first scale, l_i is the loss at the i-th scale, and I is the number of scales.
Step four: and aiming at the full-color image and the multispectral image to be fused of a certain scene, obtaining a final fused image by utilizing the fused model obtained after training in the third step according to the up-sampling in the first step.
Further, the upsampling and downsampling processes and the copy concatenation of step one comprise:
Step 2.1, in the training data set, downsampling the original multispectral image and panchromatic image at an interval of p using an interpolation method; then upsampling the downsampled multispectral image by a factor of p with bicubic interpolation to obtain a low-resolution multispectral image of the same size as the downsampled panchromatic image.
Step 2.2, copying the panchromatic image downsampled in step 2.1 to obtain an image set whose number equals the number of spectral bands, and then concatenating in the spectral dimension to obtain an h × w × c data cube, that is:

P′(k) = P_HR, k = 1, 2, …, c

wherein c is the number of bands of the multispectral image, k denotes the k-th of the c bands, P_HR is the original full-color image, and P′(k) is the k-th band of the panchromatic image cube P′;
Step 2.3, for the panchromatic image in the test data set, performing the same downsampling operation and copy-cascade operation as in step 2.1 and step 2.2.
Further, the reconstruction block of step two includes:
Step 3.1, the low-level semantic features F_l^i of the i-th layer and the high-level features F_h^i of the corresponding layer are obtained through the MASC-Net model, and the grid attention A^i of the current layer is obtained by convolution; the formula is:

A^i = σ_1(W_l · F_l^i + W_h · F_h^i + b)

wherein W_l and W_h are weights, b is the offset, and σ_1 is the ReLU activation function;
Step 3.2, A^i is multiplied with F_l^i to obtain the corresponding high-level semantic information F̂^i; the formula is:

F̂^i = σ_2(A^i) ⊗ F_l^i

wherein σ_2 is the sigmoid activation function, ⊗ denotes feature-map-wise multiplication, and F̂^i is the high-level semantic information of the current layer;
Step 3.3, F̂^i and F_h^i are concatenated, and features are then extracted by convolution to obtain the high-level features F^i; the concatenation formula is:

F_cat^i = cat(F̂^i, F_h^i)

wherein cat(·) is the concatenation operation; the feature-extraction formula is:

F^i = MASC_i(F_cat^i)

wherein MASC_i(·) represents a convolution operation;
Step 3.4, the multispectral image at the i-th scale is reconstructed through the independent reconstruction block of the corresponding layer; the reconstruction formula is:

Ŷ_i = conv(F^i)

wherein conv(·) represents the convolution calculation.
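A minimal PyTorch sketch of a grid-attention gate and a per-scale reconstruction block in the spirit of steps 3.1–3.4 follows; the module names GridAttention3D and ReconstructionBlock3D, the kernel sizes and the channel widths are illustrative assumptions, and the attention formulation follows the generic attention-gate pattern rather than the patent's exact layer configuration.

```python
# Sketch: 3D grid attention over low-level (skip) and high-level (decoder) features,
# plus a per-scale reconstruction head mapping fused features back to a spectral cube.
import torch
import torch.nn as nn

class GridAttention3D(nn.Module):
    def __init__(self, ch_low, ch_high, ch_mid):
        super().__init__()
        self.w_l = nn.Conv3d(ch_low, ch_mid, kernel_size=1)    # weight applied to low-level features F_l
        self.w_h = nn.Conv3d(ch_high, ch_mid, kernel_size=1)   # weight applied to high-level features F_h
        self.psi = nn.Conv3d(ch_mid, 1, kernel_size=1)         # collapse to a single attention map A
        self.relu = nn.ReLU(inplace=True)                      # sigma_1
        self.sigmoid = nn.Sigmoid()                            # sigma_2

    def forward(self, f_low, f_high):
        # assumes f_low and f_high have already been brought to the same spatial size
        a = self.psi(self.relu(self.w_l(f_low) + self.w_h(f_high)))   # grid attention A^i
        return f_low * self.sigmoid(a)                                 # attended features F_hat^i

class ReconstructionBlock3D(nn.Module):
    """Per-scale head mapping fused features F^i back to a 1-channel multispectral cube."""
    def __init__(self, ch_in):
        super().__init__()
        self.conv = nn.Conv3d(ch_in, 1, kernel_size=3, padding=1)

    def forward(self, feat):
        return self.conv(feat)     # Y_hat_i = conv(F^i)
```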
Further, the loss function of the model in step three includes:
Step 4.1, a reference image Y_1 is obtained, and bicubic interpolation is used to reduce it to the size of the reconstructed image Ŷ_i at each scale; the formula is:

Y_i = D(Y_{i-1})

wherein D(·) is downsampling by bicubic interpolation, Y_i is the simulated reference image at the i-th scale, and Y_1 is the reference image;
Step 4.2, the model constructs the multi-scale loss from the intermediate reconstructed images and the downsampled reference images; the formula is:

l_i = ||Ŷ_i − Y_i||_1

wherein l_i is the l_1 loss at the i-th scale, Ŷ_i is the reconstructed image at the i-th scale, and Y_i is the reference image at the i-th scale;
Finally, according to steps 4.1 and 4.2, the loss function of the model is constructed:

Loss = l_1 + λ Σ_{i=2}^{I} l_i

wherein λ is the weight of the multi-scale loss, l_1 is the loss at the first scale, l_i is the loss at the i-th scale, and I is the number of scales;
Step 4.3, according to the loss function constructed in steps 4.1 and 4.2, the loss is continuously optimized until convergence while the network is trained with the stochastic gradient descent algorithm.
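A short sketch of this multi-scale l_1 loss is given below, assuming the reconstructions and the downsampled references are supplied as parallel lists from the finest scale onward; the weight value and the function name are illustrative assumptions.

```python
# Sketch: loss = l_1 + lambda * sum_{i>=2} l_i, with l_i the per-scale L1 loss.
import torch
import torch.nn.functional as F

def multiscale_loss(y_hats, refs, lam=0.1):
    """y_hats / refs: lists [scale 1 (finest), scale 2, ...]; lam is an illustrative weight."""
    l1 = F.l1_loss(y_hats[0], refs[0])
    aux = sum(F.l1_loss(p, r) for p, r in zip(y_hats[1:], refs[1:]))
    return l1 + lam * aux
```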
Compared with the prior art, the remote sensing image fusion method of the 3D convolution-based multi-scale attention depth convolution network has the following advantages:
1. Unlike other deep-learning methods, the method is based on 3D convolution: the spectral information in the multispectral image is preserved and spectral distortion is reduced by using 3D convolution.
2. The multi-scale information idea is adopted: the intermediate image at each scale is reconstructed and output with a multi-scale reconstruction block, and the final image is constrained with these intermediate images, so that the final image fuses the scale information of every scale and the spatial details at every scale of the multispectral image are effectively preserved.
3. A grid attention mechanism is adopted: regional features are extracted from the low-level semantic information in the model, and the regional detail features in the low-level semantic information are attended to and fused with the high-level semantic information, which effectively improves the spatial detail information of the fused image and the ability of the fusion model to preserve spatial information.
4. The relationship between the image features of the panchromatic and multispectral images and the image itself is fully considered during modeling and solving, so that the fusion is more comprehensive, effective and accurate.
Drawings
FIG. 1 is a framework diagram of the remote sensing image fusion method (MSAC-Net) of a deep-learning 3D multi-scale attention depth convolution network.
FIG. 2 shows the results of fusing IKONOS satellite images by different fusion methods in a simulation experiment; fig. 2 (a) is the up-sampled multispectral image, fig. 2 (b) is the panchromatic image, fig. 2 (c) is the reference image, fig. 2 (d) is the SR method fused image, fig. 2 (e) is the GS method fused image, fig. 2 (f) is the Indusion method fused image, fig. 2 (g) is the PNN method fused image, fig. 2 (h) is the PanNet method fused image, and fig. 2 (i) is the MASC-Net method fused image.
FIG. 3 shows the results of fusing Quickbird satellite images by different fusion methods in a simulation experiment; fig. 3 (a) is the up-sampled multispectral image, fig. 3 (b) is the panchromatic image, fig. 3 (c) is the reference image, fig. 3 (d) is the SR method fused image, fig. 3 (e) is the GS method fused image, fig. 3 (f) is the Indusion method fused image, fig. 3 (g) is the PNN method fused image, fig. 3 (h) is the PanNet method fused image, and fig. 3 (i) is the MASC-Net method fused image.
FIG. 4 shows the results of fusing IKONOS satellite images by different fusion methods in an actual experiment; fig. 4 (a) is the up-sampled multispectral image, fig. 4 (b) is the panchromatic image, fig. 4 (c) is the SR method fused image, fig. 4 (d) is the GS method fused image, fig. 4 (e) is the Indusion method fused image, fig. 4 (f) is the PNN method fused image, fig. 4 (g) is the PanNet method fused image, and fig. 4 (h) is the MASC-Net method fused image.
FIG. 5 shows the results of fusing Quickbird satellite images by different fusion methods in an actual experiment; fig. 5 (a) is the up-sampled multispectral image, fig. 5 (b) is the panchromatic image, fig. 5 (c) is the SR method fused image, fig. 5 (d) is the GS method fused image, fig. 5 (e) is the Indusion method fused image, fig. 5 (f) is the PNN method fused image, fig. 5 (g) is the PanNet method fused image, and fig. 5 (h) is the MASC-Net method fused image.
The invention is described in further detail below with reference to the figures and examples.
Detailed Description
The remote sensing image fusion method (MSAC-Net) based on the 3D convolution multi-scale attention depth convolution network, on the one hand, makes full use of the scale information of the image at each scale and learns the final fusion result together with each intermediate-scale image; on the other hand, in connecting the high-level and low-level semantics of the U-Net network, an attention gate mechanism (Attention Gate) is introduced, so that the feature maps learned by the network focus more on the local spatial details of the image and the pansharpening performance is improved; finally, 3D convolution is introduced, and its computational characteristics are used to extract information along the spectral dimension, thereby reducing the spectral distortion of the fused multispectral image.
Referring to fig. 1, the above remote sensing image fusion method (MASC-Net) based on the 3D convolution multi-scale attention depth convolution network specifically includes the following steps:
step 1, acquiring a pair of panchromatic images and multispectral images with the same scene and the same angle as a sample in a test data set; acquiring multiple pairs of panchromatic images and multispectral images of multiple scenes to obtain a training data set;
for a sample in the test data set, performing up-sampling processing on a multispectral image in the sample to reach the same size as a full-color image; then copying and cascading the full-color image to obtain a full-color image cube with the same wave band number as the multispectral image;
for the training data set, downsampling all images in it with a scaling factor p = 4; then upsampling the downsampled multispectral image to obtain the input multispectral image M′; and performing the copy cascade on the downsampled panchromatic image to obtain a panchromatic image cube P′ with the same number of bands as the multispectral image.
The upsampling and downsampling interpolation obtains an image larger or smaller than the original while preserving certain spatial characteristics. Specifically, in this embodiment, bicubic interpolation is used for upsampling the images in the training and test sets, and equidistant interpolation is used for downsampling; bicubic interpolation yields smoother image edges than other interpolation methods. Upsampling also aligns the data input to the model; for downsampling, the interval is taken as 4 and the multispectral image is scaled down accordingly.
The copy cascade simply copies the original image; all copies are then concatenated along the band dimension to form cube data P′ of the same size H × W × c as the original multispectral image. The upsampled multispectral image M′ and P′ are then concatenated in a fourth dimension to obtain the 2 × h × w × c input data X.
Step 2, the training samples X of the training data set are input into MASC-Net to obtain the fusion result Ŷ of the multispectral image and the panchromatic image cube.
The reconstruction-block process is, in particular:
Step 2.1, the low-level semantic features F_l^i of the i-th layer and the high-level features F_h^i of the corresponding layer are obtained through the MASC-Net model, and the grid attention A^i of the current layer is obtained by convolution; the formula is:

A^i = σ_1(W_l · F_l^i + W_h · F_h^i + b)

wherein W_l and W_h are weights, b is the offset, and σ_1 is the ReLU activation function;
Step 2.2, A^i is multiplied with F_l^i to obtain the corresponding high-level semantic information F̂^i; the formula is:

F̂^i = σ_2(A^i) ⊗ F_l^i

wherein σ_2 is the sigmoid activation function, ⊗ denotes feature-map-wise multiplication, and F̂^i is the high-level semantic information of the current layer;
Step 2.3, F̂^i and F_h^i are concatenated, and features are then extracted by convolution to obtain the high-level features F^i; the concatenation formula is:

F_cat^i = cat(F̂^i, F_h^i)

wherein cat(·) is the concatenation operation; the feature-extraction formula is:

F^i = MASC_i(F_cat^i)

wherein MASC_i(·) represents a convolution operation;
Step 2.4, the multispectral image at the i-th scale is reconstructed through the independent reconstruction block of the corresponding layer; the reconstruction formula is:

Ŷ_i = conv(F^i)

wherein conv(·) represents the convolution calculation.
Step 2.5, on the first scale, the feature map F^1 on the first scale is obtained through feature extraction, and the final result Ŷ is then obtained with the final reconstruction block of the model.
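The following is a compact, two-scale PyTorch skeleton of the kind of 3D multi-scale attention U-Net described above, intended only to make the data flow concrete; the class name TinyMSACNet, the channel widths, the pooling/upsampling choices and the simplified attention layer are assumptions and do not reproduce the patented five-level architecture.

```python
# Sketch: encoder-decoder with 3D convolutions, an attention-gated skip connection,
# and reconstruction heads at two scales (final image plus one intermediate image).
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv3d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class TinyMSACNet(nn.Module):
    def __init__(self, ch=8):
        super().__init__()
        self.enc1 = block(2, ch)                          # input: (B, 2, c, H, W), stacked MS' and PAN cube
        self.down = nn.MaxPool3d((1, 2, 2))               # pool space only, keep the band dimension
        self.enc2 = block(ch, ch * 2)
        self.up = nn.Upsample(scale_factor=(1, 2, 2), mode='trilinear', align_corners=False)
        self.att = nn.Sequential(nn.Conv3d(ch * 3, 1, 1), nn.Sigmoid())   # simplified grid attention
        self.dec1 = block(ch * 3, ch)
        self.head1 = nn.Conv3d(ch, 1, 3, padding=1)       # reconstruction at the finest scale (final image)
        self.head2 = nn.Conv3d(ch * 2, 1, 3, padding=1)   # intermediate-scale reconstruction

    def forward(self, x):
        f1 = self.enc1(x)                                 # low-level features F_l^1
        f2 = self.enc2(self.down(f1))                     # deeper features F^2
        y2 = self.head2(f2)                               # intermediate reconstruction Y_hat_2
        g = self.up(f2)                                   # high-level features brought to scale 1
        a = self.att(torch.cat([f1, g], dim=1))           # attention map A^1
        f = self.dec1(torch.cat([f1 * a, g], dim=1))      # fuse attended skip with decoder features
        y1 = self.head1(f)                                # final reconstruction Y_hat_1
        return y1, y2

out1, out2 = TinyMSACNet()(torch.randn(1, 2, 4, 64, 64))
print(out1.shape, out2.shape)   # (1, 1, 4, 64, 64) and (1, 1, 4, 32, 32)
```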
Step 3, the MASC-Net model of step 2 is trained with the training data set using the stochastic gradient descent algorithm, and the fusion result is corrected with the intermediate images obtained from the intermediate reconstruction layers, so as to obtain the fusion model. The specific steps are:
Step 3.1, the reference image Y_1 is obtained, and bicubic interpolation is used to reduce it to the size of the reconstructed image Ŷ_i at each scale; the formula is:

Y_i = D(Y_{i-1})

wherein D(·) is downsampling by bicubic interpolation, Y_i is the simulated reference image at the i-th scale, and Y_1 is the reference image.
Step 3.2, the model constructs the multi-scale loss from the intermediate reconstructed images and the downsampled reference images; the formula is:

l_i = ||Ŷ_i − Y_i||_1

wherein l_i is the l_1 loss at the i-th scale, Ŷ_i is the reconstructed image at the i-th scale, and Y_i is the reference image at the i-th scale.
Finally, according to steps 3.1 and 3.2, the loss function of the model is constructed:

Loss = l_1 + λ Σ_{i=2}^{I} l_i

wherein λ is the weight of the multi-scale loss, l_1 is the loss at the first scale, l_i is the loss at the i-th scale, and I is the number of scales.
Step 3.3, according to the loss function constructed in steps 3.1 and 3.2, the loss is continuously optimized until convergence while the network is trained with the stochastic gradient descent algorithm.
Specifically: in the training data set, m samples are drawn each time to form a mini-batch, where m = 32 is chosen, and stochastic gradient descent is then performed on these samples:

g = (1/m) ∇_θ Σ_{j=1}^{m} L(x^(j), y^(j); θ)

wherein L is the loss function of step 3 and m is the number of samples in the mini-batch; the model is then updated with the gradient descent method:

θ_i ← θ_i − α ∂L/∂θ_i

wherein ∂L/∂θ_i is the partial derivative of the loss function with respect to the parameter θ_i, and α is the learning rate set for the model.
After the above steps, the update can be simplified to:

θ ← θ − α g

wherein g is the gradient of the loss function with respect to the parameters; at each model update, a randomly drawn mini-batch is used to update the parameters.
Embodiment:
In this embodiment, remote sensing images from two satellites are used to verify the effectiveness of the proposed fusion algorithm. The spatial resolutions of the panchromatic and multispectral images captured by the IKONOS satellite are 1 m and 4 m, respectively; the spatial resolutions of the panchromatic and multispectral images provided by the QuickBird satellite are 0.7 m and 2.8 m, respectively. The multispectral images acquired by both satellites contain four bands: red, green, blue and near infrared. The panchromatic image size used in the experiments is 256 × 256 and the multispectral image size is 64 × 64.
In order to better evaluate the practicability of the remote sensing image fusion method (MASC-Net) based on the 3D convolution multi-scale attention depth convolution network of the present embodiment, the present embodiment provides two experiment types, which are a simulated image experiment and an actual image experiment, respectively, wherein the simulated image experiment reduces the spatial resolution of the panchromatic image and the multispectral image by 4 times at the same time, and uses the panchromatic image and the multispectral image as simulated image data to be fused, and uses the original multispectral image as a standard fusion result for reference, and the actual image experiment directly fuses the real images.
The remote sensing image fusion method (MASC-Net) based on the 3D convolution multi-scale attention depth convolution network provided in this embodiment is mainly compared with the following five widely used image fusion methods: the method comprises a sparse representation-based method SR, a component substitution-based method GS, a multi-resolution analysis-based method Indusion, and a deep learning-based method PNN and PanNet.
The network is trained with the PyTorch software package, for approximately 25000 iterations with the batch size set to 32; for the stochastic gradient descent algorithm, the weight decay is set to 10⁻³ and the momentum to 0.9; the MASC-Net network depth is set to 5.
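A minimal training-loop sketch with these settings (batch size 32, momentum 0.9, weight decay 10⁻³) is shown below; the placeholder model, the dummy data and the learning rate of 10⁻² are illustrative assumptions, not values reported in the patent.

```python
# Sketch: SGD training configuration matching the reported hyperparameters, on dummy data.
import torch
import torch.nn as nn

model = nn.Conv3d(2, 1, 3, padding=1)     # placeholder standing in for the fusion network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-3)

for step in range(25000):
    x = torch.randn(32, 2, 4, 64, 64)     # a mini-batch of 32 stacked MS/PAN cubes (dummy data)
    y = torch.randn(32, 1, 4, 64, 64)     # reference images (dummy data)
    loss = nn.functional.l1_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    break   # single illustrative iteration; remove to run the full 25 000 steps
```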
Analyzing the effect of the simulated image experiment:
FIG. 2 shows the IKONOS satellite simulation experiment results; FIGS. 2 (a) and (b) are the up-sampled multispectral image and the panchromatic image, respectively, FIG. 2 (c) is the reference image, FIGS. 2 (d)–(h) are the fused images of the five comparison methods, and FIG. 2 (i) is the fused image of the MASC-Net method (i.e., the present invention). Comparing the fused images with the reference image shows that all methods improve the spatial resolution of the original multispectral image, but SR shows a clear visual color deviation, the SR and PNN methods suffer severe spectral distortion, and the vegetation areas synthesized by Indusion and PanNet have overly sharp edges. As can be seen from FIG. 2 (i), the spatial resolution of the multispectral image is improved, the spectral information of the source image is better retained, and the resulting fused image is better and more natural.
FIG. 3 shows the Quickbird satellite simulation experiment results; FIGS. 3 (a) and (b) are the up-sampled multispectral image and the panchromatic image, respectively, FIG. 3 (c) is the reference image, FIGS. 3 (d)–(h) are the fused images of the five comparison methods, and FIG. 3 (i) is the fused image of the MASC-Net method (i.e., the present invention). As seen in FIGS. 3 (d) and (g), the colors of the SR and PNN fused images change greatly and show a significant spectral difference from the reference image, and FIGS. 3 (e), (f) and (h) show that the GS, Indusion and PanNet fused images differ considerably from the reference image in the bare-land area at the lower right of the image; the result of this embodiment, by contrast, differs little from the reference image in either spectral or spatial resolution.
Visual evaluation comparison can provide more visual understanding for the fusion result, but the most correct judgment is difficult to be given to the fusion result by purely depending on subjective evaluation, so that the fusion result needs to be evaluated together with objective indexes; in the embodiment, six objective evaluation indexes of CC, PSNR, Q4, SAM, SSIM and ERGAS are adopted to comprehensively evaluate the image; wherein CC represents a correlation coefficient, and the similarity degree of spectral and spatial information between the wave band images of the two images is evaluated from the aspect of statistical correlation; PSNR (peak signal-to-noise ratio) is an objective standard for evaluating images; q4 is an objective index for comprehensively evaluating the spatial quality and the spectral quality of the fused image, and the optimal value is 1; SAM represents global spectral distortion measurement, reflecting the color difference between the two images, with an optimal value of 0; the SSIM realizes the measurement of the similarity of the reference image and the structure of each wave band image in the fusion result through the comparison of the brightness, the contrast and the structure; ERGAS represents a global index of fusion image quality, and the optimal value is 0.
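As a concrete illustration of two of these indices, the sketch below computes SAM (in degrees) and ERGAS for a fused image against a reference, following their common definitions in the pansharpening literature with an assumed scale ratio of 4; it is not an implementation taken from the patent.

```python
# Sketch: SAM (mean spectral angle, degrees) and ERGAS (relative global synthesis error).
import numpy as np

def sam(ref, fused, eps=1e-8):
    """ref, fused: (H, W, B) arrays; returns the mean spectral angle in degrees (0 is ideal)."""
    dot = np.sum(ref * fused, axis=2)
    denom = np.linalg.norm(ref, axis=2) * np.linalg.norm(fused, axis=2) + eps
    return np.degrees(np.mean(np.arccos(np.clip(dot / denom, -1.0, 1.0))))

def ergas(ref, fused, scale=4):
    """ERGAS with resolution ratio 'scale'; lower is better, 0 is ideal."""
    rmse = np.sqrt(np.mean((ref - fused) ** 2, axis=(0, 1)))
    mean_ref = np.mean(ref, axis=(0, 1))
    return 100.0 / scale * np.sqrt(np.mean((rmse / mean_ref) ** 2))
```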
Tables 1 and 2 are objective indexes of results of different fusion methods in IKONOS and Quickbird satellite simulation image experiments respectively; as can be seen from tables 1 and 2, most objective indexes of the embodiment are superior to those of other methods, wherein the CC and Q4 values of the embodiment are much higher than those of other methods, that is, the method provided by the embodiment has the strongest correlation between the fused image and the reference image, and can well improve the spatial resolution of the multispectral image and maintain the spectral characteristics.
By integrating visual evaluation and objective index evaluation, the remote sensing image fusion method based on the 3D convolution multi-scale attention depth convolution network can well obtain a fusion image with high space and high spectral resolution.
Table 1: objective index of IKONOS satellite image simulation experiment fusion result
(Table 1 is reproduced as an image in the original publication.)
Table 2: objective index of fusion result of Quickbird satellite image simulation experiment
(Table 2 is reproduced as an image in the original publication.)
Analyzing the experimental effect of the actual image:
FIG. 4 is a diagram of IKONOS satellite practical experiment results; fig. 4 (a) and (b) are respectively an up-sampling multispectral image and a panchromatic image, fig. 4 (c) to (g) are respectively fusion images of five comparison methods, and fig. 4 (h) is a fusion image of an MSAC-Net method; it can be seen that fig. 4 (d) has little spectral distortion, fig. 4 (f) appears very blurred, and fig. 4 (e) and (g) have poor edge extraction; in general, the remote sensing image fusion method based on the 3D convolution multi-scale attention depth convolution network has the advantages of high fusion image spatial resolution, small spectral distortion and good overall visual effect.
FIG. 5 is a graph of results of a Quickbird satellite experiment; FIGS. 5 (a), (b) are an up-sampled multispectral image and a panchromatic image, respectively, FIGS. 5 (c) - (g) are fused images of five comparison methods, respectively, and FIG. 5 (h) is a fused image of MASC-Net method; FIG. 5 (c) the fused image is over-sharpened, the colors of the fused images in FIGS. 5 (d) and (f) are obviously changed, and the overall spatial resolution of the fused images in FIGS. 5 (e) and (g) is not high; as shown in fig. 5 (h), the remote sensing image fusion method based on the 3D convolution multi-scale attention depth convolution network of the present embodiment is adopted, and the obtained fused image has a clearer outline compared with other methods.
In the actual image experiments, since there is no reference image, the no-reference objective evaluation index QNR is adopted to evaluate the image fusion quality effectively and objectively; QNR measures the brightness, contrast and local correlation between the fused image and the original images, and comprises a spatial information loss index D_s and a spectral information loss index D_λ, where the optimal value of QNR is 1 and the optimal value of D_s and D_λ is 0.
Tables 3 and 4 are objective indexes of results of different fusion methods in IKONOS and Quickbird satellite actual image experiments respectively; as can be seen from tables 3 and 4, by using the remote sensing image fusion method based on the 3D convolution multi-scale attention depth convolution network of the present embodiment, the loss of spatial detail information generated in the fusion process is minimal, and although the spectral loss is slightly higher, the objective indicator QNR without reference for evaluation is optimal compared with all other methods.
In summary, the remote sensing image fusion method based on the 3D convolution multi-scale attention depth convolution network according to the embodiment greatly improves the spatial resolution of the fusion image while well retaining the spectral information of the multi-spectral image.
Table 3: objective index of IKONOS satellite image actual experiment fusion result
(Table 3 is reproduced as an image in the original publication.)
Table 4: objective index of practical experiment fusion result of Quickbird satellite image
(Table 4 is reproduced as an image in the original publication.)

Claims (2)

1. A remote sensing image fusion method of a multi-scale attention depth convolution network based on 3D convolution is characterized by comprising the following steps:
the method comprises the following steps: acquiring a pair of panchromatic images and multispectral images with the same scene and the same angle as a sample in a test data set; acquiring multiple pairs of panchromatic images and multispectral images of multiple scenes to obtain a training data set;
for a sample in the test data set, performing up-sampling processing on a multispectral image in the sample to reach the same size as a full-color image; then, carrying out cascade copying on the full-color image to obtain a full-color image cube with the same wave band number as the multispectral image;
for the training data set, down-sampling all panchromatic images in the training data set to reach the same size as the multispectral images in the training data set; then copying and cascading operation is carried out on the full-color image obtained after down sampling so as to obtain a full-color image cube with the same wave band number as the multispectral image; the copying cascade operation means that firstly copying the full-color image, the quantity of which is the number of wave bands of the multispectral image, and then cascading all the copied images on the wave band dimension to obtain a full-color image cube;
inputting the samples of the training data set into the 3D multi-scale attention depth convolution network model to obtain the fusion result Ŷ of the multispectral image and the panchromatic image cube, and reconstructing and outputting the intermediate images of the model with the reconstruction blocks in the model:

Ŷ_i = R(F^i)

wherein F^i is the feature map obtained by the model at the i-th scale, R(·) is the reconstruction block of the corresponding scale of the model, Ŷ_i is the intermediate-scale image obtained from the i-th layer reconstruction block, and Ŷ is the final image reconstructed by the model;
the reconstruction block comprises:
step 3.1, obtaining the low-level semantic features F_l^i of the i-th layer and the high-level features F_h^i of the corresponding layer through the 3D multi-scale attention depth convolution network model, and obtaining the grid attention A^i of the current layer by convolution, the formula being:

A^i = σ_1(W_l · F_l^i + W_h · F_h^i + b)

wherein W_l and W_h are weights, b is the offset, and σ_1 is the ReLU activation function;
step 3.2, multiplying A^i with F_l^i to obtain the corresponding high-level semantic information F̂^i, the formula being:

F̂^i = σ_2(A^i) ⊗ F_l^i

wherein σ_2 is the sigmoid activation function, ⊗ denotes feature-map-wise multiplication, and F̂^i is the high-level semantic information of the current layer;
step 3.3, concatenating F̂^i and F_h^i and then extracting features by convolution to obtain the high-level features F^i, the concatenation formula being:

F_cat^i = cat(F̂^i, F_h^i)

wherein cat(·) is the concatenation operation, and the feature-extraction formula being:

F^i = MASC_i(F_cat^i)

wherein MASC_i(·) represents a convolution operation;
step 3.4, reconstructing the multispectral image at the i-th scale through the independent reconstruction block of the corresponding layer, the reconstruction formula being:

Ŷ_i = conv(F^i)

wherein conv(·) represents the convolution calculation;
step three: training the 3D multi-scale attention depth convolution network model of step two with the training data set using a stochastic gradient descent algorithm until convergence, so as to obtain a fusion model;
downsampling the reference image by bicubic interpolation to obtain an image Y_j of the size corresponding to each intermediate image:

Y_j = D(Y_{j-1}), j = 2, 3, …, k

wherein D(·) is a downsampling operation, Y_j is the simulated reference image at the j-th scale, Y_{j-1} is the simulated reference image at the (j−1)-th scale, and Y_1 is the reference image;
continuously optimizing the loss function until convergence while the network is trained with the stochastic gradient descent algorithm, the loss function of the model being:

Loss = l_1 + λ Σ_{j=2}^{I} l_j

wherein λ is the weight of the scale information, l_1 is the loss at the first scale, l_j is the loss at the j-th scale, and I is the number of scales;
step four: for the panchromatic image and the multispectral image of a scene to be fused, after the upsampling of step one, obtaining the final fused image with the fusion model obtained after training in step three.
2. The method of claim 1, wherein the upsampling and downsampling processes and the copy concatenation of step one comprise the following steps:
step 2.1, in the training data set, downsampling the original multispectral image and panchromatic image at an interval of 4 using an interpolation method; then upsampling the downsampled multispectral image by a factor of 4 with bicubic interpolation to obtain a low-resolution multispectral image of the same size as the downsampled panchromatic image;
step 2.2, copying the panchromatic image downsampled in step 2.1 to obtain an image set whose number equals the number of spectral bands, and then concatenating in the spectral dimension to obtain a data cube of h × w × c, that is:

P′(k) = P_HR, k = 1, 2, …, c

wherein c is the number of bands of the multispectral image, k represents the k-th of the c bands, P_HR is the original full-color image, and P′(k) is the k-th band of the panchromatic image cube P′;
step 2.3, for the test data set, the full-color image is subjected to the same downsampling operation and copy cascade operation as in step 2.1 and step 2.2.
CN202110042742.0A 2021-01-13 2021-01-13 Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution Active CN112819737B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110042742.0A CN112819737B (en) 2021-01-13 2021-01-13 Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110042742.0A CN112819737B (en) 2021-01-13 2021-01-13 Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution

Publications (2)

Publication Number Publication Date
CN112819737A CN112819737A (en) 2021-05-18
CN112819737B true CN112819737B (en) 2023-04-07

Family

ID=75869142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110042742.0A Active CN112819737B (en) 2021-01-13 2021-01-13 Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution

Country Status (1)

Country Link
CN (1) CN112819737B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191325B (en) * 2021-05-24 2023-12-12 中国科学院深圳先进技术研究院 Image fusion method, system and application thereof
CN113421216B (en) * 2021-08-24 2021-11-12 湖南大学 Hyperspectral fusion calculation imaging method and system
CN113763299B (en) * 2021-08-26 2022-10-14 中国人民解放军军事科学院国防工程研究院工程防护研究所 Panchromatic and multispectral image fusion method and device and application thereof
CN113628152B (en) * 2021-09-15 2023-11-17 南京天巡遥感技术研究院有限公司 Dim light image enhancement method based on multi-scale feature selective fusion
CN113962913B (en) * 2021-09-26 2023-09-15 西北大学 Construction method of deep mutual learning framework integrating spectral space information
CN114511470B (en) * 2022-04-06 2022-07-08 中国科学院深圳先进技术研究院 Attention mechanism-based double-branch panchromatic sharpening method
CN115018748A (en) * 2022-06-06 2022-09-06 西北工业大学 Aerospace remote sensing image fusion method combining model structure reconstruction and attention mechanism

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886870A (en) * 2018-12-29 2019-06-14 西北大学 Remote sensing image fusion method based on binary channels neural network
CN111292259A (en) * 2020-01-14 2020-06-16 西安交通大学 Deep learning image denoising method integrating multi-scale and attention mechanism
CN111914917A (en) * 2020-07-22 2020-11-10 西安建筑科技大学 Target detection improved algorithm based on feature pyramid network and attention mechanism
WO2020237693A1 (en) * 2019-05-31 2020-12-03 华南理工大学 Multi-source sensing method and system for water surface unmanned equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198463B (en) * 2013-04-07 2014-08-27 北京航空航天大学 Spectrum image panchromatic sharpening method based on fusion of whole structure and space detail information
US11276151B2 (en) * 2019-06-27 2022-03-15 Retrace Labs Inpainting dental images with missing anatomy

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886870A (en) * 2018-12-29 2019-06-14 西北大学 Remote sensing image fusion method based on binary channels neural network
WO2020237693A1 (en) * 2019-05-31 2020-12-03 华南理工大学 Multi-source sensing method and system for water surface unmanned equipment
CN111292259A (en) * 2020-01-14 2020-06-16 西安交通大学 Deep learning image denoising method integrating multi-scale and attention mechanism
CN111914917A (en) * 2020-07-22 2020-11-10 西安建筑科技大学 Target detection improved algorithm based on feature pyramid network and attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Attention U-Net: Learning Where to Look for the Pancreas; Ozan Oktay et al.; arXiv; 2018-05-20; pp. 1-10 *
Remote sensing image fusion based on convolutional neural network super-resolution reconstruction; Xue Yang et al.; Journal of Guangxi Normal University (Natural Science Edition); 2018-04-15 (No. 02); pp. 37-45 *
Fine-grained image classification algorithm based on multi-scale feature fusion and repeated attention mechanism; He Kai et al.; Journal of Tianjin University (Science and Technology); 2020-09-02 (No. 10); pp. 91-99 *

Also Published As

Publication number Publication date
CN112819737A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN112819737B (en) Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution
CN109886870B (en) Remote sensing image fusion method based on dual-channel neural network
CN110533620B (en) Hyperspectral and full-color image fusion method based on AAE extraction spatial features
CN111127374B (en) Pan-sharing method based on multi-scale dense network
CN110119780B (en) Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network
Xie et al. Hyperspectral image super-resolution using deep feature matrix factorization
Deng et al. Machine learning in pansharpening: A benchmark, from shallow to deep networks
CN110363215B (en) Method for converting SAR image into optical image based on generating type countermeasure network
Li et al. Hyperspectral image super-resolution using deep convolutional neural network
Dong et al. RRSGAN: Reference-based super-resolution for remote sensing image
CN108830796B (en) Hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss
Hu et al. Pan-sharpening via multiscale dynamic convolutional neural network
CN111080567A (en) Remote sensing image fusion method and system based on multi-scale dynamic convolution neural network
CN104657962B (en) The Image Super-resolution Reconstruction method returned based on cascading linear
CN113327218B (en) Hyperspectral and full-color image fusion method based on cascade network
CN106251320A (en) Remote sensing image fusion method based on joint sparse Yu structure dictionary
CN102982520B (en) Robustness face super-resolution processing method based on contour inspection
CN105096286A (en) Method and device for fusing remote sensing image
Xiao et al. A dual-UNet with multistage details injection for hyperspectral image fusion
CN112801904B (en) Hybrid degraded image enhancement method based on convolutional neural network
CN114821261A (en) Image fusion algorithm
CN116309070A (en) Super-resolution reconstruction method and device for hyperspectral remote sensing image and computer equipment
CN113139902A (en) Hyperspectral image super-resolution reconstruction method and device and electronic equipment
Zhang et al. Attention-based tri-UNet for remote sensing image pan-sharpening
CN115512192A (en) Multispectral and hyperspectral image fusion method based on cross-scale octave convolution network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant