CN112819737B - Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution - Google Patents
- Publication number
- CN112819737B (application CN202110042742.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- scale
- multispectral
- model
- convolution
- Prior art date
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4023—Scaling of whole images or parts thereof, e.g. expanding or contracting based on decimating pixels or lines of pixels; based on inserting pixels or lines of pixels
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
- G06T2207/10036—Multispectral image; Hyperspectral image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Abstract
The invention discloses a remote sensing image fusion method based on a 3D-convolution multi-scale attention deep convolutional network, which fuses the high spectral resolution of a multispectral image with the high spatial resolution of a panchromatic image to obtain a multispectral image with both high spatial and high spectral resolution. A 3D multi-scale attention deep convolutional network model (MSAC-Net) is designed on the U-Net architecture from deep learning. To preserve the spectral resolution of the multispectral image, the model uses 3D convolution throughout to extract features along the spectral dimension; to capture more spatial detail, an attention mechanism is introduced at the model's skip connections to learn region details. In the decoding stage, several reconstruction layers containing multi-scale spatial information compute intermediate reconstruction results, encouraging the model to learn multi-scale representations at different layers and providing multi-level references for the final fusion result. The final image fusion result is effectively improved.
Description
Technical Field
The invention belongs to the technical field of information, relates to an image processing technology, and particularly relates to a remote sensing image fusion method of a multi-scale attention depth convolution network based on 3D convolution.
Background
A remote sensing satellite can acquire a panchromatic (PAN) image of the same scene while capturing a multispectral (MS) image. The multispectral image is rich in spectral information but has low spatial resolution and poor definition, while the panchromatic image has high spatial resolution but low spectral resolution; the spatial and spectral resolutions of the two are in conflict. Fusing the advantages of both to obtain a multispectral image with high spatial and spectral resolution is therefore in great demand at present.
At present, deep learning is widely applied across research fields and offers new solutions to many of them. Within deep learning, 3D convolution has proven to be a very effective way of exploring volumetric data. Compared with 2D convolution, 3D convolution not only retains features in the spatial dimensions but also performs feature extraction along the spectral dimension. This operation is more consistent with the imaging principle of spectral images, so 3D convolution opens a new way around the limitations of traditional 2D convolution. However, owing to the amount of data that 3D convolution requires, it is not yet widely used in multispectral pansharpening.
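To make the contrast concrete, the following minimal sketch (PyTorch assumed; the 1-channel, 4-band input shape is illustrative and not taken from the patent) shows that a 3D convolution keeps an explicit spectral axis in its output, while a 2D convolution over the same data folds the bands into channels:

```python
import torch
import torch.nn as nn

# A toy 4-band multispectral patch: (batch, channel, bands, height, width).
# In the 3D-convolution view, the spectral bands form the depth axis.
x = torch.randn(1, 1, 4, 64, 64)

conv3d = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
y3 = conv3d(x)
print(y3.shape)  # torch.Size([1, 8, 4, 64, 64]): the 4-band spectral axis survives

# A 2D convolution over the same patch must flatten the bands into channels,
# so the explicit spectral axis disappears from the output feature map.
conv2d = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, padding=1)
y2 = conv2d(x.squeeze(1))
print(y2.shape)  # torch.Size([1, 8, 64, 64])
```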
To fully exploit the internal relationships among pixels within a single band, conventional methods generally fuse observations at different scales jointly, so that the final fused image carries image features at each scale. This approach has a drawback, however: because of the special cube structure of the multispectral image, using scale information to enhance the spatial detail of the image may cause information loss in the spectral dimension and even spectral distortion.
Furthermore, the attention mechanism, inspired by the human perception system, has been proposed in recent years. Because its computation can allocate resources to the regions of interest, it is widely used in image processing. Unfortunately, many attention mechanisms proposed so far cannot be applied directly to multispectral pansharpening; improper use of attention blurs or distorts the final result, loses information in the spatial and spectral dimensions, and leaves the geometric feature representation of the image incomplete.
Disclosure of Invention
In order to fully utilize the correlation among the pixels and bands of a multispectral image and the high spatial resolution of a panchromatic image, to reduce the workload of image processing, and to improve the accuracy of image fusion, the invention provides a remote sensing image fusion method based on a deep-learning 3D multi-scale attention deep convolutional network (MSAC-Net). It adopts 3D convolution, preserves the spectral details of the multispectral image in a deep model while extracting spatial details from the panchromatic image with an attention mechanism, and learns the final result jointly with several intermediate-scale results to obtain the required high-resolution multispectral image, thereby addressing the incomplete fusion, poor fusion quality, and poor fusion effect of the prior art.
To achieve this, the invention adopts the following technical scheme:
a remote sensing image fusion method of a multi-scale attention depth convolution network based on 3D convolution is characterized by comprising the following steps:
Step one: acquiring a pair of panchromatic and multispectral images of the same scene from the same angle as a sample in a test data set; acquiring multiple such pairs over multiple scenes to obtain a training data set;
for a sample in the test data set, performing up-sampling processing on a multispectral image in the sample to reach the same size as a full-color image; then, carrying out cascade copying on the full-color image to obtain a full-color image cube with the same wave band number as the multispectral image;
for the training data set, down-sampling all panchromatic images in the training data set so that they reach the same size as the multispectral images in the training data set; then performing the copy-concatenation operation on the down-sampled full-color images to obtain full-color image cubes with the same number of bands as the multispectral image; the copy-concatenation operation means first making as many copies of the full-color image as the multispectral image has bands, and then concatenating all the copies along the band dimension to obtain a full-color image cube;
step two: inputting samples of the training data set into the MSAC-Net model to obtain the fusion result Ŷ of the multispectral image and the panchromatic image cube, and reconstructing and outputting the intermediate images of the model with the reconstruction blocks in the model:

Ŷ_i = R(F^i), i = 1, 2, …, k

where F^i is the feature map obtained by the model at the i-th layer scale, R(·) is the reconstruction block at the corresponding scale, Ŷ_i is the intermediate-scale image produced by the i-th layer reconstruction block, and Ŷ_1 is the final image reconstructed by the model;
step three: training the MSAC-Net model of step two with the training data set using a stochastic gradient descent algorithm until convergence, so as to obtain the fusion model;
down-sampling the reference image with bicubic interpolation to obtain an image Y_i matching the size of each intermediate image:

Y_i = D(Y_{i−1}), i = 2, 3, …, k

where D(·) is the down-sampling operation, Y_i is the simulated reference image at the i-th scale, Y_{i−1} is the simulated reference image at the (i−1)-th scale, and Y_1 is the reference image.
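A sketch of this reference-pyramid construction (PyTorch assumed; the factor-2 step between adjacent scales is an assumption matching a U-Net-style decoder, as the text does not state the per-level factor):

```python
import torch
import torch.nn.functional as F

def reference_pyramid(y1: torch.Tensor, k: int) -> list:
    """Y_i = D(Y_{i-1}), i = 2..k, with D a bicubic down-sampling.

    y1: reference image as (batch, bands, H, W); the factor-2 step per
    level is an assumption, not a value fixed by the patent.
    """
    refs = [y1]
    for _ in range(2, k + 1):
        refs.append(F.interpolate(refs[-1], scale_factor=0.5,
                                  mode="bicubic", align_corners=False))
    return refs  # [Y_1, Y_2, ..., Y_k]
```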
In the process of training the network by the stochastic gradient descent algorithm, the loss function is optimized continuously until convergence; the loss function of the model is:

L = l_1 + λ Σ_{i=2}^{k} l_i

where λ is the weight of the scale information, l_1 is the loss function at the full scale, l_i is the loss function at the i-th scale, and k is the number of scales.
Step four: for the panchromatic image and multispectral image of a scene to be fused, after the up-sampling of step one, the final fused image is obtained with the fusion model trained in step three.
Further, the up-sampling, down-sampling, and copy-concatenation of step one comprise:

step 2.1, in the training data set, the original multispectral image and the panchromatic image are down-sampled at an interval of p by interpolation; the down-sampled multispectral image is then up-sampled by a factor of p with bicubic interpolation to obtain a low-resolution multispectral image the same size as the down-sampled panchromatic image;
Step 2.2, obtaining the image sets with the same number as the multiple spectral bands by copying the full-color image sampled in step 2.1, and then cascading in the spectral dimension to obtain an h × w × c data cube, that is:
wherein k belongs to c, c is the wave band number of the multispectral image, and k represents the k-th wave band number of the number c; p HR Is an original full-color image;
step 2.3, the same down-sampling and copy-concatenation operations of steps 2.1 and 2.2 are performed on the full-color images in the test data set.
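Steps 2.1-2.2 might be sketched as follows (PyTorch assumed; `prepare_training_inputs` and its tensor layout are illustrative names, with p = 4 as in the embodiment and equidistant decimation for the down-sampling, as described later in the text):

```python
import torch
import torch.nn.functional as F

def prepare_training_inputs(ms: torch.Tensor, pan: torch.Tensor, p: int = 4):
    """Steps 2.1-2.2: simulate low-resolution inputs and build the PAN cube.

    ms  : (1, c, h, w)      original multispectral image
    pan : (1, 1, p*h, p*w)  original full-color image P_HR
    p   : sampling interval (p = 4 in the embodiment)
    """
    # Step 2.1: down-sample both images at interval p (equidistant
    # decimation), then bring the multispectral image back up by p with
    # bicubic interpolation so it matches the down-sampled PAN in size.
    ms_low = ms[..., ::p, ::p]
    pan_low = pan[..., ::p, ::p]
    ms_up = F.interpolate(ms_low, scale_factor=p,
                          mode="bicubic", align_corners=False)

    # Step 2.2: replicate the down-sampled full-color image c times and
    # concatenate along the band dimension -> an h x w x c data cube P'.
    c = ms.shape[1]
    pan_cube = pan_low.repeat(1, c, 1, 1)  # cat(P^(1), ..., P^(c))
    return ms_up, pan_cube
```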
Further, the reconstruction block of step two includes:
step 3.1, the low-level semantic feature f_l^i of the i-th layer and the high-level feature f_h^i of the corresponding layer are obtained through the MSAC-Net model, and the grid attention A^i of the current layer is obtained by convolution:

A^i = σ_1(W_l f_l^i + W_h f_h^i + b)

where W_l and W_h are weights, b is the offset, and σ_1 is the ReLU activation function;
step 3.2, A^i is multiplied with f_l^i to obtain the corresponding high-level semantic information f_a^i:

f_a^i = σ_2(A^i) ⊗ f_l^i

where σ_2 is the sigmoid activation function, ⊗ is feature-map-wise multiplication, and f_a^i is the high-level semantic information of the current layer;
step 3.3, f_a^i and f_h^i are concatenated, and features are then extracted by convolution to obtain the high-level feature F^i; the concatenation formula is:

f_c^i = cat(f_a^i, f_h^i)

where cat(·) is the concatenation operation; the feature extraction formula is:

F^i = MSAC_i(f_c^i)

where MSAC_i(·) represents a convolution operation;
step 3.4, the multispectral image at the i-th scale is reconstructed by an independent reconstruction block of the corresponding layer; the reconstruction formula is:

Ŷ_i = R_i(F^i)

where R_i(·) represents the convolution calculation of the i-th reconstruction block.
Further, the loss function of the model in step three comprises:

step 4.1, the reference image Y_1 is obtained, and bicubic interpolation is used to reduce it to the size of the reconstructed image Ŷ_i at each scale:

Y_i = D(Y_{i−1}), i = 2, 3, …, k

where D(·) is down-sampling by bicubic interpolation, Y_i is the simulated reference image at the i-th scale, and Y_1 is the reference image;
step 4.2, the model constructs the multi-scale loss from the intermediate reconstructed images and the down-sampled reference images:

l_i = ‖Y_i − Ŷ_i‖_1

where l_i is the l_1 loss at the i-th scale, Ŷ_i is the reconstructed image at the i-th scale, and Y_i is the i-th scale reference image;
finally, according to steps 4.1 and 4.2, the loss function specific to the model is constructed:

L = l_1 + λ Σ_{i=2}^{k} l_i

where λ is the weight of the multi-scale loss;
and 4.3, continuously optimizing loss until convergence in the process of training the network by the random gradient descent algorithm according to the loss functions constructed in the steps 4.1 and 4.2.
Compared with the prior art, the remote sensing image fusion method of the 3D convolution-based multi-scale attention depth convolution network has the following advantages:
1. Unlike other deep learning methods, the deep learning method based on 3D convolution preserves the spectral information in the multispectral image and reduces spectral distortion by means of 3D convolution.
2. The multi-scale information idea is adopted: intermediate images at each scale are reconstructed and output by multi-scale reconstruction blocks, and the final image is constrained by these intermediate images, so that the final image fuses the scale information of every scale and the spatial details at each scale of the multispectral image are effectively retained.
3. By adopting a grid attention mechanism, regional feature extraction is carried out on the low-level semantic information in the model, regional detail features in the low-level semantic information are concerned and are fused with the high-level semantic information, so that the spatial detail information of the fused image is effectively improved, and the spatial information storage performance of the fused model is improved.
4. The relationships between the image features of the panchromatic image and those of the multispectral image are fully considered in the modeling and solving process, so the fusion is more comprehensive, effective, and accurate.
Drawings
FIG. 1 is a framework diagram of the remote sensing image fusion method (MSAC-Net) of the deep-learning-based 3D multi-scale attention deep convolutional network.
FIG. 2 shows the results of different fusion methods on IKONOS satellite images in the simulation experiment; fig. 2 (a) is the up-sampled multispectral image, fig. 2 (b) the panchromatic image, fig. 2 (c) the reference image, fig. 2 (d) the SR fused image, fig. 2 (e) the GS fused image, fig. 2 (f) the Indusion fused image, fig. 2 (g) the PNN fused image, fig. 2 (h) the PanNet fused image, and fig. 2 (i) the MSAC-Net fused image.
FIG. 3 shows the results of different fusion methods on Quickbird satellite images in the simulation experiment; fig. 3 (a) is the up-sampled multispectral image, fig. 3 (b) the panchromatic image, fig. 3 (c) the reference image, fig. 3 (d) the SR fused image, fig. 3 (e) the GS fused image, fig. 3 (f) the Indusion fused image, fig. 3 (g) the PNN fused image, fig. 3 (h) the PanNet fused image, and fig. 3 (i) the MSAC-Net fused image.
FIG. 4 shows the results of different fusion methods on IKONOS satellite images in the actual experiment; fig. 4 (a) is the up-sampled multispectral image, fig. 4 (b) the panchromatic image, fig. 4 (c) the SR fused image, fig. 4 (d) the GS fused image, fig. 4 (e) the Indusion fused image, fig. 4 (f) the PNN fused image, fig. 4 (g) the PanNet fused image, and fig. 4 (h) the MSAC-Net fused image.
FIG. 5 shows the results of different fusion methods on Quickbird satellite images in the actual experiment; fig. 5 (a) is the up-sampled multispectral image, fig. 5 (b) the panchromatic image, fig. 5 (c) the SR fused image, fig. 5 (d) the GS fused image, fig. 5 (e) the Indusion fused image, fig. 5 (f) the PNN fused image, fig. 5 (g) the PanNet fused image, and fig. 5 (h) the MSAC-Net fused image.
The invention is described in further detail below with reference to the figures and examples.
Detailed Description
In the remote sensing image fusion method (MSAC-Net) based on the 3D-convolution multi-scale attention deep convolutional network, on one hand, the scale information of the image at each scale is fully utilized, and the final fusion result is learned jointly with each intermediate-scale image; on the other hand, an attention gate mechanism (Attention Gate) is introduced where the U-Net network connects high-level and low-level semantics, so that the feature maps learned by the network focus more on the local spatial details of the image, improving pansharpening performance; finally, 3D convolution is introduced, and its computational characteristics are used to extract information along the spectral dimension, reducing the spectral distortion of the fused multispectral image.
Referring to fig. 1, the remote sensing image fusion method (MSAC-Net) based on the 3D-convolution multi-scale attention deep convolutional network specifically comprises the following steps:
step 1, acquiring a pair of panchromatic images and multispectral images with the same scene and the same angle as a sample in a test data set; acquiring multiple pairs of panchromatic images and multispectral images of multiple scenes to obtain a training data set;
for a sample in the test data set, performing up-sampling processing on a multispectral image in the sample to reach the same size as a full-color image; then copying and cascading the full-color image to obtain a full-color image cube with the same wave band number as the multispectral image;
for the training data set, all images are down-sampled with a scaling factor p = 4; the down-sampled multispectral image is then up-sampled to obtain the input multispectral image M′; the down-sampled panchromatic image undergoes copy-concatenation to obtain a panchromatic image cube P′ with the same number of bands as the multispectral image.
Up-sampling and down-sampling by interpolation can produce an image larger or smaller than the original while retaining certain spatial characteristics. Specifically, in this embodiment, bicubic interpolation is used to up-sample the images in the training and test sets, and equidistant decimation is used to down-sample. Bicubic interpolation yields smoother image edges than other interpolation methods, and up-sampling aligns the data input to the model; for down-sampling, the multispectral image is decimated at an interval of 4.
The copy-concatenation makes simple copies identical to the original image; all copies are then concatenated along the band dimension to form cube data P′ with the same size h × w × c as the original multispectral image. The up-sampled multispectral image M′ and P′ are then concatenated along a fourth dimension to obtain the 2 × h × w × c input data X.
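The final assembly of the network input might look as follows (a sketch; the (batch, channel, depth, height, width) layout with the spectral bands on the depth axis is an assumption matching nn.Conv3d conventions):

```python
import torch

def assemble_input(ms_up: torch.Tensor, pan_cube: torch.Tensor) -> torch.Tensor:
    """Concatenate M' and P' along a fourth dimension into a 2 x h x w x c input X.

    ms_up, pan_cube: (1, c, h, w) tensors from the preparation step.
    Returns X as (batch=1, channel=2, depth=c, h, w), the layout expected by
    nn.Conv3d, where the c spectral bands play the role of the depth axis.
    """
    m = ms_up.unsqueeze(1)      # (1, 1, c, h, w)
    p = pan_cube.unsqueeze(1)   # (1, 1, c, h, w)
    return torch.cat([m, p], dim=1)  # (1, 2, c, h, w)
```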
Step 2, the training sample X of the training data set is input into MSAC-Net to obtain the fusion result Ŷ of the multispectral image and the panchromatic image cube. The reconstruction-block process is as follows:
step 2.1, the low-level semantic feature f_l^i of the i-th layer and the high-level feature f_h^i of the corresponding layer are obtained through the MSAC-Net model, and the grid attention A^i of the current layer is obtained by convolution:

A^i = σ_1(W_l f_l^i + W_h f_h^i + b)

where W_l and W_h are weights, b is the offset, and σ_1 is the ReLU activation function;

step 2.2, A^i is multiplied with f_l^i to obtain the corresponding high-level semantic information f_a^i:

f_a^i = σ_2(A^i) ⊗ f_l^i

where σ_2 is the sigmoid activation function, ⊗ is feature-map-wise multiplication, and f_a^i is the high-level semantic information of the current layer;
step 2.3, f_a^i and f_h^i are concatenated, and features are then extracted by convolution to obtain the high-level feature F^i; the concatenation formula is:

f_c^i = cat(f_a^i, f_h^i)

where cat(·) is the concatenation operation; the feature extraction formula is:

F^i = MSAC_i(f_c^i)

where MSAC_i(·) represents a convolution operation;

step 2.4, the multispectral image at the i-th scale is reconstructed by an independent reconstruction block of the corresponding layer:

Ŷ_i = R_i(F^i)

step 2.5, at the first scale, the feature map F^1 at the first scale is obtained by feature extraction and then passed through the model's final reconstruction block to obtain the final result Ŷ_1.
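Steps 2.1-2.5 can be sketched as PyTorch modules as follows; the gate follows the Attention U-Net formulation cited by the patent, while the channel widths, 1×1×1 gating kernels, and 3×3×3 extraction kernels are assumptions rather than values from the patent:

```python
import torch
import torch.nn as nn

class AttentionGate3D(nn.Module):
    """Steps 2.1-2.2: grid attention A^i from the low-level (skip) feature
    f_l^i and the high-level (gating) feature f_h^i, then sigmoid-weighted
    multiplication with the low-level feature. f_low and f_high are assumed
    to have been brought to the same spatial/spectral size by the caller."""

    def __init__(self, ch_low: int, ch_high: int, ch_mid: int):
        super().__init__()
        self.w_l = nn.Conv3d(ch_low, ch_mid, kernel_size=1)   # weights W_l (offset b inside)
        self.w_h = nn.Conv3d(ch_high, ch_mid, kernel_size=1)  # weights W_h
        self.psi = nn.Conv3d(ch_mid, 1, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f_low, f_high):
        a = self.relu(self.w_l(f_low) + self.w_h(f_high))  # A^i = sigma_1(W_l f_l + W_h f_h + b)
        return self.sigmoid(self.psi(a)) * f_low           # f_a^i = sigma_2(A^i) (x) f_l^i


class DecoderStage3D(nn.Module):
    """Steps 2.3-2.4: cat(f_a^i, f_h^i), convolutional feature extraction
    MSAC_i(.), and a per-scale reconstruction block R_i(.) that maps the
    features back to a one-channel (bands x H x W) multispectral estimate."""

    def __init__(self, ch_low: int, ch_high: int, ch_out: int):
        super().__init__()
        self.gate = AttentionGate3D(ch_low, ch_high, ch_low)
        self.extract = nn.Sequential(                          # MSAC_i(.)
            nn.Conv3d(ch_low + ch_high, ch_out, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.reconstruct = nn.Conv3d(ch_out, 1, kernel_size=3, padding=1)  # R_i(.)

    def forward(self, f_low, f_high):
        f_att = self.gate(f_low, f_high)                          # f_a^i
        feats = self.extract(torch.cat([f_att, f_high], dim=1))  # F^i
        y_hat_i = self.reconstruct(feats).squeeze(1)              # Y_hat_i as (N, bands, H, W)
        return feats, y_hat_i
```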
Step 3, the MSAC-Net model of step 2 is trained with the training data set using the stochastic gradient descent algorithm, and the fusion result is corrected with the intermediate images produced by the intermediate reconstruction layers to obtain the fusion model. The specific steps are:
step 3.1, the reference image Y_1 is obtained, and bicubic interpolation is used to reduce it to the size of the reconstructed image Ŷ_i at each scale:

Y_i = D(Y_{i−1}), i = 2, 3, …, k

where D(·) is down-sampling by bicubic interpolation, Y_i is the simulated reference image at the i-th scale, and Y_1 is the reference image.
Step 3.2, the model constructs the multi-scale loss from the intermediate reconstructed images and the down-sampled reference images:

l_i = ‖Y_i − Ŷ_i‖_1

where l_i is the l_1 loss at the i-th scale, Ŷ_i is the reconstructed image at the i-th scale, and Y_i is the i-th scale reference image.
Finally, according to steps 3.1 and 3.2, the loss function specific to the model is constructed:

L = l_1 + λ Σ_{i=2}^{k} l_i

where λ is the weight of the multi-scale loss, l_1 is the loss function at the full scale, l_i is the loss function at the i-th scale, and k is the number of scales.
And 3.3, continuously optimizing loss until convergence in the process of training the network by the random gradient descent algorithm according to the loss functions constructed in the steps 3.1 and 3.2.
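A sketch of this loss (PyTorch assumed; the value of λ is not fixed in the text, so the default below is a placeholder):

```python
import torch.nn.functional as F

def msac_loss(y_hats, refs, lam: float = 0.1):
    """L = l_1 + lam * sum_{i=2..k} l_i, with l_i the l1 loss at scale i.

    y_hats: [Y_hat_1, ..., Y_hat_k] reconstructed images per scale
    refs:   [Y_1, ..., Y_k] bicubic-downsampled reference images
    lam:    weight of the intermediate-scale losses; 0.1 is a placeholder,
            not a value taken from the patent.
    """
    l1 = F.l1_loss(y_hats[0], refs[0])
    li = sum(F.l1_loss(yh, yr) for yh, yr in zip(y_hats[1:], refs[1:]))
    return l1 + lam * li
```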
The procedure is as follows: mini-batches of m samples each are drawn from the training data set, here with m = 32, and stochastic gradient descent is performed on them:

g = (1/m) ∇_θ Σ_{i=1}^{m} L(x_i, y_i; θ)

where L is the loss function of step 3 computed over the m samples of the mini-batch. The model is then updated by gradient descent:

θ ← θ − α g

where ∂L/∂θ_i is the partial derivative of the loss function with respect to the parameter θ_i, and α is the learning rate set for the model.

After the above steps, this simplifies to

θ ← θ − α ∇_θ L(x_i, y_i; θ)

where ∇_θ L denotes the gradient of the loss function; each time the model is updated, a mini-batch is drawn at random and used to update the parameters.
The embodiment is as follows:
In this embodiment, remote sensing images from two satellites are used to verify the effectiveness of the proposed fusion algorithm. The panchromatic and multispectral images captured by the IKONOS satellite have spatial resolutions of 1 m and 4 m, respectively; those provided by the QuickBird satellite have spatial resolutions of 0.7 m and 2.8 m. The multispectral images from both satellites contain four bands: red, green, blue, and near-infrared. The panchromatic images used in the experiments are 256 × 256 and the multispectral images 64 × 64.
To better evaluate the practicality of the MSAC-Net remote sensing image fusion method of this embodiment, two experiment types are provided: simulated-image experiments and actual-image experiments. In the simulated-image experiments, the spatial resolutions of the panchromatic and multispectral images are both reduced by a factor of 4 and the pair is used as the simulated image data to be fused, with the original multispectral image serving as the standard fusion result for reference; the actual-image experiments fuse the real images directly.
The MSAC-Net method of this embodiment is mainly compared with the following five widely used image fusion methods: the sparse-representation-based method SR, the component-substitution-based method GS, the multi-resolution-analysis-based method Indusion, and the deep-learning-based methods PNN and PanNet.
The network is trained with the PyTorch software package for approximately 25000 iterations with a batch size of 32; for the stochastic gradient descent algorithm, the weight decay is set to 10^-3 and the momentum to 0.9; the MSAC-Net network depth is set to 5.
Analysis of the simulated-image experiment results:
FIG. 2 shows the IKONOS satellite simulation experiment results; FIGS. 2 (a), (b) are the up-sampled multispectral image and the panchromatic image, respectively, FIG. 2 (c) is the reference image, FIGS. 2 (d)-(h) are the fused images of the five comparison methods, and FIG. 2 (i) is the fused image of the MSAC-Net method (i.e., the present invention). Comparing the fused images with the reference image shows that all methods improve the spatial resolution of the original multispectral image, but SR has an obvious visual color deviation, SR and PNN suffer serious spectral distortion, and the vegetation areas synthesized by Indusion and PanNet have overly sharp edges. As seen in FIG. 2 (i), the spatial resolution of the multispectral image is improved, the spectral information of the source image is better retained, and the resulting fused image is better and more natural.
FIG. 3 shows the Quickbird satellite simulation experiment results; FIGS. 3 (a), (b) are the up-sampled multispectral image and the panchromatic image, respectively, FIG. 3 (c) is the reference image, FIGS. 3 (d)-(h) are the fused images of the five comparison methods, and FIG. 3 (i) is the fused image of the MSAC-Net method (i.e., the present invention). As seen in FIGS. 3 (d) and (g), the colors of the SR and PNN fused images change greatly and differ markedly in spectrum from the reference image; FIGS. 3 (e), (f), and (h) show that the GS, Indusion, and PanNet fused images differ considerably from the reference image in the bare-land area at the lower right of the image; this embodiment, by contrast, differs little from the reference image in both spectral and spatial resolution.
Visual comparison gives a more intuitive sense of the fusion results, but subjective evaluation alone can hardly yield a definitive judgment, so the fusion results must also be evaluated with objective indices. This embodiment uses six objective evaluation indices: CC, PSNR, Q4, SAM, SSIM, and ERGAS. CC is the correlation coefficient, evaluating the similarity of spectral and spatial information between the band images of two images from the standpoint of statistical correlation; PSNR (peak signal-to-noise ratio) is an objective standard for evaluating images; Q4 is an objective index that jointly evaluates the spatial and spectral quality of the fused image, with an optimal value of 1; SAM measures global spectral distortion, reflecting the color difference between two images, with an optimal value of 0; SSIM measures the structural similarity between the reference image and each band image of the fusion result by comparing brightness, contrast, and structure; ERGAS is a global index of fused-image quality, with an optimal value of 0.
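SAM and ERGAS admit short closed forms; the sketch below follows their standard definitions in the literature (NumPy assumed; `ratio` is the PAN-to-MS resolution ratio, 4 in these experiments), not a formulation given in the patent itself:

```python
import numpy as np

def sam(fused: np.ndarray, ref: np.ndarray) -> float:
    """Mean spectral angle (degrees) between fused and reference pixels.
    Arrays are (H, W, c); 0 is optimal."""
    dot = (fused * ref).sum(axis=2)
    denom = np.linalg.norm(fused, axis=2) * np.linalg.norm(ref, axis=2) + 1e-12
    return float(np.degrees(np.arccos(np.clip(dot / denom, -1.0, 1.0))).mean())

def ergas(fused: np.ndarray, ref: np.ndarray, ratio: float = 4.0) -> float:
    """Relative global dimensional synthesis error; 0 is optimal."""
    c = ref.shape[2]
    mse = ((fused - ref) ** 2).reshape(-1, c).mean(axis=0)   # per-band RMSE^2
    means = ref.reshape(-1, c).mean(axis=0)                  # per-band mean
    return float(100.0 / ratio * np.sqrt((mse / means ** 2).mean()))
```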
Tables 1 and 2 give the objective indices of the different fusion methods in the IKONOS and Quickbird simulated-image experiments, respectively. As the tables show, most objective indices of this embodiment are superior to those of the other methods; in particular, its CC and Q4 values are much higher, i.e., the fused image of the proposed method correlates most strongly with the reference image, improving the spatial resolution of the multispectral image while maintaining its spectral characteristics.
By integrating visual evaluation and objective index evaluation, the remote sensing image fusion method based on the 3D convolution multi-scale attention depth convolution network can well obtain a fusion image with high space and high spectral resolution.
Table 1: Objective indices of the fusion results in the IKONOS satellite simulated-image experiment
Table 2: Objective indices of the fusion results in the Quickbird satellite simulated-image experiment
Analysis of the actual-image experiment results:
FIG. 4 shows the IKONOS satellite actual-experiment results; FIGS. 4 (a) and (b) are the up-sampled multispectral image and the panchromatic image, respectively, FIGS. 4 (c)-(g) are the fused images of the five comparison methods, and FIG. 4 (h) is the fused image of the MSAC-Net method. It can be seen that FIG. 4 (d) shows a little spectral distortion, FIG. 4 (f) appears very blurred, and FIGS. 4 (e) and (g) extract edges poorly. Overall, with the remote sensing image fusion method based on the 3D-convolution multi-scale attention deep convolutional network, the fused image has high spatial resolution, small spectral distortion, and a good overall visual effect.
FIG. 5 shows the Quickbird satellite actual-experiment results; FIGS. 5 (a), (b) are the up-sampled multispectral image and the panchromatic image, respectively, FIGS. 5 (c)-(g) are the fused images of the five comparison methods, and FIG. 5 (h) is the fused image of the MSAC-Net method. The fused image in FIG. 5 (c) is over-sharpened, the colors of the fused images in FIGS. 5 (d) and (f) change obviously, and the overall spatial resolution of the fused images in FIGS. 5 (e) and (g) is not high; as FIG. 5 (h) shows, the fused image obtained with the MSAC-Net method of this embodiment has a clearer outline than the other methods.
In the actual-image experiments there is no reference image, so to evaluate each fusion result effectively and objectively, the no-reference objective index QNR is used to assess fusion quality. QNR measures the brightness, contrast, and local correlation between the fused image and the original images, and comprises a spatial-information loss index D_s and a spectral-information loss index D_λ, where the optimal value of QNR is 1 and the optimal values of D_s and D_λ are 0.
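The patent text only names these quantities; for reference, the usual combination in the literature is (with α = β = 1 in common practice):

```latex
\mathrm{QNR} = (1 - D_\lambda)^{\alpha}\,(1 - D_s)^{\beta}, \qquad \alpha = \beta = 1
```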
Tables 3 and 4 give the objective indices of the different fusion methods in the IKONOS and Quickbird actual-image experiments, respectively. As the tables show, with the MSAC-Net method of this embodiment the loss of spatial detail produced during fusion is minimal, and although the spectral loss is slightly higher, the no-reference index QNR is the best of all compared methods.
In summary, the remote sensing image fusion method based on the 3D convolution multi-scale attention depth convolution network according to the embodiment greatly improves the spatial resolution of the fusion image while well retaining the spectral information of the multi-spectral image.
Table 3: Objective indices of the fusion results in the IKONOS satellite actual-image experiment
Table 4: Objective indices of the fusion results in the Quickbird satellite actual-image experiment
Claims (2)
1. A remote sensing image fusion method of a multi-scale attention depth convolution network based on 3D convolution is characterized by comprising the following steps:
Step one: acquiring a pair of panchromatic and multispectral images of the same scene from the same angle as a sample in a test data set; acquiring multiple such pairs over multiple scenes to obtain a training data set;
for a sample in the test data set, performing up-sampling processing on a multispectral image in the sample to reach the same size as a full-color image; then, carrying out cascade copying on the full-color image to obtain a full-color image cube with the same wave band number as the multispectral image;
for the training data set, down-sampling all panchromatic images in the training data set so that they reach the same size as the multispectral images in the training data set; then performing the copy-concatenation operation on the down-sampled full-color images to obtain full-color image cubes with the same number of bands as the multispectral image; the copy-concatenation operation means first making as many copies of the full-color image as the multispectral image has bands, and then concatenating all the copies along the band dimension to obtain a full-color image cube;
inputting the samples of the training data set into the 3D multi-scale attention deep convolutional network model to obtain the fusion result Ŷ of the multispectral image and the panchromatic image cube, and reconstructing and outputting the intermediate images of the model with the reconstruction blocks in the model:

Ŷ_i = R(F^i), i = 1, 2, …, k

where F^i is the feature map obtained by the model at the i-th scale, R(·) is the reconstruction block at the corresponding scale, Ŷ_i is the intermediate-scale image obtained by the i-th layer reconstruction block, and Ŷ_1 is the final image reconstructed by the model;
the reconstruction block comprises:
step 3.1, the low-level semantic feature f_l^i of the i-th layer and the high-level feature f_h^i of the corresponding layer are obtained through the 3D multi-scale attention deep convolutional network model, and the grid attention A^i of the current layer is obtained by convolution:

A^i = σ_1(W_l f_l^i + W_h f_h^i + b)

where W_l and W_h are weights, b is the offset, and σ_1 is the ReLU activation function;
step 3.2, A^i is multiplied with f_l^i to obtain the corresponding high-level semantic information f_a^i:

f_a^i = σ_2(A^i) ⊗ f_l^i

where σ_2 is the sigmoid activation function, ⊗ is feature-map-wise multiplication, and f_a^i is the high-level semantic information of the current layer;
step 3.3, f_a^i and f_h^i are concatenated, and features are then extracted by convolution to obtain the high-level feature F^i; the concatenation formula is:

f_c^i = cat(f_a^i, f_h^i)

where cat(·) is the concatenation operation; the feature extraction formula is:

F^i = MSAC_i(f_c^i)

where MSAC_i(·) represents a convolution operation;
step 3.4, the multispectral image at the i-th scale is reconstructed by an independent reconstruction block of the corresponding layer; the reconstruction formula is:

Ŷ_i = R_i(F^i);
step three: training the 3D multi-scale attention deep convolutional network model of step two with the training data set using a stochastic gradient descent algorithm until convergence, so as to obtain the fusion model;
down-sampling the reference image with bicubic interpolation to obtain an image Y_j matching the size of each intermediate image:

Y_j = D(Y_{j−1}), j = 2, 3, …, k

where D(·) is the down-sampling operation, Y_j is the simulated reference image at the j-th scale, Y_{j−1} is the simulated reference image at the (j−1)-th scale, and Y_1 is the reference image;
in the process of training the network by the stochastic gradient descent algorithm, the loss function is optimized continuously until convergence, the loss function of the model being:

L = l_1 + λ Σ_{j=2}^{k} l_j

where λ is the weight of the scale information, l_1 is the loss function at the full scale, l_j is the loss function at the j-th scale, and k is the number of scales;
step four: for the panchromatic image and multispectral image of a scene to be fused, after the up-sampling of step one, obtaining the final fused image with the fusion model trained in step three.
2. The method of claim 1, wherein the up-sampling and down-sampling processes and the copy-concatenation of step one comprise the following steps:
step 2.1, in the training data set, adopting an interpolation method to carry out down-sampling with the interval of 4 on the original multispectral image and the panchromatic image; then, performing up-sampling on the down-sampled multispectral image by 4 times by utilizing bicubic interpolation to obtain a low-resolution multispectral image with the same size as the down-sampled panchromatic image;
step 2.2, making copies of the full-color image down-sampled in step 2.1 equal in number to the multispectral bands, and then concatenating them in the spectral dimension to obtain an h × w × c data cube, that is:

P′ = cat(P^(1), P^(2), …, P^(c)), k = 1, …, c

where c is the number of bands of the multispectral image, P^(k) denotes the k-th copy, and P_HR is the original full-color image;
step 2.3, for the test data set, the full color image is subjected to the same down-sampling operation and copy cascade operation as in step 2.1 and step 2.2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110042742.0A CN112819737B (en) | 2021-01-13 | 2021-01-13 | Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110042742.0A CN112819737B (en) | 2021-01-13 | 2021-01-13 | Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112819737A CN112819737A (en) | 2021-05-18 |
CN112819737B true CN112819737B (en) | 2023-04-07 |
Family
ID=75869142
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110042742.0A Active CN112819737B (en) | 2021-01-13 | 2021-01-13 | Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112819737B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113191325B (en) * | 2021-05-24 | 2023-12-12 | 中国科学院深圳先进技术研究院 | Image fusion method, system and application thereof |
CN113421216B (en) * | 2021-08-24 | 2021-11-12 | 湖南大学 | Hyperspectral fusion calculation imaging method and system |
CN113763299B (en) * | 2021-08-26 | 2022-10-14 | 中国人民解放军军事科学院国防工程研究院工程防护研究所 | Panchromatic and multispectral image fusion method and device and application thereof |
CN113628152B (en) * | 2021-09-15 | 2023-11-17 | 南京天巡遥感技术研究院有限公司 | Dim light image enhancement method based on multi-scale feature selective fusion |
CN113962913B (en) * | 2021-09-26 | 2023-09-15 | 西北大学 | Construction method of deep mutual learning framework integrating spectral space information |
CN114511470B (en) * | 2022-04-06 | 2022-07-08 | 中国科学院深圳先进技术研究院 | Attention mechanism-based double-branch panchromatic sharpening method |
CN115018748A (en) * | 2022-06-06 | 2022-09-06 | 西北工业大学 | Aerospace remote sensing image fusion method combining model structure reconstruction and attention mechanism |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109886870A (en) * | 2018-12-29 | 2019-06-14 | 西北大学 | Remote sensing image fusion method based on binary channels neural network |
CN111292259A (en) * | 2020-01-14 | 2020-06-16 | 西安交通大学 | Deep learning image denoising method integrating multi-scale and attention mechanism |
CN111914917A (en) * | 2020-07-22 | 2020-11-10 | 西安建筑科技大学 | Target detection improved algorithm based on feature pyramid network and attention mechanism |
WO2020237693A1 (en) * | 2019-05-31 | 2020-12-03 | 华南理工大学 | Multi-source sensing method and system for water surface unmanned equipment |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103198463B (en) * | 2013-04-07 | 2014-08-27 | 北京航空航天大学 | Spectrum image panchromatic sharpening method based on fusion of whole structure and space detail information |
US11276151B2 (en) * | 2019-06-27 | 2022-03-15 | Retrace Labs | Inpainting dental images with missing anatomy |
-
2021
- 2021-01-13 CN CN202110042742.0A patent/CN112819737B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109886870A (en) * | 2018-12-29 | 2019-06-14 | 西北大学 | Remote sensing image fusion method based on binary channels neural network |
WO2020237693A1 (en) * | 2019-05-31 | 2020-12-03 | 华南理工大学 | Multi-source sensing method and system for water surface unmanned equipment |
CN111292259A (en) * | 2020-01-14 | 2020-06-16 | 西安交通大学 | Deep learning image denoising method integrating multi-scale and attention mechanism |
CN111914917A (en) * | 2020-07-22 | 2020-11-10 | 西安建筑科技大学 | Target detection improved algorithm based on feature pyramid network and attention mechanism |
Non-Patent Citations (3)
Title |
---|
Attention U-Net: Learning Where to Look for the Pancreas; Ozan Oktay et al.; arXiv; 2018-05-20; pp. 1-10 *
Remote sensing image fusion based on super-resolution reconstruction with convolutional neural networks; Xue Yang et al.; Journal of Guangxi Normal University (Natural Science Edition); 2018-04-15 (No. 02); pp. 37-45 *
Fine-grained image classification algorithm based on multi-scale feature fusion and recurrent attention mechanism; He Kai et al.; Journal of Tianjin University (Natural Science and Engineering Technology Edition); 2020-09-02 (No. 10); pp. 91-99 *
Also Published As
Publication number | Publication date |
---|---|
CN112819737A (en) | 2021-05-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112819737B (en) | Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution | |
CN109886870B (en) | Remote sensing image fusion method based on dual-channel neural network | |
CN110533620B (en) | Hyperspectral and full-color image fusion method based on AAE extraction spatial features | |
CN111127374B (en) | Pan-sharing method based on multi-scale dense network | |
CN110119780B (en) | Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network | |
Xie et al. | Hyperspectral image super-resolution using deep feature matrix factorization | |
Deng et al. | Machine learning in pansharpening: A benchmark, from shallow to deep networks | |
CN110363215B (en) | Method for converting SAR image into optical image based on generating type countermeasure network | |
Li et al. | Hyperspectral image super-resolution using deep convolutional neural network | |
Dong et al. | RRSGAN: Reference-based super-resolution for remote sensing image | |
CN108830796B (en) | Hyperspectral image super-resolution reconstruction method based on spectral-spatial combination and gradient domain loss | |
Hu et al. | Pan-sharpening via multiscale dynamic convolutional neural network | |
CN111080567A (en) | Remote sensing image fusion method and system based on multi-scale dynamic convolution neural network | |
CN104657962B (en) | The Image Super-resolution Reconstruction method returned based on cascading linear | |
CN113327218B (en) | Hyperspectral and full-color image fusion method based on cascade network | |
CN106251320A (en) | Remote sensing image fusion method based on joint sparse Yu structure dictionary | |
CN102982520B (en) | Robustness face super-resolution processing method based on contour inspection | |
CN105096286A (en) | Method and device for fusing remote sensing image | |
Xiao et al. | A dual-UNet with multistage details injection for hyperspectral image fusion | |
CN112801904B (en) | Hybrid degraded image enhancement method based on convolutional neural network | |
CN114821261A (en) | Image fusion algorithm | |
CN116309070A (en) | Super-resolution reconstruction method and device for hyperspectral remote sensing image and computer equipment | |
CN113139902A (en) | Hyperspectral image super-resolution reconstruction method and device and electronic equipment | |
Zhang et al. | Attention-based tri-UNet for remote sensing image pan-sharpening | |
CN115512192A (en) | Multispectral and hyperspectral image fusion method based on cross-scale octave convolution network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |