CN115861749A - Remote sensing image fusion method based on window cross attention - Google Patents

Remote sensing image fusion method based on window cross attention

Info

Publication number
CN115861749A
CN115861749A CN202211491547.7A
Authority
CN
China
Prior art keywords
image
window
multispectral
attention
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211491547.7A
Other languages
Chinese (zh)
Inventor
柯成杰
田昕
李松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202211491547.7A priority Critical patent/CN115861749A/en
Publication of CN115861749A publication Critical patent/CN115861749A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a remote sensing image fusion method based on window cross attention, which uses a novel window cross-attention fusion network to fuse a panchromatic image and a multispectral image into a high-resolution multispectral image. High-pass filtering is combined with deep feature extraction to mine more texture information, overcoming the incomplete extraction of high-frequency information by shallow extraction, so that the relationship between the multispectral and panchromatic images obtained from feature similarity is more accurate. A cross-modal relationship between the panchromatic and multispectral images is then established through a pixel-level window cross-attention mechanism acting between local windows of the two images. Pixel-level attention is more conducive to preserving fine-grained features than patch-level attention, so more spatial detail from the panchromatic image is transferred into the multispectral image and the fused multispectral image is sharper.

Description

Remote sensing image fusion method based on window cross attention
Technical Field
The invention belongs to the field of remote sensing image fusion, relates to remote sensing image fusion based on window cross attention, and is suitable for various multispectral and panchromatic image fusion application scenes.
Background
With the rapid development of satellite sensor technology, multispectral images are widely used in fields such as military systems and environmental analysis. However, limited by current satellite sensor technology, a single sensor can capture only a panchromatic (PAN) image with high spatial resolution but low spectral resolution, or a multispectral (MS) image rich in spectral information but with low spatial resolution. Remote sensing image fusion techniques that fuse multispectral and panchromatic images have therefore been studied extensively in order to generate multispectral images with high spatial resolution.
Existing remote sensing image fusion techniques fall mainly into four categories: component substitution (CS), multi-resolution analysis (MRA), model-based methods and deep learning (DL) based methods. Component substitution decomposes the multispectral image into multiple components and replaces the spatial component with the panchromatic image; however, some spectral information in the multispectral image may be lost because the components are not completely separated. Multi-resolution analysis injects the high-frequency information of the panchromatic image into the multispectral image in a transform domain; it preserves spectral information better but sometimes produces spatial distortion. Model-based methods build optimization models by constructing prior constraints, but the large computational cost and the difficulty of selecting optimal manual parameters limit their practical application. Current mainstream deep learning networks are still based on convolutional neural networks and simply concatenate the multispectral and panchromatic images before feeding them into the network. This strategy does not fully exploit the cross-modal correlation between the two images. In addition, a convolution kernel operates identically on all pixels and cannot focus on effective features while suppressing redundant information, so blurring easily occurs in highly textured remote sensing images. How to design an end-to-end deep learning network that explores the cross-modal correlation between the panchromatic and multispectral images, transfers the spatial texture details of the panchromatic image to the multispectral image, and produces a fused multispectral image that is rich in texture information with as little spectral distortion as possible is therefore an important problem in the field of remote sensing image fusion.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a remote sensing image fusion method based on window cross attention.
The remote sensing image fusion method based on window cross attention provided by the invention comprises the following steps:
step 1, constructing a depth texture feature extraction module based on characteristics of a multispectral image and a panchromatic image, and converting an input image into a feature domain;
step 2, constructing a window cross attention module to acquire the cross-modal fine-grained relationship between the multispectral image and the panchromatic image, and outputting a feature image;
step 3, constructing an image decoding module and transmitting the generated characteristic image back to an image domain to obtain a final fusion image;
step 4, constructing an objective function to drive training of the image fusion model, wherein the image fusion model comprises the depth texture feature extraction module, the window cross attention module and the image decoding module;
and step 5, training the image fusion model with simulation data, and testing on the simulated test set and the real test set with the trained model.
Further, the specific implementation manner of step 1 is as follows;
step 1.1, constructing a high-pass filter to extract high-frequency information of the input images, wherein the input images comprise a multispectral image M, a blurred panchromatic image P̃ and a panchromatic image P, and M, P̃ and P are processed by the high-pass filter to obtain G(M), G(P̃) and G(P) respectively;
step 1.2, constructing a single-channel texture extraction module to extract the high-frequency features of G(P̃) and G(P), obtaining K and V, wherein the number of convolution kernels in the single-channel texture extraction module increases layer by layer and the receptive field of the convolution kernels decreases layer by layer to extract multi-scale detail information;
and step 1.3, constructing a multi-channel texture extraction module to extract the high-frequency features of G(M), obtaining Q, wherein the multi-channel texture extraction module also comprises three convolution layers, the number of convolution kernels increases layer by layer, and all convolution kernels are 1×1.
Further, the blurred panchromatic image in step 1.1 is obtained by down-sampling and up-sampling the original panchromatic image, the high-pass filter is realized by subtracting low-frequency content obtained by average filtering of the original image from the original image, and the average filtering is realized by a global pooling layer.
Further, in step 1.2, the number of convolution kernels in the three convolution layers increases as 32, 64, 128, and the receptive fields of the convolution kernels decrease as 7×7, 5×5, 3×3.
Further, the specific implementation manner of step 2 is as follows;
step 2.1, dividing the input high-frequency features Q/K/V ∈ R^{H×W×C} into n windows:
Q = [q_1, q_2, …, q_n]
K = [k_1, k_2, …, k_n]
V = [v_1, v_2, …, v_n]
where q_i/k_i/v_i ∈ R^{h×w×C}, n = (H×W)/(h×w), C is the number of feature channels, H and W are the image height and width, and h and w are the window height and width;
step 2.2, in order to extract fine-grained features, unfolding each window q_i, k_i, v_i into a pixel sequence through a dimension transformation, and for the m-th pixel q_i^m and the n-th pixel k_i^n in the sequences, calculating the feature similarity between them:
r_i^{m,n} = ⟨q_i^m, k_i^n⟩
where r_i^{m,n} represents the pixel-level cross-modal correlation within window i;
step 2.3, normalizing the correlations between pixels obtained in step 2.2 with the softmax function:
w_i^{m,n} = exp(r_i^{m,n}) / Σ_{j=1}^{hw} exp(r_i^{m,j})
where w_i^{m,n} represents the injection gain from the n-th pixel of the panchromatic image to the m-th pixel of the multispectral image within window i;
step 2.4, extracting the texture information of the panchromatic image according to the injection gains w_i^{m,n}, so that the m-th pixel of the i-th window of the output feature image is calculated as:
o_i^m = Σ_{n=1}^{hw} w_i^{m,n} · v_i^n
step 2.5, folding the unfolded pixel sequence back into a pixel window through the dimension transformation to obtain the i-th window of the output image:
O_i = RS([o_i^1, o_i^2, …, o_i^{hw}])
and step 2.6, obtaining the output feature image of each window through window cross attention, and finally concatenating the feature images of all windows to obtain the final output feature image:
O = [O_1, O_2, …, O_n].
Further, the specific implementation manner of step 3 is as follows;
step 3.1, in order to retain the high-frequency feature information of the multispectral image, adding the output feature image obtained by window cross attention to the multispectral feature image Q through a skip connection to obtain a high-frequency feature image;
step 3.2, passing the fused high-frequency feature image through a convolution layer to obtain a higher-dimensional multi-channel feature image;
and step 3.3, remapping the multi-channel feature image into a four-channel image with 4 convolution layers of kernel size 1×1 to obtain a reconstructed high-frequency image, and then adding the low-frequency multispectral image to the reconstructed high-frequency image to obtain the final fused image.
Further, the convolution layer in step 3.2 has 256 channels and a convolution kernel size of 3 × 3.
Further, in step 3.3, the number of convolution kernels for 4 convolutional layers is 128, 64, 32, 4, respectively.
Further, the loss function constructed in step 4 is as follows:
Loss = (1/b) Σ_{n=1}^{b} ||F_n − G_n||_2^2
where F_n and G_n represent the fused image and the reference image, respectively, and b is the batch size.
Further, step 5 includes comparing the test results with existing algorithms through objective evaluation indexes, wherein the objective evaluation indexes include the peak signal-to-noise ratio and the no-reference index.
Compared with the prior art, the invention has the advantages and beneficial effects that:
firstly, the up-sampled multispectral image, the blurred panchromatic image and the original panchromatic image are sent through a high-pass filter and are then fed into the multi-channel and single-channel depth feature extraction modules respectively, converting the images into the feature domain and extracting deep high-frequency features; next, the extracted high-frequency features are expressed as a query vector Q, a key vector K and a value vector V, and the cross-modal correlation between the multispectral and panchromatic images is obtained through window cross attention; finally, the image is reconstructed and the fused high-frequency feature image is converted back to the image domain. Because high-pass filtering is combined with deep feature extraction, more texture information can be mined, and the relationship between the multispectral and panchromatic images obtained from feature similarity is more accurate. Furthermore, a cross-modal relationship is established between local windows of the multispectral and panchromatic images through a pixel-level window cross-attention mechanism. Compared with patch-level attention, pixel-level attention helps preserve fine-grained features and transfers more spatial detail from the panchromatic image into the multispectral image. Therefore, the fused multispectral image is sharper and its spectral information is well preserved.
Drawings
Fig. 1 is an overall framework diagram of the remote sensing image fusion network based on window cross attention according to an embodiment.
FIG. 2 is a network architecture diagram of the window cross attention module of an embodiment, where SM is the softmax normalization function and RS is the dimension transformation module.
Fig. 3 shows the test results on simulation data for an embodiment, wherein (a) is the low-resolution multispectral image, (b) is the result of IHS, (c) is the result of PNN, (d) is the result of FusionNet, (e) is the result of the proposed method, and (f) is the reference image.
Fig. 4 shows the test results on real data for an embodiment, wherein (a) is the low-resolution multispectral image, (b) is the result of IHS, (c) is the result of PNN, (d) is the result of FusionNet, (e) is the result of the proposed method, and (f) is the panchromatic image.
Detailed Description
In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention will be described in further detail with reference to the accompanying drawings and examples, it is to be understood that the examples described herein are only for the purpose of illustrating the present invention and are not to be construed as limiting the present invention.
The invention is mainly directed to the application requirement of obtaining a high-resolution multispectral image. High-pass filtering is combined with deep feature extraction to mine more texture information, and a cross-modal relationship between the panchromatic and multispectral images is then established between their local windows through a pixel-level window cross attention mechanism. In this way more texture details of the panchromatic image are transferred to the multispectral image, yielding a fused multispectral image with rich spatial detail and small spectral distortion.
Fig. 1 is the overall framework diagram of the remote sensing image fusion network based on window cross attention of the embodiment, and fig. 2 is the network architecture diagram of its window cross attention module. The embodiment provides a remote sensing image fusion method based on window cross attention to realize fusion of a multispectral image and a panchromatic image, which specifically comprises the following steps:
step 1: and constructing a depth texture feature extraction module based on the characteristics of the multispectral image and the panchromatic image, and converting the input image into a feature domain. The specific implementation comprises the following substeps:
step 1.1: and constructing a high-pass filter to extract high-frequency information of the input image. The input image comprises a multispectral image M and a blurred panchromatic image
Figure BDA0003963433070000051
A panchromatic image P, wherein the blurred panchromatic image is obtained by down-sampling and up-sampling the original panchromatic image, the high-pass filter is designed by subtracting the low-frequency content obtained by averaging and filtering the original image from the original image, the averaging and filtering are realized by a global pooling layer, and the multispectral image M and the blurred panchromatic image->
Figure BDA0003963433070000052
And the full-color image P is processed by a high-pass filter to obtain G (M),. Or->
Figure BDA0003963433070000053
And G (P).
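A minimal PyTorch-style sketch of this step, for illustration only: the function names high_pass and blurred_pan, the bicubic resampling mode and the scale factor of 4 are assumptions of the sketch, while the global-pooling average filter and the down/up-sampling follow the description above.

import torch
import torch.nn.functional as F

def high_pass(x: torch.Tensor) -> torch.Tensor:
    # Subtract the low-frequency content, here the channel-wise global average
    # produced by a global pooling layer, from the original image.
    low = F.adaptive_avg_pool2d(x, 1)
    return x - low

def blurred_pan(pan: torch.Tensor, scale: int = 4) -> torch.Tensor:
    # Blur the panchromatic image by down-sampling and then up-sampling it.
    h, w = pan.shape[-2:]
    small = F.interpolate(pan, size=(h // scale, w // scale), mode="bicubic", align_corners=False)
    return F.interpolate(small, size=(h, w), mode="bicubic", align_corners=False)

# Example inputs: an up-sampled 4-band multispectral image and a panchromatic image.
M = torch.rand(1, 4, 256, 256)
P = torch.rand(1, 1, 256, 256)
G_M, G_Pt, G_P = high_pass(M), high_pass(blurred_pan(P)), high_pass(P)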
Step 1.2: constructing single-channel texture extraction module extraction
Figure BDA0003963433070000061
And the high frequency characteristics of G (P), yielding K and V. The number of convolution kernels of the three convolution layers in the single-channel texture extraction module is increased layer by layer, the outline features and the high-dimensionality features of the image are extracted step by step, the resolution of the image is kept unchanged in the process, and the number of the convolution kernels is changed from 32 to 64 to 128. The receptive field of the convolution kernel is gradually reduced from 7 multiplied by 7, 5 multiplied by 5 to 3 multiplied by 3 to extract multi-scale detail information, the convolution kernel firstly covers a larger area of the image to extract more area information, and then is gradually reduced to learn deeper detail information for a smaller area.
Step 1.3: and constructing a multi-channel texture extraction module to extract the high-frequency characteristics of G (M) to obtain Q. Similar to step 1.2, the number of convolution kernels of the multi-channel texture extraction module becomes larger layer by layer, and the number of the convolution kernels is changed from 32 and 64 to 128. The convolution kernel is 1 x 1 in size to maintain spatial fidelity and maximize the use of spatial information of the multispectral image.
Step 2: Construct a window cross attention module (WCA) to acquire the cross-modal fine-grained relationship between the multispectral and panchromatic images. The specific implementation comprises the following substeps:
step 2.1: inputting high-frequency characteristic Q/K/V epsilon R H,W,C Division into n windows:
Q=[q 1 ,q 2 ,…,q n ]
K=[k 1 ,k 2 ,…,k n ]
V=[v 1 ,v 2 ,…,v n ]
wherein q is i /k i /v i ∈R h,w,C
Figure BDA0003963433070000062
C isThe number of feature channels. In the present embodiment, H =256, w =256, H =2, w =2, c =128, n =16384. H. W is the image size of the image and h, W are the window sizes, representing the division of the image into 16384 image blocks of 2x2 size.
Step 2.2: to extract fine-grained features, each window q is transformed by a dimension transform (RS) i 、v i 、v i The sequence of pixels is unfolded. For the mth pixel in the sequence
Figure BDA0003963433070000068
And the nth pixel->
Figure BDA0003963433070000064
Feature similarity between them is calculated by inner product operation in similarity relation Calculation (CRM):
Figure BDA0003963433070000065
wherein the content of the first and second substances,
Figure BDA0003963433070000066
representing the pixel level cross modal correlation within window i.
Step 2.3: the correlation between the pixels obtained in step 2.2 is normalized by the softmax function (SM):
Figure BDA0003963433070000067
wherein the content of the first and second substances,
Figure BDA0003963433070000071
represents the th from a full-color image>
Figure BDA0003963433070000072
Pixel to multispectral image ^ th ^ based on>
Figure BDA0003963433070000073
The injection gain of each pixel.
Step 2.4: according to the injection gain
Figure BDA0003963433070000074
Texture information of the full-color image can be extracted. The m-th pixel of the i-th window of the output feature image is therefore calculated as follows:
Figure BDA0003963433070000075
step 2.5: folding the unfolded pixel sequence into a pixel window through dimension transformation (RS), and obtaining an ith window of an output image:
Figure BDA0003963433070000076
step 2.6: respectively obtaining the output characteristic image of each window through the cross attention of the windows, and finally splicing the characteristic images of all the windows to obtain the final output characteristic image:
O=[O 1 ,O 2 ,…,O n ]
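A minimal PyTorch-style sketch of the window cross attention of steps 2.1 to 2.6; the function name and the (B, C, H, W) tensor layout are assumptions of the sketch, while the 2×2 window, the pixel-level inner-product similarity, the softmax normalization and the window folding follow the description above.

import torch
import torch.nn.functional as F

def window_cross_attention(Q, K, V, win: int = 2):
    # Q, K, V: (B, C, H, W) high-frequency features from the MS and PAN branches.
    B, C, H, W = Q.shape

    def partition(x):
        # Step 2.1: split into non-overlapping win x win windows,
        # then unfold each window into a pixel sequence (step 2.2).
        x = x.reshape(B, C, H // win, win, W // win, win)
        x = x.permute(0, 2, 4, 3, 5, 1)            # B, H/win, W/win, win, win, C
        return x.reshape(-1, win * win, C)         # one row of pixels per window

    q, k, v = partition(Q), partition(K), partition(V)
    r = torch.bmm(q, k.transpose(1, 2))            # step 2.2: pixel-level similarities r_i^{m,n}
    w = F.softmax(r, dim=-1)                       # step 2.3: injection gains w_i^{m,n}
    o = torch.bmm(w, v)                            # step 2.4: weighted sum of PAN values
    # Steps 2.5-2.6: fold each pixel sequence back into a window and concatenate all windows.
    o = o.reshape(B, H // win, W // win, win, win, C)
    o = o.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
    return o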
and step 3: the constructed image decoding module transmits the generated feature image back to the image domain. The specific implementation comprises the following substeps:
step 3.1: in order to keep the high-frequency characteristic information in the multispectral image, the output characteristic image obtained by the cross attention of the window is added with the multispectral characteristic image Q through a jump connection to obtain the high-frequency characteristic image.
Step 3.2: the high-frequency characteristic image obtained by fusion firstly passes through a convolution kernel with 256 channels and 3 multiplied by 3 to obtain a multi-channel characteristic image with higher dimensionality.
Step 3.3: and 4 convolution kernels with the size of 1 multiplied by 1 are adopted, the number of the convolution kernels of 4 layers is 128, 64, 32 and 4 respectively, the multi-channel characteristic image is remapped into a four-channel image to obtain a reconstructed high-frequency image, and then the reconstructed high-frequency image and the low-frequency multispectral image are added to obtain a final fusion image.
Step 4: Construct the objective function of the image fusion model to drive model training. The specific implementation comprises the following substeps:
Step 4.1: Construct the loss function. An L2-based loss function is constructed:
Loss = (1/b) Σ_{n=1}^{b} ||F_n − G_n||_2^2
where F_n and G_n represent the fused image and the reference image, respectively, and b is the batch size. In the present embodiment, b = 8.
Step 4.2: and b data are randomly selected from the training set to be input into the network, one iteration is completed, and network parameters are adjusted.
And 5: and training the network by using simulation data, testing the simulation test set and the real test set by using the trained model, and comparing the simulation test set and the real test set with other algorithms. The specific implementation comprises the following substeps:
step 5.1: and training the network by using simulation data, and comparing the obtained test result with each comparison method on visual and objective evaluation indexes. In this example, we used mainly images of high-resolution second satellite in the experiment, 4000 image pairs were divided into 90% for training and the remaining 10% for verification. The reference image takes an original MS image with a resolution of 256 × 256, takes a downsampled multispectral and panchromatic image with a factor of 4 as input, and the input image size is 256 × 256. The simulated test image size is 512 x 512 images. To verify the effectiveness of the proposed method, we compared the proposed method with the traditional method and the deep learning based method. The traditional method is IHS, and deep learning methods comprise PNN and fusion Net. In the training phase, the initial learning rate was 0.001 and the batch size was set to 8, and the initial learning rate was attenuated by multiplying by 0.5 when the peak signal-to-noise ratio (PSNR) degradation of the validation set reached 20 rounds. We used 450 rounds to train the proposed network and optimized by Adam optimizer. The visual comparison results are shown in fig. 3. The objective evaluation index is peak signal to noise ratio (PSNR), and the average result on the simulation test set is shown in table 1.
And step 5.2: and testing the network performance by using the real data, and comparing the obtained test result with each comparison method on visual and objective evaluation indexes. 210 real images with a size of 512 × 512 are selected for testing to verify the performance of the proposed method in the real world. The comparison methods are IHS, PNN and FusionNet. The visual comparison results are shown in figure 4. The objective evaluation index was no reference index (QNR), and the average results on the real test set are shown in Table 2.
TABLE 1 average PSNR (dB) comparison of simulation data for different methods (ideal: +∞)
TABLE 2 Average QNR comparison of the different methods on the real data (ideal value: 1)
It can be seen that the method first performs high-frequency feature extraction, then obtains the pixel-level cross-modal correlation between the panchromatic and multispectral images through window cross attention, and finally transfers the texture details of the panchromatic image to the multispectral image, achieving the best remote sensing image fusion result among the compared methods.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above-mentioned embodiments are described in some detail, and not intended to limit the scope of the invention, and those skilled in the art will be able to make alterations and modifications without departing from the scope of the invention as defined by the appended claims.

Claims (10)

1. A remote sensing image fusion method based on window cross attention is characterized by comprising the following steps:
step 1, constructing a depth texture feature extraction module based on multispectral image and panchromatic image characteristics, and converting an input image into a feature domain;
step 2, constructing a window cross attention module to acquire the cross-modal fine-grained relationship between the multispectral image and the panchromatic image, and outputting a feature image;
step 3, constructing an image decoding module and transmitting the generated characteristic image back to an image domain to obtain a final fusion image;
step 4, constructing an objective function to drive training of the image fusion model, wherein the image fusion model comprises the depth texture feature extraction module, the window cross attention module and the image decoding module;
and step 5, training the image fusion model with simulation data, and testing on the simulated test set and the real test set with the trained model.
2. The remote sensing image fusion method based on window cross attention as claimed in claim 1, characterized in that: the specific implementation manner of step 1 is as follows;
step 1.1, constructing a high-pass filter to extract high-frequency information of the input images, wherein the input images comprise a multispectral image M, a blurred panchromatic image P̃ and a panchromatic image P, and M, P̃ and P are processed by the high-pass filter to obtain G(M), G(P̃) and G(P) respectively;
step 1.2, constructing a single-channel texture extraction module to extract the high-frequency features of G(P̃) and G(P), obtaining K and V, wherein the number of convolution kernels in the single-channel texture extraction module increases layer by layer and the receptive field of the convolution kernels decreases layer by layer to extract multi-scale detail information;
and step 1.3, constructing a multi-channel texture extraction module to extract the high-frequency features of G(M), obtaining Q, wherein the multi-channel texture extraction module also comprises three convolution layers, the number of convolution kernels increases layer by layer, and all convolution kernels are 1×1.
3. The remote sensing image fusion method based on window cross attention as claimed in claim 2, characterized in that: the blurred panchromatic image in step 1.1 is obtained by down-sampling and then up-sampling the original panchromatic image, the high-pass filter is realized by subtracting from the original image the low-frequency content obtained by average filtering of the original image, and the average filtering is realized by a global pooling layer.
4. The remote sensing image fusion method based on window cross attention as claimed in claim 2, characterized in that: in step 1.2, the number of convolution kernels in the three convolution layers increases as 32, 64, 128, and the receptive fields of the convolution kernels decrease as 7×7, 5×5, 3×3.
5. The remote sensing image fusion method based on window cross attention as claimed in claim 2, characterized in that: the specific implementation manner of step 2 is as follows;
step 2.1, dividing the input high-frequency features Q/K/V ∈ R^{H×W×C} into n windows:
Q = [q_1, q_2, …, q_n]
K = [k_1, k_2, …, k_n]
V = [v_1, v_2, …, v_n]
where q_i/k_i/v_i ∈ R^{h×w×C}, n = (H×W)/(h×w), C is the number of feature channels, H and W are the image height and width, and h and w are the window height and width;
step 2.2, in order to extract fine-grained features, unfolding each window q_i, k_i, v_i into a pixel sequence through a dimension transformation, and for the m-th pixel q_i^m and the n-th pixel k_i^n in the sequences, calculating the feature similarity between them:
r_i^{m,n} = ⟨q_i^m, k_i^n⟩
where r_i^{m,n} represents the pixel-level cross-modal correlation within window i;
step 2.3, normalizing the correlations between pixels obtained in step 2.2 with the softmax function:
w_i^{m,n} = exp(r_i^{m,n}) / Σ_{j=1}^{hw} exp(r_i^{m,j})
where w_i^{m,n} represents the injection gain from the n-th pixel of the panchromatic image to the m-th pixel of the multispectral image within window i;
step 2.4, extracting the texture information of the panchromatic image according to the injection gains w_i^{m,n}, so that the m-th pixel of the i-th window of the output feature image is calculated as follows:
o_i^m = Σ_{n=1}^{hw} w_i^{m,n} · v_i^n
step 2.5, folding the unfolded pixel sequence back into a pixel window through the dimension transformation to obtain the i-th window of the output image:
O_i = RS([o_i^1, o_i^2, …, o_i^{hw}])
and step 2.6, obtaining the output feature image of each window through window cross attention, and finally concatenating the feature images of all windows to obtain the final output feature image:
O = [O_1, O_2, …, O_n].
6. The remote sensing image fusion method based on window cross attention as claimed in claim 3, characterized in that: the specific implementation manner of step 3 is as follows;
step 3.1, in order to retain the high-frequency feature information of the multispectral image, adding the output feature image obtained by window cross attention to the multispectral feature image Q through a skip connection to obtain a high-frequency feature image;
step 3.2, passing the fused high-frequency feature image through a convolution layer to obtain a higher-dimensional multi-channel feature image;
and step 3.3, remapping the multi-channel feature image into a four-channel image with 4 convolution layers of kernel size 1×1 to obtain a reconstructed high-frequency image, and then adding the low-frequency multispectral image to the reconstructed high-frequency image to obtain the final fused image.
7. The remote sensing image fusion method based on window cross attention as claimed in claim 6, characterized in that: the convolution layer in step 3.2 has 256 channels and a convolution kernel size of 3×3.
8. The remote sensing image fusion method based on window cross attention as claimed in claim 6, characterized in that: in step 3.3, the numbers of convolution kernels of the 4 convolution layers are 128, 64, 32 and 4, respectively.
9. The remote sensing image fusion method based on window cross attention as claimed in claim 1, characterized in that: the loss function constructed in step 4 is as follows:
Loss = (1/b) Σ_{n=1}^{b} ||F_n − G_n||_2^2
where F_n and G_n represent the fused image and the reference image, respectively, and b is the batch size.
10. The remote sensing image fusion method based on window cross attention as claimed in claim 1, characterized in that: step 5 further comprises comparing the test results with existing algorithms through objective evaluation indexes, wherein the objective evaluation indexes include the peak signal-to-noise ratio and the no-reference index.
CN202211491547.7A 2022-11-25 2022-11-25 Remote sensing image fusion method based on window cross attention Pending CN115861749A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211491547.7A CN115861749A (en) 2022-11-25 2022-11-25 Remote sensing image fusion method based on window cross attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211491547.7A CN115861749A (en) 2022-11-25 2022-11-25 Remote sensing image fusion method based on window cross attention

Publications (1)

Publication Number Publication Date
CN115861749A true CN115861749A (en) 2023-03-28

Family

ID=85666544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211491547.7A Pending CN115861749A (en) 2022-11-25 2022-11-25 Remote sensing image fusion method based on window cross attention

Country Status (1)

Country Link
CN (1) CN115861749A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116030311A (en) * 2023-03-29 2023-04-28 山东省海洋资源与环境研究院(山东省海洋环境监测中心、山东省水产品质量检验中心) Wetland classification method based on multi-source remote sensing data and electronic equipment

Similar Documents

Publication Publication Date Title
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN111080567B (en) Remote sensing image fusion method and system based on multi-scale dynamic convolutional neural network
CN111127374B (en) Pan-sharing method based on multi-scale dense network
CN111275637A (en) Non-uniform motion blurred image self-adaptive restoration method based on attention model
CN109636769A (en) EO-1 hyperion and Multispectral Image Fusion Methods based on the intensive residual error network of two-way
Luo et al. Lattice network for lightweight image restoration
CN113643197B (en) Two-order lightweight network full-color sharpening method combining guided filtering and NSCT
CN116152120B (en) Low-light image enhancement method and device integrating high-low frequency characteristic information
CN114581347B (en) Optical remote sensing spatial spectrum fusion method, device, equipment and medium without reference image
CN113191325B (en) Image fusion method, system and application thereof
CN111951164A (en) Image super-resolution reconstruction network structure and image reconstruction effect analysis method
Yang et al. License plate image super-resolution based on convolutional neural network
CN112163998A (en) Single-image super-resolution analysis method matched with natural degradation conditions
CN116205830A (en) Remote sensing image fusion method based on combination of supervised learning and unsupervised learning
CN117474781A (en) High spectrum and multispectral image fusion method based on attention mechanism
CN116309062A (en) Remote sensing image super-resolution reconstruction method
CN117197008A (en) Remote sensing image fusion method and system based on fusion correction
CN115578262A (en) Polarization image super-resolution reconstruction method based on AFAN model
CN115861749A (en) Remote sensing image fusion method based on window cross attention
Wali et al. Recent progress in digital image restoration techniques: a review
Wen et al. The power of complementary regularizers: Image recovery via transform learning and low-rank modeling
CN113628143A (en) Weighted fusion image defogging method and device based on multi-scale convolution
CN117408924A (en) Low-light image enhancement method based on multiple semantic feature fusion network
CN117350923A (en) Panchromatic and multispectral remote sensing image fusion method based on GAN and transducer
CN114511470B (en) Attention mechanism-based double-branch panchromatic sharpening method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination