CN117726548A - Panchromatic sharpening method based on cross-modal feature attention interaction - Google Patents

Panchromatic sharpening method based on cross-modal feature attention interaction

Info

Publication number
CN117726548A
Authority
CN
China
Prior art keywords
image
follows
full
cross
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410161201.3A
Other languages
Chinese (zh)
Other versions
CN117726548B (en)
Inventor
孙倩
孙宇
潘成胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202410161201.3A priority Critical patent/CN117726548B/en
Publication of CN117726548A publication Critical patent/CN117726548A/en
Application granted granted Critical
Publication of CN117726548B publication Critical patent/CN117726548B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a panchromatic sharpening method based on cross-modal feature attention interaction, which comprises the following steps: (1) acquiring and preprocessing a data set, and producing training sample images; (2) obtaining a hyperspectral image at the spatial scale of the panchromatic image by using an upsampling method based on the deep image prior; (3) decomposing and converting the upsampled hyperspectral image and the panchromatic image by using overlapping patch embedding modules to obtain spectral and spatial feature representations, respectively; (4) directly interacting and fusing the spectral and spatial features by using a plurality of spectral-spatial attention interaction modules to obtain a fused cross-modal feature representation; (5) restoring the dimensions by using an image reconstruction layer to obtain a hyperspectral image with high spatial resolution. For high-dimensional, heterogeneous hyperspectral and panchromatic images, the invention makes the hyperspectral image features and the panchromatic image features interact directly, realizing a self-attention mechanism over image data of different modalities at low cost.

Description

Panchromatic sharpening method based on cross-modal feature attention interaction
Technical Field
The invention relates to the technical field of computer vision and image processing, and in particular to a panchromatic sharpening method based on cross-modal feature attention interaction.
Background
Existing panchromatic sharpening techniques can be divided into traditional methods and deep-learning-based methods. Traditional methods include those based on component substitution, multi-resolution analysis, and variational optimization; limited in feature representation and nonlinear optimization capability, the fused images they generate commonly suffer from spectral and spatial distortion, and their performance is poorer than that of deep-learning-based methods. Among deep learning methods, approaches based on convolutional neural networks have been widely studied: 2D or 3D convolution kernels are naturally suited to the image domain and offer parameter sharing, sliding windows, and inductive bias, but convolution can only extract local image features, is insensitive to global features, and struggles to fuse features appropriately across image data of different modalities (hyperspectral and panchromatic images). Recently, panchromatic sharpening methods based on the self-attention mechanism have attracted much attention; they offer long-range relation modeling and dynamic weighting and can extract global image features. However, the self-attention mechanism cannot extract local image features, its parameter count and computational complexity are costly, and the cost increases further when it is applied to high-dimensional hyperspectral image processing tasks. Moreover, models based on self-attention usually require a large number of training samples, a condition that is difficult to satisfy given the scarcity of remote sensing image data sets.
Disclosure of Invention
The invention aims to provide a panchromatic sharpening method based on cross-modal feature attention interaction, solving the problem that neural-network-based panchromatic sharpening methods can extract only local or only global image features, and the problems of low operating efficiency and high cost overhead when the self-attention mechanism is applied to high-dimensional, heterogeneous hyperspectral and panchromatic images.
The technical scheme is as follows: the panchromatic sharpening method based on cross-modal feature attention interaction disclosed by the invention comprises the following steps:
(1) acquiring and preprocessing a data set, and producing training sample images;
(2) upsampling the input low-resolution hyperspectral image by using an upsampling method based on the deep image prior to obtain a hyperspectral image at the spatial scale of the panchromatic image;
(3) decomposing and converting the upsampled hyperspectral image and the panchromatic image by using overlapping patch embedding modules to obtain spectral and spatial feature representations, respectively;
(4) directly interacting and fusing the spectral and spatial features by using a plurality of spectral-spatial attention interaction modules to obtain a fused cross-modal feature representation;
(5) restoring the dimensions by using an image reconstruction layer to obtain a hyperspectral image with high spatial resolution.
Further, the step (1) is specifically as follows: acquire a data set for training, and crop the hyperspectral images in the data set to generate a number of image patches used as the label hyperspectral image X ∈ R^(W×H×B), where W, H and B are the width, height and number of bands; perform average pooling over the visible-light bands of X to produce the panchromatic image P ∈ R^(W×H×1); blur X spatially with a Gaussian filter and downsample it by a factor of N to generate the low-resolution hyperspectral image X_LR ∈ R^((W/N)×(H/N)×B). P and X_LR form the input pair, and X is the label.
Further, the step (2) is specifically as follows: the low-resolution hyperspectral image X_LR is upsampled to the spatial scale of the panchromatic image P by the upsampling method based on the deep image prior, obtaining the upsampled hyperspectral image X_UP; the formula is as follows:

X_UP = f_θ*(z), θ* = argmin_θ E(f_θ(z); X_LR)

where z is a random noise tensor used as input, θ are the optimized parameters of the neural network f_θ, and E is the task-driven energy function; the formula is as follows:

E(f_θ(z); X_LR) = ‖D(f_θ(z)) − X_LR‖²

where D denotes a downsampling operation.
Further, the step (3) is specifically as follows: for the different input images P and X_UP, a panchromatic-image overlapping patch embedding module and a hyperspectral-image overlapping patch embedding module are designed respectively, each patch embedding module consisting of convolution and batch normalization layers. The panchromatic-image overlapping patch embedding module spatially decomposes the panchromatic image P into a number of mutually overlapping image patches p ∈ R^(K×K), where K is the patch width and height, and performs feature conversion on each patch to generate patch embeddings wrapping the original information; the formula is as follows:

F_P = f_PE(P)

where F_P ∈ R^(n×D) is a feature map consisting of n patch embeddings, D is the embedding dimension, and f_PE is the function of the panchromatic-image overlapping patch embedding module.

The hyperspectral-image overlapping patch embedding module is formulated analogously:

F_H = f'_PE(X_UP)
further, the step (4) specifically includes the following steps: including a token mixer and a spectrum mixer;
the token mixer is defined as follows:
wherein,a cross-modal feature representation for the fusion; />And->Representing a pixel-by-pixel convolution and a standard 3 x 3 convolution, respectively; geLU is Gaussian error linear unit; repeating the above operations twice in a token mixer to balance the overhead and performance of the model;
the spectral mixer is defined as follows:
wherein,3 x 3 convolutions separable for depth; in two consecutive->A channel growth rate 16 is set in between to adequately model the relationship between adjacent spectra.
Further, the token mixer comprises a spectral-spatial interaction attention and a spectral information bottleneck. First, the spectral-spatial interaction attention computes a spectral global context G. Then, spatial prior knowledge from the panchromatic image P is introduced to define the cross-modal feature similarity matrix S = Q_P K_H^T, where Q_P and K_H are obtained by reshaping F_P and F_H, respectively. Next, the spectral-spatial interaction attention is updated so that the updated global context G' fuses the spectral information from the hyperspectral image with the spatial information of the panchromatic image, using the normalized similarity matrix S̄, learnable scalars α and β, the mean μ and standard deviation σ, and a small constant ε for numerical stability. Finally, the spectral-spatial interaction attention is extended to a plurality of groups.
further, the spectral information bottleneck is defined as follows: the spectral information bottleneck is defined as follows:
wherein,、/>、/>for the full link layer, for the drop rate, set to 16.
Further, the step (5) is specifically as follows: the fused feature representation Z, which highly aggregates the spectral and spatial information, passes through a convolution layer that reconstructs and restores the dimensions, generating the residual hyperspectral image X_RES; the formula is as follows:

X_RES = Conv(Z; Θ)

where Θ denotes the model parameters; finally, the fused hyperspectral image is obtained as

X_F = X_UP + X_RES
An electronic device according to the present invention comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; when loaded into the processor, the computer program implements any of the above panchromatic sharpening methods based on cross-modal feature attention interaction.
A storage medium according to the present invention stores a computer program which, when executed by a processor, implements any of the above panchromatic sharpening methods based on cross-modal feature attention interaction.
The beneficial effects are as follows: compared with the prior art, the invention has the following significant advantages. The model of the invention is formed entirely by stacking convolution layers, yet combines the advantages of convolution and the self-attention mechanism; effectively combining convolution with self-attention accelerates the fitting of the model and reduces the dependence of the self-attention mechanism on large-scale training data sets. For high-dimensional, heterogeneous hyperspectral and panchromatic images, the hyperspectral image features and panchromatic image features interact directly, realizing a self-attention mechanism over image data of different modalities at low cost.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is an overall architecture diagram of the AIDB-Net of the present invention;
FIG. 3 is a diagram of the internal structure of the SSIA (spectral-spatial interaction attention) of the present invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1, an embodiment of the present invention provides a panchromatic sharpening method based on cross-modal feature attention interaction, comprising the following steps:
(1) Acquiring and preprocessing a data set, and producing training sample images; specifically: acquire a data set for training, and crop the hyperspectral images in the data set to generate a number of image patches used as the label hyperspectral image X ∈ R^(W×H×B), where W, H and B are the width, height and number of bands; perform average pooling over the visible-light bands of X to produce the panchromatic image P ∈ R^(W×H×1); blur X spatially with a Gaussian filter and downsample it by a factor of N to generate the low-resolution hyperspectral image X_LR ∈ R^((W/N)×(H/N)×B). P and X_LR form the input pair, and X is the label.
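The preprocessing of step (1) follows a Wald-protocol-style simulation: the full-resolution cube serves as the label, while the PAN and low-resolution inputs are simulated from it. The sketch below is an illustrative reconstruction, not the patent's code; the visible-band range, the blur kernel (a simple box blur stands in for the Gaussian filter), and the factor N=4 are all assumptions.

```python
import numpy as np

def box_blur(x, k=3):
    # Simple k x k spatial mean filter per band (stand-in for the Gaussian blur).
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)), mode="edge")
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += xp[i:i + x.shape[0], j:j + x.shape[1], :]
    return out / (k * k)

def make_training_pair(x, n=4, visible_bands=slice(0, 3)):
    """From a label HSI x of shape (W, H, B): average-pool the visible bands
    to get the PAN image, and blur + decimate by n to get the LR HSI."""
    pan = x[:, :, visible_bands].mean(axis=2)   # spectral average pooling
    x_lr = box_blur(x)[::n, ::n, :]             # spatial blur, then downsample
    return pan, x_lr
```

The pair (pan, x_lr) is the network input and the original x the supervision target.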
(2) Upsampling the input low-resolution hyperspectral image by the upsampling method based on the deep image prior to obtain a hyperspectral image at the spatial scale of the panchromatic image; specifically: the low-resolution hyperspectral image X_LR is upsampled to the spatial scale of the panchromatic image P, obtaining the upsampled hyperspectral image X_UP; the formula is as follows:

X_UP = f_θ*(z), θ* = argmin_θ E(f_θ(z); X_LR)

where z is a random noise tensor used as input, θ are the optimized parameters of the neural network f_θ, and E is the task-driven energy function; the formula is as follows:

E(f_θ(z); X_LR) = ‖D(f_θ(z)) − X_LR‖²

where D denotes a downsampling operation.
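Step (2) fits a network f_θ to a noise tensor z so that the downsampled output matches X_LR. The network-fitting loop itself is omitted here; the sketch below only shows the task-driven energy E and one plausible choice of the downsampling operator D (average pooling is an assumption — the patent does not specify D).

```python
import numpy as np

def downsample(x, n):
    """D: average-pool a (W, H, B) cube spatially by a factor n
    (W and H are assumed divisible by n in this sketch)."""
    w, h, b = x.shape
    return x.reshape(w // n, n, h // n, n, b).mean(axis=(1, 3))

def dip_energy(x_pred, x_lr, n):
    """Task-driven energy E = ||D(f_theta(z)) - X_LR||^2, with x_pred = f_theta(z)."""
    return float(np.sum((downsample(x_pred, n) - x_lr) ** 2))
```

During the deep-image-prior fit, θ would be updated by gradient descent on this energy until f_θ(z) is consistent with X_LR.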
(3) As shown in FIG. 2, the upsampled hyperspectral image and the panchromatic image are decomposed and converted using overlapping patch embedding modules to obtain spectral and spatial feature representations, respectively; specifically: for the different input images P and X_UP, a panchromatic-image overlapping patch embedding module and a hyperspectral-image overlapping patch embedding module are designed respectively, each consisting of convolution and batch normalization layers. The panchromatic-image overlapping patch embedding module spatially decomposes P into a number of mutually overlapping image patches p ∈ R^(K×K), where K is the patch width and height, and performs feature conversion on each patch to generate patch embeddings wrapping the original information; the formula is as follows:

F_P = f_PE(P)

where F_P ∈ R^(n×D) is a feature map consisting of n patch embeddings, D is the embedding dimension, and f_PE is the function of the panchromatic-image overlapping patch embedding module; the hyperspectral-image overlapping patch embedding module is formulated analogously:

F_H = f'_PE(X_UP)
(4) As shown in FIG. 3, a plurality of spectral-spatial attention interaction modules directly interact and fuse the spectral and spatial features to obtain the fused cross-modal feature representation; specifically: each module comprises a token mixer and a spectral mixer. The token mixer produces the fused cross-modal feature representation Z by composing pixel-wise convolutions (PWConv) and standard 3×3 convolutions (Conv3×3) with GeLU (Gaussian error linear unit) activations; this operation is repeated twice in the token mixer to balance the overhead and performance of the model.
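The token mixer's named building blocks — pixel-wise convolution, GeLU, and (omitted here) the 3×3 convolution — can be sketched in isolation. The patent gives the exact chaining only at a high level, so the composition in `token_mixer_step` is hypothetical.

```python
import numpy as np

def gelu(x):
    """Gaussian error linear unit (tanh approximation)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def pwconv(x, w):
    """Pixel-wise (1 x 1) convolution: a per-pixel linear mix of channels.
    x: (H, W, C_in), w: (C_in, C_out)."""
    return np.einsum("hwc,cd->hwd", x, w)

def token_mixer_step(x, w1, w2):
    """One hypothetical mixing step: PWConv -> GeLU -> PWConv
    (the standard 3x3 convolution branch is omitted for brevity)."""
    return pwconv(gelu(pwconv(x, w1)), w2)
```

In the patent this step is applied twice per token mixer to balance overhead and performance.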
the token mixer comprises spectrum-space interaction attention and spectrum information bottlenecks; first, the spectral-spatial interaction attention is defined as follows:
wherein,is a spectral global context;
then, introducing space priori knowledge of the full-color image P to define a cross-modal feature similarity matrix
Wherein,and->Respectively by->And->Remolding to obtain; second, the spectral-spatial interaction attention is updated as follows:
wherein,fusing spectral information from the hyperspectral image with spatial information of the panchromatic image for the updated global context; />The similarity matrix is normalized; />And->Is a learnable scalar; />Andmean and standard deviation; />Representing the numerical stability; finally, the spectrum-space interaction attention is evolved into a plurality of groups, defined as follows:
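The exact update formulas of the spectral-spatial interaction attention appear in the patent as images that do not survive in this text. The sketch below therefore only mirrors the named quantities — a similarity matrix from reshaped PAN queries and HSI keys, a normalization, learnable α and β, mean/std standardization with ε — and the precise combination is a guess, not the patent's formula.

```python
import numpy as np

def softmax(a, axis=-1):
    # Numerically stable row-wise softmax.
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ssia_update(q_p, k_h, g, alpha=1.0, beta=1.0, eps=1e-6):
    """Schematic spectral-spatial interaction attention update.
    q_p: (n, d) reshaped PAN features; k_h: (n, d) reshaped HSI features;
    g: (d,) spectral global context. Returns (updated context, similarity)."""
    s = softmax(q_p @ k_h.T)        # cross-modal similarity matrix, row-normalized
    v = s @ k_h                     # HSI spectra re-weighted by PAN spatial priors
    ctx = v.mean(axis=0)
    ctx = (ctx - ctx.mean()) / (ctx.std() + eps)   # mean/std normalization
    return alpha * ctx + beta * g, s
```

A grouped variant would simply split the d channels into groups and apply this update per group.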
the spectral information bottleneck is defined as follows: the spectral information bottleneck is defined as follows:
wherein,、/>、/>for the full link layer, for the drop rate, set to 16.
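With its reduction rate of 16, the spectral information bottleneck resembles a squeeze-and-excitation-style channel bottleneck. The sketch below is an assumption: the layer shapes and the ReLU activations are not specified in the source, only the three fully connected layers and the rate 16.

```python
import numpy as np

def spectral_bottleneck(g, w1, w2, w3):
    """Channel bottleneck on the spectral context g of shape (C,): project the
    channel dimension down by the reduction rate (C -> C/16), transform, and
    project back up. ReLU stands in for the unspecified activation."""
    h = np.maximum(w1 @ g, 0.0)   # (C/16,) squeeze
    h = np.maximum(w2 @ h, 0.0)   # (C/16,) transform
    return w3 @ h                 # (C,)   restore
```

The bottleneck compresses the spectral context before it re-enters the token mixer, keeping the attention cheap for high-dimensional band counts.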
The spectral mixer is built from depthwise-separable 3×3 convolutions (DWConv3×3); a channel growth rate of 16 is set between two consecutive DWConv3×3 layers to adequately model the relationship between adjacent spectral bands.
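The depthwise half of a depthwise-separable 3×3 convolution — each channel filtered by its own 3×3 kernel — can be sketched as below; the pointwise half and the channel growth rate of 16 are omitted, and the edge-replication padding is an illustrative choice.

```python
import numpy as np

def dwconv3x3(x, kernels):
    """Depthwise 3x3 convolution. x: (H, W, C); kernels: (C, 3, 3), one kernel
    per channel; 'same' output size via edge-replication padding."""
    h, w, c = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            # Shifted window times the per-channel kernel tap (broadcast over C).
            out += xp[i:i + h, j:j + w, :] * kernels[:, i, j]
    return out
```

A following 1×1 (pointwise) convolution would then mix channels, completing the separable convolution.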
(5) Restoring the dimensions by using an image reconstruction layer to obtain a hyperspectral image with high spatial resolution; specifically: the fused feature representation Z, which highly aggregates the spectral and spatial information, passes through a convolution layer that reconstructs and restores the dimensions, generating the residual hyperspectral image X_RES; the formula is as follows:

X_RES = Conv(Z; Θ)

where Θ denotes the model parameters; finally, the fused hyperspectral image is obtained as X_F = X_UP + X_RES.
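The reconstruction step restores the embedding dimension D back to B spectral bands and, consistent with the "residual hyperspectral image" wording, adds the result onto the upsampled input. Both the 1×1 convolution as the reconstruction layer and the additive fusion are assumptions in this sketch.

```python
import numpy as np

def reconstruct(z, w, x_up):
    """z: (H, W, D) fused features; w: (D, B) pixel-wise conv weights;
    x_up: (H, W, B) upsampled HSI. Returns the fused high-resolution HSI."""
    x_res = np.einsum("hwd,db->hwb", z, w)   # restore the spectral dimension
    return x_up + x_res                      # residual fusion with the upsampled HSI
```

Learning only the residual keeps the network focused on the high-frequency spatial detail missing from X_UP.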
The embodiment of the invention also provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor; when loaded into the processor, the computer program implements any of the above panchromatic sharpening methods based on cross-modal feature attention interaction.
Embodiments of the present invention also provide a storage medium storing a computer program which, when executed by a processor, implements a panchromatic sharpening method based on cross-modal feature attention interaction as described above.

Claims (10)

1. A panchromatic sharpening method based on cross-modal feature attention interaction, comprising the following steps:
(1) acquiring and preprocessing a data set, and producing training sample images;
(2) upsampling the input low-resolution hyperspectral image by using an upsampling method based on the deep image prior to obtain a hyperspectral image at the spatial scale of the panchromatic image;
(3) decomposing and converting the upsampled hyperspectral image and the panchromatic image by using overlapping patch embedding modules to obtain spectral and spatial feature representations, respectively;
(4) directly interacting and fusing the spectral and spatial features by using a plurality of spectral-spatial attention interaction modules to obtain a fused cross-modal feature representation;
(5) restoring the dimensions by using an image reconstruction layer to obtain a hyperspectral image with high spatial resolution.
2. The panchromatic sharpening method based on cross-modal feature attention interaction of claim 1, wherein the step (1) is specifically as follows: acquire a data set for training, and crop the hyperspectral images in the data set to generate a number of image patches used as the label hyperspectral image X ∈ R^(W×H×B), where W, H and B are the width, height and number of bands; perform average pooling over the visible-light bands of X to produce the panchromatic image P ∈ R^(W×H×1); blur X spatially with a Gaussian filter and downsample it by a factor of N to generate the low-resolution hyperspectral image X_LR ∈ R^((W/N)×(H/N)×B); P and X_LR form the input pair, and X is the label.
3. The panchromatic sharpening method based on cross-modal feature attention interaction of claim 1, wherein the step (2) is specifically as follows: the low-resolution hyperspectral image X_LR is upsampled to the spatial scale of the panchromatic image P by the upsampling method based on the deep image prior, obtaining the upsampled hyperspectral image X_UP; the formula is as follows:

X_UP = f_θ*(z), θ* = argmin_θ E(f_θ(z); X_LR)

where z is a random noise tensor used as input, θ are the optimized parameters of the neural network f_θ, and E is the task-driven energy function; the formula is as follows:

E(f_θ(z); X_LR) = ‖D(f_θ(z)) − X_LR‖²

where D denotes a downsampling operation.
4. The panchromatic sharpening method based on cross-modal feature attention interaction of claim 1, wherein the step (3) is specifically as follows: for the different input images P and X_UP, a panchromatic-image overlapping patch embedding module and a hyperspectral-image overlapping patch embedding module are designed respectively, each patch embedding module consisting of convolution and batch normalization layers; the panchromatic-image overlapping patch embedding module spatially decomposes the panchromatic image P into a number of mutually overlapping image patches p ∈ R^(K×K), where K is the patch width and height, and performs feature conversion on each patch to generate patch embeddings wrapping the original information:

F_P = f_PE(P)

where F_P ∈ R^(n×D) is a feature map consisting of n patch embeddings, D is the embedding dimension, and f_PE is the function of the panchromatic-image overlapping patch embedding module; the hyperspectral-image overlapping patch embedding module is formulated analogously:

F_H = f'_PE(X_UP)
5. The panchromatic sharpening method based on cross-modal feature attention interaction of claim 1, wherein the step (4) is specifically as follows: each spectral-spatial attention interaction module comprises a token mixer and a spectral mixer;
the token mixer produces the fused cross-modal feature representation Z by composing pixel-wise convolutions (PWConv) and standard 3×3 convolutions (Conv3×3) with GeLU (Gaussian error linear unit) activations, this operation being repeated twice in the token mixer to balance the overhead and performance of the model;
the spectral mixer is built from depthwise-separable 3×3 convolutions (DWConv3×3), a channel growth rate of 16 being set between two consecutive DWConv3×3 layers to adequately model the relationship between adjacent spectral bands.
6. The panchromatic sharpening method based on cross-modal feature attention interaction of claim 5, wherein the token mixer comprises a spectral-spatial interaction attention and a spectral information bottleneck; first, the spectral-spatial interaction attention computes a spectral global context G; then, spatial prior knowledge from the panchromatic image P is introduced to define the cross-modal feature similarity matrix S = Q_P K_H^T, where Q_P and K_H are obtained by reshaping F_P and F_H, respectively; next, the spectral-spatial interaction attention is updated so that the updated global context G' fuses the spectral information from the hyperspectral image with the spatial information of the panchromatic image, using the normalized similarity matrix S̄, learnable scalars α and β, the mean μ and standard deviation σ, and a small constant ε for numerical stability; finally, the spectral-spatial interaction attention is extended to a plurality of groups.
7. The panchromatic sharpening method based on cross-modal feature attention interaction of claim 6, wherein the spectral information bottleneck is defined as follows: the context passes through fully connected layers W1, W2 and W3, with the reduction rate set to 16.
8. The panchromatic sharpening method based on cross-modal feature attention interaction of claim 1, wherein the step (5) is specifically as follows: the fused feature representation Z, which highly aggregates the spectral and spatial information, passes through a convolution layer that reconstructs and restores the dimensions, generating the residual hyperspectral image X_RES:

X_RES = Conv(Z; Θ)

where Θ denotes the model parameters; finally, the fused hyperspectral image is obtained as X_F = X_UP + X_RES.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the computer program, when loaded into the processor, implements a panchromatic sharpening method based on cross-modal feature attention interaction according to any one of claims 1-8.
10. A storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements a panchromatic sharpening method based on cross-modal feature attention interaction according to any one of claims 1-8.
CN202410161201.3A 2024-02-05 2024-02-05 Panchromatic sharpening method based on cross-modal feature attention interaction Active CN117726548B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410161201.3A CN117726548B (en) 2024-02-05 2024-02-05 Panchromatic sharpening method based on cross-modal feature attention interaction


Publications (2)

Publication Number Publication Date
CN117726548A true CN117726548A (en) 2024-03-19
CN117726548B CN117726548B (en) 2024-05-10

Family

ID=90210953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410161201.3A Active CN117726548B (en) 2024-02-05 2024-02-05 Panchromatic sharpening method based on cross-modal feature attention interaction

Country Status (1)

Country Link
CN (1) CN117726548B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272078A (en) * 2022-08-01 2022-11-01 西安交通大学 Hyperspectral image super-resolution reconstruction method based on multi-scale space-spectrum feature learning
CN116563113A (en) * 2023-05-17 2023-08-08 东华大学 Hyper-spectral image super-resolution method based on transposed convolution long-short-term memory network


Also Published As

Publication number Publication date
CN117726548B (en) 2024-05-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant