CN117726548A - Panchromatic sharpening method based on cross-modal feature attention interaction - Google Patents

Panchromatic sharpening method based on cross-modal feature attention interaction

Info

Publication number
CN117726548A
Authority
CN
China
Prior art keywords
image
follows
full
cross
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410161201.3A
Other languages
Chinese (zh)
Other versions
CN117726548B (en)
Inventor
孙倩
孙宇
潘成胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202410161201.3A priority Critical patent/CN117726548B/en
Publication of CN117726548A publication Critical patent/CN117726548A/en
Application granted granted Critical
Publication of CN117726548B publication Critical patent/CN117726548B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a panchromatic sharpening method based on cross-modal feature attention interaction, which comprises the following steps: (1) acquiring and preprocessing a data set, and producing training sample images; (2) obtaining a hyperspectral image at the spatial scale of the panchromatic image by using an upsampling method based on the deep image prior; (3) decomposing and converting the upsampled hyperspectral image and the panchromatic image by using overlapping patch embedding modules to obtain spectral and spatial feature representations, respectively; (4) directly interacting and fusing the spectral and spatial features by using a plurality of spectral-spatial attention interaction modules to obtain a fused cross-modal feature representation; (5) restoring the dimensions by using an image reconstruction layer to obtain a hyperspectral image with high spatial resolution. For high-dimensional, heterogeneous hyperspectral and panchromatic images, the invention makes the hyperspectral image features and the panchromatic image features interact directly, realizing a self-attention mechanism over image data of different modalities at low cost.

Description

Panchromatic sharpening method based on cross-modal feature attention interaction
Technical Field
The invention relates to the technical field of computer vision and image processing, and in particular to a panchromatic sharpening method based on cross-modal feature attention interaction.
Background
Existing panchromatic sharpening techniques can be divided into traditional methods and deep-learning-based methods. Traditional methods include those based on component substitution, multi-resolution analysis, and variational optimization; limited in feature representation and nonlinear optimization capability, the fused images they generate commonly suffer from spectral and spatial distortion, and their performance is poorer than that of deep-learning-based methods. Among deep learning methods, approaches based on convolutional neural networks have been widely studied: 2D or 3D convolution kernels are naturally suited to the image domain and offer parameter sharing, sliding windows, and inductive bias, but convolution can only extract local image features, is insensitive to global features, and struggles to fuse features appropriately across image data of different modalities (hyperspectral and panchromatic images). Recently, panchromatic sharpening methods based on the self-attention mechanism have attracted much attention; they offer long-range relation modeling and dynamic weighting and can extract global image features. However, the self-attention mechanism cannot extract local image features, its parameter count and computational complexity are costly, and the cost increases further when it is applied to high-dimensional hyperspectral image processing tasks. Moreover, models based on self-attention usually require a large number of training samples, a condition that is difficult to satisfy given the scarcity of remote sensing image data sets.
Disclosure of Invention
The invention aims to provide a panchromatic sharpening method based on cross-modal feature attention interaction, solving the problem that neural-network-based panchromatic sharpening methods can extract only local or only global image features, and the problems of low operating efficiency and high cost overhead when the self-attention mechanism is applied to high-dimensional, heterogeneous hyperspectral and panchromatic images.
The technical scheme is as follows: the panchromatic sharpening method based on cross-modal feature attention interaction disclosed by the invention comprises the following steps:
(1) acquiring and preprocessing a data set, and producing training sample images;
(2) upsampling the input low-resolution hyperspectral image by using an upsampling method based on the deep image prior to obtain a hyperspectral image at the spatial scale of the panchromatic image;
(3) decomposing and converting the upsampled hyperspectral image and the panchromatic image by using overlapping patch embedding modules to obtain spectral and spatial feature representations, respectively;
(4) directly interacting and fusing the spectral and spatial features by using a plurality of spectral-spatial attention interaction modules to obtain a fused cross-modal feature representation;
(5) restoring the dimensions by using an image reconstruction layer to obtain a hyperspectral image with high spatial resolution.
Further, the step (1) is specifically as follows: acquire a data set for training, and crop the hyperspectral images in the data set to generate a number of image patches used as the label hyperspectral image X ∈ R^(W×H×B), where W, H and B are the width, height and number of bands; perform average pooling over the visible-light bands of X to produce the panchromatic image P ∈ R^(W×H×1); blur X spatially with a Gaussian filter and downsample it by a factor of N to generate the low-resolution hyperspectral image X_LR ∈ R^((W/N)×(H/N)×B). P and X_LR form the input pair, and X is the label.
Further, the step (2) is specifically as follows: the low-resolution hyperspectral image X_LR is upsampled to the spatial scale of the panchromatic image P by the upsampling method based on the deep image prior, obtaining the upsampled hyperspectral image X_UP; the formula is as follows:

X_UP = f_θ*(z), θ* = argmin_θ E(f_θ(z); X_LR)

where z is a random noise tensor used as input, θ are the optimized parameters of the neural network f_θ, and E is the task-driven energy function; the formula is as follows:

E(f_θ(z); X_LR) = ‖D(f_θ(z)) − X_LR‖²

where D denotes a downsampling operation.
Further, the step (3) is specifically as follows: for the different input images P and X_UP, a panchromatic-image overlapping patch embedding module and a hyperspectral-image overlapping patch embedding module are designed respectively, each patch embedding module consisting of convolution and batch normalization layers. The panchromatic-image overlapping patch embedding module spatially decomposes the panchromatic image P into a number of mutually overlapping image patches p ∈ R^(K×K), where K is the patch width and height, and performs feature conversion on each patch to generate patch embeddings wrapping the original information; the formula is as follows:

F_P = f_PE(P)

where F_P ∈ R^(n×D) is a feature map consisting of n patch embeddings, D is the embedding dimension, and f_PE is the function of the panchromatic-image overlapping patch embedding module.

The hyperspectral-image overlapping patch embedding module is formulated analogously:

F_H = f'_PE(X_UP)
further, the step (4) specifically includes the following steps: including a token mixer and a spectrum mixer;
the token mixer is defined as follows:
wherein,a cross-modal feature representation for the fusion; />And->Representing a pixel-by-pixel convolution and a standard 3 x 3 convolution, respectively; geLU is Gaussian error linear unit; repeating the above operations twice in a token mixer to balance the overhead and performance of the model;
the spectral mixer is defined as follows:
wherein,3 x 3 convolutions separable for depth; in two consecutive->A channel growth rate 16 is set in between to adequately model the relationship between adjacent spectra.
Further, the token mixer comprises a spectral-spatial interaction attention and a spectral information bottleneck. First, the spectral-spatial interaction attention computes a spectral global context G. Then, spatial prior knowledge from the panchromatic image P is introduced to define the cross-modal feature similarity matrix S = Q_P K_H^T, where Q_P and K_H are obtained by reshaping F_P and F_H, respectively. Next, the spectral-spatial interaction attention is updated so that the updated global context G' fuses the spectral information from the hyperspectral image with the spatial information of the panchromatic image, using the normalized similarity matrix S̄, learnable scalars α and β, the mean μ and standard deviation σ, and a small constant ε for numerical stability. Finally, the spectral-spatial interaction attention is extended to a plurality of groups.
further, the spectral information bottleneck is defined as follows: the spectral information bottleneck is defined as follows:
wherein,、/>、/>for the full link layer, for the drop rate, set to 16.
Further, the step (5) is specifically as follows: the fused feature representation Z, which highly aggregates the spectral and spatial information, passes through a convolution layer that reconstructs and restores the dimensions, generating the residual hyperspectral image X_RES; the formula is as follows:

X_RES = Conv(Z; Θ)

where Θ denotes the model parameters; finally, the fused hyperspectral image is obtained as

X_F = X_UP + X_RES
An electronic device according to the present invention comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; when loaded into the processor, the computer program implements any of the above panchromatic sharpening methods based on cross-modal feature attention interaction.
A storage medium according to the present invention stores a computer program which, when executed by a processor, implements any of the above panchromatic sharpening methods based on cross-modal feature attention interaction.
The beneficial effects are as follows: compared with the prior art, the invention has the following significant advantages. The model of the invention is formed entirely by stacking convolution layers, yet combines the advantages of convolution and the self-attention mechanism; effectively combining convolution with self-attention accelerates the fitting of the model and reduces the dependence of the self-attention mechanism on large-scale training data sets. For high-dimensional, heterogeneous hyperspectral and panchromatic images, the hyperspectral image features and panchromatic image features interact directly, realizing a self-attention mechanism over image data of different modalities at low cost.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is an overall architecture diagram of the AIDB-Net of the present invention;
FIG. 3 is a diagram of the internal structure of the SSIA (spectral-spatial interaction attention) of the present invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1, an embodiment of the present invention provides a panchromatic sharpening method based on cross-modal feature attention interaction, comprising the following steps:
(1) Acquiring and preprocessing a data set, and producing training sample images; specifically: acquire a data set for training, and crop the hyperspectral images in the data set to generate a number of image patches used as the label hyperspectral image X ∈ R^(W×H×B), where W, H and B are the width, height and number of bands; perform average pooling over the visible-light bands of X to produce the panchromatic image P ∈ R^(W×H×1); blur X spatially with a Gaussian filter and downsample it by a factor of N to generate the low-resolution hyperspectral image X_LR ∈ R^((W/N)×(H/N)×B). P and X_LR form the input pair, and X is the label.
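The preprocessing of step (1) follows a Wald-protocol-style simulation: the full-resolution cube serves as the label, while the PAN and low-resolution inputs are simulated from it. The sketch below is an illustrative reconstruction, not the patent's code; the visible-band range, the blur kernel (a simple box blur stands in for the Gaussian filter), and the factor N=4 are all assumptions.

```python
import numpy as np

def box_blur(x, k=3):
    # Simple k x k spatial mean filter per band (stand-in for the Gaussian blur).
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)), mode="edge")
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += xp[i:i + x.shape[0], j:j + x.shape[1], :]
    return out / (k * k)

def make_training_pair(x, n=4, visible_bands=slice(0, 3)):
    """From a label HSI x of shape (W, H, B): average-pool the visible bands
    to get the PAN image, and blur + decimate by n to get the LR HSI."""
    pan = x[:, :, visible_bands].mean(axis=2)   # spectral average pooling
    x_lr = box_blur(x)[::n, ::n, :]             # spatial blur, then downsample
    return pan, x_lr
```

The pair (pan, x_lr) is the network input and the original x the supervision target.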
(2) Upsampling the input low-resolution hyperspectral image by the upsampling method based on the deep image prior to obtain a hyperspectral image at the spatial scale of the panchromatic image; specifically: the low-resolution hyperspectral image X_LR is upsampled to the spatial scale of the panchromatic image P, obtaining the upsampled hyperspectral image X_UP; the formula is as follows:

X_UP = f_θ*(z), θ* = argmin_θ E(f_θ(z); X_LR)

where z is a random noise tensor used as input, θ are the optimized parameters of the neural network f_θ, and E is the task-driven energy function; the formula is as follows:

E(f_θ(z); X_LR) = ‖D(f_θ(z)) − X_LR‖²

where D denotes a downsampling operation.
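Step (2) fits a network f_θ to a noise tensor z so that the downsampled output matches X_LR. The network-fitting loop itself is omitted here; the sketch below only shows the task-driven energy E and one plausible choice of the downsampling operator D (average pooling is an assumption — the patent does not specify D).

```python
import numpy as np

def downsample(x, n):
    """D: average-pool a (W, H, B) cube spatially by a factor n
    (W and H are assumed divisible by n in this sketch)."""
    w, h, b = x.shape
    return x.reshape(w // n, n, h // n, n, b).mean(axis=(1, 3))

def dip_energy(x_pred, x_lr, n):
    """Task-driven energy E = ||D(f_theta(z)) - X_LR||^2, with x_pred = f_theta(z)."""
    return float(np.sum((downsample(x_pred, n) - x_lr) ** 2))
```

During the deep-image-prior fit, θ would be updated by gradient descent on this energy until f_θ(z) is consistent with X_LR.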
(3) As shown in FIG. 2, the upsampled hyperspectral image and the panchromatic image are decomposed and converted using overlapping patch embedding modules to obtain spectral and spatial feature representations, respectively; specifically: for the different input images P and X_UP, a panchromatic-image overlapping patch embedding module and a hyperspectral-image overlapping patch embedding module are designed respectively, each consisting of convolution and batch normalization layers. The panchromatic-image overlapping patch embedding module spatially decomposes P into a number of mutually overlapping image patches p ∈ R^(K×K), where K is the patch width and height, and performs feature conversion on each patch to generate patch embeddings wrapping the original information; the formula is as follows:

F_P = f_PE(P)

where F_P ∈ R^(n×D) is a feature map consisting of n patch embeddings, D is the embedding dimension, and f_PE is the function of the panchromatic-image overlapping patch embedding module; the hyperspectral-image overlapping patch embedding module is formulated analogously:

F_H = f'_PE(X_UP)
(4) As shown in FIG. 3, a plurality of spectral-spatial attention interaction modules directly interact and fuse the spectral and spatial features to obtain the fused cross-modal feature representation; specifically: each module comprises a token mixer and a spectral mixer. The token mixer produces the fused cross-modal feature representation Z by composing pixel-wise convolutions (PWConv) and standard 3×3 convolutions (Conv3×3) with GeLU (Gaussian error linear unit) activations; this operation is repeated twice in the token mixer to balance the overhead and performance of the model.
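The token mixer's named building blocks — pixel-wise convolution, GeLU, and (omitted here) the 3×3 convolution — can be sketched in isolation. The patent gives the exact chaining only at a high level, so the composition in `token_mixer_step` is hypothetical.

```python
import numpy as np

def gelu(x):
    """Gaussian error linear unit (tanh approximation)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def pwconv(x, w):
    """Pixel-wise (1 x 1) convolution: a per-pixel linear mix of channels.
    x: (H, W, C_in), w: (C_in, C_out)."""
    return np.einsum("hwc,cd->hwd", x, w)

def token_mixer_step(x, w1, w2):
    """One hypothetical mixing step: PWConv -> GeLU -> PWConv
    (the standard 3x3 convolution branch is omitted for brevity)."""
    return pwconv(gelu(pwconv(x, w1)), w2)
```

In the patent this step is applied twice per token mixer to balance overhead and performance.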
the token mixer comprises spectrum-space interaction attention and spectrum information bottlenecks; first, the spectral-spatial interaction attention is defined as follows:
wherein,is a spectral global context;
then, introducing space priori knowledge of the full-color image P to define a cross-modal feature similarity matrix
Wherein,and->Respectively by->And->Remolding to obtain; second, the spectral-spatial interaction attention is updated as follows:
wherein,fusing spectral information from the hyperspectral image with spatial information of the panchromatic image for the updated global context; />The similarity matrix is normalized; />And->Is a learnable scalar; />Andmean and standard deviation; />Representing the numerical stability; finally, the spectrum-space interaction attention is evolved into a plurality of groups, defined as follows:
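The exact update formulas of the spectral-spatial interaction attention appear in the patent as images that do not survive in this text. The sketch below therefore only mirrors the named quantities — a similarity matrix from reshaped PAN queries and HSI keys, a normalization, learnable α and β, mean/std standardization with ε — and the precise combination is a guess, not the patent's formula.

```python
import numpy as np

def softmax(a, axis=-1):
    # Numerically stable row-wise softmax.
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ssia_update(q_p, k_h, g, alpha=1.0, beta=1.0, eps=1e-6):
    """Schematic spectral-spatial interaction attention update.
    q_p: (n, d) reshaped PAN features; k_h: (n, d) reshaped HSI features;
    g: (d,) spectral global context. Returns (updated context, similarity)."""
    s = softmax(q_p @ k_h.T)        # cross-modal similarity matrix, row-normalized
    v = s @ k_h                     # HSI spectra re-weighted by PAN spatial priors
    ctx = v.mean(axis=0)
    ctx = (ctx - ctx.mean()) / (ctx.std() + eps)   # mean/std normalization
    return alpha * ctx + beta * g, s
```

A grouped variant would simply split the d channels into groups and apply this update per group.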
the spectral information bottleneck is defined as follows: the spectral information bottleneck is defined as follows:
wherein,、/>、/>for the full link layer, for the drop rate, set to 16.
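With its reduction rate of 16, the spectral information bottleneck resembles a squeeze-and-excitation-style channel bottleneck. The sketch below is an assumption: the layer shapes and the ReLU activations are not specified in the source, only the three fully connected layers and the rate 16.

```python
import numpy as np

def spectral_bottleneck(g, w1, w2, w3):
    """Channel bottleneck on the spectral context g of shape (C,): project the
    channel dimension down by the reduction rate (C -> C/16), transform, and
    project back up. ReLU stands in for the unspecified activation."""
    h = np.maximum(w1 @ g, 0.0)   # (C/16,) squeeze
    h = np.maximum(w2 @ h, 0.0)   # (C/16,) transform
    return w3 @ h                 # (C,)   restore
```

The bottleneck compresses the spectral context before it re-enters the token mixer, keeping the attention cheap for high-dimensional band counts.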
The spectral mixer is built from depthwise-separable 3×3 convolutions (DWConv3×3); a channel growth rate of 16 is set between two consecutive DWConv3×3 layers to adequately model the relationship between adjacent spectral bands.
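The depthwise half of a depthwise-separable 3×3 convolution — each channel filtered by its own 3×3 kernel — can be sketched as below; the pointwise half and the channel growth rate of 16 are omitted, and the edge-replication padding is an illustrative choice.

```python
import numpy as np

def dwconv3x3(x, kernels):
    """Depthwise 3x3 convolution. x: (H, W, C); kernels: (C, 3, 3), one kernel
    per channel; 'same' output size via edge-replication padding."""
    h, w, c = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            # Shifted window times the per-channel kernel tap (broadcast over C).
            out += xp[i:i + h, j:j + w, :] * kernels[:, i, j]
    return out
```

A following 1×1 (pointwise) convolution would then mix channels, completing the separable convolution.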
(5) Restoring the dimensions by using an image reconstruction layer to obtain a hyperspectral image with high spatial resolution; specifically: the fused feature representation Z, which highly aggregates the spectral and spatial information, passes through a convolution layer that reconstructs and restores the dimensions, generating the residual hyperspectral image X_RES; the formula is as follows:

X_RES = Conv(Z; Θ)

where Θ denotes the model parameters; finally, the fused hyperspectral image is obtained as X_F = X_UP + X_RES.
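The reconstruction step restores the embedding dimension D back to B spectral bands and, consistent with the "residual hyperspectral image" wording, adds the result onto the upsampled input. Both the 1×1 convolution as the reconstruction layer and the additive fusion are assumptions in this sketch.

```python
import numpy as np

def reconstruct(z, w, x_up):
    """z: (H, W, D) fused features; w: (D, B) pixel-wise conv weights;
    x_up: (H, W, B) upsampled HSI. Returns the fused high-resolution HSI."""
    x_res = np.einsum("hwd,db->hwb", z, w)   # restore the spectral dimension
    return x_up + x_res                      # residual fusion with the upsampled HSI
```

Learning only the residual keeps the network focused on the high-frequency spatial detail missing from X_UP.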
The embodiment of the invention also provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor; when loaded into the processor, the computer program implements any of the above panchromatic sharpening methods based on cross-modal feature attention interaction.
Embodiments of the present invention also provide a storage medium storing a computer program which, when executed by a processor, implements a panchromatic sharpening method based on cross-modal feature attention interaction as described above.

Claims (10)

1. A panchromatic sharpening method based on cross-modal feature attention interaction, comprising the following steps:
(1) acquiring and preprocessing a data set, and producing training sample images;
(2) upsampling the input low-resolution hyperspectral image by using an upsampling method based on the deep image prior to obtain a hyperspectral image at the spatial scale of the panchromatic image;
(3) decomposing and converting the upsampled hyperspectral image and the panchromatic image by using overlapping patch embedding modules to obtain spectral and spatial feature representations, respectively;
(4) directly interacting and fusing the spectral and spatial features by using a plurality of spectral-spatial attention interaction modules to obtain a fused cross-modal feature representation;
(5) restoring the dimensions by using an image reconstruction layer to obtain a hyperspectral image with high spatial resolution.
2. The panchromatic sharpening method based on cross-modal feature attention interaction of claim 1, wherein the step (1) is specifically as follows: acquire a data set for training, and crop the hyperspectral images in the data set to generate a number of image patches used as the label hyperspectral image X ∈ R^(W×H×B), where W, H and B are the width, height and number of bands; perform average pooling over the visible-light bands of X to produce the panchromatic image P ∈ R^(W×H×1); blur X spatially with a Gaussian filter and downsample it by a factor of N to generate the low-resolution hyperspectral image X_LR ∈ R^((W/N)×(H/N)×B); P and X_LR form the input pair, and X is the label.
3. The panchromatic sharpening method based on cross-modal feature attention interaction of claim 1, wherein the step (2) is specifically as follows: the low-resolution hyperspectral image X_LR is upsampled to the spatial scale of the panchromatic image P by the upsampling method based on the deep image prior, obtaining the upsampled hyperspectral image X_UP; the formula is as follows:

X_UP = f_θ*(z), θ* = argmin_θ E(f_θ(z); X_LR)

where z is a random noise tensor used as input, θ are the optimized parameters of the neural network f_θ, and E is the task-driven energy function; the formula is as follows:

E(f_θ(z); X_LR) = ‖D(f_θ(z)) − X_LR‖²

where D denotes a downsampling operation.
4. The panchromatic sharpening method based on cross-modal feature attention interaction of claim 1, wherein the step (3) is specifically as follows: for the different input images P and X_UP, a panchromatic-image overlapping patch embedding module and a hyperspectral-image overlapping patch embedding module are designed respectively, each patch embedding module consisting of convolution and batch normalization layers; the panchromatic-image overlapping patch embedding module spatially decomposes the panchromatic image P into a number of mutually overlapping image patches p ∈ R^(K×K), where K is the patch width and height, and performs feature conversion on each patch to generate patch embeddings wrapping the original information:

F_P = f_PE(P)

where F_P ∈ R^(n×D) is a feature map consisting of n patch embeddings, D is the embedding dimension, and f_PE is the function of the panchromatic-image overlapping patch embedding module; the hyperspectral-image overlapping patch embedding module is formulated analogously:

F_H = f'_PE(X_UP)
5. The panchromatic sharpening method based on cross-modal feature attention interaction of claim 1, wherein the step (4) is specifically as follows: each spectral-spatial attention interaction module comprises a token mixer and a spectral mixer;
the token mixer produces the fused cross-modal feature representation Z by composing pixel-wise convolutions (PWConv) and standard 3×3 convolutions (Conv3×3) with GeLU (Gaussian error linear unit) activations, this operation being repeated twice in the token mixer to balance the overhead and performance of the model;
the spectral mixer is built from depthwise-separable 3×3 convolutions (DWConv3×3), a channel growth rate of 16 being set between two consecutive DWConv3×3 layers to adequately model the relationship between adjacent spectral bands.
6. The panchromatic sharpening method based on cross-modal feature attention interaction of claim 5, wherein the token mixer comprises a spectral-spatial interaction attention and a spectral information bottleneck; first, the spectral-spatial interaction attention computes a spectral global context G; then, spatial prior knowledge from the panchromatic image P is introduced to define the cross-modal feature similarity matrix S = Q_P K_H^T, where Q_P and K_H are obtained by reshaping F_P and F_H, respectively; next, the spectral-spatial interaction attention is updated so that the updated global context G' fuses the spectral information from the hyperspectral image with the spatial information of the panchromatic image, using the normalized similarity matrix S̄, learnable scalars α and β, the mean μ and standard deviation σ, and a small constant ε for numerical stability; finally, the spectral-spatial interaction attention is extended to a plurality of groups.
7. The panchromatic sharpening method based on cross-modal feature attention interaction of claim 6, wherein the spectral information bottleneck is defined as follows: the context passes through fully connected layers W1, W2 and W3, with the reduction rate set to 16.
8. The panchromatic sharpening method based on cross-modal feature attention interaction of claim 1, wherein the step (5) is specifically as follows: the fused feature representation Z, which highly aggregates the spectral and spatial information, passes through a convolution layer that reconstructs and restores the dimensions, generating the residual hyperspectral image X_RES:

X_RES = Conv(Z; Θ)

where Θ denotes the model parameters; finally, the fused hyperspectral image is obtained as X_F = X_UP + X_RES.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the computer program, when loaded into the processor, implements a panchromatic sharpening method based on cross-modal feature attention interaction according to any one of claims 1-8.
10. A storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements a panchromatic sharpening method based on cross-modal feature attention interaction according to any one of claims 1-8.
CN202410161201.3A 2024-02-05 2024-02-05 Panchromatic sharpening method based on cross-modal feature attention interaction Active CN117726548B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410161201.3A CN117726548B (en) 2024-02-05 2024-02-05 Panchromatic sharpening method based on cross-modal feature attention interaction


Publications (2)

Publication Number Publication Date
CN117726548A true CN117726548A (en) 2024-03-19
CN117726548B CN117726548B (en) 2024-05-10

Family

ID=90210953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410161201.3A Active CN117726548B (en) 2024-02-05 2024-02-05 Panchromatic sharpening method based on cross-modal feature attention interaction

Country Status (1)

Country Link
CN (1) CN117726548B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272078A (en) * 2022-08-01 2022-11-01 西安交通大学 Hyperspectral image super-resolution reconstruction method based on multi-scale space-spectrum feature learning
CN116563113A (en) * 2023-05-17 2023-08-08 东华大学 Hyper-spectral image super-resolution method based on transposed convolution long-short-term memory network


Also Published As

Publication number Publication date
CN117726548B (en) 2024-05-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant