WO2024082796A1 - Spectral cross-domain transfer super-resolution reconstruction method for multi-domain image - Google Patents

Spectral cross-domain transfer super-resolution reconstruction method for multi-domain image

Info

Publication number: WO2024082796A1 (PCT/CN2023/113283)
Authority: WIPO (PCT)
Prior art keywords: domain, spectral, image, model, cross
Application number: PCT/CN2023/113283
Other languages: French (fr), Chinese (zh)
Inventors: 张艳宁, 张磊, 魏巍, 任维鑫, 王昊宇
Original Assignee: 西北工业大学 (Northwestern Polytechnical University)
Application filed by 西北工业大学
Publication of WO2024082796A1


Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; G06T3/00 Geometric image transformations in the plane of the image; G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
        • G06T3/4046 Scaling using neural networks
        • G06T3/4053 Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology
        • G06N3/045 Combinations of networks; G06N3/0455 Auto-encoder networks; encoder-decoder networks
        • G06N3/0464 Convolutional networks [CNN, ConvNet]
        • G06N3/047 Probabilistic or stochastic networks
        • G06N3/0475 Generative networks
        • G06N3/048 Activation functions
    • G06N3/08 Learning methods
        • G06N3/084 Backpropagation, e.g. using gradient descent
        • G06N3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
        • G06N3/096 Transfer learning
        • G06N3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE; Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE; Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
        • Y02A40/10 Adaptation technologies in agriculture

Definitions

  • The present invention belongs to the technical field of image processing, and in particular relates to a spectral cross-domain transfer super-resolution reconstruction method.
  • Hyperspectral images are images that collect dozens or even hundreds of continuous spectral bands for each pixel in the visible and infrared spectrum. Compared with traditional RGB images, hyperspectral images provide richer spectral information and can identify the spectral characteristics of materials, thereby conducting a more detailed analysis of surface materials.
  • Hyperspectral images can be applied to environmental remote sensing, agriculture, forestry, geological exploration, urban planning and other fields. For example, in the agricultural field, hyperspectral images can be used to quickly identify, classify, monitor and manage crops, thereby improving crop yield and quality. In environmental monitoring, hyperspectral images can be used to identify and monitor harmful substances in water bodies, as well as to monitor vegetation coverage and land use changes. In the field of urban planning, hyperspectral images can be used to measure urban green space coverage and building heights, optimize urban planning and facility layout, etc.
  • In summary, hyperspectral images, as images with rich spectral information, have broad application prospects.
  • However, limited by the high price, slow imaging speed, and large size of hyperspectral cameras, hyperspectral images are not as widely used as ordinary cameras.
  • According to the reconstruction approach, the existing spectral super-resolution methods can be roughly divided into two categories.
  • One category comprises traditional methods, such as: (1) spectral super-resolution based on spectral decomposition: this method uses a spectral decomposition algorithm to decompose and reconstruct the spectral signal, thereby achieving spectral super-resolution. For example, the super-resolution spectral imaging technique "Coupled Nonnegative Matrix Factorization Unmixing for Hyperspectral and Multispectral Data Fusion", based on the non-negative matrix factorization (NMF) algorithm, decomposes and reconstructs the spectral signal to achieve super-resolution spectral imaging.
  • (2) Spectral super-resolution based on sparse representation: this method uses a sparse representation algorithm, such as one based on dictionary learning, to decompose and reconstruct the spectral signal. For example, the super-resolution spectral imaging technique "Spectral Reflectance Recovery from a Single RGB Image", based on a sparse representation algorithm, sparsely represents and reconstructs the spectral signal.
  • (3) Spectral super-resolution based on a spectral library and model: this method uses a spectral library and a model to train and optimize a model of the spectral signal. For example, super-resolution spectral imaging based on the partial least squares regression (PLSR) algorithm models and predicts spectral signals. These traditional methods often suffer from slow computation and poor reconstruction quality.
  • The other category is based on deep learning. This approach uses deep networks, such as convolutional neural networks (CNNs), e.g. "Pixel-aware Deep Function-mixture Network for Spectral Super-Resolution", or Transformers, e.g. "MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction", to train on and learn spectral signals, thereby achieving spectral super-resolution. Although deep-learning-based methods have made great progress in recent years and achieve excellent performance on a single dataset, their performance degrades severely when tested on scenes outside the training set.
  • The present invention provides a spectral cross-domain transfer super-resolution reconstruction method for multi-domain images, an image spectral cross-domain transfer super-resolution reconstruction method based on cross-domain transferable knowledge learning and rapid target-domain adaptation learning for multi-domain image scenes.
  • Spectral super-resolution reconstruction from RGB images to hyperspectral images is achieved.
  • A model structure based on a transferable dictionary learns features that can transfer across domains; a source-domain pre-training strategy based on a shared learnable mask encourages the model to learn general knowledge for reconstruction; and a model-agnostic meta-learning fine-tuning method learns a general model with strong generalization, so that a few iterations on the test data suffice to adapt to the target domain.
  • The present invention mines cross-domain shared knowledge to improve generalization, thereby improving the effect of cross-domain spectral super-resolution reconstruction.
  • Step 1: For an RGB image img ∈ ℝ^{3×h×w}, where h and w denote the height and width of the image, the corresponding hyperspectral image is denoted hsi ∈ ℝ^{31×h×w}; the image is passed through the embedding layer, e = embedding(img);
  • where embedding(·) denotes the embedding layer, instantiated by a convolutional layer with kernel size 3 and stride 1, and e denotes the hidden-layer features after embedding;
  • Step 2: Randomly mask the hidden-layer feature e obtained in step 1 in the form of cubes: randomly sample a cube of fixed size on the feature map, then replace the features at that position with a shared learnable mask;
  • SpectralTransformerBlock( ⁇ ) represents the inter-spectral Transformer module, s represents the hidden layer features obtained by the inter-spectral Transformer module; multiple inter-spectral Transformer modules are stacked in the spectral reconstruction model, and the output of the previous module is the input of the next module;
  • ⁇ i is a learnable scaling factor
  • WQ , WK , WV are learnable projection matrices
  • CrossAttention(S, Z) = attention(S W_Q, Z W_K, Z W_V), which stands for cross attention
  • map( ⁇ ) is a mapper, which maps the information of s to the latent space. It is instantiated by a multi-layer fully connected neural network and follows the information bottleneck structure in structure;
  • Generator( ⁇ ) represents the generator network, which receives the vector randomVector randomly sampled from the Gaussian distribution and the domain information of the image itself obtained from s through map and generates a transferable dictionary;
  • hsi′ represents the reconstructed hyperspectral image
  • the training process of the spectral reconstruction model is as follows:
  • a model-agnostic meta-learning fine-tuning method is used to learn a highly universal model parameter, so that the spectral reconstruction model can adapt to the characteristics of any domain after a few steps of self-supervised fine-tuning on the image of any domain.
  • the specific algorithm is as follows:
  • The specific method is: sample N tasks Task_i from the dataset, each consisting of K data pairs.
  • For each task, the self-supervised loss L_self is computed on the K examples, and the inner-loop model parameters are updated along its gradient: θ′_i = θ − α∇_θ L_self(f_θ).
  • For all tasks, the supervised loss L_sup is computed using the updated model parameters θ′_i, and the outer-loop model parameters are updated along its gradient: θ ← θ − β∇_θ Σ_i L_sup(f_{θ′_i}).
  • d( ⁇ ) represents the spectral response function from the hyperspectral image to the RGB image
  • f ⁇ ( ⁇ ) represents the neural network with parameter ⁇
  • x is the input image.
  • mse stands for mean square loss.
  • The present invention uses cube-level mask operations to force the model to learn the interactions between spectra and space; this interaction knowledge is shared among all spectral reconstructions. A generator is used to produce transferable patches, which are injected into the spectral reconstruction as cross-domain shared knowledge.
  • The above design ensures, from the perspective of the model, that cross-domain shared knowledge can be mined to improve generalization, thereby improving the effect of cross-domain spectral super-resolution reconstruction.
  • The model-agnostic meta-learning method does not directly learn the reconstruction from a single RGB image to a hyperspectral image; instead, it learns model parameters with strong generalization that, for any image, can rapidly adapt to the target domain within a few iterations, further enhancing the versatility of the model.
  • Together with self-supervised rapid fine-tuning in the target domain, the model acquires the ability of cross-domain transfer super-resolution reconstruction.
  • FIG. 1 is a schematic diagram of the model structure with the transferable dictionary.
  • It includes: a model structure design based on a transferable dictionary, to learn features that can be transferred across domains; a source-domain pre-training strategy based on a shared learnable mask, to encourage the model to learn general knowledge for reconstruction; and a model-agnostic meta-learning fine-tuning method, to learn a general model with strong generalization that can adapt to the target domain after a few iterations on the test data.
  • The image spectral cross-domain transfer super-resolution reconstruction method for multi-domain image scenes, based on cross-domain transferable knowledge learning and fast target-domain adaptation learning, includes the following steps:
  • Step 1: For an RGB image img ∈ ℝ^{3×h×w}, where h and w denote the height and width of the image, the corresponding hyperspectral image is denoted hsi ∈ ℝ^{31×h×w}.
  • The image is input to the encoding layer, which maps the number of channels of the input image from 3 to 31 to achieve a preliminary spectral reconstruction and alignment:
  • e = embedding(img)
  • where embedding(·) denotes the embedding layer, instantiated by a convolutional layer with kernel size 3 and stride 1, and e denotes the hidden-layer features after embedding.
  • Step 2: Randomly mask the hidden features obtained in step 1 in the form of cubes: randomly sample cubes of fixed size on the feature map, then replace the features at those positions with a shared learnable mask.
  • SpectralTransformerBlock( ⁇ ) represents the inter-spectral Transformer module
  • s represents the hidden layer features obtained by the inter-spectral Transformer module. Note that we stack multiple modules in the model, and the output of the previous module is the input of the next module.
  • ⁇ i is a learnable scaling factor
  • WQ , WK , WV are learnable projection matrices
  • X is the input tensor, obtained by rearranging (reshaping) the feature tensor of the input image.
  • CrossAttention(S, Z) = attention(S W_Q, Z W_K, Z W_V), which stands for cross attention.
  • map( ⁇ ) is a mapper, which maps the information of s to the latent space. It is instantiated by a multi-layer fully connected neural network and follows the information bottleneck structure.
  • Generator( ⁇ ) represents the generator network, which receives the vector randomVector randomly sampled from the Gaussian distribution and the domain information of the image itself obtained from s through map and generates a transferable dictionary.
  • Step 5: Use the inter-spectral Transformer module of step 3 again to refine the features, finally obtaining the reconstructed hyperspectral image, expressed as: hsi′ = SpectralTransformerBlock(c)
  • where hsi′ represents the reconstructed hyperspectral image
  • d( ⁇ ) represents the spectral response function from the hyperspectral image to the RGB image
  • f ⁇ ( ⁇ ) represents the neural network with parameter ⁇
  • x is the input image.
  • mse stands for mean square loss.
  • The present invention provides an image spectral cross-domain transfer super-resolution reconstruction method based on cross-domain transferable knowledge learning and fast target-domain adaptation learning for multi-domain image scenes.
  • the specific process is as follows:
  • The training set consists of RGB and hyperspectral image pairs {img_i, hsi_i}; for a given test set, hsi_i may not exist.
  • both RGB data and hyperspectral data are normalized to the range of [0, 1].
  • random cropping, random horizontal flipping, and random vertical flipping are used for the input image img i and its corresponding hsi i to enhance the generalization ability of the model.
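The preprocessing above (normalization to [0, 1], then identical random crop and flips applied to the RGB image and its hyperspectral counterpart) can be sketched as follows; the shapes, crop size, and helper names are illustrative assumptions, not the patented pipeline:

```python
import numpy as np

def normalize01(x):
    """Scale an image array to the range [0, 1]."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo + 1e-12)

def augment_pair(img, hsi, crop=8, rng=None):
    """Apply the same random crop / horizontal flip / vertical flip to an
    RGB image (3,H,W) and its hyperspectral counterpart (31,H,W)."""
    if rng is None:
        rng = np.random.default_rng()
    _, h, w = img.shape
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    img = img[:, top:top+crop, left:left+crop]
    hsi = hsi[:, top:top+crop, left:left+crop]
    if rng.random() < 0.5:                       # horizontal flip
        img, hsi = img[:, :, ::-1], hsi[:, :, ::-1]
    if rng.random() < 0.5:                       # vertical flip
        img, hsi = img[:, ::-1, :], hsi[:, ::-1, :]
    return img.copy(), hsi.copy()

rng = np.random.default_rng(5)
img = normalize01(rng.standard_normal((3, 16, 16)))   # toy RGB image
hsi = normalize01(rng.standard_normal((31, 16, 16)))  # toy hyperspectral image
a_img, a_hsi = augment_pair(img, hsi, crop=8, rng=rng)
print(a_img.shape, a_hsi.shape)  # (3, 8, 8) (31, 8, 8)
```

The key design point is that crop coordinates and flip decisions are drawn once and applied to both images, so the pair stays spatially aligned.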
  • The support set is used to compute the self-supervised loss L_self of the inner loop.
  • The query set is used to compute the supervised loss L_sup of the outer loop.
  • the initial weights of the model are loaded with the pre-trained weights obtained by the preliminary pre-training based on random mini-batches, and the mask structure is removed to ensure that the model calculation graph of the meta-learning optimization process is as consistent as possible with the fine-tuning and inference during testing.
  • The iterative update of θ′_i lasts for p steps. If p is too small, it is difficult to iterate to a solution adapted to the task; if p is too large, the local parameters overfit the task. Therefore p is generally set to 10.
  • the learning rates ⁇ and ⁇ for the inner and outer layers are 1e-5 and 1e-6 respectively.
  • the trained model can be used to perform spectral super-resolution reconstruction from RGB to hyperspectral images.
  • the specific algorithm flow is shown in Algorithm 2:
  • the final algorithm output hsi′ is the hyperspectral image reconstructed from the input RGB image img.
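Hypothetically, the test-time flow just described (p steps of self-supervised fine-tuning on the test RGB image alone, then a forward pass to obtain hsi′) can be sketched with a deliberately tiny stand-in model; f, d, the learning rate, and the 1-D data are illustrative assumptions, not the patented network:

```python
import numpy as np

# Toy stand-ins: f is the reconstruction "network", d the spectral response
# function mapping a hyperspectral estimate back to RGB.
def f(theta, x): return theta * x
def d(h): return 0.5 * h
def self_loss(theta, img):
    """Self-supervised loss mse(d(f(img)), img) on the test image."""
    return float(np.mean((d(f(theta, img)) - img) ** 2))

def adapt_and_infer(theta, img, p=10, lr=0.5):
    """Test-time adaptation: p self-supervised gradient steps on the test
    RGB image (no hyperspectral label needed), then one forward pass."""
    for _ in range(p):
        r = d(f(theta, img)) - img                        # RGB round-trip residual
        theta -= lr * float(np.mean(2.0 * r * 0.5 * img)) # analytic gradient
    return f(theta, img), theta

rng = np.random.default_rng(4)
img = rng.standard_normal(16)            # stand-in test RGB image
loss_before = self_loss(0.0, img)
hsi_prime, theta = adapt_and_infer(0.0, img, p=10)
loss_after = self_loss(theta, img)
print(loss_after < loss_before)  # True: adaptation reduced the self-supervised loss
```

Only the reconstruction-consistency loss is used at test time, which is what lets the model adapt to a target domain whose hyperspectral ground truth does not exist.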


Abstract

A spectral cross-domain transfer super-resolution reconstruction method for a multi-domain image. By means of a spectral image cross-domain transfer super-resolution reconstruction method based on cross-domain transferable knowledge learning and rapid target-domain adaptation learning for multi-domain image scenarios, spectral super-resolution reconstruction from an RGB image to a hyperspectral image is realized. A model structure design based on a transferable dictionary is used to learn cross-domain transferable features; a source-domain pre-training policy based on a shared learnable mask helps the model learn general knowledge for reconstruction; and a model-agnostic meta-learning fine-tuning method is used to learn a universal model with a strong generalization capability, such that the model can adapt to the data of the test-time target domain after a few iterations on the test data. The method can mine cross-domain shared knowledge to improve the generalization capability, thereby improving the effect of spectral cross-domain super-resolution reconstruction.

Description

A Spectral Cross-Domain Transfer Super-Resolution Reconstruction Method for Multi-Domain Images

Technical Field

The present invention belongs to the technical field of image processing, and in particular relates to a spectral cross-domain transfer super-resolution reconstruction method.
Background Art

Hyperspectral images are images that collect dozens or even hundreds of contiguous spectral bands for each pixel over the visible and infrared spectrum. Compared with traditional RGB images, hyperspectral images provide richer spectral information and can identify the spectral characteristics of materials, enabling a more detailed analysis of surface materials.

Hyperspectral images can be applied to environmental remote sensing, agriculture, forestry, geological exploration, urban planning, and other fields. For example, in agriculture, hyperspectral images can be used to quickly identify, classify, monitor, and manage crops, improving crop yield and quality. In environmental monitoring, hyperspectral images can be used to identify and monitor harmful substances in water bodies, and to monitor vegetation coverage and land-use changes. In urban planning, hyperspectral images can be used to measure urban green-space coverage and building heights and to optimize urban planning and facility layout.

In summary, hyperspectral images, as images with rich spectral information, have broad application prospects.

However, because hyperspectral cameras are expensive, slow to image, and bulky, hyperspectral images are not as widely used as ordinary cameras. To fully exploit the advantages of hyperspectral images while circumventing the problems of hyperspectral imaging equipment, researchers have proposed spectral super-resolution methods, which aim to estimate and reconstruct hyperspectral images from traditional RGB images.

According to the reconstruction approach, the existing spectral super-resolution methods can be roughly divided into two categories. One comprises traditional methods, such as: (1) spectral super-resolution based on spectral decomposition: a spectral decomposition algorithm decomposes and reconstructs the spectral signal, thereby achieving spectral super-resolution. For example, the super-resolution spectral imaging technique "Coupled Nonnegative Matrix Factorization Unmixing for Hyperspectral and Multispectral Data Fusion", based on the non-negative matrix factorization (NMF) algorithm, decomposes and reconstructs the spectral signal to achieve super-resolution spectral imaging. (2) Spectral super-resolution based on sparse representation: a sparse representation algorithm, such as one based on dictionary learning, decomposes and reconstructs the spectral signal. For example, the super-resolution spectral imaging technique "Spectral Reflectance Recovery from a Single RGB Image", based on a sparse representation algorithm, sparsely represents and reconstructs the spectral signal. (3) Spectral super-resolution based on a spectral library and model: a spectral library and a model are used to train and optimize a model of the spectral signal; for example, super-resolution spectral imaging based on the partial least squares regression (PLSR) algorithm models and predicts spectral signals. These traditional methods often suffer from slow computation and poor reconstruction quality. The other category is based on deep learning, using deep networks such as convolutional neural networks (CNNs), e.g. "Pixel-aware Deep Function-mixture Network for Spectral Super-Resolution", or Transformers, e.g. "MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction", to train on and learn spectral signals. Although deep-learning-based methods have made great progress in recent years and achieve excellent performance on a single dataset, their performance degrades severely when tested on scenes outside the training set.
Summary of the Invention

To overcome the shortcomings of the prior art, the present invention provides a spectral cross-domain transfer super-resolution reconstruction method for multi-domain images, an image spectral cross-domain transfer super-resolution reconstruction method based on cross-domain transferable knowledge learning and rapid target-domain adaptation learning for multi-domain image scenes, which achieves spectral super-resolution reconstruction from RGB images to hyperspectral images. A model structure based on a transferable dictionary learns features that can transfer across domains; a source-domain pre-training strategy based on a shared learnable mask encourages the model to learn general knowledge for reconstruction; and a model-agnostic meta-learning fine-tuning method learns a general model with strong generalization, so that a few iterations on the test data suffice to adapt to the target domain. The present invention mines cross-domain shared knowledge to improve generalization, thereby improving the effect of cross-domain spectral super-resolution reconstruction.

The technical solution adopted by the present invention to solve its technical problem includes the following steps:
Step 1: For an RGB image img ∈ ℝ^{3×h×w}, where h and w denote the height and width of the image, the corresponding hyperspectral image is denoted hsi ∈ ℝ^{31×h×w}.

The image is input to the encoding layer, which maps the number of channels of the input image from 3 to 31 to achieve a preliminary spectral reconstruction and alignment:

e = embedding(img)

where embedding(·) denotes the embedding layer, instantiated by a convolutional layer with kernel size 3 and stride 1, and e denotes the hidden-layer features after embedding.
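As an illustrative sketch (not the patented implementation), the embedding step can be mimicked in NumPy with a naive 3×3, stride-1 convolution that lifts the 3 RGB channels to 31 spectral feature channels; the weights, shapes, and helper names here are all hypothetical:

```python
import numpy as np

def conv2d(x, w, b, stride=1, pad=1):
    """Naive 2-D convolution: x (C_in,H,W), w (C_out,C_in,k,k), b (C_out,)."""
    c_out, _, k, _ = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h_out = (xp.shape[1] - k) // stride + 1
    w_out = (xp.shape[2] - k) // stride + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[:, i*stride:i*stride+k, j*stride:j*stride+k]
            # contract w's (C_in,k,k) axes against the patch
            out[:, i, j] = np.tensordot(w, patch, axes=3) + b
    return out

def embedding(img, w, b):
    """Embedding layer: map 3 RGB channels to 31 spectral feature channels."""
    return conv2d(img, w, b, stride=1, pad=1)

rng = np.random.default_rng(0)
img = rng.standard_normal((3, 8, 8))           # RGB input, h = w = 8
w = rng.standard_normal((31, 3, 3, 3)) * 0.1   # kernel size 3
b = np.zeros(31)
e = embedding(img, w, b)                       # hidden-layer feature e
print(e.shape)  # (31, 8, 8): stride 1 with padding 1 preserves h and w
```

Stride 1 with same-padding keeps the spatial size, so only the channel dimension changes (3 to 31), matching the "preliminary spectral reconstruction and alignment" role described above.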
Step 2: Randomly mask the hidden-layer feature e obtained in step 1 in the form of cubes: randomly sample a cube of fixed size on the feature map, then replace the features at that position with a shared learnable mask.
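The cube-masking step might be sketched as follows; the cube size, the zero-initialized mask token, and the single-cube sampling are illustrative assumptions (in training, the token would be a learnable parameter shared across all masked positions):

```python
import numpy as np

def cube_mask(e, mask_token, cube=4, rng=None):
    """Replace one randomly placed cube x cube spatial block of the feature
    map e (C,H,W) with the shared mask token (C,); e itself is untouched."""
    if rng is None:
        rng = np.random.default_rng()
    _, h, w = e.shape
    top = int(rng.integers(0, h - cube + 1))
    left = int(rng.integers(0, w - cube + 1))
    out = e.copy()
    # broadcast the (C,) token over the cube's spatial extent
    out[:, top:top+cube, left:left+cube] = mask_token[:, None, None]
    return out, (top, left)

rng = np.random.default_rng(1)
e = rng.standard_normal((31, 8, 8))   # hidden-layer feature from step 1
mask_token = np.zeros(31)             # shared learnable parameter in training
masked, (top, left) = cube_mask(e, mask_token, cube=4, rng=rng)
print(np.allclose(masked[:, top:top+4, left:left+4], 0.0))  # True
```

Because the masked cube spans all 31 channels at once, reconstructing it forces the model to use both spatial context and inter-spectral correlations, which is the stated purpose of the cube-level mask.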
Step 3: The hidden feature e obtained in step 1 is refined by the inter-spectral attention module, expressed as:

s = SpectralTransformerBlock(e)

where SpectralTransformerBlock(·) denotes the inter-spectral Transformer module, and s denotes the hidden-layer features it produces. Multiple inter-spectral Transformer modules are stacked in the spectral reconstruction model, the output of each module being the input of the next. SpectralTransformerBlock(·) is composed of SpectralAttention, an FFN, and LayerNorm:

SpectralTransformerBlock(x) = t + FFN(LayerNorm(t))

where t = x + SpectralAttention(LayerNorm(x)), LayerNorm denotes the layer normalization operation, FFN(x) = conv(gelu(conv(gelu(conv(x))))), conv denotes a convolutional layer, gelu is a nonlinear activation function, and x denotes an input tensor;

attention(Q, K, V) = softmax(σ_i Q K^T) V
SpectralAttention(X) = attention(X W_Q, X W_K, X W_V)

where σ_i is a learnable scaling factor, W_Q, W_K, W_V are learnable projection matrices, and X is the input tensor, obtained by rearranging the shape of the feature tensor of the input image.
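Under the formulas above, a minimal NumPy sketch of one inter-spectral Transformer block could look like the following. The dimensions are hypothetical, rows of X are taken as spectral channels (so the attention map is c×c), and a two-layer dense FFN stands in for the conv-gelu-conv FFN of the original; none of these simplifications come from the patent:

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def attention(q, k, v, sigma=1.0):
    # attention(Q, K, V) = softmax(sigma * Q K^T) V
    return softmax(sigma * q @ k.T) @ v

def spectral_attention(x, wq, wk, wv, sigma=1.0):
    # rows of x are spectral channels, so the attention map is c x c
    return attention(x @ wq, x @ wk, x @ wv, sigma)

def ffn(x, w1, w2):
    # simplified dense stand-in for the conv-gelu-conv feed-forward network
    return gelu(gelu(x @ w1) @ w2)

def spectral_transformer_block(x, wq, wk, wv, w1, w2, sigma=1.0):
    t = x + spectral_attention(layer_norm(x), wq, wk, wv, sigma)
    return t + ffn(layer_norm(t), w1, w2)

rng = np.random.default_rng(0)
c, n = 31, 16                       # 31 spectral channels, 4x4 spatial positions
x = rng.standard_normal((c, n))
wq, wk, wv = (rng.standard_normal((n, n)) * 0.1 for _ in range(3))
w1 = rng.standard_normal((n, 2 * n)) * 0.1
w2 = rng.standard_normal((2 * n, n)) * 0.1
s = spectral_transformer_block(x, wq, wk, wv, w1, w2)
print(s.shape)  # (31, 16): residual connections preserve the shape
```

Because attention is computed across the channel dimension rather than across spatial positions, its cost grows with the number of bands, not with image size, which is what makes this "spectral-wise" design practical for reconstruction.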
Step 4: A generator network instantiated by a multi-layer fully connected neural network generates a transferable dictionary. The hidden-layer feature s is then split along the spatial dimension into feature blocks of a fixed size, and a cross-attention mechanism lets the transferable dictionary generated by the generator interact with the feature blocks of s, injecting cross-domain shared knowledge into the feature map, formalized as:

z = Generator(randomVector + map(s))
c = CrossAttention(s, z)

where CrossAttention(S, Z) = attention(S W_Q, Z W_K, Z W_V) denotes cross-attention; map(·) is a mapper that projects the information of s into a latent space, instantiated by a multi-layer fully connected neural network whose structure follows the information-bottleneck principle; Generator(·) denotes the generator network, which receives a vector randomVector randomly sampled from a Gaussian distribution and the domain information of the image itself, obtained from s through map, and generates the transferable dictionary.
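A hedged NumPy sketch of step 4: the mapper, generator, and cross-attention below are toy stand-ins (the dictionary size m, feature dimension d, pooling, and all weights are hypothetical), but they follow the formulas z = Generator(randomVector + map(s)) and c = CrossAttention(s, z):

```python
import numpy as np

n, m, d = 64, 8, 16   # n feature patches, m dictionary atoms, feature dim d

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(s, z, wq, wk, wv):
    # CrossAttention(S, Z) = attention(S Wq, Z Wk, Z Wv):
    # queries come from the feature patches, keys/values from the dictionary
    q, k, v = s @ wq, z @ wk, z @ wv
    return softmax(q @ k.T) @ v

def mapper(s, w_map):
    # information-bottleneck-style mapper: pooled features -> domain code
    return np.tanh(s.mean(axis=0) @ w_map)

def generator(random_vector, domain_code, w_gen):
    # MLP generator: Gaussian noise + domain code -> dictionary of m atoms
    return np.tanh((random_vector + domain_code) @ w_gen).reshape(m, d)

rng = np.random.default_rng(2)
s = rng.standard_normal((n, d))               # hidden features split into patches
w_map = rng.standard_normal((d, d)) * 0.1
w_gen = rng.standard_normal((d, m * d)) * 0.1
wq, wk, wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

random_vector = rng.standard_normal(d)        # sampled from a Gaussian
z = generator(random_vector, mapper(s, w_map), w_gen)   # transferable dictionary
c = cross_attention(s, z, wq, wk, wv)         # shared knowledge injected
print(z.shape, c.shape)  # (8, 16) (64, 16)
```

Each output row of c is a mixture of dictionary atoms, so the per-image features are re-expressed in terms of the generated, domain-conditioned dictionary rather than purely image-specific content.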
Step 5: Use the inter-spectral Transformer module again to refine the features, finally obtaining the reconstructed hyperspectral image, expressed as:
hsi′ = SpectralTransformerBlock(c)
where hsi′ denotes the reconstructed hyperspectral image.
Preferably, the training process of the spectral reconstruction model is as follows:
A model-agnostic meta-learning fine-tuning method is used to learn highly general model parameters, so that the spectral reconstruction model can adapt to the characteristics of any domain after a few steps of self-supervised fine-tuning on images from that domain. The algorithm is as follows:
First, initialize the model parameters;
Then construct the training data: sample N Tasks from the dataset, where each Task consists of K data pairs.
For each Task, compute the self-supervised loss L_self on the K example data and perform the inner-layer model parameter update along its gradient:
θ′_i = θ − α∇_θ L_self(θ)
For all Tasks, compute the supervised loss L_sup using the updated model parameters θ′_i and perform the outer-layer model parameter update along its gradient:
θ ← θ − β∇_θ Σ_i L_sup(θ′_i)
Here L_self denotes the self-supervised loss, formalized as L_self(θ) = mse(d(f_θ(x)), x), where d(·) denotes the spectral response function from a hyperspectral image to an RGB image, f_θ(·) denotes the neural network with parameters θ, and x is the input image; L_sup denotes the supervised loss, formalized as L_sup(θ) = mse(f_θ(x), hsi), where mse denotes the mean squared error loss.
The beneficial effects of the present invention are as follows:
General deep learning models rely too heavily on memorizing the training dataset, overfit to it, and fail to fully learn knowledge that can be shared across domains. The present invention uses cube-level mask operations to force the model to learn inter-spectral and spatial interactions, knowledge that is shared across all spectral reconstruction tasks, and uses a generator to produce transferable patches that are injected into the spectral reconstruction as cross-domain shared knowledge. This design ensures, from the model's perspective, that cross-domain shared knowledge can be mined, improving generalization and thereby the quality of cross-domain spectral super-resolution reconstruction. The model-agnostic meta-learning method does not directly learn the reconstruction from a single RGB image to a hyperspectral image; instead, it learns model parameters with strong generalization that, for any image, can rapidly adapt to the target domain within a few iterations, further enhancing the model's versatility. Combined with self-supervised rapid fine-tuning in the target domain, this gives the model the ability to perform cross-domain transfer super-resolution reconstruction.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of the model structure based on the transferable dictionary.
DETAILED DESCRIPTION
The present invention is further described below in conjunction with the accompanying drawings and embodiments.
As shown in FIG. 1, in order to retain the high performance of deep-learning-based spectral super-resolution methods while alleviating the severe performance degradation observed in scenes outside the training set, we propose a spectral cross-domain transfer super-resolution reconstruction method for multi-domain image scenes, based on cross-domain transferable knowledge learning and rapid target-domain adaptation, for spectral super-resolution reconstruction from RGB images to hyperspectral images. It comprises a model structure design based on a transferable dictionary, used to learn features that can be transferred across domains; a source-domain pre-training strategy based on a shared learnable mask, used to encourage the model to learn general knowledge for reconstruction; and a fine-tuning method based on model-agnostic meta-learning, used to learn a general model with strong generalization ability, so that a few iterations on the test data suffice to adapt it to the target test domain.
A spectral cross-domain transfer super-resolution reconstruction method for multi-domain image scenes, based on cross-domain transferable knowledge learning and rapid target-domain adaptation, comprises the following aspects and steps:
Spectral reconstruction model structure:
Step 1: For an RGB image img, where h and w denote the height and width of the image, its label hsi denotes the hyperspectral image corresponding to the input image img. The image is fed into the encoding layer, which maps the number of channels of the input image from 3 to 31, achieving a preliminary spectral reconstruction and alignment:
e = embedding(img)
where embedding(·) denotes the embedding layer, instantiated as a convolutional layer with kernel size 3 and stride 1, and e denotes the hidden-layer features after embedding.
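As a concrete illustration, the embedding layer described above can be sketched in PyTorch; the padding value is an assumption not stated in the text, chosen so that the spatial size is preserved:

```python
import torch
import torch.nn as nn

# Embedding layer from step 1: a 3x3, stride-1 convolution mapping the
# 3 RGB channels to 31 spectral channels. padding=1 is an assumption that
# keeps the spatial resolution unchanged.
embedding = nn.Conv2d(in_channels=3, out_channels=31,
                      kernel_size=3, stride=1, padding=1)

img = torch.randn(1, 3, 64, 64)   # dummy RGB image
e = embedding(img)
print(e.shape)                    # torch.Size([1, 31, 64, 64])
```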
Step 2: Randomly mask the hidden-layer features obtained in step 1 in the form of cubes: randomly sample cubes of a fixed size over the image, and replace the features at those positions with a shared learnable mask.
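The cube-level random masking of step 2 can be sketched as follows; the cube size, the number of cubes, and the zero initialization of the shared mask token are illustrative assumptions:

```python
import torch
import torch.nn as nn

class CubeMask(nn.Module):
    """Replace randomly sampled fixed-size spatial cubes of the feature map
    with a single shared learnable mask token (illustrative sketch)."""
    def __init__(self, channels=31, cube=8, num_cubes=4):
        super().__init__()
        # one mask vector shared across all masked positions
        self.mask_token = nn.Parameter(torch.zeros(channels))
        self.cube, self.num_cubes = cube, num_cubes

    def forward(self, e):                      # e: (B, C, H, W)
        b, c, h, w = e.shape
        e = e.clone()
        for _ in range(self.num_cubes):
            ys = torch.randint(0, h - self.cube + 1, (b,))
            xs = torch.randint(0, w - self.cube + 1, (b,))
            for i in range(b):                 # overwrite cube with mask token
                e[i, :, ys[i]:ys[i] + self.cube, xs[i]:xs[i] + self.cube] = \
                    self.mask_token[:, None, None]
        return e

m = CubeMask()
out = m(torch.randn(2, 31, 32, 32))
print(out.shape)  # torch.Size([2, 31, 32, 32])
```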
Step 3: Refine the hidden features e obtained in step 2 using the inter-spectral attention module, expressed as:
s = SpectralTransformerBlock(e)
where SpectralTransformerBlock(·) denotes the inter-spectral Transformer module and s denotes the hidden-layer features obtained through it. Note that the model stacks several of these modules, with the output of each module serving as the input of the next. SpectralTransformerBlock(·) consists of SpectralAttention, an FFN, and LayerNorm:
SpectralTransformerBlock(x) = t + FFN(LayerNorm(t))
where t = x + SpectralAttention(LayerNorm(x)), LayerNorm denotes the layer normalization operation, and FFN(x) = conv(gelu(conv(gelu(conv(x))))), where conv denotes a convolutional layer, gelu is a nonlinear activation function, and x denotes an input tensor.
attention(Q, K, V) = softmax(σ_i·QK^T)·V
SpectralAttention(X) = attention(XW_Q, XW_K, XW_V)
where σ_i is a learnable scaling factor, W_Q, W_K, W_V are learnable projection matrices, and X is the input tensor, obtained by reshaping the feature tensor of the input image.
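A minimal PyTorch sketch of the inter-spectral attention above: the feature tensor is reshaped so that attention is computed across the spectral (channel) dimension, making QK^T a C×C matrix scaled by the learnable σ. The single-head form and the channel count are assumptions:

```python
import torch
import torch.nn as nn

class SpectralAttention(nn.Module):
    """Self-attention over the spectral (channel) dimension: the C channel
    maps act as tokens, so the attention matrix is C x C, scaled by a
    learnable sigma (illustrative single-head sketch)."""
    def __init__(self, channels=31):
        super().__init__()
        self.wq = nn.Linear(channels, channels, bias=False)
        self.wk = nn.Linear(channels, channels, bias=False)
        self.wv = nn.Linear(channels, channels, bias=False)
        self.sigma = nn.Parameter(torch.ones(1))   # learnable scaling factor

    def forward(self, x):                          # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)           # reshape to (B, HW, C)
        q, k, v = self.wq(t), self.wk(t), self.wv(t)
        # attend across channels: (B, C, HW) @ (B, HW, C) -> (B, C, C)
        attn = torch.softmax(self.sigma * q.transpose(1, 2) @ k, dim=-1)
        out = (attn @ v.transpose(1, 2)).reshape(b, c, h, w)
        return out

sa = SpectralAttention()
y = sa(torch.randn(2, 31, 16, 16))
print(y.shape)  # torch.Size([2, 31, 16, 16])
```

The full SpectralTransformerBlock would wrap this in the two residual branches given above (t = x + SpectralAttention(LayerNorm(x)), then t + FFN(LayerNorm(t))).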
Step 4: Use a generator network instantiated as a multi-layer fully connected neural network to generate a transferable dictionary, then split the hidden-layer features s into feature blocks of a fixed size along the spatial dimensions, and apply a cross-attention mechanism between the generated transferable dictionary and the feature blocks of s, so as to inject cross-domain shared knowledge into the feature map. This is formalized as:
z = Generator(randomVector + map(s))
c = CrossAttention(s, z)
where CrossAttention(S, Z) = attention(SW_Q, ZW_K, ZW_V) denotes cross-attention; map(·) is a mapper that projects the information of s into the latent space, instantiated as a multi-layer fully connected neural network whose structure follows an information-bottleneck design; Generator(·) denotes the generator network, which receives a vector randomVector randomly sampled from a Gaussian distribution together with the domain information of the image itself, obtained from s via map, and generates the transferable dictionary.
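The mapper, generator, and cross-attention of step 4 can be sketched as follows; all layer sizes, the number of dictionary atoms, and mean-pooling as the aggregation of the feature blocks are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TransferableDictionary(nn.Module):
    """Sketch: an MLP generator produces a dictionary of transferable atoms
    conditioned on a Gaussian random vector plus domain information mapped
    from the features s; cross-attention injects the atoms into the feature
    blocks. Sizes (dim, atoms, latent) are illustrative assumptions."""
    def __init__(self, dim=31, atoms=16, latent=64):
        super().__init__()
        # information-bottleneck mapper: features -> narrow -> latent code
        self.map = nn.Sequential(nn.Linear(dim, 8), nn.ReLU(),
                                 nn.Linear(8, latent))
        # generator: latent code -> dictionary of `atoms` vectors
        self.gen = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(),
                                 nn.Linear(128, atoms * dim))
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)
        self.atoms, self.dim = atoms, dim

    def forward(self, s):                       # s: (B, N, dim) feature blocks
        dom = self.map(s.mean(dim=1))           # pooled domain information
        z = self.gen(torch.randn_like(dom) + dom).view(-1, self.atoms, self.dim)
        q, k, v = self.wq(s), self.wk(z), self.wv(z)
        attn = torch.softmax(q @ k.transpose(1, 2) / self.dim ** 0.5, dim=-1)
        return s + attn @ v                     # inject shared knowledge

d = TransferableDictionary()
c = d(torch.randn(2, 64, 31))
print(c.shape)  # torch.Size([2, 64, 31])
```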
Step 5: Use the inter-spectral Transformer module from step 3 again to refine the features, finally obtaining the reconstructed hyperspectral image, which can be expressed as:
hsi′ = SpectralTransformerBlock(c)
where hsi′ denotes the reconstructed hyperspectral image.
Model-agnostic meta-learning fine-tuning method:
To achieve good performance on images from multiple domains, we propose a model-agnostic meta-learning fine-tuning method that learns highly general model parameters, so that the model can adapt well to the characteristics of any domain after a few steps of self-supervised fine-tuning on images from that domain. The algorithm flow is shown in Algorithm 1.
Here L_self denotes the self-supervised loss, formalized as L_self(θ) = mse(d(f_θ(x)), x), where d(·) denotes the spectral response function from a hyperspectral image to an RGB image, f_θ(·) denotes the neural network with parameters θ, and x is the input image; L_sup denotes the supervised loss, formalized as L_sup(θ) = mse(f_θ(x), hsi), where mse denotes the mean squared error loss.
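The inner self-supervised update and outer supervised update can be sketched with a first-order approximation of model-agnostic meta-learning. The first-order simplification, the SGD inner optimizer, and the stand-in networks in the demonstration are assumptions; the method described here may well use full second-order meta-gradients:

```python
import copy
import torch
import torch.nn.functional as F

def maml_outer_step(model, tasks, d, alpha=1e-5, beta=1e-6, inner_steps=10):
    """First-order MAML sketch: the inner loop adapts a copy of the model
    with the self-supervised loss mse(d(f(x)), x); the outer loop
    accumulates the supervised loss of the adapted parameters and updates
    the shared parameters theta."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for rgb_support, (rgb_query, hsi_query) in tasks:
        adapted = copy.deepcopy(model)                     # theta'_i starts at theta
        opt = torch.optim.SGD(adapted.parameters(), lr=alpha)
        for _ in range(inner_steps):                       # inner, self-supervised
            loss_self = F.mse_loss(d(adapted(rgb_support)), rgb_support)
            opt.zero_grad(); loss_self.backward(); opt.step()
        loss_sup = F.mse_loss(adapted(rgb_query), hsi_query)   # outer loss
        grads = torch.autograd.grad(loss_sup, list(adapted.parameters()))
        for g_acc, g in zip(meta_grads, grads):
            g_acc += g                                     # first-order approximation
    with torch.no_grad():                                  # outer update on theta
        for p, g in zip(model.parameters(), meta_grads):
            p -= beta * g

# Tiny demonstration with a stand-in 3->3 model and an identity response d.
net = torch.nn.Conv2d(3, 3, 1)
x = torch.randn(2, 3, 4, 4)
maml_outer_step(net, [(x, (x, torch.randn(2, 3, 4, 4)))], d=lambda h: h,
                inner_steps=2)
```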
Specific embodiment:
The present invention provides a spectral cross-domain transfer super-resolution reconstruction method for multi-domain image scenes, based on cross-domain transferable knowledge learning and rapid target-domain adaptation. The specific process is as follows:
1、数据预处理1. Data preprocessing
For a given training set there exist RGB and hyperspectral image pairs {img_i, hsi_i}; for a given test set, hsi_i may not exist. When training the model, both the RGB data and the hyperspectral data are normalized to the range [0, 1].
In addition, data augmentation by random cropping, random horizontal flipping, and random vertical flipping is applied to the input image img_i and its corresponding hsi_i to enhance the generalization ability of the model.
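The paired augmentation can be sketched as follows; the crop size is an illustrative assumption, and the same spatial transform must be applied to the RGB image and its hyperspectral label:

```python
import torch

def augment_pair(img, hsi, crop=64):
    """Paired augmentation sketch: random crop plus random horizontal and
    vertical flips, applied identically to RGB (3, H, W) and HSI (31, H, W)."""
    _, h, w = img.shape
    top = torch.randint(0, h - crop + 1, (1,)).item()
    left = torch.randint(0, w - crop + 1, (1,)).item()
    img = img[:, top:top + crop, left:left + crop]
    hsi = hsi[:, top:top + crop, left:left + crop]
    if torch.rand(1).item() < 0.5:            # random horizontal flip
        img, hsi = img.flip(-1), hsi.flip(-1)
    if torch.rand(1).item() < 0.5:            # random vertical flip
        img, hsi = img.flip(-2), hsi.flip(-2)
    return img, hsi

img_a, hsi_a = augment_pair(torch.rand(3, 128, 128), torch.rand(31, 128, 128))
print(img_a.shape, hsi_a.shape)  # torch.Size([3, 64, 64]) torch.Size([31, 64, 64])
```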
2. Preliminary pre-training based on random mini-batches
Since training the model-agnostic meta-learning method from scratch leads to unstable learning, and the self-supervised step in the inner loop of meta-learning reduces the utilization of supervised data, the model is first trained with random mini-batches so that it converges quickly and stably to a good starting point. Specifically, n samples are drawn from the training set to form a batch and fed to the randomly initialized model; Adam is used as the optimizer and mrae as the loss function, with an initial learning rate of 4e-4 gradually decayed by cosine annealing, training for 150 batches.
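The pre-training recipe above (Adam, mrae loss, initial learning rate 4e-4, cosine annealing over 150 batches) can be sketched as follows; the stand-in single-layer model and the dummy data are placeholders for the real network and dataset:

```python
import torch
import torch.nn as nn

def mrae(pred, target, eps=1e-8):
    """Mean relative absolute error, the pre-training loss named in the text."""
    return (torch.abs(pred - target) / (torch.abs(target) + eps)).mean()

# Stand-in model: any RGB -> 31-band network would slot in here.
model = nn.Conv2d(3, 31, kernel_size=3, padding=1)
opt = torch.optim.Adam(model.parameters(), lr=4e-4)      # initial lr, per the text
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=150)

for step in range(150):                                  # 150 batches, per the text
    img, hsi = torch.rand(4, 3, 32, 32), torch.rand(4, 31, 32, 32)  # dummy batch
    loss = mrae(model(img), hsi)
    opt.zero_grad(); loss.backward(); opt.step()
    sched.step()                                         # cosine learning-rate decay
```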
3. Model-agnostic meta-learning training
A set of tasks is sampled from the dataset. Each task comprises a support set and a query set; for simplicity, each is instantiated as K mutually non-overlapping sample pairs of RGB and hyperspectral images {(img_i, hsi_i), i = 1, 2, ..., K}. The support set is used to compute the inner-loop self-supervised loss L_self, and the query set is used to compute the outer-loop supervised loss L_sup. In addition, during meta-learning training, the model's initial weights are loaded from the pre-trained weights obtained by the preliminary random-mini-batch pre-training described above, and the mask structure is removed, so that the model's computation graph during meta-learning optimization matches the fine-tuning and inference at test time as closely as possible. In general, the iterative update of θ′_i lasts p steps: if p is too small, it is difficult to iterate to a solution adapted to the task; if p is too large, the local parameters overfit to the task; therefore, p is generally set to 10. The inner and outer learning rates α and β are set to 1e-5 and 1e-6, respectively.
4. Target-domain fine-tuning and reconstruction inference
When learning is complete, the trained model can perform spectral super-resolution reconstruction from RGB to hyperspectral images. The algorithm flow is shown in Algorithm 2.
The final algorithm output hsi′ is the hyperspectral image reconstructed from the input RGB image img.
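The target-domain procedure, a few self-supervised fine-tuning steps on the test image followed by the final reconstruction, can be sketched as follows; the optimizer choice, step count, and the stand-in networks in the demonstration are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def adapt_and_reconstruct(model, d, img, steps=10, lr=1e-5):
    """Target-domain inference sketch: fine-tune with the self-supervised
    loss mse(d(f(img)), img), which needs no HSI label, then reconstruct."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(d(model(img)), img)   # self-supervised loss on img
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        return model(img)                       # hsi' for the target image

# Demonstration with stand-in networks: a 3->31 reconstructor and a random
# 31->3 linear map standing in for the spectral response function d.
net = nn.Conv2d(3, 31, 1)
resp = nn.Conv2d(31, 3, 1)
hsi_pred = adapt_and_reconstruct(net, resp, torch.randn(1, 3, 8, 8), steps=3)
print(hsi_pred.shape)  # torch.Size([1, 31, 8, 8])
```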

Claims (2)

  1. A spectral cross-domain transfer super-resolution reconstruction method for multi-domain images, characterized by comprising the following steps:
    Step 1: For an RGB image img, where h and w denote the height and width of the image, its label hsi_i, i = 1...N, denotes the hyperspectral image corresponding to the input image img;
    The image is fed into the encoding layer, which maps the number of channels of the input image from 3 to 31, achieving a preliminary spectral reconstruction and alignment:
    e = embedding(img)
    where embedding(·) denotes the embedding layer, instantiated as a convolutional layer with kernel size 3 and stride 1, and e denotes the hidden-layer features after embedding;
    Step 2: Randomly mask the hidden-layer features e obtained in step 1 in the form of cubes: randomly sample cubes of a fixed size over the image and replace the features at those positions with a shared learnable mask;
    Step 3: Refine the hidden features e obtained in step 2 using the inter-spectral attention module, expressed as:
    s = SpectralTransformerBlock(e)
    where SpectralTransformerBlock(·) denotes the inter-spectral Transformer module, and s denotes the hidden-layer features obtained through it; multiple inter-spectral Transformer modules are stacked in the spectral reconstruction model, with the output of each module serving as the input of the next; SpectralTransformerBlock(·) consists of SpectralAttention, an FFN, and LayerNorm:
    SpectralTransformerBlock(x) = t + FFN(LayerNorm(t))
    where t = x + SpectralAttention(LayerNorm(x)), LayerNorm denotes the layer normalization operation, and FFN(x) = conv(gelu(conv(gelu(conv(x))))), where conv denotes a convolutional layer, gelu is a nonlinear activation function, and x denotes an input tensor;
    attention(Q, K, V) = softmax(σ_i·QK^T)·V
    SpectralAttention(X) = attention(XW_Q, XW_K, XW_V)
    where σ_i is a learnable scaling factor, W_Q, W_K, W_V are learnable projection matrices, and X is the input tensor, obtained by reshaping the feature tensor of the input image;
    Step 4: Use a generator network instantiated as a multi-layer fully connected neural network to generate a transferable dictionary, then split the hidden-layer features s into feature blocks of a fixed size along the spatial dimensions, and apply a cross-attention mechanism between the generated transferable dictionary and the feature blocks of s, so as to inject cross-domain shared knowledge into the feature map, formalized as:
    z = Generator(randomVector + map(s))
    c = CrossAttention(s, z)
    where CrossAttention(S, Z) = attention(SW_Q, ZW_K, ZW_V) denotes cross-attention; map(·) is a mapper that projects the information of s into the latent space, instantiated as a multi-layer fully connected neural network whose structure follows an information-bottleneck design; Generator(·) denotes the generator network, which receives a vector randomVector randomly sampled from a Gaussian distribution together with the domain information of the image itself, obtained from s via map, and generates the transferable dictionary;
    Step 5: Use the inter-spectral Transformer module again to refine the features, finally obtaining the reconstructed hyperspectral image, expressed as:
    hsi′ = SpectralTransformerBlock(c)
    where hsi′ denotes the reconstructed hyperspectral image.
  2. The spectral cross-domain transfer super-resolution reconstruction method for multi-domain images according to claim 1, characterized in that the training process of the spectral reconstruction model is as follows:
    A model-agnostic meta-learning fine-tuning method is used to learn highly general model parameters, so that the spectral reconstruction model can adapt to the characteristics of any domain after a few steps of self-supervised fine-tuning on images from that domain. The algorithm is as follows:
    First, initialize the model parameters;
    Then construct the training data: sample N Tasks from the dataset, where each Task consists of K data pairs;
    For each Task, compute the self-supervised loss L_self on the K example data and perform the inner-layer model parameter update along its gradient:
    θ′_i = θ − α∇_θ L_self(θ)
    For all Tasks, compute the supervised loss L_sup using the updated model parameters θ′_i and perform the outer-layer model parameter update along its gradient:
    θ ← θ − β∇_θ Σ_i L_sup(θ′_i)
    Here L_self denotes the self-supervised loss, formalized as L_self(θ) = mse(d(f_θ(x)), x), where d(·) denotes the spectral response function from a hyperspectral image to an RGB image, f_θ(·) denotes the neural network with parameters θ, and x is the input image; L_sup denotes the supervised loss, formalized as L_sup(θ) = mse(f_θ(x), hsi), where mse denotes the mean squared error loss.
PCT/CN2023/113283 2023-06-21 2023-08-16 Spectral cross-domain transfer super-resolution reconstruction method for multi-domain image WO2024082796A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310745724.8A CN116993584A (en) 2023-06-21 2023-06-21 Multi-domain image-oriented spectrum cross-domain migration super-resolution reconstruction method
CN202310745724.8 2023-06-21

Publications (1)

Publication Number Publication Date
WO2024082796A1 true WO2024082796A1 (en) 2024-04-25

Family

ID=88525616

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/113283 WO2024082796A1 (en) 2023-06-21 2023-08-16 Spectral cross-domain transfer super-resolution reconstruction method for multi-domain image

Country Status (2)

Country Link
CN (1) CN116993584A (en)
WO (1) WO2024082796A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232653A (en) * 2018-12-12 2019-09-13 天津大学青岛海洋技术研究院 The quick light-duty intensive residual error network of super-resolution rebuilding
CN111369433A (en) * 2019-11-12 2020-07-03 天津大学 Three-dimensional image super-resolution reconstruction method based on separable convolution and attention
CN111932461A (en) * 2020-08-11 2020-11-13 西安邮电大学 Convolutional neural network-based self-learning image super-resolution reconstruction method and system
CN114332649A (en) * 2022-03-07 2022-04-12 湖北大学 Cross-scene remote sensing image depth countermeasure transfer learning method based on dual-channel attention mechanism
US20220366536A1 (en) * 2021-04-13 2022-11-17 Hunan University High-resolution hyperspectral computational imaging method and system and medium


Also Published As

Publication number Publication date
CN116993584A (en) 2023-11-03


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23878791

Country of ref document: EP

Kind code of ref document: A1