WO2024082796A1 - Spectral cross-domain transfer super-resolution reconstruction method for multi-domain image - Google Patents

Spectral cross-domain transfer super-resolution reconstruction method for multi-domain image

Info

Publication number: WO2024082796A1 (PCT/CN2023/113283)
Authority: WIPO (PCT)
Prior art keywords: domain, spectral, image, model, cross
Application number: PCT/CN2023/113283
Other languages: French (fr), Chinese (zh)
Inventors: 张艳宁, 张磊, 魏巍, 任维鑫, 王昊宇
Original Assignee: 西北工业大学 (Northwestern Polytechnical University)
Application filed by 西北工业大学
Publication of WO2024082796A1


Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; G06T3/00 Geometric image transformations in the plane of the image; G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
        • G06T3/4046 Scaling using neural networks
        • G06T3/4053 Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology
        • G06N3/045 Combinations of networks; G06N3/0455 Auto-encoder networks; encoder-decoder networks
        • G06N3/0464 Convolutional networks [CNN, ConvNet]
        • G06N3/047 Probabilistic or stochastic networks
        • G06N3/0475 Generative networks
        • G06N3/048 Activation functions
    • G06N3/08 Learning methods
        • G06N3/084 Backpropagation, e.g. using gradient descent
        • G06N3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
        • G06N3/096 Transfer learning
        • G06N3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE; Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE; Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
        • Y02A40/10 Adaptation technologies in agriculture

Definitions

  • The present invention belongs to the technical field of image processing, and in particular relates to a spectral cross-domain transfer super-resolution reconstruction method.
  • Hyperspectral images are images that collect dozens or even hundreds of continuous spectral bands for each pixel in the visible and infrared spectrum. Compared with traditional RGB images, hyperspectral images provide richer spectral information and can identify the spectral characteristics of materials, thereby conducting a more detailed analysis of surface materials.
  • Hyperspectral images can be applied to environmental remote sensing, agriculture, forestry, geological exploration, urban planning and other fields. For example, in the agricultural field, hyperspectral images can be used to quickly identify, classify, monitor and manage crops, thereby improving crop yield and quality. In environmental monitoring, hyperspectral images can be used to identify and monitor harmful substances in water bodies, as well as to monitor vegetation coverage and land use changes. In the field of urban planning, hyperspectral images can be used to measure urban green space coverage and building heights, optimize urban planning and facility layout, etc.
  • In summary, hyperspectral images, as images with rich spectral information, have broad application prospects.
  • However, limited by the high price, slow imaging speed, and large size of hyperspectral cameras, hyperspectral images are not as widely used as ordinary cameras.
  • According to the reconstruction approach, the existing spectral super-resolution methods can be roughly divided into two categories.
  • One category comprises traditional methods, such as: (1) spectral super-resolution based on spectral decomposition: this method uses a spectral decomposition algorithm to decompose and reconstruct the spectral signal, thereby achieving spectral super-resolution. For example, the super-resolution spectral imaging technique "Coupled Nonnegative Matrix Factorization Unmixing for Hyperspectral and Multispectral Data Fusion", based on the non-negative matrix factorization (NMF) algorithm, decomposes and reconstructs the spectral signal to achieve super-resolution spectral imaging.
  • (2) Spectral super-resolution based on sparse representation: this method uses a sparse representation algorithm, such as one based on dictionary learning, to decompose and reconstruct the spectral signal. For example, the super-resolution spectral imaging technique "Spectral Reflectance Recovery from a Single RGB Image", based on a sparse representation algorithm, sparsely represents and reconstructs the spectral signal.
  • (3) Spectral super-resolution based on a spectral library and model: this method uses a spectral library and a model to train and optimize a model of the spectral signal. For example, super-resolution spectral imaging based on the partial least squares regression (PLSR) algorithm models and predicts spectral signals. These traditional methods often suffer from slow computation and poor reconstruction quality.
  • The other category is based on deep learning. This approach uses deep networks, such as convolutional neural networks (CNNs), e.g. "Pixel-aware Deep Function-mixture Network for Spectral Super-Resolution", or Transformers, e.g. "MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction", to train on and learn spectral signals, thereby achieving spectral super-resolution. Although deep-learning-based methods have made great progress in recent years and achieve excellent performance on a single dataset, their performance degrades severely when tested on scenes outside the training set.
  • The present invention provides a spectral cross-domain transfer super-resolution reconstruction method for multi-domain images, an image spectral cross-domain transfer super-resolution reconstruction method based on cross-domain transferable knowledge learning and rapid target-domain adaptation learning for multi-domain image scenes.
  • Spectral super-resolution reconstruction from RGB images to hyperspectral images is achieved.
  • A model structure based on a transferable dictionary learns features that can transfer across domains; a source-domain pre-training strategy based on a shared learnable mask encourages the model to learn general knowledge for reconstruction; and a model-agnostic meta-learning fine-tuning method learns a general model with strong generalization, so that a few iterations on the test data suffice to adapt to the target domain.
  • The present invention mines cross-domain shared knowledge to improve generalization, thereby improving the effect of cross-domain spectral super-resolution reconstruction.
  • Step 1: For an RGB image img ∈ ℝ^{3×h×w}, where h and w denote the height and width of the image, the corresponding hyperspectral image is denoted hsi ∈ ℝ^{31×h×w}; the image is passed through the embedding layer, e = embedding(img);
  • where embedding(·) denotes the embedding layer, instantiated by a convolutional layer with kernel size 3 and stride 1, and e denotes the hidden-layer features after embedding;
  • Step 2: Randomly mask the hidden-layer feature e obtained in step 1 in the form of cubes: randomly sample a cube of fixed size on the feature map, then replace the features at that position with a shared learnable mask;
  • SpectralTransformerBlock( ⁇ ) represents the inter-spectral Transformer module, s represents the hidden layer features obtained by the inter-spectral Transformer module; multiple inter-spectral Transformer modules are stacked in the spectral reconstruction model, and the output of the previous module is the input of the next module;
  • ⁇ i is a learnable scaling factor
  • WQ , WK , WV are learnable projection matrices
  • CrossAttention(S, Z) = attention(S W_Q, Z W_K, Z W_V), which stands for cross attention
  • map( ⁇ ) is a mapper, which maps the information of s to the latent space. It is instantiated by a multi-layer fully connected neural network and follows the information bottleneck structure in structure;
  • Generator( ⁇ ) represents the generator network, which receives the vector randomVector randomly sampled from the Gaussian distribution and the domain information of the image itself obtained from s through map and generates a transferable dictionary;
  • hsi′ represents the reconstructed hyperspectral image
  • the training process of the spectral reconstruction model is as follows:
  • a model-agnostic meta-learning fine-tuning method is used to learn a highly universal model parameter, so that the spectral reconstruction model can adapt to the characteristics of any domain after a few steps of self-supervised fine-tuning on the image of any domain.
  • the specific algorithm is as follows:
  • The specific method is: sample N tasks Task_i from the dataset, each consisting of K data pairs.
  • For each task, the self-supervised loss L_self is computed on the K examples, and the inner-loop model parameters are updated along its gradient: θ′_i = θ − α∇_θ L_self(f_θ).
  • For all tasks, the supervised loss L_sup is computed using the updated model parameters θ′_i, and the outer-loop model parameters are updated along its gradient: θ ← θ − β∇_θ Σ_i L_sup(f_{θ′_i}).
  • d( ⁇ ) represents the spectral response function from the hyperspectral image to the RGB image
  • f ⁇ ( ⁇ ) represents the neural network with parameter ⁇
  • x is the input image.
  • mse stands for mean square loss.
  • The present invention uses cube-level mask operations to force the model to learn the interactions between spectra and space; this interaction knowledge is shared among all spectral reconstructions. A generator is used to produce transferable patches, which are injected into the spectral reconstruction as cross-domain shared knowledge.
  • The above design ensures, from the perspective of the model, that cross-domain shared knowledge can be mined to improve generalization, thereby improving the effect of cross-domain spectral super-resolution reconstruction.
  • The model-agnostic meta-learning method does not directly learn the reconstruction from a single RGB image to a hyperspectral image; instead, it learns model parameters with strong generalization that, for any image, can rapidly adapt to the target domain within a few iterations, further enhancing the versatility of the model.
  • Together with self-supervised rapid fine-tuning in the target domain, the model acquires the ability of cross-domain transfer super-resolution reconstruction.
  • FIG. 1 is a schematic diagram of the model structure with the transferable dictionary.
  • It includes: a model structure design based on a transferable dictionary, to learn features that can be transferred across domains; a source-domain pre-training strategy based on a shared learnable mask, to encourage the model to learn general knowledge for reconstruction; and a model-agnostic meta-learning fine-tuning method, to learn a general model with strong generalization that can adapt to the target domain after a few iterations on the test data.
  • The image spectral cross-domain transfer super-resolution reconstruction method for multi-domain image scenes, based on cross-domain transferable knowledge learning and fast target-domain adaptation learning, includes the following steps:
  • Step 1: For an RGB image img ∈ ℝ^{3×h×w}, where h and w denote the height and width of the image, the corresponding hyperspectral image is denoted hsi ∈ ℝ^{31×h×w}.
  • The image is input to the encoding layer, which maps the number of channels of the input image from 3 to 31 to achieve a preliminary spectral reconstruction and alignment:
  • e = embedding(img)
  • where embedding(·) denotes the embedding layer, instantiated by a convolutional layer with kernel size 3 and stride 1, and e denotes the hidden-layer features after embedding.
  • Step 2: Randomly mask the hidden features obtained in step 1 in the form of cubes: randomly sample cubes of fixed size on the feature map, then replace the features at those positions with a shared learnable mask.
  • SpectralTransformerBlock( ⁇ ) represents the inter-spectral Transformer module
  • s represents the hidden layer features obtained by the inter-spectral Transformer module. Note that we stack multiple modules in the model, and the output of the previous module is the input of the next module.
  • ⁇ i is a learnable scaling factor
  • WQ , WK , WV are learnable projection matrices
  • X is the input tensor, obtained by rearranging (reshaping) the feature tensor of the input image.
  • CrossAttention(S, Z) = attention(S W_Q, Z W_K, Z W_V), which stands for cross attention.
  • map( ⁇ ) is a mapper, which maps the information of s to the latent space. It is instantiated by a multi-layer fully connected neural network and follows the information bottleneck structure.
  • Generator( ⁇ ) represents the generator network, which receives the vector randomVector randomly sampled from the Gaussian distribution and the domain information of the image itself obtained from s through map and generates a transferable dictionary.
  • Step 5: Use the inter-spectral Transformer module of step 3 again to refine the features, finally obtaining the reconstructed hyperspectral image, expressed as: hsi′ = SpectralTransformerBlock(c)
  • where hsi′ represents the reconstructed hyperspectral image
  • d( ⁇ ) represents the spectral response function from the hyperspectral image to the RGB image
  • f ⁇ ( ⁇ ) represents the neural network with parameter ⁇
  • x is the input image.
  • mse stands for mean square loss.
  • The present invention provides an image spectral cross-domain transfer super-resolution reconstruction method based on cross-domain transferable knowledge learning and fast target-domain adaptation learning for multi-domain image scenes.
  • the specific process is as follows:
  • The training set consists of RGB and hyperspectral image pairs {img_i, hsi_i}; for a given test set, hsi_i may not exist.
  • both RGB data and hyperspectral data are normalized to the range of [0, 1].
  • random cropping, random horizontal flipping, and random vertical flipping are used for the input image img i and its corresponding hsi i to enhance the generalization ability of the model.
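The preprocessing above (normalization to [0, 1], then identical random crop and flips applied to the RGB image and its hyperspectral counterpart) can be sketched as follows; the shapes, crop size, and helper names are illustrative assumptions, not the patented pipeline:

```python
import numpy as np

def normalize01(x):
    """Scale an image array to the range [0, 1]."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo + 1e-12)

def augment_pair(img, hsi, crop=8, rng=None):
    """Apply the same random crop / horizontal flip / vertical flip to an
    RGB image (3,H,W) and its hyperspectral counterpart (31,H,W)."""
    if rng is None:
        rng = np.random.default_rng()
    _, h, w = img.shape
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    img = img[:, top:top+crop, left:left+crop]
    hsi = hsi[:, top:top+crop, left:left+crop]
    if rng.random() < 0.5:                       # horizontal flip
        img, hsi = img[:, :, ::-1], hsi[:, :, ::-1]
    if rng.random() < 0.5:                       # vertical flip
        img, hsi = img[:, ::-1, :], hsi[:, ::-1, :]
    return img.copy(), hsi.copy()

rng = np.random.default_rng(5)
img = normalize01(rng.standard_normal((3, 16, 16)))   # toy RGB image
hsi = normalize01(rng.standard_normal((31, 16, 16)))  # toy hyperspectral image
a_img, a_hsi = augment_pair(img, hsi, crop=8, rng=rng)
print(a_img.shape, a_hsi.shape)  # (3, 8, 8) (31, 8, 8)
```

The key design point is that crop coordinates and flip decisions are drawn once and applied to both images, so the pair stays spatially aligned.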
  • The support set is used to compute the self-supervised loss L_self of the inner loop.
  • The query set is used to compute the supervised loss L_sup of the outer loop.
  • the initial weights of the model are loaded with the pre-trained weights obtained by the preliminary pre-training based on random mini-batches, and the mask structure is removed to ensure that the model calculation graph of the meta-learning optimization process is as consistent as possible with the fine-tuning and inference during testing.
  • The iterative update of θ′_i lasts for p steps. If p is too small, it is difficult to iterate to a solution adapted to the task; if p is too large, the local parameters overfit the task. Therefore p is generally set to 10.
  • the learning rates ⁇ and ⁇ for the inner and outer layers are 1e-5 and 1e-6 respectively.
  • the trained model can be used to perform spectral super-resolution reconstruction from RGB to hyperspectral images.
  • the specific algorithm flow is shown in Algorithm 2:
  • the final algorithm output hsi′ is the hyperspectral image reconstructed from the input RGB image img.
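Hypothetically, the test-time flow just described (p steps of self-supervised fine-tuning on the test RGB image alone, then a forward pass to obtain hsi′) can be sketched with a deliberately tiny stand-in model; f, d, the learning rate, and the 1-D data are illustrative assumptions, not the patented network:

```python
import numpy as np

# Toy stand-ins: f is the reconstruction "network", d the spectral response
# function mapping a hyperspectral estimate back to RGB.
def f(theta, x): return theta * x
def d(h): return 0.5 * h
def self_loss(theta, img):
    """Self-supervised loss mse(d(f(img)), img) on the test image."""
    return float(np.mean((d(f(theta, img)) - img) ** 2))

def adapt_and_infer(theta, img, p=10, lr=0.5):
    """Test-time adaptation: p self-supervised gradient steps on the test
    RGB image (no hyperspectral label needed), then one forward pass."""
    for _ in range(p):
        r = d(f(theta, img)) - img                        # RGB round-trip residual
        theta -= lr * float(np.mean(2.0 * r * 0.5 * img)) # analytic gradient
    return f(theta, img), theta

rng = np.random.default_rng(4)
img = rng.standard_normal(16)            # stand-in test RGB image
loss_before = self_loss(0.0, img)
hsi_prime, theta = adapt_and_infer(0.0, img, p=10)
loss_after = self_loss(theta, img)
print(loss_after < loss_before)  # True: adaptation reduced the self-supervised loss
```

Only the reconstruction-consistency loss is used at test time, which is what lets the model adapt to a target domain whose hyperspectral ground truth does not exist.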


Abstract

A spectral cross-domain transfer super-resolution reconstruction method for a multi-domain image. By means of a spectral image cross-domain transfer super-resolution reconstruction method based on cross-domain transferable knowledge learning and rapid target-domain adaptation learning for multi-domain image scenarios, spectral super-resolution reconstruction from an RGB image to a hyperspectral image is realized. A model structure design based on a transferable dictionary is used to learn cross-domain transferable features; a source-domain pre-training policy based on a shared learnable mask helps the model learn general knowledge for reconstruction; and a model-agnostic meta-learning fine-tuning method is used to learn a universal model with a strong generalization capability, such that the model can adapt to the data of the test-time target domain after a few iterations on the test data. The method can mine cross-domain shared knowledge to improve the generalization capability, thereby improving the effect of spectral cross-domain super-resolution reconstruction.

Description

A Spectral Cross-Domain Transfer Super-Resolution Reconstruction Method for Multi-Domain Images

Technical Field

The present invention belongs to the technical field of image processing, and in particular relates to a spectral cross-domain transfer super-resolution reconstruction method.
Background Art

Hyperspectral images are images that collect dozens or even hundreds of contiguous spectral bands for each pixel over the visible and infrared spectrum. Compared with traditional RGB images, hyperspectral images provide richer spectral information and can identify the spectral characteristics of materials, enabling a more detailed analysis of surface materials.

Hyperspectral images can be applied to environmental remote sensing, agriculture, forestry, geological exploration, urban planning, and other fields. For example, in agriculture, hyperspectral images can be used to quickly identify, classify, monitor, and manage crops, improving crop yield and quality. In environmental monitoring, hyperspectral images can be used to identify and monitor harmful substances in water bodies, and to monitor vegetation coverage and land-use changes. In urban planning, hyperspectral images can be used to measure urban green-space coverage and building heights and to optimize urban planning and facility layout.

In summary, hyperspectral images, as images with rich spectral information, have broad application prospects.

However, because hyperspectral cameras are expensive, slow to image, and bulky, hyperspectral images are not as widely used as ordinary cameras. To fully exploit the advantages of hyperspectral images while circumventing the problems of hyperspectral imaging equipment, researchers have proposed spectral super-resolution methods, which aim to estimate and reconstruct hyperspectral images from traditional RGB images.

According to the reconstruction approach, the existing spectral super-resolution methods can be roughly divided into two categories. One comprises traditional methods, such as: (1) spectral super-resolution based on spectral decomposition: a spectral decomposition algorithm decomposes and reconstructs the spectral signal, thereby achieving spectral super-resolution. For example, the super-resolution spectral imaging technique "Coupled Nonnegative Matrix Factorization Unmixing for Hyperspectral and Multispectral Data Fusion", based on the non-negative matrix factorization (NMF) algorithm, decomposes and reconstructs the spectral signal to achieve super-resolution spectral imaging. (2) Spectral super-resolution based on sparse representation: a sparse representation algorithm, such as one based on dictionary learning, decomposes and reconstructs the spectral signal. For example, the super-resolution spectral imaging technique "Spectral Reflectance Recovery from a Single RGB Image", based on a sparse representation algorithm, sparsely represents and reconstructs the spectral signal. (3) Spectral super-resolution based on a spectral library and model: a spectral library and a model are used to train and optimize a model of the spectral signal; for example, super-resolution spectral imaging based on the partial least squares regression (PLSR) algorithm models and predicts spectral signals. These traditional methods often suffer from slow computation and poor reconstruction quality. The other category is based on deep learning, using deep networks such as convolutional neural networks (CNNs), e.g. "Pixel-aware Deep Function-mixture Network for Spectral Super-Resolution", or Transformers, e.g. "MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction", to train on and learn spectral signals. Although deep-learning-based methods have made great progress in recent years and achieve excellent performance on a single dataset, their performance degrades severely when tested on scenes outside the training set.
Summary of the Invention

To overcome the shortcomings of the prior art, the present invention provides a spectral cross-domain transfer super-resolution reconstruction method for multi-domain images, an image spectral cross-domain transfer super-resolution reconstruction method based on cross-domain transferable knowledge learning and rapid target-domain adaptation learning for multi-domain image scenes, which achieves spectral super-resolution reconstruction from RGB images to hyperspectral images. A model structure based on a transferable dictionary learns features that can transfer across domains; a source-domain pre-training strategy based on a shared learnable mask encourages the model to learn general knowledge for reconstruction; and a model-agnostic meta-learning fine-tuning method learns a general model with strong generalization, so that a few iterations on the test data suffice to adapt to the target domain. The present invention mines cross-domain shared knowledge to improve generalization, thereby improving the effect of cross-domain spectral super-resolution reconstruction.

The technical solution adopted by the present invention to solve its technical problem includes the following steps:
Step 1: For an RGB image img ∈ ℝ^{3×h×w}, where h and w denote the height and width of the image, the corresponding hyperspectral image is denoted hsi ∈ ℝ^{31×h×w}.

The image is input to the encoding layer, which maps the number of channels of the input image from 3 to 31 to achieve a preliminary spectral reconstruction and alignment:

e = embedding(img)

where embedding(·) denotes the embedding layer, instantiated by a convolutional layer with kernel size 3 and stride 1, and e denotes the hidden-layer features after embedding.
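As an illustrative sketch (not the patented implementation), the embedding step can be mimicked in NumPy with a naive 3×3, stride-1 convolution that lifts the 3 RGB channels to 31 spectral feature channels; the weights, shapes, and helper names here are all hypothetical:

```python
import numpy as np

def conv2d(x, w, b, stride=1, pad=1):
    """Naive 2-D convolution: x (C_in,H,W), w (C_out,C_in,k,k), b (C_out,)."""
    c_out, _, k, _ = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h_out = (xp.shape[1] - k) // stride + 1
    w_out = (xp.shape[2] - k) // stride + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[:, i*stride:i*stride+k, j*stride:j*stride+k]
            # contract w's (C_in,k,k) axes against the patch
            out[:, i, j] = np.tensordot(w, patch, axes=3) + b
    return out

def embedding(img, w, b):
    """Embedding layer: map 3 RGB channels to 31 spectral feature channels."""
    return conv2d(img, w, b, stride=1, pad=1)

rng = np.random.default_rng(0)
img = rng.standard_normal((3, 8, 8))           # RGB input, h = w = 8
w = rng.standard_normal((31, 3, 3, 3)) * 0.1   # kernel size 3
b = np.zeros(31)
e = embedding(img, w, b)                       # hidden-layer feature e
print(e.shape)  # (31, 8, 8): stride 1 with padding 1 preserves h and w
```

Stride 1 with same-padding keeps the spatial size, so only the channel dimension changes (3 to 31), matching the "preliminary spectral reconstruction and alignment" role described above.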
Step 2: Randomly mask the hidden-layer feature e obtained in step 1 in the form of cubes: randomly sample a cube of fixed size on the feature map, then replace the features at that position with a shared learnable mask.
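The cube-masking step might be sketched as follows; the cube size, the zero-initialized mask token, and the single-cube sampling are illustrative assumptions (in training, the token would be a learnable parameter shared across all masked positions):

```python
import numpy as np

def cube_mask(e, mask_token, cube=4, rng=None):
    """Replace one randomly placed cube x cube spatial block of the feature
    map e (C,H,W) with the shared mask token (C,); e itself is untouched."""
    if rng is None:
        rng = np.random.default_rng()
    _, h, w = e.shape
    top = int(rng.integers(0, h - cube + 1))
    left = int(rng.integers(0, w - cube + 1))
    out = e.copy()
    # broadcast the (C,) token over the cube's spatial extent
    out[:, top:top+cube, left:left+cube] = mask_token[:, None, None]
    return out, (top, left)

rng = np.random.default_rng(1)
e = rng.standard_normal((31, 8, 8))   # hidden-layer feature from step 1
mask_token = np.zeros(31)             # shared learnable parameter in training
masked, (top, left) = cube_mask(e, mask_token, cube=4, rng=rng)
print(np.allclose(masked[:, top:top+4, left:left+4], 0.0))  # True
```

Because the masked cube spans all 31 channels at once, reconstructing it forces the model to use both spatial context and inter-spectral correlations, which is the stated purpose of the cube-level mask.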
Step 3: The hidden feature e obtained in step 1 is refined by the inter-spectral attention module, expressed as:

s = SpectralTransformerBlock(e)

where SpectralTransformerBlock(·) denotes the inter-spectral Transformer module, and s denotes the hidden-layer features it produces. Multiple inter-spectral Transformer modules are stacked in the spectral reconstruction model, the output of each module being the input of the next. SpectralTransformerBlock(·) is composed of SpectralAttention, an FFN, and LayerNorm:

SpectralTransformerBlock(x) = t + FFN(LayerNorm(t))

where t = x + SpectralAttention(LayerNorm(x)), LayerNorm denotes the layer normalization operation, FFN(x) = conv(gelu(conv(gelu(conv(x))))), conv denotes a convolutional layer, gelu is a nonlinear activation function, and x denotes an input tensor;

attention(Q, K, V) = softmax(σ_i Q K^T) V
SpectralAttention(X) = attention(X W_Q, X W_K, X W_V)

where σ_i is a learnable scaling factor, W_Q, W_K, W_V are learnable projection matrices, and X is the input tensor, obtained by rearranging the shape of the feature tensor of the input image.
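Under the formulas above, a minimal NumPy sketch of one inter-spectral Transformer block could look like the following. The dimensions are hypothetical, rows of X are taken as spectral channels (so the attention map is c×c), and a two-layer dense FFN stands in for the conv-gelu-conv FFN of the original; none of these simplifications come from the patent:

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def attention(q, k, v, sigma=1.0):
    # attention(Q, K, V) = softmax(sigma * Q K^T) V
    return softmax(sigma * q @ k.T) @ v

def spectral_attention(x, wq, wk, wv, sigma=1.0):
    # rows of x are spectral channels, so the attention map is c x c
    return attention(x @ wq, x @ wk, x @ wv, sigma)

def ffn(x, w1, w2):
    # simplified dense stand-in for the conv-gelu-conv feed-forward network
    return gelu(gelu(x @ w1) @ w2)

def spectral_transformer_block(x, wq, wk, wv, w1, w2, sigma=1.0):
    t = x + spectral_attention(layer_norm(x), wq, wk, wv, sigma)
    return t + ffn(layer_norm(t), w1, w2)

rng = np.random.default_rng(0)
c, n = 31, 16                       # 31 spectral channels, 4x4 spatial positions
x = rng.standard_normal((c, n))
wq, wk, wv = (rng.standard_normal((n, n)) * 0.1 for _ in range(3))
w1 = rng.standard_normal((n, 2 * n)) * 0.1
w2 = rng.standard_normal((2 * n, n)) * 0.1
s = spectral_transformer_block(x, wq, wk, wv, w1, w2)
print(s.shape)  # (31, 16): residual connections preserve the shape
```

Because attention is computed across the channel dimension rather than across spatial positions, its cost grows with the number of bands, not with image size, which is what makes this "spectral-wise" design practical for reconstruction.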
Step 4: A generator network instantiated by a multi-layer fully connected neural network generates a transferable dictionary. The hidden-layer feature s is then split along the spatial dimension into feature blocks of a fixed size, and a cross-attention mechanism lets the transferable dictionary generated by the generator interact with the feature blocks of s, injecting cross-domain shared knowledge into the feature map, formalized as:

z = Generator(randomVector + map(s))
c = CrossAttention(s, z)

where CrossAttention(S, Z) = attention(S W_Q, Z W_K, Z W_V) denotes cross-attention; map(·) is a mapper that projects the information of s into a latent space, instantiated by a multi-layer fully connected neural network whose structure follows the information-bottleneck principle; Generator(·) denotes the generator network, which receives a vector randomVector randomly sampled from a Gaussian distribution and the domain information of the image itself, obtained from s through map, and generates the transferable dictionary.
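A hedged NumPy sketch of step 4: the mapper, generator, and cross-attention below are toy stand-ins (the dictionary size m, feature dimension d, pooling, and all weights are hypothetical), but they follow the formulas z = Generator(randomVector + map(s)) and c = CrossAttention(s, z):

```python
import numpy as np

n, m, d = 64, 8, 16   # n feature patches, m dictionary atoms, feature dim d

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(s, z, wq, wk, wv):
    # CrossAttention(S, Z) = attention(S Wq, Z Wk, Z Wv):
    # queries come from the feature patches, keys/values from the dictionary
    q, k, v = s @ wq, z @ wk, z @ wv
    return softmax(q @ k.T) @ v

def mapper(s, w_map):
    # information-bottleneck-style mapper: pooled features -> domain code
    return np.tanh(s.mean(axis=0) @ w_map)

def generator(random_vector, domain_code, w_gen):
    # MLP generator: Gaussian noise + domain code -> dictionary of m atoms
    return np.tanh((random_vector + domain_code) @ w_gen).reshape(m, d)

rng = np.random.default_rng(2)
s = rng.standard_normal((n, d))               # hidden features split into patches
w_map = rng.standard_normal((d, d)) * 0.1
w_gen = rng.standard_normal((d, m * d)) * 0.1
wq, wk, wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

random_vector = rng.standard_normal(d)        # sampled from a Gaussian
z = generator(random_vector, mapper(s, w_map), w_gen)   # transferable dictionary
c = cross_attention(s, z, wq, wk, wv)         # shared knowledge injected
print(z.shape, c.shape)  # (8, 16) (64, 16)
```

Each output row of c is a mixture of dictionary atoms, so the per-image features are re-expressed in terms of the generated, domain-conditioned dictionary rather than purely image-specific content.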
Step 5: Use the inter-spectral Transformer module again to refine the features, finally obtaining the reconstructed hyperspectral image, expressed as:
hsi′ = SpectralTransformerBlock(c)
where hsi′ denotes the reconstructed hyperspectral image.
Preferably, the training process of the spectral reconstruction model is as follows:
A model-agnostic meta-learning fine-tuning method is used to learn highly general model parameters, so that the spectral reconstruction model can adapt to the characteristics of any domain after a few steps of self-supervised fine-tuning on images from that domain. The algorithm is as follows:
First, initialize the model parameters;
Then construct the training data: sample N Tasks from the dataset, where each Task consists of K data pairs.
For each Task, compute the self-supervised loss L_self on the K example data and perform the inner-layer model parameter update along its gradient:
θ′_i = θ − α∇_θ L_self(θ)
For all Tasks, compute the supervised loss L_sup using the updated model parameters θ′_i and perform the outer-layer model parameter update along its gradient:
θ ← θ − β∇_θ Σ_i L_sup(θ′_i)
Here L_self denotes the self-supervised loss, formalized as L_self(θ) = mse(d(f_θ(x)), x), where d(·) denotes the spectral response function from a hyperspectral image to an RGB image, f_θ(·) denotes the neural network with parameters θ, and x is the input image; L_sup denotes the supervised loss, formalized as L_sup(θ) = mse(f_θ(x), hsi), where mse denotes the mean squared error loss.
The beneficial effects of the present invention are as follows:
General deep learning models rely too heavily on memorizing the training dataset, overfit to it, and fail to fully learn knowledge that can be shared across domains. The present invention uses cube-level mask operations to force the model to learn inter-spectral and spatial interactions, knowledge that is shared across all spectral reconstruction tasks, and uses a generator to produce transferable patches that are injected into the spectral reconstruction as cross-domain shared knowledge. This design ensures, from the model's perspective, that cross-domain shared knowledge can be mined, improving generalization and thereby the quality of cross-domain spectral super-resolution reconstruction. The model-agnostic meta-learning method does not directly learn the reconstruction from a single RGB image to a hyperspectral image; instead, it learns model parameters with strong generalization that, for any image, can rapidly adapt to the target domain within a few iterations, further enhancing the model's versatility. Combined with self-supervised rapid fine-tuning in the target domain, this gives the model the ability to perform cross-domain transfer super-resolution reconstruction.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of the model structure based on the transferable dictionary.
DETAILED DESCRIPTION
The present invention is further described below in conjunction with the accompanying drawings and embodiments.
As shown in FIG. 1, in order to retain the high performance of deep-learning-based spectral super-resolution methods while alleviating the severe performance degradation observed in scenes outside the training set, we propose a spectral cross-domain transfer super-resolution reconstruction method for multi-domain image scenes, based on cross-domain transferable knowledge learning and rapid target-domain adaptation, for spectral super-resolution reconstruction from RGB images to hyperspectral images. It comprises a model structure design based on a transferable dictionary, used to learn features that can be transferred across domains; a source-domain pre-training strategy based on a shared learnable mask, used to encourage the model to learn general knowledge for reconstruction; and a fine-tuning method based on model-agnostic meta-learning, used to learn a general model with strong generalization ability, so that a few iterations on the test data suffice to adapt it to the target test domain.
A spectral cross-domain transfer super-resolution reconstruction method for multi-domain image scenes, based on cross-domain transferable knowledge learning and rapid target-domain adaptation, comprises the following aspects and steps:
Spectral reconstruction model structure:
Step 1: For an RGB image img, where h and w denote the height and width of the image, its label hsi denotes the hyperspectral image corresponding to the input image img. The image is fed into the encoding layer, which maps the number of channels of the input image from 3 to 31, achieving a preliminary spectral reconstruction and alignment:
e = embedding(img)
where embedding(·) denotes the embedding layer, instantiated as a convolutional layer with kernel size 3 and stride 1, and e denotes the hidden-layer features after embedding.
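As a concrete illustration, the embedding layer described above can be sketched in PyTorch; the padding value is an assumption not stated in the text, chosen so that the spatial size is preserved:

```python
import torch
import torch.nn as nn

# Embedding layer from step 1: a 3x3, stride-1 convolution mapping the
# 3 RGB channels to 31 spectral channels. padding=1 is an assumption that
# keeps the spatial resolution unchanged.
embedding = nn.Conv2d(in_channels=3, out_channels=31,
                      kernel_size=3, stride=1, padding=1)

img = torch.randn(1, 3, 64, 64)   # dummy RGB image
e = embedding(img)
print(e.shape)                    # torch.Size([1, 31, 64, 64])
```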
Step 2: Randomly mask the hidden-layer features obtained in step 1 in the form of cubes: randomly sample cubes of a fixed size over the image, and replace the features at those positions with a shared learnable mask.
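The cube-level random masking of step 2 can be sketched as follows; the cube size, the number of cubes, and the zero initialization of the shared mask token are illustrative assumptions:

```python
import torch
import torch.nn as nn

class CubeMask(nn.Module):
    """Replace randomly sampled fixed-size spatial cubes of the feature map
    with a single shared learnable mask token (illustrative sketch)."""
    def __init__(self, channels=31, cube=8, num_cubes=4):
        super().__init__()
        # one mask vector shared across all masked positions
        self.mask_token = nn.Parameter(torch.zeros(channels))
        self.cube, self.num_cubes = cube, num_cubes

    def forward(self, e):                      # e: (B, C, H, W)
        b, c, h, w = e.shape
        e = e.clone()
        for _ in range(self.num_cubes):
            ys = torch.randint(0, h - self.cube + 1, (b,))
            xs = torch.randint(0, w - self.cube + 1, (b,))
            for i in range(b):                 # overwrite cube with mask token
                e[i, :, ys[i]:ys[i] + self.cube, xs[i]:xs[i] + self.cube] = \
                    self.mask_token[:, None, None]
        return e

m = CubeMask()
out = m(torch.randn(2, 31, 32, 32))
print(out.shape)  # torch.Size([2, 31, 32, 32])
```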
Step 3: Refine the hidden features e obtained in step 2 using the inter-spectral attention module, expressed as:
s = SpectralTransformerBlock(e)
where SpectralTransformerBlock(·) denotes the inter-spectral Transformer module and s denotes the hidden-layer features obtained through it. Note that the model stacks several of these modules, with the output of each module serving as the input of the next. SpectralTransformerBlock(·) consists of SpectralAttention, an FFN, and LayerNorm:
SpectralTransformerBlock(x) = t + FFN(LayerNorm(t))
where t = x + SpectralAttention(LayerNorm(x)), LayerNorm denotes the layer normalization operation, and FFN(x) = conv(gelu(conv(gelu(conv(x))))), where conv denotes a convolutional layer, gelu is a nonlinear activation function, and x denotes an input tensor.
attention(Q, K, V) = softmax(σ_i·QK^T)·V
SpectralAttention(X) = attention(XW_Q, XW_K, XW_V)
where σ_i is a learnable scaling factor, W_Q, W_K, W_V are learnable projection matrices, and X is the input tensor, obtained by reshaping the feature tensor of the input image.
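A minimal PyTorch sketch of the inter-spectral attention above: the feature tensor is reshaped so that attention is computed across the spectral (channel) dimension, making QK^T a C×C matrix scaled by the learnable σ. The single-head form and the channel count are assumptions:

```python
import torch
import torch.nn as nn

class SpectralAttention(nn.Module):
    """Self-attention over the spectral (channel) dimension: the C channel
    maps act as tokens, so the attention matrix is C x C, scaled by a
    learnable sigma (illustrative single-head sketch)."""
    def __init__(self, channels=31):
        super().__init__()
        self.wq = nn.Linear(channels, channels, bias=False)
        self.wk = nn.Linear(channels, channels, bias=False)
        self.wv = nn.Linear(channels, channels, bias=False)
        self.sigma = nn.Parameter(torch.ones(1))   # learnable scaling factor

    def forward(self, x):                          # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)           # reshape to (B, HW, C)
        q, k, v = self.wq(t), self.wk(t), self.wv(t)
        # attend across channels: (B, C, HW) @ (B, HW, C) -> (B, C, C)
        attn = torch.softmax(self.sigma * q.transpose(1, 2) @ k, dim=-1)
        out = (attn @ v.transpose(1, 2)).reshape(b, c, h, w)
        return out

sa = SpectralAttention()
y = sa(torch.randn(2, 31, 16, 16))
print(y.shape)  # torch.Size([2, 31, 16, 16])
```

The full SpectralTransformerBlock would wrap this in the two residual branches given above (t = x + SpectralAttention(LayerNorm(x)), then t + FFN(LayerNorm(t))).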
Step 4: Use a generator network instantiated as a multi-layer fully connected neural network to generate a transferable dictionary, then split the hidden-layer features s into feature blocks of a fixed size along the spatial dimensions, and apply a cross-attention mechanism between the generated transferable dictionary and the feature blocks of s, so as to inject cross-domain shared knowledge into the feature map. This is formalized as:
z = Generator(randomVector + map(s))
c = CrossAttention(s, z)
where CrossAttention(S, Z) = attention(SW_Q, ZW_K, ZW_V) denotes cross-attention; map(·) is a mapper that projects the information of s into the latent space, instantiated as a multi-layer fully connected neural network whose structure follows an information-bottleneck design; Generator(·) denotes the generator network, which receives a vector randomVector randomly sampled from a Gaussian distribution together with the domain information of the image itself, obtained from s via map, and generates the transferable dictionary.
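The mapper, generator, and cross-attention of step 4 can be sketched as follows; all layer sizes, the number of dictionary atoms, and mean-pooling as the aggregation of the feature blocks are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TransferableDictionary(nn.Module):
    """Sketch: an MLP generator produces a dictionary of transferable atoms
    conditioned on a Gaussian random vector plus domain information mapped
    from the features s; cross-attention injects the atoms into the feature
    blocks. Sizes (dim, atoms, latent) are illustrative assumptions."""
    def __init__(self, dim=31, atoms=16, latent=64):
        super().__init__()
        # information-bottleneck mapper: features -> narrow -> latent code
        self.map = nn.Sequential(nn.Linear(dim, 8), nn.ReLU(),
                                 nn.Linear(8, latent))
        # generator: latent code -> dictionary of `atoms` vectors
        self.gen = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(),
                                 nn.Linear(128, atoms * dim))
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)
        self.atoms, self.dim = atoms, dim

    def forward(self, s):                       # s: (B, N, dim) feature blocks
        dom = self.map(s.mean(dim=1))           # pooled domain information
        z = self.gen(torch.randn_like(dom) + dom).view(-1, self.atoms, self.dim)
        q, k, v = self.wq(s), self.wk(z), self.wv(z)
        attn = torch.softmax(q @ k.transpose(1, 2) / self.dim ** 0.5, dim=-1)
        return s + attn @ v                     # inject shared knowledge

d = TransferableDictionary()
c = d(torch.randn(2, 64, 31))
print(c.shape)  # torch.Size([2, 64, 31])
```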
Step 5: Use the inter-spectral Transformer module from step 3 again to refine the features, finally obtaining the reconstructed hyperspectral image, which can be expressed as:
hsi′ = SpectralTransformerBlock(c)
where hsi′ denotes the reconstructed hyperspectral image.
Model-agnostic meta-learning fine-tuning method:
To achieve good performance on images from multiple domains, we propose a model-agnostic meta-learning fine-tuning method that learns highly general model parameters, so that the model can adapt well to the characteristics of any domain after a few steps of self-supervised fine-tuning on images from that domain. The algorithm flow is shown in Algorithm 1.
Here L_self denotes the self-supervised loss, formalized as L_self(θ) = mse(d(f_θ(x)), x), where d(·) denotes the spectral response function from a hyperspectral image to an RGB image, f_θ(·) denotes the neural network with parameters θ, and x is the input image; L_sup denotes the supervised loss, formalized as L_sup(θ) = mse(f_θ(x), hsi), where mse denotes the mean squared error loss.
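The inner self-supervised update and outer supervised update can be sketched with a first-order approximation of model-agnostic meta-learning. The first-order simplification, the SGD inner optimizer, and the stand-in networks in the demonstration are assumptions; the method described here may well use full second-order meta-gradients:

```python
import copy
import torch
import torch.nn.functional as F

def maml_outer_step(model, tasks, d, alpha=1e-5, beta=1e-6, inner_steps=10):
    """First-order MAML sketch: the inner loop adapts a copy of the model
    with the self-supervised loss mse(d(f(x)), x); the outer loop
    accumulates the supervised loss of the adapted parameters and updates
    the shared parameters theta."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for rgb_support, (rgb_query, hsi_query) in tasks:
        adapted = copy.deepcopy(model)                     # theta'_i starts at theta
        opt = torch.optim.SGD(adapted.parameters(), lr=alpha)
        for _ in range(inner_steps):                       # inner, self-supervised
            loss_self = F.mse_loss(d(adapted(rgb_support)), rgb_support)
            opt.zero_grad(); loss_self.backward(); opt.step()
        loss_sup = F.mse_loss(adapted(rgb_query), hsi_query)   # outer loss
        grads = torch.autograd.grad(loss_sup, list(adapted.parameters()))
        for g_acc, g in zip(meta_grads, grads):
            g_acc += g                                     # first-order approximation
    with torch.no_grad():                                  # outer update on theta
        for p, g in zip(model.parameters(), meta_grads):
            p -= beta * g

# Tiny demonstration with a stand-in 3->3 model and an identity response d.
net = torch.nn.Conv2d(3, 3, 1)
x = torch.randn(2, 3, 4, 4)
maml_outer_step(net, [(x, (x, torch.randn(2, 3, 4, 4)))], d=lambda h: h,
                inner_steps=2)
```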
Specific embodiment:
The present invention provides a spectral cross-domain transfer super-resolution reconstruction method for multi-domain image scenes, based on cross-domain transferable knowledge learning and rapid target-domain adaptation. The specific process is as follows:
1、数据预处理1. Data preprocessing
For a given training set there exist RGB and hyperspectral image pairs {img_i, hsi_i}; for a given test set, hsi_i may not exist. When training the model, both the RGB data and the hyperspectral data are normalized to the range [0, 1].
In addition, data augmentation by random cropping, random horizontal flipping, and random vertical flipping is applied to the input image img_i and its corresponding hsi_i to enhance the generalization ability of the model.
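The paired augmentation can be sketched as follows; the crop size is an illustrative assumption, and the same spatial transform must be applied to the RGB image and its hyperspectral label:

```python
import torch

def augment_pair(img, hsi, crop=64):
    """Paired augmentation sketch: random crop plus random horizontal and
    vertical flips, applied identically to RGB (3, H, W) and HSI (31, H, W)."""
    _, h, w = img.shape
    top = torch.randint(0, h - crop + 1, (1,)).item()
    left = torch.randint(0, w - crop + 1, (1,)).item()
    img = img[:, top:top + crop, left:left + crop]
    hsi = hsi[:, top:top + crop, left:left + crop]
    if torch.rand(1).item() < 0.5:            # random horizontal flip
        img, hsi = img.flip(-1), hsi.flip(-1)
    if torch.rand(1).item() < 0.5:            # random vertical flip
        img, hsi = img.flip(-2), hsi.flip(-2)
    return img, hsi

img_a, hsi_a = augment_pair(torch.rand(3, 128, 128), torch.rand(31, 128, 128))
print(img_a.shape, hsi_a.shape)  # torch.Size([3, 64, 64]) torch.Size([31, 64, 64])
```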
2. Preliminary pre-training based on random mini-batches
Since training the model-agnostic meta-learning method from scratch leads to unstable learning, and the self-supervised step in the inner loop of meta-learning reduces the utilization of supervised data, the model is first trained with random mini-batches so that it converges quickly and stably to a good starting point. Specifically, n samples are drawn from the training set to form a batch and fed to the randomly initialized model; Adam is used as the optimizer and mrae as the loss function, with an initial learning rate of 4e-4 gradually decayed by cosine annealing, training for 150 batches.
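The pre-training recipe above (Adam, mrae loss, initial learning rate 4e-4, cosine annealing over 150 batches) can be sketched as follows; the stand-in single-layer model and the dummy data are placeholders for the real network and dataset:

```python
import torch
import torch.nn as nn

def mrae(pred, target, eps=1e-8):
    """Mean relative absolute error, the pre-training loss named in the text."""
    return (torch.abs(pred - target) / (torch.abs(target) + eps)).mean()

# Stand-in model: any RGB -> 31-band network would slot in here.
model = nn.Conv2d(3, 31, kernel_size=3, padding=1)
opt = torch.optim.Adam(model.parameters(), lr=4e-4)      # initial lr, per the text
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=150)

for step in range(150):                                  # 150 batches, per the text
    img, hsi = torch.rand(4, 3, 32, 32), torch.rand(4, 31, 32, 32)  # dummy batch
    loss = mrae(model(img), hsi)
    opt.zero_grad(); loss.backward(); opt.step()
    sched.step()                                         # cosine learning-rate decay
```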
3. Model-agnostic meta-learning training
A set of tasks is sampled from the dataset. Each task comprises a support set and a query set; for simplicity, each is instantiated as K mutually non-overlapping sample pairs of RGB and hyperspectral images {(img_i, hsi_i), i = 1, 2, ..., K}. The support set is used to compute the inner-loop self-supervised loss L_self, and the query set is used to compute the outer-loop supervised loss L_sup. In addition, during meta-learning training, the model's initial weights are loaded from the pre-trained weights obtained by the preliminary random-mini-batch pre-training described above, and the mask structure is removed, so that the model's computation graph during meta-learning optimization matches the fine-tuning and inference at test time as closely as possible. In general, the iterative update of θ′_i lasts p steps: if p is too small, it is difficult to iterate to a solution adapted to the task; if p is too large, the local parameters overfit to the task; therefore, p is generally set to 10. The inner and outer learning rates α and β are set to 1e-5 and 1e-6, respectively.
4. Target-domain fine-tuning and reconstruction inference
When learning is complete, the trained model can perform spectral super-resolution reconstruction from RGB to hyperspectral images. The algorithm flow is shown in Algorithm 2.
The final algorithm output hsi′ is the hyperspectral image reconstructed from the input RGB image img.
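The target-domain procedure, a few self-supervised fine-tuning steps on the test image followed by the final reconstruction, can be sketched as follows; the optimizer choice, step count, and the stand-in networks in the demonstration are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def adapt_and_reconstruct(model, d, img, steps=10, lr=1e-5):
    """Target-domain inference sketch: fine-tune with the self-supervised
    loss mse(d(f(img)), img), which needs no HSI label, then reconstruct."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(d(model(img)), img)   # self-supervised loss on img
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        return model(img)                       # hsi' for the target image

# Demonstration with stand-in networks: a 3->31 reconstructor and a random
# 31->3 linear map standing in for the spectral response function d.
net = nn.Conv2d(3, 31, 1)
resp = nn.Conv2d(31, 3, 1)
hsi_pred = adapt_and_reconstruct(net, resp, torch.randn(1, 3, 8, 8), steps=3)
print(hsi_pred.shape)  # torch.Size([1, 31, 8, 8])
```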

Claims (2)

  1. A spectral cross-domain transfer super-resolution reconstruction method for multi-domain images, characterized by comprising the following steps:
    Step 1: For an RGB image img, where h and w denote the height and width of the image, its label hsi_i, i = 1...N, denotes the hyperspectral image corresponding to the input image img;
    The image is fed into the encoding layer, which maps the number of channels of the input image from 3 to 31, achieving a preliminary spectral reconstruction and alignment:
    e = embedding(img)
    where embedding(·) denotes the embedding layer, instantiated as a convolutional layer with kernel size 3 and stride 1, and e denotes the hidden-layer features after embedding;
    Step 2: Randomly mask the hidden-layer features e obtained in step 1 in the form of cubes: randomly sample cubes of a fixed size over the image and replace the features at those positions with a shared learnable mask;
    Step 3: Refine the hidden features e obtained in step 2 using the inter-spectral attention module, expressed as:
    s = SpectralTransformerBlock(e)
    where SpectralTransformerBlock(·) denotes the inter-spectral Transformer module, and s denotes the hidden-layer features obtained through it; multiple inter-spectral Transformer modules are stacked in the spectral reconstruction model, with the output of each module serving as the input of the next; SpectralTransformerBlock(·) consists of SpectralAttention, an FFN, and LayerNorm:
    SpectralTransformerBlock(x) = t + FFN(LayerNorm(t))
    where t = x + SpectralAttention(LayerNorm(x)), LayerNorm denotes the layer normalization operation, and FFN(x) = conv(gelu(conv(gelu(conv(x))))), where conv denotes a convolutional layer, gelu is a nonlinear activation function, and x denotes an input tensor;
    attention(Q, K, V) = softmax(σ_i·QK^T)·V
    SpectralAttention(X) = attention(XW_Q, XW_K, XW_V)
    where σ_i is a learnable scaling factor, W_Q, W_K, W_V are learnable projection matrices, and X is the input tensor, obtained by reshaping the feature tensor of the input image;
    Step 4: Use a generator network instantiated as a multi-layer fully connected neural network to generate a transferable dictionary, then split the hidden-layer features s into feature blocks of a fixed size along the spatial dimensions, and apply a cross-attention mechanism between the generated transferable dictionary and the feature blocks of s, so as to inject cross-domain shared knowledge into the feature map, formalized as:
    z = Generator(randomVector + map(s))
    c = CrossAttention(s, z)
    where CrossAttention(S, Z) = attention(SW_Q, ZW_K, ZW_V) denotes cross-attention; map(·) is a mapper that projects the information of s into the latent space, instantiated as a multi-layer fully connected neural network whose structure follows an information-bottleneck design; Generator(·) denotes the generator network, which receives a vector randomVector randomly sampled from a Gaussian distribution together with the domain information of the image itself, obtained from s via map, and generates the transferable dictionary;
    Step 5: Use the inter-spectral Transformer module again to refine the features, finally obtaining the reconstructed hyperspectral image, expressed as:
    hsi′ = SpectralTransformerBlock(c)
    where hsi′ denotes the reconstructed hyperspectral image.
  2. The spectral cross-domain transfer super-resolution reconstruction method for multi-domain images according to claim 1, characterized in that the training process of the spectral reconstruction model is as follows:
    A model-agnostic meta-learning fine-tuning method is used to learn highly general model parameters, so that the spectral reconstruction model can adapt to the characteristics of any domain after a few steps of self-supervised fine-tuning on images from that domain. The algorithm is as follows:
    First, initialize the model parameters;
    Then construct the training data: sample N Tasks from the dataset, where each Task consists of K data pairs;
    For each Task, compute the self-supervised loss L_self on the K example data and perform the inner-layer model parameter update along its gradient:
    θ′_i = θ − α∇_θ L_self(θ)
    For all Tasks, compute the supervised loss L_sup using the updated model parameters θ′_i and perform the outer-layer model parameter update along its gradient:
    θ ← θ − β∇_θ Σ_i L_sup(θ′_i)
    Here L_self denotes the self-supervised loss, formalized as L_self(θ) = mse(d(f_θ(x)), x), where d(·) denotes the spectral response function from a hyperspectral image to an RGB image, f_θ(·) denotes the neural network with parameters θ, and x is the input image; L_sup denotes the supervised loss, formalized as L_sup(θ) = mse(f_θ(x), hsi), where mse denotes the mean squared error loss.
PCT/CN2023/113283 2023-06-21 2023-08-16 Spectral cross-domain transfer super-resolution reconstruction method for multi-domain image WO2024082796A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310745724.8A CN116993584A (en) 2023-06-21 2023-06-21 Multi-domain image-oriented spectrum cross-domain migration super-resolution reconstruction method
CN202310745724.8 2023-06-21

Publications (1)

Publication Number Publication Date
WO2024082796A1 true WO2024082796A1 (en) 2024-04-25

Family

ID=88525616

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/113283 WO2024082796A1 (en) 2023-06-21 2023-08-16 Spectral cross-domain transfer super-resolution reconstruction method for multi-domain image

Country Status (2)

Country Link
CN (1) CN116993584A (en)
WO (1) WO2024082796A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232653A (en) * 2018-12-12 2019-09-13 天津大学青岛海洋技术研究院 The quick light-duty intensive residual error network of super-resolution rebuilding
CN111369433A (en) * 2019-11-12 2020-07-03 天津大学 Three-dimensional image super-resolution reconstruction method based on separable convolution and attention
CN111932461A (en) * 2020-08-11 2020-11-13 西安邮电大学 Convolutional neural network-based self-learning image super-resolution reconstruction method and system
CN114332649A (en) * 2022-03-07 2022-04-12 湖北大学 Cross-scene remote sensing image depth countermeasure transfer learning method based on dual-channel attention mechanism
US20220366536A1 (en) * 2021-04-13 2022-11-17 Hunan University High-resolution hyperspectral computational imaging method and system and medium


Also Published As

Publication number Publication date
CN116993584A (en) 2023-11-03


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23878791

Country of ref document: EP

Kind code of ref document: A1