CN116993584A - Multi-domain image-oriented spectrum cross-domain migration super-resolution reconstruction method - Google Patents
Multi-domain image-oriented spectrum cross-domain migration super-resolution reconstruction method
- Publication number: CN116993584A (application CN202310745724.8A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T3/4046 — Scaling of whole images or parts thereof using neural networks
- G06N3/0455 — Auto-encoder networks; encoder-decoder networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/0475 — Generative networks
- G06N3/048 — Activation functions
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06N3/0895 — Weakly supervised learning, e.g. semi-supervised or self-supervised learning
- G06N3/096 — Transfer learning
- G06N3/0985 — Hyperparameter optimisation; meta-learning; learning-to-learn
- Y02A40/10 — Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
Abstract
The invention discloses a multi-domain image-oriented spectrum cross-domain migration super-resolution reconstruction method, an image spectrum cross-domain migration super-resolution reconstruction method based on cross-domain migratable knowledge learning and a fast target-domain adaptation learning mode, which realizes spectral super-resolution reconstruction from an RGB image to a hyperspectral image. A model structure design based on a migratable dictionary is adopted to learn features that can be transferred across domains; a source-domain pre-training strategy with a shared learnable mask facilitates model learning of general knowledge for reconstruction; and a fine-tuning method based on model-agnostic meta-learning learns a general model with strong generalization capability, so that the model can adapt to the test data of a target domain within a few iteration steps. The invention can mine cross-domain shared knowledge to improve generalization capability and further improve the effect of cross-domain spectral super-resolution reconstruction.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a spectrum cross-domain migration super-resolution reconstruction method.
Background
Hyperspectral images are images in which tens or even hundreds of consecutive spectral bands are acquired for each pixel in the visible and infrared spectral ranges. Compared with a traditional RGB image, a hyperspectral image provides much richer spectral information and can identify the spectral characteristics of materials, enabling finer analysis of earth-surface substances.
The hyperspectral image can be applied to the fields of environmental remote sensing, agriculture, forestry, geological exploration, urban planning and the like. For example, in the agricultural field, the hyperspectral image can be used for rapidly identifying, classifying, monitoring and managing crops, so that the yield and quality of the crops are improved. In environmental monitoring, hyperspectral images can be used to identify and monitor harmful substances in a body of water, as well as to monitor vegetation coverage and land use changes. In the field of urban planning, the hyperspectral image can be used for measuring urban green land coverage and building height, optimizing urban planning and facility layout and the like.
In conclusion, the hyperspectral image has wide application prospect as an image with abundant spectral information.
However, hyperspectral cameras are not as widely used as ordinary cameras because they are expensive, slow to image, and bulky. To fully utilize the advantages of hyperspectral images while avoiding the problems of hyperspectral imaging devices, researchers have proposed spectral super-resolution methods, which aim to estimate and reconstruct a hyperspectral image from a traditional RGB image.
Existing spectral super-resolution methods can be broadly divided into two categories according to the reconstruction scheme. The first category comprises conventional methods, such as: (1) spectral super-resolution based on spectral decomposition, which decomposes and reconstructs the spectral signal with a spectral decomposition algorithm; for example, the technique based on the non-negative matrix factorization (NMF) algorithm in "Coupled Nonnegative Matrix Factorization Unmixing for Hyperspectral and Multispectral Data Fusion" decomposes and reconstructs the spectral signal to achieve super-resolution spectral imaging. (2) Spectral super-resolution based on sparse representation, which decomposes and reconstructs the spectral signal with a sparse-representation algorithm, such as one based on dictionary learning; for example, the technique in "Spectral Reflectance Recovery from a Single RGB Image" performs sparse representation and reconstruction of the spectral signal. (3) Spectral super-resolution based on spectral libraries and models, which trains and optimizes a model on the spectral signal using a spectral library; for example, techniques based on the partial least squares regression (PLSR) algorithm model and predict the spectral signal to achieve super-resolution spectral imaging. These conventional methods often suffer from slow running speed and poor reconstruction quality.
The second category is based on deep learning: a deep network, such as a convolutional neural network (CNN) ("Pixel-aware Deep Function-mixture Network for Spectral Super-Resolution") or a Transformer ("MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction"), is trained on the optical signal to achieve spectral super-resolution. Although deep-learning-based methods have advanced tremendously in recent years and achieve excellent performance on a single dataset, their performance degrades severely when tested on scenes outside the training set.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a multi-domain image-oriented spectrum cross-domain migration super-resolution reconstruction method, based on cross-domain migratable knowledge learning and a fast target-domain adaptation learning mode, which realizes spectral super-resolution reconstruction from an RGB image to a hyperspectral image. A model structure design based on a migratable dictionary is adopted to learn features that can be transferred across domains; a source-domain pre-training strategy with a shared learnable mask facilitates model learning of general knowledge for reconstruction; and a fine-tuning method based on model-agnostic meta-learning learns a general model with strong generalization capability, so that the model can adapt to the test data of a target domain within a few iteration steps. The invention can mine cross-domain shared knowledge to improve generalization capability and further improve the effect of cross-domain spectral super-resolution reconstruction.
The technical scheme adopted by the invention for solving the technical problems comprises the following steps:
step 1: for an RGB image img ∈ ℝ^(h×w×3), where h and w represent the height and width of the image respectively, let hsi ∈ ℝ^(h×w×31) represent the hyperspectral image corresponding to the input image img;
inputting the image into a coding layer, and mapping the channel number of the input image from 3 to 31 to realize preliminary spectrum reconstruction and alignment;
e=embedding(img)
wherein embedding(·) represents the embedding layer, instantiated by a convolution layer with kernel size 3 and stride 1, and e represents the hidden-layer feature after embedding;
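As a sketch of this step (shapes only; the convolution weights below are random placeholders, not the learned parameters), the 3×3 embedding convolution mapping 3 RGB channels to 31 spectral channels could look like:

```python
import numpy as np

def embedding(img, weight, bias):
    """3x3 convolution, stride 1, zero padding 1: maps (h, w, 3) RGB to (h, w, 31) features."""
    h, w, _ = img.shape
    c_out = weight.shape[0]                      # weight: (c_out, c_in, 3, 3)
    pad = np.pad(img, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, c_out))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3, :].transpose(2, 0, 1)   # (c_in, 3, 3)
            out[i, j] = np.tensordot(weight, patch, axes=3) + bias
    return out

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))
W = rng.standard_normal((31, 3, 3, 3)) * 0.1    # random placeholder weights
b = np.zeros(31)
e = embedding(img, W, b)
print(e.shape)  # (8, 8, 31)
```

In a real implementation this is a single framework convolution layer; the loop form here only makes the channel mapping 3 → 31 explicit.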
step 2: randomly mask the hidden-layer feature e obtained in step 1 in cube form: randomly sample fixed-size cubes over the image and replace the features at those positions with a shared learnable mask;
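A minimal NumPy sketch of the cube-masking operation (the cube count and size here are illustrative choices, not values fixed by the text; in the real model the mask token is a learnable parameter):

```python
import numpy as np

def cube_mask(e, mask_token, n_cubes=1, size=2, rng=None):
    """Replace randomly sampled fixed-size spatial cubes of the feature map
    with a shared (learnable, in the real model) mask token."""
    rng = rng or np.random.default_rng()
    out = e.copy()
    h, w, _ = e.shape
    for _ in range(n_cubes):
        i = int(rng.integers(0, h - size + 1))
        j = int(rng.integers(0, w - size + 1))
        out[i:i + size, j:j + size, :] = mask_token   # (c,) token broadcast over the cube
    return out

masked = cube_mask(np.zeros((8, 8, 31)), np.full(31, 7.0), rng=np.random.default_rng(0))
print(int((masked == 7.0).sum()))  # 124 = 2 x 2 x 31 replaced features
```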
step 3: refine the hidden feature e obtained in step 1 using the inter-spectrum-attention-based module, expressed as follows:
s=SpectralTransformerBlock(e)
wherein SpectralTransformerBlock(·) represents an inter-spectrum Transformer module and s represents the hidden-layer feature it outputs; a plurality of inter-spectrum Transformer modules are stacked in the spectrum reconstruction model, the output of each module being the input of the next; SpectralTransformerBlock(·) consists of SpectralAttention, FFN, and LayerNorm:
SpectralTransformerBlock(x)=t+(FFN(LayerNorm(t)))
where t = x + SpectralAttention(LayerNorm(x)), LayerNorm represents the layer normalization operation, FFN(x) = conv(GELU(conv(x))), conv represents a convolution layer, GELU is a nonlinear activation function, and x represents an input tensor;
attention(Q, K, V) = softmax(σ_i · Q K^T) V
SpectralAttention(X) = attention(X W_Q, X W_K, X W_V)
where σ_i is a learnable scale factor, W_Q, W_K, W_V are learnable projection matrices, and X is the input tensor, obtained by rearranging the shape of the feature tensor of the input image;
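The block can be sketched in NumPy as follows, with matrix multiplications standing in for the FFN convolutions and ReLU standing in for GELU (all sizes and weights below are placeholders); rows of X play the role of spectral tokens:

```python
import numpy as np

def softmax(x):
    x = x - x.max(-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(-1, keepdims=True)

def layer_norm(x, eps=1e-6):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def spectral_attention(X, Wq, Wk, Wv, sigma=1.0):
    """attention(Q, K, V) = softmax(sigma * Q K^T) V over spectral tokens (rows of X)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(sigma * Q @ K.T) @ V

def spectral_transformer_block(x, Wq, Wk, Wv, W1, W2, sigma=1.0):
    t = x + spectral_attention(layer_norm(x), Wq, Wk, Wv, sigma)
    ffn = np.maximum(layer_norm(t) @ W1, 0.0) @ W2   # matmul + ReLU stand in for conv + GELU
    return t + ffn

rng = np.random.default_rng(0)
d = 16                                  # per-band feature length (placeholder)
X = rng.random((31, d))                 # 31 spectral tokens, one per reconstructed band
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
W1, W2 = rng.standard_normal((d, 2 * d)) * 0.1, rng.standard_normal((2 * d, d)) * 0.1
s = spectral_transformer_block(X, Wq, Wk, Wv, W1, W2)
print(s.shape)  # (31, 16)
```

Note that the attention map here is over spectral bands (31 × 31), not spatial positions, which is what makes the module "inter-spectrum".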
step 4: generate a migratable dictionary with a generator network instantiated by a multi-layer fully connected neural network; then split the hidden-layer feature s into feature blocks of a given size along the spatial dimensions, and inject cross-domain shared knowledge into the feature map by letting the generated migratable dictionary interact with the feature blocks of s through a cross-attention mechanism:
z=Generator(randomVector+map(s))
c=CrossAttention(s,z)
wherein CrossAttention(S, Z) = attention(S W_Q, Z W_K, Z W_V) represents cross-attention; map(·) is a mapper that projects the information of s into a hidden space, instantiated by a multi-layer fully connected neural network following an information-bottleneck structure; Generator(·) represents a generator network that receives the vector randomVector, randomly sampled from a Gaussian distribution, together with the domain information of the image itself (obtained from s via map) and generates the migratable dictionary;
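Step 4 can be sketched as follows (the MLP depths, dimensions, and weights are illustrative placeholders; queries come from the feature blocks of s, keys/values from the generated dictionary z):

```python
import numpy as np

def mlp(x, weights):
    """Multi-layer fully connected network with ReLU between layers."""
    for W in weights[:-1]:
        x = np.maximum(x @ W, 0.0)
    return x @ weights[-1]

def softmax(x):
    x = x - x.max(-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(-1, keepdims=True)

def cross_attention(S, Z, Wq, Wk, Wv):
    """Queries from the feature blocks S, keys/values from the dictionary Z."""
    Q, K, V = S @ Wq, Z @ Wk, Z @ Wv
    return softmax(Q @ K.T) @ V

rng = np.random.default_rng(0)
d = 16                                        # feature dim (placeholder)
s_blocks = rng.random((8, d))                 # 8 spatial feature blocks of s
map_w = [rng.standard_normal((d, 4)) * 0.1,   # bottleneck mapper: d -> 4 -> d
         rng.standard_normal((4, d)) * 0.1]
gen_w = [rng.standard_normal((d, d)) * 0.1, rng.standard_normal((d, d)) * 0.1]
random_vector = rng.standard_normal((8, d))   # Gaussian noise input to the generator
z = mlp(random_vector + mlp(s_blocks, map_w), gen_w)   # migratable dictionary
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
c = cross_attention(s_blocks, z, Wq, Wk, Wv)
print(c.shape)  # (8, 16)
```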
step 5: the features are refined again using the inter-spectrum Transformer module, finally obtaining the reconstructed hyperspectral image, expressed as:
hsi′=SpectralTransformerBlock(c)
where hsi' represents the reconstructed hyperspectral image.
Preferably, the training process of the spectrum reconstruction model is as follows:
by adopting a meta-learning fine tuning method based on model agnostic, a model parameter with strong universality is learned, so that the spectrum reconstruction model can adapt to the characteristics of any domain through a plurality of steps of self-supervision fine tuning on the image of the domain, and the specific algorithm is as follows:
firstly, initializing model parameters;
Then construct the training data; the specific method is as follows: sample N tasks {Tᵢ} from the dataset, where each task Tᵢ consists of K data pairs.
For each task Tᵢ, compute the self-supervised loss L_self on the K example data and update the inner-layer model parameters along its gradient: θ′ᵢ = θ − α∇_θ L_self(f_θ).
For all tasks, use the updated model parameters θ′ᵢ to compute the supervised loss L_sup and update the outer-layer model parameters along its gradient: θ ← θ − β∇_θ Σᵢ L_sup(f_{θ′ᵢ}).
where L_self represents the self-supervised loss, of the form L_self = mse(d(f_θ(x)), x), in which d(·) represents the spectral response function from the hyperspectral image to the RGB image, f_θ(·) represents a neural network with parameters θ, and x is the input image; L_sup represents the supervised loss, of the form L_sup = mse(f_θ(x), hsi), where mse represents the mean-square loss.
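The inner/outer updates above can be illustrated on a deliberately toy scalar model (f_θ, d, and the losses below are stand-ins for the real network and spectral response function, and the outer step uses the first-order MAML approximation rather than the full second-order gradient):

```python
def f(theta, x):            # toy scalar model standing in for the network f_theta
    return theta * x

def d(y):                   # toy stand-in for the spectral response function (HSI -> RGB)
    return 0.5 * y

def self_grad(theta, x):    # grad of L_self = ||d(f_theta(x)) - x||^2 w.r.t. theta
    return 2.0 * (d(f(theta, x)) - x) * 0.5 * x

def sup_grad(theta, x, y):  # grad of L_sup = ||f_theta(x) - y||^2 w.r.t. theta
    return 2.0 * (f(theta, x) - y) * x

def maml_step(theta, tasks, alpha=0.01, beta=0.001):
    """One outer iteration: per-task inner self-supervised step, then a
    first-order outer update using the supervised gradient at theta'_i."""
    outer = 0.0
    for x, y in tasks:
        theta_i = theta - alpha * self_grad(theta, x)   # inner-layer update
        outer += sup_grad(theta_i, x, y)                # first-order MAML approximation
    return theta - beta * outer / len(tasks)

theta = maml_step(0.0, [(1.0, 2.0), (2.0, 4.0)])
```

The structure mirrors the text: each task takes a gradient step on the self-supervised loss, and the meta-parameters are then updated with the supervised loss evaluated at the adapted parameters.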
The beneficial effects of the invention are as follows:
because the general deep learning model is too dependent on the memory of the training data set, is excessively fitted to the training data set, and does not learn enough knowledge that can be shared across domains, the masking operation of the cube level is used to force the model to learn interactions between spectrums and spaces, and the knowledge of the interactions is shared among all spectrum reconstructions. And using a generator to generate a migratable patch to be added to the spectral reconstruction as cross-domain shared knowledge. The design ensures that the knowledge of cross-domain sharing can be mined from the angle of the model so as to improve the generalization capability and further improve the effect of cross-domain spectrum super-division reconstruction. The model-agnostic meta-learning method does not directly learn the reconstruction from a single RGB image to a hyperspectral image, but learns model parameters with strong generalization, which can be quickly adapted to a target domain in a plurality of steps of iteration for any image, further enhances the universality of the model, and simultaneously matches with the self-supervision quick fine tuning of the target domain, so that the model has the capacity of cross-domain migration and super-division reconstruction.
Drawings
FIG. 1 is a schematic diagram of a model structure of a migratable dictionary.
Detailed Description
The invention will be further described with reference to the drawings and examples.
As shown in fig. 1, in order to fully exploit the high performance of deep-learning-based spectral super-resolution while alleviating the severe degradation of test performance on scenes outside the training set, we propose a multi-domain image-scene-oriented image spectrum cross-domain migration super-resolution reconstruction method, based on cross-domain migratable knowledge learning and a fast target-domain adaptation learning mode, for spectral super-resolution reconstruction from RGB images to hyperspectral images. It comprises: a model structure design based on a migratable dictionary, used to learn features that can be transferred across domains; a source-domain pre-training strategy based on a shared learnable mask, to facilitate model learning of common knowledge for reconstruction; and a fine-tuning method based on model-agnostic meta-learning, used to learn a general model with strong generalization capability, so that the model can adapt to the test data of a target domain within a few iteration steps.
A multi-domain image-scene-oriented image spectrum cross-domain migration super-resolution reconstruction method based on cross-domain migratable knowledge learning and a fast target-domain adaptation learning mode comprises the following aspects and steps:
spectral reconstruction model structure:
step 1: for RGB imagesWherein h and w represent the height and width of the image, respectively, which are denoted +.>Representing a hyperspectral image corresponding to the input image img. The image is input to the coding layer, the channel number of the input image is mapped from 3 to 31, and preliminary spectrum reconstruction and alignment are realized.
e=embedding(img)
Where embedding(·) represents the embedding layer, instantiated by a convolution layer with kernel size 3 and stride 1, and e represents the hidden-layer feature after embedding.
Step 2: and (3) carrying out random masking on the hidden layer features obtained in the step (1) in a form of cube, randomly sampling the cube with a fixed size on the image, and replacing the features at the position with a shared learnable mask.
Step 3: the method comprises the following steps: 2, refining the obtained hidden characteristic e by using an inter-spectrum attention module. The expression is as follows:
s=SpectralTransformerBlock(e)
where SpectralTransformerBlock(·) represents the inter-spectrum Transformer module and s represents the hidden-layer feature it outputs; note that we stack multiple such modules in the model, the output of each being the input of the next. SpectralTransformerBlock(·) consists of SpectralAttention, FFN, and LayerNorm:
SpectralTransformerBlock(x)=t+(FFN(LayerNorm(t)))
where t = x + SpectralAttention(LayerNorm(x)), LayerNorm represents the layer normalization operation, FFN(x) = conv(GELU(conv(x))), conv represents a convolution layer, GELU is a nonlinear activation function, and x represents an input tensor.
attention(Q, K, V) = softmax(σ_i · Q K^T) V
SpectralAttention(X) = attention(X W_Q, X W_K, X W_V)
where σ_i is a learnable scale factor, W_Q, W_K, W_V are learnable projection matrices, and X is the input tensor, obtained by rearranging the feature tensor of the input image.
Step 4: generating a migratable dictionary by using a generator network instantiated by a multi-layer fully-connected neural network, then dividing hidden layer features s into feature blocks with a certain size according to space dimensions, and injecting knowledge shared across domains into a feature graph by using the migratable dictionary generated by a cross-attention mechanism interaction generator and the feature blocks of the hidden layer features s to form:
z=Generator(randomVector+map(s))
c=CrossAttention(s,z)
wherein CrossAttention(S, Z) = attention(S W_Q, Z W_K, Z W_V) represents cross-attention; map(·) is a mapper that projects the information of s into a hidden space, instantiated by a multi-layer fully connected neural network following an information-bottleneck structure; Generator(·) represents a generator network that receives the vector randomVector, randomly sampled from a Gaussian distribution, together with the domain information of the image itself (obtained from s via map) and generates the migratable dictionary;
step 5: the features are refined again using the inter-spectrum transducer module mentioned in step 3, and the reconstructed hyperspectral image is finally obtained, which can be expressed as:
hsi′=SpectralTransformerBlock(c)
where hsi' represents the reconstructed hyperspectral image.
Model-agnostic meta-learning fine tuning method:
in order to obtain good performance of the model on multi-domain images, we propose to learn a model parameter with strong universality by using a meta-learning fine tuning method which is unknown to the model, so that the model can be well adapted to the characteristics of the domain through fine tuning of several steps of self-supervision on any domain image, and a specific algorithm flow is shown as an algorithm 1:
where L_self represents the self-supervised loss, of the form L_self = mse(d(f_θ(x)), x), in which d(·) represents the spectral response function from the hyperspectral image to the RGB image, f_θ(·) represents a neural network with parameters θ, and x is the input image; L_sup represents the supervised loss, of the form L_sup = mse(f_θ(x), hsi), where mse represents the mean-square loss.
Specific examples:
the invention provides a multi-domain image scene-oriented image spectrum cross-domain migration super-resolution reconstruction method based on a cross-domain migratable knowledge learning and target domain fast adaptation learning mode, which comprises the following specific processes:
1. data preprocessing
For a given training set D_train, there are RGB and hyperspectral image pairs {img_i, hsi_i}; for a given test set D_test, hsi_i may be unavailable. When training the model, both the RGB data and the hyperspectral data are normalized to the range [0, 1].
Furthermore, for the input image img_i and its corresponding hsi_i, data augmentation by random cropping, random horizontal flipping, and random vertical flipping is adopted to enhance the generalization capability of the model.
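A sketch of the paired normalization and augmentation (assumes 8-bit RGB input and a hyperspectral cube already scaled to [0, 1]; the crop size is illustrative):

```python
import numpy as np

def preprocess(img, hsi, crop=64, rng=None):
    """Normalize 8-bit RGB to [0, 1] and apply the same random crop and flips to both images."""
    rng = rng or np.random.default_rng()
    img = img.astype(np.float64) / 255.0
    h, w = img.shape[:2]
    i = int(rng.integers(0, h - crop + 1))
    j = int(rng.integers(0, w - crop + 1))
    img, hsi = img[i:i + crop, j:j + crop], hsi[i:i + crop, j:j + crop]
    if rng.random() < 0.5:                       # random horizontal flip
        img, hsi = img[:, ::-1], hsi[:, ::-1]
    if rng.random() < 0.5:                       # random vertical flip
        img, hsi = img[::-1], hsi[::-1]
    return img.copy(), hsi.copy()

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (128, 128, 3), dtype=np.uint8)
hsi = rng.random((128, 128, 31))
img_c, hsi_c = preprocess(img, hsi, rng=np.random.default_rng(1))
print(img_c.shape, hsi_c.shape)  # (64, 64, 3) (64, 64, 31)
```

Applying the identical crop and flips to both images keeps the RGB/hyperspectral pair spatially aligned, which the supervised loss requires.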
2. Preliminary pre-training based on random mini-batches
Since model-agnostic meta-learning is unstable when learning from scratch, and the self-supervised step of the inner meta-learning loop reduces the utilization of supervised data, the model is first trained with random mini-batches in an early stage so that it quickly and stably converges to a good position. Specifically, n samples are drawn from the training set D_train to form a batch and fed into a randomly initialized model; Adam is used as the optimizer and MRAE as the loss function, with an initial learning rate of 4e-4 gradually reduced by cosine annealing, training for 150 rounds.
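The MRAE loss and cosine-annealed learning-rate schedule mentioned above can be sketched as follows (the annealing floor lr_min and the eps guard are assumptions, not values given in the text):

```python
import numpy as np

def mrae(pred, target, eps=1e-8):
    """Mean relative absolute error, the pre-training loss named in the text."""
    return float(np.mean(np.abs(pred - target) / (np.abs(target) + eps)))

def cosine_lr(step, total_steps, lr0=4e-4, lr_min=0.0):
    """Cosine-annealed learning rate decaying from lr0 (4e-4 per the text) to lr_min."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1.0 + np.cos(np.pi * step / total_steps))

loss = mrae(np.array([1.1, 0.9]), np.array([1.0, 1.0]))
lr_start = cosine_lr(0, 150)
```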
3. Model agnostic meta-learning training
A task set {Tᵢ} is sampled from the dataset D. Each task Tᵢ comprises a support set and a query set, each instantiated for simplicity as K mutually non-overlapping RGB–hyperspectral sample pairs {(img_i, hsi_i) | i = 1, 2, ..., K}. The support set is used to compute the inner-layer self-supervised loss L_self, and the query set to compute the outer-layer supervised loss L_sup. In addition, when meta-learning training begins, the model is initialized with the pre-trained weights obtained from the random mini-batch pre-training, and the mask structure is removed, ensuring that the model computation graph during meta-learning optimization stays as consistent as possible with fine-tuning and inference at test time. Generally, for the inner-loop update producing θ′ᵢ, if the number of steps p is too small it is difficult to iterate to a solution that fits the task, while if p is too large the local parameters overfit the task, so p is typically set to 10. The inner- and outer-loop learning rates α and β are set to 1e-5 and 1e-6, respectively.
4. Target domain fine tuning and reconstruction reasoning
After meta-learning is completed, the trained model can perform spectral super-resolution reconstruction from RGB to hyperspectral images. The specific algorithm flow is shown in Algorithm 2:
the final obtained algorithm output hsi' is the hyperspectral image reconstructed from the input RGB image img.
Claims (2)
1. A multi-domain image-oriented spectrum cross-domain migration super-resolution reconstruction method, characterized by comprising the following steps:
step 1: for an RGB image img ∈ ℝ^(h×w×3), where h and w represent the height and width of the image respectively, let hsi ∈ ℝ^(h×w×31) represent the hyperspectral image corresponding to the input image img;
inputting the image into a coding layer, and mapping the channel number of the input image from 3 to 31 to realize preliminary spectrum reconstruction and alignment;
e=embedding(img)
wherein embedding(·) represents the embedding layer, instantiated by a convolution layer with kernel size 3 and stride 1, and e represents the hidden-layer feature after embedding;
step 2: carrying out random masking on the hidden layer feature e obtained in the step 1 in a cube form, randomly sampling a cube with a fixed size on an image, and replacing the feature at the position with a shared learnable mask;
step 3: and (2) refining the hidden characteristic e obtained in the step (1) by using an inter-spectrum attention-based module, wherein the hidden characteristic e is expressed as follows:
s=SpectralTransformerBlock(e)
wherein SpectralTransformerBlock(·) denotes an inter-spectrum Transformer module and s denotes the hidden-layer feature it outputs; a plurality of inter-spectrum Transformer modules are stacked in the spectrum reconstruction model, with the output of each module serving as the input of the next; SpectralTransformerBlock(·) is composed of SpectralAttention, an FFN and LayerNorm:
SpectralTransformerBlock(x) = t + FFN(LayerNorm(t))
where t = x + SpectralAttention(LayerNorm(x)), LayerNorm denotes the layer normalization operation, and FFN(x) = conv(GELU(conv(x))), where conv denotes a convolutional layer, GELU is a nonlinear activation function, and x denotes an input tensor;
attention(Q, K, V) = softmax(σ_i QK^T)V
SpectralAttention(X) = attention(XW_Q, XW_K, XW_V)
wherein σ_i is a learnable scale factor, W_Q, W_K, W_V are learnable projection matrices, and the input tensor X is obtained by rearranging the shape of the feature tensor of the input image;
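The SpectralTransformerBlock equations above can be sketched as follows; the 1×1 convolutions of the FFN are approximated as per-token linear layers, and all weights are randomly initialized stand-ins rather than trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
C = 31                                             # number of spectral channels

def layer_norm(x, eps=1e-6):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    z = np.exp(x - x.max(-1, keepdims=True))
    return z / z.sum(-1, keepdims=True)

def gelu(x):                                       # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# learnable parameters, randomly initialized for the sketch
sigma = 1.0                                        # learnable scale factor sigma_i
Wq, Wk, Wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
W1 = rng.standard_normal((C, C)) * 0.1             # FFN "conv" weights (1x1 ~ linear)
W2 = rng.standard_normal((C, C)) * 0.1

def spectral_attention(X):                         # X: (tokens, C)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(sigma * Q @ K.T) @ V            # attention(Q, K, V)

def ffn(x):                                        # conv(GELU(conv(x)))
    return gelu(x @ W1) @ W2

def spectral_transformer_block(x):
    t = x + spectral_attention(layer_norm(x))      # t = x + SA(LN(x))
    return t + ffn(layer_norm(t))                  # t + FFN(LN(t))

X = rng.standard_normal((64, C))                   # an 8x8 feature map flattened to hw tokens
s = spectral_transformer_block(X)
```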
step 4: generating a migratable dictionary by using a generator network instantiated by a multi-layer fully-connected neural network, then dividing hidden layer features s into feature blocks with a certain size according to space dimensions, and injecting knowledge shared across domains into a feature graph by using the migratable dictionary generated by a cross-attention mechanism interaction generator and the feature blocks of the hidden layer features s to form:
z=Generator(randomVector+map(s))
c=CrossAttention(s,z)
wherein CrossAttention(s, z) = attention(sW_Q, zW_K, zW_V) denotes cross-attention; map(·) is a mapper that maps the information of s into a hidden space, instantiated as a multi-layer fully-connected neural network whose structure follows the information-bottleneck design; Generator(·) denotes the generator network, which receives the vector randomVector randomly sampled from a Gaussian distribution together with the domain information of the image itself, obtained from s via map(·), and generates the migratable dictionary;
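A minimal sketch of the generator and cross-attention interaction of step 4; the dictionary size of 16 entries, the bottleneck width of 8, and the mean-pooling used inside the mapper are illustrative assumptions not fixed by the claim:

```python
import numpy as np

rng = np.random.default_rng(0)
C, DICT = 31, 16                        # channels, dictionary entries (assumed)

def softmax(x):
    z = np.exp(x - x.max(-1, keepdims=True))
    return z / z.sum(-1, keepdims=True)

# mapper: compresses s into a domain code (information-bottleneck shape)
Wm1 = rng.standard_normal((C, 8)) * 0.1
Wm2 = rng.standard_normal((8, C)) * 0.1
# generator: MLP turning (random vector + domain code) into a dictionary
Wg = rng.standard_normal((C, DICT * C)) * 0.1
# cross-attention projections
Wq, Wk, Wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))

def mapper(s):                          # map(s): pooled domain information
    return np.tanh(s.mean(0) @ Wm1) @ Wm2

def generator(rand_vec, s):             # z = Generator(randomVector + map(s))
    z = (rand_vec + mapper(s)) @ Wg
    return z.reshape(DICT, C)           # the migratable dictionary

def cross_attention(s, z):              # attention(s Wq, z Wk, z Wv)
    Q, K, V = s @ Wq, z @ Wk, z @ Wv
    return softmax(Q @ K.T) @ V

s = rng.standard_normal((64, C))        # hidden features as hw x C tokens
z = generator(rng.standard_normal(C), s)
c = cross_attention(s, z)
```

Note that the queries come from the image features while keys and values come from the generated dictionary, which is what lets the dictionary inject cross-domain knowledge into every spatial position.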
Step 5: refining the features again with the inter-spectrum Transformer module to finally obtain the reconstructed hyperspectral image, expressed as:
hsi′=SpectralTransformerBlock(c)
where hsi' represents the reconstructed hyperspectral image.
2. The multi-domain image-oriented spectrum cross-domain migration super-resolution reconstruction method according to claim 1, wherein the training process of the spectrum reconstruction model is as follows:
by adopting a meta-learning fine tuning method based on model agnostic, a model parameter with strong universality is learned, so that the spectrum reconstruction model can adapt to the characteristics of any domain through a plurality of steps of self-supervision fine tuning on the image of the domain, and the specific algorithm is as follows:
firstly, initializing model parameters;
then constructing training data as follows: sampling N tasks {T_i} from the dataset, each task consisting of K data pairs;
for each task T_i, calculating the self-supervised loss L_self on its K support examples and updating the inner-layer model parameters along the gradient: θ'_i = θ − α∇_θ L_self(f_θ);
for all tasks, using the updated model parameters θ'_i to calculate the supervised loss L_sup and updating the outer-layer model parameters along the gradient: θ ← θ − β∇_θ Σ_i L_sup(f_{θ'_i});
wherein L_self denotes the self-supervised loss, of the form L_self(f_θ) = mse(d(f_θ(x)), x), where d(·) denotes the spectral response function mapping a hyperspectral image to an RGB image, f_θ(·) denotes the neural network with parameters θ, and x is the input image; L_sup denotes the supervised loss, of the form L_sup(f_{θ'_i}) = mse(f_{θ'_i}(x), hsi), where mse denotes the mean-square loss.
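The inner/outer loop above can be sketched with a first-order approximation of the meta-gradient (the claim does not specify the order of the gradient) on a toy per-pixel linear model with hand-derived gradients; the enlarged learning rates, task count, and task sizes are assumptions chosen so the toy converges quickly:

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.random((31, 3)) / 31.0          # spectral response d(.), a stand-in

def grad_self(theta, X):                # d/dtheta of mse(d(f_theta(x)), x)
    A = X @ theta @ D - X
    return 2.0 * X.T @ A @ D.T / A.size

def grad_sup(theta, X, H):              # d/dtheta of mse(f_theta(x), hsi)
    A = X @ theta - H
    return 2.0 * X.T @ A / A.size

def sup_loss(theta, X, H):
    return ((X @ theta - H) ** 2).mean()

alpha, beta, p = 1e-2, 1e-2, 10         # toy rates; the text uses 1e-5/1e-6, p = 10
theta = rng.standard_normal((3, 31)) * 0.1
theta_true = rng.random((3, 31))        # ground-truth map generating the labels

# N = 4 tasks, each a (support RGB, query RGB, query HSI) triple
tasks = []
for _ in range(4):
    Xs, Xq = rng.random((32, 3)), rng.random((32, 3))
    tasks.append((Xs, Xq, Xq @ theta_true))

loss0 = np.mean([sup_loss(theta, Xq, Hq) for _, Xq, Hq in tasks])
for _ in range(100):                    # outer-layer iterations
    outer_grad = np.zeros_like(theta)
    for Xs, Xq, Hq in tasks:
        th_i = theta.copy()
        for _ in range(p):              # p inner self-supervised steps on support
            th_i -= alpha * grad_self(th_i, Xs)
        outer_grad += grad_sup(th_i, Xq, Hq)   # first-order meta-gradient approx.
    theta -= beta * outer_grad          # outer supervised update on query
loss1 = np.mean([sup_loss(theta, Xq, Hq) for _, Xq, Hq in tasks])
```

The inner loop only ever sees RGB support data (self-supervised), while the outer update uses the hyperspectral query labels, mirroring the L_self / L_sup split of the claim.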
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310745724.8A CN116993584A (en) | 2023-06-21 | 2023-06-21 | Multi-domain image-oriented spectrum cross-domain migration super-resolution reconstruction method |
PCT/CN2023/113283 WO2024082796A1 (en) | 2023-06-21 | 2023-08-16 | Spectral cross-domain transfer super-resolution reconstruction method for multi-domain image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116993584A (en) | 2023-11-03
Family
ID=88525616
Also Published As
Publication number | Publication date |
---|---|
WO2024082796A1 (en) | 2024-04-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||