CN118212127A - Misregistration-based physics-guided generative adversarial hyperspectral super-resolution method - Google Patents

Misregistration-based physics-guided generative adversarial hyperspectral super-resolution method

Info

Publication number
CN118212127A
Authority
CN
China
Prior art keywords
hyperspectral
convolution
multispectral
image
output
Prior art date
Legal status
Pending
Application number
CN202410085428.4A
Other languages
Chinese (zh)
Inventor
徐凯
陈咏夷
汪安铃
朱洲
汪帮俊
Current Assignee
Anhui University
Original Assignee
Anhui University
Priority date
Filing date
Publication date
Application filed by Anhui University
Priority to CN202410085428.4A
Publication of CN118212127A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a misregistration-based physics-guided generative adversarial hyperspectral super-resolution method. Compared with the prior art, it solves the problem that reconstruction results are unstable because accurate registration is required and necessary physical meaning is lacking in the hyperspectral data synthesis process. The invention comprises the following steps: obtaining and synthesizing hyperspectral and multispectral datasets of different resolutions; constructing a physics-guided generative adversarial hyperspectral super-resolution model for unregistered images; training that model; obtaining the real hyperspectral and multispectral remote sensing images to be enhanced; and obtaining the super-resolution reconstruction result. Based on the rich spectral information of hyperspectral images and the rich spatial information of multispectral images, the invention provides a multi-modal fusion network that reconstructs a high-resolution hyperspectral fusion image, yielding an enhanced image with high resolution, high signal-to-noise ratio and identifiable features.

Description

Misregistration-based physics-guided generative adversarial hyperspectral super-resolution method
Technical Field
The invention relates to the technical field of optical remote sensing image processing, and in particular to a physics-guided generative adversarial hyperspectral super-resolution method for unregistered images.
Background
Remote sensing images are the most critical data foundation in the ground-monitoring industry. Because incident energy is limited, there is always a trade-off among spectral resolution, spatial resolution and image signal-to-noise ratio when designing imaging sensors. For airborne spectral sensors, the spatial resolution of hyperspectral images is typically no better than 1 meter/pixel, and for spaceborne sensors the resolution can be as coarse as 30 meters/pixel. Typically, the spatial resolution of the hyperspectral image is sacrificed, which hampers subsequent tasks. Image fusion, one of the important steps in remote sensing image processing, is currently an important means of compensating for the insufficient spatial resolution of hyperspectral data. One practical hyperspectral image super-resolution solution is to take a low-spatial-resolution hyperspectral image (LR-HSI) and a high-spatial-resolution multispectral image (HR-MSI) and fuse them into a hyperspectral image at the target resolution (HR-HSI). By performing super-resolution reconstruction on hyperspectral data, images with high spatial resolution and fine spectral information can be generated, significantly improving the practical value of remote sensing data.
Currently, there are three main approaches to hyperspectral super-resolution: model-optimization methods, image-fusion methods and deep-learning methods. Model-optimization methods can achieve high-fidelity image reconstruction with the aid of a degradation model by imposing prior information on the image to be solved. However, for complex natural scenes, designing suitable and efficient prior constraints is a cumbersome, exploratory effort, often accompanied by a highly complex solution process. Image-fusion methods achieve super-resolution by fusing the complementary information of a well-registered multispectral or panchromatic image with a hyperspectral image; however, fusion-based methods generally require pairs of high-resolution multispectral or panchromatic images and low-resolution hyperspectral images, and owing to sensor limitations and image-registration problems, obtaining such image pairs is usually difficult and expensive. Deep-learning methods dispense with hand-crafted priors and achieve close fitting of complex functions by automatically mining large amounts of data. However, the need for large-scale, high-quality training sets makes supervised methods difficult to apply effectively to real-world demands.
Existing hyperspectral super-resolution reconstruction methods mainly suffer from the following problems: good registration of LR-HSI and HR-MSI is required, and the reconstruction accuracy of the HR-HSI depends heavily on the registration accuracy across modalities; ordinary convolution cannot capture local or non-local context information along multiple directions, so irrelevant areas interfere with feature learning, whereas for unregistered images, features at different locations should carry different importance for the fusion task; and the lack of interpretability and of necessary physical guidance during super-resolution reconstruction makes the reconstruction process unstable, prevents intuitive analysis of the synthesis process, and limits generalization ability. To address these problems, the invention provides a physics-guided generative adversarial hyperspectral super-resolution method for unregistered images.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a physics-guided generative adversarial hyperspectral super-resolution method for unregistered images. First, a dual-branch network extracts hyperspectral and multispectral features separately; inspired by stereoscopic vision imaging, a stereoscopic vision structure is simulated to achieve geometric registration of the different modalities, and features across modalities are learned and matched through parameter sharing, effectively avoiding complex feature-matching operations. Next, complementary attention information of the different modalities is learned, and context information is collected in the horizontal and vertical directions so that pixels at different distances have equal opportunity of expression. Then, a generator network and a discriminator network are constructed to adversarially super-resolve hyperspectral images and improve visual fidelity: the generator reconstructs each pixel position and produces an abundance map for each object, and a multi-scale guided-filter structure transfers the structural information of the multispectral image to the output image to avoid edge artifacts while preserving the spectral characteristics of the original hyperspectral image. In addition, considering the poor stability caused by the lack of necessary physical information in existing super-resolution reconstruction methods, an imaging mechanism and spectral abundance information are introduced, and a standard spectral library is used to generate the final spectral data cube, synthesizing high-quality hyperspectral data.
The misregistration-based physics-guided generative adversarial hyperspectral super-resolution method specifically comprises the following steps:
11) Generating hyperspectral and multispectral datasets: training and testing datasets are constructed using the Wald protocol; the original hyperspectral image is used as label data; within the training samples, spatial downsampling is guided by the edge information and spectral distribution of the hyperspectral image; denoising is performed before spectral downsampling, thereby generating the hyperspectral and multispectral image datasets;
12) Constructing the physics-guided generative adversarial hyperspectral super-resolution model for unregistered images, which consists of a stackable stereo cross-attention module, a generator module, a physics-guided reconstruction module and a discriminator module;
13) Training the physics-guided generative adversarial hyperspectral super-resolution model: the preprocessed hyperspectral and multispectral training datasets of different resolutions are taken as input and the network is fully trained, yielding the trained model;
14) Obtaining the hyperspectral reconstruction result for hyperspectral and multispectral remote sensing images: the acquired pairs of real hyperspectral and multispectral remote sensing images to be processed are input into the trained model, obtaining the reconstructed high-resolution hyperspectral remote sensing image.
Further, acquiring and synthesizing the hyperspectral and multispectral datasets of different resolutions comprises the following steps:
21) Applying Gaussian blur filtering with a 5×5 window, zero mean and standard deviation of 2, followed by 4× bilinear-interpolation downsampling, to the original reference hyperspectral image to obtain the low-spatial-resolution hyperspectral image for training;
22) Generating a high-resolution multispectral image by applying the spectral response function of the corresponding multispectral satellite to the original reference hyperspectral image;
23) Adding Gaussian white noise to the high-resolution multispectral image and the low-resolution hyperspectral image respectively;
24) Using the original hyperspectral image as the label image for training;
25) Cutting a 128×128 block out of the central region of the original reference data as the test set; the remaining region is used for training, and in each iteration the training region is randomly cropped to 128×128 for training;
26) Filling the portion of the randomly partitioned training set that overlaps the test set with 0-valued pixels;
27) Simulating hyperspectral and multispectral misregistration states by translating the training set and the test set by 1 pixel in the horizontal, vertical and diagonal directions, as sketched below.
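As a non-authoritative illustration of steps 21)-27), the Wald-protocol degradation could be sketched as follows in Python; the spectral response matrix `srf`, the noise level, and the use of strided sampling in place of true bilinear downsampling are simplifying assumptions of this sketch, not details fixed by the patent:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_pair(hsi, srf, noise_sigma=0.01, shift=(1, 1)):
    """Sketch of steps 21)-27): degrade a reference HSI into a training pair.

    hsi : (H, W, B) reference hyperspectral image
    srf : (B, b) assumed spectral response matrix mapping B hyperspectral
          bands to b multispectral bands
    """
    # 21) Gaussian blur, std 2; truncate=1.0 yields the 5x5 window at sigma=2
    blurred = np.stack([gaussian_filter(hsi[..., k], sigma=2, truncate=1.0)
                        for k in range(hsi.shape[-1])], axis=-1)
    lr_hsi = blurred[::4, ::4, :]     # 4x downsampling (strided, for brevity)
    # 22) high-resolution MSI via the satellite's spectral response function
    hr_msi = hsi @ srf
    # 23) additive Gaussian white noise on both products
    lr_hsi = lr_hsi + np.random.normal(0.0, noise_sigma, lr_hsi.shape)
    hr_msi = hr_msi + np.random.normal(0.0, noise_sigma, hr_msi.shape)
    # 27) simulate misregistration: shift the MSI by 1 pixel (here with
    # wrap-around; real data would be cropped instead)
    hr_msi = np.roll(hr_msi, shift=shift, axis=(0, 1))
    return lr_hsi.astype(np.float32), hr_msi.astype(np.float32)
```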
Further, constructing the physics-guided generative adversarial hyperspectral super-resolution model for unregistered images comprises the following steps:
31) Constructing a stackable stereo cross-attention module that uses the complementary information between non-strictly-registered hyperspectral and multispectral images to improve reconstruction quality; the module consists of a dual-branch parallel weight-sharing network comprising a stereo cross-fusion sub-module and a cross-convergence fusion self-attention sub-module:
311) Constructing the stereo cross-fusion sub-module, which consists of a multi-head translational convolution block (MTConv) and a gated feed-forward network (FFN), the two sub-blocks being joined by residual connections, which can be expressed as:
X_MTConv = MTConv(LN(X)) + X
X_FFN = FFN(LN(X_MTConv)) + X_MTConv
wherein MTConv denotes the multi-head translational convolution block, X denotes its input, X_MTConv denotes its output, FFN denotes the gated feed-forward network, LN denotes layer normalization (Layer Normalization), and X_FFN denotes the output of the gated feed-forward network;
3111) Constructing the multi-head translational convolution block (MTConv), composed in sequence of a normalization layer, a 1×1 convolution layer, a 3×3 depthwise convolution layer, a gate unit, a channel attention and a 1×1 convolution layer;
The gate unit (Gating) operation first divides the input X into two sub-features X_1, X_2 along the channel dimension and multiplies them, which can be expressed as:
Gating(X) = X_1 ⊙ X_2
wherein ⊙ represents element-wise multiplication and Gating(X) represents the output of the gate unit;
The channel attention (SCA) following the gate unit can be expressed in simplified form as:
SCA(X) = X * W * pool(X)
wherein * is the channel-wise product operation, W denotes a learnable matrix, pool is the global average pooling operation, X denotes the input of the channel attention block, and SCA(X) denotes its output;
3112) Constructing the gated feed-forward network, composed in sequence of a normalization layer, a 1×1 convolution, a gate unit and a 1×1 convolution;
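A minimal PyTorch sketch of the stereo cross-fusion sub-module of steps 311)-3112) is given below; the channel width `c`, the use of GroupNorm(1, c) as layer normalization, and the channel doubling ahead of the gate unit are illustrative assumptions rather than the patent's fixed design:

```python
import torch
import torch.nn as nn

class MTConv(nn.Module):
    """Sketch of 3111): norm -> 1x1 conv -> 3x3 depthwise conv ->
    gate unit -> simplified channel attention (SCA) -> 1x1 conv."""
    def __init__(self, c):
        super().__init__()
        self.norm = nn.GroupNorm(1, c)             # layer-norm-like over channels
        self.pw1 = nn.Conv2d(c, 2 * c, 1)          # widen so the gate can split
        self.dw = nn.Conv2d(2 * c, 2 * c, 3, padding=1, groups=2 * c)
        self.sca = nn.Conv2d(c, c, 1, bias=False)  # learnable W in X * W * pool(X)
        self.pw2 = nn.Conv2d(c, c, 1)

    def forward(self, x):
        y = self.dw(self.pw1(self.norm(x)))
        y1, y2 = y.chunk(2, dim=1)    # gate unit: split along the channel dim...
        y = y1 * y2                   # ...and multiply element-wise
        y = y * self.sca(y.mean(dim=(2, 3), keepdim=True))  # SCA re-weighting
        return self.pw2(y) + x        # residual connection: X_MTConv

class GatedFFN(nn.Module):
    """Sketch of 3112): norm -> 1x1 conv -> gate unit -> 1x1 conv."""
    def __init__(self, c):
        super().__init__()
        self.norm = nn.GroupNorm(1, c)
        self.pw1 = nn.Conv2d(c, 2 * c, 1)
        self.pw2 = nn.Conv2d(c, c, 1)

    def forward(self, x):
        y1, y2 = self.pw1(self.norm(x)).chunk(2, dim=1)
        return self.pw2(y1 * y2) + x  # gate unit + residual: X_FFN
```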
312) Constructing the cross-convergence fusion self-attention sub-module, which captures multi-directional criss-cross information to fuse unregistered images:
3121) Generating the Q (query), K (key) and V (value) matrices with three 1×1 convolutions and applying a softmax function to obtain the corresponding attention weights. For the output feature maps X_HSI, X_MSI ∈ R^{H×W×C} of the two parallel stereo cross-fusion branches, where H, W, C are the height, width and number of channels respectively, the pixels forming a criss-cross shape along the row and column directions of each position are taken, (H+W−1) pixels in total, giving vectors of size (H+W−1)×C; the cross-correlation of the two modalities can be expressed as:
A^{HSI→MSI}_{i,u} = softmax_i( q^{HSI}_u · (k^{MSI}_{i,u})^T ),  A^{MSI→HSI}_{i,u} = softmax_i( q^{MSI}_u · (k^{HSI}_{i,u})^T )
wherein q^{HSI}_u denotes the channel-dimension feature vector at position u of the query matrix Q_HSI of the hyperspectral branch, q^{MSI}_u denotes the channel-dimension feature vector at position u of the query matrix Q_MSI of the multispectral branch, k^{HSI}_{i,u} denotes the i-th element of the channel-dimension feature vectors at position u of the key matrix K_HSI of the hyperspectral branch, k^{MSI}_{i,u} denotes the i-th element of the channel-dimension feature vectors at position u of the key matrix K_MSI of the multispectral branch, i ∈ {1, …, H+W−1}, H, W, C are the height, width and number of channels of the feature map respectively, T denotes the matrix transpose operation, and A^{HSI→MSI}, A^{MSI→HSI} denote the attention response maps on the hyperspectral and multispectral branches respectively;
3122) Re-weighting the value matrix V with the obtained attention map A through a feature-aggregation operation, and finally capturing, through a residual connection, the long-range context information in the horizontal and vertical directions at position u within the interactive features;
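The cross-convergence fusion self-attention of steps 3121)-3122) might be sketched as below; only the HSI→MSI direction is shown (the mirrored direction swaps the two inputs), and the gather keeps H+W criss-cross positions, counting the centre pixel twice, as a simplification:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossFusion(nn.Module):
    """Sketch of 312): queries of one modality attend to the criss-cross
    positions of the other modality, then re-weight its values."""
    def __init__(self, c):
        super().__init__()
        self.q = nn.Conv2d(c, c, 1)   # three 1x1 convolutions for Q, K, V
        self.k = nn.Conv2d(c, c, 1)
        self.v = nn.Conv2d(c, c, 1)

    @staticmethod
    def _cross_gather(t):
        # For each position u=(i, j), stack the pixels of its column and row.
        b, c, h, w = t.shape
        col = t.unsqueeze(2).permute(0, 1, 2, 4, 3).expand(b, c, h, w, h)
        row = t.unsqueeze(3).expand(b, c, h, w, w)
        return torch.cat([col, row], dim=-1)       # (b, c, h, w, h + w)

    def forward(self, x_hsi, x_msi):
        q = self.q(x_hsi)                          # queries from the HSI branch
        k = self._cross_gather(self.k(x_msi))      # criss-cross keys from MSI
        v = self._cross_gather(self.v(x_msi))
        attn = torch.einsum('bchw,bchwn->bhwn', q, k)    # q_u . k_{i,u}
        attn = F.softmax(attn, dim=-1)                   # A^{HSI->MSI}
        out = torch.einsum('bhwn,bchwn->bchw', attn, v)  # re-weight V with A
        return out + x_hsi                         # residual connection
```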
32) Constructing a generator network that cascades multispectral and hyperspectral features at multiple scales and produces a predicted abundance map for each pixel position and each object; it comprises a shallow-coding sub-module and a guided-filter upsampling sub-module:
321) Constructing the shallow-coding module: the hyperspectral branch and the multispectral branch each consist of two stacked convolution layers with 3×3 kernels, each convolution layer followed by a ReLU layer;
322) Constructing the guided-filter upsampling sub-module, which is divided into three stages; the hyperspectral feature maps of the three stages are upsampled in sequence by a bicubic interpolation algorithm up to H×W, where H, W are the height and width of the multispectral guide feature map;
The upsampled hyperspectral image is guided-filtered band by band, and the pixel information of the low-spatial-resolution image, the relative coordinate information of the high-spatial-resolution image and the guiding multispectral image features are fused through a multilayer perceptron (MLP), which can be expressed as:
Â^i_k = F_G(Ã^i_k, M_g)
where k is the hyperspectral band index, F_G(·) denotes the guided filtering algorithm, M_g is the guiding multispectral image, Â^i is the generated abundance map of the i-th stage, each band of which obtains enhanced spatial detail information from M_g, and Ã^i denotes the upsampled feature map;
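One stage of the guided-filter upsampling of step 322) could be sketched as a coordinate-aware MLP fusion; the hidden width, the normalized coordinate grid and the residual refinement are assumptions of this sketch rather than the patent's exact formulation of F_G(·):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedUpsampleStage(nn.Module):
    """Sketch of one stage of 322): bicubic upsampling, then an MLP fusing
    low-resolution pixel features, relative coordinates and MSI guide features."""
    def __init__(self, c_hsi, c_msi, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(c_hsi + c_msi + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, c_hsi))

    def forward(self, feat_hsi, feat_msi):
        b, _, h, w = feat_msi.shape
        up = F.interpolate(feat_hsi, size=(h, w), mode='bicubic',
                           align_corners=False)
        # relative coordinate grid of the high-spatial-resolution image
        ys = torch.linspace(-1, 1, h, device=up.device)
        xs = torch.linspace(-1, 1, w, device=up.device)
        gy, gx = torch.meshgrid(ys, xs, indexing='ij')
        coord = torch.stack([gy, gx]).expand(b, 2, h, w)
        x = torch.cat([up, feat_msi, coord], dim=1)   # fuse the three inputs
        x = x.permute(0, 2, 3, 1)                     # pixel-wise MLP
        return self.mlp(x).permute(0, 3, 1, 2) + up   # refined band features
```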
33) Constructing the physics-guided reconstruction module, which builds the spectral data at pixel position l from a spectral library and the abundance map; ASTER Spectral Library Version 2 is selected as the basic spectral library, consisting of seven categories of object spectra (manmade materials, meteorites, minerals, rocks, soils, vegetation and water bodies), 2443 ground-object spectra in total; the final spectral intensity s_l can be expressed as:
s_l = t · Σ_{i=1}^{N_g} a_{l,i} · e_i
wherein e_i denotes the library spectrum of object i among the N_g objects, a_{l,i} denotes the abundance value at pixel position l for object number i, and t denotes the quantitatively corrected atmospheric absorption factor.
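The physics-guided reconstruction of step 33) reduces to linear mixing of library spectra weighted by the predicted abundances; a minimal sketch, assuming the 2443 library signatures have already been resampled to the sensor's B bands:

```python
import torch

def reconstruct_spectra(abundance, library, t=1.0):
    """Sketch of 33): s_l = t * sum_i a_{l,i} * e_i.

    abundance : (L, N_g) abundance a_{l,i} per pixel l and object i
    library   : (N_g, B) one library spectrum e_i per object
    t         : scalar atmospheric absorption factor
    returns   : (L, B) reconstructed spectra s_l
    """
    return t * abundance @ library
```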
34) Constructing a discriminator network, which consists of a spatial discriminator and a spectral discriminator:
341) Constructing the spatial discriminator network, composed in sequence of: two convolution blocks with 3×3 kernels, stride 1 and padding 1; three convolution blocks with 4×4 kernels, stride 2 and padding 1; three convolution blocks with 4×4 kernels, stride 1 and padding 1; and two convolution blocks with 3×3 kernels, stride 1 and padding 1; a batch normalization operation and a LeakyReLU operation are stacked after each convolution block, and a final Sigmoid operation maps the output to a probability for discriminating real from fake;
342) Constructing the spectrum discriminator, formed by five sequentially stacked Linear fully-connected layers, each followed by a LeakyReLU operation; then a Dense Block structure composed in sequence of an adaptive average pooling layer, a 1×1 convolution layer, a LeakyReLU layer and a 3×3 convolution layer; a final Sigmoid operation maps the output to a probability for discriminating real from fake;
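A hedged PyTorch sketch of the spatial discriminator of step 341) follows; the kernel/stride/padding sequence mirrors the description above, while the channel widths follow common GAN practice and are assumptions:

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, k, s, p):
    """Conv + batch norm + LeakyReLU, as stacked after each block in 341)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, stride=s, padding=p),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.2, inplace=True))

class SpatialDiscriminator(nn.Module):
    def __init__(self, bands, base=64):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(bands, base, 3, 1, 1),         # two 3x3, stride-1 blocks
            conv_block(base, base, 3, 1, 1),
            conv_block(base, base * 2, 4, 2, 1),      # three 4x4, stride-2 blocks
            conv_block(base * 2, base * 4, 4, 2, 1),
            conv_block(base * 4, base * 8, 4, 2, 1),
            conv_block(base * 8, base * 8, 4, 1, 1),  # three 4x4, stride-1 blocks
            conv_block(base * 8, base * 8, 4, 1, 1),
            conv_block(base * 8, base * 8, 4, 1, 1),
            conv_block(base * 8, base * 4, 3, 1, 1),  # two closing 3x3 blocks
            conv_block(base * 4, 1, 3, 1, 1))

    def forward(self, x):
        return torch.sigmoid(self.net(x))             # probability: real vs fake
```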
Further, training the physics-guided generative adversarial hyperspectral super-resolution model for unregistered images comprises the following steps:
41) Inputting the preprocessed simulated hyperspectral image and multispectral image into the dual-branch stackable stereo cross-fusion module:
411) The hyperspectral branch and the multispectral branch each execute the stereo cross-fusion sub-module:
4111) For the hyperspectral branch, the low-spatial-resolution hyperspectral image is input and a 3×3 convolution is applied once; it then passes through three stacked multi-head translational convolution blocks, each performing in sequence a normalization operation, a 1×1 convolution, a 3×3 depthwise convolution, a gate-unit operation, channel attention and a 1×1 convolution; at each stage, the output of the multi-head translational convolution block is combined with that stage's initial input through a residual connection to serve as the input of the gated feed-forward network;
finally, the gated feed-forward network is executed, performing in sequence a normalization operation, a 1×1 convolution, a gate unit and a 1×1 convolution, to obtain the output of the hyperspectral branch;
4112) For the multispectral branch, the high-spatial-resolution multispectral image is input and a 3×3 convolution is applied once; as in the hyperspectral branch, it then passes through three stacked multi-head translational convolution blocks, each performing in sequence a normalization operation, a 1×1 convolution, a 3×3 depthwise convolution, a gate-unit operation, channel attention and a 1×1 convolution; at each stage, the output of the multi-head translational convolution block is combined with that stage's initial input through a residual connection to serve as the input of the gated feed-forward network;
finally, the gated feed-forward network is executed, performing in sequence a normalization operation, a 1×1 convolution, a gate unit and a 1×1 convolution, to obtain the output of the multispectral branch;
412) After the multi-head translational convolution block of each stage of each branch is executed, a cross-convergence fusion self-attention sub-module is executed separately for weight sharing and feature fusion:
4121) Taking the paired multispectral and hyperspectral complementary features generated by the multi-head translational convolution blocks as input, three 1×1 convolutions are executed to generate the Q (query), K (key) and V (value) matrices; the cross-correlation between each pair of query matrix Q and key matrix K is computed, and a softmax function is applied to obtain the corresponding attention weights, which can be expressed as:
A^{HSI→MSI}_{i,u} = softmax_i( q^{HSI}_u · (k^{MSI}_{i,u})^T ),  A^{MSI→HSI}_{i,u} = softmax_i( q^{MSI}_u · (k^{HSI}_{i,u})^T )
wherein q^{HSI}_u denotes the channel-dimension feature vector at position u of the query matrix Q_HSI of the hyperspectral branch, q^{MSI}_u denotes the channel-dimension feature vector at position u of the query matrix Q_MSI of the multispectral branch, k^{HSI}_{i,u} denotes the i-th element of the channel-dimension feature vectors at position u of the key matrix K_HSI of the hyperspectral branch, k^{MSI}_{i,u} denotes the i-th element of the channel-dimension feature vectors at position u of the key matrix K_MSI of the multispectral branch, i ∈ {1, …, H+W−1}, H, W, C are the height, width and number of channels of the feature map respectively, T denotes the matrix transpose operation, and A^{HSI→MSI}, A^{MSI→HSI} denote the attention response maps on the hyperspectral and multispectral branches respectively;
4122) Re-weighting the attention response maps on the hyperspectral and multispectral branches with the value matrix V, then performing a residual-connection addition to obtain the paired output feature maps, which serve as the input of the multi-head translational convolution blocks of the next stage;
42) After the stereo cross-fusion sub-module has been executed, the extracted deep features are input into the generator network to produce the predicted abundance map for each pixel position and each object:
421) Executing the shallow-coding module: two convolution layers with 3×3 kernels are applied to the inputs of the hyperspectral branch and the multispectral branch respectively, each convolution followed by one ReLU nonlinear activation, obtaining edge and contour information to generate the latent codes;
422) Executing the three-stage guided-filter upsampling sub-module, which performs guided filtering on the upsampled hyperspectral image band by band; the outputs of the stages are as follows:
the first-stage hyperspectral branch input feature map is upsampled once by the bicubic interpolation algorithm to the first intermediate size (H, W being the height and width of the multispectral guide feature map), and the upsampled hyperspectral features are fused with the multispectral features through a multilayer perceptron to obtain the first-stage output;
the first-stage output is upsampled in turn by the bicubic interpolation algorithm to the second intermediate size and fused with the multispectral features to obtain the second-stage output;
the second-stage output is upsampled in turn by the bicubic interpolation algorithm to H×W and fused with the multispectral features to obtain the third-stage output;
finally, a skip connection is applied, adding the first-stage hyperspectral branch input feature map to the third-stage output to obtain the generator's output abundance map;
43) Executing the physics-guided reconstruction module, which builds the spectral data at pixel position l from the spectral library and the abundance map; the recovered spectral intensity s_l can be expressed as:
s_l = t · Σ_{i=1}^{N_g} a_{l,i} · e_i
wherein e_i denotes the library spectrum of object i among the N_g objects, a_{l,i} denotes the abundance value at pixel position l for object number i, and t denotes the quantitatively corrected atmospheric absorption factor;
44) Executing the discriminator network for estimating the output quality of the generator; it is divided into a spatial discriminator and a spectral discriminator, specifically:
441) Executing the spatial discriminator network, whose layers output as follows:
a convolution with 3×3 kernel, stride 1 and padding 1 is performed twice, each followed by one batch normalization operation and one LeakyReLU operation, to obtain the first output;
a convolution with 4×4 kernel, stride 2 and padding 1 is performed twice, each followed by one batch normalization operation and one LeakyReLU operation, to obtain the second output;
a convolution with 4×4 kernel, stride 1 and padding 1 is performed twice, each followed by one batch normalization operation and one LeakyReLU operation, to obtain the third output;
finally, a Sigmoid operation is executed to estimate the probability that the output image is real;
442) Executing the spectrum discriminator network: five Linear fully-connected layers are executed in sequence, each followed by a LeakyReLU operation, and the final Sigmoid operation maps the output to a probability for discriminating real from fake;
45) Using the content loss L_content, the spatial gradient loss L^spa_grad, the spectral gradient loss L^spe_grad and the adversarial loss L_adv as the loss function of the network model:
the content loss constrains the generated image at the pixel level, ensuring that the generated image is consistent with the overall texture and tone of the reference image; the L_content loss is expressed as:
L_content = (1/(H·W·B)) Σ_{i=1}^{H} Σ_{j=1}^{W} Σ_{k=1}^{B} |Gen(i,j,k) − Ref(i,j,k)|
wherein Gen and Ref are the generated and reference images respectively, H, W, B are the height, width and number of bands of the image, and i, j, k denote the position coordinates of the three-dimensional cube image in a rectangular spatial coordinate system;
The spatial gradient loss function L^spa_grad constrains the spatial structure of the generated image, expressed as:
L^spa_grad = (1/(H·W·B)) Σ_{i,j,k} ( |∇_x Gen(i,j,k) − ∇_x Ref(i,j,k)| + |∇_y Gen(i,j,k) − ∇_y Ref(i,j,k)| )
wherein ∇_x and ∇_y are the horizontal and vertical gradient-extraction operators respectively, Gen and Ref are the generated and reference images, H, W, B are the height, width and number of bands of the image, and i, j, k denote the position coordinates of the three-dimensional cube image in a rectangular spatial coordinate system;
The spectral gradient loss function L^spe_grad ensures spectral consistency between the generated image and the reference image, expressed as:
L^spe_grad = (1/(H·W·B)) Σ_{i,j,k} |∇_z Gen(i,j,k) − ∇_z Ref(i,j,k)|
wherein ∇_z is the spectral gradient-extraction operator, Gen and Ref are the generated and reference images, H, W, B are the height, width and number of bands of the image, and i, j, k denote the position coordinates of the three-dimensional cube image in a rectangular spatial coordinate system;
The adversarial loss function is used to constrain the output of the generator, expressed as:
L_adv = −(1/N) Σ_{n=1}^{N} log D_θ(Gen_n)
wherein D_θ(Gen_n) is the probability that the generated image is judged by the discriminator to be a real image, θ denotes the parameters of the discriminator network, and N is the number of samples used to update the model parameters in one training step;
The overall loss function is a weighted sum of the four loss functions:
L = L_content + α·L^spa_grad + β·L^spe_grad + L_adv
where α and β are weight parameters controlling the weights of the spatial and spectral gradient terms, both set to 1 in the present invention;
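The four-term objective of step 45) can be sketched as below; the L1 form of the pixel term and the forward-difference gradient operators are assumptions consistent with, but not dictated by, the description above:

```python
import torch

def gradient(x, dim):
    """Forward difference along one axis (spatial x/y or spectral z)."""
    return x.narrow(dim, 1, x.size(dim) - 1) - x.narrow(dim, 0, x.size(dim) - 1)

def total_loss(gen, ref, d_prob, alpha=1.0, beta=1.0):
    """Sketch of L = L_content + alpha*L_spa + beta*L_spe + L_adv.

    gen, ref : (N, B, H, W) generated and reference images
    d_prob   : discriminator probability that `gen` is real
    """
    l_content = (gen - ref).abs().mean()
    l_spa = (gradient(gen, 3) - gradient(ref, 3)).abs().mean() \
          + (gradient(gen, 2) - gradient(ref, 2)).abs().mean()
    l_spe = (gradient(gen, 1) - gradient(ref, 1)).abs().mean()
    l_adv = -torch.log(d_prob + 1e-8).mean()   # generator-side adversarial term
    return l_content + alpha * l_spa + beta * l_spe + l_adv
```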
46) Alternately repeating the training of the discriminator and the generator; training is complete when the discriminator can no longer distinguish real from fake, i.e. when the discrimination probability is 0.5, yielding a fully trained network model;
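The alternating scheme of step 46) in sketch form; `gen_net`, `disc_net` and `loss_fn` stand in for the modules and loss above, and the binary cross-entropy on the discriminator side is an assumption of this sketch:

```python
import torch

def train_step(gen_net, disc_net, opt_g, opt_d, lr_hsi, hr_msi, ref, loss_fn):
    """One alternating update: discriminator first, then generator."""
    bce = torch.nn.BCELoss()

    # discriminator step: push real images toward 1, generated toward 0
    fake = gen_net(lr_hsi, hr_msi).detach()
    p_real, p_fake = disc_net(ref), disc_net(fake)
    d_loss = bce(p_real, torch.ones_like(p_real)) \
           + bce(p_fake, torch.zeros_like(p_fake))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # generator step: content/gradient terms plus the adversarial term
    fake = gen_net(lr_hsi, hr_msi)
    g_loss = loss_fn(fake, ref, disc_net(fake))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    # training is complete when the discriminator's probabilities hover at 0.5
    return d_loss.item(), g_loss.item()
```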
5) Acquiring paired, non-strictly-registered real hyperspectral and multispectral remote sensing image data to be fused;
6) Inputting the non-strictly-registered real hyperspectral and multispectral remote sensing images to be fused into the reconstruction model for fusion processing, obtaining the reconstructed high-resolution hyperspectral enhanced image.
Advantageous effects
The invention, a physics-guided generative adversarial hyperspectral super-resolution method for unregistered images, substantially remedies the shortcomings of existing fusion methods. Against the requirement of existing methods for strict registration of the low-spatial-resolution hyperspectral image and the high-spatial-resolution multispectral image, the simulated stereoscopic vision structure achieves geometric registration of the different modalities, and features across modalities are learned and matched through parameter sharing, effectively avoiding complex feature-matching operations. Against the inability to fully integrate the complementary information of different modalities, a cross-convergence fusion self-attention module capturing multi-directional criss-cross information is constructed to fuse unregistered images: complementary attention information of the different modalities is learned and context information in the horizontal and vertical directions is collected, so that pixels at different distances have equal opportunity of expression, improving fusion performance. Against the poor stability caused by the lack of necessary physical information in existing super-resolution reconstruction methods, an imaging mechanism and spectral abundance information are introduced, enhancing the interpretability and generalization ability of the model. Meanwhile, generator and discriminator networks are introduced, improving the visual fidelity of the reconstruction results.
Drawings
FIG. 1 is a process sequence diagram of the present invention;
FIG. 2 is a flow chart of synthesizing hyperspectral and multispectral datasets of different resolutions according to the present invention;
FIG. 3 is an overall block diagram of the adversarial hyperspectral super-resolution method according to the present invention;
FIG. 4 is a detailed block diagram of the adversarial hyperspectral super-resolution network according to the present invention;
FIG. 5 is a structural diagram of the cross-convergence fusion self-attention sub-module according to the present invention.
Detailed Description
For a further understanding and appreciation of the structural features and advantages achieved by the present invention, presently preferred embodiments are described below in conjunction with the accompanying drawings:
As shown in FIG. 1, the misregistration-based physics-guided generative adversarial hyperspectral super-resolution method according to the present invention comprises the following steps:
In the first step, hyperspectral and multispectral datasets of different resolutions are synthesized; the specific steps are as follows:
(1) Applying Gaussian blur filtering with a 5×5 window, zero mean and standard deviation of 2, followed by 4× bilinear-interpolation downsampling, to the original reference hyperspectral image to obtain the low-spatial-resolution hyperspectral image for training;
(2) As shown in FIG. 2, a high-resolution multispectral image is generated by applying the spectral response function of the corresponding multispectral satellite to the original reference hyperspectral image;
(3) Adding Gaussian white noise to the high-resolution multispectral image and the low-resolution hyperspectral image respectively;
(4) Adopting the original hyperspectral image as the label image for training;
(5) A 128×128 block is cut out of a central region of the original reference data as a test set, the remaining region is used for training, and the training region is randomly cut into a size of 128×128 for training in each iteration;
(6) The portion of the randomly partitioned training set that overlaps the test set is filled with 0 pixels.
(7) The hyperspectral and multispectral misregistration states are simulated by translating the training set and the test set by 1 pixel in the horizontal, vertical and diagonal directions, as shown in FIG. 2.
In the second step, the physics-guided generative adversarial hyperspectral super-resolution model for unregistered images is constructed; as shown in FIG. 3, the model consists of a stackable stereo cross-attention module, a generator module, a physics-guided reconstruction module and a discriminator module, and the specific steps for constructing these modules are as follows:
(1) As shown in FIG. 4, a stackable stereo cross-attention module is constructed to improve reconstruction quality using the complementary information between non-strictly-registered hyperspectral and multispectral images; the module consists of a dual-branch parallel weight-sharing network comprising a stereo cross-fusion sub-module and a cross-convergence fusion self-attention sub-module; (1-1) the stereo cross-fusion sub-module is constructed, consisting of a multi-head translational convolution block (MTConv) and a gated feed-forward network (FFN), the two sub-blocks being joined by residual connections, which can be expressed as:
X_MTConv = MTConv(LN(X)) + X
X_FFN = FFN(LN(X_MTConv)) + X_MTConv
wherein MTConv denotes the multi-head translational convolution block, X denotes its input, X_MTConv denotes its output, FFN denotes the gated feed-forward network, LN denotes layer normalization (Layer Normalization), and X_FFN denotes the output of the gated feed-forward network;
(1-1-1) The multi-head translational convolution block (MTConv) is constructed, composed in sequence of a normalization layer, a 1×1 convolution layer, a 3×3 depthwise convolution layer, a gate unit, a channel attention and a 1×1 convolution layer;
The gate unit (Gating) operation first divides the input X into two sub-features X_1, X_2 along the channel dimension and multiplies them, which can be expressed as:
Gating(X) = X_1 ⊙ X_2
wherein ⊙ represents element-wise multiplication and Gating(X) represents the output of the gate unit;
The channel attention (SCA) following the gate unit can be expressed in simplified form as:
SCA(X) = X * W * pool(X)
wherein * is the channel-wise product operation, W denotes a learnable matrix, pool is the global average pooling operation, X denotes the input of the channel attention block, and SCA(X) denotes its output;
(1-1-2) The gated feed-forward network is constructed, composed in sequence of a normalization layer, a 1×1 convolution, a gate unit and a 1×1 convolution;
(1-2) As shown in FIG. 5, the cross-convergence fusion self-attention sub-module is constructed, which captures multi-directional criss-cross information to fuse unregistered images:
(1-2-1) The Q (query), K (key) and V (value) matrices are generated with three 1×1 convolutions and a softmax function is applied to obtain the corresponding attention weights. For the output feature maps X_HSI, X_MSI ∈ R^{H×W×C} of the two parallel stereo cross-fusion branches, where H, W, C are the height, width and number of channels respectively, the pixels forming a criss-cross shape along the row and column directions of each position are taken, (H+W−1) pixels in total, giving vectors of size (H+W−1)×C; the cross-correlation of the two modalities can be expressed as:
A^{HSI→MSI}_{i,u} = softmax_i( q^{HSI}_u · (k^{MSI}_{i,u})^T ),  A^{MSI→HSI}_{i,u} = softmax_i( q^{MSI}_u · (k^{HSI}_{i,u})^T )
wherein q^{HSI}_u denotes the channel-dimension feature vector at position u of the query matrix Q_HSI of the hyperspectral branch, q^{MSI}_u denotes the channel-dimension feature vector at position u of the query matrix Q_MSI of the multispectral branch, k^{HSI}_{i,u} denotes the i-th element of the channel-dimension feature vectors at position u of the key matrix K_HSI of the hyperspectral branch, k^{MSI}_{i,u} denotes the i-th element of the channel-dimension feature vectors at position u of the key matrix K_MSI of the multispectral branch, i ∈ {1, …, H+W−1}, H, W, C are the height, width and number of channels of the feature map respectively, T denotes the matrix transpose operation, and A^{HSI→MSI}, A^{MSI→HSI} denote the attention response maps on the hyperspectral and multispectral branches respectively;
(1-2-2) The value matrix V is re-weighted with the obtained attention map A through a feature-aggregation operation, and the long-range context information in the horizontal and vertical directions at position u is finally captured within the interactive features through a residual connection;
(2) As shown in FIG. 4, a generator network cascading multispectral and hyperspectral features at multiple scales is constructed, producing a predicted abundance map for each pixel position and each object; it consists of a shallow-coding sub-module and a guided-filter upsampling sub-module:
(2-1) The shallow-coding module is constructed: the hyperspectral branch and the multispectral branch each consist of two stacked convolution layers with 3×3 kernels, each convolution layer followed by a ReLU layer;
(2-2) The guided-filter upsampling sub-module is constructed, divided into three stages; the hyperspectral feature maps of the three stages are upsampled in sequence by a bicubic interpolation algorithm up to H×W, where H, W are the height and width of the multispectral guide feature map;
The upsampled hyperspectral image is guided-filtered band by band, and the pixel information of the low-spatial-resolution image, the relative coordinate information of the high-spatial-resolution image and the guiding multispectral image features are fused through a multilayer perceptron (MLP), which can be expressed as:
Â^i_k = F_G(Ã^i_k, M_g)
where k is the hyperspectral band index, F_G(·) denotes the guided filtering algorithm, M_g is the guiding multispectral image, Â^i is the generated abundance map of the i-th stage, each band of which obtains enhanced spatial detail information from M_g, and Ã^i denotes the upsampled feature map;
(3) As shown in FIG. 4, the physics-guided reconstruction module is constructed, which builds the spectral data at pixel position l from a spectral library and the abundance map; ASTER Spectral Library Version 2 is selected as the basic spectral library, consisting of seven categories of object spectra (manmade materials, meteorites, minerals, rocks, soils, vegetation and water bodies), 2443 ground-object spectra in total; the final spectral intensity s_l can be expressed as:
s_l = t · Σ_{i=1}^{N_g} a_{l,i} · e_i
wherein e_i denotes the library spectrum of object i among the N_g objects, a_{l,i} denotes the abundance value at pixel position l for object number i, and t denotes the quantitatively corrected atmospheric absorption factor;
(4) As shown in FIG. 4, a discriminator network is constructed, consisting of a spatial discriminator and a spectral discriminator:
(4-1) The spatial discriminator network is constructed, composed in sequence of: two convolution blocks with 3×3 kernels, stride 1 and padding 1; three convolution blocks with 4×4 kernels, stride 2 and padding 1; three convolution blocks with 4×4 kernels, stride 1 and padding 1; and two convolution blocks with 3×3 kernels, stride 1 and padding 1; a batch normalization operation and a LeakyReLU operation are stacked after each convolution block, and a final Sigmoid operation maps the output to a probability for discriminating real from fake;
(4-2) The spectrum discriminator is constructed, formed by five sequentially stacked Linear fully-connected layers, each followed by a LeakyReLU operation; then a Dense Block structure composed in sequence of an adaptive average pooling layer, a 1×1 convolution layer, a LeakyReLU layer and a 3×3 convolution layer; a final Sigmoid operation maps the output to a probability for discriminating real from fake;
In the third step, the physics-guided generative adversarial hyperspectral super-resolution model for unregistered images is trained: the preprocessed hyperspectral and multispectral training datasets of different resolutions are taken as input and the network is fully trained, yielding the trained model; the specific steps are as follows:
(1) The preprocessed simulated hyperspectral image and multispectral image are input into the dual-branch stackable stereo cross-fusion module:
(1-1) The hyperspectral branch and the multispectral branch each execute the stereo cross-fusion sub-module:
(1-1-1) For the hyperspectral branch, the low-spatial-resolution hyperspectral image is input and a 3×3 convolution is applied once; it then passes through three stacked multi-head translational convolution blocks, each performing in sequence a normalization operation, a 1×1 convolution, a 3×3 depthwise convolution, a gate-unit operation, channel attention and a 1×1 convolution; at each stage, the output of the multi-head translational convolution block is combined with that stage's initial input through a residual connection to serve as the input of the gated feed-forward network;
finally, the gated feed-forward network is executed, performing in sequence a normalization operation, a 1×1 convolution, a gate unit and a 1×1 convolution, to obtain the output of the hyperspectral branch;
(1-1-2) For the multispectral branch, the high-spatial-resolution multispectral image is input and a 3×3 convolution is applied once; as in the hyperspectral branch, it then passes through three stacked multi-head translational convolution blocks, each performing in sequence a normalization operation, a 1×1 convolution, a 3×3 depthwise convolution, a gate-unit operation, channel attention and a 1×1 convolution; at each stage, the output of the multi-head translational convolution block is combined with that stage's initial input through a residual connection to serve as the input of the gated feed-forward network;
finally, the gated feed-forward network is executed, performing in sequence a normalization operation, a 1×1 convolution, a gate unit and a 1×1 convolution, to obtain the output of the multispectral branch;
(1-2) After the multi-head translational convolution block of each stage of each branch is executed, a cross-convergence fusion self-attention sub-module is executed separately for weight sharing and feature fusion:
(1-2-1) Taking the paired multispectral and hyperspectral complementary features generated by the multi-head translational convolution blocks as input, three 1×1 convolutions are executed to generate the Q (query), K (key) and V (value) matrices; the cross-correlation between each pair of query matrix Q and key matrix K is computed, and a Softmax function is applied to obtain the corresponding attention weights, which can be expressed as:
A^{HSI→MSI}_{i,u} = softmax_i( q^{HSI}_u · (k^{MSI}_{i,u})^T ),  A^{MSI→HSI}_{i,u} = softmax_i( q^{MSI}_u · (k^{HSI}_{i,u})^T )
wherein q^{HSI}_u denotes the channel-dimension feature vector at position u of the query matrix Q_HSI of the hyperspectral branch, q^{MSI}_u denotes the channel-dimension feature vector at position u of the query matrix Q_MSI of the multispectral branch, k^{HSI}_{i,u} denotes the i-th element of the channel-dimension feature vectors at position u of the key matrix K_HSI of the hyperspectral branch, k^{MSI}_{i,u} denotes the i-th element of the channel-dimension feature vectors at position u of the key matrix K_MSI of the multispectral branch, i ∈ {1, …, H+W−1}, H, W, C are the height, width and number of channels of the feature map respectively, T denotes the matrix transpose operation, and A^{HSI→MSI}, A^{MSI→HSI} denote the attention response maps on the hyperspectral and multispectral branches respectively;
(1-2-2) The attention response maps on the hyperspectral and multispectral branches are re-weighted with the value matrix V, and a residual-connection addition is then performed to obtain the paired output feature maps, which serve as the input of the multi-head translational convolution blocks of the next stage;
(2) After the stereo cross-fusion sub-module has been executed, the extracted deep features are input into the generator network to produce the predicted abundance map for each pixel position and each object; the specific process is as follows:
(2-1) The shallow-coding module is executed: two convolution layers with 3×3 kernels are applied to the inputs of the hyperspectral branch and the multispectral branch respectively, each convolution followed by one ReLU nonlinear activation, obtaining edge and contour information to generate the latent codes;
(2-2) The three-stage guided-filter upsampling sub-module is executed, performing guided filtering on the upsampled hyperspectral image band by band; the outputs of the stages are as follows:
the first-stage hyperspectral branch input feature map is upsampled once by the bicubic interpolation algorithm to the first intermediate size (H, W being the height and width of the multispectral guide feature map), and the upsampled hyperspectral features are fused with the multispectral features through a multilayer perceptron to obtain the first-stage output;
the first-stage output is upsampled in turn by the bicubic interpolation algorithm to the second intermediate size and fused with the multispectral features to obtain the second-stage output;
the second-stage output is upsampled in turn by the bicubic interpolation algorithm to H×W and fused with the multispectral features to obtain the third-stage output;
finally, a skip connection is applied, adding the first-stage hyperspectral branch input feature map to the third-stage output to obtain the generator's output abundance map;
(3) The physics-guided reconstruction module is executed, building the spectral data at pixel position l from the spectral library and the abundance map; the recovered spectral intensity s_l can be expressed as:
s_l = t · Σ_{i=1}^{N_g} a_{l,i} · e_i
wherein e_i denotes the library spectrum of object i among the N_g objects, a_{l,i} denotes the abundance value at pixel position l for object number i, and t denotes the quantitatively corrected atmospheric absorption factor;
(4) The discriminator network for estimating the output quality of the generator is executed; it is divided into a spatial discriminator and a spectral discriminator, and the following steps are performed:
(4-1) The spatial discriminator network is executed, whose layers output as follows:
a convolution with 3×3 kernel, stride 1 and padding 1 is performed twice, each followed by one batch normalization operation and one LeakyReLU operation, to obtain the first output;
a convolution with 4×4 kernel, stride 2 and padding 1 is performed twice, each followed by one batch normalization operation and one LeakyReLU operation, to obtain the second output;
a convolution with 4×4 kernel, stride 1 and padding 1 is performed twice, each followed by one batch normalization operation and one LeakyReLU operation, to obtain the third output;
finally, a Sigmoid operation is executed to estimate the probability that the output image is real;
(4-2) The spectrum discriminator network is executed: five Linear fully-connected layers are executed in sequence, each followed by a LeakyReLU operation, and the final Sigmoid operation maps the output to a probability for discriminating real from fake;
(5) The content loss L_content, the spatial gradient loss L^spa_grad, the spectral gradient loss L^spe_grad and the adversarial loss L_adv are used as the loss function of the network model:
the content loss constrains the generated image at the pixel level, ensuring that the generated image is consistent with the overall texture and tone of the reference image; the L_content loss is expressed as:
L_content = (1/(H·W·B)) Σ_{i=1}^{H} Σ_{j=1}^{W} Σ_{k=1}^{B} |Gen(i,j,k) − Ref(i,j,k)|
wherein Gen and Ref are the generated and reference images respectively, H, W, B are the height, width and number of bands of the image, and i, j, k denote the position coordinates of the three-dimensional cube image in a rectangular spatial coordinate system;
The spatial gradient loss function L^spa_grad constrains the spatial structure of the generated image, expressed as:
L^spa_grad = (1/(H·W·B)) Σ_{i,j,k} ( |∇_x Gen(i,j,k) − ∇_x Ref(i,j,k)| + |∇_y Gen(i,j,k) − ∇_y Ref(i,j,k)| )
wherein ∇_x and ∇_y are the horizontal and vertical gradient-extraction operators respectively, Gen and Ref are the generated and reference images, H, W, B are the height, width and number of bands of the image, and i, j, k denote the position coordinates of the three-dimensional cube image in a rectangular spatial coordinate system;
The spectral gradient loss function L^spe_grad ensures spectral consistency between the generated image and the reference image, expressed as:
L^spe_grad = (1/(H·W·B)) Σ_{i,j,k} |∇_z Gen(i,j,k) − ∇_z Ref(i,j,k)|
wherein ∇_z is the spectral gradient-extraction operator, Gen and Ref are the generated and reference images, H, W, B are the height, width and number of bands of the image, and i, j, k denote the position coordinates of the three-dimensional cube image in a rectangular spatial coordinate system;
The adversarial loss function is used to constrain the output of the generator, expressed as:
L_adv = −(1/N) Σ_{n=1}^{N} log D_θ(Gen_n)
wherein D_θ(Gen_n) is the probability that the generated image is judged by the discriminator to be a real image, θ denotes the parameters of the discriminator network, and N is the number of samples used to update the model parameters in one training step;
The overall loss function is a weighted sum of the four loss functions:
L = L_content + α·L^spa_grad + β·L^spe_grad + L_adv
where α and β are weight parameters controlling the weights of the spatial and spectral gradient terms, both set to 1 in the present invention;
(6) The training of the discriminator and the generator is repeated alternately; training is complete when the discriminator can no longer distinguish real from fake, i.e. when the discrimination probability is 0.5, yielding a fully trained network model.
In the fourth step, paired, non-strictly-registered real hyperspectral and multispectral remote sensing image data to be fused are acquired.
In the fifth step, the non-strictly-registered real hyperspectral and multispectral remote sensing images to be fused are input into the reconstruction model for fusion processing, obtaining the reconstructed high-resolution hyperspectral enhanced image.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the above embodiments and descriptions merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims (4)

1. A misregistration-based physics-guided generative adversarial hyperspectral super-resolution method, comprising the following steps:
11) Generating hyperspectral and multispectral datasets: training and testing datasets are constructed using the Wald protocol; the original hyperspectral image is used as label data; within the training samples, spatial downsampling is guided by the edge information and spectral distribution of the hyperspectral image; denoising is performed before spectral downsampling, thereby generating the hyperspectral and multispectral image datasets;
12) Constructing the physics-guided generative adversarial hyperspectral super-resolution model for unregistered images, which consists of a stackable stereo cross-attention module, a generator module, a physics-guided reconstruction module and a discriminator module;
13) Training the physics-guided generative adversarial hyperspectral super-resolution model: the preprocessed hyperspectral and multispectral training datasets of different resolutions are taken as input and the network is fully trained, yielding the trained model;
14) Obtaining the hyperspectral reconstruction result for hyperspectral and multispectral remote sensing images: the acquired pairs of real hyperspectral and multispectral remote sensing images to be processed are input into the trained model, obtaining the reconstructed high-resolution hyperspectral remote sensing image.
2. The method according to claim 1, wherein acquiring and synthesizing the hyperspectral and multispectral datasets of different resolutions comprises the following steps:
21) Applying Gaussian blur filtering with a 5×5 window, zero mean and standard deviation of 2, followed by 4× bilinear-interpolation downsampling, to the original reference hyperspectral image to obtain the low-spatial-resolution hyperspectral image for training;
22) Generating a high-resolution multispectral image by applying the spectral response function of the corresponding multispectral satellite to the original reference hyperspectral image;
23) Adding Gaussian white noise to the high-resolution multispectral image and the low-resolution hyperspectral image respectively;
24) Using the original hyperspectral image as the label image for training;
25) Cutting a 128×128 block out of the central region of the original reference data as the test set; the remaining region is used for training, and in each iteration the training region is randomly cropped to 128×128 for training;
26) Filling the portion of the randomly partitioned training set that overlaps the test set with 0-valued pixels;
27) Simulating hyperspectral and multispectral misregistration states by translating the training set and the test set by 1 pixel in the horizontal, vertical and diagonal directions.
3. The method according to claim 1, wherein constructing the physics-guided generative adversarial hyperspectral super-resolution model for unregistered images comprises the following steps:
31) Constructing a stackable stereo cross-attention module that uses the complementary information between non-strictly-registered hyperspectral and multispectral images to improve reconstruction quality; the module consists of a dual-branch parallel weight-sharing network comprising a stereo cross-fusion sub-module and a cross-convergence fusion self-attention sub-module;
311) Constructing the stereo cross-fusion sub-module, which consists of a multi-head translational convolution block (MTConv) and a gated feed-forward network (FFN), the two sub-blocks being joined by residual connections, which can be expressed as:
X_MTConv = MTConv(LN(X)) + X
X_FFN = FFN(LN(X_MTConv)) + X_MTConv
wherein MTConv denotes the multi-head translational convolution block, X denotes its input, X_MTConv denotes its output, FFN denotes the gated feed-forward network, LN denotes layer normalization (Layer Normalization), and X_FFN denotes the output of the gated feed-forward network;
3111) Constructing the multi-head translation convolution block (MTConv), composed in sequence of a normalization layer, a 1×1 convolution layer, a 3×3 depthwise convolution layer, a gate unit, a channel attention and a 1×1 convolution layer;
The gate unit (Gating) first splits the input $X$ into two sub-features $X_1, X_2$ along the channel dimension and multiplies them element-wise, which can be expressed as:
$$\mathrm{Gating}(X) = X_1 \odot X_2$$
wherein $\odot$ denotes element-wise multiplication and $\mathrm{Gating}(X)$ the output of the gate unit;
The channel attention (SCA) following the gate unit can be expressed in simplified form as:
$$\mathrm{SCA}(X) = X \ast W\,\mathrm{pool}(X)$$
wherein $\ast$ denotes the channel-wise product, $W$ a learnable matrix, $\mathrm{pool}$ the global average pooling operation, $X$ the input of the channel attention block and $\mathrm{SCA}(X)$ its output;
3112) Constructing the gated feed-forward network, composed in sequence of a normalization layer, a 1×1 convolution, a gate unit and a 1×1 convolution; a minimal sketch of one full stereo cross fusion stage follows;
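The sketch below composes the gate unit, the simplified channel attention and the gated FFN into one residual MTConv stage per the formulas above; the channel width and the expansion factor of 2 are assumptions, since the claim gives no hyperparameters:

```python
import torch
import torch.nn as nn

class SimpleGate(nn.Module):
    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)     # split along channels (Gating)
        return x1 * x2                 # element-wise product

class SCA(nn.Module):
    """Simplified channel attention: SCA(X) = X * W pool(X)."""
    def __init__(self, ch):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.w = nn.Conv2d(ch, ch, 1)  # the learnable matrix W as a 1x1 conv
    def forward(self, x):
        return x * self.w(self.pool(x))

class MTConvStage(nn.Module):
    def __init__(self, ch, expand=2):
        super().__init__()
        hidden = ch * expand
        self.norm1 = nn.GroupNorm(1, ch)   # layer normalization over channels
        self.mtconv = nn.Sequential(       # 1x1 -> 3x3 depthwise -> gate -> SCA -> 1x1
            nn.Conv2d(ch, hidden * 2, 1),
            nn.Conv2d(hidden * 2, hidden * 2, 3, padding=1, groups=hidden * 2),
            SimpleGate(), SCA(hidden), nn.Conv2d(hidden, ch, 1))
        self.norm2 = nn.GroupNorm(1, ch)
        self.ffn = nn.Sequential(          # gated feed-forward: 1x1 -> gate -> 1x1
            nn.Conv2d(ch, hidden * 2, 1), SimpleGate(), nn.Conv2d(hidden, ch, 1))
    def forward(self, x):
        x = x + self.mtconv(self.norm1(x))   # X_MTConv = MTConv(LN(X)) + X
        return x + self.ffn(self.norm2(x))   # X_FFN = FFN(LN(X_MTConv)) + X_MTConv
```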
312) Constructing the cross-convergence fusion self-attention sub-module for capturing multi-directional cross information to fuse the unregistered images:
3121) For the paired output feature maps $X_{HSI}, X_{MSI} \in \mathbb{R}^{H \times W \times C}$ of the dual-branch parallel stereo cross fusion sub-module, where $H$, $W$, $C$ are the height, width and number of channels, three 1×1 convolutions generate the Q (query), K (key) and V (value) matrices, and a softmax function is applied to obtain the corresponding attention weights; for each position the criss-cross set of pixels along its row and column is taken, (H+W−1) pixels in total, forming a matrix of size (H+W−1)×C; the cross-correlation of the two modalities can be expressed as:
$$A_{HSI\rightarrow MSI}^{i,u} = \operatorname{softmax}_{i}\!\left(Q_{HSI}^{u}\,\big(K_{MSI}^{i,u}\big)^{T}\right),\qquad A_{MSI\rightarrow HSI}^{i,u} = \operatorname{softmax}_{i}\!\left(Q_{MSI}^{u}\,\big(K_{HSI}^{i,u}\big)^{T}\right)$$
wherein $Q_{HSI}^{u}$ denotes the channel-dimension feature vector of the query matrix $Q_{HSI}$ of the hyperspectral branch at position $u$, $Q_{MSI}^{u}$ that of the query matrix $Q_{MSI}$ of the multispectral branch, $K_{HSI}^{i,u}$ the $i$-th element of the criss-cross feature set of the key matrix $K_{HSI}$ of the hyperspectral branch at position $u$, $K_{MSI}^{i,u}$ that of the key matrix $K_{MSI}$ of the multispectral branch, $i \in \{1,\dots,H+W-1\}$, $H$, $W$, $C$ the height, width and number of channels of the feature map, $T$ the matrix transpose operation, and $A_{HSI\rightarrow MSI}$, $A_{MSI\rightarrow HSI}$ the attention response maps on the hyperspectral and multispectral branches respectively;
3122) Re-weighting the obtained attention maps $A$ with the value matrices $V$ through a feature aggregation operation, and finally capturing, through a residual connection, the long-range context information in the horizontal and vertical directions at each position $u$ of the interactive features; a sketch of this cross-modal criss-cross attention follows;
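A hedged sketch of the sub-module: each position attends over the criss-cross positions of the other modality. Splitting the (H+W−1)-pixel criss-cross into a row pass plus a column pass, and the $\sqrt{C}$ scaling, are implementation assumptions not stated in the claim:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def criss_cross_attend(q, k, v):
    """q, k, v: (B, C, H, W); attends along rows and columns separately."""
    B, C, H, W = q.shape
    # row attention: each pixel attends over the W positions of its row
    q_r = q.permute(0, 2, 3, 1).reshape(B * H, W, C)
    k_r = k.permute(0, 2, 3, 1).reshape(B * H, W, C)
    v_r = v.permute(0, 2, 3, 1).reshape(B * H, W, C)
    a_r = F.softmax(q_r @ k_r.transpose(1, 2) / C ** 0.5, dim=-1)
    out_r = (a_r @ v_r).reshape(B, H, W, C).permute(0, 3, 1, 2)
    # column attention: each pixel attends over the H positions of its column
    q_c = q.permute(0, 3, 2, 1).reshape(B * W, H, C)
    k_c = k.permute(0, 3, 2, 1).reshape(B * W, H, C)
    v_c = v.permute(0, 3, 2, 1).reshape(B * W, H, C)
    a_c = F.softmax(q_c @ k_c.transpose(1, 2) / C ** 0.5, dim=-1)
    out_c = (a_c @ v_c).reshape(B, W, H, C).permute(0, 3, 2, 1)
    return out_r + out_c

class CrossModalCCA(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch, 1)
        self.k = nn.Conv2d(ch, ch, 1)
        self.v = nn.Conv2d(ch, ch, 1)
    def forward(self, x_hsi, x_msi):
        # A_HSI->MSI: hyperspectral queries attend over multispectral keys/values,
        # followed by the residual connection of step 3122
        return x_hsi + criss_cross_attend(self.q(x_hsi), self.k(x_msi), self.v(x_msi))
```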
32) Constructing a generator network that cascades multispectral and hyperspectral features at multiple scales and generates a predicted abundance map for each pixel position and each object; the generator comprises a shallow coding sub-module and a guided-filter upsampling sub-module:
321) Constructing the shallow coding sub-module: the hyperspectral branch and the multispectral branch are each formed by stacking two convolution layers with 3×3 kernels, each convolution layer followed by a ReLU layer;
322) Constructing the guided-filter upsampling sub-module, divided into three stages; the hyperspectral feature maps of the three stages are successively upsampled by a bicubic interpolation algorithm until reaching H×W, wherein H, W are the height and width of the multispectral guide feature map;
The upsampled hyperspectral image is guided-filtered band by band, and a multilayer perceptron (MLP) fuses the pixel information of the low-spatial-resolution image, the relative coordinate information of the high-spatial-resolution image and the guiding multispectral image features, which can be expressed as:
$$\hat{A}_{k}^{(i)} = F_{G}\!\left(\tilde{X}_{k}^{(i)},\,M_{g}\right)$$
wherein $k$ is the hyperspectral band index, $F_{G}(\cdot)$ the guided filtering algorithm, $M_{g}$ the guiding multispectral image, $\hat{A}^{(i)}$ the abundance map generated at the $i$-th stage, each band of which obtains enhanced spatial detail information from $M_{g}$, and $\tilde{X}^{(i)}$ the upsampled feature map; a per-band sketch follows;
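A compact sketch of one upsampling stage, assuming a classic box-filter guided filter with radius 2 and a single-channel mean of the guide; the claim's MLP fusion of pixel values, relative coordinates and guide features is reduced here to the standard guided-filter equations for brevity:

```python
import torch
import torch.nn.functional as F

def box_filter(x, r):
    return F.avg_pool2d(x, 2 * r + 1, stride=1, padding=r)

def guided_filter(band, guide, r=2, eps=1e-4):
    """band, guide: (B, 1, H, W); standard guided-filter update."""
    mean_g, mean_x = box_filter(guide, r), box_filter(band, r)
    cov = box_filter(guide * band, r) - mean_g * mean_x
    var = box_filter(guide * guide, r) - mean_g * mean_g
    a = cov / (var + eps)
    b = mean_x - a * mean_g
    return box_filter(a, r) * guide + box_filter(b, r)

def upsample_stage(hsi_feat, msi_guide, size):
    up = F.interpolate(hsi_feat, size=size, mode="bicubic", align_corners=False)
    g = msi_guide.mean(dim=1, keepdim=True)     # single-channel guide (assumption)
    bands = [guided_filter(up[:, i:i + 1], g) for i in range(up.shape[1])]
    return torch.cat(bands, dim=1)              # band-by-band guided filtering
```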
33) Constructing the physical guidance reconstruction module, which builds the spectral data at each pixel position from a spectral library and the abundance map; ASTER Spectral Library Version 2 is selected as the base spectral library, comprising 2443 ground-object spectra across seven object classes: man-made materials, meteorites, minerals, rocks, soils, vegetation and water; the final spectral intensity $s_{l}$ can be expressed as:
$$s_{l} = t\sum_{i=1}^{N_{g}} a_{l,i}\,S_{i}$$
wherein $S_{i}$ denotes the library spectrum of the $i$-th of the $N_{g}$ objects, $a_{l,i}$ the abundance value of object $i$ at pixel position $l$, and $t$ the quantized, corrected atmospheric absorption factor; a sketch follows;
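The linear-mixing reconstruction is a single tensor contraction; `library` stands in for the 2443-spectrum ASTER endmember matrix, and treating $t$ as one scalar is an assumption made for illustration:

```python
import torch

def reconstruct_spectra(abundance, library, t=1.0):
    """abundance: (N_g, H, W) abundance maps; library: (N_g, B) endmember
    spectra; returns the reconstructed hyperspectral cube (B, H, W), i.e.
    s_l = t * sum_i a_{l,i} * S_i at every pixel l."""
    return t * torch.einsum("ghw,gb->bhw", abundance, library)
```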
34) Constructing a discriminator network consisting of a spatial discriminator and a spectral discriminator:
341) Constructing the spatial discriminator network, composed in sequence of two convolution blocks with 3×3 kernels, stride 1 and padding 1, three convolution blocks with 4×4 kernels, stride 2 and padding 1, three convolution blocks with 4×4 kernels, stride 1 and padding 1, and two convolution blocks with 3×3 kernels, stride 1 and padding 1; a batch normalization operation and a Leaky ReLU operation are stacked after each convolution block, and a final Sigmoid operation maps the output to a probability for judging real or fake;
342) Constructing the spectral discriminator, formed by sequentially stacking five Linear fully-connected layers, each followed by a Leaky ReLU operation, then a Dense Block structure composed in sequence of an adaptive average pooling layer, a 1×1 convolution layer, a Leaky ReLU layer and a 3×3 convolution layer; a final Sigmoid operation maps the output to a probability for judging real or fake; a minimal sketch of both discriminators follows.
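A minimal sketch of the two discriminators; the channel width, input band counts and hidden size are placeholders (the claim fixes only kernel/stride/padding), and the spectral branch's Dense Block is omitted for brevity:

```python
import torch.nn as nn

def conv_block(cin, cout, k, s, p):
    return nn.Sequential(nn.Conv2d(cin, cout, k, s, p),
                         nn.BatchNorm2d(cout), nn.LeakyReLU(0.2))

class SpatialDiscriminator(nn.Module):
    def __init__(self, bands, ch=64):
        super().__init__()
        blocks = [conv_block(bands, ch, 3, 1, 1), conv_block(ch, ch, 3, 1, 1)]
        blocks += [conv_block(ch, ch, 4, 2, 1) for _ in range(3)]  # downsampling
        blocks += [conv_block(ch, ch, 4, 1, 1) for _ in range(3)]
        blocks += [conv_block(ch, ch, 3, 1, 1), conv_block(ch, 1, 3, 1, 1)]
        self.net = nn.Sequential(*blocks, nn.Sigmoid())
    def forward(self, x):
        return self.net(x)                      # per-patch real/fake probability

class SpectralDiscriminator(nn.Module):
    def __init__(self, bands, hidden=256):
        super().__init__()
        dims = [bands] + [hidden] * 4 + [1]     # five Linear layers in sequence
        layers = []
        for cin, cout in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(cin, cout), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers, nn.Sigmoid())
    def forward(self, spectra):                 # spectra: (N, bands) pixel vectors
        return self.net(spectra)
```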
4. The generative adversarial hyperspectral super-resolution method based on unregistered physical guidance according to claim 1, wherein training the generative adversarial hyperspectral super-resolution model based on unregistered physical guidance comprises the steps of:
41) Inputting the preprocessed simulated hyperspectral image and multispectral image into the dual-branch stackable stereo cross fusion module:
411) The hyperspectral branch and the multispectral branch each execute the stereo cross fusion sub-module:
4111) For the hyperspectral branch, the low-spatial-resolution hyperspectral image is input and passed through one 3×3 convolution operation and then through three stacked multi-head translation convolution blocks, each performing in sequence a normalization operation, a 1×1 convolution operation, a 3×3 depthwise convolution operation, a gate unit operation, a channel attention and a 1×1 convolution operation; the output of the multi-head translation convolution block at each stage is combined with the initial input of that stage through a residual connection and serves as the input of the gated feed-forward network;
finally the gated feed-forward network is executed, performing in sequence a normalization operation, a 1×1 convolution operation, a gate unit and a 1×1 convolution operation, to obtain the output of the hyperspectral branch;
4112) For the multispectral branch, identically to the hyperspectral branch, the high-spatial-resolution multispectral image is input and passed through one 3×3 convolution operation and then through three stacked multi-head translation convolution blocks, each performing in sequence a normalization operation, a 1×1 convolution operation, a 3×3 depthwise convolution operation, a gate unit operation, a channel attention and a 1×1 convolution operation; the output of the multi-head translation convolution block at each stage is combined with the initial input of that stage through a residual connection and serves as the input of the gated feed-forward network;
finally the gated feed-forward network is executed, performing in sequence a normalization operation, a 1×1 convolution operation, a gate unit and a 1×1 convolution operation, to obtain the output of the multispectral branch;
412) After the multi-head translation convolution block of each stage of each branch, a cross-convergence fusion self-attention sub-module is executed separately for weight sharing and feature fusion:
4121) Taking the paired complementary multispectral and hyperspectral features produced by the multi-head translation convolution blocks as input, three 1×1 convolutions are executed to generate the Q (query), K (key) and V (value) matrices; the cross-correlation between each pair of query matrix Q and key matrix K is computed, and a Softmax function is applied to obtain the corresponding attention weights, which can be expressed as:
$$A_{HSI\rightarrow MSI}^{i,u} = \operatorname{softmax}_{i}\!\left(Q_{HSI}^{u}\,\big(K_{MSI}^{i,u}\big)^{T}\right),\qquad A_{MSI\rightarrow HSI}^{i,u} = \operatorname{softmax}_{i}\!\left(Q_{MSI}^{u}\,\big(K_{HSI}^{i,u}\big)^{T}\right)$$
wherein $Q_{HSI}^{u}$ denotes the channel-dimension feature vector of the query matrix $Q_{HSI}$ of the hyperspectral branch at position $u$, $Q_{MSI}^{u}$ that of the query matrix $Q_{MSI}$ of the multispectral branch, $K_{HSI}^{i,u}$ the $i$-th element of the criss-cross feature set of the key matrix $K_{HSI}$ of the hyperspectral branch at position $u$, $K_{MSI}^{i,u}$ that of the key matrix $K_{MSI}$ of the multispectral branch, $i \in \{1,\dots,H+W-1\}$, $H$, $W$, $C$ the height, width and number of channels of the feature map, $T$ the matrix transpose operation, and $A_{HSI\rightarrow MSI}$, $A_{MSI\rightarrow HSI}$ the attention response maps on the hyperspectral and multispectral branches respectively;
4122) The attention response maps of the hyperspectral and multispectral branches are re-weighted with the value matrices V, after which a residual addition is executed to obtain the paired output feature maps that serve as the input of the next stage's multi-head translation convolution blocks;
42) After the stereo cross fusion sub-module has been executed, the extracted deep features are input into the generator network to generate the predicted abundance maps for each pixel position and each object, specifically:
421) Executing the shallow coding module: two convolution layers with 3×3 kernels are applied to the inputs of the hyperspectral branch and the multispectral branch respectively, each convolution followed by one ReLU nonlinear activation operation, to extract edge and contour information and generate the latent codes;
422) Executing the three-stage guided-filter upsampling sub-module, in which the upsampled hyperspectral image is guided-filtered band by band; the outputs of the stages are obtained as follows:
the hyperspectral branch input feature map of the first stage is upsampled once by the bicubic interpolation algorithm to the first intermediate size and fused with the multispectral features through the multilayer perceptron to obtain the output of the first stage;
the output of the first stage is upsampled by the bicubic interpolation algorithm to the second intermediate size and fused with the multispectral features to obtain the output of the second stage;
the output of the second stage is upsampled by the bicubic interpolation algorithm to H×W, wherein H, W are the height and width of the multispectral guide feature map, and fused with the multispectral features to obtain the output of the third stage;
finally a skip connection is executed: the hyperspectral branch input feature map of the first stage is added to the output of the third stage to obtain the abundance map output by the generator, as sketched below;
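A schematic of the three-stage generator forward pass with the final skip connection; `upsample_stage` is the guided-filter stage sketched earlier, and since the claim leaves the two intermediate resolutions unspecified they are passed in as parameters here:

```python
import torch.nn.functional as F

def generator_forward(hsi_feat, msi_guide, upsample_stage, sizes):
    """sizes: three (h, w) targets, the last being the guide size (H, W)."""
    x = hsi_feat
    for size in sizes:                    # three guided-filter stages (step 422)
        x = upsample_stage(x, msi_guide, size)
    skip = F.interpolate(hsi_feat, size=sizes[-1], mode="bicubic",
                         align_corners=False)
    return x + skip                       # final skip connection
```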
43) Executing the physical guidance reconstruction module, which builds the spectral data at each pixel position from the spectral library and the abundance map; the recovered spectral intensity $s_{l}$ can be expressed as:
$$s_{l} = t\sum_{i=1}^{N_{g}} a_{l,i}\,S_{i}$$
wherein $S_{i}$ denotes the library spectrum of the $i$-th of the $N_{g}$ objects, $a_{l,i}$ the abundance value of object $i$ at pixel position $l$, and $t$ the quantized, corrected atmospheric absorption factor;
44) Executing the discriminator network for estimating the output quality of the generator, divided into a spatial discriminator and a spectral discriminator, specifically:
441) Executing the spatial discriminator network, whose layers output as follows:
a convolution operation with 3×3 kernel, stride 1 and padding 1 is performed twice, each followed by one batch normalization operation and one Leaky ReLU operation, to obtain the first output;
a convolution operation with 4×4 kernel, stride 2 and padding 1 is performed twice, each followed by one batch normalization operation and one Leaky ReLU operation, to obtain the second output;
a convolution operation with 4×4 kernel, stride 1 and padding 1 is performed twice, each followed by one batch normalization operation and one Leaky ReLU operation, to obtain the third output;
finally a Sigmoid operation is executed to estimate the probability that the output image is real;
442) Executing the spectral discriminator network: five Linear fully-connected layers are executed in sequence, each followed by a Leaky ReLU operation, and the final Sigmoid operation maps the output to a probability for judging real or fake;
45) Using the content loss $\mathcal{L}_{content}$, the spatial gradient loss $\mathcal{L}_{spa}$, the spectral gradient loss $\mathcal{L}_{spe}$ and the adversarial loss $\mathcal{L}_{adv}$ as the loss function of the network model:
The content loss constrains the generated image at the pixel level, ensuring that the generated image is consistent with the overall texture and tone of the reference image; the $\mathcal{L}_{content}$ loss is expressed as:
$$\mathcal{L}_{content} = \frac{1}{HWB}\sum_{i=1}^{H}\sum_{j=1}^{W}\sum_{k=1}^{B}\left|\mathrm{Gen}_{i,j,k} - \mathrm{Ref}_{i,j,k}\right|$$
wherein Gen and Ref are the generated image and the reference image respectively, $H$, $W$, $B$ are the height, width and number of bands of the images, and $i$, $j$, $k$ denote the position coordinates of the three-dimensional cube image in a spatial rectangular coordinate system;
The spatial gradient loss function $\mathcal{L}_{spa}$ constrains the spatial structure of the generated image and is expressed as:
$$\mathcal{L}_{spa} = \frac{1}{HWB}\sum_{i=1}^{H}\sum_{j=1}^{W}\sum_{k=1}^{B}\left(\left|\nabla_{h}\mathrm{Gen}_{i,j,k} - \nabla_{h}\mathrm{Ref}_{i,j,k}\right| + \left|\nabla_{v}\mathrm{Gen}_{i,j,k} - \nabla_{v}\mathrm{Ref}_{i,j,k}\right|\right)$$
wherein $\nabla_{h}$ and $\nabla_{v}$ are the horizontal and vertical gradient extraction operators respectively, Gen and Ref are the generated image and the reference image, $H$, $W$, $B$ are the height, width and number of bands of the images, and $i$, $j$, $k$ denote the position coordinates of the three-dimensional cube image in a spatial rectangular coordinate system;
The spectral gradient loss function $\mathcal{L}_{spe}$ ensures spectral consistency between the generated image and the reference image and is expressed as:
$$\mathcal{L}_{spe} = \frac{1}{HWB}\sum_{i=1}^{H}\sum_{j=1}^{W}\sum_{k=1}^{B}\left|\nabla_{s}\mathrm{Gen}_{i,j,k} - \nabla_{s}\mathrm{Ref}_{i,j,k}\right|$$
wherein $\nabla_{s}$ is the spectral gradient extraction operator, Gen and Ref are the generated image and the reference image, $H$, $W$, $B$ are the height, width and number of bands of the images, and $i$, $j$, $k$ denote the position coordinates of the three-dimensional cube image in a spatial rectangular coordinate system;
The adversarial loss function $\mathcal{L}_{adv}$ constrains the output of the generator and is expressed as:
$$\mathcal{L}_{adv} = -\frac{1}{N}\sum_{n=1}^{N}\log D_{\theta}\!\left(\mathrm{Gen}_{n}\right)$$
wherein $D_{\theta}(\mathrm{Gen}_{n})$ is the probability that the generated image is judged to be a real image by the discriminator, $\theta$ are the parameters of the discriminator network, and $N$ is the number of samples used to update the model parameters in one training step;
The overall loss function is a weighted sum of the four loss terms:
$$\mathcal{L} = \mathcal{L}_{content} + \alpha\,\mathcal{L}_{spa} + \beta\,\mathcal{L}_{spe} + \mathcal{L}_{adv}$$
wherein $\alpha$ and $\beta$ are weight parameters controlling the spatial and spectral gradient terms, both set to 1 in the present invention; a sketch of these loss terms follows;
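A hedged sketch of the loss terms, taking the pixel and gradient differences as $\ell_1$ distances (the claim does not fix the norm) and $\alpha = \beta = 1$:

```python
import torch
import torch.nn.functional as F

def grad(x, dim):
    return x.diff(dim=dim)          # finite-difference gradient operator

def total_loss(gen, ref, d_prob, alpha=1.0, beta=1.0):
    """gen, ref: (N, B, H, W) generated/reference cubes; d_prob: discriminator
    probabilities D(gen) for the generated batch."""
    l_content = F.l1_loss(gen, ref)                          # pixel-level term
    l_spa = (F.l1_loss(grad(gen, -1), grad(ref, -1)) +       # horizontal
             F.l1_loss(grad(gen, -2), grad(ref, -2)))        # vertical
    l_spe = F.l1_loss(grad(gen, 1), grad(ref, 1))            # along bands
    l_adv = -torch.log(d_prob + 1e-8).mean()                 # adversarial term
    return l_content + alpha * l_spa + beta * l_spe + l_adv
```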
46) The training of the discriminator and the generator is repeated alternately; training is complete when the discriminator can no longer distinguish real from fake, i.e. when the discrimination probability is 0.5, yielding a fully trained network model; a schematic of one alternating update follows.
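A schematic of one alternating update of step 46); the optimizers are standard PyTorch objects and the spectral discriminator is folded into `disc` for brevity, both simplifying assumptions:

```python
import torch

def train_step(generator, disc, lr_hsi, hr_msi, ref, opt_g, opt_d, loss_fn):
    fake = generator(lr_hsi, hr_msi)
    # discriminator step: push real toward 1, generated toward 0
    opt_d.zero_grad()
    d_loss = -(torch.log(disc(ref) + 1e-8).mean()
               + torch.log(1 - disc(fake.detach()) + 1e-8).mean())
    d_loss.backward()
    opt_d.step()
    # generator step: content/gradient losses plus the adversarial term
    opt_g.zero_grad()
    g_loss = loss_fn(fake, ref, disc(fake))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```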
CN202410085428.4A 2024-01-20 2024-01-20 Misregistration-based physical instruction generation type hyperspectral super-resolution countermeasure method Pending CN118212127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410085428.4A CN118212127A (en) 2024-01-20 2024-01-20 Misregistration-based physical instruction generation type hyperspectral super-resolution countermeasure method

Publications (1)

Publication Number Publication Date
CN118212127A 2024-06-18

Family

ID=91455347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410085428.4A Pending CN118212127A (en) 2024-01-20 2024-01-20 Misregistration-based physical instruction generation type hyperspectral super-resolution countermeasure method

Country Status (1)

Country Link
CN (1) CN118212127A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118411443A (en) * 2024-07-02 2024-07-30 烟台大学 CT image generation method, system, device and medium based on DR image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination