CN116091916A - Multi-scale algorithm and system for reconstructing corresponding hyperspectral images from RGB images - Google Patents
Multi-scale algorithm and system for reconstructing corresponding hyperspectral images from RGB images
- Publication number
- CN116091916A (application number CN202211469458.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- processing
- characteristic
- scale
- spectrum
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/194—Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/10—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
Abstract
The invention discloses a multi-scale algorithm and a system for reconstructing corresponding hyperspectral images from RGB images, belonging to the field of computer-vision hyperspectral image reconstruction. The algorithm comprises the following steps: processing the original RGB image with a multi-scale processing module and outputting feature maps Y'_i; adding and stacking the feature map Y'_i with the original RGB image in the spectral-channel dimension to obtain a feature image carrying the multi-scale preprocessing information; passing that feature image sequentially through 3 spatial-spectral Transformer joint processing modules for joint processing in the spatial and spectral dimensions, obtaining a feature image with spatial-spectral feature information; processing the feature image with one spectral-dimension Transformer to obtain a feature image whose spectral-dimension feature information has been fully extracted by the Transformer; and adjusting the channel dimension to the target 31-channel output through a 3×3 convolution layer, finally obtaining the hyperspectral image.
Description
Technical Field
The invention belongs to the field of computer-vision hyperspectral image reconstruction, and particularly relates to a multi-scale algorithm and system for reconstructing corresponding hyperspectral images from RGB images.
Background
Visual information is very important to human beings: more than eighty percent of external information is obtained visually. When light irradiates the surface of an object and is reflected, it carries spectral information peculiar to that object. Limited by the structure of the human eye, the naked eye can only read information in the visible-light range, but a hyperspectral imager can capture information in any waveband and thus reveal information that is difficult to find by the naked eye. A hyperspectral imager can completely record the spectral information of a picture; compared with an ordinary color camera, it simultaneously acquires both the spatial information and the spectral information of the object under inspection. Based on these advantages, hyperspectral images are widely applied in fields such as medical image processing, remote sensing, and object tracking.
With the development of fields related to hyperspectral image processing, the demand for hyperspectral images keeps growing, but traditional hyperspectral images are difficult to acquire and expensive, which limits the development of the field. Most conventional hyperspectral image acquisition methods use spectrometers based on spatial- or spectral-scanning technology, such as push-broom scanners, whisk-broom scanners and band-sequential scanners; however, such hyperspectral imagers have obvious drawbacks: they are bulky and complex to operate, which makes hyperspectral images hard to obtain and costly. In recent years, with the development of convolutional neural network theory, a large number of convolutional neural network (CNN) based methods have been applied to reconstruction work and have achieved relatively good results. Although CNN-based deep learning reconstruction algorithms solve the hardware problems of high cost and high difficulty, their reconstruction quality still falls short of a high standard because CNN models are deficient at capturing long-range spatial correlation and self-similarity within the spectrum.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a multi-scale algorithm and a system for reconstructing corresponding hyperspectral images from RGB images.
The aim of the invention can be achieved by the following technical scheme:
a multi-scale algorithm for reconstructing a corresponding hyperspectral image from an RGB image, comprising the steps of:
S1, processing an original RGB image X through a multi-scale processing module, and outputting feature maps Y'_i;
S2, adding and stacking the feature map Y'_i with the original RGB image X in the spectral-channel dimension to obtain a feature image F with multi-scale preprocessing information;
S3, passing the feature image F with multi-scale preprocessing information sequentially through 3 spatial-spectral Transformer joint processing modules for joint processing in the spatial and spectral dimensions, obtaining a feature image with spatial-spectral feature information;
S4, processing the feature image obtained in S3 through one spectral-dimension Transformer to obtain a feature image whose spectral-dimension feature information has been fully extracted by the Transformer;
S5, adjusting the channel dimension of the feature image obtained in S4 to the target 31-channel output through a 3×3 convolution layer, finally obtaining the hyperspectral image.
Further, the processing in S1 to obtain the feature maps Y'_i comprises the following steps:
S11, copying the original RGB image 3 times, and then downsampling the copies at downsampling rates of 8, 4 and 2, respectively;
S12, after downsampling, performing a convolution operation on each downsampled feature image through a convolution layer with a 3×3 kernel; every layer except the first is additively fused with the result of the previous layer immediately after the convolution operation, as shown in the following equation:

Y_i = Conv(Down_r(X)) ⊕ Y_{i-1}

wherein Down_r(·) represents a downsampling operation with downsampling rate r, Conv(·) represents a two-dimensional convolution with kernel size 3×3, and the ⊕ symbol represents the fusion with the result of the previous layer; all convolution operations in the multi-scale processing module use the LeakyReLU activation function; Y_i represents the intermediate output of the i-th layer, where i ∈ {1, 2, 3};
S13, inputting the Y_i obtained in S12 into a spectral Transformer module to capture the contextual information among the spectral channels, obtaining Y'_i, as shown in the following formula:

Y'_i = Up_r(Conv(Tran_spe(Y_i)))

wherein Y_i represents the intermediate output of the i-th layer, Y'_i represents the final result of the i-th layer, Up_r(·) represents an upsampling operation with upsampling rate r, Conv(·) represents a two-dimensional convolution with kernel size 1×1, and Tran_spe(·) represents the Transformer processing in the spectral dimension.
Further, in S2, the steps of obtaining the feature image F with multi-scale preprocessing information are:
S21, the original RGB image is adjusted with a 3×3 convolution layer to the same spatial dimensions (rh'_3 × rw'_3) as the feature map Y'_i, with the output denoted X';
S22, the feature map Y'_i and the adjusted RGB image X' are stacked in the spectral dimension, i.e. to c_0 + c_1 channels, finally obtaining the feature image with multi-scale preprocessing information F ∈ R^((c_0+c_1)×rh'_3×rw'_3).
Further, in S3, the feature image is sequentially subjected to a spatial Transformer process and a spectral Transformer process in each spatial-spectral Transformer joint processing module.
Further, the processing steps of the feature image in each spatial-spectral Transformer joint processing module are as follows:
s31, space Transformer processing:
1) Let the input of this layer be a feature image F ∈ R^(c×h×w); the feature image F is evenly divided into small windows of size m, and each window F_i ∈ R^(c×m×m) obtained by the partitioning operation is flattened and transposed to F_i ∈ R^(m²×c); self-attention processing is then performed on the resulting one-dimensional feature maps; the specific formulas are as follows:

F = {F_1, F_2, …, F_N}, N = hw/m²
A_i = Attention(F_i W^Q, F_i W^K, F_i W^V), i = 1, …, N

wherein W^Q, W^K and W^V ∈ R^(c×c) respectively represent the learnable projection matrices for the queries, keys and values, and A_i ∈ R^(m²×c) is the final output of each window; the Attention(·) operation adds a relative position code while performing the self-attention calculation, as shown in the following formula:

Attention(Q, K, V) = SoftMax(QK^T/√c + B)V

wherein B is the relative position bias, a learnable parameter of shape R^((2m−1)×(2m−1));
2) The results of the self-attention calculation are then integrated by a simple multi-layer perceptron; in the whole process, the training difficulty is reduced through skip connections, and the calculation process can be written as:

F' = Spa(LN(F)) + F
F_out = MLP(LN(F')) + F'

wherein F' and F_out are the processing results of Spa and the MLP respectively, F_out represents the final result of the spatial Transformer processing, and the LN(·) symbol represents layer normalization;
S32, inputting the feature-map result after the spatial Transformer processing into the spectral Transformer processing block;
1) Let the input feature map be H ∈ R^(c×h×w); first, the feature map H is flattened and transposed into H ∈ R^(hw×c); H ∈ R^(hw×c) is then linearly projected via W^Q, W^K and W^V ∈ R^(c×c) to Q, K, V ∈ R^(hw×c); the self-attention calculation process is:

Attention(Q, K, V) = SoftMax(σK^T Q)V

wherein σ ∈ R^1 is a learnable parameter;
2) Subsequently, the self-attention result Attention(Q, K, V) is linearly projected and the relative position code is added, as given by the following formula:

Spe(H) = Attention(Q, K, V)W + φ(V)

wherein W ∈ R^(c×c) is a learnable parameter, and the symbol φ(·) represents the relative position encoding, which consists of two 3×3 convolutional layers and a GELU activation function;
3) Then, a feed-forward network is used to integrate the weight matrix obtained by the above formula, and the training difficulty is reduced by skip connections in the whole process; the calculation of Tran_spe(·) can be represented by the following formulas:

H' = Spe(LN(H)) + H
H_out = FFN(LN(H')) + H'

wherein H' and H_out are the processing results of Spe and the FFN respectively, while H_out represents the final result of the spectral Transformer processing, and the LN(·) symbol represents layer normalization.
Further, the feed-forward network consists, in order, of a 1×1 convolution layer, a GELU activation function, a 3×3 convolution layer, a GELU activation function, and a 1×1 convolution layer.
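As a minimal sketch, the stated layer order can be written directly in PyTorch; the channel width c (and keeping it constant across the layers) is an assumption here, since the text fixes only the composition:

```python
import torch
import torch.nn as nn

def make_ffn(c: int) -> nn.Sequential:
    """Feed-forward network in the stated order: 1x1 conv, GELU, 3x3 conv,
    GELU, 1x1 conv. The channel width c is an assumption; the text fixes
    only the layer order."""
    return nn.Sequential(
        nn.Conv2d(c, c, kernel_size=1),
        nn.GELU(),
        nn.Conv2d(c, c, kernel_size=3, padding=1),  # padding keeps spatial size
        nn.GELU(),
        nn.Conv2d(c, c, kernel_size=1),
    )

ffn = make_ffn(32)
out = ffn(torch.randn(1, 32, 16, 16))
print(tuple(out.shape))  # (1, 32, 16, 16)
```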
Further, in S5, the steps of acquiring the hyperspectral image are:
S51, setting a convolution layer with 32 input channels and 31 output channels, with kernel size 3×3 and both stride and padding set to 1;
S52, inputting the feature image obtained in S4, whose spectral-dimension feature information has been fully extracted by the Transformer, into the convolution layer, and obtaining through the LeakyReLU activation function a hyperspectral picture with 31 channels and the same spatial resolution as the input RGB picture.
A system for multi-scale reconstruction of corresponding hyperspectral images from RGB images, comprising: a multi-scale processing unit, a dimension superposition unit, a spatial-spectral Transformer joint processing unit, a spectral-dimension Transformer processing unit, and a hyperspectral image output unit;
the multi-scale processing unit is used for processing the original RGB image X through the multi-scale processing module and outputting the feature maps Y'_i;
the dimension superposition unit is used for adding and stacking the feature map Y'_i with the original RGB image X in the spectral-channel dimension to obtain the feature image F with multi-scale preprocessing information;
the spatial-spectral Transformer joint processing unit is used for processing the feature image F with multi-scale preprocessing information sequentially through the 3 spatial-spectral Transformer joint processing modules to obtain a feature image with spatial-spectral feature information;
the spectral-dimension Transformer processing unit is used for processing that feature image through one spectral-dimension Transformer to obtain a feature image whose spectral-dimension feature information has been fully extracted by the Transformer;
and the hyperspectral image output unit is used for adjusting the channel dimension to the target 31-channel output through a 3×3 convolution layer, finally obtaining the hyperspectral image.
A computer storage medium storing a readable program which, when executed, performs the algorithm described above.
An apparatus, comprising: one or more processors, memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the algorithms described above.
The invention has the following beneficial effects: through a Transformer-based multi-scale spatial-spectral joint processing network architecture, the corresponding hyperspectral picture can be reconstructed from RGB efficiently, cheaply and accurately. Compared with traditional hyperspectral imagers and CNN-based deep learning algorithms, the method greatly increases the speed and reduces the cost of acquiring hyperspectral pictures, overcomes the limitation of CNN models in capturing long-range spatial correlation and spectral self-similarity, and, by adopting a multi-scale hierarchical feature-extraction scheme, avoids the information loss during encoding and decoding caused by the U-Net networks used in current work. The method can greatly improve the accuracy of hyperspectral picture reconstruction and has high practical value.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to those skilled in the art that other drawings can be obtained according to these drawings without inventive effort.
FIG. 1 is a block diagram of the algorithm of the present invention for reconstructing a corresponding hyperspectral image from an RGB image;
FIG. 2 is a block diagram of a multi-scale processing module of the present invention;
FIG. 3 shows a true hyperspectral image, a reconstructed hyperspectral image, and the error between the two in the experiment of the present invention;
fig. 4 is a graph showing the comparison of the true values and the generated values of all bands at a certain coordinate point in a hyperspectral image in the experiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the multi-scale algorithm for reconstructing a corresponding hyperspectral image from an RGB image comprises the following steps:
S1, the original RGB image X ∈ R^(3×128×128) is processed by the multi-scale processing module (MSP), outputting the feature maps Y'_i;
The frame structure of the multi-scale processing module is shown in fig. 2, and the specific steps of S1 are as follows:
S11, downsampling reconstruction is performed on the image: the original 3×128×128 RGB image is copied 3 times and the copies are downsampled to 192×16×16, 48×32×32 and 12×64×64 at downsampling rates of 8, 4 and 2, respectively.
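The three stated shapes are consistent with a space-to-channel downsampling. As a hedged illustration (the patent does not name the exact operator), PixelUnshuffle reproduces them exactly:

```python
import torch
import torch.nn as nn

# Illustration only: the patent does not say which downsampling operator is
# used; PixelUnshuffle (space-to-channel rearrangement) is one operator that
# reproduces the stated shapes, trading each r x r spatial block for r^2 channels.
x = torch.randn(1, 3, 128, 128)  # original RGB image X
shapes = [tuple(nn.PixelUnshuffle(r)(x).shape[1:]) for r in (8, 4, 2)]
print(shapes)  # [(192, 16, 16), (48, 32, 32), (12, 64, 64)]
```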
S12, after downsampling, carrying out convolution operation on the downsampled feature image through a convolution layer with a 3 multiplied by 3 convolution kernel; changing the number of channels through convolution operation, but not changing the spatial resolution so as to fuse with the result of the upper layer; each of the other layers, except the first layer, is additively fused with the results of the previous layer immediately after the convolution operation, as shown in the following equation:
Y i =Conv(Down r (X))⊕Y i-1
wherein Downr (. Cndot.) represents a downsampling operation, r is a downsampling rate, conv (-) represents a two-dimensional convolution operation with a convolution kernel size of 3×3, and the # -symbol represents a channel number stacking operation; all convolution operations in the multi-scale processing module use the LeakyReLU activation function;representing the intermediate process output of the i-th layer, where i e (1, 2, 3).
S13, Y obtained in S12 i Input into a spectral transducer module (Spe), capturing contextual information between spectral channels; since the space size of the first layer after downsampling is too small, at the end of the first layer, a 1×1 convolution layer is passed to enable the network to adaptively adjust the weights; finally, the spatial scale is adjusted to be the same as the size of the next layer through an up-sampling layer to obtain Y' i The method comprises the steps of carrying out a first treatment on the surface of the The specific implementation process is given by the following formula:
wherein ,intermediate process output representing layer i, +.>Up, representing the final result of the ith layer r (. Cndot.) represents an upsampling operation, r is the upsampling rate, conv (-) represents a two-dimensional convolution operation with a convolution kernel size of 1X 1, tran spe (. Cndot.) represents the transform processing for the spectral dimension.
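The three-layer pipeline of S11–S13 can be sketched as follows. This is a hypothetical reading of the text, not the definitive implementation: pixel_unshuffle stands in for Down_r, the spectral Transformer Tran_spe is stubbed out as identity, and the ⊕ fusion is taken as element-wise addition of the previous layer's upsampled result — all assumptions, since the text does not pin down these operators.

```python
import torch
import torch.nn as nn
import torch.nn.functional as fn

class MSPSketch(nn.Module):
    """Hypothetical sketch of the multi-scale processing module (MSP).
    Assumptions: pixel_unshuffle realizes Down_r, the spectral Transformer
    Tran_spe is omitted (identity), and the fusion adds the previous
    layer's upsampled result to the current layer's feature map."""
    def __init__(self, rates=(8, 4, 2)):
        super().__init__()
        self.rates = rates
        chans = [3 * r * r for r in rates]  # 192, 48, 12 channels per layer
        self.convs = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.LeakyReLU())
            for c in chans)
        self.projs = nn.ModuleList(  # 1x1 conv toward the next layer's width
            nn.Conv2d(chans[i], chans[min(i + 1, len(chans) - 1)], 1)
            for i in range(len(chans)))

    def forward(self, x):
        prev = None
        for i, r in enumerate(self.rates):
            y = self.convs[i](fn.pixel_unshuffle(x, r))  # Conv(Down_r(X))
            if prev is not None:
                y = y + prev                 # fusion with the previous layer
            y = self.projs[i](y)             # 1x1 conv (Tran_spe stubbed out)
            prev = fn.interpolate(y, scale_factor=2.0)  # Up_2 to next grid
        return prev

out = MSPSketch()(torch.randn(1, 3, 128, 128))
print(tuple(out.shape))  # (1, 12, 128, 128): back at the input spatial size
```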
S2, the final output characteristic diagram Y 'in S1' i Adding and stacking the characteristic images with the original RGB image X in the dimension of a spectrum channel to obtain a characteristic image with multi-scale preprocessing information, so that a backbone network can better extract the characteristics of the original information; .
The specific superposition steps are as follows:
S21, a 3×3 convolution layer is used to adjust the original input RGB image (X ∈ R^(3×h×w)) to the same spatial dimensions (rh'_3 × rw'_3) as the result output in S1, with the output denoted X';
S22, the two tensors (the feature map Y'_i and the adjusted RGB image X') are stacked in the spectral dimension, i.e. to c_0 + c_1 channels, finally obtaining the feature image with multi-scale preprocessing information F ∈ R^((c_0+c_1)×rh'_3×rw'_3).
S3, sequentially passing the characteristic image F with the multi-scale preprocessing information in S2 through 3 space-spectrum converter combined processing modules (SUB) to perform space-spectrum dimension combined processing to obtain a characteristic image with space-spectrum characteristic information;
the characteristic image sequentially passes through a space transducer process (Spa) and a spectrum transducer process (Spe) in each space-spectrum transducer joint processing module (SUB);
the specific steps of processing the feature image in one spatial-spectral Transformer joint processing module (SUB) are as follows:
s31, space Transformer processing;
the input of the layer isTo simplify the character in the introduction, it is now assumed that the feature image F.epsilon.R c×h×w Is an input to the module; the characteristic image F is divided into small windows with the window size of m in an average way, and the characteristic image F is obtained through the partitioning operation i ∈R c×m×m Performing flattening and transposition to obtain ∈>Then, performing self-attention processing on the obtained one-dimensional feature map; the specific implementation process is shown in the following formula:
F={F 1 ,F 2 ,…,F N },N=hw/m 2
A i =Attention(F i W Q ,F i W K ,F i W V ),i=1,…,N
wherein WQ ,W K and WV ∈R c×c Respectively representIs a learnable parameter,/-a projection matrix of (2)>Outputting a result finally for each window; wherein the Attention () operation adds relative position coding while realizing self-Attention computation, the specific implementation details are given by the following formula:
wherein B is a relative positional offset, which is of a shape R (2m-1)×(2m-1) Is a learning parameter of (a);
the results of the self-attention calculations are then integrated by a simple multi-layer perceptron (MLP)And the training difficulty is reduced through jump connection in the whole process, so that Tran is summarized spa The calculation of (-) can be summarized by the following formula:
F out =MLP(LN(F′))+F′
wherein F' and F out Processing results of Spa and MLP respectively, and F out Representing the final result of the spatial transducer processing, the LN (·) symbol represents the layer normalization.
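The window partition and self-attention steps above can be sketched minimally as follows; random projections stand in for the learned W^Q, W^K, W^V, and the relative-position bias B is stubbed to zero, so this is an illustration of the shapes and data flow only:

```python
import torch

def window_attention(feat, m):
    """Windowed spatial self-attention sketch. feat has shape (c, h, w);
    each m x m window becomes m^2 tokens of dimension c. The projections
    and the bias B are placeholders, not trained parameters."""
    c, h, w = feat.shape
    # partition into N = hw/m^2 windows, flatten + transpose to (N, m^2, c)
    wins = (feat.reshape(c, h // m, m, w // m, m)
                .permute(1, 3, 2, 4, 0)
                .reshape(-1, m * m, c))
    wq, wk, wv = (torch.randn(c, c) for _ in range(3))
    q, k, v = wins @ wq, wins @ wk, wins @ wv
    b = torch.zeros(m * m, m * m)  # relative position bias (stubbed to zero)
    return torch.softmax(q @ k.transpose(-2, -1) / c ** 0.5 + b, dim=-1) @ v

out = window_attention(torch.randn(16, 8, 8), m=4)
print(tuple(out.shape))  # (4, 16, 16): N = 64/16 windows, m^2 tokens, c dims
```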
S32, spectrum converter processing;
feature map result F after space transform processing out Then input into a spectral transducer processing block;
for convenience of character description, let us now assume that the input is H ε R c×h×w Similar to spatial processing, the feature image X is flattened and transposed to H ε R hw×c H.epsilon.R is then likewise taken hw×c Via W Q ,W K and WV ∈R c×c Linear projection to Q, K, V ε R hw×c The method comprises the steps of carrying out a first treatment on the surface of the In contrast to spatial processing, where all channels of a single pixel are used as a token in spectral processing, the self-attention (Spe) specification is given by:
Attention(Q, K, V) = SoftMax(σK^T Q)V

wherein σ ∈ R^1 is a learnable parameter;
subsequently, the self-attention result Attention(Q, K, V) is linearly projected and the relative position code is added, as given by the following formula:

Spe(H) = Attention(Q, K, V)W + φ(V)

wherein W ∈ R^(c×c) is a learnable parameter, and the symbol φ(·) represents the relative position encoding, which consists of two 3×3 convolutional layers and a GELU activation function;
the weight matrix Spe(H) obtained above is then integrated through a feed-forward network (FFN), with skip connections reducing the training difficulty throughout; the feed-forward network consists, in order, of a 1×1 convolution layer, a GELU activation function, a 3×3 convolution layer, a GELU activation function, and a 1×1 convolution layer; in summary, the calculation of Tran_spe(·) can be represented by the following formulas:

H' = Spe(LN(H)) + H
H_out = FFN(LN(H')) + H'

wherein H' and H_out are the processing results of Spe and the FFN respectively, while H_out represents the final result of the spectral Transformer processing, and the LN(·) symbol represents layer normalization.
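The spectral self-attention above differs from the spatial one in that the attention map is c×c regardless of spatial size. A hedged sketch (random placeholder projections; V is multiplied on the left so the shapes compose, which is an assumption about the intended operand order):

```python
import torch

def spectral_attention(feat, sigma=1.0):
    """Spectral self-attention sketch for feat of shape (c, h, w). The
    projections are random placeholders; sigma stands in for the learnable
    scalar. Note the c x c attention map is independent of h and w."""
    c, h, w = feat.shape
    hmat = feat.reshape(c, h * w).T            # flatten + transpose: (hw, c)
    wq, wk, wv = (torch.randn(c, c) for _ in range(3))
    q, k, v = hmat @ wq, hmat @ wk, hmat @ wv  # each (hw, c)
    attn = torch.softmax(sigma * k.T @ q, dim=-1)  # (c, c) channel attention
    return v @ attn                            # (hw, c)

out = spectral_attention(torch.randn(31, 8, 8))
print(tuple(out.shape))  # (64, 31): one row per pixel, one column per band
```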
S4, the characteristic image obtained by the S3 processing is processed by a spectral dimension transducer (Spe+FFN) to obtain a characteristic image with spectral dimension characteristic information fully extracted by the transducer
The specific processing steps are as follows: the operation S32 is repeated (again a spectral converter process is performed here, since the final purpose is to reconstruct a 31-channel hyperspectral picture from a 3-channel RGB image, to achieve a mapping of the channel dimensions from low to high, most important or spectral dimensional feature information)
S5, the characteristic image obtained in the S4 and subjected to the transformation to fully extract the spectral dimension characteristic information is output by adjusting the channel dimension to a target 31 channel through a 3X 3 convolution layer; finally obtaining a hyperspectral image;
the method comprises the following specific steps:
S51, a convolution layer with 32 input channels and 31 output channels is set, with kernel size 3×3 and both stride and padding set to 1, to ensure the feature-map size is unchanged;
S52, the final result of S4 is input into the convolution layer, and a hyperspectral picture with 31 channels and the same spatial resolution as the input RGB picture is obtained through the LeakyReLU activation function.
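The output head of S51–S52 can be sketched directly from the stated hyperparameters (the batch size and spatial size below are illustrative):

```python
import torch
import torch.nn as nn

# Output head as described in S51-S52: 32 -> 31 channels, 3x3 kernel,
# stride 1, padding 1, followed by LeakyReLU.
head = nn.Sequential(
    nn.Conv2d(32, 31, kernel_size=3, stride=1, padding=1),
    nn.LeakyReLU(),
)
hsi = head(torch.randn(1, 32, 128, 128))
print(tuple(hsi.shape))  # (1, 31, 128, 128): 31 bands at the input resolution
```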
In this embodiment, the dataset provided by the NTIRE 2022 spectral reconstruction challenge is selected for training and evaluation; the dataset comprises 1000 pairs of RGB pictures and their corresponding real hyperspectral pictures. Each hyperspectral picture has 31 channels, each channel storing the light-intensity information of one 10 nm band from 400 nm to 700 nm, with a spatial resolution of 482×512; each RGB image has 3 channels, with the same spatial resolution as its corresponding hyperspectral picture. 90% of the data in the dataset is randomly selected as the training set, and the remaining 10% as the test set.
In the training process, the RGB and hyperspectral pictures in the original dataset are randomly cropped to a 128×128 spatial size, and the light-intensity values of the pixels are normalized to the range [0, 1]. Simple data augmentation, such as random rotation and flipping, is applied to the training data. The processed RGB pictures and the corresponding real hyperspectral pictures are input into the network model and, after processing by the series of network modules provided by the invention, the mapping from RGB to real hyperspectral pictures is learned. The training aim is to update the parameters θ through the loss function l_MRAE so as to minimize the distance between the generated picture Ŝ and the real picture S:

l_MRAE = (1/N) Σ_{i=1}^{N} |S_i − Ŝ_i| / S_i

where Ŝ_i is the intensity of the generated hyperspectral picture at pixel point i, S_i is the intensity of the real hyperspectral picture at that pixel point, and N represents the number of pixel points in the image.
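The l_MRAE loss above can be sketched as follows (a hedged NumPy illustration; the small `eps` guard against division by zero is an assumption added here, not part of the patent text):

```python
import numpy as np

def mrae(pred, gt, eps=1e-8):
    # mean relative absolute error over all pixel points and bands
    return float(np.mean(np.abs(gt - pred) / (gt + eps)))

gt = np.full((31, 4, 4), 2.0)    # toy "real" hyperspectral cube
pred = gt * 1.1                  # a uniform 10% over-estimate
print(round(mrae(pred, gt), 6))  # 0.1
```

A uniform 10% over-estimate yields an MRAE of 0.1, which matches the intuition that MRAE measures the average relative deviation per pixel.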
The experimental process comprises the following steps:
in the testing process, the original RGB pictures in the test dataset are input into the trained network to obtain the corresponding hyperspectral pictures; the absolute difference between the reconstructed hyperspectral picture and the real hyperspectral picture is computed for the value of each pixel point in each spectral dimension, yielding a numerical error map, as shown in figure 3. The same coordinate position is randomly selected from the reconstructed and real hyperspectral pictures, and the light-intensity curves of that pixel point across all bands are plotted, as shown in figure 4, in order to judge the accuracy of the hyperspectral pictures generated by the method. In addition, the method of the invention is compared on the MRAE index with HSCNN+ and AWAN, CNN-based architectures in this field, and ablation experiments are carried out on the MSP and SSU modules of the method to prove their effectiveness.
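The per-pixel, per-band error map and the single-pixel spectral curves described above can be computed as in this sketch (simulated data; the array sizes follow the embodiment's 31×482×512 pictures, and the noise model standing in for reconstruction error is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
gt  = rng.random((31, 482, 512))              # real hyperspectral picture
rec = gt + rng.normal(0.0, 0.01, gt.shape)    # simulated reconstruction

err = np.abs(rec - gt)        # absolute difference per pixel point, per band
err_map = err.mean(axis=0)    # collapse the 31 bands for visualisation

# light-intensity curves of one randomly chosen pixel point across all bands
yy, xx = rng.integers(482), rng.integers(512)
curve_rec, curve_gt = rec[:, yy, xx], gt[:, yy, xx]
print(err_map.shape, curve_gt.shape)
```

Plotting `curve_rec` against `curve_gt` over the 31 band indices gives the kind of comparison shown in figure 4.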
It can be seen from fig. 3 that the hyperspectral pictures generated from RGB pictures are strongly similar to the real hyperspectral pictures on the left; the error map of the two further shows that the error of all pixels lies within a particularly small range across the whole picture (darker represents smaller differences, whiter represents larger differences). The light-intensity graph of fig. 4 clearly shows that the method of the invention generates light-intensity curves very close to the real data in all bands. Table 1 shows that the proposed method has a lower MRAE, meaning that the hyperspectral pictures it generates have a very high similarity to the real hyperspectral pictures, and table 2 shows that both MSP and SSU positively improve performance, indicating the independent effectiveness of the two modules.
Table 1 comparison with the index of the conventional method
Table 2 ablation experiments
From the point of view of both the quantitative index results and the visual effect diagrams, the algorithm provided by the invention generates high-quality hyperspectral pictures; the network is lightweight and can be conveniently deployed on other hardware, providing a more convenient, lower-cost hyperspectral picture acquisition mode for various hyperspectral application fields and promoting the further development of the field.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The solutions in the embodiments of the present application may be implemented in various computer languages, for example, the object-oriented programming language Java and the scripting language JavaScript, etc.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.
Claims (10)
1. A multi-scale algorithm for reconstructing a corresponding hyperspectral image from an RGB image, comprising the steps of:
s1, processing an original RGB image X through a multi-scale processing module, and outputting a feature map Y'_i;
S2, stacking the feature map Y'_i with the original RGB image X in the spectral-channel dimension to obtain a feature image F with multi-scale preprocessing information;
s3, sequentially passing the feature image F with the multi-scale preprocessing information through 3 space-spectrum Transformer joint processing modules for joint space-spectrum processing, obtaining a feature image with space-spectrum feature information;
S4, processing the feature image obtained in S3 through a spectral-dimension Transformer, obtaining a feature image whose spectral-dimension feature information has been fully extracted by the Transformer;
s5, passing that feature image through a 3×3 convolution layer that adjusts the channel dimension to the target 31 channels, finally obtaining the hyperspectral image.
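For orientation only, the channel bookkeeping of steps S1-S5 can be sketched with stub modules. The `project` helper and the assumption that the multi-scale output Y'_i has 29 channels (so that stacking with the 3-channel RGB image gives the 32 channels expected by the output layer of claim 7) are illustrative, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def project(x, c_out):
    # stand-in for any convolution/Transformer stage: channel mixing only
    w = rng.standard_normal((c_out, x.shape[0])) * 0.1
    return np.einsum('oc,chw->ohw', w, x)

X = rng.standard_normal((3, 64, 64))      # original RGB image
Y = project(X, 29)                        # S1: multi-scale module output (29ch assumed)
F = np.concatenate([Y, X], axis=0)        # S2: stack with X -> 32 channels
for _ in range(3):                        # S3: three joint processing modules
    F = project(F, 32)
F = project(F, 32)                        # S4: spectral-dimension Transformer
HSI = project(F, 31)                      # S5: adjust to the 31 target channels
print(HSI.shape)  # (31, 64, 64)
```

The spatial size is preserved throughout; only the channel dimension evolves from 3 to 32 and finally to the 31 target spectral bands.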
2. The multi-scale algorithm for reconstructing a corresponding hyperspectral image from an RGB image according to claim 1, wherein the steps of obtaining the feature map Y'_i in S1 are:
s11, copying the original RGB image into 3 copies, and downsampling them at rates of 8, 4 and 2 respectively;
s12, after downsampling, performing a convolution operation on each downsampled feature image through a convolution layer with a 3×3 kernel; every layer except the first additively fuses its result with that of the previous layer immediately after the convolution operation, as shown in the following equation:
where Down_r(·) denotes a downsampling operation with rate r, Conv(·) denotes a two-dimensional convolution operation with a 3×3 kernel, and the ⊕ symbol denotes a channel-number stacking operation; all convolution operations in the multi-scale processing module use the LeakyReLU activation function; Y_i^mid represents the intermediate process output of the i-th layer, where i ∈ {1, 2, 3};
s13, inputting the Y_i^mid obtained in S12 into a spectral Transformer module to capture the context information among spectral channels and obtain Y'_i, as shown in the following formula:
Y'_i = Conv(Up_r(Tran_spe(Y_i^mid)))
where Y_i^mid represents the intermediate process output of the i-th layer, Y'_i represents the final result of the i-th layer, Up_r(·) represents an upsampling operation with rate r, Conv(·) represents a two-dimensional convolution operation with a 1×1 kernel, and Tran_spe(·) represents the Transformer processing of the spectral dimension.
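A minimal sketch of the S11-S13 dataflow, under stated assumptions: average-pool downsampling, nearest-neighbour upsampling, a shared 1×1 projection in place of the 3×3 convolutions, and an identity stub for Tran_spe are all simplifications invented here:

```python
import numpy as np

def down(x, r):
    # average-pool downsampling by rate r; x: (C, H, W)
    c, h, w = x.shape
    return x.reshape(c, h // r, r, w // r, r).mean(axis=(2, 4))

def up(x, r):
    # nearest-neighbour upsampling by rate r
    return x.repeat(r, axis=1).repeat(r, axis=2)

def conv1x1(x, w):
    # 1x1 convolution as channel mixing; w: (C_out, C_in)
    return np.einsum('oc,chw->ohw', w, x)

def tran_spe(x):
    # identity stub in place of the spectral Transformer
    return x

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 64, 64))    # original RGB image
W = rng.standard_normal((3, 3)) * 0.1   # shared 1x1 projection for the sketch
rates = [8, 4, 2]                       # per-layer downsampling rates (S11)

prev, prev_r, outputs = None, None, []
for r in rates:
    y = conv1x1(down(X, r), W)          # S12: downsample then convolve
    if prev is not None:
        y = y + up(prev, prev_r // r)   # additive fusion with previous layer
    prev, prev_r = y, r
    outputs.append(conv1x1(up(tran_spe(y), r), W))  # S13: Tran_spe, up, conv
print([o.shape for o in outputs])  # three maps at full 64x64 resolution
```

Each layer works at a coarser scale (8×, 4×, 2×) and is fused into the next, so the final outputs carry multi-scale context back at the original resolution.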
3. The multi-scale algorithm for reconstructing a corresponding hyperspectral image from an RGB image according to claim 1, wherein in S2, the step of obtaining the feature image F with multi-scale preprocessing information is:
s21, adjusting the original RGB image with a 3×3 convolution layer to the same spatial dimensions (r·h'_3 × r·w'_3) as the feature map Y'_i; the output result is then stacked with Y'_i in the spectral-channel dimension to obtain the feature image F.
4. The multi-scale algorithm for reconstructing a corresponding hyperspectral image from an RGB image according to claim 1, wherein in S3, the feature image is subjected in sequence to spatial Transformer processing and spectral Transformer processing in each space-spectrum Transformer joint processing module.
5. The multi-scale algorithm for reconstructing a corresponding hyperspectral image from an RGB image according to claim 4, wherein the feature-image processing steps in each space-spectrum Transformer joint processing module are:
s31, spatial Transformer processing:
1) The input of the layer is set as a feature image F ∈ R^(c×h×w); the feature image F is evenly divided into small windows of size m, and after the partitioning operation each window F_i ∈ R^(c×m×m) is flattened and transposed to obtain F_i ∈ R^(m²×c); self-attention processing is then performed on the resulting one-dimensional feature maps; the specific formulas are as follows:
F = {F_1, F_2, …, F_N}, N = hw/m²
A_i = Attention(F_i W_Q, F_i W_K, F_i W_V), i = 1, …, N
where W_Q, W_K and W_V ∈ R^(c×c) denote the learnable projection matrices of Q, K and V respectively, and A_i is the final output result of each window; the Attention(·) operation adds a relative position encoding while realizing the self-attention calculation, as in the following formula:
Attention(Q, K, V) = SoftMax(QKᵀ/√c + B)V
where B is the relative position offset, a learnable parameter of shape R^((2m−1)×(2m−1));
2) The results of the self-attention calculation are then integrated by a simple multi-layer perceptron; throughout the process, skip connections reduce the training difficulty, and the calculation process can be written as:
F' = Spa(LN(F)) + F
F_out = MLP(LN(F')) + F'
where F' and F_out are the processing results of Spa and MLP respectively, F_out represents the final result of the spatial Transformer processing, and the LN(·) symbol denotes layer normalization;
s32, inputting the feature-map result of the spatial Transformer processing into a spectral Transformer processing block;
1) Let the input feature map be H ∈ R^(c×h×w); H is first flattened and transposed into H ∈ R^(hw×c), then linearly projected via W_Q, W_K and W_V ∈ R^(c×c) to Q, K, V ∈ R^(hw×c); the self-attention calculation process is as follows:
Attention(Q, K, V) = SoftMax(σKᵀQ)V
where σ ∈ R¹ is a learnable parameter;
2) Subsequently, the self-attention result Attention(Q, K, V) is linearly projected and the relative position encoding is added, as given by the following formula:
Spe(H) = Attention(Q, K, V)W + φ(V)
where W ∈ R^(c×c) is a learnable parameter, and the φ(·) symbol represents the relative position encoding, which comprises two 3×3 convolution layers and a GELU activation function;
3) Then, a feed-forward network is used to integrate the weight matrix obtained by the above formulas, and skip connections reduce the training difficulty throughout the process; the calculation of Tran_spe(·) can be represented by the following formulas:
H' = Spe(LN(H)) + H
H_out = FFN(LN(H')) + H'
where H' and H_out are the processing results of Spe and FFN respectively, while H_out represents the final result of the spectral Transformer processing, and the LN(·) symbol denotes layer normalization.
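The spectral self-attention Attention(Q, K, V) = SoftMax(σKᵀQ)V of claim 5 can be sketched as follows; note the attention matrix is c×c and acts over channels rather than pixels. The softmax axis and the right-multiplication by V are assumptions made here for dimensional consistency:

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spectral_attention(H, Wq, Wk, Wv, sigma):
    # H: (hw, c) flattened feature map; attention is c x c, over channels
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    A = softmax(sigma * (K.T @ Q), axis=0)  # (c, c) channel attention
    return V @ A                             # back to (hw, c)

rng = np.random.default_rng(0)
hw, c = 16, 4
H = rng.standard_normal((hw, c))
Wq, Wk, Wv = (rng.standard_normal((c, c)) * 0.1 for _ in range(3))
out = spectral_attention(H, Wq, Wk, Wv, sigma=1.0)
print(out.shape)  # (16, 4)
```

Because the attention matrix is c×c rather than hw×hw, its cost scales with the number of spectral channels instead of the number of pixels, which is what makes this spectral formulation lightweight.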
6. The multi-scale algorithm for reconstructing a corresponding hyperspectral image from an RGB image according to claim 5, wherein the feed-forward network consists, in order, of a 1×1 convolution layer, a GELU activation function, a 3×3 convolution layer, a GELU activation function and a 1×1 convolution layer.
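A sketch of the claimed feed-forward structure (1×1 conv → GELU → 3×3 conv → GELU → 1×1 conv); the channel counts and the tanh approximation of GELU are illustrative assumptions:

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def conv1x1(x, w):
    # w: (C_out, C_in); pointwise channel mixing
    return np.einsum('oc,chw->ohw', w, x)

def conv3x3(x, w):
    # w: (C_out, C_in, 3, 3); stride 1, zero padding 1
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    h, wd = x.shape[1:]
    out = np.zeros((w.shape[0], h, wd))
    for di in range(3):
        for dj in range(3):
            out += np.einsum('oc,chw->ohw', w[:, :, di, dj], xp[:, di:di+h, dj:dj+wd])
    return out

def ffn(x, w1, w2, w3):
    # 1x1 conv -> GELU -> 3x3 conv -> GELU -> 1x1 conv, per claim 6
    return conv1x1(gelu(conv3x3(gelu(conv1x1(x, w1)), w2)), w3)

rng = np.random.default_rng(0)
c, ch = 8, 16                           # illustrative channel counts
x = rng.standard_normal((c, 6, 6))
w1 = rng.standard_normal((ch, c)) * 0.1
w2 = rng.standard_normal((ch, ch, 3, 3)) * 0.1
w3 = rng.standard_normal((c, ch)) * 0.1
y = ffn(x, w1, w2, w3)
print(y.shape)  # (8, 6, 6)
```

The two 1×1 layers expand and then restore the channel count, while the inner 3×3 layer mixes local spatial context; the spatial size is unchanged throughout.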
7. The multi-scale algorithm for reconstructing a corresponding hyperspectral image from an RGB image according to claim 1, wherein in S5 the steps of acquiring the hyperspectral image are:
s51, setting a convolution layer with 32 input channels and 31 output channels, wherein the convolution kernel size is 3×3 and the stride and padding are both set to 1;
s52, inputting the feature image obtained in S4, whose spectral-dimension feature information has been fully extracted by the Transformer, into the convolution layer, and obtaining through a LeakyReLU activation function a hyperspectral picture with 31 channels and the same spatial resolution as the input RGB picture.
8. A system for multi-scale reconstruction of a corresponding hyperspectral image from an RGB image, comprising: a multi-scale processing unit, a dimension superposition unit, a space-spectrum Transformer joint processing unit, a spectral-dimension Transformer processing unit and a hyperspectral image output unit;
the multi-scale processing unit is used for processing the original RGB image X through a multi-scale processing module and outputting a feature map Y'_i;
the dimension superposition unit is used for stacking the feature map Y'_i with the original RGB image X in the spectral-channel dimension to obtain a feature image F with multi-scale preprocessing information;
the space-spectrum Transformer joint processing unit is used for processing the feature image F with multi-scale preprocessing information sequentially through 3 space-spectrum Transformer joint processing modules to obtain a feature image with space-spectrum feature information;
the spectral-dimension Transformer processing unit is used for processing the feature image obtained by the space-spectrum Transformer joint processing unit through one spectral-dimension Transformer to obtain a feature image whose spectral-dimension feature information has been fully extracted by the Transformer;
and the hyperspectral image output unit is used for adjusting the channel dimension to the target 31-channel output through a 3×3 convolution layer, finally obtaining the hyperspectral image.
9. A computer storage medium storing a readable program, characterized in that the algorithm according to any one of claims 1-7 is executed when the program is run.
10. An apparatus, comprising: one or more processors, memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the algorithm of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211469458.2A CN116091916A (en) | 2022-11-22 | 2022-11-22 | Multi-scale hyperspectral image algorithm and system for reconstructing corresponding RGB images |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211469458.2A CN116091916A (en) | 2022-11-22 | 2022-11-22 | Multi-scale hyperspectral image algorithm and system for reconstructing corresponding RGB images |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116091916A true CN116091916A (en) | 2023-05-09 |
Family
ID=86201421
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211469458.2A Pending CN116091916A (en) | 2022-11-22 | 2022-11-22 | Multi-scale hyperspectral image algorithm and system for reconstructing corresponding RGB images |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116091916A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116433551A (en) * | 2023-06-13 | 2023-07-14 | 湖南大学 | High-resolution hyperspectral imaging method and device based on double-light-path RGB fusion |
CN116433551B (en) * | 2023-06-13 | 2023-08-22 | 湖南大学 | High-resolution hyperspectral imaging method and device based on double-light-path RGB fusion |
CN116990243A (en) * | 2023-09-26 | 2023-11-03 | 湖南大学 | GAP frame-based light-weight attention hyperspectral calculation reconstruction method |
CN116990243B (en) * | 2023-09-26 | 2024-01-19 | 湖南大学 | GAP frame-based light-weight attention hyperspectral calculation reconstruction method |
CN117314757A (en) * | 2023-11-30 | 2023-12-29 | 湖南大学 | Space spectrum frequency multi-domain fused hyperspectral computed imaging method, system and medium |
CN117314757B (en) * | 2023-11-30 | 2024-02-09 | 湖南大学 | Space spectrum frequency multi-domain fused hyperspectral computed imaging method, system and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Arad et al. | Ntire 2022 spectral recovery challenge and data set | |
Yang et al. | Deep edge guided recurrent residual learning for image super-resolution | |
CN116091916A (en) | Multi-scale hyperspectral image algorithm and system for reconstructing corresponding RGB images | |
CN110969124B (en) | Two-dimensional human body posture estimation method and system based on lightweight multi-branch network | |
CN112819910B (en) | Hyperspectral image reconstruction method based on double-ghost attention machine mechanism network | |
CN112766160A (en) | Face replacement method based on multi-stage attribute encoder and attention mechanism | |
Zhang et al. | LR-Net: Low-rank spatial-spectral network for hyperspectral image denoising | |
CN114881871A (en) | Attention-fused single image rain removing method | |
CN115393233A (en) | Full-linear polarization image fusion method based on self-encoder | |
CN114549567A (en) | Disguised target image segmentation method based on omnibearing sensing | |
CN112163998A (en) | Single-image super-resolution analysis method matched with natural degradation conditions | |
CN116797488A (en) | Low-illumination image enhancement method based on feature fusion and attention embedding | |
CN115578262A (en) | Polarization image super-resolution reconstruction method based on AFAN model | |
Lei et al. | Tghop: an explainable, efficient, and lightweight method for texture generation | |
CN115631107A (en) | Edge-guided single image noise removal | |
CN116757986A (en) | Infrared and visible light image fusion method and device | |
Bao et al. | S 2 net: Shadow mask-based semantic-aware network for single-image shadow removal | |
CN113034408B (en) | Infrared thermal imaging deep learning image denoising method and device | |
Liu et al. | Residual-guided multiscale fusion network for bit-depth enhancement | |
Liu et al. | Multi-Scale Underwater Image Enhancement in RGB and HSV Color Spaces | |
CN114202460A (en) | Super-resolution high-definition reconstruction method, system and equipment facing different damage images | |
CN113379606A (en) | Face super-resolution method based on pre-training generation model | |
CN117237207A (en) | Ghost-free high dynamic range light field imaging method for dynamic scene | |
CN115937429A (en) | Fine-grained 3D face reconstruction method based on single image | |
CN116309278A (en) | Medical image segmentation model and method based on multi-scale context awareness |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |