CN117058009A - Full-color sharpening method based on conditional diffusion model - Google Patents


Info

Publication number
CN117058009A
CN117058009A (Application CN202310740976.1A)
Authority
CN
China
Prior art keywords
noise
image
module
sampling
input
Prior art date
Legal status
Pending
Application number
CN202310740976.1A
Other languages
Chinese (zh)
Inventor
邢颖慧
瞿立涛
张艳宁
张世周
张秀伟
尹翰林
Current Assignee
Northwestern Polytechnical University
Shenzhen Institute of Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Shenzhen Institute of Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University and Shenzhen Institute of Northwestern Polytechnical University
Priority to CN202310740976.1A
Publication of CN117058009A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/84 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks
    • G06V10/85 Markov-related models; Markov random fields
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10032 Satellite or aerial image; Remote sensing
    • G06T2207/10036 Multispectral image; Hyperspectral image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a full-color sharpening method based on a conditional diffusion model. The model comprises a forward noising process and a reverse denoising process: the forward noising process gradually corrupts the training data into Gaussian noise through a Markov chain, while the reverse denoising process starts from Gaussian noise and, guided by the output of a noise prediction network, gradually denoises and samples to obtain a high-resolution multispectral image. The method uses detail information (the difference between the panchromatic image and the up-sampled multispectral image) as a condition to guide the reverse denoising process, generating the fusion result of the panchromatic and multispectral images from Gaussian noise. The trained noise prediction network is applied over multiple denoising iterations of the reverse Markov chain to produce the final fusion result. The fusion result obtained by the method has high spatial and spectral fidelity, and the accuracy of noise prediction is markedly improved.

Description

Full-color sharpening method based on conditional diffusion model
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a full-color sharpening method based on a conditional diffusion model.
Background
Most current optical earth observation satellites, such as WorldView-4, QuickBird and WorldView-2, can simultaneously capture remote sensing images with complementary characteristics through two sensors: multispectral (MS) images with high spectral resolution but low spatial resolution, and panchromatic (PAN) images with high spatial resolution but low spectral resolution, providing a sufficient data source for full-color sharpening technology. Full-color sharpening (pansharpening) is a technique that fuses multispectral and panchromatic images; in essence, it sharpens a multispectral image using the fine spatial detail information in the panchromatic image, thereby obtaining a multispectral image of high spatial resolution. It benefits tasks such as change detection, target detection and land classification, and can also serve as image enhancement to improve image readability, so it plays an important role in practice.
The key to full-color sharpening technology is to enhance the spatial resolution of a multispectral image with the detail information of the panchromatic image. Pansharpening models based on generative adversarial networks (GANs) typically use a dual-path generator to fuse the multispectral and panchromatic images, with a discriminator designed to measure the difference between the fusion result output by the generator and the true high-resolution multispectral image. The high-resolution multispectral image is obtained through alternating iterative training of the adversarial network, but the training is unstable and prone to mode collapse, and the sharpening results suffer from loss of high-frequency components and spectral or spatial distortion.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a full-color sharpening method based on a conditional diffusion model (conditional denoising diffusion probabilistic model). The model comprises a forward noising process and a reverse denoising process: the forward noising process gradually corrupts the training data into Gaussian noise through a Markov chain, and the reverse denoising process starts from Gaussian noise and gradually denoises and samples, guided by the output of a noise prediction network, to obtain a high-resolution multispectral image. Specifically, the method uses detail information (the difference between the panchromatic image and the up-sampled multispectral image) as the condition that guides the reverse denoising process to generate the fusion result of the panchromatic and multispectral images from Gaussian noise. The noise prediction network adopts a U-shaped network architecture and its parameters are optimized through a re-weighted variational lower bound loss function; the model receives the detail component, the noised image and the time step as input, and outputs the noise information contained in the noised image. The trained noise prediction network is applied over multiple denoising iterations of the reverse Markov chain to generate the final fusion result. The fusion result obtained by the method has high spatial and spectral fidelity, and the accuracy of noise prediction is markedly improved.
The technical scheme adopted by the invention for solving the technical problems comprises the following steps:
step 1: preparing a data set;
cropping image blocks from large-scale paired and registered remote sensing multispectral MS images and panchromatic PAN images in left-to-right, top-to-bottom order, and dividing them into a training set, a verification set and a test set;
firstly, carrying out normalization processing on a training set, a verification set and a test set; processing the image blocks in the training set, the verification set and the test set according to the Wald protocol, and taking the processed image blocks as the input of a model;
the original MS image block is used as a reference image;
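The block-cropping in step 1 can be sketched as follows; this is an illustrative NumPy tiling routine (the function name, image sizes and the non-overlapping stride are assumptions, not the patent's code):

```python
import numpy as np

def tile_patches(img, size, stride=None):
    """Crop image blocks left-to-right, top-to-bottom (non-overlapping by default)."""
    stride = stride or size
    H, W = img.shape[:2]
    patches = []
    for top in range(0, H - size + 1, stride):
        for left in range(0, W - size + 1, stride):
            patches.append(img[top:top + size, left:left + size])
    return patches

# toy paired pair: PAN and a registered MS image at 1/4 its resolution
rng = np.random.default_rng(5)
pan = rng.random((1024, 1024))        # PAN image
ms = rng.random((256, 256, 4))        # MS image, 4 bands
pan_blocks = tile_patches(pan, 256)   # 256x256 PAN blocks
ms_blocks = tile_patches(ms, 64)      # matching 64x64x4 MS blocks
```

Because the crops are non-overlapping and aligned, the k-th PAN block and the k-th MS block cover the same ground area.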
step 2: a forward noise adding process;
the noising process is shown in fig. 2; the total number of noising steps is set to T, and for any time step t ∈ {1, …, T}, the data distribution of the original multispectral image R_0 noised to time t is computed directly through formulas (1) and (2):

q(R_t | R_0) = N(R_t; sqrt(γ_t)·R_0, (1 − γ_t)·I)   (1)

R_t = sqrt(γ_t)·R_0 + sqrt(1 − γ_t)·ε_t, with γ_t = α_1·α_2·…·α_t   (2)

where α_i is a predefined fixed parameter with value range (0, 1); ε_t is noise information obeying a standard Gaussian distribution; R_0 is the multispectral image; R_t represents the multispectral image noised to time t; I represents the identity matrix; N represents a Gaussian distribution; q(R_t | R_0) represents the data distribution of the multispectral image R_0 noised to time t;
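Formula (2) lets R_t be sampled in one shot rather than by t sequential noising steps. A minimal NumPy sketch, assuming a linear α schedule (the schedule values and array names are illustrative):

```python
import numpy as np

def forward_noise(R0, t, alphas, rng):
    """Noise R0 directly to step t via formula (2):
    R_t = sqrt(gamma_t)*R0 + sqrt(1-gamma_t)*eps."""
    gamma_t = np.prod(alphas[:t])          # gamma_t = alpha_1 * ... * alpha_t
    eps = rng.standard_normal(R0.shape)    # standard Gaussian noise
    R_t = np.sqrt(gamma_t) * R0 + np.sqrt(1.0 - gamma_t) * eps
    return R_t, eps

# toy example: a 4-band 8x8 "multispectral" block
rng = np.random.default_rng(0)
alphas = 1.0 - np.linspace(1e-6, 1e-2, 1000)  # assumed linear schedule
R0 = rng.random((8, 8, 4))
R_t, eps = forward_noise(R0, 500, alphas, rng)
```

The returned `eps` is exactly the regression target the noise prediction network is later trained against.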
step 3: constructing a noise prediction network model;
the input information received by the noise prediction network comprises the time step t, the noised image R_t at step t, and the detail information P_D − M_r↑, where P_D denotes the panchromatic image replicated along the channel dimension to match the number of channels of the multispectral image, and M_r↑ denotes the multispectral image up-sampled by a factor of r so that its spatial resolution matches the panchromatic image; their difference gives the detail information used as the guiding condition of the network. The prediction target of the network is the noise added from R_0 to R_t, i.e. ε_θ(R_t, t, P_D − M_r↑) → ε_t;
The noise prediction network adopts a U-shaped network architecture and comprises four parts: a time step module, a downsampling module, an upsampling module and a fusion module. The time step module encodes the noising time step t and transmits it to the other modules of the network; the input information passes through the downsampling and upsampling modules in sequence to extract features, and the fusion module outputs the prediction result;
at the input end, the network concatenates the detail information P_D − M_r↑ and the noised image R_t along the channel dimension; the noise prediction network then processes the combined input uniformly with a convolution layer;
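The input assembly described above — detail condition and noised image joined along the channel dimension — can be sketched as follows (shapes and names are illustrative, not the patent's code):

```python
import numpy as np

# Assemble the conditional network input described in the text.
H = W = 64
rng = np.random.default_rng(1)
P_D   = rng.random((H, W, 4))   # PAN image replicated to 4 channels
M_rup = rng.random((H, W, 4))   # MS image up-sampled to PAN resolution
R_t   = rng.random((H, W, 4))   # noised image at step t

detail = P_D - M_rup                              # guiding condition
net_in = np.concatenate([detail, R_t], axis=-1)   # channel-dim concatenation
```

The first convolution layer of the network would then see an 8-channel input for a 4-band multispectral image.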
the construction process of each part is as follows:
step 3-1: constructing a time step module;
the noise prediction network is conditioned on the time step. The input time step t ∈ {1, …, T} is encoded into a one-dimensional vector, which is then passed through a linear layer into every up-sampling, down-sampling and fusion module of the network. Before the vector enters each module, a linear layer converts its length to the number of input channels of that module, so that it can be summed with the module's input;
step 3-2: constructing a downsampling module;
the downsampling module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function and a convolution layer. The downsampling is realized by the stride of the convolution layer: the feature map input to the module is spatially downsampled by a factor of 2, and the number of output channels is doubled relative to the number of input channels;
step 3-3: an up-sampling module is constructed;
the up-sampling module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function, an up-sampling layer, a skip connection and a stride-1 convolution layer. It spatially up-samples the input feature map by a factor of 2 and halves the number of output channels relative to the input channels. The convolution layer of the up-sampling module receives, via the skip connection, the output of the down-sampling module of the same size. The up-sampling layer uses bilinear interpolation to realize the 2× spatial up-sampling;
step 3-4: constructing a fusion module;
the fusion module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function, a skip connection and a stride-1 convolution layer. The convolution layer of the fusion module receives, via the skip connection, the output of the first convolution layer and the network input (P_D − M_r↑, R_t);
Step 4: training a noise prediction network;
the noise prediction network model ε_θ(R_t, t, P_D − M_r↑) is optimized with a loss function derived from the variational lower bound, taking the noise ε_t added from R_0 to R_t as the target. A time step t is drawn uniformly from {1, …, T}, the noise ε_t is sampled from a standard normal distribution, and a batch of reference images R_0 is selected from the training data and noised via formula (2) to obtain R_t. The detail information, i.e. the difference between the replicated panchromatic image P_D and the up-sampled multispectral image M_r↑, is computed, and the parameters of the noise prediction network are trained to convergence via:

min_θ ||ε_θ(R_t, t, P_D − M_r↑) − ε_t||²   (3)
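Objective (3) is a plain mean-squared error between the injected and predicted noise. A minimal NumPy sketch with a stand-in prediction (no real network involved; names are illustrative):

```python
import numpy as np

def diffusion_training_loss(eps_pred, eps_true):
    """Re-weighted variational-bound objective (3): mean squared error
    between the predicted noise and the injected noise."""
    return np.mean((eps_pred - eps_true) ** 2)

rng = np.random.default_rng(2)
eps_true = rng.standard_normal((8, 8, 4))                    # injected noise
eps_pred = eps_true + 0.1 * rng.standard_normal((8, 8, 4))   # stand-in network output
loss = diffusion_training_loss(eps_pred, eps_true)
```

In training, `eps_pred` would be the output of ε_θ for a sampled (R_t, t, condition) triple, and the gradient of this scalar drives the parameter update.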
step 5: reverse denoising process;
after the noise prediction network is trained, two samples R_T and z are drawn from a standard normal distribution, and a T-step iteration of the reverse Markov chain is performed using formula (4), terminating at t = 0; the Gaussian noise R_T is thereby denoised into the fusion result of the panchromatic image P and the multispectral image M:

R_{t−1} = (1/sqrt(α_t))·(R_t − ((1 − α_t)/sqrt(1 − γ_t))·ε_θ(R_t, t, P_D − M_r↑)) + σ_t·z   (4)

where σ_t is the standard deviation of the sampling noise at step t, and z is set to 0 at the final step.
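The reverse iteration of formula (4) can be sketched as ancestral sampling. Here ε_θ is replaced by a placeholder function, and σ_t = sqrt(1 − α_t) is one common DDPM choice, assumed here rather than taken from the patent:

```python
import numpy as np

def reverse_denoise(R_T, alphas, eps_theta, cond, rng):
    """T-step reverse iteration per formula (4). `eps_theta` stands in for the
    trained noise prediction network; sigma_t = sqrt(1 - alpha_t) is assumed."""
    gammas = np.cumprod(alphas)
    R = R_T
    T = len(alphas)
    for t in range(T, 0, -1):              # t = T, ..., 1
        a_t, g_t = alphas[t - 1], gammas[t - 1]
        eps = eps_theta(R, t, cond)        # predicted noise at step t
        mean = (R - (1.0 - a_t) / np.sqrt(1.0 - g_t) * eps) / np.sqrt(a_t)
        z = rng.standard_normal(R.shape) if t > 1 else 0.0  # no noise at last step
        R = mean + np.sqrt(1.0 - a_t) * z
    return R

rng = np.random.default_rng(3)
alphas = 1.0 - np.linspace(1e-6, 1e-2, 50)    # short schedule for illustration
cond = rng.random((8, 8, 4))                  # stand-in detail condition
dummy_eps = lambda R, t, c: np.zeros_like(R)  # placeholder for eps_theta
out = reverse_denoise(rng.standard_normal((8, 8, 4)), alphas, dummy_eps, cond, rng)
```

With a trained network in place of `dummy_eps`, the loop turns pure Gaussian noise into the fusion result.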
Preferably, the total step size of the noise adding is t=1000.
Preferably, the normalization process is dividing the pixel values of all input images by 2047.0.
Preferably, Wald-protocol processing refers to first filtering the original MS and PAN images with a 5×5 Gaussian smoothing kernel, and then downsampling them to 1/4 of the original spatial resolution.
Preferably, the up-sampled multispectral image M_r↑ is obtained by bicubic interpolation of the MS image block up to the spatial resolution of the PAN image block; the up-sampling factor r is 4.
Preferably, the PAN image block size of the training and verification sets is 256×256 and the MS image block size is 64×64×4; the PAN image block size of the test set is 1024×1024 and the MS image block size is 256×256×4; the data volume ratio of the training, verification and test sets is 8:1:1.
The beneficial effects of the invention are as follows:
the invention provides a full-color sharpening algorithm based on a conditional diffusion model. Guided by the difference between the panchromatic image and the multispectral image as condition information, it generates, through a T-step iteration, a remote sensing image with both high spatial resolution and high spectral resolution from any data sample obeying a standard Gaussian distribution. The algorithm builds on the strong image reconstruction capability of the diffusion model, and the resulting fusion has high spatial and spectral fidelity. To improve the feature extraction capability of the noise prediction network on multi-source input information, a U-shaped network structure is constructed, which markedly improves the accuracy of noise prediction.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the forward noising process and the reverse denoising process according to the present invention.
Fig. 3 is a block diagram of a noise prediction network constructed in accordance with the method of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
The invention provides a full-color sharpening method based on a conditional diffusion probability model. The model comprises a forward noising process and a reverse denoising process: training data is gradually corrupted into Gaussian noise through a Markov chain, and the high-resolution multispectral image is then obtained by gradually denoising and sampling with the output of a noise prediction network. The noise prediction network uses a U-shaped architecture and its parameters are optimized through a re-weighted variational lower bound loss function; it takes detail information, the noised image and the time step as input, and outputs the noise information in the noised image. The trained noise prediction network is applied over multiple denoising iterations of the reverse Markov chain to generate the final fusion result. The detail information is the difference between the panchromatic image and the up-sampled multispectral image. The method comprises the following steps:
step 1: preparing a data set;
the data come from QuickBird (QB), WorldView-4 (WV-4) and WorldView-2 (WV-2) satellite sensors. For QB data, the spatial resolution of PAN is 0.6 m and that of MS is 2.4 m; the MS image contains 4 spectral bands: blue, green, red and near-infrared. For WV-4 data, the spatial resolution of PAN is 0.3 m and that of MS is 1.2 m; the MS image contains 4 spectral bands: blue, green, red and near-infrared. For WV-2 data, the spatial resolution of PAN is 0.5 m and that of MS is 2 m; the MS image contains 8 spectral bands: coastal, blue, green, yellow, red, red edge, near-infrared 1 and near-infrared 2. The spatial resolution ratio between the MS and PAN images is 4 in all three data sets.
The image block sizes of the training and verification sets are 256×256 (PAN) / 64×64×4 (MS), and those of the test set are 1024×1024 (PAN) / 256×256×4 (MS); the ratio of training, verification and test data volumes is 8:1:1. Since no true high-resolution reference image exists, the MS and PAN image blocks in the training, verification and test sets are processed according to the Wald protocol; the processed images are used as network input, and the original MS image is used as the reference image. The processed PAN image block is denoted P, the processed MS image block M, and the original MS image block R.
Step 2: forward noise adding process
The noising process is shown in fig. 2. The total number of noising steps is set to T = 1000, and for any time step t ∈ {1, …, T}, the data distribution of the original multispectral image R_0 noised to time t is computed directly via formulas (1) and (2):

q(R_t | R_0) = N(R_t; sqrt(γ_t)·R_0, (1 − γ_t)·I), R_t = sqrt(γ_t)·R_0 + sqrt(1 − γ_t)·ε_t, with γ_t = α_1·α_2·…·α_t

where α_i is a predefined fixed parameter with value range (0, 1); I is the identity matrix; and ε_t is noise information obeying a standard Gaussian distribution. As the time step t grows, the multispectral image R_0 is progressively perturbed by Gaussian noise; if the total step count T is large enough, R_T follows a standard Gaussian distribution N(0, I).

The α_i obey a linear noise schedule: α_i = 1 − β_i, where the β_i are T values taken uniformly in [10⁻⁶, 10⁻²]; β_i increases with i, so α_i decreases with i;
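The linear schedule described here is a three-line computation; a NumPy sketch with T = 1000 as in the preferred setting (array names are illustrative):

```python
import numpy as np

# Linear noise schedule: T values of beta_i uniformly in [1e-6, 1e-2],
# alpha_i = 1 - beta_i, and gamma_t the running product of the alphas.
T = 1000
betas = np.linspace(1e-6, 1e-2, T)
alphas = 1.0 - betas
gammas = np.cumprod(alphas)   # gamma_t for t = 1 .. T
```

With this schedule γ_T is close to zero, so the fully noised R_T is dominated by the Gaussian noise term, as the text requires.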
step 3: constructing a noise prediction network;
the noise prediction network is shown in fig. 3 and comprises four parts: a time step module, a downsampling module, an upsampling module and a fusion module. The input information received by the network comprises the time step t, the noised image R_t at time t, and the detail information P_D − M_r↑, where P_D denotes the panchromatic image replicated along the channel dimension to match the number of channels of the multispectral image, and M_r↑ denotes the multispectral image up-sampled by a factor of r so that its spatial resolution matches the panchromatic image; their difference gives the detail information used as the guiding condition of the network. The prediction target of the network is the noise added from R_0 to R_t, i.e. ε_θ(R_t, t, P_D − M_r↑) → ε_t;
Step 3-1: constructing a time step module;
the noise prediction network is conditioned on the time step. The input integer time step t ∈ {1, …, T} is encoded into a one-dimensional vector, which is then passed through a linear layer into all up-sampling, down-sampling and fusion modules of the U-shaped network. Before the vector enters each module, a linear layer converts its length to the number of input channels of that module, and the vector is summed with the module's input.
Step 3-2: building downsampling module
The downsampling module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function and a convolution layer. The window size of the convolution layer is 3×3, with padding 1 and stride 2, so the feature map is downsampled by a factor of 2 and the number of output channels is doubled relative to the input channels;
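The 2× halving follows from the standard convolution output-size formula; a quick sanity check (not the patent's code):

```python
def conv_out_size(n, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution: floor((n + 2p - k)/s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# downsampling module: 3x3 kernel, padding 1, stride 2 -> halves the size
down = conv_out_size(64, kernel=3, stride=2, padding=1)
# fusion / up-sampling convolutions: stride 1 -> size preserved
same = conv_out_size(64, kernel=3, stride=1, padding=1)
```

The same arithmetic confirms that the stride-1, padding-1, 3×3 convolutions of the other modules leave the feature map size unchanged.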
step 3-3: building up sampling modules
The up-sampling module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function, an up-sampling layer, a skip connection and a convolution layer. The window size of the convolution layer is 3×3, with padding 1 and stride 1, so the feature map is up-sampled by a factor of 2 and the number of output channels is halved relative to the input channels. The convolution layer of the up-sampling module receives, via the skip connection, the output of the down-sampling module of the same size. The up-sampling layer uses bilinear interpolation to realize the 2× spatial up-sampling.
Step 3-4: construction of fusion modules
The fusion module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function, a skip connection and a convolution layer. The window size of the convolution layer is 3×3, with padding 1 and stride 1, so the feature map size is kept unchanged. The convolution layer of the fusion module receives, via the skip connection, the output of the first convolution layer and the network input (P_D − M_r↑, R_t);
Step 4: training noise prediction network
The noise prediction network epsilon θ (R t ,t,P D -M r↑ ) Optimizing by varying the lower bound derived loss function, R 0 To R t Added noise epsilon t Is the target; selecting a certain time T to {1, & gt, T } from the uniform distribution, and sampling from the standard normal distribution to obtain E } t Selecting a batch of reference images R from training data 0 Adding noise through a formula (2) to obtain R t Calculating detail information, and training parameters of the noise prediction network to be converged by the following formula:
min||ε θ (R t ,t,P D -M r↑ )-∈ t || 2 (3)
specifically, the model is optimized with the Adam optimizer at an initial learning rate of 0.0003, and training stops when the loss function converges.
Step 5: reverse denoising process
The reverse denoising process is shown in fig. 2. After the noise prediction network is trained, the algorithm draws two samples R_T and z from a standard normal distribution and performs a T-step iteration of the reverse Markov chain using formula (4), terminating at t = 0; the Gaussian noise R_T is thereby denoised into the fusion result of the panchromatic image P and the multispectral image M:

R_{t−1} = (1/sqrt(α_t))·(R_t − ((1 − α_t)/sqrt(1 − γ_t))·ε_θ(R_t, t, P_D − M_r↑)) + σ_t·z   (4)

where α_t and γ_t are set as in the training process and σ_t is the standard deviation of the sampling noise at step t. Because the noise prediction network ε_θ(R_t, t, P_D − M_r↑) uses a fully convolutional architecture, it can be applied directly to fuse remote sensing images at the original resolution; starting from Gaussian noise, the fusion result is obtained through T denoising iterations of formula (4).
Specific examples:
(1) Data set preparation:
given paired PAN and MS images with a spatial resolution ratio of 4:1, registered to each other, the following processing is performed:
(1) read the images and split each original image into two parts, used as training-data and test-data images respectively; the two parts have the same width and a height ratio of 9:1. This is done for both PAN and MS;
(2) from the training-data portion, crop matched image blocks at corresponding positions of the paired PAN and MS training images, from left to right and top to bottom; the PAN block size is 256×256 and the MS block size is 64×64×4 (4 is the number of channels; when the MS has 8 channels, this becomes 8 accordingly). The test-data portion is constructed similarly, with PAN blocks of 1024×1024 and MS blocks of 256×256×4.
(3) From the training-data portion, 1/9 is randomly set aside as verification set data.
So far, training set, verification set and test set data are obtained. For QB, the training set contains 6943 pairs of images, the validation set contains 743 pairs of images, and the test set contains 156 pairs of images; for WV-4, the training set contains 7166 pairs of images, the validation set contains 772 pairs of images, and the test set contains 271 pairs of images; for WV-2, the training set contains 9641 pairs of images, the validation set contains 945 pairs of images, and the test set contains 136 pairs of images.
(4) When processing according to the Wald protocol, the PAN and MS images are blurred with a 5×5 Gaussian kernel of standard deviation 2 and then downsampled by a factor of 4; the results form the new training images. The verification and test sets are processed in the same way.
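The Wald-protocol degradation (5×5 Gaussian kernel, standard deviation 2, then 4× decimation) can be sketched in NumPy; border handling here uses zero padding, which is an assumption — real pipelines may handle borders differently:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=2.0):
    """Normalized 2-D Gaussian kernel (5x5, std 2, per the Wald setup here)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def wald_degrade(img, factor=4):
    """Blur each band with the Gaussian kernel, then decimate by `factor`."""
    k = gaussian_kernel()
    pad = 2  # half the 5x5 kernel
    out_bands = []
    for b in range(img.shape[2]):
        p = np.pad(img[:, :, b], pad)           # zero padding (assumed)
        blurred = np.zeros_like(img[:, :, b])
        for i in range(img.shape[0]):
            for j in range(img.shape[1]):
                blurred[i, j] = np.sum(p[i:i + 5, j:j + 5] * k)
        out_bands.append(blurred[::factor, ::factor])  # 4x decimation
    return np.stack(out_bands, axis=2)

rng = np.random.default_rng(4)
ms = rng.random((64, 64, 4))
ms_low = wald_degrade(ms)   # 64x64x4 -> 16x16x4
```

Applying the same routine to the PAN and MS blocks yields the reduced-resolution inputs, leaving the original MS block free to serve as the reference.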
The dataset preparation step is completed.
(2) Noise prediction fusion network model construction
The network structure is shown in fig. 3, and the important parameters for constructing the network include:
(1) convolution layer used by the whole network: the window size of all convolution layers is 3×3; the step length of a convolution layer of the downsampling module is 2, and the filling is 1; the convolution layer step size of the other modules is 1 and the padding is 1.
The numbers of output channels of the convolution layers in fig. 3 are 32, 64, 128, 256, 128, 64, 32 and the number of target channels, respectively. Skip connections occur between the up-sampling modules and the fusion module: the input information and the outputs of the down-sampling modules are passed into the second half of the network, and features of the same resolution are concatenated along the channel dimension;
(2) the time encoding layer uses the sine-cosine encoding of the paper "Attention Is All You Need": the time step t is encoded into a one-dimensional vector of length 32, which a linear layer converts to a vector of length 256; before this vector is added to the input of each module, a linear layer adjusts its number of channels to match that module's input;
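The sine-cosine time encoding can be sketched as below; the frequency base 10000 follows the cited paper's convention and is an assumption here, as is the function name:

```python
import numpy as np

def time_embedding(t, dim=32):
    """Sine-cosine encoding of a scalar time step, in the style of
    'Attention Is All You Need' (dimension 32 as described in the text)."""
    half = dim // 2
    # geometrically spaced frequencies, base 10000 (assumed convention)
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

emb = time_embedding(500)   # length-32 vector for t = 500
```

A linear layer would then map this length-32 vector to length 256 before the per-module channel adjustment described above.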
(3) Network training
(1) Input images: the panchromatic map P, of size 64×64 (height×width), and the up-sampled multispectral map M_r↑, of size 64×64×4 (height×width×channels). Here the up-sampled multispectral M_r↑ is obtained from a 16×16×4 multispectral image by 4× bilinear interpolation up-sampling. The images input to the network are normalized by dividing the pixel values by 2047.0.
(2) Other relevant settings: parameters are updated with the Adam optimizer. The number of training epochs is set to 200, the batch size to 32, and the initial learning rate to 0.0003. The network is evaluated on the verification set every 50 epochs, and the best-performing network parameters are saved.
(3) Stopping training conditions: the loss function of the network reaches a converged state.
(4) Network testing
(1) Input images: a panchromatic image P of size 256×256 (height×width) and an up-sampled multispectral image M_{r↑} of size 256×256×4 (height×width×channels). Here M_{r↑} is obtained by up-sampling a 64×64×4 multispectral image 4× with bilinear interpolation.
(2) Load the best denoising-network parameters saved during the training phase, select a panchromatic image P and an up-sampled multispectral image M_{r↑} from the test set, take their difference as the detail information, sample an image from Gaussian noise, and perform T iterative denoising steps based on the output of the noise prediction network to obtain the fusion result.

Claims (6)

1. A full-color sharpening method based on a conditional diffusion model, which is characterized by comprising the following steps:
step 1: preparing a data set;
cropping image blocks from paired and registered large-scale remote sensing multispectral (MS) and panchromatic (PAN) images in left-to-right, top-to-bottom order, and dividing the image blocks into a training set, a validation set and a test set;
first normalizing the training, validation and test sets; then processing the image blocks of the three sets according to the Wald protocol, and taking the processed image blocks as the input of the model;
the original MS image block is used as a reference image;
step 2: a forward noise adding process;
the noising process is shown in fig. 2; with the total number of noising steps set to T, for any time step t ∈ {1, …, T} the data distribution of the original multispectral image R_0 noised to time t is computed directly through formulas (1) and (2):

q(R_t | R_0) = N(R_t; √(ᾱ_t)·R_0, (1 − ᾱ_t)·I)   (1)

R_t = √(ᾱ_t)·R_0 + √(1 − ᾱ_t)·ε_t   (2)

wherein ᾱ_t = ∏_{i=1}^{t} α_i, each α_i being predefined with value range (0, 1); ε_t is noise obeying the standard Gaussian distribution; R_0 is the multispectral image; R_t denotes the multispectral image noised to time t; I denotes the identity matrix; N(·) denotes a Gaussian distribution; and q(R_t | R_0) denotes the data distribution of the multispectral image R_0 noised to time t;
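A minimal NumPy sketch of the closed-form noising of formulas (1) and (2) (the linear β schedule below is an assumption for illustration; the claim only requires each α_i to lie in (0, 1)):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed schedule, not fixed by the claim
alphas = 1.0 - betas                 # each alpha_i lies in (0, 1)
alpha_bar = np.cumprod(alphas)       # cumulative product of alpha_i up to step t

def add_noise(R0, t, rng=np.random.default_rng(0)):
    """Jump directly from R_0 to R_t per formula (2)."""
    eps = rng.standard_normal(R0.shape)           # eps_t ~ N(0, I)
    Rt = (np.sqrt(alpha_bar[t - 1]) * R0
          + np.sqrt(1.0 - alpha_bar[t - 1]) * eps)
    return Rt, eps

R0 = np.zeros((64, 64, 4))           # a normalized multispectral patch
Rt, eps = add_noise(R0, t=T)
```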
step 3: constructing a noise prediction network model;
the input information received by the noise prediction network comprises the time step t, the image R_t noised to time step t, and the detail information P_D − M_{r↑}, wherein P_D denotes the panchromatic image replicated along the channel dimension to match the number of channels of the multispectral image, and M_{r↑} denotes the multispectral image up-sampled by a factor of r until its spatial resolution is consistent with that of the panchromatic image; their difference yields the detail information used as the guiding condition of the network; the prediction target of the network is the noise added from R_0 to R_t, i.e. ε_θ(R_t, t, P_D − M_{r↑}) → ε_t;
the noise prediction network adopts a U-shaped architecture comprising four parts: a time step module, down-sampling modules, up-sampling modules and a fusion module; the time step module encodes the noising time t and passes it to the other modules of the network; the input information passes through the down-sampling modules and then the up-sampling modules for feature extraction, and the fusion module outputs the prediction result;
at the input end, the network concatenates the detail information P_D − M_{r↑} and the noised image R_t along the channel dimension; the noise prediction network then processes the combined input uniformly with a convolution layer;
the construction process of each part is as follows:
step 3-1: constructing a time step module;
the noise prediction network is conditioned on the time step; the input time step t ∈ {1, …, T} is encoded into a one-dimensional vector, which is passed through Linear layers into all up-sampling, down-sampling and fusion modules of the noise prediction network; before the one-dimensional vector enters each module, a Linear layer converts its length to the number of input channels of the current module so that it can be summed with the input of that module;
step 3-2: constructing a downsampling module;
the down-sampling module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function and a convolution layer; the down-sampling is realized by the stride of the convolution layer: the feature map fed into the module is spatially down-sampled by a factor of 2, and the number of output channels is doubled relative to the number of input channels;
step 3-3: an up-sampling module is constructed;
the up-sampling module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function, an up-sampling layer, a skip connection and a convolution layer with stride 1; it spatially up-samples the feature map fed into the module by a factor of 2, and the number of output channels is halved relative to the number of input channels; through the skip connection, the convolution layer of the up-sampling module receives the output of the down-sampling module of the same size; the up-sampling layer realizes the 2× spatial up-sampling with a bilinear interpolation algorithm;
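For illustration, the 2× bilinear up-sampling used by this module can be sketched in NumPy as follows (a single-channel, align-corners variant; a deep learning framework's built-in interpolation would normally be used instead):

```python
import numpy as np

def bilinear_upsample2x(x):
    """2x spatial bilinear up-sampling of an (H, W) feature map
    (align-corners sampling grid)."""
    x = np.asarray(x, dtype=float)
    H, W = x.shape
    ys = np.linspace(0.0, H - 1.0, 2 * H)     # fractional source rows
    xs = np.linspace(0.0, W - 1.0, 2 * W)     # fractional source columns
    y0 = np.floor(ys).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[:, None]                   # vertical interpolation weights
    wx = (xs - x0)[None, :]                   # horizontal interpolation weights
    top = x[np.ix_(y0, x0)] * (1 - wx) + x[np.ix_(y0, x1)] * wx
    bot = x[np.ix_(y1, x0)] * (1 - wx) + x[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```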
step 3-4: constructing a fusion module;
the fusion module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function, a skip connection and a convolution layer with stride 1; through the skip connection, the convolution layer of the fusion module receives the output of the first convolution layer together with the network input (P_D − M_{r↑}, R_t);
Step 4: training a noise prediction network;
the noise prediction network model ε_θ(R_t, t, P_D − M_{r↑}) is optimized with the loss function derived from the variational lower bound, taking the noise ε_t added from R_0 to R_t as the target; a time step t is sampled from the uniform distribution over {1, …, T}, ε_t is sampled from the standard normal distribution, a batch of reference images R_0 is selected from the training data and noised through formula (2) to obtain R_t, the detail information is computed as the difference of the replicated panchromatic image P_D and the up-sampled multispectral image M_{r↑}, and the parameters of the noise prediction network are trained to convergence by:

min ‖ε_θ(R_t, t, P_D − M_{r↑}) − ε_t‖²   (3)
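One training step of formula (3) can be sketched as follows (NumPy; `eps_theta` is a hypothetical stand-in for the U-shaped noise prediction network, and the β schedule is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
alphas = 1.0 - np.linspace(1e-4, 0.02, T)   # assumed schedule
alpha_bar = np.cumprod(alphas)

def eps_theta(Rt, t, cond):
    """Hypothetical stand-in for the U-shaped noise prediction network."""
    return np.zeros_like(Rt)

# One optimization step of formula (3):
R0 = 0.1 * rng.standard_normal((64, 64, 4))    # reference MS patch
cond = 0.1 * rng.standard_normal((64, 64, 4))  # detail map P_D - M_{r_up}
t = int(rng.integers(1, T + 1))                # t ~ U{1, ..., T}
eps = rng.standard_normal(R0.shape)            # target noise eps_t
Rt = np.sqrt(alpha_bar[t - 1]) * R0 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
loss = np.mean((eps_theta(Rt, t, cond) - eps) ** 2)  # minimized w.r.t. network params
```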
step 5: reverse denoising process;
after the noise prediction network has been trained, based on the Markov chain process two samples R_T and z are drawn from the standard normal distribution, and T iterations of the following formula (4) are performed; the iteration terminates when t = 0, finally denoising the Gaussian noise R_T into the fusion result of the panchromatic image P and the multispectral image M:

R_{t−1} = (1/√(α_t))·(R_t − ((1 − α_t)/√(1 − ᾱ_t))·ε_θ(R_t, t, P_D − M_{r↑})) + σ_t·z   (4)

wherein σ_t² = ((1 − ᾱ_{t−1})/(1 − ᾱ_t))·(1 − α_t), z ~ N(0, I) when t > 1, and z = 0 when t = 1.
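A single reverse update of formula (4) can be sketched in NumPy as follows (the β schedule and the variance choice σ_t² = (1 − ᾱ_{t−1})/(1 − ᾱ_t)·(1 − α_t) are assumptions consistent with standard DDPM sampling; the conditioned network output is passed in as `eps_pred`):

```python
import numpy as np

T = 1000
alphas = 1.0 - np.linspace(1e-4, 0.02, T)   # assumed schedule
alpha_bar = np.cumprod(alphas)

def ddpm_step(Rt, t, eps_pred, rng=np.random.default_rng(0)):
    """One reverse update per formula (4): R_t -> R_{t-1}."""
    a_t = alphas[t - 1]
    ab_t = alpha_bar[t - 1]
    mean = (Rt - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(a_t)
    if t > 1:
        ab_prev = alpha_bar[t - 2]
        sigma = np.sqrt((1.0 - ab_prev) / (1.0 - ab_t) * (1.0 - a_t))
        return mean + sigma * rng.standard_normal(Rt.shape)
    return mean  # z = 0 at the final step (t = 1)
```

Iterating this step from t = T down to t = 1, with `eps_pred` produced by the trained network at each step, turns the initial Gaussian sample R_T into the fusion result.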
2. The full-color sharpening method based on a conditional diffusion model according to claim 1, wherein the total number of noising steps is T = 1000.
3. The full-color sharpening method based on a conditional diffusion model according to claim 1, wherein the normalization process divides the pixel values of all input images by 2047.0.
4. The full-color sharpening method based on a conditional diffusion model according to claim 1, wherein the Wald protocol processing means that the original MS and PAN images are first filtered with a Gaussian smoothing kernel of size 5×5 and then down-sampled to 1/4 of the original spatial resolution.
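The Wald-protocol degradation of claim 4 can be sketched in NumPy as follows (the Gaussian σ is an assumption; the claim only fixes the 5×5 kernel size and the 1/4 down-sampling):

```python
import numpy as np

def gaussian_kernel5(sigma=1.0):
    """Normalized 5x5 Gaussian kernel."""
    ax = np.arange(-2, 3)
    k = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def wald_degrade(img, factor=4, sigma=1.0):
    """5x5 Gaussian blur followed by subsampling to 1/factor resolution."""
    img = np.asarray(img, dtype=float)
    k = gaussian_kernel5(sigma)
    pad = np.pad(img, 2, mode="reflect")   # reflect-pad so the blur is defined at borders
    H, W = img.shape
    blurred = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            blurred[i, j] = np.sum(pad[i:i + 5, j:j + 5] * k)
    return blurred[::factor, ::factor]
```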
5. The full-color sharpening method based on a conditional diffusion model according to claim 1, wherein the up-sampled multispectral image M_{r↑} is obtained by bicubic interpolation of the MS image block up to the spatial resolution of the PAN image block, the up-sampling factor r being 4.
6. The method of claim 1, wherein the PAN image blocks of the training and validation sets are of size 256×256 and the MS image blocks of size 64×64×4; the PAN image blocks of the test set are of size 1024×1024 and the MS image blocks of size 256×256×4; and the data volume ratio of the training set, validation set and test set is 8:1:1.
CN202310740976.1A 2023-06-21 2023-06-21 Full-color sharpening method based on conditional diffusion model Pending CN117058009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310740976.1A CN117058009A (en) 2023-06-21 2023-06-21 Full-color sharpening method based on conditional diffusion model

Publications (1)

Publication Number Publication Date
CN117058009A true CN117058009A (en) 2023-11-14

Family

ID=88663450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310740976.1A Pending CN117058009A (en) 2023-06-21 2023-06-21 Full-color sharpening method based on conditional diffusion model

Country Status (1)

Country Link
CN (1) CN117058009A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117611484A (en) * 2024-01-19 2024-02-27 武汉大学 Image denoising method and system based on denoising self-decoding network
CN117611484B (en) * 2024-01-19 2024-04-02 武汉大学 Image denoising method and system based on denoising self-decoding network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination