CN117058009A - Full-color sharpening method based on conditional diffusion model - Google Patents


Info

Publication number
CN117058009A
CN117058009A (Application CN202310740976.1A)
Authority
CN
China
Prior art keywords
noise
image
module
sampling
input
Prior art date
Legal status
Pending
Application number
CN202310740976.1A
Other languages
Chinese (zh)
Inventor
邢颖慧
瞿立涛
张艳宁
张世周
张秀伟
尹翰林
Current Assignee
Northwestern Polytechnical University
Shenzhen Institute of Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Shenzhen Institute of Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University and Shenzhen Institute of Northwestern Polytechnical University
Priority to CN202310740976.1A
Publication of CN117058009A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/84 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks
    • G06V10/85 Markov-related models; Markov random fields
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10032 Satellite or aerial image; Remote sensing
    • G06T2207/10036 Multispectral image; Hyperspectral image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a full-color sharpening method based on a conditional diffusion model. The model comprises a forward noising process and a reverse denoising process: the forward noising process gradually corrupts the training data into Gaussian noise through a Markov chain, while the reverse denoising process starts from Gaussian noise and, guided by the output of a noise prediction network, gradually denoises and samples to obtain a high-resolution multispectral image. The method uses detail information (the difference between the panchromatic image and the up-sampled multispectral image) as a condition to guide the reverse denoising process, generating the fusion result of the panchromatic and multispectral images from Gaussian noise. The trained noise prediction network is applied over multiple denoising iterations of the reverse Markov chain to produce the final fusion result. The fusion result obtained by the method has high spatial and spectral fidelity, and the accuracy of noise prediction is markedly improved.

Description

Full-color sharpening method based on conditional diffusion model
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a full-color sharpening method based on a conditional diffusion model.
Background
Most current optical earth observation satellites, such as WorldView-4, QuickBird and WorldView-2, can simultaneously capture remote sensing images with complementary characteristics through two sensors: multispectral (MS) images with high spectral resolution but low spatial resolution, and panchromatic (PAN) images with high spatial resolution but low spectral resolution, providing a sufficient data source for full-color sharpening technology. Full-color sharpening (pansharpening) is a technique that fuses multispectral and panchromatic images; in essence, it sharpens a multispectral image using the fine spatial detail information in the panchromatic image, thereby obtaining a multispectral image of high spatial resolution. It benefits tasks such as change detection, target detection and land classification, and can also serve as image enhancement to improve image readability, so it plays an important role in practice.
The key to full-color sharpening technology is to enhance the spatial resolution of a multispectral image with the detail information of the panchromatic image. Pansharpening models based on generative adversarial networks (GANs) typically use a dual-path generator to fuse the multispectral and panchromatic images, with a discriminator designed to measure the difference between the fusion result output by the generator and the true high-resolution multispectral image. The high-resolution multispectral image is obtained through alternating iterative training of the adversarial network, but the training is unstable and prone to mode collapse, and the sharpening results suffer from loss of high-frequency components and spectral or spatial distortion.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a full-color sharpening method based on a conditional diffusion model (conditional denoising diffusion probabilistic model). The model comprises a forward noising process and a reverse denoising process: the forward noising process gradually corrupts the training data into Gaussian noise through a Markov chain, and the reverse denoising process starts from Gaussian noise and gradually denoises and samples, guided by the output of a noise prediction network, to obtain a high-resolution multispectral image. Specifically, the method uses detail information (the difference between the panchromatic image and the up-sampled multispectral image) as the condition that guides the reverse denoising process to generate the fusion result of the panchromatic and multispectral images from Gaussian noise. The noise prediction network adopts a U-shaped network architecture and its parameters are optimized through a re-weighted variational lower bound loss function; the model receives the detail component, the noised image and the time step as input, and outputs the noise information contained in the noised image. The trained noise prediction network is applied over multiple denoising iterations of the reverse Markov chain to generate the final fusion result. The fusion result obtained by the method has high spatial and spectral fidelity, and the accuracy of noise prediction is markedly improved.
The technical scheme adopted by the invention for solving the technical problems comprises the following steps:
step 1: preparing a data set;
cropping image blocks from large-scale paired and registered remote sensing multispectral MS images and panchromatic PAN images in left-to-right, top-to-bottom order, and dividing them into a training set, a verification set and a test set;
firstly, carrying out normalization processing on a training set, a verification set and a test set; processing the image blocks in the training set, the verification set and the test set according to the Wald protocol, and taking the processed image blocks as the input of a model;
the original MS image block is used as a reference image;
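The block-cropping in step 1 can be sketched as follows; this is an illustrative NumPy tiling routine (the function name, image sizes and the non-overlapping stride are assumptions, not the patent's code):

```python
import numpy as np

def tile_patches(img, size, stride=None):
    """Crop image blocks left-to-right, top-to-bottom (non-overlapping by default)."""
    stride = stride or size
    H, W = img.shape[:2]
    patches = []
    for top in range(0, H - size + 1, stride):
        for left in range(0, W - size + 1, stride):
            patches.append(img[top:top + size, left:left + size])
    return patches

# toy paired pair: PAN and a registered MS image at 1/4 its resolution
rng = np.random.default_rng(5)
pan = rng.random((1024, 1024))        # PAN image
ms = rng.random((256, 256, 4))        # MS image, 4 bands
pan_blocks = tile_patches(pan, 256)   # 256x256 PAN blocks
ms_blocks = tile_patches(ms, 64)      # matching 64x64x4 MS blocks
```

Because the crops are non-overlapping and aligned, the k-th PAN block and the k-th MS block cover the same ground area.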
step 2: a forward noise adding process;
the noising process is shown in fig. 2; the total number of noising steps is set to T, and for any time step t ∈ {1, …, T}, the data distribution of the original multispectral image R_0 noised to time t is computed directly through formulas (1) and (2):

q(R_t | R_0) = N(R_t; sqrt(γ_t)·R_0, (1 − γ_t)·I)   (1)

R_t = sqrt(γ_t)·R_0 + sqrt(1 − γ_t)·ε_t, with γ_t = α_1·α_2·…·α_t   (2)

where α_i is a predefined fixed parameter with value range (0, 1); ε_t is noise information obeying a standard Gaussian distribution; R_0 is the multispectral image; R_t represents the multispectral image noised to time t; I represents the identity matrix; N represents a Gaussian distribution; q(R_t | R_0) represents the data distribution of the multispectral image R_0 noised to time t;
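Formula (2) lets R_t be sampled in one shot rather than by t sequential noising steps. A minimal NumPy sketch, assuming a linear α schedule (the schedule values and array names are illustrative):

```python
import numpy as np

def forward_noise(R0, t, alphas, rng):
    """Noise R0 directly to step t via formula (2):
    R_t = sqrt(gamma_t)*R0 + sqrt(1-gamma_t)*eps."""
    gamma_t = np.prod(alphas[:t])          # gamma_t = alpha_1 * ... * alpha_t
    eps = rng.standard_normal(R0.shape)    # standard Gaussian noise
    R_t = np.sqrt(gamma_t) * R0 + np.sqrt(1.0 - gamma_t) * eps
    return R_t, eps

# toy example: a 4-band 8x8 "multispectral" block
rng = np.random.default_rng(0)
alphas = 1.0 - np.linspace(1e-6, 1e-2, 1000)  # assumed linear schedule
R0 = rng.random((8, 8, 4))
R_t, eps = forward_noise(R0, 500, alphas, rng)
```

The returned `eps` is exactly the regression target the noise prediction network is later trained against.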
step 3: constructing a noise prediction network model;
the input information received by the noise prediction network comprises the time step t, the noised image R_t at step t, and the detail information P_D − M_r↑, where P_D denotes the panchromatic image replicated along the channel dimension to match the number of channels of the multispectral image, and M_r↑ denotes the multispectral image up-sampled by a factor of r so that its spatial resolution matches the panchromatic image; their difference gives the detail information used as the guiding condition of the network. The prediction target of the network is the noise added from R_0 to R_t, i.e. ε_θ(R_t, t, P_D − M_r↑) → ε_t;
The noise prediction network adopts a U-shaped network architecture and comprises four parts: a time step module, a downsampling module, an upsampling module and a fusion module. The time step module encodes the noising time step t and transmits it to the other modules of the network; the input information passes through the downsampling and upsampling modules in sequence to extract features, and the fusion module outputs the prediction result;
at the input end, the network concatenates the detail information P_D − M_r↑ and the noised image R_t along the channel dimension; the noise prediction network then processes the combined input uniformly with a convolution layer;
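The input assembly described above — detail condition and noised image joined along the channel dimension — can be sketched as follows (shapes and names are illustrative, not the patent's code):

```python
import numpy as np

# Assemble the conditional network input described in the text.
H = W = 64
rng = np.random.default_rng(1)
P_D   = rng.random((H, W, 4))   # PAN image replicated to 4 channels
M_rup = rng.random((H, W, 4))   # MS image up-sampled to PAN resolution
R_t   = rng.random((H, W, 4))   # noised image at step t

detail = P_D - M_rup                              # guiding condition
net_in = np.concatenate([detail, R_t], axis=-1)   # channel-dim concatenation
```

The first convolution layer of the network would then see an 8-channel input for a 4-band multispectral image.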
the construction process of each part is as follows:
step 3-1: constructing a time step module;
the noise prediction network is conditioned on the time step. The input time step t ∈ {1, …, T} is encoded into a one-dimensional vector, which is then passed through a linear layer into every up-sampling, down-sampling and fusion module of the network. Before the vector enters each module, a linear layer converts its length to the number of input channels of that module, so that it can be summed with the module's input;
step 3-2: constructing a downsampling module;
the downsampling module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function and a convolution layer. The downsampling is realized by the stride of the convolution layer: the feature map input to the module is spatially downsampled by a factor of 2, and the number of output channels is doubled relative to the number of input channels;
step 3-3: an up-sampling module is constructed;
the up-sampling module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function, an up-sampling layer, a skip connection and a stride-1 convolution layer. It spatially up-samples the input feature map by a factor of 2 and halves the number of output channels relative to the input channels. The convolution layer of the up-sampling module receives, via the skip connection, the output of the down-sampling module of the same size. The up-sampling layer uses bilinear interpolation to realize the 2× spatial up-sampling;
step 3-4: constructing a fusion module;
the fusion module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function, a skip connection and a stride-1 convolution layer. The convolution layer of the fusion module receives, via the skip connection, the output of the first convolution layer and the network input (P_D − M_r↑, R_t);
Step 4: training a noise prediction network;
the noise prediction network model ε_θ(R_t, t, P_D − M_r↑) is optimized with a loss function derived from the variational lower bound, taking the noise ε_t added from R_0 to R_t as the target. A time step t is drawn uniformly from {1, …, T}, the noise ε_t is sampled from a standard normal distribution, and a batch of reference images R_0 is selected from the training data and noised via formula (2) to obtain R_t. The detail information, i.e. the difference between the replicated panchromatic image P_D and the up-sampled multispectral image M_r↑, is computed, and the parameters of the noise prediction network are trained to convergence via:

min_θ ||ε_θ(R_t, t, P_D − M_r↑) − ε_t||²   (3)
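Objective (3) is a plain mean-squared error between the injected and predicted noise. A minimal NumPy sketch with a stand-in prediction (no real network involved; names are illustrative):

```python
import numpy as np

def diffusion_training_loss(eps_pred, eps_true):
    """Re-weighted variational-bound objective (3): mean squared error
    between the predicted noise and the injected noise."""
    return np.mean((eps_pred - eps_true) ** 2)

rng = np.random.default_rng(2)
eps_true = rng.standard_normal((8, 8, 4))                    # injected noise
eps_pred = eps_true + 0.1 * rng.standard_normal((8, 8, 4))   # stand-in network output
loss = diffusion_training_loss(eps_pred, eps_true)
```

In training, `eps_pred` would be the output of ε_θ for a sampled (R_t, t, condition) triple, and the gradient of this scalar drives the parameter update.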
step 5: reverse denoising process;
after the noise prediction network is trained, two samples R_T and z are drawn from a standard normal distribution, and a T-step iteration of the reverse Markov chain is performed using formula (4), terminating at t = 0; the Gaussian noise R_T is thereby denoised into the fusion result of the panchromatic image P and the multispectral image M:

R_{t−1} = (1/sqrt(α_t))·(R_t − ((1 − α_t)/sqrt(1 − γ_t))·ε_θ(R_t, t, P_D − M_r↑)) + σ_t·z   (4)

where σ_t is the standard deviation of the sampling noise at step t, and z is set to 0 at the final step.
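The reverse iteration of formula (4) can be sketched as ancestral sampling. Here ε_θ is replaced by a placeholder function, and σ_t = sqrt(1 − α_t) is one common DDPM choice, assumed here rather than taken from the patent:

```python
import numpy as np

def reverse_denoise(R_T, alphas, eps_theta, cond, rng):
    """T-step reverse iteration per formula (4). `eps_theta` stands in for the
    trained noise prediction network; sigma_t = sqrt(1 - alpha_t) is assumed."""
    gammas = np.cumprod(alphas)
    R = R_T
    T = len(alphas)
    for t in range(T, 0, -1):              # t = T, ..., 1
        a_t, g_t = alphas[t - 1], gammas[t - 1]
        eps = eps_theta(R, t, cond)        # predicted noise at step t
        mean = (R - (1.0 - a_t) / np.sqrt(1.0 - g_t) * eps) / np.sqrt(a_t)
        z = rng.standard_normal(R.shape) if t > 1 else 0.0  # no noise at last step
        R = mean + np.sqrt(1.0 - a_t) * z
    return R

rng = np.random.default_rng(3)
alphas = 1.0 - np.linspace(1e-6, 1e-2, 50)    # short schedule for illustration
cond = rng.random((8, 8, 4))                  # stand-in detail condition
dummy_eps = lambda R, t, c: np.zeros_like(R)  # placeholder for eps_theta
out = reverse_denoise(rng.standard_normal((8, 8, 4)), alphas, dummy_eps, cond, rng)
```

With a trained network in place of `dummy_eps`, the loop turns pure Gaussian noise into the fusion result.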
Preferably, the total step size of the noise adding is t=1000.
Preferably, the normalization process is dividing the pixel values of all input images by 2047.0.
Preferably, Wald-protocol processing refers to first filtering the original MS and PAN images with a 5×5 Gaussian smoothing kernel, and then downsampling them to 1/4 of the original spatial resolution.
Preferably, the up-sampled multispectral image M_r↑ is obtained by bicubic interpolation of the MS image block up to the spatial resolution of the PAN image block; the up-sampling factor r is 4.
Preferably, the PAN image block size of the training and verification sets is 256×256 and the MS image block size is 64×64×4; the PAN image block size of the test set is 1024×1024 and the MS image block size is 256×256×4; the data volume ratio of the training, verification and test sets is 8:1:1.
The beneficial effects of the invention are as follows:
the invention provides a full-color sharpening algorithm based on a conditional diffusion model. Guided by the difference between the panchromatic image and the multispectral image as condition information, it generates, through a T-step iteration, a remote sensing image with both high spatial resolution and high spectral resolution from any data sample obeying a standard Gaussian distribution. The algorithm builds on the strong image reconstruction capability of the diffusion model, and the resulting fusion has high spatial and spectral fidelity. To improve the feature extraction capability of the noise prediction network on multi-source input information, a U-shaped network structure is constructed, which markedly improves the accuracy of noise prediction.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the forward noising process and the reverse denoising process according to the present invention.
Fig. 3 is a block diagram of a noise prediction network constructed in accordance with the method of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
The invention provides a full-color sharpening method based on a conditional diffusion probability model. The model comprises a forward noising process and a reverse denoising process: training data is gradually corrupted into Gaussian noise through a Markov chain, and the high-resolution multispectral image is then obtained by gradually denoising and sampling with the output of a noise prediction network. The noise prediction network uses a U-shaped architecture and its parameters are optimized through a re-weighted variational lower bound loss function; it takes detail information, the noised image and the time step as input, and outputs the noise information in the noised image. The trained noise prediction network is applied over multiple denoising iterations of the reverse Markov chain to generate the final fusion result. The detail information is the difference between the panchromatic image and the up-sampled multispectral image. The method comprises the following steps:
step 1: preparing a data set;
the data come from QuickBird (QB), WorldView-4 (WV-4) and WorldView-2 (WV-2) satellite sensors. For QB data, the spatial resolution of PAN is 0.6 m and that of MS is 2.4 m; the MS image contains 4 spectral bands: blue, green, red and near-infrared. For WV-4 data, the spatial resolution of PAN is 0.3 m and that of MS is 1.2 m; the MS image contains 4 spectral bands: blue, green, red and near-infrared. For WV-2 data, the spatial resolution of PAN is 0.5 m and that of MS is 2 m; the MS image contains 8 spectral bands: coastal, blue, green, yellow, red, red edge, near-infrared 1 and near-infrared 2. The spatial resolution ratio between the MS and PAN images is 4 in all three data sets.
The image block sizes of the training and verification sets are 256×256 (PAN) / 64×64×4 (MS), and those of the test set are 1024×1024 (PAN) / 256×256×4 (MS); the ratio of training, verification and test data volumes is 8:1:1. Since no true high-resolution reference image exists, the MS and PAN image blocks in the training, verification and test sets are processed according to the Wald protocol; the processed images are used as network input, and the original MS image is used as the reference image. The processed PAN image block is denoted P, the processed MS image block M, and the original MS image block R.
Step 2: forward noise adding process
The noising process is shown in fig. 2. The total number of noising steps is set to T = 1000, and for any time step t ∈ {1, …, T}, the data distribution of the original multispectral image R_0 noised to time t is computed directly via formulas (1) and (2):

q(R_t | R_0) = N(R_t; sqrt(γ_t)·R_0, (1 − γ_t)·I), R_t = sqrt(γ_t)·R_0 + sqrt(1 − γ_t)·ε_t, with γ_t = α_1·α_2·…·α_t

where α_i is a predefined fixed parameter with value range (0, 1); I is the identity matrix; and ε_t is noise information obeying a standard Gaussian distribution. As the time step t grows, the multispectral image R_0 is progressively perturbed by Gaussian noise; if the total step count T is large enough, R_T follows a standard Gaussian distribution N(0, I).

The α_i obey a linear noise schedule: α_i = 1 − β_i, where the β_i are T values taken uniformly in [10⁻⁶, 10⁻²]; β_i increases with i, so α_i decreases with i;
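The linear schedule described here is a three-line computation; a NumPy sketch with T = 1000 as in the preferred setting (array names are illustrative):

```python
import numpy as np

# Linear noise schedule: T values of beta_i uniformly in [1e-6, 1e-2],
# alpha_i = 1 - beta_i, and gamma_t the running product of the alphas.
T = 1000
betas = np.linspace(1e-6, 1e-2, T)
alphas = 1.0 - betas
gammas = np.cumprod(alphas)   # gamma_t for t = 1 .. T
```

With this schedule γ_T is close to zero, so the fully noised R_T is dominated by the Gaussian noise term, as the text requires.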
step 3: constructing a noise prediction network;
the noise prediction network is shown in fig. 3 and comprises four parts: a time step module, a downsampling module, an upsampling module and a fusion module. The input information received by the network comprises the time step t, the noised image R_t at time t, and the detail information P_D − M_r↑, where P_D denotes the panchromatic image replicated along the channel dimension to match the number of channels of the multispectral image, and M_r↑ denotes the multispectral image up-sampled by a factor of r so that its spatial resolution matches the panchromatic image; their difference gives the detail information used as the guiding condition of the network. The prediction target of the network is the noise added from R_0 to R_t, i.e. ε_θ(R_t, t, P_D − M_r↑) → ε_t;
Step 3-1: constructing a time step module;
the noise prediction network is conditioned on the time step. The input integer time step t ∈ {1, …, T} is encoded into a one-dimensional vector, which is then passed through a linear layer into all up-sampling, down-sampling and fusion modules of the U-shaped network. Before the vector enters each module, a linear layer converts its length to the number of input channels of that module, and the vector is summed with the module's input.
Step 3-2: building downsampling module
The downsampling module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function and a convolution layer. The window size of the convolution layer is 3×3, with padding 1 and stride 2, so the feature map is downsampled by a factor of 2 and the number of output channels is doubled relative to the input channels;
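The 2× halving follows from the standard convolution output-size formula; a quick sanity check (not the patent's code):

```python
def conv_out_size(n, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution: floor((n + 2p - k)/s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# downsampling module: 3x3 kernel, padding 1, stride 2 -> halves the size
down = conv_out_size(64, kernel=3, stride=2, padding=1)
# fusion / up-sampling convolutions: stride 1 -> size preserved
same = conv_out_size(64, kernel=3, stride=1, padding=1)
```

The same arithmetic confirms that the stride-1, padding-1, 3×3 convolutions of the other modules leave the feature map size unchanged.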
step 3-3: building up sampling modules
The up-sampling module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function, an up-sampling layer, a skip connection and a convolution layer. The window size of the convolution layer is 3×3, with padding 1 and stride 1, so the feature map is up-sampled by a factor of 2 and the number of output channels is halved relative to the input channels. The convolution layer of the up-sampling module receives, via the skip connection, the output of the down-sampling module of the same size. The up-sampling layer uses bilinear interpolation to realize the 2× spatial up-sampling.
Step 3-4: construction of fusion modules
The fusion module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function, a skip connection and a convolution layer. The window size of the convolution layer is 3×3, with padding 1 and stride 1, so the feature map size is kept unchanged. The convolution layer of the fusion module receives, via the skip connection, the output of the first convolution layer and the network input (P_D − M_r↑, R_t);
Step 4: training noise prediction network
The noise prediction network epsilon θ (R t ,t,P D -M r↑ ) Optimizing by varying the lower bound derived loss function, R 0 To R t Added noise epsilon t Is the target; selecting a certain time T to {1, & gt, T } from the uniform distribution, and sampling from the standard normal distribution to obtain E } t Selecting a batch of reference images R from training data 0 Adding noise through a formula (2) to obtain R t Calculating detail information, and training parameters of the noise prediction network to be converged by the following formula:
min||ε θ (R t ,t,P D -M r↑ )-∈ t || 2 (3)
specifically, the model is optimized with the Adam optimizer at an initial learning rate of 0.0003, and training stops when the loss function converges.
Step 5: reverse denoising process
The reverse denoising process is shown in fig. 2. After the noise prediction network is trained, the algorithm draws two samples R_T and z from a standard normal distribution and performs a T-step iteration of the reverse Markov chain using formula (4), terminating at t = 0; the Gaussian noise R_T is thereby denoised into the fusion result of the panchromatic image P and the multispectral image M:

R_{t−1} = (1/sqrt(α_t))·(R_t − ((1 − α_t)/sqrt(1 − γ_t))·ε_θ(R_t, t, P_D − M_r↑)) + σ_t·z   (4)

where α_t and γ_t are set as in the training process and σ_t is the standard deviation of the sampling noise at step t. Because the noise prediction network ε_θ(R_t, t, P_D − M_r↑) uses a fully convolutional architecture, it can be applied directly to fuse remote sensing images at the original resolution; starting from Gaussian noise, the fusion result is obtained through T denoising iterations of formula (4).
Specific examples:
(1) Data set preparation:
given paired PAN and MS images with a spatial resolution ratio of 4:1, registered to each other, the following processing is performed:
(1) read the images and split each original image into two parts, used as training-data and test-data images respectively; the two parts have the same width and a height ratio of 9:1. This is done for both PAN and MS;
(2) from the training-data portion, crop matched image blocks at corresponding positions of the paired PAN and MS training images, from left to right and top to bottom; the PAN block size is 256×256 and the MS block size is 64×64×4 (4 is the number of channels; when the MS has 8 channels, this becomes 8 accordingly). The test-data portion is constructed similarly, with PAN blocks of 1024×1024 and MS blocks of 256×256×4.
(3) From the training-data portion, 1/9 is randomly set aside as verification set data.
So far, training set, verification set and test set data are obtained. For QB, the training set contains 6943 pairs of images, the validation set contains 743 pairs of images, and the test set contains 156 pairs of images; for WV-4, the training set contains 7166 pairs of images, the validation set contains 772 pairs of images, and the test set contains 271 pairs of images; for WV-2, the training set contains 9641 pairs of images, the validation set contains 945 pairs of images, and the test set contains 136 pairs of images.
(4) When processing according to the Wald protocol, the PAN and MS images are blurred with a 5×5 Gaussian kernel of standard deviation 2 and then downsampled by a factor of 4; the results form the new training images. The verification and test sets are processed in the same way.
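The Wald-protocol degradation (5×5 Gaussian kernel, standard deviation 2, then 4× decimation) can be sketched in NumPy; border handling here uses zero padding, which is an assumption — real pipelines may handle borders differently:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=2.0):
    """Normalized 2-D Gaussian kernel (5x5, std 2, per the Wald setup here)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def wald_degrade(img, factor=4):
    """Blur each band with the Gaussian kernel, then decimate by `factor`."""
    k = gaussian_kernel()
    pad = 2  # half the 5x5 kernel
    out_bands = []
    for b in range(img.shape[2]):
        p = np.pad(img[:, :, b], pad)           # zero padding (assumed)
        blurred = np.zeros_like(img[:, :, b])
        for i in range(img.shape[0]):
            for j in range(img.shape[1]):
                blurred[i, j] = np.sum(p[i:i + 5, j:j + 5] * k)
        out_bands.append(blurred[::factor, ::factor])  # 4x decimation
    return np.stack(out_bands, axis=2)

rng = np.random.default_rng(4)
ms = rng.random((64, 64, 4))
ms_low = wald_degrade(ms)   # 64x64x4 -> 16x16x4
```

Applying the same routine to the PAN and MS blocks yields the reduced-resolution inputs, leaving the original MS block free to serve as the reference.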
The dataset preparation step is completed.
(2) Noise prediction fusion network model construction
The network structure is shown in fig. 3, and the important parameters for constructing the network include:
(1) convolution layer used by the whole network: the window size of all convolution layers is 3×3; the step length of a convolution layer of the downsampling module is 2, and the filling is 1; the convolution layer step size of the other modules is 1 and the padding is 1.
The numbers of output channels of the convolution layers in fig. 3 are 32, 64, 128, 256, 128, 64, 32 and the number of target channels, respectively. Skip connections occur between the up-sampling modules and the fusion module: the input information and the outputs of the down-sampling modules are passed into the second half of the network, and features of the same resolution are concatenated along the channel dimension;
(2) the time encoding layer uses the sine-cosine encoding of the paper "Attention Is All You Need": the time step t is encoded into a one-dimensional vector of length 32, which a linear layer converts to a vector of length 256; before this vector is added to the input of each module, a linear layer adjusts its number of channels to match that module's input;
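The sine-cosine time encoding can be sketched as below; the frequency base 10000 follows the cited paper's convention and is an assumption here, as is the function name:

```python
import numpy as np

def time_embedding(t, dim=32):
    """Sine-cosine encoding of a scalar time step, in the style of
    'Attention Is All You Need' (dimension 32 as described in the text)."""
    half = dim // 2
    # geometrically spaced frequencies, base 10000 (assumed convention)
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

emb = time_embedding(500)   # length-32 vector for t = 500
```

A linear layer would then map this length-32 vector to length 256 before the per-module channel adjustment described above.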
(3) Network training
(1) Input images: the panchromatic map P, of size 64×64 (height×width), and the up-sampled multispectral map M_r↑, of size 64×64×4 (height×width×channels). Here the up-sampled multispectral M_r↑ is obtained from a 16×16×4 multispectral image by 4× bilinear interpolation up-sampling. The images input to the network are normalized by dividing the pixel values by 2047.0.
(2) Other relevant settings: parameters are updated with the Adam optimizer. The number of training epochs is set to 200, the batch size to 32, and the initial learning rate to 0.0003. The network is evaluated on the verification set every 50 epochs, and the best-performing network parameters are saved.
(3) Stopping training conditions: the loss function of the network reaches a converged state.
(4) Network testing
(1) Input images: a panchromatic image P of size 256×256 (height×width) and an up-sampled multispectral image M_{r↑} of size 256×256×4 (height×width×channels). Here M_{r↑} is obtained by up-sampling a 64×64×4 multispectral image 4× with bilinear interpolation.
(2) Load the best denoising-network parameters saved during the training phase, select a panchromatic image P and an up-sampled multispectral image M_{r↑} from the test set, take their difference as the detail information, sample an image from Gaussian noise, and perform T iterative denoising steps based on the output of the noise prediction network to obtain the fusion result.

Claims (6)

1. A full-color sharpening method based on a conditional diffusion model, which is characterized by comprising the following steps:
step 1: preparing a data set;
cropping image blocks from paired and registered large-scale remote sensing multispectral (MS) and panchromatic (PAN) images in left-to-right, top-to-bottom order, and dividing the image blocks into a training set, a validation set and a test set;
first normalizing the training, validation and test sets; then processing the image blocks of the three sets according to the Wald protocol, and taking the processed image blocks as the input of the model;
the original MS image block is used as a reference image;
step 2: a forward noise adding process;
the noising process is shown in fig. 2; with the total number of noising steps set to T, for any time step t ∈ {1, …, T} the data distribution of the original multispectral image R_0 noised to time t is computed directly through formulas (1) and (2):

q(R_t | R_0) = N(R_t; √(ᾱ_t)·R_0, (1 − ᾱ_t)·I)   (1)

R_t = √(ᾱ_t)·R_0 + √(1 − ᾱ_t)·ε_t   (2)

wherein ᾱ_t = ∏_{i=1}^{t} α_i, each α_i being predefined with value range (0, 1); ε_t is noise obeying the standard Gaussian distribution; R_0 is the multispectral image; R_t denotes the multispectral image noised to time t; I denotes the identity matrix; N(·) denotes a Gaussian distribution; and q(R_t | R_0) denotes the data distribution of the multispectral image R_0 noised to time t;
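A minimal NumPy sketch of the closed-form noising of formulas (1) and (2) (the linear β schedule below is an assumption for illustration; the claim only requires each α_i to lie in (0, 1)):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed schedule, not fixed by the claim
alphas = 1.0 - betas                 # each alpha_i lies in (0, 1)
alpha_bar = np.cumprod(alphas)       # cumulative product of alpha_i up to step t

def add_noise(R0, t, rng=np.random.default_rng(0)):
    """Jump directly from R_0 to R_t per formula (2)."""
    eps = rng.standard_normal(R0.shape)           # eps_t ~ N(0, I)
    Rt = (np.sqrt(alpha_bar[t - 1]) * R0
          + np.sqrt(1.0 - alpha_bar[t - 1]) * eps)
    return Rt, eps

R0 = np.zeros((64, 64, 4))           # a normalized multispectral patch
Rt, eps = add_noise(R0, t=T)
```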
step 3: constructing a noise prediction network model;
the input information received by the noise prediction network comprises the time step t, the image R_t noised to time step t, and the detail information P_D − M_{r↑}, wherein P_D denotes the panchromatic image replicated along the channel dimension to match the number of channels of the multispectral image, and M_{r↑} denotes the multispectral image up-sampled by a factor of r until its spatial resolution is consistent with that of the panchromatic image; their difference yields the detail information used as the guiding condition of the network; the prediction target of the network is the noise added from R_0 to R_t, i.e. ε_θ(R_t, t, P_D − M_{r↑}) → ε_t;
the noise prediction network adopts a U-shaped architecture comprising four parts: a time step module, down-sampling modules, up-sampling modules and a fusion module; the time step module encodes the noising time t and passes it to the other modules of the network; the input information passes through the down-sampling modules and then the up-sampling modules for feature extraction, and the fusion module outputs the prediction result;
at the input end, the network concatenates the detail information P_D − M_{r↑} and the noised image R_t along the channel dimension; the noise prediction network then processes the combined input uniformly with a convolution layer;
the construction process of each part is as follows:
step 3-1: constructing a time step module;
the noise prediction network is conditioned on the time step; the input time step t ∈ {1, …, T} is encoded into a one-dimensional vector, which is passed through Linear layers into all up-sampling, down-sampling and fusion modules of the noise prediction network; before the one-dimensional vector enters each module, a Linear layer converts its length to the number of input channels of the current module so that it can be summed with the input of that module;
step 3-2: constructing a downsampling module;
the down-sampling module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function and a convolution layer; the down-sampling is realized by the stride of the convolution layer: the feature map fed into the module is spatially down-sampled by a factor of 2, and the number of output channels is doubled relative to the number of input channels;
step 3-3: an up-sampling module is constructed;
the up-sampling module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function, an up-sampling layer, a skip connection and a convolution layer with stride 1; it spatially up-samples the feature map fed into the module by a factor of 2, and the number of output channels is halved relative to the number of input channels; through the skip connection, the convolution layer of the up-sampling module receives the output of the down-sampling module of the same size; the up-sampling layer realizes the 2× spatial up-sampling with a bilinear interpolation algorithm;
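For illustration, the 2× bilinear up-sampling used by this module can be sketched in NumPy as follows (a single-channel, align-corners variant; a deep learning framework's built-in interpolation would normally be used instead):

```python
import numpy as np

def bilinear_upsample2x(x):
    """2x spatial bilinear up-sampling of an (H, W) feature map
    (align-corners sampling grid)."""
    x = np.asarray(x, dtype=float)
    H, W = x.shape
    ys = np.linspace(0.0, H - 1.0, 2 * H)     # fractional source rows
    xs = np.linspace(0.0, W - 1.0, 2 * W)     # fractional source columns
    y0 = np.floor(ys).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[:, None]                   # vertical interpolation weights
    wx = (xs - x0)[None, :]                   # horizontal interpolation weights
    top = x[np.ix_(y0, x0)] * (1 - wx) + x[np.ix_(y0, x1)] * wx
    bot = x[np.ix_(y1, x0)] * (1 - wx) + x[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```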
step 3-4: constructing a fusion module;
the fusion module is a sequential cascade of a group normalization (GN) layer, a LeakyReLU activation function, a skip connection and a convolution layer with stride 1; through the skip connection, the convolution layer of the fusion module receives the output of the first convolution layer together with the network input (P_D − M_{r↑}, R_t);
Step 4: training a noise prediction network;
the noise prediction network model ε_θ(R_t, t, P_D − M_{r↑}) is optimized with the loss function derived from the variational lower bound, taking the noise ε_t added from R_0 to R_t as the target; a time step t is sampled from the uniform distribution over {1, …, T}, ε_t is sampled from the standard normal distribution, a batch of reference images R_0 is selected from the training data and noised through formula (2) to obtain R_t, the detail information is computed as the difference of the replicated panchromatic image P_D and the up-sampled multispectral image M_{r↑}, and the parameters of the noise prediction network are trained to convergence by:

min ‖ε_θ(R_t, t, P_D − M_{r↑}) − ε_t‖²   (3)
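One training step of formula (3) can be sketched as follows (NumPy; `eps_theta` is a hypothetical stand-in for the U-shaped noise prediction network, and the β schedule is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
alphas = 1.0 - np.linspace(1e-4, 0.02, T)   # assumed schedule
alpha_bar = np.cumprod(alphas)

def eps_theta(Rt, t, cond):
    """Hypothetical stand-in for the U-shaped noise prediction network."""
    return np.zeros_like(Rt)

# One optimization step of formula (3):
R0 = 0.1 * rng.standard_normal((64, 64, 4))    # reference MS patch
cond = 0.1 * rng.standard_normal((64, 64, 4))  # detail map P_D - M_{r_up}
t = int(rng.integers(1, T + 1))                # t ~ U{1, ..., T}
eps = rng.standard_normal(R0.shape)            # target noise eps_t
Rt = np.sqrt(alpha_bar[t - 1]) * R0 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
loss = np.mean((eps_theta(Rt, t, cond) - eps) ** 2)  # minimized w.r.t. network params
```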
step 5: reverse denoising process;
after the noise prediction network has been trained, based on the Markov chain process two samples R_T and z are drawn from the standard normal distribution, and T iterations of the following formula (4) are performed; the iteration terminates when t = 0, finally denoising the Gaussian noise R_T into the fusion result of the panchromatic image P and the multispectral image M:

R_{t−1} = (1/√(α_t))·(R_t − ((1 − α_t)/√(1 − ᾱ_t))·ε_θ(R_t, t, P_D − M_{r↑})) + σ_t·z   (4)

wherein σ_t² = ((1 − ᾱ_{t−1})/(1 − ᾱ_t))·(1 − α_t), z ~ N(0, I) when t > 1, and z = 0 when t = 1.
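A single reverse update of formula (4) can be sketched in NumPy as follows (the β schedule and the variance choice σ_t² = (1 − ᾱ_{t−1})/(1 − ᾱ_t)·(1 − α_t) are assumptions consistent with standard DDPM sampling; the conditioned network output is passed in as `eps_pred`):

```python
import numpy as np

T = 1000
alphas = 1.0 - np.linspace(1e-4, 0.02, T)   # assumed schedule
alpha_bar = np.cumprod(alphas)

def ddpm_step(Rt, t, eps_pred, rng=np.random.default_rng(0)):
    """One reverse update per formula (4): R_t -> R_{t-1}."""
    a_t = alphas[t - 1]
    ab_t = alpha_bar[t - 1]
    mean = (Rt - (1.0 - a_t) / np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(a_t)
    if t > 1:
        ab_prev = alpha_bar[t - 2]
        sigma = np.sqrt((1.0 - ab_prev) / (1.0 - ab_t) * (1.0 - a_t))
        return mean + sigma * rng.standard_normal(Rt.shape)
    return mean  # z = 0 at the final step (t = 1)
```

Iterating this step from t = T down to t = 1, with `eps_pred` produced by the trained network at each step, turns the initial Gaussian sample R_T into the fusion result.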
2. The full-color sharpening method based on a conditional diffusion model according to claim 1, wherein the total number of noising steps is T = 1000.
3. The full-color sharpening method based on a conditional diffusion model according to claim 1, wherein the normalization process divides the pixel values of all input images by 2047.0.
4. The full-color sharpening method based on a conditional diffusion model according to claim 1, wherein the Wald protocol processing means that the original MS and PAN images are first filtered with a Gaussian smoothing kernel of size 5×5 and then down-sampled to 1/4 of the original spatial resolution.
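The Wald-protocol degradation of claim 4 can be sketched in NumPy as follows (the Gaussian σ is an assumption; the claim only fixes the 5×5 kernel size and the 1/4 down-sampling):

```python
import numpy as np

def gaussian_kernel5(sigma=1.0):
    """Normalized 5x5 Gaussian kernel."""
    ax = np.arange(-2, 3)
    k = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def wald_degrade(img, factor=4, sigma=1.0):
    """5x5 Gaussian blur followed by subsampling to 1/factor resolution."""
    img = np.asarray(img, dtype=float)
    k = gaussian_kernel5(sigma)
    pad = np.pad(img, 2, mode="reflect")   # reflect-pad so the blur is defined at borders
    H, W = img.shape
    blurred = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            blurred[i, j] = np.sum(pad[i:i + 5, j:j + 5] * k)
    return blurred[::factor, ::factor]
```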
5. The full-color sharpening method based on a conditional diffusion model according to claim 1, wherein the up-sampled multispectral image M_{r↑} is obtained by bicubic interpolation of the MS image block up to the spatial resolution of the PAN image block, the up-sampling factor r being 4.
6. The method of claim 1, wherein the PAN image blocks of the training and validation sets are of size 256×256 and the MS image blocks of size 64×64×4; the PAN image blocks of the test set are of size 1024×1024 and the MS image blocks of size 256×256×4; and the data volume ratio of the training set, validation set and test set is 8:1:1.
CN202310740976.1A 2023-06-21 2023-06-21 Full-color sharpening method based on conditional diffusion model Pending CN117058009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310740976.1A CN117058009A (en) 2023-06-21 2023-06-21 Full-color sharpening method based on conditional diffusion model

Publications (1)

Publication Number Publication Date
CN117058009A true CN117058009A (en) 2023-11-14

Family

ID=88663450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310740976.1A Pending CN117058009A (en) 2023-06-21 2023-06-21 Full-color sharpening method based on conditional diffusion model

Country Status (1)

Country Link
CN (1) CN117058009A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117611484A (en) * 2024-01-19 2024-02-27 武汉大学 Image denoising method and system based on denoising self-decoding network
CN117611484B (en) * 2024-01-19 2024-04-02 武汉大学 Image denoising method and system based on denoising self-decoding network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination