CN116385264A - Super-resolution remote sensing data reconstruction method - Google Patents


Info

Publication number
CN116385264A
Authority
CN
China
Prior art keywords
resolution
super
remote sensing
module
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310330032.7A
Other languages
Chinese (zh)
Inventor
杜震洪
赵佳晖
李亚东
戚劲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority claimed from CN202310330032.7A
Publication of CN116385264A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features


Abstract

The invention discloses a super-resolution remote sensing data reconstruction method, belonging to the field of deep-learning super-resolution reconstruction. Starting from reconstruction theory, the invention derives the feasibility of generalizing to unknown blur-kernel degradation, and designs a blind super-resolution framework comprising an image correction network driven by blur-kernel information, a blur kernel estimation network based on low-resolution images, and a super-resolution network. The method promotes the generation of more accurate blur kernels and sharper high-resolution images, and alleviates the performance degradation that super-resolution models trained under a fixed degradation process suffer when applied to real remote sensing images. When facing random, unknown degradation phenomena such as deformation, blurring and noise in remote sensing images, the method improves the generalization capability of the super-resolution model and copes with the diverse degradation encountered in real scenes.

Description

Super-resolution remote sensing data reconstruction method
Technical Field
The invention belongs to the field of deep learning super-resolution, and particularly relates to a super-resolution remote sensing data reconstruction method.
Background
Although deep-learning-based single-image super-resolution methods for remote sensing have achieved good results, most methods generate LR remote sensing samples with an image degradation model using a fixed blur kernel, such as bicubic downsampling, because real LR/HR image pairs of the same scene are difficult to obtain and few in number. A super-resolution model obtained by supervised training under this setting can generally cope only with the bicubic downsampling degradation process. In the real world, however, many factors such as sensor parameters, satellite attitude and weather conditions cause deformation, blurring, noise and other degradation phenomena; the degradation process of a real remote sensing image is more complex and diverse than bicubic downsampling, and can differ from image to image. A super-resolution model learned from a sample library obtained by bicubic downsampling therefore usually suffers severe performance degradation when applied to real remote sensing images with unknown blur kernels. The super-resolution problem under an unknown degradation blur kernel is called the blind super-resolution (Blind Super Resolution) problem. Research on blind super-resolution methods for remote sensing images with unknown degradation blur kernels therefore has important application value.
In recent years, deep-learning-based blind super-resolution has become a research hotspot. Traditional blind super-resolution methods rely on hand-designed image features to estimate the blur kernel and have very limited generalization ability. Aiming at the deficiency of existing remote sensing image super-resolution methods in adaptively handling unknown blur-kernel degradation, the invention proposes a blind super-resolution reconstruction method for remote sensing images under unknown blur-kernel degradation (Multi-step Generalization for Blind Super Resolution, MGBSR).
Disclosure of Invention
The invention aims to provide a super-resolution remote sensing data reconstruction method that addresses the weak generalization ability and poor accuracy of super-resolution models when faced with random, unknown degradation phenomena such as deformation, blurring and noise in remote sensing images.
In order to achieve the purpose of the invention, the technical scheme provided by the invention is as follows:
a super-resolution remote sensing data reconstruction method is used for reconstructing a low-resolution remote sensing image into a high-resolution remote sensing image, and comprises the following steps:
S1, acquiring a low-resolution remote sensing image to be processed, and initializing an all-zero blur kernel;
S2, inputting the low-resolution remote sensing image and the current blur kernel into a pre-trained image correction network, which uses the input blur kernel to correct the low-resolution remote sensing image, obtaining a corrected low-resolution remote sensing image;
In the image correction network, the low-resolution remote sensing image first passes through a first convolution layer to extract first shallow features, and the blur kernel passes through a second convolution layer to extract second shallow features. The first and second shallow features then pass through several cascaded dual-path fusion modules that adopt an attention mechanism: the first and second shallow features serve as the first and second module inputs of the first dual-path fusion module, and the first and second module outputs of each dual-path fusion module serve as the first and second module inputs of the next one. Finally, the first module output of the last dual-path fusion module passes through a third convolution layer, yielding the corrected low-resolution remote sensing image;
The dual-path fusion module comprises several cascaded dual-path residual modules, 2 channel local attention modules and 2 spatial local attention modules. Each dual-path residual module consists of two parallel residual paths whose inputs are a first and a second input feature map. The first path passes the first input feature map through two convolution layers with a LeakyReLU activation function in between to obtain a first intermediate feature map, and the second path does the same with the second input feature map to obtain a second intermediate feature map. The first intermediate feature map is multiplied element-wise with the second intermediate feature map and then added to the first input feature map to give the first output feature map, while the second intermediate feature map is added to the second input feature map to give the second output feature map. Within a dual-path fusion module, the first dual-path residual module takes the module's first and second inputs as its first and second input feature maps, and every subsequent dual-path residual module takes the first and second output feature maps of the preceding one as its input feature maps. The first output feature map of the last dual-path residual module, after being weighted by one group of channel and spatial local attention modules, is added to the first module input to form the first module output of the current dual-path fusion module, and the second output feature map of the last dual-path residual module, after being weighted by the other group of channel and spatial local attention modules, is added to the second module input to form the second module output;
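The dual-path residual module described above can be sketched as follows. This is a minimal NumPy illustration, not the trained network: a 1x1 channel-mixing stand-in replaces the module's actual 3x3 convolutions, and the weights here are arbitrary.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def conv1x1(x, w):
    # x: (C, H, W), w: (C_out, C_in); a 1x1 convolution is per-pixel channel mixing.
    # (The patent's module uses 3x3 convolutions; 1x1 keeps this sketch short.)
    return np.einsum("oc,chw->ohw", w, x)

def dprb(F_in, K_in, wf1, wf2, wk1, wk2):
    # Dual-path residual module: each path is conv -> LeakyReLU -> conv;
    # the image path is gated by the kernel path via elementwise product.
    F_mid = conv1x1(leaky_relu(conv1x1(F_in, wf1)), wf2)
    K_mid = conv1x1(leaky_relu(conv1x1(K_in, wk1)), wk2)
    F_out = F_mid * K_mid + F_in   # fuse kernel info, residual on image path
    K_out = K_mid + K_in           # plain residual on the kernel path
    return F_out, K_out

C, H, W = 4, 8, 8
rng = np.random.default_rng(0)
I = np.eye(C)                      # identity weights, illustration only
F = rng.normal(size=(C, H, W))
K = rng.normal(size=(C, H, W))
F_out, K_out = dprb(F, K, I, I, I, I)
```

With identity weights the kernel path reduces to K_out = LeakyReLU(K) + K, which makes the residual structure easy to verify by hand.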
S3, inputting the corrected low-resolution remote sensing image into a pre-trained super-resolution network, and reconstructing the corrected low-resolution remote sensing image by the super-resolution network to obtain a super-resolution remote sensing image;
S4, inputting the low-resolution remote sensing image and the super-resolution remote sensing image into a pre-trained blur kernel estimation network, which re-estimates the blur kernel; the current blur kernel is then updated according to the estimation result;
In the blur kernel estimation network, the low-resolution remote sensing image passes through a fourth convolution layer to extract low-resolution image features, and the super-resolution remote sensing image passes through a fifth convolution layer to extract super-resolution image features. These serve respectively as the first and second module inputs of a dual-path fusion module, which fuses the two kinds of image features; the first module output of the dual-path fusion module then passes sequentially through a sixth convolution layer, an average pooling layer, a seventh convolution layer and a Softmax layer, finally yielding a more accurate estimated blur kernel that serves as the initial blur kernel of the next iteration;
S5, iteratively repeating steps S2-S4 so that the estimated blur kernel gradually approaches the true blur kernel, and outputting the super-resolution remote sensing image obtained in the last iteration as the final result.
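The S1-S5 loop can be sketched end to end. The three networks below are hypothetical stand-ins (the real Corrector, super-resolution network and Estimator are trained models); only the iteration structure, the all-zero kernel initialization, and the kernel hand-off between rounds follow the steps above.

```python
import numpy as np

# Hypothetical stand-ins for the three pre-trained networks (S2-S4).
def corrector(I_LR, k):          # S2: correct the LR image using the current kernel
    return I_LR - k.mean()

def sr_network(I_corr, s=2):     # S3: upsample the corrected LR image s times
    return np.kron(I_corr, np.ones((s, s)))

def estimator(I_LR, I_SR, d=5):  # S4: re-estimate a normalized d x d blur kernel
    k = np.full((d, d), I_SR.mean() * 1e-3)
    return k / max(k.sum(), 1e-8)

I_LR = np.random.default_rng(1).random((8, 8))
k = np.zeros((5, 5))             # S1: all-zero initial blur kernel
for _ in range(8):               # S5: iterate S2-S4 (8-10 rounds per the patent)
    I_corr = corrector(I_LR, k)
    I_SR = sr_network(I_corr)
    k = estimator(I_LR, I_SR)    # updated kernel feeds the next round
```

The point of the sketch is the data flow: each round's estimated kernel becomes the correction input of the next, and the final I_SR is the output.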
Preferably, the total number of iterations of steps S2 to S4 is 8 to 10.
Preferably, in the image correction network, the number of cascaded dual-path fusion modules is 3-4; in each dual-path fusion module, the number of cascaded dual-path residual modules is 5-6.
Preferably, the super-resolution network uses an RCAN network.
Preferably, in the image correction network, the convolution kernel size of the first convolution layer is 3×3, the convolution kernel size of the second convolution layer is 1×1, and the convolution kernel size of the third convolution layer is 3×3.
Preferably, in the blur kernel estimation network, the convolution kernel size of the fourth convolution layer is 5×5, that of the fifth convolution layer is (4s+1)×(4s+1), where s is the super-resolution factor, that of the sixth convolution layer is 3×3, and that of the seventh convolution layer is 1×1.
Preferably, in the spatial local attention module, after the feature map input to the spatial local attention module is respectively subjected to average pooling and maximum pooling, the two pooling results are spliced, and then the spliced results sequentially pass through a convolution layer and a Sigmoid activation function layer to generate a two-dimensional spatial local attention map, which is used for weighting the attention of the spatial position dimension of the feature map input to the spatial local attention module.
Preferably, the channel local attention module performs average pooling and maximum pooling on its input feature map; the resulting average-pooled and max-pooled features each pass through two convolution layers with shared parameters and a ReLU activation function in between, giving two channel feature maps, which are then added and passed through a Sigmoid activation function layer to obtain a one-dimensional channel local attention map used to weight the channel dimension of the input feature map.
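The spatial and channel local attention modules of the two preceding paragraphs can be sketched in NumPy. This is an illustrative simplification: the spatial branch mixes its two pooled maps with scalar weights instead of a learned convolution, and all weights are random placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_local_attention(x, w1, w2):
    # x: (C, H, W). Average- and max-pool over space, pass both results through
    # the same two-layer transform (ReLU in between, shared parameters),
    # add, then sigmoid -> per-channel weights.
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)
    a = sigmoid(mlp(avg) + mlp(mx))
    return x * a[:, None, None]

def spatial_local_attention(x, w):
    # Average- and max-pool over channels, combine the two (H, W) maps
    # (scalar mix stands in for the module's convolution), then sigmoid.
    avg = x.mean(axis=0)
    mx = x.max(axis=0)
    a = sigmoid(w[0] * avg + w[1] * mx)   # (H, W) spatial attention map
    return x * a[None, :, :]

C, H, W = 4, 6, 6
rng = np.random.default_rng(0)
x = rng.normal(size=(C, H, W))
y = channel_local_attention(x, rng.normal(size=(2, C)), rng.normal(size=(C, 2)))
z = spatial_local_attention(y, np.array([0.5, 0.5]))
```

Because both attention maps come from a sigmoid, the weights lie in (0, 1), so attention can only scale features down, never amplify them.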
Preferably, the image correction network, the super-resolution network and the blur kernel estimation network are jointly trained in advance through a multi-step training framework, where each training sample consists of a low-resolution remote sensing image, a high-resolution remote sensing image and a true blur kernel. In the multi-step training framework, the low-resolution remote sensing image of a training sample is taken as input, and the blur kernel and super-resolution remote sensing image obtained by each iteration are output according to S1-S5, with the high-resolution remote sensing image and the true blur kernel as the ground-truth labels of each iteration. The loss of each iteration round is a weighted sum of the image loss between the super-resolution remote sensing image and its label, the blur kernel loss between the estimated blur kernel and its label, a blur kernel regularization term and a contrastive regularization term. The image correction network, the super-resolution network and the blur kernel estimation network are jointly optimized by taking the sum of the losses over all iteration rounds as the total loss function and minimizing it.
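The per-iteration loss described above, a weighted sum of image loss, blur kernel loss, a kernel regularizer and a contrastive regularizer, might be sketched as follows. The choice of L1 norms, the weight values, and the exact form of the kernel regularizer (non-negativity and sum-to-one constraints) are assumptions; the patent does not state them explicitly.

```python
import numpy as np

def iteration_loss(I_SR, I_HR, k_est, k_true,
                   w_img=1.0, w_ker=1.0, w_reg=0.01, w_con=0.1, con_term=0.0):
    # Weighted sum per the patent: image loss + kernel loss
    # + kernel regularizer + contrastive regularizer.
    # L1 norms and these weights are illustrative assumptions.
    img_loss = np.abs(I_SR - I_HR).mean()
    ker_loss = np.abs(k_est - k_true).mean()
    # Assumed physical constraints on a blur kernel: non-negative, sums to one.
    ker_reg = np.abs(k_est.sum() - 1.0) + np.abs(np.minimum(k_est, 0)).sum()
    return w_img * img_loss + w_ker * ker_loss + w_reg * ker_reg + w_con * con_term

def total_loss(per_iter_outputs, I_HR, k_true):
    # Sum the per-round losses over all iterations (S2-S4 repeated n times).
    return sum(iteration_loss(I_SR, I_HR, k, k_true)
               for I_SR, k in per_iter_outputs)

k_true = np.full((3, 3), 1 / 9)
I_HR = np.ones((8, 8))
outs = [(np.ones((8, 8)) * 0.9, np.full((3, 3), 1 / 9)) for _ in range(3)]
L = total_loss(outs, I_HR, k_true)
```

In this toy run the kernel terms vanish (the estimate equals the normalized truth), so the total reduces to three rounds of pure image loss.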
Preferably, during joint training through the multi-step training framework, the parameters of the super-resolution network are frozen for the first several periods (epochs) while the parameters of the image correction network and the blur kernel estimation network are optimized; the three networks are then optimized jointly.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention designs a blind super-resolution multi-step generalization framework comprising an image correction network, a blur kernel estimation network and a pre-trained super-resolution network, reducing the difficulty of generalizing to unknown blur-kernel degradation. Results show that decomposing the generalization process into multiple steps can effectively improve blind super-resolution reconstruction performance.
(2) The invention designs an image correction network based on blur-kernel prior information and a blur kernel estimation network based on LR/SR images. The image correction network extracts deep features with dual-path fusion modules that adopt an attention mechanism, fully fusing blur-kernel prior information into features at different levels, thereby reducing the gap between the deblurring mapping of the unknown blur kernel and the mapping fitted by the super-resolution network. The blur kernel estimation network takes both the SR image and the LR image as input, breaking the limitation of using only LR image information and effectively improving the cooperation among the image correction network, the super-resolution network and the blur kernel estimation network. Results show that the attention-based dual-path fusion module can effectively improve blind super-resolution reconstruction performance.
(3) The invention designs a multi-step loss function based on a blur-kernel prior and a contrastive learning strategy. The blur kernel regularization term uses physical constraints to push the blur kernel estimation network toward more accurate, physically meaningful blur kernels, while the contrastive regularization term adopts a contrastive learning strategy to push the generated high-resolution remote sensing image closer to the clear image and farther from the blurred image in representation space. Results show that the blur kernel regularization term and the contrastive regularization term can effectively improve the model's blind super-resolution reconstruction performance.
Drawings
FIG. 1 is a flow chart of a blind super-resolution generalization method of a remote sensing image based on deep learning;
FIG. 2 is a schematic diagram of an image correction network;
FIG. 3 is a schematic diagram of a dual path fusion module;
FIG. 4 is a schematic diagram of a dual path residual module;
FIG. 5 is a schematic diagram of a spatial local attention module;
FIG. 6 is a schematic diagram of a channel local attention module;
FIG. 7 is a schematic diagram of a blur kernel estimation network;
FIG. 8 is a schematic diagram of a multi-step training framework.
Detailed Description
In order that the above objects, features and advantages of the invention may be readily understood, a more particular description of the invention is given with reference to the appended drawings. Numerous specific details are set forth in the following description to provide a thorough understanding of the invention. The invention may, however, be embodied in many forms other than those described here, and those skilled in the art may make similar modifications without departing from its spirit; the invention is therefore not limited to the specific embodiments disclosed below. The technical features of the embodiments can be combined as long as they do not conflict with one another.
Before introducing the specific technical scheme of the invention, a generalization theory is introduced for applying a super-resolution model trained with a fixed blur kernel to unknown blur kernels.
The image degradation model describes the mapping that degrades a high-resolution image to a low-resolution image; the degradation process can be decomposed into blurring, downsampling and random noise, expressed as:

I_LR = Down(Blur(I_HR)) + Noise

where I_LR and I_HR denote the low-resolution remote sensing image (LR image) and its original high-resolution counterpart (HR image) respectively, Blur denotes the blurring process, Down denotes the downsampling process, and Noise denotes random noise.
According to this degradation process, given a high-resolution remote sensing image I_HR, a Gaussian blur kernel is typically applied to I_HR by convolution, the result is downsampled s times by the bicubic (Bicubic) method, and finally additive noise is added to obtain the low-resolution remote sensing image I_LR. The image degradation model can thus be expressed as a function of several factors such as the blur kernel, the downsampling factor and the noise level, as shown in the following formula:

I_LR = (I_HR ⊙ k)↓_s + ε

where I_LR ∈ R^(h×w) is the low-resolution remote sensing image of size h×w, I_HR ∈ R^(H×W) is the high-resolution remote sensing image of size H×W with H = s×h and W = s×w, s is the downsampling factor, k ∈ R^(d×d) is the blur kernel of size d×d, "⊙" denotes the convolution operation, ↓_s denotes downsampling by a factor of s, and ε denotes random noise.
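A minimal NumPy sketch of this degradation model (Gaussian blur, stride-s subsampling, optional additive noise); the kernel size, sigma and edge-padding choice here are illustrative, not values from the patent.

```python
import numpy as np

def gaussian_kernel(d=7, sigma=1.5):
    # d x d isotropic Gaussian blur kernel k, normalized to sum to 1
    ax = np.arange(d) - (d - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def degrade(I_HR, k, s=2, noise_sigma=0.0, rng=None):
    # I_LR = (I_HR (.) k) downsample_s + eps: blur, stride-s subsample, add noise
    d = k.shape[0]
    pad = d // 2
    H, W = I_HR.shape
    padded = np.pad(I_HR, pad, mode="edge")
    blurred = np.zeros_like(I_HR, dtype=float)
    for i in range(H):
        for j in range(W):
            blurred[i, j] = np.sum(padded[i:i + d, j:j + d] * k)
    I_LR = blurred[::s, ::s]          # downsample by factor s
    if noise_sigma > 0:
        rng = rng or np.random.default_rng(0)
        I_LR = I_LR + rng.normal(0, noise_sigma, I_LR.shape)
    return I_LR

I_HR = np.ones((8, 8))
I_LR = degrade(I_HR, gaussian_kernel(5, 1.0), s=2)
```

Because the kernel sums to one and edge padding is used, a constant image passes through blurring unchanged, which gives a quick sanity check on the implementation.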
Depending on whether these parameter factors are known, the super-resolution problem can be classified into the non-blind super-resolution (Non-Blind Super Resolution) problem with known degradation factors and the blind super-resolution (Blind Super Resolution) problem with unknown degradation factors. Experience and theoretical analysis show that the accuracy of the blur kernel is crucial to the blind super-resolution performance on remote sensing images, with an influence even larger than that of complex image priors. Specifically, when the assumed blur kernel is smoother than the true blur kernel, the super-resolution image is over-smoothed; when the assumed blur kernel is sharper than the true blur kernel, the super-resolution image suffers from high-frequency ringing artifacts.
If the three factors in the image degradation model are combined and expressed as a single matrix D, the degradation can be written as:

I_LR = D × I_HR

where D ∈ R^((h·w)×(H·W)) represents the mapping from the high-resolution remote sensing image to the low-resolution remote sensing image.

In theory, the low-resolution image I_LR could be perfectly recovered through the inverse process D^(-1) of D, yielding a super-resolution result I_SR identical to the true high-resolution image I_HR, as shown in the following formula:

I_SR = D^(-1) × I_LR = D^(-1) × D × I_HR = I_HR
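A toy numeric check of the identity I_SR = D^(-1) × D × I_HR. Note the hedge in the comments: a real blur-plus-downsampling operator maps a large image space to a smaller one and is not invertible, so this demo uses a small square, well-conditioned matrix purely to illustrate the algebra.

```python
import numpy as np

rng = np.random.default_rng(42)
# Toy "degradation" matrix acting on a flattened 4-pixel image.
# Assumption: D is square and invertible, which real blur + downsample
# operators are not; this only illustrates I_SR = D^{-1} D I_HR = I_HR.
D = rng.normal(size=(4, 4)) + 4 * np.eye(4)   # diagonally dominant, well-conditioned
I_HR = np.array([0.2, 0.8, 0.5, 1.0])
I_LR = D @ I_HR                               # degrade
I_SR = np.linalg.inv(D) @ I_LR                # "perfect" recovery
```

The rest of the section explains why this perfect recovery breaks down once the assumed degradation (bicubic B) differs from the true one (D).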
In the prior art, however, most deep-learning-based remote sensing image super-resolution methods reduce the degradation process to bicubic (Bicubic) downsampling. Denoting the degradation process using bicubic downsampling as B, it can be seen that B is the special case of the true degradation process D in which only bicubic downsampling is considered, namely:

I_SR = B^(-1) × I_Bicubic_LR = B^(-1) × B × I_HR = I_HR

When a real LR image with an unknown blur kernel is given to the super-resolution model SR(I_LR) as input, the above formula becomes:

I_SR = B^(-1) × I_LR = B^(-1) × D × I_HR ≠ I_HR

Obviously, with high probability the real degradation process does not match the super-resolution model obtained by training, producing a super-resolution result that deviates considerably from the true HR image.
The invention provides a generalization theory for applying the super-resolution model to unknown blur kernels: a correction function C is used to correct the real LR image I_LR with its unknown blur kernel. When C = B × D^(-1), the corrected LR image can be processed with the trained super-resolution model to obtain a super-resolution result consistent with the HR image, as shown in the following formula:

I_SR = B^(-1) × C × I_LR = B^(-1) × B × D^(-1) × D × I_HR = I_HR

The invention designs an image correction network Corrector(I_LR) to fit the correction process C = B × D^(-1) and thereby correct the real LR image. To make the correction network more adaptive and accurate, the blur kernel information k is also taken as an input of the correction function, i.e. Corrector(I_LR, k).
For the unknown random blur-kernel degradation process, a blur kernel estimation network (Estimator) is designed: it estimates a more accurate blur kernel from the input original LR image and the super-resolution result, and the estimated blur kernel is used as the input blur kernel of the image correction network (Corrector) in the next round. The blur kernel estimation network Estimator(I_LR, I_SR) extracts image features from the LR and SR images and estimates the blur kernel k; the iterative estimation process can be expressed as:

k_i = Estimator(I_LR, I_SR^(i-1)), i = 1, 2, …, n

where k_i denotes the blur kernel estimated at iteration i and n is the maximum number of iterations.
In addition, the invention can train on the bicubic downsampled data set to obtain the super-resolution network SR (I) LR ) Introducing the unknown random fuzzy core degradation process to form a remote sensing image blind super-resolution generalization method consisting of a super-resolution network, an image correction network and a fuzzy core estimation network together to obtain I SR =SR(I LR ,)). The method can be further decomposed into multiple steps of iteration, so that the gradual optimization of the fuzzy kernel estimation, the image correction and the super-resolution process is realized, and the generalization difficulty of the blind super-resolution is reduced.
Based on the generalization theory, the invention further describes a specific process of a super-resolution remote sensing data reconstruction method for reconstructing a low-resolution remote sensing image into a high-resolution remote sensing image.
In a preferred embodiment of the present invention, as shown in fig. 1, a super-resolution remote sensing data reconstruction method is provided for reconstructing a low-resolution remote sensing image into a high-resolution remote sensing image, which comprises the following steps:
s1, acquiring a low-resolution remote sensing image to be processed, and initializing a fuzzy core with all zero values.
In the embodiment of the invention, the size d×d of the blur kernel is a hyperparameter that needs to be adjusted according to actual needs.
S2, inputting the low-resolution remote sensing image and the current blur kernel into a pre-trained image correction network (Corrector), which uses the input blur kernel to correct the low-resolution remote sensing image, obtaining a corrected low-resolution remote sensing image.
In the image correction network, the low-resolution remote sensing image first passes through a first convolution layer to extract first shallow features, and the blur kernel passes through a second convolution layer to extract second shallow features. The first and second shallow features then pass through several cascaded Attention-based Dual Path Fusion Blocks (ADPFB): the first and second shallow features serve as the first and second module inputs of the first dual-path fusion module, and the first and second module outputs of each dual-path fusion module serve as the first and second module inputs of the next one. Finally, the first module output of the last dual-path fusion module passes through a third convolution layer, yielding the corrected low-resolution remote sensing image.
The blur-kernel-based image correction network aims to preprocess and correct the real LR image using the blur kernel information, reducing the gap between the deblurring inverse mapping of the unknown real-world blur kernel and the mapping fitted by the super-resolution network. The structure of the image correction network is shown in FIG. 2. Taking the LR image I_LR and the blur kernel k as input, the network first extracts initial features of each through one convolution layer, then extracts deep image features through several Attention-based Dual Path Fusion Blocks (ADPFB), fully fusing blur-kernel prior information into features at different levels, and finally obtains the corrected LR image I_Corrected_LR through one convolution layer. In the image correction network, the convolution kernel sizes of the first, second and third convolution layers are 3×3, 1×1 and 3×3 respectively, so the processing in the image correction network is expressed as follows:
F_0 = Conv_3×3(I_LR), K_0 = Conv_1×1(k)

F_M, K_M = ADPFB_M(ADPFB_(M-1)(…(ADPFB_1(F_0, K_0))…))

I_Corrected_LR = Conv_3×3(F_M)

where Conv_3×3 and Conv_1×1 denote convolution layers with kernel sizes 3×3 and 1×1 respectively, ADPFB_i (i = 1, 2, …, M) denotes the i-th dual-path fusion module, and M is the number of ADPFB modules in the image correction network. In the embodiment of the invention, the number M of cascaded dual-path fusion modules in the image correction network may be 3-4, preferably 3.
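The three formulas above amount to a simple forward chain. The sketch below uses placeholder operations (a mean filter standing in for Conv_3×3, scaling for Conv_1×1, and a trivial gating ADPFB), solely to show the data flow from (F_0, K_0) through M cascaded ADPFBs to the corrected image; none of these placeholders are the trained layers.

```python
import numpy as np

def conv3x3(x):
    # 3x3 mean filter with edge padding, a placeholder for a learned Conv_3x3
    p = np.pad(x, 1, mode="edge")
    h, w = x.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def conv1x1(x):
    return 0.5 * x   # placeholder pointwise convolution

def adpfb(F, K):
    # Placeholder dual-path fusion: gate the image path with the kernel path
    return F * (1.0 + 0.1 * K.mean()), K

I_LR = np.random.default_rng(0).random((8, 8))
k = np.full((8, 8), 1.0 / 64)      # kernel broadcast to feature size for the sketch
F, K = conv3x3(I_LR), conv1x1(k)   # F_0, K_0
for _ in range(3):                 # M = 3 cascaded ADPFBs (the preferred value)
    F, K = adpfb(F, K)
I_corrected = conv3x3(F)           # I_Corrected_LR
```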
As shown in fig. 3, the dual-path fusion module (ADPFB) includes several cascaded dual-path residual modules (Dual Path Residual Block, DPRB), two channel local attention modules (Channel Local Attention Block, CLAB), and two spatial local attention modules (Spatial Local Attention Block, SLAB).
The ADPFB contains two paths. The two input feature maps F_in and K_in are passed through several dual-path residual modules (DPRBs) to extract deep features, fully fusing features of different levels, yielding F_N and K_N. Each of these is then refined in turn by a CLAB and an SLAB attention module and added to its corresponding original input feature map, giving the output feature maps F_out and K_out. The processing in the ADPFB is expressed as:
F_N, K_N = DPRB_N(DPRB_{N-1}(…(DPRB_1(F_in, K_in))…))

F_out = SLAB(CLAB(F_N)) + F_in

K_out = SLAB(CLAB(K_N)) + K_in

ADPFB(F_in, K_in) = [F_out, K_out]
wherein: n is the number of cascade dual-path residual modules in ADPFB. In the embodiment of the invention, the number of the cascaded DPRB modules in the dual-path fusion module can be 5-6, preferably 5.
Each dual-path residual module consists of two parallel residual paths whose inputs are a first and a second input feature map, respectively. In the first path, the first input feature map passes through two convolution layers with a LeakyReLU activation between them to give a first intermediate feature map; in the second path, the second input feature map likewise passes through two convolution layers with a LeakyReLU activation between them to give a second intermediate feature map. The first and second intermediate feature maps are multiplied element-wise and added to the first input feature map to give the first output feature map; the second intermediate feature map is added to the second input feature map to give the second output feature map. Within the dual-path fusion module, the first dual-path residual module takes the first and second module inputs of the fusion module as its first and second input feature maps, and every subsequent dual-path residual module takes the first and second output feature maps of the previous one as its inputs. The first output feature map of the final dual-path residual module is weighted by one group of channel and spatial local attention modules and added to the first module input to give the first module output of the fusion module; the second output feature map of the final dual-path residual module is weighted by another group of channel and spatial local attention modules and added to the second module input to give the second module output of the fusion module.
The Dual Path Residual Block (DPRB) expands the residual block (RB), which consists of two convolution layers and one activation layer, into a dual-path structure, shown in fig. 4. The two feature maps input to the DPRB, the first input feature map F_in and the second input feature map K_in, each pass through two convolution layers to give intermediate feature maps F and K. The output feature map of the first path, F_out, is the element-wise matrix product of F and K added to the original input feature map F_in; the output feature map of the second path, K_out, is K added to the original input feature map K_in. The process is expressed as:
F = H(σ_L(H(F_in))),  K = H(σ_L(H(K_in)))

F_out = F_in + F ⊙ K

K_out = K_in + K

DPRB(F_in, K_in) = [F_out, K_out]

where ⊙ denotes element-wise matrix multiplication, i.e. the Hadamard product, H denotes a convolution layer, and σ_L denotes the LeakyReLU activation function with negative slope 0.2.
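The DPRB equations above translate directly into a small PyTorch module; a minimal sketch follows, in which the 3×3 kernel size of the two convolutions inside each path is our assumption (the patent does not state it):

```python
import torch
import torch.nn as nn

class DPRB(nn.Module):
    """Dual Path Residual Block: each path is conv -> LeakyReLU(0.2) -> conv;
    the first path's residual branch is gated by the second path's features
    (Hadamard product), while the second path is a plain residual."""
    def __init__(self, channels=64):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.LeakyReLU(0.2),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
        self.f_branch = branch()
        self.k_branch = branch()

    def forward(self, f_in, k_in):
        f = self.f_branch(f_in)      # intermediate feature map F
        k = self.k_branch(k_in)      # intermediate feature map K
        f_out = f_in + f * k         # F_out = F_in + F (Hadamard) K
        k_out = k_in + k             # K_out = K_in + K
        return f_out, k_out
```

The multiplicative gate on the first path lets the blur-kernel path modulate which image features are corrected, while the plain residual keeps the kernel path stable across the cascade.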
The CLAB module learns the importance of each channel in the input feature map, while the SLAB module learns the importance of each spatial position. The one-dimensional channel local attention output by the CLAB module represents the attention weight of each channel of the remote sensing image features; within the ADPFB, multiplying this channel attention map by the CLAB input feature map selects among feature channels, and the result then enters the SLAB module. Similarly, the two-dimensional spatial local attention map output by the SLAB module represents the attention weight of each position in the feature map; multiplying it by the input feature map within the ADPFB selects features at different positions and gives the final output.
As shown in fig. 5, in the spatial local attention module SLAB, the input feature map is subjected to average pooling and maximum pooling, the two pooled results are concatenated, and the concatenation is passed in turn through a convolution layer and a Sigmoid activation layer to generate a two-dimensional spatial local attention map, which weights the input feature map along the spatial position dimensions.
As shown in fig. 6, the channel local attention module CLAB applies average pooling and maximum pooling to the input feature map; the resulting average-pooled and max-pooled features each pass through two parameter-shared convolution layers with a ReLU activation between them, giving two channel feature maps; these are added and passed through a Sigmoid activation layer to obtain the one-dimensional channel local attention, which weights the input feature map along the channel dimension.
The processing in the above CLAB and SLAB can be expressed as:

CLA = δ(FLC(Avg_s(F)) + FLC(Max_s(F)))

CLAB(F) = CLA ⊙ F

SLA = δ(H_7×7([Avg_c(F); Max_c(F)]))

SLAB(F) = SLA ⊙ F

where FLC denotes the shared convolution layers with kernel size 1×1 (with a ReLU activation ρ between them) in the CLAB module, H_7×7 denotes the convolution layer with kernel size 7×7 in the SLAB module, δ denotes the Sigmoid activation function, Avg_s and Max_s denote average and maximum pooling along the spatial dimensions, and Avg_c and Max_c denote average and maximum pooling along the channel dimension.
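A minimal PyTorch sketch of the two attention modules follows. The channel-reduction ratio inside CLAB is our assumption (the patent specifies only shared 1×1 convolutions with a ReLU between them); the 7×7 convolution in SLAB is as stated:

```python
import torch
import torch.nn as nn

class CLAB(nn.Module):
    """Channel Local Attention Block sketch: spatial avg/max pooling, shared
    1x1 convolutions with a ReLU in between, summed and Sigmoid-gated."""
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(                 # the shared FLC layers
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
        )

    def forward(self, x):
        avg = self.fc(x.mean(dim=(2, 3), keepdim=True))   # Avg_s then FLC
        mx = self.fc(x.amax(dim=(2, 3), keepdim=True))    # Max_s then FLC
        cla = torch.sigmoid(avg + mx)                     # (B, C, 1, 1)
        return cla * x

class SLAB(nn.Module):
    """Spatial Local Attention Block sketch: channel-wise avg/max maps,
    concatenated, passed through a 7x7 conv and Sigmoid-gated."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)                 # Avg_c
        mx = x.amax(dim=1, keepdim=True)                  # Max_c
        sla = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return sla * x
```

Applied in sequence, `SLAB()(CLAB(c)(x))` realizes the CLAB-then-SLAB refinement used at the tail of the ADPFB.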
S3, inputting the corrected low-resolution remote sensing image into a pre-trained super-resolution network, and reconstructing the corrected low-resolution remote sensing image by the super-resolution network to obtain the super-resolution remote sensing image.
In the embodiment of the present invention, the above super-resolution network may directly use an existing RCAN network.
S4, inputting the low-resolution remote sensing image and the super-resolution remote sensing image into a pre-trained blur kernel estimation network, which re-estimates the blur kernel, and updating the current blur kernel according to the estimate.
In the blur kernel estimation network, low-resolution image features are extracted from the low-resolution remote sensing image by a fourth convolution layer, and super-resolution image features are extracted from the super-resolution remote sensing image by a fifth convolution layer. These two feature maps serve as the first and second module inputs of a dual-path fusion module, which fuses the two kinds of image features. The first module output of the fusion module then passes in turn through a sixth convolution layer, an average pooling layer, a seventh convolution layer and a Softmax layer, finally producing a more accurate estimate of the blur kernel, which serves as the initial blur kernel for the next iteration.
The blur kernel estimation network (Estimator) based on the LR and SR images breaks the limitation of previous blind super-resolution methods, which use only the low-resolution image I_LR when estimating the blur kernel: the high-resolution super-resolution image I_SR is also taken as input to increase the available information, and this in turn improves the cooperation among the image correction network, the super-resolution network and the blur kernel estimation network. The structure of the blur kernel estimation network is shown in FIG. 7. Initial features of I_LR and I_SR are each extracted by a convolution layer; by choosing different kernel sizes and strides for the two convolution layers, the extracted initial features have the same spatial size. The two sets of features are then fused by an attention-based ADPFB, and the final estimated blur kernel k is obtained through 2 convolution layers, 1 average pooling layer and 1 Softmax layer. In the blur kernel estimation network, the kernel size of the fourth convolution layer is 5×5, that of the fifth convolution layer is (4s+1)×(4s+1), where s is the super-resolution factor, that of the sixth convolution layer is 3×3, and that of the seventh convolution layer is 1×1, so the process is expressed as:
F_0 = H_5×5(I_LR),  F_1 = H_(4s+1)×(4s+1)(I_SR)

F_2, F_3 = ADPFB(F_0, F_1)

k = Softmax(H_1×1(AvgPool(H_3×3(F_2))))

where H_5×5 and H_(4s+1)×(4s+1) denote convolution layers with kernel sizes 5×5 and (4s+1)×(4s+1) respectively, s is the super-resolution factor, AvgPool denotes the average pooling operation, and Softmax denotes the Softmax function.
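The estimator pipeline above can be sketched in PyTorch as follows. The fusion step is a plain concatenation-plus-convolution standing in for the ADPFB, and the padding values (which make the LR and strided-SR feature maps share one spatial size) are our assumptions:

```python
import torch
import torch.nn as nn

class EstimatorSketch(nn.Module):
    """Blur kernel estimator sketch. A 5x5 conv reads the LR image; a
    (4s+1)x(4s+1) conv with stride s reads the SR image so both feature maps
    share one spatial size. A fusion conv (ADPFB stand-in), a 3x3 conv,
    global average pooling, a 1x1 conv and a Softmax produce a normalized
    d x d blur kernel."""
    def __init__(self, in_ch=3, channels=64, s=4, d=21):
        super().__init__()
        self.d = d
        self.conv_lr = nn.Conv2d(in_ch, channels, 5, padding=2)          # H_5x5
        self.conv_sr = nn.Conv2d(in_ch, channels, 4 * s + 1,
                                 stride=s, padding=2 * s)                # H_(4s+1)
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)      # ADPFB stand-in
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)         # H_3x3
        self.pool = nn.AdaptiveAvgPool2d(1)                              # AvgPool
        self.head = nn.Conv2d(channels, d * d, 1)                        # H_1x1

    def forward(self, lr, sr):
        f0 = self.conv_lr(lr)
        f1 = self.conv_sr(sr)
        f = self.fuse(torch.cat([f0, f1], dim=1))
        f = self.pool(self.conv3(f))
        k = torch.softmax(self.head(f).flatten(1), dim=1)   # kernel sums to 1
        return k.view(-1, 1, self.d, self.d)
```

The final Softmax guarantees a non-negative kernel that sums to 1, which matches the sum-to-one prior imposed later by the kernel regularization term.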
S5, iteratively repeating steps S2-S4 so that the estimated blur kernel gradually approaches the real blur kernel, and outputting the super-resolution remote sensing image obtained in the last iteration as the final result.
Iteratively executing steps S2-S4 in step S5 constitutes, in essence, a multi-step generalization framework for blind super-resolution of remote sensing images (MGBSR). Considering that accurate blur kernel estimation is difficult for current blind super-resolution methods, and that a small error in the estimated kernel can strongly affect the final super-resolution result, the method decomposes the generalization of a pretrained model from a fixed degradation kernel to an unknown degradation kernel into multiple steps via the MGBSR framework. Each step comprises a blur kernel estimation stage, an LR image correction stage and an image super-resolution stage, gradually refining the blur kernel while effectively keeping the blur kernel estimation network and the image correction network matched. First, an all-zero blur kernel is initialized, and the image correction network (Corrector) corrects the input LR image based on this initial kernel; the corrected LR image is reconstructed by the pretrained super-resolution network to obtain a super-resolution image; then the blur kernel estimation network (Estimator) estimates a more accurate blur kernel from the original input LR image and the super-resolution result, and this estimate serves as the input blur kernel of the Corrector in the next step. Repeating these steps for n iterations yields the final output blur kernel and super-resolution image.
Finally, according to the loss function, the error between the super-resolution image of each step and the real high-resolution image, and the error between the estimated blur kernel of each step and the real blur kernel, are computed; a blur kernel constraint term and a high-resolution image constraint term are added; and the three networks are trained end-to-end with an optimization algorithm.
In the embodiment of the present invention, the total number of iterations n of steps S2-S4 may be 8-10, preferably 8.
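The S1-S5 inference loop described above can be sketched compactly, with the three pretrained networks treated as interchangeable callables (the trivial stand-ins in the usage lines are ours, for illustration only):

```python
import torch
import torch.nn.functional as F

def mgbsr_infer(lr, corrector, sr_net, estimator, n=8, d=21):
    """Multi-step blind SR inference (S1-S5): start from an all-zero blur
    kernel, then alternate correction (S2), super-resolution (S3) and
    kernel re-estimation (S4) for n rounds (S5)."""
    kernel = torch.zeros(lr.size(0), 1, d, d)   # S1: all-zero initial kernel
    sr = None
    for _ in range(n):
        corrected = corrector(lr, kernel)       # S2: kernel-guided correction
        sr = sr_net(corrected)                  # S3: reconstruct the SR image
        kernel = estimator(lr, sr)              # S4: refine the blur kernel
    return sr, kernel

# Usage with trivial stand-ins for the three pretrained networks:
corrector = lambda lr, k: lr
sr_net = lambda x: F.interpolate(x, scale_factor=4, mode="bicubic")
estimator = lambda lr, sr: torch.full((lr.size(0), 1, 21, 21), 1 / 441)
sr, k = mgbsr_infer(torch.randn(1, 3, 16, 16), corrector, sr_net, estimator)
```

Note that the kernel estimated at the end of round t becomes the Corrector's input kernel in round t+1, which is exactly the coupling the multi-step framework relies on.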
As shown in fig. 8, before the image correction network, super-resolution network and blur kernel estimation network are actually used for super-resolution reconstruction of low-resolution remote sensing images, they must be jointly trained in advance within a multi-step training framework, where each training sample consists of a low-resolution remote sensing image, a high-resolution remote sensing image and a real blur kernel. In the multi-step training framework, the low-resolution remote sensing image of a training sample is taken as input, the blur kernel and super-resolution remote sensing image of each iteration are produced according to S1-S5, and the high-resolution remote sensing image and real blur kernel serve as the ground-truth labels of every iteration. The loss of each iteration round is the weighted sum of the image loss between the super-resolution remote sensing image and its label, the blur kernel loss between the estimated kernel and its label, a blur kernel regularization term and a contrastive regularization term. Taking the sum of the losses of all iteration rounds as the total loss function, the image correction network, super-resolution network and blur kernel estimation network are jointly optimized by minimizing this total loss:
L_total = Σ_{t=1}^{n} ( L_image^(t) + L_kernel^(t) + σ·R_kernel^(t) + λ·L_CR^(t) )
where n is the total number of iterations (i.e., steps) of the multi-step generalization framework, and σ and λ are the weight coefficients of the blur kernel regularization term and the contrastive regularization term, respectively. In the embodiment of the present invention, these weights are preferably set to σ=0.001 and λ=0.01.
The image loss uses the L1 loss (i.e., mean absolute error) to measure the pixel error between two images; it shows better performance and convergence than the L2 loss. The formula is:
L_image = (1/(h·w·c)) · Σ_{i,j,b} | I_S(i,j,b) − I_HR(i,j,b) |
where h, w and c denote the height, width and number of bands of the high-resolution remote sensing image, respectively.
The blur kernel loss likewise uses the L1 loss to measure the error between the estimated blur kernel and the true blur kernel:
L_kernel = (1/d²) · Σ_{i,j} | k̂_{i,j} − k_{i,j} |
where k̂ and k denote the estimated blur kernel and the true blur kernel, respectively, and d denotes the blur kernel size.
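In NumPy terms, these two L1 losses are one-liners (a minimal sketch; the actual training code would operate on batched tensors):

```python
import numpy as np

def l1_image_loss(sr, hr):
    """L_image: mean absolute pixel error, averaged over the h x w x c
    height, width and band dimensions of the HR reference."""
    return np.abs(sr - hr).mean()

def l1_kernel_loss(k_est, k_true):
    """L_kernel: mean absolute error between the estimated and true
    d x d blur kernels (the 1/d^2 factor is the mean)."""
    return np.abs(k_est - k_true).mean()
```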
The invention introduces a contrastive regularization term in a latent feature space into the loss function, aiming to drive the generated super-resolution result closer to the sharp high-resolution image and further from the blurred high-resolution image. The sharp real high-resolution remote sensing image I_HR and the blurred high-resolution remote sensing image I_Blur_HR (obtained by applying a random blur kernel and adding noise) serve as the positive and negative samples, respectively: the positive pair consists of the real sharp high-resolution image I_HR and the super-resolution result I_S of the invention, and the negative pair consists of the super-resolution result I_S and the blurred high-resolution image I_Blur_HR. For the latent feature space, the invention uses the intermediate feature spaces of a VGG model pretrained on an image dataset. In addition, to enhance the contrastive capability, intermediate features extracted from different layers of the pretrained VGG network are given different weights: shallow features are weighted lightly in the contrastive loss and deep features heavily. The formula is:
L_CR = Σ_l w_l · ‖φ_l(I_S) − φ_l(I_HR)‖_1 / ‖φ_l(I_S) − φ_l(I_Blur_HR)‖_1

where φ_l denotes the feature space of the l-th layer of the VGG network and w_l is the weighting coefficient of the l-th layer's feature space.
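A minimal sketch of this contrastive term follows, with a stack of untrained convolutions standing in for the pretrained VGG stages; the L1 feature distance and the small epsilon in the denominator are our assumptions:

```python
import torch
import torch.nn as nn

def contrastive_reg(sr, hr, blur_hr, feature_stages, weights, eps=1e-8):
    """Pull the SR result toward the sharp HR image (positive pair) and away
    from the blurred HR image (negative pair) in latent feature spaces.
    `feature_stages` stands in for successive pretrained VGG stages;
    `weights` grows for deeper stages, as the patent prescribes."""
    loss = sr.new_zeros(())
    f_sr, f_hr, f_blur = sr, hr, blur_hr
    for stage, w in zip(feature_stages, weights):
        f_sr, f_hr, f_blur = stage(f_sr), stage(f_hr), stage(f_blur)
        pos = (f_sr - f_hr).abs().mean()     # distance to the positive sample
        neg = (f_sr - f_blur).abs().mean()   # distance to the negative sample
        loss = loss + w * pos / (neg + eps)
    return loss

# Toy "VGG stages": shallow stage weighted less than the deep one.
stages = [
    nn.Sequential(nn.Conv2d(3 if i == 0 else 8, 8, 3, padding=1), nn.ReLU())
    for i in range(2)
]
```

A perfect SR result (identical to I_HR) makes every numerator vanish, so the term rewards closeness to the sharp image relative to the blurred one.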
The blur kernel regularization term in the loss function applies 4 physically meaningful prior constraints — the kernel sums to 1, zero boundary, sparsity and centeredness — restricting the hypothesis space of the blur kernel estimation network to the subset of kernel solutions satisfying these physical constraints. A kernel satisfying the sum-to-1 and centeredness constraints essentially guarantees that the image is not shifted, while the sparsity constraint corrects error-prone tendencies during optimization and avoids producing overly scattered, over-smoothed kernels. The formula of the blur kernel regularization term is:
R_kernel = α·L_sum + β·L_boundary + γ·L_sparse + δ·L_center

L_sum = | 1 − Σ_{i,j} k_{i,j} |

L_boundary = Σ_{i,j} | k_{i,j} · m_{i,j} |

L_sparse = Σ_{i,j} | k_{i,j} |^{1/2}

L_center = ‖ ( Σ_{i,j} k_{i,j}·(i,j) ) / ( Σ_{i,j} k_{i,j} ) − (x_0, y_0) ‖_2

where α, β, γ and δ are the weight coefficients of the four constraint terms, each set to 1 by default. L_sum constrains the blur kernel to sum to 1. L_boundary suppresses non-zero values near the blur kernel boundary; m is a constant weight mask whose values grow exponentially with distance from the kernel center, and k_{i,j}, m_{i,j} denote the values of the blur kernel k and the weight mask m at position (i, j). L_sparse favors the generation of sparse kernels and prevents over-smoothing. L_center encourages the centroid of the kernel mass to lie at the kernel center, where (x_0, y_0) denotes the center index of the blur kernel.
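The four priors can be sketched in NumPy as below. The exact form of the boundary mask m (here normalized exponential distance) and the Euclidean norm in L_center are our assumptions; the patent only states that the mask grows exponentially with distance from the center:

```python
import numpy as np

def kernel_regularizers(k, alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
    """R_kernel sketch: sum-to-one, zero-boundary, sparsity and centered-mass
    priors on a d x d blur kernel k."""
    d = k.shape[0]
    idx = np.arange(d)
    yy, xx = np.meshgrid(idx, idx, indexing="ij")
    center = (d - 1) / 2

    # L_sum: the kernel mass should total 1
    l_sum = abs(1.0 - k.sum())

    # L_boundary: mask grows exponentially with distance from the center
    # (normalization to [0, 1] is our choice)
    dist = np.sqrt((yy - center) ** 2 + (xx - center) ** 2)
    m = np.exp(dist) / np.exp(dist).max()
    l_boundary = np.abs(k * m).sum()

    # L_sparse: square-root penalty favors sparse kernels
    l_sparse = (np.abs(k) ** 0.5).sum()

    # L_center: distance between the kernel centroid and the geometric center
    total = k.sum() + 1e-8
    cy = (k * yy).sum() / total
    cx = (k * xx).sum() / total
    l_center = np.hypot(cy - center, cx - center)

    return alpha * l_sum + beta * l_boundary + gamma * l_sparse + delta * l_center
```

For a centered delta kernel the sum, boundary and center terms are (near) zero and only the sparsity term contributes, which matches the intended behavior of the priors.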
In addition, because the super-resolution network is already pretrained at the start of optimization while the image correction network and blur kernel estimation network are trained from scratch, the parameters of the super-resolution network are frozen for the first several epochs (preferably 10) of training and only the parameters of the image correction network and blur kernel estimation network are optimized; in the following epochs all three networks are optimized jointly. This ensures the stability of the joint training of the 3 networks and makes training and optimization easier than for blind super-resolution methods based on generative adversarial networks.
The effect of the method is demonstrated below by applying it in a specific example. The procedure is as described above and is not repeated; the parameter settings and implementation results are presented below.
Examples
In this embodiment, taking the UC Merced multispectral remote sensing image dataset as an example, the random-blur-based blind super-resolution reconstruction method for remote sensing images is implemented in the following steps:
The invention uses the UC Merced dataset: each remote sensing scene class contains 100 image samples with a spatial resolution of 0.3 m and a size of 256×256 pixels. The UC Merced dataset was collected by the United States Geological Survey and covers cities across the United States, with 2100 image samples in total; the remote sensing images comprise 21 scene classes, such as farmland, forest, chaparral, river, beach, buildings, dense residential, medium residential, sparse residential, mobile home park, storage tanks, golf course, baseball diamond, tennis court, parking lot, airplane, harbor, overpass, freeway, intersection and runway. The invention constructs high-/low-resolution image sample pairs by generating random Gaussian blur kernels of size 11, 15 or 21 and trains the super-resolution model on this sample dataset, so as to better simulate real-world blur degradation.
Based on the UC Merced dataset, high-/low-resolution image sample pairs are constructed by generating random Gaussian blur kernels of size 11, 15 or 21; because different Gaussian blur kernels are used, the same high-resolution remote sensing image yields different low-resolution images. The standard deviations of the blur kernel range from 0.2 to 4.0, i.e., σ_1, σ_2 ∈ [0.2, 4.0]. This finally forms a random blur kernel dataset, of which three quarters are the training set and one quarter is the test set.
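The pair-construction step above can be sketched as follows: draw two standard deviations in [0.2, 4.0], build a (possibly anisotropic) Gaussian kernel of the chosen size, blur the HR image and subsample it. The random rotation angle and the edge-padding mode are our assumptions, and the noise injection mentioned elsewhere in the text is omitted:

```python
import numpy as np

def random_gaussian_kernel(d, sigma_range=(0.2, 4.0), rng=None):
    """Random anisotropic Gaussian blur kernel of size d (11, 15 or 21),
    with both standard deviations drawn from [0.2, 4.0]."""
    if rng is None:
        rng = np.random.default_rng()
    s1, s2 = rng.uniform(*sigma_range, size=2)
    theta = rng.uniform(0, np.pi)               # random rotation (assumption)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    cov_inv = R @ np.diag([1 / s1**2, 1 / s2**2]) @ R.T
    ax = np.arange(d) - (d - 1) / 2
    yy, xx = np.meshgrid(ax, ax, indexing="ij")
    pts = np.stack([yy, xx], axis=-1)
    k = np.exp(-0.5 * np.einsum("...i,ij,...j->...", pts, cov_inv, pts))
    return k / k.sum()                          # normalize to sum 1

def degrade(hr, kernel, scale=4):
    """Blur a single-band HR image with the kernel, then subsample by
    `scale` (noise injection omitted in this sketch)."""
    d = kernel.shape[0]
    pad = d // 2
    padded = np.pad(hr, pad, mode="edge")
    h, w = hr.shape
    out = np.zeros_like(hr, dtype=float)
    for i in range(d):
        for j in range(d):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out[::scale, ::scale]
```

Running `degrade` on the same HR image with different kernels produces the distinct LR counterparts described above.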
For comparison, in this embodiment the low-resolution image samples are first used as input to the Bicubic interpolation method and the conventional super-resolution model RCAN, with the high-resolution image samples as reference output, and 3 image quality metrics are used: mean squared error (MSE), peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). The results are shown in Table 1:
TABLE 1 SR results of a pre-trained model on UC Merced random fuzzy core dataset
[Table 1 is rendered as an image in the original publication; its values are not reproduced in this text.]
It should be noted that all 3 evaluation metrics of the 2 models drop markedly on the 3 random blur kernel datasets. Taking PSNR as an example, the PSNR of the Bicubic interpolation model drops by about 1 dB and that of the RCAN model by about 2.5 dB, performance decreases of 9.7% and 9.1% respectively; the other metrics decline by similar proportions. Therefore, directly applying the Bicubic interpolation method or the conventional super-resolution model RCAN cannot handle super-resolution reconstruction of low-resolution remote sensing images degraded by random blur kernels.
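For reference, two of the quality metrics used throughout this example are straightforward to compute (a minimal sketch for single-band images with an 8-bit dynamic range; SSIM, which involves local windowed statistics, is omitted):

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images."""
    return np.mean((a - b) ** 2)

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB (higher is better);
    infinite for identical images."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(max_val ** 2 / m)
```

A 1 dB PSNR change corresponds to roughly a 20% change in MSE, which is why the ~1-2.5 dB drops reported above indicate a substantial loss of reconstruction fidelity.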
Based on the MGBSR framework of S1-S5, the random blur kernel dataset is tested. Training uses the Adam optimizer with hyperparameters β_1 = 0.9 and β_2 = 0.999 and an initial learning rate of 0.0002, performing generalization training under the multi-step loss function based on the blur kernel priors and the contrastive learning strategy, where the weights of the blur kernel regularization term and the contrastive regularization term are set to σ = 0.001 and λ = 0.01 respectively, and the batch size is set to 8. After training, the 3 image quality metrics — mean squared error (MSE), peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) — are evaluated on the test set. The comparison of the MGBSR framework with the conventional Bicubic method is shown in Table 2:
TABLE 2 SR results of different blind super-resolution methods on UC Merced random fuzzy core dataset
[Table 2 is rendered as an image in the original publication; its values are not reproduced in this text.]
The table presents the super-resolution training results of the proposed model on datasets degraded with random-size Gaussian blur kernels, listing test results on the UC Merced remote sensing image dataset under random blur kernel degradation with kernel sizes 21, 15 and 11. The blind super-resolution method achieves good results on all 3 metrics; taking PSNR as an example, the improvement on the random-kernel datasets of every size exceeds 0.15 dB. The results show that the proposed blind super-resolution MGBSR framework significantly alleviates the performance degradation that a pretrained model, trained for a fixed degradation process, suffers when facing an unknown random blur kernel degradation process.
The embodiment described above is only a preferred embodiment of the present invention, but is not intended to limit the present invention. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, all the technical schemes obtained by adopting the equivalent substitution or equivalent transformation are within the protection scope of the invention.

Claims (10)

1. A super-resolution remote sensing data reconstruction method is used for reconstructing a low-resolution remote sensing image into a high-resolution remote sensing image and is characterized by comprising the following steps of:
s1, acquiring a low-resolution remote sensing image to be processed, and initializing an all-zero blur kernel;
s2, inputting the low-resolution remote sensing image and the current initial blur kernel into a pre-trained image correction network, the image correction network correcting the low-resolution remote sensing image using the input blur kernel to obtain a corrected low-resolution remote sensing image;
in the image correction network, the low-resolution remote sensing image is first passed through a first convolution layer to extract first shallow features, and the blur kernel is passed through a second convolution layer to extract second shallow features; the first and second shallow features are then fed through a plurality of cascaded dual-path fusion modules adopting an attention mechanism, wherein the first and second shallow features serve as the first and second module inputs of the first dual-path fusion module, and the first and second module outputs of each dual-path fusion module serve as the first and second module inputs of the next one; finally, the first module output of the last dual-path fusion module is passed through a third convolution layer to obtain the corrected low-resolution remote sensing image;
the dual-path fusion module comprises a plurality of cascaded dual-path residual modules, 2 channel local attention modules and 2 spatial local attention modules; each dual-path residual module consists of two parallel residual paths whose inputs are a first and a second input feature map, respectively; in the first path, the first input feature map passes through two convolution layers with a LeakyReLU activation between them to give a first intermediate feature map; in the second path, the second input feature map likewise passes through two convolution layers with a LeakyReLU activation between them to give a second intermediate feature map; the first and second intermediate feature maps are multiplied element-wise and added to the first input feature map to give the first output feature map, and the second intermediate feature map is added to the second input feature map to give the second output feature map; within the dual-path fusion module, the first dual-path residual module takes the first and second module inputs of the fusion module as its first and second input feature maps, and every subsequent dual-path residual module takes the first and second output feature maps of the previous one as its first and second input feature maps; the first output feature map of the final dual-path residual module is weighted by one group of channel and spatial local attention modules and added to the first module input to give the first module output of the fusion module, and the second output feature map of the final dual-path residual module is weighted by another group of channel and spatial local attention modules and added to the second module input to give the second module output of the fusion module;
S3, inputting the corrected low-resolution remote sensing image into a pre-trained super-resolution network, and reconstructing the corrected low-resolution remote sensing image by the super-resolution network to obtain a super-resolution remote sensing image;
s4, inputting the low-resolution remote sensing image and the super-resolution remote sensing image into a pre-trained blur kernel estimation network, the blur kernel estimation network re-estimating the blur kernel, and updating the current blur kernel according to the estimate;
in the blur kernel estimation network, low-resolution image features are extracted from the low-resolution remote sensing image by a fourth convolution layer, and super-resolution image features are extracted from the super-resolution remote sensing image by a fifth convolution layer; the two feature maps serve as the first and second module inputs of a dual-path fusion module, which fuses the two kinds of image features; the first module output of the fusion module then passes in turn through a sixth convolution layer, an average pooling layer, a seventh convolution layer and a Softmax layer, finally producing a more accurate estimate of the blur kernel, which serves as the initial blur kernel for the next iteration;
S5, iteratively repeating steps S2-S4 so that the estimated blur kernel gradually approaches the real blur kernel, and outputting the super-resolution remote sensing image obtained in the last iteration as the final result.
2. The method of claim 1, wherein the total number of iterations of steps S2-S4 is 8-10.
3. The method for reconstructing super-resolution remote sensing data according to claim 1, wherein in the image correction network, the number of cascaded dual-path fusion modules is preferably 3-4; and in the dual-path fusion module, the number of cascaded dual-path residual modules is preferably 5-6.
4. The method for reconstructing super-resolution remote sensing data according to claim 1, wherein the super-resolution network adopts an RCAN (Residual Channel Attention Network).
5. The method for reconstructing super-resolution remote sensing data according to claim 1, wherein in the image correction network, a convolution kernel of a first convolution layer has a size of 3×3, a convolution kernel of a second convolution layer has a size of 1×1, and a convolution kernel of a third convolution layer has a size of 3×3.
6. The method for reconstructing super-resolution remote sensing data according to claim 1, wherein in the blur kernel estimation network, the convolution kernel size of the fourth convolution layer is 5×5, that of the fifth convolution layer is (4s+1)×(4s+1), where s is the super-resolution scale factor, that of the sixth convolution layer is 3×3, and that of the seventh convolution layer is 1×1.
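Claims 1 and 6 together fix the shape of the blur kernel estimation head. The following PyTorch sketch shows one way those layers could fit together; the channel count, the 13×13 kernel size (for scale s=3), and the use of plain concatenation in place of the patent's dual-path fusion module are illustrative assumptions:

```python
import torch
import torch.nn as nn

class KernelEstimator(nn.Module):
    """Sketch of the blur kernel estimation head: LR and SR features are
    fused, then conv -> average pool -> conv -> Softmax yields a normalized
    blur kernel. Concatenation stands in for the dual-path fusion module."""
    def __init__(self, channels=32, kernel_size=13, scale=3):
        super().__init__()
        self.lr_feat = nn.Conv2d(3, channels, 5, padding=2)         # fourth conv layer, 5x5
        self.sr_feat = nn.Conv2d(3, channels, 4 * scale + 1,        # fifth conv layer, (4s+1)x(4s+1)
                                 stride=scale, padding=2 * scale)   # stride s maps SR grid to LR grid
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1) # placeholder for dual-path fusion
        self.head = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),            # sixth conv layer, 3x3
            nn.AdaptiveAvgPool2d(kernel_size),                      # average pooling to kernel size
            nn.Conv2d(channels, 1, 1),                              # seventh conv layer, 1x1
        )
        self.kernel_size = kernel_size

    def forward(self, lr, sr):
        f = torch.cat([self.lr_feat(lr), self.sr_feat(sr)], dim=1)
        logits = self.head(self.fuse(f)).flatten(1)
        k = torch.softmax(logits, dim=1)        # Softmax: kernel entries sum to 1
        return k.view(-1, self.kernel_size, self.kernel_size)
```

The strided fifth convolution brings the super-resolution feature map back to the low-resolution grid so the two paths can be fused element-for-element.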
7. The method for reconstructing super-resolution remote sensing data according to claim 1, wherein in the spatial local attention module, the feature map input to the module is subjected to average pooling and maximum pooling, the two pooled results are concatenated, and the concatenated result passes sequentially through a convolution layer and a Sigmoid activation function layer to generate a two-dimensional spatial local attention map, which weights the input feature map along its spatial dimensions.
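The structure described in claim 7 matches CBAM-style spatial attention, which can be sketched as follows (the 7×7 convolution size is an assumption, not stated in the claim):

```python
import torch
import torch.nn as nn

class SpatialLocalAttention(nn.Module):
    """Sketch of the claim-7 spatial local attention: channel-wise average
    and max pooling are concatenated, then a convolution and a Sigmoid
    produce a 2-D attention map that weights each spatial position."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)    # average pooling over channels
        mx = x.amax(dim=1, keepdim=True)     # max pooling over channels
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                      # weight by spatial position
```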
8. The method for reconstructing super-resolution remote sensing data according to claim 1, wherein the channel local attention module performs average pooling and maximum pooling on its input feature map; the resulting average-pooled feature and max-pooled feature each pass sequentially through two parameter-shared convolution layers with a ReLU activation function in between, yielding two channel feature maps; the two channel feature maps are then added and passed through a Sigmoid activation function layer to obtain a one-dimensional channel local attention map, which weights the input feature map along its channel dimension.
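Claim 8 likewise mirrors CBAM-style channel attention. A minimal sketch, with the reduction ratio of the shared two-layer MLP as an assumed hyperparameter:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelLocalAttention(nn.Module):
    """Sketch of the claim-8 channel local attention: average- and
    max-pooled descriptors share a two-layer conv MLP with a ReLU in
    between; the two outputs are summed and passed through a Sigmoid
    to give per-channel weights."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(            # parameters shared by both pooling paths
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))  # global average pooling path
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))   # global max pooling path
        w = torch.sigmoid(avg + mx)                  # one-dimensional channel attention
        return x * w
```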
9. The method for reconstructing super-resolution remote sensing data according to claim 1, wherein the image correction network, the super-resolution network and the blur kernel estimation network are jointly trained in advance through a multi-step training framework, each training sample consisting of a low-resolution remote sensing image, a high-resolution remote sensing image and a true blur kernel; in the multi-step training framework, the low-resolution remote sensing image of a training sample is taken as input, the blur kernel and super-resolution remote sensing image produced by each iteration are obtained according to S1-S5, and the high-resolution remote sensing image and the true blur kernel serve as the ground-truth labels of each iteration; the loss of each iteration round is the weighted sum of the image loss between the super-resolution remote sensing image and its label, the blur kernel loss between the estimated blur kernel and its label, a blur kernel regularization term and a contrastive regularization term; taking the sum of the losses over all iteration rounds as the total loss function, the image correction network, the super-resolution network and the blur kernel estimation network are jointly optimized to minimize the total loss function.
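The per-iteration loss of claim 9 can be sketched as below. The choice of L1 distances, the kernel regularizer (penalizing deviation from unit sum), and all weights are illustrative assumptions; the contrastive regularization terms are taken as precomputed scalars since the claim does not detail them:

```python
# Hypothetical shape of the total loss in the multi-step training framework:
# per-iteration losses are weighted and summed over all unrolled iterations.
import torch
import torch.nn.functional as F

def total_loss(sr_list, kernel_list, hr, true_kernel, contrastive_terms,
               w_img=1.0, w_ker=1.0, w_reg=0.05, w_ctr=0.1):
    loss = hr.new_zeros(())
    for sr, k, ctr in zip(sr_list, kernel_list, contrastive_terms):
        img_loss = F.l1_loss(sr, hr)                      # image loss vs. HR label
        ker_loss = F.l1_loss(k, true_kernel)              # kernel loss vs. true kernel
        ker_reg = (k.sum(dim=(-2, -1)) - 1).abs().mean()  # kernel entries should sum to 1
        loss = loss + w_img * img_loss + w_ker * ker_loss \
                    + w_reg * ker_reg + w_ctr * ctr       # contrastive regularization term
    return loss
```

Summing over all unrolled iterations supervises every round of the S2-S4 loop, not just the last one.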
10. The method of claim 9, wherein during joint training through the multi-step training framework, the parameters of the super-resolution network are fixed in the first epoch of training so that only the parameters of the image correction network and the blur kernel estimation network are optimized, after which all three networks are optimized jointly.
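The staged schedule of claim 10 amounts to toggling `requires_grad` on the super-resolution network; `sr_net` below is a placeholder for any torch-style module exposing `.parameters()`:

```python
# Sketch of the staged optimization in claim 10: freeze the pre-trained
# super-resolution network for the first epoch, then unfreeze it so all
# three networks are optimized jointly.
def set_trainable(net, trainable):
    for p in net.parameters():
        p.requires_grad = trainable

# Epoch 1:        set_trainable(sr_net, False)  -> only the image correction
#                 and blur kernel estimation networks receive updates.
# Epoch 2 onward: set_trainable(sr_net, True)   -> joint optimization.
```

Freezing the already pre-trained super-resolution branch first lets the other two networks adapt to it before all parameters move together.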
CN202310330032.7A 2023-03-30 2023-03-30 Super-resolution remote sensing data reconstruction method Pending CN116385264A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310330032.7A CN116385264A (en) 2023-03-30 2023-03-30 Super-resolution remote sensing data reconstruction method

Publications (1)

Publication Number Publication Date
CN116385264A true CN116385264A (en) 2023-07-04

Family

ID=86968811

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310330032.7A Pending CN116385264A (en) 2023-03-30 2023-03-30 Super-resolution remote sensing data reconstruction method

Country Status (1)

Country Link
CN (1) CN116385264A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117474764A (en) * 2023-12-27 2024-01-30 电子科技大学 High-resolution reconstruction method for remote sensing image under complex degradation model
CN117474764B (en) * 2023-12-27 2024-04-16 电子科技大学 High-resolution reconstruction method for remote sensing image under complex degradation model

Similar Documents

Publication Publication Date Title
CN108734661B (en) High-resolution image prediction method for constructing loss function based on image texture information
CN108447041B (en) Multi-source image fusion method based on reinforcement learning
CN111754403A (en) Image super-resolution reconstruction method based on residual learning
CN110675347B (en) Image blind restoration method based on group sparse representation
CN114119444B (en) Multi-source remote sensing image fusion method based on deep neural network
CN110223234A (en) Depth residual error network image super resolution ratio reconstruction method based on cascade shrinkage expansion
CN110349087B (en) RGB-D image high-quality grid generation method based on adaptive convolution
Chen et al. Remote sensing image quality evaluation based on deep support value learning networks
Valsesia et al. Permutation invariance and uncertainty in multitemporal image super-resolution
CN112381897A (en) Low-illumination image enhancement method based on self-coding network structure
CN111738954B (en) Single-frame turbulence degradation image distortion removal method based on double-layer cavity U-Net model
Wang et al. Enhanced image prior for unsupervised remoting sensing super-resolution
CN116385264A (en) Super-resolution remote sensing data reconstruction method
Cheng et al. Image super-resolution based on half quadratic splitting
CN112150354A (en) Single image super-resolution method combining contour enhancement and denoising statistical prior
CN113538374A (en) Infrared image blur correction method for high-speed moving object
Shen et al. Deeper super-resolution generative adversarial network with gradient penalty for sonar image enhancement
CN112200752B (en) Multi-frame image deblurring system and method based on ER network
CN113096032B (en) Non-uniform blurring removal method based on image region division
Sun et al. Image interpolation via collaging its non-local patches
Wen et al. The power of complementary regularizers: Image recovery via transform learning and low-rank modeling
CN116452431A (en) Weak light image enhancement method based on multi-branch progressive depth network
CN111179171A (en) Image super-resolution reconstruction method based on residual module and attention mechanism
CN111563844A (en) Adaptive fusion method for remote sensing image
Zhang et al. Superresolution approach of remote sensing images based on deep convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination