CN117474764A - High-resolution reconstruction method for remote sensing image under complex degradation model - Google Patents


Info

Publication number: CN117474764A (application CN202311819893.8A; granted as CN117474764B)
Authority: CN (China)
Prior art keywords: remote sensing, reconstruction, resolution, model, representing
Legal status: Active, granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Inventors: 蒲立新 (Pu Lixin), 李明欣 (Li Mingxin), 曲建明 (Qu Jianming)
Assignees: Chengdu Chengdian Jinpan Health Data Technology Co., Ltd.; University of Electronic Science and Technology of China (the listed assignees may be inaccurate)
Application filed by Chengdu Chengdian Jinpan Health Data Technology Co., Ltd. and University of Electronic Science and Technology of China
Other languages: Chinese (zh)

Classifications

    • G06T3/4053: Super resolution, i.e. output image resolution higher than sensor resolution (G06T: image data processing; G06T3/40: scaling the whole image or part thereof)
    • G06N3/0464: Convolutional networks [CNN, ConvNet] (G06N3: computing arrangements based on biological models)
    • G06N3/09: Supervised learning (G06N3/08: learning methods)
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting (G06V10/70: image or video recognition using pattern recognition or machine learning)
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06V10/86: Image or video recognition or understanding using syntactic or structural representations, e.g. graph matching
    • Y02T10/40: Engine management systems (Y02T: climate change mitigation technologies related to transportation)

Abstract

The invention relates to a high-resolution reconstruction method for remote sensing images under a complex degradation model, and belongs to the field of high-resolution reconstruction. The method introduces the MAE self-supervised model into the field of high-resolution reconstruction: in the first stage, the MAE learns prior information of the remote sensing image. In the second stage, a reconstruction network is proposed that uses the prior information learned by the MAE to complete the high-resolution reconstruction of the low-resolution remote sensing image. An edge attention module is also proposed within the reconstruction network; it extracts gradient information from the feature map and assigns larger learning weights to edge positions with larger gradients, so that the reconstruction network focuses on edge positions. The superiority of the reconstruction model in the high-resolution reconstruction task is demonstrated on a public dataset: the model obtains good objective index results under most degradation models, and the visually reconstructed images have clear edges and contain richer detail information, giving the method broad application prospects in the field of high-resolution reconstruction.

Description

High-resolution reconstruction method for remote sensing image under complex degradation model
Technical Field
The invention belongs to the field of high-resolution reconstruction, and particularly relates to a high-resolution reconstruction method for a remote sensing image under a complex degradation model.
Background
The remote sensing image contains abundant ground object targets, has large scale and wide range, has abundant detail information and perception information, and can effectively perform scene perception and environment analysis. Therefore, the remote sensing image is widely applied in a plurality of fields:
in the disaster monitoring field, related personnel can utilize remote sensing images to monitor the area in a large scale for a long time, so that the occurrence of disasters is effectively prevented, and property loss and casualties caused by the disasters are reduced. After the disaster occurs, the remote sensing image can also provide disaster area information in real time and continuously, so that rescue workers can know the disaster progress and grade, and the rescue workers can be guided to carry out disaster relief activities.
In the field of resource exploration, spatial information in a remote sensing image can help people to determine various geological structures so as to infer position information of coal resources, and in addition, because different substances have different properties such as absorption, reflection and the like of light, the spectrum information in the remote sensing image can be utilized to explore resources such as natural gas, petroleum and the like.
In the field of land monitoring, timely grasp of changes in land area, utilization type and other information is important for land management, rational use of land resources and protection of the farmland "red line". By comprehensively using the spatio-temporal information of remote sensing images, dynamic monitoring of land can be realized to meet the requirements of modern land management. A remote sensing image from the public AID Dataset is shown in fig. 1.
However, during remote sensing imaging, the acquired image may suffer from low spatial resolution and blurred edge details of ground features due to the limitations of the imaging equipment. At the same time, data transmission is limited by network bandwidth and transmission time, so image compression is often required and image information is lost. The finally obtained remote sensing image therefore lacks part of its high-frequency information, has low spatial resolution, and is affected by noise and blurring. Such images can hardly meet the demands of practical applications. Improving remote sensing image quality through hardware places strict requirements on equipment and is costly and difficult, so researchers aim to solve the problem at low cost through software. Single Remote Sensing Image Super Resolution (SRSISR) refers to the process of taking a Low-Resolution (LR) remote sensing image as input and reconstructing a High-Resolution (HR) remote sensing image with an image processing algorithm model. SRSISR technology can enhance the high-frequency information of the remote sensing image, improve spatial resolution, remove noise and blur, and improve image quality, so that the remote sensing image can be better applied in real scenarios.
Conventional SRSISR techniques fall into three categories: (1) interpolation-based algorithms; (2) modeling-based algorithms; (3) algorithms based on shallow learning. Interpolation-based algorithms estimate an unknown pixel value from the known pixel values surrounding the pixel to be inserted; they are simple to implement and fast, but interpolation cannot recover the high-frequency information missing from a degraded image, so the reconstructed image is smooth and blurry. Modeling-based algorithms reconstruct a high-resolution image by combining prior knowledge with a mathematical model, and can be divided into frequency-domain methods, spatial-domain methods, and combined frequency/spatial-domain methods. These algorithms use image priors to constrain the reconstruction process, which alleviates the blurring introduced by interpolation to some extent, but they are not suitable for high-resolution reconstruction at large magnification factors. Shallow-learning algorithms collect a large number of high/low-resolution image pairs to build a learning library and learn the mapping between the image pairs with a learning model; representative algorithms include neighborhood embedding, manifold learning, and sparse representation. These algorithms evolved from traditional machine learning methods and require hand-crafted features, so the reconstruction result depends on the quality of those designed features.
Deep learning has developed rapidly in many fields in recent years. In 2016, Dong Chao et al. applied deep learning to single-image high-resolution reconstruction and proposed the Super-Resolution Convolutional Neural Network (SRCNN) model. SRCNN comprises a three-layer network structure and achieves the high-resolution reconstruction task end to end. On objective indices, SRCNN obtained results superior to traditional algorithms, demonstrating the feasibility and superiority of deep learning for single-image super-resolution. Li Jiang, building on the Super-Resolution Generative Adversarial Network (SRGAN) model, introduced dense residual blocks and a receptive field module into the generator network for feature extraction. The dense residual block fuses residual blocks with dense blocks and can alleviate the vanishing-gradient problem while strengthening feature propagation. Lei et al. proposed the Local-Global Combined Network (LGCNet) model. LGCNet first extracts image features with L convolution layers; in a neural network, the receptive field of shallow convolutions focuses more on local information, while that of deep convolutions focuses more on global information. LGCNet then fuses the outputs of the shallow and deep convolutions through a multi-branch structure, combining the local and global information of the remote sensing image to better guide its high-resolution reconstruction. These models obtain an LR remote sensing image from an existing HR remote sensing image through a fixed downsampling scheme (bicubic downsampling in most cases), and then perform supervised learning under this fixed degradation model using paired HR-LR images.
Although such a degradation process is simple and easy to implement, the degradation of remote sensing images in real scenes is complex: besides reduced spatial resolution, the image is also affected by blurring and noise. The performance of the above models is therefore limited when they are applied in real, complex scenarios.
Disclosure of Invention
In order to solve the problem of image information loss in the transmission process of a remote sensing image, the invention provides a high-resolution reconstruction method for the remote sensing image under a complex degradation model, which comprises the following specific steps:
s1: acquiring a sample image dataset, wherein the dataset comprises M images of different scenes, randomly selecting the images, and dividing the dataset into a training set and a testing set;
s2: designing a degradation model, adding noise after blurring processing and downsampling of a high-resolution remote sensing image, and generating a final low-resolution remote sensing image;
s3: training a self-encoder model with a mask, and learning priori information of the low-resolution remote sensing image;
s4: building a new reconstruction network; the reconstruction network and the self-encoder model with the mask are trained simultaneously, and the prior information learned by the self-encoder model with the mask is utilized to reconstruct the low-resolution remote sensing image with high resolution; the reconstructed network structure is divided into three parts as a whole, and the three parts are as follows:
s41: the first part carries out shallow feature extraction; the shallow feature extraction module extracts multi-scale features using convolution layers with kernel sizes 3, 5 and 7 respectively, then connects the three extracted features in the channel dimension, reduces the number of channels with a 1×1 convolution layer, and fuses the multi-scale features; the multi-scale features F_0 preliminarily extracted from the remote sensing image by the shallow feature extraction module are expressed as formula (2):

F_0 = H_SFE(I_LR)   (2)

where H_SFE represents the mapping function of shallow feature extraction, I_LR represents the input low-resolution remote sensing image, and F_0 represents the extracted multi-scale features;
s42: the second part is used for deep feature extraction and consists of residual branches and feature branches;
the feature branch directly transmits the shallow features extracted from the first part to the rear of the deep feature extraction network;
the residual branch uses the structural design of a UNet model; a convolution layer in the middle of the branch divides it into a front part and a rear part; the front and rear parts each cascade r basic blocks whose structures correspond to each other; each basic block of the front part consists, in order, of a multi-scale receptive field attention module, a residual fusion block and a prior module, while each basic block of the rear part consists, in order, of a convolution layer, the multi-scale receptive field attention module, the residual fusion block and the prior module; the multi-scale receptive field attention module and the residual fusion block are responsible for the feature-learning sub-problem, and the prior module is responsible for the prior-learning sub-problem;
each basic block of the front part takes as input the prior information P and the output of the preceding basic block; the output of the r-th basic block of the front part is expressed as formula (3):

F_r = H_r(F_{r-1}, P)   (3)

where H_r represents the mapping function of the r-th basic block of the front part, F_r represents its output, and F_{r-1} represents the output of the preceding basic block; the rear part likewise cascades r basic blocks; each rear basic block takes as input the prior information, the output of the preceding block, and the output of the corresponding basic block of the front part, which are connected in the channel dimension and fused by a convolution layer; with n = 2r + 1 total modules in the network (the first r basic blocks + 1 convolution block of the middle part + the last r basic blocks), the output of the k-th basic block of the rear part is expressed as formula (4):

F_{r+1+k} = H_{r+1+k}(Concat(F_{r+k}, F_{r+1-k}), P)   (4)

where F_{r+1+k} represents the output of the k-th basic block of the rear part, H_{r+1+k} represents its mapping function, Concat represents the channel-dimension connection operation, and F_{r+1-k} represents the output of the corresponding basic block of the front part;
s43: the third part is an up-sampling module; the up-sampling adopts sub-pixel convolution, which avoids generating artifacts during reconstruction and reconstructs an image containing rich detail information.
The invention provides a high-resolution reconstruction model based on two-stage training, wherein the prior information of a remote sensing image is learned through an MAE self-supervision model in the first stage, the reconstruction network provided in the second stage completes the high-resolution reconstruction of the remote sensing image under the guidance of the prior information, and meanwhile, an edge attention module is designed, so that the edge of the reconstructed image is clearer. On the public data set, the designed reconstruction model has better results than other algorithm models against complex degradation models.
Drawings
Fig. 1 is a remote sensing image in the prior art.
Fig. 2 is a diagram of the overall framework of the present invention.
Fig. 3 is a diagram of a reconstructed network structure.
Fig. 4 is a block diagram of the SFE module.
Fig. 5 is a block diagram of the ISAB module.
Fig. 6 is a degradation model flow diagram.
Fig. 7 shows the visual reconstruction results when bicubic downsampling is the degradation model.
Fig. 8 shows the visual reconstruction results under a degradation model with blur and noise.
Description of the embodiments
The self-supervised model MAE (Masked Auto-Encoder) was proposed at the end of 2021 and is mainly used for image restoration tasks. The image features learned by the MAE can be used for downstream visual tasks such as detection and classification. The invention creatively introduces the MAE into the field of high-resolution reconstruction and provides a high-resolution reconstruction model based on two-stage training; the overall framework of the model is shown in figure 2. The following is a specific embodiment:
s1: the public data set AID Dataset is downloaded. AID Dataset was published by university of science and technology and university of Wuhan in 2017, and contains 10000 sample images collected from Google Earth, and scenes including 30 types of scenes, including airports, lands, baseball fields, beach, bridges, centers, churches, and the like, each of which has about 200-420 images in size of 600X 600 pixels. The training set randomly selects 100 images from 30 types of scenes of the AID Dataset, the total number of images is 3000, and the test set randomly selects 15 images from the rest images of the same scene, the total number of images is 450.
S2: and (3) designing a degradation model, and adding noise after blurring and downsampling the high-Resolution remote sensing image to generate a final Low-Resolution (LR) remote sensing image, as shown in FIG. 6.
The degradation model designed by this algorithm targets two application scenarios. In the first, the equipment photographing the remote sensing image is of high quality, and the captured high-resolution image is essentially unaffected by blurring; however, to transmit the image back to the ground, it must be downsampled and lossily compressed to reduce the transmitted file size, and noise is introduced during compression and transmission. In this case, the degradation model simplifies to downsampling the high-resolution remote sensing image and then adding noise to obtain the low-resolution image. In the second application scenario, the equipment photographing the remote sensing image is of poorer quality, and the captured high-resolution image is affected by blurring. For this scenario, three blur modes are introduced to improve the generalization of the model: isotropic Gaussian blur, anisotropic Gaussian blur, and motion blur. JPEG compression noise is also added to the image to simulate the image degradation process.
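As a rough illustration, the blur-downsample-noise pipeline of S2 can be sketched as below. This is a minimal sketch, not the patent's exact implementation: the kernel size, sigma, noise level and the strided downsampling are illustrative assumptions (the patent uses bicubic downsampling plus JPEG compression noise, and also anisotropic Gaussian and motion blur).

```python
import numpy as np

def gaussian_kernel(ksize=7, sigma=1.2):
    """Isotropic Gaussian blur kernel (one of the three blur types in S2)."""
    ax = np.arange(ksize) - ksize // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def degrade(hr, scale=4, sigma=1.2, noise_sigma=5.0, ksize=7, seed=0):
    """HR image (H, W) float array -> blur -> downsample -> additive noise."""
    k = gaussian_kernel(ksize, sigma)
    pad = ksize // 2
    padded = np.pad(hr, pad, mode="reflect")
    H, W = hr.shape
    blurred = np.zeros_like(hr, dtype=float)
    for i in range(ksize):                      # small explicit convolution loop
        for j in range(ksize):
            blurred += k[i, j] * padded[i:i + H, j:j + W]
    lr = blurred[::scale, ::scale]              # strided stand-in for bicubic
    rng = np.random.default_rng(seed)
    lr = lr + rng.normal(0.0, noise_sigma, lr.shape)  # stand-in for AWGN/JPEG noise
    return np.clip(lr, 0, 255)
```

A 64×64 HR patch degraded at scale 4 yields a noisy 16×16 LR patch, matching the order of operations described above.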
S3: and (3) performing one-stage MAE model training, inputting the LR image into the MAE model, and enabling the encoder to learn prior information of the remote sensing image, namely learning the characteristics of the input low-resolution image. Separately training MAE, learning priori information of remote sensing image, and calculating mask pixels and original mask images generated by MAE loss functionEuclidean distance between elementsRepresented by the following formula (1):(1)
wherein the method comprises the steps ofRepresenting the generation of mask pixels, ">Representing the original mask pixels, n is the total number of pixels of the image, the input image 224 x 224 of the method, where n=224 x 224. The Euclidean distance is used in mathematics to calculate the distance between two points, in computer vision tasks, the distance per pixel between 2 images is calculated, the difference between 2 images is measured, euclidean distance>The smaller the image the more alike.
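A minimal NumPy sketch of MAE-style patch masking and the loss of formula (1), computed only over the masked pixels. The 16×16 patch size and 75% mask ratio follow the original MAE paper and are assumptions here, not values taken from this patent.

```python
import numpy as np

def random_patch_mask(img_size=224, patch=16, ratio=0.75, seed=0):
    """Binary pixel mask hiding `ratio` of the patch grid, MAE-style."""
    rng = np.random.default_rng(seed)
    g = img_size // patch                       # patches per side
    flat = np.zeros(g * g)
    flat[rng.choice(g * g, int(g * g * ratio), replace=False)] = 1
    return np.kron(flat.reshape(g, g), np.ones((patch, patch)))

def mae_loss(pred, target, mask):
    """Mean squared pixel distance between generated and original pixels,
    evaluated where mask == 1 (the hidden patches), as in formula (1)."""
    diff = (pred - target) ** 2
    return float((diff * mask).sum() / mask.sum())
```

If the reconstruction equals the original, the loss is zero; larger values indicate a worse reconstruction of the masked regions.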
S4: building a new reconstruction network; the reconstruction network and the MAE are trained simultaneously, and the high-resolution reconstruction is carried out on the low-resolution remote sensing image by using the prior information learned by the MAE. In order to solve the problem that the edge of a reconstructed image is not clear, an edge attention module is innovatively provided in a reconstruction network, and the edge position is given more learning weight by extracting gradient information. The second-stage reconstruction network structure is shown in fig. 3, and the second-stage reconstruction network model is divided into three parts as a whole:
s41: the first part performs shallow feature extraction (Shallow Feature Extraction, SFE), which mainly focuses on the extraction of low-level visual features, such as texture, color, shape, etc. The SFE module extracts multi-scale features by using convolution layers with convolution kernel sizes of 3, 5 and 7 respectively, the sizes of different convolution kernels represent receptive fields with different sizes, the sizes of the receptive fields are different, and the extracted image features have different scales, so that the extraction of the features of targets with different scales in the remote sensing image is facilitated. A kind of electronic deviceAnd then connecting the three extracted features in the channel dimension, reducing the number of channels by using a convolution layer with the convolution kernel size of 1 multiplied by 1, and fusing the multi-scale features. The SFE module preliminarily extracts multi-scale features in the remote sensing image, and the multi-scale features are represented as the following formula (2):(2)
wherein the method comprises the steps ofMapping function representing shallow feature extraction, +.>Representing the input low resolution remote sensing image,representing the extracted features;
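The SFE module described above can be sketched in PyTorch roughly as follows; the channel width of 64 is an assumption, since the patent does not fix it.

```python
import torch
import torch.nn as nn

class SFE(nn.Module):
    """Shallow feature extraction: parallel 3/5/7 convolutions, channel
    concatenation, then a 1x1 convolution that fuses the scales and
    reduces the channel count, per formula (2)."""
    def __init__(self, in_ch=3, c=64):
        super().__init__()
        self.b3 = nn.Conv2d(in_ch, c, 3, padding=1)
        self.b5 = nn.Conv2d(in_ch, c, 5, padding=2)
        self.b7 = nn.Conv2d(in_ch, c, 7, padding=3)
        self.fuse = nn.Conv2d(3 * c, c, 1)      # 1x1 multi-scale fusion

    def forward(self, x):
        return self.fuse(torch.cat([self.b3(x), self.b5(x), self.b7(x)], dim=1))
```

Padding is chosen so all three branches keep the spatial size, which is required for the channel-dimension concatenation.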
s42: the second part is used for deep feature extraction and consists of residual branches and feature branches, and the multi-level feature extraction of priori information and shallow features is realized through a deep learning model, so that higher-level semantic features can be learned, and the method has stronger expression and generalization capability. The feature branch directly transmits the shallow layer features extracted from the first part to the rear of the network, more information can be provided to flow to the deep layer part of the network, so that the network can be converged to an optimal solution more quickly in the training process, and thus, the residual branch only needs to learn a residual (or difference) part between input and output, the network convergence is accelerated, and the gradient vanishing problem is relieved;
residual branches refer to the structural design of the UNet model, and a convolution layer in the middle of each branch divides the branch into a front part and a rear part. The front part and the rear part are respectively cascaded with r basic blocks, the structures of the r basic blocks are mutually corresponding, and each basic Block sequentially consists of a multi-scale receptive field attention module (Muti-Scale Receptive Field Attention Block, MRFAB), a residual fusion Block (Residual Fusing Block, RFB) and a priori module (Prior Block, PB). Wherein MRFAB and RFB are responsible for solving the feature learning sub-problem, PB is responsible for solving the priori learning sub-problem;
in order to better extract the spatial information and edge information of the image, we have designed an improved attention module (Improving Spatial Attention Block, ISAB) in MRFAB;
As shown in fig. 5, ISAB has two branches. The first branch passes through a spatial attention module (SAB): average pooling (AvgPool) and max pooling (MaxPool) are applied to the input feature map, the two pooled results are connected in the channel dimension, and a convolution layer followed by a Sigmoid activation function turns each element of the matrix into a probability value between 0 and 1; this spatial weight is multiplied with the feature map, where a larger product indicates that the information at that spatial position is more useful, so useful information is preserved and useless information is suppressed;
the second branch adopts an edge attention module, and convolves an input feature map with a Sobel operator extracted by an edge to extract gradient information of the feature map in the x direction and the y direction respectively, and then square sums the gradient information and then square opens the sum to obtain a gradient matrix of the feature map; the gradient matrix is changed into a weight matrix after passing through a convolution layer and a Sigmoid activation function to obtain weight parameters of edge positions, wherein the weight parameters of the spatial channels are as followsThe weighting parameter of the edge position is +.>. The ISAB performs weighted summation on the spatial attention and the edge attention and then outputs the weighted summation, so that the spatial position and the edge position on the feature map can be focused, the detail information can be reserved, and the experiment is carried out to set +.>
Each basic block of the front part takes as input the prior information P and the output of the preceding basic block; the output of the r-th basic block of the front part is expressed as formula (3):

F_r = H_r(F_{r-1}, P)   (3)

where H_r represents the mapping function of the r-th basic block of the front part, F_r represents its output, and F_{r-1} represents the output of the preceding basic block. The rear part likewise cascades multiple basic blocks, each of which structurally adds one convolution layer compared with the basic blocks of the front part. The inputs of the rear basic blocks come not only from the prior information and the output of the preceding block, but also from the output of the corresponding basic block of the front part; these are connected in the channel dimension (Concat) and fused with a convolution layer. In this way, the features learned by the shallow network are fully utilized and feature loss is reduced. With n = 2r + 1 total modules in the network (the first r basic blocks + 1 convolution block of the middle part + the last r basic blocks), the output of the k-th basic block of the rear part is expressed as formula (4):

F_{r+1+k} = H_{r+1+k}(Concat(F_{r+k}, F_{r+1-k}), P)   (4)

where F_{r+1+k} represents the output of the k-th basic block of the rear part, H_{r+1+k} represents its mapping function, Concat represents the channel-dimension connection operation, and F_{r+1-k} represents the output of the corresponding basic block of the front part.
S43: the third part is an up-sampling module, so that the spatial resolution is improved. The up-sampling mode adopts sub-pixel convolution, so that the generation of artifacts in the reconstruction process is avoided, and an image containing abundant detail information is reconstructed.
The loss function on the up-sampling module's output adopts the L1 loss, which calculates the mean absolute difference between the reconstructed image and the real high-resolution image and makes the training process of the model more stable; it is expressed as formula (5):

L_1 = (1/n) * sum_{i=1}^{n} |I_SR(i) - I_HR(i)|   (5)

where I_SR represents the reconstructed image and I_HR represents the real high-resolution image. The total loss function combines the losses of the two stages, expressed as formula (6):

L_total = L_1 + L_MAE   (6).
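Formula (5) amounts to a mean absolute pixel difference; a one-line NumPy sketch:

```python
import numpy as np

def l1_loss(sr, hr):
    """Formula (5): mean absolute pixel difference between the
    reconstructed image and the real high-resolution image."""
    return float(np.mean(np.abs(sr - hr)))
```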
The invention adopts Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (Structure Similarity Index Measure, SSIM) and Learned Perceptual Image Patch Similarity (LPIPS) as evaluation indices to evaluate the performance of the model. PSNR evaluates the quality of a reconstructed digital signal: the larger the PSNR, the better the quality of the reconstructed image. SSIM measures the similarity of two digital images: the larger the SSIM, the higher the similarity between the reconstructed image and the real image. LPIPS measures the perceived similarity between two images: the smaller the LPIPS, the greater the similarity between the reconstructed image and the real image. Together, these three indices objectively and comprehensively evaluate the reconstruction performance of the model from the perspectives of pixel distance, structural similarity and perceptual similarity.
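For reference, PSNR as used in the evaluation can be computed as below; the peak value of 255 assumes 8-bit images, which the patent does not state explicitly.

```python
import numpy as np

def psnr(sr, hr, peak=255.0):
    """Peak signal-to-noise ratio in dB; larger means a better reconstruction."""
    mse = np.mean((sr - hr) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))
```

For example, a reconstruction that is uniformly off by one gray level (MSE = 1) scores about 48.13 dB at peak 255.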
Four algorithm models — Bicubic, EDSR, RCAN, and DASR — are selected, trained on the same data set as the model provided by the invention, and their test results are compared. Bicubic is a commonly used interpolation algorithm that can be used to improve the spatial resolution of an image. EDSR is a high-resolution reconstruction model proposed at the CVPR 2017 conference; it has an overall residual structure in which the main branch cascades a number of residual blocks and the upsampling adopts sub-pixel convolution, and its performance exceeded the most advanced algorithms of the time. RCAN is a high-resolution reconstruction model proposed in 2018 with an architecture similar to EDSR; it introduces a residual attention block in which a channel attention module is connected in series on the residual branch of each residual block, emphasizing important features and thereby improving the performance of the model. DASR is a high-resolution reconstruction model proposed at the CVPR 2021 conference to address complex degradation processes.
In addition, nine degradation models with specific parameters are selected for testing. Bicubic denotes the degradation model that applies only bicubic downsampling; the remaining degradation models add further processing on top of bicubic downsampling. ISO denotes isotropic Gaussian blur, ANI denotes anisotropic Gaussian blur, and Motion denotes motion blur; σ denotes the blur-kernel variance (the subscript indicates its size) and δ denotes the white Gaussian noise variance (the subscript indicates its size). The reconstruction results of the five algorithm models are compared under each of the different degradation models. All nine degradation models apply JPEG compression with a default compression quality of 95. Tables 1 and 2 compare the reconstruction results with those of the other algorithms at amplification factors 2 and 4 in terms of the evaluation indexes PSNR/SSIM/LPIPS:
Table 1 Reconstruction results at amplification factor ×2
Table 2 Reconstruction results at amplification factor ×4
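The degradation pipeline used to build such test sets (blur, then downsampling, then additive white Gaussian noise) can be sketched as follows. This is an illustrative single-channel NumPy stand-in: the isotropic Gaussian case only, with an assumed 7×7 kernel, reflect padding, naive strided downsampling in place of bicubic, and the JPEG-compression step omitted:

```python
import numpy as np

def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Normalized isotropic Gaussian blur kernel (the ISO case)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def degrade(hr: np.ndarray, scale: int, sigma: float, noise_std: float,
            rng: np.random.Generator = None) -> np.ndarray:
    """Blur -> downsample -> additive white Gaussian noise (JPEG omitted)."""
    if rng is None:
        rng = np.random.default_rng(0)
    k = gaussian_kernel(7, sigma)                    # assumed 7x7 kernel
    pad = k.shape[0] // 2
    p = np.pad(hr, pad, mode="reflect")
    blurred = np.zeros_like(hr, dtype=float)
    for i in range(k.shape[0]):                      # direct 2-D convolution
        for j in range(k.shape[1]):
            blurred += k[i, j] * p[i:i + hr.shape[0], j:j + hr.shape[1]]
    lr = blurred[::scale, ::scale]                   # naive downsampling
    lr = lr + rng.normal(0.0, noise_std, lr.shape)   # white Gaussian noise
    return np.clip(lr, 0.0, 255.0)
```

Varying `sigma` and `noise_std` reproduces the spectrum from weak to strong degradation that the tables sweep over.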
Judging from the objective indexes, the deep-learning-based reconstruction models obtain results superior to the Bicubic interpolation algorithm under all degradation models, demonstrating the superiority of deep-learning-based reconstruction.
When the amplification factor is 2, the EDSR and RCAN models trained with bicubic downsampling as the only degradation obtain results superior to the other models on the three degradation models with relatively weak degradation: Bicubic only, isotropic Gaussian blur with a small kernel variance, and anisotropic Gaussian blur. This shows that EDSR and RCAN handle high-resolution reconstruction well when the amplification factor is small and the degradation is weak. However, when the degradation is stronger and the degradation model is more complex — including motion blur, a larger blur-kernel variance, and added noise — the performance of the EDSR and RCAN models drops significantly: for example, when the isotropic-blur kernel variance increases, the PSNR of EDSR falls by 3.09 dB and that of RCAN by 3.17 dB, while DASR falls by 2.65 dB and the model provided by the invention falls by only 1.70 dB. Their reconstruction results are also worse than those of the DASR model and the model proposed by the invention. This demonstrates that the high-resolution reconstruction model based on two-stage training copes better with degradation models of high degradation degree and complexity: the proposed model not only obtains competitive results on weakly degraded models but also obtains results superior to the other algorithms on strongly degraded, complex models. When the amplification factor is 4, more detail information must be reconstructed and the reconstruction task is more difficult.
In conclusion, the proposed high-resolution reconstruction model obtains competitive results when the amplification factor is small and the degradation is relatively weak, and obtains the best results when the amplification factor is large and the degradation is relatively strong. This shows that the model can reconstruct rich detail information and is robust to complex degradation models.
From the viewpoint of visual perception, Fig. 7 shows the reconstruction results of each model when the amplification factor is 4 and bicubic downsampling is the degradation model. The image reconstructed by the Bicubic algorithm is smooth and blurry, with little reconstructed detail. The images reconstructed by the EDSR and RCAN models contain more noise, while the DASR model and the reconstruction model provided by the invention give better visual effects. Compared with the DASR model, however, the image edges reconstructed by the proposed model are clearer and more detail information is recovered.
Fig. 8 shows the reconstruction results of each model when the amplification factor is 4 and the degradation model includes additive noise. When the degradation model introduces noise, the Bicubic algorithm and the EDSR and RCAN models have difficulty eliminating the noise in the low-resolution image, and noise in their reconstructed images is significant. Both DASR and the model presented here remove noise better, but the images reconstructed by DASR are smoother and lose some detail information. The images reconstructed by the proposed model retain clearer edges and recover more image features.

Claims (4)

1. A high-resolution reconstruction method for a remote sensing image under a complex degradation model is characterized by comprising the following steps:
S1: acquiring a sample image dataset, wherein the dataset comprises M images of different scenes; randomly selecting images and dividing the dataset into a training set and a test set;
S2: designing a degradation model: blurring and downsampling a high-resolution remote sensing image and then adding noise, generating the final low-resolution remote sensing image;
S3: training a masked self-encoder model to learn prior information of the low-resolution remote sensing image;
S4: building a new reconstruction network; the reconstruction network and the masked self-encoder model are trained simultaneously, and the prior information learned by the masked self-encoder model is used for high-resolution reconstruction of the low-resolution remote sensing image; the reconstruction network structure is divided into the following three parts:
S41: the first part performs shallow feature extraction; the shallow feature extraction module extracts multi-scale features using convolution layers with kernel sizes 3, 5, and 7 respectively, then concatenates the three extracted features in the channel dimension, reduces the number of channels with a 1×1 convolution layer, and fuses the multi-scale features; the multi-scale features F_0 initially extracted from the remote sensing image are expressed as formula (2): F_0 = H_SFE(I_LR) (2)
wherein H_SFE represents the mapping function of shallow feature extraction, I_LR represents the input low-resolution remote sensing image, and F_0 represents the extracted multi-scale features;
S42: the second part performs deep feature extraction and consists of a residual branch and a feature branch;
The feature branch directly transmits the shallow features extracted by the first part to the rear of the deep feature extraction network;
The residual branch uses the structural design of the UNet model, and a convolution layer in the middle of the branch divides it into a front part and a rear part; the front and rear parts each cascade r basic blocks whose structures correspond to each other: each basic block of the front part consists in sequence of a multi-scale receptive-field attention module, a residual fusion block, and a prior module, and each basic block of the rear part consists in sequence of a convolution layer, a multi-scale receptive-field attention module, a residual fusion block, and a prior module; the multi-scale receptive-field attention module and the residual fusion block are responsible for the feature-learning sub-problem, and the prior module is responsible for the prior-learning sub-problem;
The input of each basic block of the front part comes from the prior information and the output of the previous basic block; the output of the r-th basic block of the front part is expressed as formula (3): F_r = H_r(F_(r-1)) (3)
wherein H_r represents the mapping function of the r-th basic block of the front part, F_r represents the output of the r-th basic block, and F_(r-1) represents the output of the previous basic block; the rear part likewise cascades r basic blocks; the input of each basic block of the rear part comes from the prior information, the output of the previous basic block, and the output of the corresponding basic block of the front part, which are concatenated in the channel dimension and fused by a convolution layer; the output of the r-th basic block of the rear part is expressed as formula (4): F'_r = H'_r(Concat(F'_(r-1), F_(n-r))) (4)
wherein F'_r represents the output of the r-th basic block of the rear part, H'_r represents the mapping function of the r-th basic block of the rear part, Concat represents the channel-dimension concatenation operation, and F_(n-r) represents the output of the corresponding basic block of the front part, where n is the total number of modules in the network: n = the first r basic blocks + 1 middle convolution block + the last r basic blocks;
S43: the third part is an up-sampling module; the up-sampling adopts sub-pixel convolution, which avoids generating artifacts in the reconstruction process and reconstructs an image containing rich detail information.
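The shallow feature extraction of step S41 can be sketched as follows; this is a hypothetical single-channel NumPy stand-in with random kernels in place of learned weights, meant only to illustrate the parallel 3/5/7 convolutions, the channel-dimension concatenation, and the 1×1 fusion (all function names are assumptions):

```python
import numpy as np

def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Same'-padded 2-D convolution of a single-channel feature map."""
    kh, kw = kernel.shape
    p = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="reflect")
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * p[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def shallow_features(img: np.ndarray,
                     rng: np.random.Generator = None) -> np.ndarray:
    """Sketch of S41: parallel 3/5/7 convolutions, concat, 1x1 fusion."""
    if rng is None:
        rng = np.random.default_rng(0)
    scales = []
    for k in (3, 5, 7):
        w = rng.normal(0.0, 0.1, (k, k))     # stand-in for a learned kernel
        scales.append(conv2d_same(img, w))
    stacked = np.stack(scales)               # channel-dimension concat: (3, H, W)
    fuse = rng.normal(0.0, 0.1, 3)           # 1x1 conv = per-channel weights
    return np.tensordot(fuse, stacked, axes=1)   # fused (H, W) map
```

In the real network each scale produces many channels and the 1×1 convolution mixes them; here one channel per scale keeps the structure visible.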
2. The method for high-resolution reconstruction of a remote sensing image under a complex degradation model according to claim 1, wherein the loss function of the masked self-encoder model uses the Euclidean distance, expressed as formula (1): L_MSE = (1/n) * Σ (x̂_i - x_i)^2 (1)
wherein x̂_i represents the generated mask pixels, x_i represents the original mask pixels, and n is the total number of pixels of the image.
3. The method for high-resolution reconstruction of a remote sensing image under a complex degradation model according to claim 2, wherein the loss function of the up-sampling module uses the L1 loss, which computes the mean absolute difference between the reconstructed image and the real high-resolution image and makes the training process of the model more stable, expressed as formula (5): L1 = (1/N) * Σ |I_SR - I_HR| (5)
wherein I_SR represents the reconstructed image and I_HR represents the real high-resolution image; the total loss function is expressed as formula (6): L_total = L1 + L_MSE (6).
4. The method for high-resolution reconstruction of a remote sensing image under a complex degradation model according to claim 3, wherein an improved attention module is designed in the multi-scale receptive-field attention module;
The improved attention module has two branches. The first branch passes through a spatial attention module, which performs mean pooling and maximum pooling on the input feature map in the spatial dimension, changing its size from (B, C, H, W) to (B, 1, H, W); the pooled results are concatenated in the channel dimension, and a convolution layer followed by a Sigmoid activation function turns each element of the matrix into a probability value between 0 and 1, yielding the spatial-channel weight parameter; the larger the result after multiplication, the more useful the information at that spatial position;
the second branch adopts an edge attention module, and convolves an input feature map with a Sobel operator extracted by an edge to extract gradient information of the feature map in the x direction and the y direction respectively, and then square sums the gradient information and then square opens the sum to obtain a gradient matrix of the feature map; the gradient matrix is changed into a weight matrix after passing through a convolution layer and a Sigmoid activation function to obtain weight parameters of edge positions, wherein the weight parameters of the spatial channels are as followsThe weighting parameter of the edge position is +.>The method comprises the steps of carrying out a first treatment on the surface of the The improved attention module performs weighted summation on the spatial attention and the edge attention and outputs the weighted summation.
CN202311819893.8A 2023-12-27 2023-12-27 High-resolution reconstruction method for remote sensing image under complex degradation model Active CN117474764B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311819893.8A CN117474764B (en) 2023-12-27 2023-12-27 High-resolution reconstruction method for remote sensing image under complex degradation model


Publications (2)

Publication Number Publication Date
CN117474764A true CN117474764A (en) 2024-01-30
CN117474764B CN117474764B (en) 2024-04-16

Family

ID=89626084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311819893.8A Active CN117474764B (en) 2023-12-27 2023-12-27 High-resolution reconstruction method for remote sensing image under complex degradation model

Country Status (1)

Country Link
CN (1) CN117474764B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111251A (en) * 2019-04-22 2019-08-09 电子科技大学 A kind of combination depth supervision encodes certainly and perceives the image super-resolution rebuilding method of iterative backprojection
CN115272066A (en) * 2022-06-23 2022-11-01 昆明理工大学 Image super-resolution reconstruction method based on detail information asymptotic restoration
CN116385264A (en) * 2023-03-30 2023-07-04 浙江大学 Super-resolution remote sensing data reconstruction method
CN116434347A (en) * 2023-06-12 2023-07-14 中山大学 Skeleton sequence identification method and system based on mask pattern self-encoder
CN116664397A (en) * 2023-04-19 2023-08-29 太原理工大学 TransSR-Net structured image super-resolution reconstruction method


Also Published As

Publication number Publication date
CN117474764B (en) 2024-04-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant