CN117830100A - Remote sensing image super-resolution reconstruction method and system based on depth layer feature fusion - Google Patents

Remote sensing image super-resolution reconstruction method and system based on depth layer feature fusion

Info

Publication number
CN117830100A
Authority
CN
China
Prior art keywords
layer
deep
feature extraction
image
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311868541.1A
Other languages
Chinese (zh)
Inventor
李路 (Li Lu)
王密 (Wang Mi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202311868541.1A priority Critical patent/CN117830100A/en
Publication of CN117830100A publication Critical patent/CN117830100A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a remote sensing image super-resolution reconstruction method and system based on deep-shallow feature fusion. The method adds noise commonly found in remote sensing images to the training data set, increasing the diversity of the image degradation model, and fuses shallow and deep feature information of the remote sensing image at different scales, thereby providing richer and more complete information for reconstructing high-quality details and significantly improving the reconstruction quality of remote sensing images.

Description

Remote sensing image super-resolution reconstruction method and system based on depth layer feature fusion
Technical Field
The invention belongs to the field of optical satellite remote sensing image processing, and particularly relates to a remote sensing image super-resolution reconstruction method and system based on deep and shallow layer feature fusion.
Background
A satellite remote sensing image is an image obtained by observing and measuring the earth's surface with a satellite-mounted sensor. These images provide rich information about the earth's surface, including data on topography, land cover, climate, oceans, cities, and other aspects, and are widely used in agriculture, environmental monitoring, urban planning, natural resource management, disaster monitoring, and other fields. In recent years, user demand for high-resolution remote sensing images has grown steadily, particularly in fields such as target detection, change detection, and semantic segmentation, where high-resolution remote sensing images can provide finer and richer detail. However, because the satellite orbit is far from the earth's surface and the satellite sensor is constrained by the acquisition environment, storage, and communication transmission during image acquisition, the image quality of the acquired remote sensing data is often low; at the same time, factors such as imaging distortion, the optical diffraction limit, defocus, camera shake, noise, and lossy compression reduce image sharpness and cause loss of detail. Traditionally, the resolution of satellite remote sensing images has been improved by upgrading the image acquisition system, mainly by reducing the detector pixel size, enlarging the aperture of the acquisition system, and lengthening the camera focal length, but this greatly increases research and development costs.
Image super-resolution reconstruction is an image processing technique that aims to generate a high-resolution image from a low-resolution image through a series of algorithms. The process improves image quality and recovers lost detail, making the image clearer and finer, so as to meet the needs of various applications while reducing dependence on hardware upgrades and lowering costs.
At present, image super-resolution reconstruction algorithms can be divided into three categories according to the research method: interpolation-based methods, reconstruction-based methods, and learning-based methods. Interpolation-based algorithms mainly use mathematical formulas to estimate values between adjacent pixels and interpolate the image to reconstruct a high-resolution image; they include nearest-neighbor interpolation, bilinear interpolation, bicubic interpolation, and the like. These methods rest on mathematical theory, are simple in design, and have low complexity, but they assume that adjacent pixel values of the image are similar, so the reconstruction result is strongly influenced by surrounding pixels, edge detail is seriously lost, and the reconstructed high-resolution image is severely distorted. Reconstruction-based algorithms model the quality degradation process of an image as a mathematical model using learned prior knowledge, and obtain the reconstructed image closest to the real image by solving this degradation model; they mainly include the maximum a posteriori method, the iterative back-projection method, and the projection-onto-convex-sets method. Because these algorithms rely heavily on image priors, different priors can lead to a non-unique feasible solution space; as the upsampling factor increases, detail recovery worsens, a large number of high-frequency detail features are lost, and the result appears over-blurred.
Learning-based image super-resolution reconstruction methods comprise two stages. The first stage belongs to early machine learning and mainly includes support vector machines, linear regression, example-based learning, and the like. These algorithms use various image data sets to train the mapping between high-resolution and low-resolution image pairs, and reconstruct the high-resolution image through the trained mapping; however, because the mapping between high- and low-resolution images is complex, the reconstruction result depends heavily on the sample set and is not ideal for remote sensing images with complex backgrounds. The second stage is the deep learning stage, mainly including neural networks, deep convolutional networks, generative adversarial networks, and the like. These methods take high-/low-resolution image pairs as input and a convolutional neural network model as backbone, learn feature information in the low-resolution image through convolution kernels, and reconstruct the high-resolution image from the image features learned by the trained model; they do not depend on information from the original image, have stronger robustness, and improve the visual quality of the reconstructed image. However, most existing deep-learning-based methods are designed for natural scene images and fall short when reconstructing remote sensing images, which have richer details and more complex backgrounds; meanwhile, algorithmic improvements have mainly focused on deepening networks and accelerating convergence, ignoring the importance of deep and shallow image features for the reconstruction result.
Disclosure of Invention
In view of the above, for remote sensing images that suffer from degradation and have complex and varied content, the invention provides a remote sensing image super-resolution reconstruction method based on deep-shallow feature fusion, which models the image by fusing shallow feature information and deep feature information of the remote sensing image at different scales, thereby improving the quality of the reconstructed high-resolution remote sensing image and recovering richer detail features.
In order to achieve the above object and technical effects, the technical scheme adopted by the invention comprises the following steps:
step S1, acquiring a satellite remote sensing image data set, and dividing the satellite remote sensing image data set into a training set and a testing set according to a certain proportion;
step S2, preprocessing the data set obtained in the step S1;
step S3, constructing a generator network that extracts deep and shallow feature information from the remote sensing image and fuses feature information from different layers, the generator network comprising a shallow feature extraction layer, a deep feature extraction layer, a feature fusion layer, an up-sampling layer, and a reconstruction layer, and being used to reconstruct the high-resolution remote sensing image;
constructing a discriminator network for judging whether the reconstructed high-resolution remote sensing image is a real image, the discriminator network comprising a plurality of convolution layers, activation layers, BN layers, dense connection layers, and a discrimination layer; the generator network and the discriminator network together form a generative adversarial network;
step S4, setting the initialization parameters of the generator network and the discriminator network, adaptively adjusting their learning rates with an Adam optimizer, and training the constructed generative adversarial network with the preprocessed training set images to obtain the parameters of the network model;
and S5, reconstructing the test set image by using the trained generator network.
Further, the preprocessing in step S2 includes: adding Gaussian noise, stripe noise, random noise, and compression noise; downsampling the images to obtain the corresponding low-resolution image data set; rotating, translating, and scaling the data to expand the sample library; and cropping the images to obtain small image blocks of fixed size.
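As an illustration, the degradation and augmentation steps above can be sketched as follows (a minimal NumPy sketch; the noise amplitudes, the sinusoidal stripe model, and the 4x downsampling factor are illustrative assumptions, not values specified in the patent):

```python
import numpy as np

def degrade(img, scale=4, sigma=10.0, stripe_amp=5.0, seed=0):
    """Simple degradation model: Gaussian noise, column-wise stripe noise,
    then downsampling by averaging scale x scale blocks."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float64)
    noisy += rng.normal(0.0, sigma, img.shape)                    # Gaussian noise
    noisy += stripe_amp * np.sin(np.arange(img.shape[1]) / 3.0)   # stripe noise per column
    h = img.shape[0] // scale * scale
    w = img.shape[1] // scale * scale
    blocks = noisy[:h, :w].reshape(h // scale, scale, w // scale, scale)
    return np.clip(blocks.mean(axis=(1, 3)), 0, 255)              # low-resolution result

def augment(img):
    """Rotation/flip augmentations used to expand the sample library."""
    return [img, np.rot90(img), np.rot90(img, 2), np.fliplr(img)]

def center_crop(img, size):
    """Cut a fixed-size patch from the image centre."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]
```

The random-noise and compression-noise terms mentioned in the text would be added analogously.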
Further, the specific processing procedure of the generator network is as follows:
step S31, inputting the low-resolution remote sensing image into a shallow feature extraction layer of a generator network, and extracting shallow feature information of the image;
s32, inputting the extracted shallow layer feature information into a deep layer feature extraction layer, wherein K deep layer feature extraction blocks are used in the deep layer feature extraction layer, and obtaining deep layer feature information of different layers;
step S33, shallow characteristic information and deep characteristic information are input into a characteristic fusion layer, so that efficient fusion of the deep and shallow characteristic information of the remote sensing image is realized;
step S34, the fused characteristic information passes through an up-sampling layer, the up-sampling layer comprises K up-sampling modules, and different up-sampling modules are connected with corresponding deep characteristic extraction blocks in a jump connection mode, so that cross-layer transmission of deep information with different scales is realized;
step S35, inputting the output information of the up-sampling layer and the shallow characteristic information into a reconstruction layer, and performing super-resolution reconstruction on the low-resolution image through the deep and shallow characteristic information to obtain a high-resolution remote sensing image.
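The data flow of steps S31-S35 can be made concrete with a small wiring sketch (pure Python, structure only; the tensor labels are hypothetical names and the layer mathematics is omitted):

```python
def generator_wiring(K=5):
    """Return, for each stage of the generator, the names of the tensors it
    consumes: shallow features feed every deep block via skip connections,
    and up-sampling block k is skip-connected to deep block K+1-k."""
    wiring = {"shallow": ["lr_image"]}                    # S31: shallow feature extraction
    prev = "shallow"
    for k in range(1, K + 1):                             # S32: K deep feature extraction blocks
        wiring[f"deep{k}"] = ["shallow"] if k == 1 else ["shallow", prev]
        prev = f"deep{k}"
    wiring["fusion"] = ["shallow", f"deep{K}"]            # S33: deep-shallow feature fusion
    prev = "fusion"
    for k in range(1, K + 1):                             # S34: mirrored skip connections
        wiring[f"up{k}"] = [prev, f"deep{K + 1 - k}"]
        prev = f"up{k}"
    wiring["reconstruction"] = ["shallow", f"up{K}"]      # S35: reconstruction layer
    return wiring
```

The mirrored up-sampling connections (up-sampling block k fed by deep block K+1-k) match the detailed wiring given later in the description.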
Further, the shallow feature extraction layer consists of a convolution layer and an activation layer;
the deep feature extraction layer comprises 5 deep feature extraction blocks, each containing 5 convolution layers and 5 activation layers. The 5 blocks are deep feature extraction block 1 through deep feature extraction block 5, and each is connected to the shallow feature extraction layer by a skip connection: deep feature extraction block 1 takes as input the feature information from the shallow feature extraction layer; deep feature extraction block 2 takes the shallow feature information and the deep feature information output by block 1; deep feature extraction block 3 takes the shallow feature information and the output of block 2; deep feature extraction block 4 takes the shallow feature information and the output of block 3; and deep feature extraction block 5 takes the shallow feature information and the output of block 4.
Further, the up-sampling layer contains 5 up-sampling modules, each comprising three layers: a convolution layer, a deconvolution layer, and an activation layer. The modules are up-sampling block 1 through up-sampling block 5, and each is connected to the corresponding deep feature extraction block by a skip connection: up-sampling block 1 takes the output of the feature fusion layer and the output of deep feature extraction block 5; up-sampling block 2 takes the output of deep feature extraction block 4 and the output of up-sampling block 1; up-sampling block 3 takes the output of deep feature extraction block 3 and the output of up-sampling block 2; up-sampling block 4 takes the output of deep feature extraction block 2 and the output of up-sampling block 3; and up-sampling block 5 takes the output of deep feature extraction block 1 and the output of up-sampling block 4. After the up-sampling layer, deep feature information containing different layers is obtained.
Further, the feature fusion layer comprises a convolution layer and an activation layer, and the reconstruction layer comprises a deconvolution layer, an activation layer, and a convolution layer.
Further, the discriminator network comprises 8 convolution layers, 9 activation layers, 7 BN layers, 2 dense connection layers and 1 discriminating layer, and the final output of the discriminator is true or false for judging whether the reconstructed high-resolution image is a real image or not.
Further, the objective function of the generative adversarial network is:

\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

where x denotes an input real image, z denotes input noise, D(x) denotes the probability that the discriminator network judges a real image to be real, G(z) denotes the image reconstructed by the generator network, D(G(z)) denotes the probability that the discriminator network judges the reconstructed image to be real, x \sim p_{data}(x) indicates that real images obey the probability distribution p_{data}(x), and z \sim p_z(z) indicates that the noise data obey the probability distribution p_z(z).
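Numerically, this objective can be estimated from discriminator scores on a batch; a small sketch (the toy score values below are illustrative):

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of V(D,G) = E[log D(x)] + E[log(1 - D(G(z)))],
    given discriminator outputs on real and reconstructed batches."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))
```

At the classical GAN equilibrium the discriminator is maximally confused, D(x) = D(G(z)) = 0.5, and the value is log(1/2) + log(1/2) = -log 4.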
Further, in step S4, the generative adversarial network is trained using the loss function

L = L_{content} + \lambda L_{adv}

where L_{content} denotes the content loss function, L_{adv} denotes the adversarial loss function, and \lambda is a weight. The content loss function and the adversarial loss function are respectively:

L_{content} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}(I^{HR})_{x,y} - \phi_{i,j}\big(G_{\theta_G}(I^{LR})\big)_{x,y} \right)^2

L_{adv} = \sum_{n=1}^{N} -\log D_{\theta_D}\big(G_{\theta_G}(I^{LR})\big)

where \phi_{i,j} denotes the feature map obtained after the activation of the j-th convolution layer before the i-th max-pooling layer of a VGG network, W_{i,j} and H_{i,j} denote the spatial dimensions of the corresponding feature map, x, y denote the spatial coordinates of a point on the feature map, N denotes the number of training images, I^{LR} denotes the low-resolution image, I^{HR} the corresponding original high-resolution image, G_{\theta_G}(I^{LR}) the super-resolution image reconstructed from the low-resolution image by the generator network, and D_{\theta_D}(G_{\theta_G}(I^{LR})) the probability that the discriminator judges the reconstructed super-resolution image to be a real image.
The invention also provides a remote sensing image super-resolution reconstruction system based on depth layer feature fusion, which comprises the following modules:
the data set acquisition module is used for acquiring a satellite remote sensing image data set and dividing the satellite remote sensing image data set into a training set and a testing set according to a certain proportion;
the preprocessing module is used for preprocessing the data set obtained by the data set obtaining module;
the system comprises a generation countermeasure network construction module, a remote sensing image generation countermeasure network construction module, wherein the generation countermeasure network construction module is used for constructing a generator network which is provided with deep and shallow layer feature information of a remote sensing image and is used for fusing different layers of feature information, and the generator network comprises a shallow layer feature extraction layer, a deep layer feature extraction layer, a feature fusion layer, an up-sampling layer and a reconstruction layer, and is used for reconstructing a high-resolution remote sensing image;
constructing a discriminator network for judging whether the reconstructed high-resolution remote sensing image is a real image or not, wherein the discriminator network comprises a plurality of convolution layers, an activation layer, a BN layer, a dense connection layer and a discrimination layer, and the generator network and the discriminator network jointly form a generation countermeasure network;
the network training module is used for setting initialization parameters of the generator network and the discriminator network, adaptively adjusting the learning rate of the generator network and the discriminator network by using an Adam optimizer, and training the constructed generation countermeasure network by using the preprocessed training set image to obtain weight parameters of the network model;
and the reconstruction module is used for reconstructing the test set image by utilizing the trained generator network weight parameters.
According to the technical scheme described above, the invention provides a remote sensing image super-resolution reconstruction method based on deep-shallow feature fusion: common remote sensing image noises are added when constructing the data set, increasing the diversity of the image degradation model; deconvolution replaces the convolution layers of the original network; and the shallow feature information extracted by the network is interactively fused with multi-scale deep feature information, enriching the information available to the reconstructed image and significantly improving the quality of the reconstructed remote sensing image.
Drawings
Fig. 1 is a schematic diagram of a remote sensing image super-resolution reconstruction method based on depth layer feature fusion.
Fig. 2 is a schematic diagram of a network model of a generator constructed in the present invention.
Fig. 3 is a schematic diagram of a deep feature extraction block structure in a generator network according to the present invention.
Fig. 4 is a schematic diagram of a network model of a arbiter constructed in the present invention.
Fig. 5 is a diagram showing the results of high-resolution remote sensing images reconstructed under different magnifications according to the present invention.
Detailed Description
For the purpose of illustrating the invention with further clarity, the invention is described in detail below by way of example with reference to the accompanying drawings and embodiments, so that those skilled in the relevant art may more readily understand its features and properties and the scope of the invention may be more clearly and definitely defined.
The embodiment of the invention discloses a super-resolution reconstruction method of a satellite remote sensing image, which comprises the following steps:
and step 1, using a FAIR1M data set, dividing the FAIR1M data set into a training set, a test set and a verification set according to the ratio of 7:2:1, and respectively using the FAIR1M data set for model training, verification and test of image super-resolution reconstruction.
Step 2, preprocessing the data set obtained in step 1: adding Gaussian noise, stripe noise, random noise, and compression noise that are common in remote sensing images; downsampling the images to obtain the corresponding low-resolution image data set; applying data enhancement operations such as rotation, translation, and scaling to expand the sample library; and center-cropping the images to obtain 256×256 image blocks.
Step 3, constructing a remote sensing image super-resolution model based on deep-shallow feature information fusion as shown in fig. 1, which mainly comprises a generator network as shown in fig. 2 and a discriminator network as shown in fig. 4.
Step 4, passing the training data into the generator network: the data first pass through the shallow feature extraction layer to obtain shallow feature information; the deep feature extraction layer then produces deep feature information; the deep and shallow feature information pass through the feature fusion layer to obtain fused deep-shallow feature information; the fused feature information and the deep feature information of different layers pass through the up-sampling layer to obtain feature information at different scales; and finally, the shallow feature information and the output of the up-sampling layer pass through the reconstruction layer to reconstruct the high-resolution remote sensing image.
Specifically, the shallow feature extraction layer of the generator network includes a convolution layer and an activation layer. The deep feature extraction layer comprises five deep feature extraction blocks, deep feature extraction block 1 through deep feature extraction block 5, each connected to the shallow feature extraction layer by a skip connection: deep feature extraction block 1 takes as input the feature information from the shallow feature extraction layer; deep feature extraction block 2 takes the shallow feature information and the deep feature information output by block 1; block 3 takes the shallow feature information and the output of block 2; block 4 takes the shallow feature information and the output of block 3; and block 5 takes the shallow feature information and the output of block 4.
Each deep feature extraction block contains 5 convolution layers and 5 activation layers, which form a dense residual structure through dense connections and a skip connection: the 1st convolution layer receives the block input; the 2nd receives the input and the output of the 1st; the 3rd receives the input and the outputs of the 1st and 2nd; the 4th receives the input and the outputs of the 1st, 2nd, and 3rd; and the 5th receives the input and the outputs of the 1st through 4th, as shown in fig. 3. Each dense connection produces new features that are combined with the earlier features to gradually extract feature information, while the skip connection lets the network learn a residual mapping, which helps alleviate the vanishing-gradient problem and makes the network easier to optimize.
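The dense-residual wiring of one such block can be sketched as follows (NumPy; summing the incoming tensors stands in for channel concatenation followed by convolution, which is a simplifying assumption of this sketch):

```python
import numpy as np

def dense_residual_block(x, convs):
    """One deep feature extraction block: conv k receives the block input plus
    the outputs of all earlier convs (dense connections), and the block output
    adds the input back through a skip (residual) connection."""
    outputs = []
    for conv in convs:                               # 5 conv+activation pairs
        combined = np.sum([x] + outputs, axis=0)     # dense connection (sum as stand-in)
        outputs.append(conv(combined))
    return x + outputs[-1]                           # skip connection: residual learning
```

With identity stand-ins for the convolutions, the accumulated signal doubles at each stage (1x, 2x, 4x, 8x, 16x), and the residual connection yields 17x the input, which makes the connectivity easy to verify.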
Compared with a conventional fusion layer, the invention removes the batch normalization (BN) layer from the fusion layer. In the image super-resolution reconstruction task the network requires the output image to remain consistent in color, contrast, brightness, and so on: super-resolution reconstruction should change only resolution and detail. After an image passes through a BN layer its color distribution is normalized, which destroys the contrast information of the original image; the BN layer also consumes computational resources and makes training slow and unstable. Removing the BN layer preserves the color information of the image while reducing the difficulty and time of training.
The up-sampling layer mainly receives the multi-scale feature information output by the different deep feature extraction blocks. It comprises five up-sampling blocks, up-sampling block 1 through up-sampling block 5, each including three layers: a convolution layer, a deconvolution layer, and an activation layer. Each up-sampling block is connected to the corresponding deep feature extraction block by a skip connection and also receives the output of the preceding layer, realizing cross-layer transfer of information and yielding deep feature information at different scales: up-sampling block 1 takes the output of the feature fusion layer and the output of deep feature extraction block 5; up-sampling block 2 takes the output of deep feature extraction block 4 and the output of up-sampling block 1; up-sampling block 3 takes the output of deep feature extraction block 3 and the output of up-sampling block 2; up-sampling block 4 takes the output of deep feature extraction block 2 and the output of up-sampling block 3; and up-sampling block 5 takes the output of deep feature extraction block 1 and the output of up-sampling block 4. After the up-sampling layer, deep feature information containing different levels is obtained.
The reconstruction layer mainly comprises a deconvolution layer, an activation layer and a convolution layer, the reconstruction layer inputs the multi-scale deep characteristic information output by the up-sampling layer and the shallow characteristic information output by the shallow characteristic extraction layer, and finally, a high-resolution image is reconstructed.
In a specific embodiment, the convolution layer in the shallow feature extraction layer has a 9×9 kernel, stride 1, padding 4, and 32 kernels, with a PReLU activation function. In the deep feature extraction layer, the first four convolution layers of every deep feature extraction block have 3×3 kernels, stride 1, and padding 1, with PReLU activation; the last convolution layer of deep feature extraction blocks 1, 2, 3, and 4 has a 3×3 kernel, stride 2, and padding 1, while the last convolution layer of deep feature extraction block 5 has a 3×3 kernel, stride 1, and padding 1, with ReLU activation. The numbers of convolution kernels of deep feature extraction blocks 1, 2, 3, 4, and 5 are 64, 128, 256, 512, and 1024, respectively. In the feature fusion layer, the convolution layer has a 1×1 kernel, stride 1, padding 0, and 1024 kernels, with a LeakyReLU activation function. In the up-sampling layer, the convolution layers of the five up-sampling blocks have 3×3 kernels, stride 1, and padding 1; the deconvolution layers have 3×3 kernels, stride 2, and padding 0; the activation functions are ReLU; and the numbers of convolution kernels of up-sampling blocks 1, 2, 3, 4, and 5 are 1024, 512, 256, 128, and 64, respectively.
Taking a reconstruction magnification of 2× as an example, in the reconstruction layer the deconvolution layer has a 3×3 kernel, stride 2, padding 0, and 32 kernels with a Tanh activation function, and the convolution layer has a 9×9 kernel, stride 1, padding 4, and 3 kernels.
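These hyper-parameters fix the feature-map sizes through the standard convolution and deconvolution size formulas, which can be checked with a short helper:

```python
def conv_out(n, kernel, stride, padding):
    """Convolution output size: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def deconv_out(n, kernel, stride, padding):
    """Deconvolution (transposed convolution) output size: (n - 1)*s - 2p + k."""
    return (n - 1) * stride - 2 * padding + kernel

# A 9x9 / stride-1 / padding-4 convolution preserves spatial size, and a
# 3x3 / stride-2 / padding-1 convolution halves an even-sized input.
```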
Step 5, inputting the high-resolution remote sensing image reconstructed in step 4 together with the real image into the discriminator network, which passes them through a series of convolution layers and finally through the discrimination layer to judge whether the reconstructed high-resolution remote sensing image is a real image. If the discriminator cannot make this judgment accurately, the generator network is fixed and the discriminator network is trained; if it can, the discriminator network is fixed and the generator network is trained. Training alternates in this way until dynamic balance is reached, at which point training stops.
In a specific embodiment, counting the convolution layers of the discriminator network from left to right (see fig. 4), the 1st, 3rd, 5th, and 7th convolution layers have 64, 128, 256, and 512 kernels respectively, each with a 3×3 kernel, stride 1, and padding 1; the 2nd, 4th, 6th, and 8th convolution layers have 64, 128, 256, and 512 kernels respectively, each with a 3×3 kernel, stride 2, and padding 1. The 1st dense connection layer has 1024 kernels of size 1×1, padding 0, and stride 1; the 2nd dense connection layer has 1 kernel of size 1×1, padding 0, and stride 1. All BN layers use BatchNorm2d; the 1st through 9th activation layers all use the LeakyReLU activation function with slope 0.2; and the final discrimination layer uses a sigmoid function.
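With the strides above, the spatial size through the eight discriminator convolutions can be traced (assuming a 256×256 input, the patch size used in the preprocessing step):

```python
def discriminator_feature_sizes(n=256):
    """Spatial size after each of the 8 conv layers: odd-numbered layers
    (3x3, stride 1, padding 1) preserve the size, even-numbered layers
    (3x3, stride 2, padding 1) halve it."""
    sizes = []
    for layer in range(1, 9):
        stride = 1 if layer % 2 == 1 else 2
        n = (n + 2 - 3) // stride + 1      # standard conv output-size formula
        sizes.append(n)
    return sizes
```

The four stride-2 layers reduce a 256×256 patch to 16×16 before the dense connection layers and the sigmoid discrimination layer.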
The high-resolution image reconstructed by the generator network is input into the discriminator together with the real image, and the discriminator outputs true or false, i.e. judges whether the reconstructed high-resolution image is a real image. If the discriminator cannot reliably distinguish the real image from the reconstructed image, the generator network is fixed and the discriminator network is trained until it can tell them apart; if the discriminator can distinguish them accurately, the discriminator network is fixed and the generator network is trained so that the images it generates again pass the discriminator network. This training process is iterated repeatedly: by adjusting the network weights, the images reconstructed by the generator network become harder and harder to recognize, while the discriminating ability of the continuously optimized discriminator network also keeps improving. When the discriminator can no longer distinguish the real image from the reconstructed image, the generator network and the discriminator network have reached a state of dynamic balance, and training is complete.
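The alternating scheme described above can be sketched with toy networks standing in for the generator and discriminator; the network shapes, batch size and binary cross-entropy loss here are illustrative assumptions, and only the alternation pattern (fix one network, train the other) follows the text:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
G = nn.Linear(8, 16)                                # stand-in generator
D = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())   # stand-in discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=4e-4)
bce = nn.BCELoss()

for step in range(10):
    lr_batch = torch.randn(4, 8)   # "low-resolution" inputs
    real = torch.randn(4, 16)      # "real high-resolution" images
    # Phase 1: fix the generator (detach), train the discriminator
    fake = G(lr_batch).detach()
    loss_d = bce(D(real), torch.ones(4, 1)) + bce(D(fake), torch.zeros(4, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    # Phase 2: fix the discriminator, train the generator to fool it
    loss_g = bce(D(G(lr_batch)), torch.ones(4, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

print(float(loss_d), float(loss_g))
```

In a full implementation the loop would run until the discriminator's accuracy settles near chance, the "dynamic balance" the text describes.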
The objective function of the generative adversarial network is:

min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 - D(G(z)))]  (equation 1)

wherein x represents an input real image, z represents input noise, D(x) represents the probability that the discriminator network judges the real image x to be a real image, G(z) represents the image reconstructed by the generator network, D(G(z)) represents the probability that the discriminator network judges the reconstructed image to be a real image, x~p_data(x) represents the probability distribution obeyed by the real images, and z~p_z(z) represents the probability distribution obeyed by the noise data.
When the probability distribution of the reconstructed images equals that of the real images, the objective function has a globally optimal solution. Consider first when the discriminator network is optimal, i.e. the max of the min-max, for fixed generator network parameters. In the formula below, the first line writes V(G, D) in integral form; the second line makes an equivalent substitution: since the generator network generates its data from the input z, i.e. generates p_g from p_z, the second integral can equivalently be written over p_g:

V(G, D) = ∫_x p_data(x) log(D(x)) dx + ∫_z p_z(z) log(1 - D(G(z))) dz
        = ∫_x [p_data(x) log(D(x)) + p_g(x) log(1 - D(x))] dx  (equation 2)
When equation (2) attains its maximum, the optimal discriminator network is:

D*(x) = p_data(x) / (p_data(x) + p_g(x))  (equation 3)

At this point the parameters of the discriminator network are fixed to solve for the optimal generator network; substituting equation (3) into equation (1) gives:

C(G) = E_{x~p_data}[log D*(x)] + E_{x~p_g}[log(1 - D*(x))]  (equation 4)

Rewriting equation (4) as an integral gives:

C(G) = ∫_x [p_data(x) log(p_data(x) / (p_data(x) + p_g(x))) + p_g(x) log(p_g(x) / (p_data(x) + p_g(x)))] dx  (equation 5)

Multiplying the numerator and denominator inside each logarithm of equation (5) by 1/2 simultaneously yields equation (6); since the logarithm of a product equals the sum of the logarithms, this gives equation (7):

C(G) = ∫_x [p_data(x) log((1/2) p_data(x) / ((p_data(x) + p_g(x)) / 2)) + p_g(x) log((1/2) p_g(x) / ((p_data(x) + p_g(x)) / 2))] dx  (equation 6)
     = ∫_x [p_data(x) (log(1/2) + log(p_data(x) / ((p_data(x) + p_g(x)) / 2))) + p_g(x) (log(1/2) + log(p_g(x) / ((p_data(x) + p_g(x)) / 2)))] dx  (equation 7)

Writing the integrals of equation (7) separately yields:

C(G) = log(1/2) ∫_x p_data(x) dx + ∫_x p_data(x) log(p_data(x) / ((p_data(x) + p_g(x)) / 2)) dx + log(1/2) ∫_x p_g(x) dx + ∫_x p_g(x) log(p_g(x) / ((p_data(x) + p_g(x)) / 2)) dx  (equation 8)

In equation (8), the integrals of p_data and p_g are both 1, and applying the KL divergence formula gives equation (9):

C(G) = -2 log 2 + KL(p_data || (p_data + p_g) / 2) + KL(p_g || (p_data + p_g) / 2)  (equation 9)

Applying the JS divergence formula to equation (9) gives:

C(G) = -2 log 2 + 2 JSD(p_data || p_g)  (equation 10)

Solving for the minimum of equation (10) amounts to solving for the minimum of the JS divergence. When the two distributions are the same, i.e. p_data = p_g, the JS divergence attains its minimum value of 0, which shows that the objective function has an optimal solution: the optimum is reached when the probability distribution of the real images coincides with the probability distribution of the reconstructed images.
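The conclusion can be checked numerically: when p_data = p_g the JS divergence is 0, so the objective attains its minimum of -2 log 2. A small self-contained sketch (the example distributions are arbitrary):

```python
import math

# KL divergence between two discrete distributions (terms with p_i = 0 drop out)
def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# JS divergence: average of the KLs to the midpoint distribution
def jsd(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_data = [0.2, 0.5, 0.3]
p_g = [0.2, 0.5, 0.3]   # reconstructed distribution matches the real one
c_g = -2 * math.log(2) + 2 * jsd(p_data, p_g)
print(c_g)  # -2 log 2, about -1.3863
```

With any p_g different from p_data the JS divergence is strictly positive, so C(G) exceeds this minimum.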
Step 6, setting initialization parameters of the generator network and the discriminator network: the learning rate is adjusted adaptively with an Adam optimizer, the exponential decay rate of the first-order moment estimate is set to 0.5, the initial learning rate of the generator network is set to 0.0001, and the initial learning rate of the discriminator network is set to 0.0004. Training is carried out with different upsampling factors (2 times, 4 times and 8 times) to obtain the parameters of the reconstruction network model.
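A minimal sketch of this optimizer configuration; the placeholder networks and the second-moment decay rate of 0.999 are assumptions, while beta1 = 0.5 and the two initial learning rates follow the text:

```python
import torch
import torch.nn as nn

# Placeholder networks standing in for the generator and discriminator
generator = nn.Conv2d(3, 3, 3, padding=1)
discriminator = nn.Conv2d(3, 1, 3, padding=1)

# Adam with first-moment decay 0.5; generator lr 1e-4, discriminator lr 4e-4
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.5, 0.999))

print(opt_g.param_groups[0]["lr"], opt_d.param_groups[0]["lr"])
```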
The loss function is:

L = l_content + λ · l_adv  (equation 11)

wherein l_content represents the content loss function, l_adv represents the adversarial loss function, and λ is the weight; the content loss function and the adversarial loss function are given by equations 12 and 13 respectively:

l_content = 1 / (W_{i,j} H_{i,j}) · Σ_{x=1}^{W_{i,j}} Σ_{y=1}^{H_{i,j}} (φ_{i,j}(I_HR)_{x,y} - φ_{i,j}(G_θG(I_LR))_{x,y})²  (equation 12)

l_adv = Σ_{n=1}^{N} -log D_θD(G_θG(I_LR))  (equation 13)

wherein φ_{i,j} represents the feature map obtained after the activation of the j-th convolution layer before the i-th maximum pooling layer of the VGG network (in a specific embodiment, features are extracted with the fourth convolution layer before the fifth maximum pooling layer of the VGG network), W_{i,j} and H_{i,j} represent the spatial dimensions of the corresponding feature map, x, y represent the spatial coordinates of any point on the feature map, N represents the number of pixels of the image, I_LR represents the low-resolution image, I_HR represents the corresponding original high-resolution image, G_θG(I_LR) represents the super-resolution image reconstructed from the low-resolution image by the generator network, and D_θD(G_θG(I_LR)) represents the probability that the discriminator considers the super-resolution image reconstructed by the generator to be a real image.
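The content loss of equation 12 is a mean squared error over feature maps. The sketch below uses a small random convolution as a stand-in for the VGG feature extractor φ_{i,j}; a real implementation would use the stated VGG layer, so the stand-in is purely an assumption for a self-contained example:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
phi = nn.Conv2d(3, 8, 3, padding=1)   # stand-in for the VGG feature layer

def content_loss(sr, hr):
    f_sr, f_hr = phi(sr), phi(hr)
    w, h = f_hr.shape[-2], f_hr.shape[-1]
    # 1/(W_ij * H_ij) times the sum of squared feature differences
    return ((f_hr - f_sr) ** 2).sum() / (w * h)

hr = torch.randn(1, 3, 32, 32)
loss = content_loss(hr, hr)   # identical images give zero content loss
print(float(loss))  # 0.0
```

Replacing `phi` with a frozen, pretrained VGG sub-network (and adding λ times the adversarial term) would recover equation 11 in full.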
In a specific embodiment, PSNR and SSIM are adopted as evaluation indexes, and objective evaluation is carried out on the reconstructed high-resolution remote sensing image, wherein the evaluation results are shown in Table 1:
TABLE 1 evaluation results of PSNR and SSIM
Upsampling factor    PSNR      SSIM
2 times              41.997    0.993
4 times              37.9      0.99
8 times              31.397    0.981
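For reference, a PSNR figure like those in Table 1 is computed from the mean squared error against the ground-truth image; SSIM is omitted for brevity, and the toy pixel values below are arbitrary:

```python
import math

# PSNR for 8-bit images: 10 * log10(peak^2 / MSE), peak = 255
def psnr(ref, test, peak=255.0):
    n = len(ref)
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / n
    return 10 * math.log10(peak ** 2 / mse)

reference = [52, 60, 61, 249]       # toy ground-truth pixels
reconstructed = [51, 61, 60, 248]   # toy reconstruction, MSE = 1
print(round(psnr(reference, reconstructed), 3))  # 48.131
```

Higher PSNR means a smaller reconstruction error, which is why the value falls as the upsampling factor grows in Table 1.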
As shown in fig. 5, a is an input low-resolution remote sensing image, B is a 2-fold reconstructed high-resolution remote sensing image, C is a 4-fold reconstructed high-resolution remote sensing image, and D is an 8-fold reconstructed high-resolution remote sensing image.
The invention also provides a remote sensing image super-resolution reconstruction system based on depth layer feature fusion, which comprises the following modules:
the data set acquisition module is used for acquiring a satellite remote sensing image data set and dividing the satellite remote sensing image data set into a training set and a testing set according to a certain proportion;
the preprocessing module is used for preprocessing the data set obtained by the data set obtaining module;
the system comprises a generation countermeasure network construction module, a remote sensing image generation countermeasure network construction module, wherein the generation countermeasure network construction module is used for constructing a generator network which is provided with deep and shallow layer feature information of a remote sensing image and is used for fusing different layers of feature information, and the generator network comprises a shallow layer feature extraction layer, a deep layer feature extraction layer, a feature fusion layer, an up-sampling layer and a reconstruction layer, and is used for reconstructing a high-resolution remote sensing image;
constructing a discriminator network for judging whether the reconstructed high-resolution remote sensing image is a real image or not, wherein the discriminator network comprises a plurality of convolution layers, an activation layer, a BN layer, a dense connection layer and a discrimination layer, and the generator network and the discriminator network jointly form a generation countermeasure network;
the network training module is used for setting initialization parameters of the generator network and the discriminator network, adaptively adjusting the learning rate of the generator network and the discriminator network by using an Adam optimizer, and training the constructed generation countermeasure network by using the preprocessed training set image to obtain weight parameters of the network model;
and the reconstruction module is used for reconstructing the test set image by utilizing the trained generator network weight parameters.
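The five modules above can be sketched as a simple pipeline; all function names and the toy stand-ins are illustrative, not from the patent:

```python
# Wiring of the five modules: acquire -> preprocess -> build GAN -> train -> reconstruct
def run_pipeline(acquire, preprocess, build_gan, train, reconstruct):
    train_set, test_set = acquire()                        # data set acquisition module
    train_set = preprocess(train_set)                      # preprocessing module
    generator, discriminator = build_gan()                 # GAN construction module
    weights = train(generator, discriminator, train_set)   # network training module
    return reconstruct(weights, test_set)                  # reconstruction module

# Toy stand-ins just to exercise the wiring
result = run_pipeline(
    acquire=lambda: (["lr1", "lr2"], ["lr3"]),
    preprocess=lambda s: [x + "_pre" for x in s],
    build_gan=lambda: ("G", "D"),
    train=lambda g, d, s: {"weights": len(s)},
    reconstruct=lambda w, t: [f"sr({x})" for x in t],
)
print(result)  # ['sr(lr3)']
```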
The specific implementation of each module is the same as that of the corresponding step and is not repeated here.
In summary, in the method provided by the invention, noise commonly found in remote sensing satellite images is introduced into the data set, and the feature fusion module fuses feature information of different layers, enhancing the feature richness of the reconstructed image; finally, tests on a real remote sensing data set demonstrate the effectiveness of the method both qualitatively and quantitatively.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A remote sensing image super-resolution reconstruction method based on depth layer feature fusion is characterized by comprising the following steps:
step S1, acquiring a satellite remote sensing image data set, and dividing the satellite remote sensing image data set into a training set and a testing set according to a certain proportion;
step S2, preprocessing the data set obtained in the step S1;
step S3, a generator network which is provided with deep and shallow layer feature information of the remote sensing image and fuses different layers of feature information is constructed, wherein the generator network comprises a shallow layer feature extraction layer, a deep layer feature extraction layer, a feature fusion layer, an up-sampling layer and a reconstruction layer and is used for reconstructing the high-resolution remote sensing image;
constructing a discriminator network for judging whether the reconstructed high-resolution remote sensing image is a real image or not, wherein the discriminator network comprises a plurality of convolution layers, an activation layer, a BN layer, a dense connection layer and a discrimination layer, and the generator network and the discriminator network jointly form a generation countermeasure network;
step S4, initializing parameters of a generator network and a discriminator network are set, the learning rate of the generator network and the discriminator network is adaptively adjusted by using an Adam optimizer, and the constructed generated countermeasure network is trained by using the preprocessed training set image to obtain parameters of a network model;
and S5, reconstructing the test set image by using the trained generator network.
2. The remote sensing image super-resolution reconstruction method based on depth feature fusion of claim 1, wherein: the preprocessing in step S2 includes: adding Gaussian noise, stripe noise, random noise and compression noise; performing a downsampling operation on the images to obtain a corresponding low-resolution image data set; performing rotation, translation and scaling operations on the data to expand the sample library; and cutting the images into small image blocks of fixed size.
3. The remote sensing image super-resolution reconstruction method based on depth feature fusion of claim 1, wherein the method comprises the following steps of: the specific processing procedure of the generator network is as follows:
step S31, inputting the low-resolution remote sensing image into a shallow feature extraction layer of a generator network, and extracting shallow feature information of the image;
s32, inputting the extracted shallow layer feature information into a deep layer feature extraction layer, wherein K deep layer feature extraction blocks are used in the deep layer feature extraction layer, and obtaining deep layer feature information of different layers;
step S33, shallow characteristic information and deep characteristic information are input into a characteristic fusion layer, so that efficient fusion of the deep and shallow characteristic information of the remote sensing image is realized;
step S34, the fused characteristic information passes through an up-sampling layer, the up-sampling layer comprises K up-sampling modules, and different up-sampling modules are connected with corresponding deep characteristic extraction blocks in a jump connection mode, so that cross-layer transmission of deep information with different scales is realized;
step S35, inputting the output information of the up-sampling layer and the shallow characteristic information into a reconstruction layer, and performing super-resolution reconstruction on the low-resolution image through the deep and shallow characteristic information to obtain a high-resolution remote sensing image.
4. The remote sensing image super-resolution reconstruction method based on depth feature fusion according to claim 3, wherein the method comprises the following steps of: the shallow feature extraction layer consists of a convolution layer and an activation layer;
the deep feature extraction layer comprises 5 deep feature extraction blocks, each deep feature extraction block comprises 5 convolution layers and 5 activation layers, the 5 deep feature extraction blocks are respectively a deep feature extraction block 1, a deep feature extraction block 2, a deep feature extraction block 3, a deep feature extraction block 4 and a deep feature extraction block 5, each deep feature extraction block is connected with the shallow feature extraction layer in a jump connection mode, namely the deep feature extraction block 1 inputs feature information from the shallow feature extraction layer, and the deep feature extraction block 2 inputs feature information from the shallow feature extraction layer and deep feature information output by the deep feature extraction block 1; the deep feature extraction block 3 inputs feature information of the shallow feature extraction layer and deep feature information output by the deep feature extraction block 2; the deep feature extraction block 4 inputs feature information of the shallow feature extraction layer and deep feature information output by the deep feature extraction block 3, and the deep feature extraction block 5 inputs feature information of the shallow feature extraction layer and feature information output by the deep feature extraction block 4.
5. The remote sensing image super-resolution reconstruction method based on depth feature fusion of claim 4, wherein: the up-sampling layer contains 5 up-sampling modules, each up-sampling block comprising three layers, including a convolution layer and a deconvolution layer; the 5 up-sampling blocks are respectively up-sampling block 1, up-sampling block 2, up-sampling block 3, up-sampling block 4 and up-sampling block 5, wherein different up-sampling modules are connected with the corresponding deep feature extraction blocks in a jump connection mode, namely: up-sampling block 1 inputs the output information of the feature fusion layer and the output information of deep feature extraction block 5; up-sampling block 2 inputs the output information of deep feature extraction block 4 and the output information of up-sampling block 1; up-sampling block 3 inputs the output information of deep feature extraction block 3 and the output information of up-sampling block 2; up-sampling block 4 inputs the output information of deep feature extraction block 2 and the output information of up-sampling block 3; and up-sampling block 5 inputs the output information of deep feature extraction block 1 and the output information of up-sampling block 4; depth feature information containing different layers is obtained after the up-sampling layer.
6. The remote sensing image super-resolution reconstruction method based on depth feature fusion according to claim 3, wherein the method comprises the following steps of: the feature fusion layer comprises a convolution layer and an activation layer,
the reconstruction layer includes a deconvolution layer, an activation layer and a convolution layer.
7. The remote sensing image super-resolution reconstruction method based on depth feature fusion of claim 1, wherein the method comprises the following steps of: the discriminator network comprises 8 convolution layers, 9 activation layers, 7 BN layers, 2 dense connection layers and 1 discriminating layer, and the final output of the discriminator is true or false for judging whether the reconstructed high-resolution image is a real image or not.
8. The remote sensing image super-resolution reconstruction method based on depth feature fusion of claim 1, wherein: the objective function for generating the countermeasure network is:

min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 - D(G(z)))]

wherein x represents an input real image, z represents input noise, D(x) represents the probability of judging the data to be a real image after passing the real image through the discriminator network, G(z) represents the image reconstructed through the generator network, D(G(z)) represents the probability of judging the data to be a real image after passing the reconstructed image through the discriminator network, x~p_data(x) represents the probability distribution obeyed by the real images, and z~p_z(z) represents the probability distribution obeyed by the noise data.
9. The remote sensing image super-resolution reconstruction method based on depth feature fusion of claim 1, wherein: the generated countermeasure network is trained in step S4 using the following loss function:

L = l_content + λ · l_adv

wherein l_content represents the content loss function, l_adv represents the adversarial loss function, and λ is the weight; the content loss function and the adversarial loss function are respectively as follows:

l_content = 1 / (W_{i,j} H_{i,j}) · Σ_{x=1}^{W_{i,j}} Σ_{y=1}^{H_{i,j}} (φ_{i,j}(I_HR)_{x,y} - φ_{i,j}(G_θG(I_LR))_{x,y})²

l_adv = Σ_{n=1}^{N} -log D_θD(G_θG(I_LR))

wherein φ_{i,j} represents the feature map obtained after the activation of the j-th convolution layer before the i-th maximum pooling layer of the VGG network, W_{i,j} and H_{i,j} represent the spatial dimensions of the corresponding feature map, x, y represent the spatial coordinates of any point on the feature map, N represents the number of pixels of the image, I_LR represents the low-resolution image, I_HR represents the corresponding original high-resolution image, G_θG(I_LR) represents the super-resolution image reconstructed from the low-resolution image by the generator network, and D_θD(G_θG(I_LR)) represents the probability that the discriminator considers the super-resolution image reconstructed by the generator to be a real image.
10. The remote sensing image super-resolution reconstruction system based on depth layer feature fusion is characterized by comprising the following modules:
the data set acquisition module is used for acquiring a satellite remote sensing image data set and dividing the satellite remote sensing image data set into a training set and a testing set according to a certain proportion;
the preprocessing module is used for preprocessing the data set obtained by the data set obtaining module;
the system comprises a generation countermeasure network construction module, a remote sensing image generation countermeasure network construction module, wherein the generation countermeasure network construction module is used for constructing a generator network which is provided with deep and shallow layer feature information of a remote sensing image and is used for fusing different layers of feature information, and the generator network comprises a shallow layer feature extraction layer, a deep layer feature extraction layer, a feature fusion layer, an up-sampling layer and a reconstruction layer, and is used for reconstructing a high-resolution remote sensing image;
constructing a discriminator network for judging whether the reconstructed high-resolution remote sensing image is a real image or not, wherein the discriminator network comprises a plurality of convolution layers, an activation layer, a BN layer, a dense connection layer and a discrimination layer, and the generator network and the discriminator network jointly form a generation countermeasure network;
the network training module is used for setting initialization parameters of the generator network and the discriminator network, adaptively adjusting the learning rate of the generator network and the discriminator network by using an Adam optimizer, and training the constructed generation countermeasure network by using the preprocessed training set image to obtain weight parameters of the network model;
and the reconstruction module is used for reconstructing the test set image by utilizing the trained generator network weight parameters.
CN202311868541.1A 2023-12-29 2023-12-29 Remote sensing image super-resolution reconstruction method and system based on depth layer feature fusion Pending CN117830100A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311868541.1A CN117830100A (en) 2023-12-29 2023-12-29 Remote sensing image super-resolution reconstruction method and system based on depth layer feature fusion

Publications (1)

Publication Number Publication Date
CN117830100A true CN117830100A (en) 2024-04-05

Family

ID=90522632



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination