CN109035142B

CN109035142B - Satellite image super-resolution method combining a generative adversarial network with an aerial image prior

Info

Publication number: CN109035142B
Application number: CN201810777731.5A
Authority: CN
Inventors: 黄源; 侯兴松; 赵世正
Original assignee: GUANGDONG XI'AN JIAOTONG UNIVERSITY ACADEMY; Xian Jiaotong University
Current assignee: GUANGDONG XI'AN JIAOTONG UNIVERSITY ACADEMY; Xian Jiaotong University
Priority date: 2018-07-16
Filing date: 2018-07-16
Publication date: 2020-06-19
Anticipated expiration: 2038-07-16
Also published as: CN109035142A

Abstract

The invention discloses a satellite image super-resolution method combining a generative adversarial network with an aerial image prior. A denoising model is first trained on image pairs, each formed by a noisy level-16 image and the corresponding noise-free level-16 image, and an image super-resolution model is then trained on clear aerial data. Because satellite images and aerial images are not available as matched pairs, the generated super-resolution image is post-processed: clear aerial images are used to construct an external prior dictionary based on a Gaussian mixture model (GMM), and this dictionary guides the reconstruction of the unclear internal satellite image. After reconstruction, the image is sharpened by Gaussian filtering to further improve its quality. The final result is a high-resolution version of the original satellite image with improved visual quality. Experiments demonstrate the effectiveness of the scheme, which offers a practical approach to satellite image super-resolution and image quality improvement under real-world data constraints.

Description

Satellite image super-resolution method combining a generative adversarial network with an aerial image prior

Technical Field

The invention belongs to the technical field of image super-resolution, and particularly relates to a satellite image super-resolution method based on a multi-scale perceptual loss and a generative adversarial network combined with an aerial image prior.

Background

Image resolution is an important index of image quality: a higher-resolution image shows more detail more clearly. During acquisition, however, hardware and the external environment limit the resolution actually obtained, which raises the problem of how to recover a high-resolution image from a low-resolution one. As the number of satellites increases, satellites now cover more than 90% of the earth, so the area that satellites can monitor is far larger than the area covered by images obtained by other means; yet satellite images, for various reasons, have relatively low resolution. Compared with an aerial image, for example, a satellite image is blurrier and lacks detail, while the coverage of aerial images is far smaller than that of satellite images. Obtaining satellite images of higher resolution is therefore of significant importance and value.

In the field of image super-resolution, combining deep neural networks with the traditional super-resolution problem has brought new breakthroughs. With the development of computer hardware, the cost of large-scale accelerated computation, and hence the cost of training a deep neural network, has fallen markedly; this has greatly helped researchers, and the technology is now widely applied in many fields. From SRCNN, the first network to apply deep learning to super-resolution, to SRGAN, a super-resolution algorithm built on the generative adversarial network (GAN), these methods train network parameters on pairs of low-resolution and high-resolution images to obtain a model that converts a low-resolution image into a high-resolution one; once trained, the model generates the high-resolution image from the low-resolution image alone.

The image super resolution problem is described as follows:

The image super-resolution problem refers to the process of obtaining a corresponding high-resolution image from a low-resolution image; the technique overcomes the limits of the original system's imaging hardware to yield a clearer image. Super-resolution methods generally fall into two categories: methods based on a single image and methods based on multiple images. Single-image super-resolution enlarges a low-resolution image and improves its resolution through a reconstruction algorithm. Multi-image super-resolution reconstructs a high-resolution image by fusing a sequence of similar frames.

In single-image super-resolution, the algorithm establishes a relationship between the low-resolution image and the high-resolution image, and uses it to reconstruct the high-resolution image from the low-resolution one. Conventional algorithms model the causes of low resolution in various ways, constructing degradation models that fit the process by which the low-resolution image is produced; the resulting relationship between the two images is then used to predict the high-resolution image. This process can be described by the following equation:

I_L＝HI_H+n

where I_L is the low-resolution image, I_H is the high-resolution image corresponding to I_L, H is the degradation model that produces the low-resolution image, and n is the noise introduced while the low-resolution image is generated. The degradation model H can in turn be expressed as:

H＝D_Sub×B×G

where D_Sub denotes the down-sampling operation, B is the blurring factor, and G is the geometric deformation factor.
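The degradation model above can be illustrated with a minimal NumPy sketch. It is not part of the disclosed method: the mean-filter blur, the strided down-sampling, the Gaussian noise, the identity choice for G, and the names `box_blur` and `degrade` are all assumptions made for illustration.

```python
import numpy as np

def box_blur(img, k=3):
    """B: blur with a k x k mean filter (edge-padded); an assumed blur kernel."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def degrade(i_h, scale=2, noise_std=0.01, seed=0):
    """I_L = D_Sub(B(G(I_H))) + n, with G assumed to be the identity."""
    rng = np.random.default_rng(seed)
    warped = i_h.astype(float)        # G: no geometric deformation (assumption)
    blurred = box_blur(warped)        # B: blurring factor
    sub = blurred[::scale, ::scale]   # D_Sub: keep every `scale`-th pixel
    return sub + rng.normal(0.0, noise_std, sub.shape)  # + n

i_h = np.ones((8, 8))
i_l = degrade(i_h, scale=2, noise_std=0.0)  # noise switched off for a deterministic check
```

With noise disabled, a constant image stays constant and only its size changes, which makes the role of each factor easy to inspect.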

Methods that address this degradation model fall mainly into interpolation-based, reconstruction-based, and learning-based methods. Interpolation-based methods achieve super-resolution by decomposing the image, interpolating, and filling in the interpolated values; they run fast, can be parallelized, and can meet real-time requirements. However, interpolation cannot predict the high-frequency information that is missing from the low-resolution image, so the resulting high-resolution image lacks texture detail and sharp edges. Reconstruction-based algorithms are further divided into spatial-domain and frequency-domain methods; they establish the correspondence between low-resolution and high-resolution images in the spatial or frequency domain through a manually designed model. Classical examples include projection onto convex sets and maximum a posteriori estimation. Their drawback is that a manually designed model cannot recover detail across a wide variety of images: it performs well only on a small amount of data, and image detail does not improve further as the data grow.

Learning-based methods, like reconstruction-based methods, go from the low-resolution image to the high-resolution image by establishing a relationship between the two, but they learn prior knowledge about that relationship from external training samples. Examples include methods based on manifold learning, sparse representation, and deep neural networks. Methods such as sparse representation are limited by the size of the constructed dictionary and by the difficulty of guaranteeing data sparsity, so they cannot deliver a stable super-resolution result. Among methods based on deep neural networks, those built on residual networks and generative adversarial networks must learn a large number of parameters from pairs of low-resolution and high-resolution images. Even so, part of the high-frequency information is still lost when the high-resolution image is predicted, so texture-rich regions look smooth.

For satellite images, these super-resolution methods run into practical limits. Very high-resolution satellite images cannot currently be acquired, so paired high-resolution and low-resolution satellite data are hard to obtain, and the many methods that require such pairs cannot be applied directly to satellite image super-resolution. Satellite images are also heavily affected by noise during acquisition, so granular noise is prominent, and super-resolving a single image directly amplifies that noise and degrades clarity. Aerial images can serve as auxiliary data: although they cover far less area than satellite images, they closely resemble satellite images and are much sharper. However, the aerial and satellite data currently available are not paired, that is, they were not captured at the same place and in the same time period. Under these constraints, how to denoise satellite images, how to super-resolve them, and how to use clear aerial image data to enhance the clarity of satellite data are the problems to be solved.

Disclosure of Invention

The technical problem to be solved by the invention is to provide, in view of the above defects in the prior art, a satellite image super-resolution method based on a multi-scale perceptual loss and a generative adversarial network combined with an aerial image prior. The method addresses the shortage of clear-image priors (the lack of clear satellite images) in super-resolution algorithms that use satellite images only, and thus generates clearer satellite images. In addition, even when only satellite data are used, the added multi-scale perceptual loss yields clearer super-resolution images than other methods.

The invention adopts the following technical scheme:

A satellite image super-resolution method combining a generative adversarial network with an aerial image prior: a denoising model is trained on image pairs formed by a noisy level-16 image and the corresponding noise-free level-16 image, and an image super-resolution model is then trained on aerial data; aerial images are used to construct a GMM external prior dictionary that guides the reconstruction of the unclear internal satellite image, completing the post-processing of the generated super-resolution image; the image is then sharpened by Gaussian filtering, finally yielding a high-resolution image of the original satellite image with improved visual quality.

Specifically, the method comprises the following steps:

S1, defining the generator, the discriminator and the multi-scale perceptual loss network of the generative adversarial network;

S2, down-sampling images extracted at level 18 from the existing satellite data to level 16; taking the resulting level-16 satellite image as the denoising target I_{D_H} and the satellite data extracted directly at level 16 as the noisy image I_{D_L}, so that the two form an image pair; and denoting the generated noise-free satellite image as I_{D_GH};

S3, performing initialization training of the generator in the denoising model with the image pairs formed in step S2, wherein the mean square error between the pixels of the image generated by the generator and those of the corresponding target image is used as the loss function, giving the MSE generator loss function loss_MSE, and the gradient is calculated and back-propagated to adjust the model parameters;

S4, after 100 epochs of initialization training, training the complete model: calculating the losses and the corresponding gradients and back-propagating them to adjust the parameters of the generator and the discriminator, while the parameters of the perceptual loss network VGG19 are not adjusted;

S5, training for 200 epochs until convergence under the above settings and saving the model; using the trained generator for denoising, the resulting denoised image I_{D_GH} serving as the input to image super-resolution; and defining the satellite image super-resolution model;

S6, repeating steps S3 to S5 for the super-resolution network so that both the denoising model and the super-resolution model are trained, then generating the super-resolution image I_{SR_GH} and using a Gaussian mixture model to construct an external prior dictionary;

S7, constructing the GMM external prior dictionary: dividing clear level-17 aerial images into 15 × 15 blocks, and then grouping the blocks preliminarily by Euclidean distance;

S8, regrouping the reconstructed internal image blocks to reconstruct the satellite image, and sharpening the reconstructed satellite image to obtain the final result image.

Further, in step S1, the generator of the generative adversarial network is defined as follows: a residual network is used as the generator, the residual network comprising 16 residual modules, each of which comprises three convolutional layers;

the discriminator is defined as follows: a 10-layer convolutional neural network is used as the discriminator, and its convolutional layers use dilated convolution;

the multi-scale perceptual loss is defined as follows: a VGG19 network pre-trained on the ImageNet 1000-class classification database is used as the perceptual loss network, and the multi-scale perceptual loss is constructed from the multi-scale feature maps of its conv2_2, conv3_4 and conv4_4 layers.

Further, in step S3, the MSE generator loss function loss_MSE is as follows:

loss_MSE＝MSE(I_{D_GH},I_{D_H})

Further, in step S4, during model training, the MSE generator loss function loss_MSE, the perceptual loss function loss_vgg and the adversarial loss function loss_GAN are added with weights to form the generator loss function for the overall training, as follows:

loss_G＝loss_MSE+loss_vgg+loss_GAN

Further, the perceptual loss loss_vgg is as follows:

loss_vgg＝10^-6×(loss_{mse_conv2_2}+loss_{mse_conv3_4}+loss_{mse_conv4_4})

loss_{mse_conv2_2}＝MSE(f_{i_conv2_2},f_{t_conv2_2})

loss_{mse_conv3_4}＝MSE(f_{i_conv3_4},f_{t_conv3_4})

loss_{mse_conv4_4}＝MSE(f_{i_conv4_4},f_{t_conv4_4})

where f_{i_conv2_2}, f_{i_conv3_4} and f_{i_conv4_4} are the conv2_2, conv3_4 and conv4_4 feature maps obtained by feeding the generated image into the perceptual model, and f_{t_conv2_2}, f_{t_conv3_4} and f_{t_conv4_4} are the conv2_2, conv3_4 and conv4_4 feature maps obtained by feeding the target image corresponding to the generated image into the perceptual model;
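As an illustration of the multi-scale perceptual loss defined above, the following sketch evaluates loss_vgg on toy arrays that stand in for VGG19 activations. The array shapes, the dictionary layout and the function names are assumptions; a real implementation would take the feature maps from the pre-trained VGG19 network.

```python
import numpy as np

LAYERS = ("conv2_2", "conv3_4", "conv4_4")

def mse(a, b):
    """Mean square error between two arrays of equal shape."""
    return float(np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2))

def multiscale_perceptual_loss(feats_generated, feats_target, weight=1e-6):
    """loss_vgg = 1e-6 * (sum of the per-layer feature-map MSEs)."""
    return weight * sum(mse(feats_generated[name], feats_target[name]) for name in LAYERS)

# Toy feature maps standing in for VGG19 activations (shapes are assumed).
rng = np.random.default_rng(0)
shapes = {"conv2_2": (128, 16, 16), "conv3_4": (256, 8, 8), "conv4_4": (512, 4, 4)}
f_t = {name: rng.normal(size=shape) for name, shape in shapes.items()}
f_i = {name: value + 1.0 for name, value in f_t.items()}  # generated features offset by 1

loss_vgg = multiscale_perceptual_loss(f_i, f_t)
```

Because every generated feature map here differs from its target by exactly 1, each per-layer MSE is 1 and the weighted sum is about 3 × 10^-6.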

the adversarial loss function loss_GAN is as follows:

loss_GAN＝10^-4×cross_entropy(I_{D_GH},True)

cross_entropy(I_{D_GH},True)＝-log(D(I_{D_GH}))

where D(·) is the discriminator.

Further, in step S4, the discriminator loss function loss_D for the overall training is defined as:

loss_D＝loss₁+loss₂

loss₁＝sigmoid_cross_entropy(I_{D_GH},False)

loss₂＝sigmoid_cross_entropy(I_{D_H},True)
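The discriminator loss above can be sketched in plain Python. The sketch assumes that the discriminator outputs a raw logit and that sigmoid_cross_entropy is the usual binary cross-entropy applied to that logit; the numerically stable formula and the function names are illustrative choices.

```python
import math

def sigmoid_cross_entropy(logit, label):
    """Binary cross-entropy on a raw logit; label is 1.0 (True) or 0.0 (False)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def discriminator_loss(logit_generated, logit_target):
    """loss_D = loss_1 + loss_2."""
    loss_1 = sigmoid_cross_entropy(logit_generated, 0.0)  # generated image labelled False
    loss_2 = sigmoid_cross_entropy(logit_target, 1.0)     # target image labelled True
    return loss_1 + loss_2

undecided = discriminator_loss(0.0, 0.0)   # discriminator cannot tell the images apart
confident = discriminator_loss(-5.0, 5.0)  # discriminator separates them clearly
```

An undecided discriminator pays 2·log 2, and the loss shrinks as it learns to separate generated images from targets.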

Further, in step S5, the super-resolution model comprises a generator, a perceptual model and a discriminator; the perceptual model and the discriminator have the same structure as in the denoising model, and the generator of the image super-resolution model is defined as follows:

a residual module is constructed, several residual modules are stacked to form the main body of the network, and the image is enlarged by sub-pixel convolution layers.

Furthermore, the generator of the super-resolution model is trained on aerial data: the input is an image pair formed by a low-resolution level-16 aerial image I_{SR_L} and the corresponding high-resolution level-17 aerial image I_{SR_H}, the generator output is I_{SR_GH}, and the loss function of the generator is defined as follows:

loss_{MSE_SR}＝MSE(I_{SR_GH},I_{SR_H})

Further, step S7 is specifically as follows:

S701, constructing a Gaussian mixture model (GMM) from the grouped image blocks, performing singular value decomposition (SVD) on the covariance matrices of the resulting model, and constructing a dictionary that serves as the external prior guiding the reconstruction of the satellite image;

S702, taking the I_{SR_GH} output by the preceding super-resolution model as the internal image input, dividing it into 15 × 15 blocks, and using the GMM obtained when the external prior dictionary was constructed to guide the clustering of these blocks;

S703, using the dictionary formed from the external prior to guide the internal image blocks in constructing an internal dictionary;

S704, sparsely coding over the internal dictionary, and reconstructing a new group of internal image blocks in combination with the original group of internal image blocks.
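Steps S7 and S701 can be pictured with the following NumPy sketch, which cuts an image into 15 × 15 blocks, groups blocks by Euclidean distance, and derives a dictionary from the SVD of one group's covariance matrix. It is a simplification under stated assumptions: a random array stands in for the aerial image, a single Gaussian component replaces the full GMM fit, and the function names and the group size are hypothetical.

```python
import numpy as np

def extract_patches(img, size=15, stride=15):
    """Cut the image into size x size blocks and flatten each block to a vector."""
    h, w = img.shape
    return np.array([img[y:y + size, x:x + size].ravel()
                     for y in range(0, h - size + 1, stride)
                     for x in range(0, w - size + 1, stride)])

def nearest_group(patches, ref_index, group_size):
    """Group the patches closest (Euclidean distance) to a reference patch."""
    dist = np.linalg.norm(patches - patches[ref_index], axis=1)
    return patches[np.argsort(dist)[:group_size]]

def component_dictionary(group):
    """SVD of the group's covariance matrix; the columns of U act as dictionary atoms."""
    centered = group - group.mean(axis=0)
    cov = centered.T @ centered / len(group)
    u, s, _ = np.linalg.svd(cov)
    return u, s

rng = np.random.default_rng(0)
aerial = rng.random((60, 60))        # stand-in for a clear level-17 aerial image
patches = extract_patches(aerial)    # 16 blocks of 225 pixels each
group = nearest_group(patches, ref_index=0, group_size=8)
dictionary, singular_values = component_dictionary(group)
```

The resulting dictionary is orthogonal, so coding an internal block over it reduces to a matrix product, which is one reason an SVD-based dictionary is convenient as an external prior.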

Compared with the prior art, the invention has at least the following beneficial effects:

The invention provides a satellite image super-resolution method combining a generative adversarial network with an aerial image prior. It is designed for the real-world situation in which the resolution and visual quality of a satellite image are to be improved but no clear aerial image paired with the satellite image exists. The method comprises three parts, namely image denoising, image super-resolution and image post-processing, and forms a workflow that improves the final satellite image super-resolution result step by step within the range of available data.

Furthermore, both the denoising model and the super-resolution model in the satellite image super-resolution workflow are built as generative adversarial networks, and a multi-scale perceptual loss is added on top to further improve their performance in image denoising and image super-resolution. The perceptual loss constrains the image generated by the generator and its target in the feature domain, so that the generated image is visually closer to the real target image. The multi-scale perceptual loss combines perceptual losses at several scales and thus imposes a stronger constraint, which further improves the generated result.

Further, the generator and the discriminator, as different modules of the generative adversarial network, serve different functions, and how they are defined matters greatly for denoising and super-resolving the satellite image. The generator's loss is built mainly on point-to-point pixel differences, and the body of its network focuses on extracting the high-frequency information of the image (through the residual structure). The discriminator focuses on the high-level semantic layer, ensuring consistency between the generated image and the real target image, and therefore needs a larger receptive field (achieved by dilated convolution). The multi-scale perceptual loss is a feature-domain constraint between the generated image and the real target image, implemented with a network pre-trained on ImageNet.

Further, the generator must produce an image that is noise-free and close to the real clear image at the pixel level, so its loss function is an MSE function based on pixel differences.

Further, the discriminator constrains the similarity between the generated image and the real clear image at the high-level semantic level. The cross-entropy function is a loss based on the decision probability: the probability that the generated image and the real target image are judged to belong to the same category should be as large as possible, that is, the generated image should be as similar as possible to the real target image.

Further, in the super-resolution model the discriminator and the perceptual model play the same roles as in the denoising model, so the same structures are used. The body of the generator network is also similar (it still has to generate more high-frequency information, so it again adopts a residual structure), but because the super-resolution model must produce an image larger than the input low-resolution image, sub-pixel convolution layers paired with convolution layers are used to achieve the enlargement.

Further, in practice, pairs of a clearer satellite image (close in clarity to an aerial image) and a lower-resolution satellite image cannot be obtained, which limits what satellite image super-resolution can achieve.

In conclusion, the invention realizes the denoising model and the image super-resolution model through combined constraints at the pixel level, the semantic level and the multi-scale feature domain; and, to cope with unpaired satellite training data, it introduces aerial images both to train the super-resolution model and to build the GMM dictionary used in image post-processing, which guides the reconstruction of clearer satellite images.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

FIG. 1 is an overall flow diagram;

FIG. 2 is a diagram of a generator in a denoising model;

FIG. 3 is a structural diagram of a discriminator in a denoising model;

FIG. 4 is a structural diagram of VGG19 in the denoising model;

FIG. 5 is a diagram of a generator structure in an image super-resolution model;

FIG. 6 is a flow chart for constructing a GMM model using aerial images and guiding satellite image reconstruction;

FIG. 7 is a diagram illustrating the effect of the present invention;

FIG. 8 is a graph comparing the results of the present invention.

Detailed Description

The invention provides a satellite image super-resolution method based on a multi-scale perceptual loss and a generative adversarial network combined with an aerial image prior. A denoising model is first trained on image pairs, each formed by a noisy level-16 image and the corresponding noise-free level-16 image, and an image super-resolution model is then trained on clear aerial data. Because satellite images and aerial images are not available as matched pairs, the generated super-resolution image is post-processed: clear aerial images are used to construct a GMM external prior dictionary, which guides the reconstruction of the unclear internal satellite image. After reconstruction, the image is sharpened by Gaussian filtering to further improve its quality. The final result is a high-resolution version of the original satellite image with improved visual quality. Experiments demonstrate the effectiveness of the scheme, which offers a practical approach to satellite image super-resolution and image quality improvement under real-world data constraints.

Referring to FIG. 1, the satellite image super-resolution method based on a multi-scale perceptual loss and a generative adversarial network combined with an aerial image prior comprises the following specific steps:

S1, the generative adversarial network that implements the denoising function comprises three parts: a generator, a discriminator, and a VGG19 network pre-trained on the ImageNet database;

S101, defining the generator of the generative adversarial network: a residual network is used as the generator, comprising 16 residual modules, each of which comprises three convolutional layers. The denoising function implemented here does not need to enlarge the image; the specific structure is shown in FIG. 2.

S102, defining the structure of the discriminator: the discriminator is a 10-layer convolutional neural network whose convolutional layers use dilated convolution. Setting the dilation range enlarges the receptive field without any pooling layer and improves the accuracy of the discriminator; the specific structure is shown in FIG. 3. The discriminator comprises 10 convolutional layers whose numbers of convolution kernels are 64, 128, 256, 512, 1024, 512, 256, 128, 128 and 128 respectively, first increasing and then decreasing. The convolution kernels of the first 7 layers are all 4 × 4 with a stride of 2, applied as successive sliding convolutions; increasing the number of kernels allows as many feature types as possible to be extracted. The last layer uses 1 × 1 convolution kernels, which reduces the number of parameters: because more kernels mean more channels, such a layer is added to adjust the channel count.
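To make the effect of dilated convolution on the receptive field concrete, the sketch below computes the receptive field of the first seven discriminator layers (4 × 4 kernels, stride 2) with the standard recurrence for stacked convolutions. The dilation rates are not given in the text, so the rate of 1 and the hypothetical rate of 2 used here are assumptions, as is the function name.

```python
def receptive_field(layers):
    """layers: iterable of (kernel, stride, dilation); returns the receptive field in input pixels."""
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        effective = dilation * (kernel - 1) + 1  # extent of the dilated kernel
        rf += (effective - 1) * jump             # growth measured in input pixels
        jump *= stride                           # spacing between neighbouring outputs
    return rf

first_seven = [(4, 2, 1)] * 7                      # 4 x 4 kernels, stride 2, dilation 1 (assumed)
rf_plain = receptive_field(first_seven)            # 382 pixels
rf_dilated = receptive_field([(4, 2, 2)] * 7)      # 763 pixels with a hypothetical dilation of 2
```

Under these assumptions, doubling the dilation rate roughly doubles the receptive field without adding parameters or pooling layers.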

S103, defining the multi-scale perceptual loss: a VGG19 network pre-trained on the ImageNet 1000-class classification database is used as the perceptual loss network. Unlike other perceptual losses, the multi-scale feature maps of the conv2_2, conv3_4 and conv4_4 layers of VGG19 are used together to construct a multi-scale perceptual loss and improve the quality of the images produced by the generator. The specific structure is shown in FIG. 4. The network is built from two kinds of convolution module: the first kind comprises two convolution layers and a pooling layer, and the second kind comprises four convolution layers and a pooling layer. All convolutional layers use 3 × 3 convolution kernels with a stride of 1, and, as in the discriminator, the number of kernels increases layer by layer: 64, 64, 128, 128, 256, 256, 256, 256, 512, 512, 512, 512, 512, where conv2_2, conv3_4 and conv4_4 are the outputs of the second, third and fourth convolution modules respectively.

S2, images extracted at level 18 from the existing satellite data are down-sampled to level 16 (common satellite images are generally level 16, and level-18 data are expensive to acquire). The level-16 data obtained in this way are clearer, but because level-18 satellite data are costly to acquire, such clear data are scarce.

The resulting level-16 satellite image is set as the denoising target I_{D_H}, and ordinary satellite data extracted directly at level 16 are taken as the noisy image I_{D_L}; the two form an image pair, and the generated noise-free satellite image is denoted I_{D_GH};
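The construction of a denoising training pair in step S2 might look like the sketch below. Two points are assumptions rather than statements from the text: the levels are read as map zoom levels in which each level doubles the linear resolution, so level 18 to level 16 is a factor of 4 per axis, and average pooling is used as the down-sampling method. The function names are hypothetical.

```python
import numpy as np

def downsample(img, factor):
    """Average-pool a 2-D image by an integer factor (assumed resampling method)."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def make_denoising_pair(level18, level16_noisy):
    """Return (I_D_L, I_D_H): the noisy level-16 input and the clean level-16 target."""
    target = downsample(level18, factor=4)  # two zoom levels = 2 * 2 per axis (assumption)
    assert target.shape == level16_noisy.shape, "pair must cover the same area"
    return level16_noisy, target

level18 = np.arange(64, dtype=float).reshape(8, 8)  # toy level-18 tile
level16 = np.zeros((2, 2))                          # toy noisy level-16 tile
i_d_l, i_d_h = make_denoising_pair(level18, level16)
```

Averaging 4 × 4 blocks also suppresses sensor noise, which is consistent with the down-sampled level-16 image serving as the noise-free target.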

S3, performing initialization training of the generator in the denoising model with the image pairs formed in step S2. In the initialization training the mean square error (MSE) is used as the loss function: the mean square error between the pixels of the image generated by the generator and those of the corresponding target image is calculated, and the gradient is calculated and back-propagated to adjust the model parameters. The loss loss_MSE is as follows:

loss_MSE＝MSE(I_{D_GH},I_{D_H})

S4, after about 100 epochs of initialization training (one epoch means that all image data in the image library have been used once for training), the complete model is trained;

at this point all three networks take part in training, but the parameters of VGG19 are not adjusted: it only outputs the perceptual loss, which is passed to the generator and the discriminator to adjust their parameters. The generator's loss function in the overall training differs from that in the separate initialization training.
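The two training phases can be summarized in a small sketch that reports which generator loss terms are active at a given epoch and which networks have trainable parameters. The 0-based epoch convention, the function name and the dictionary are assumptions for illustration.

```python
def active_losses(epoch, init_epochs=100):
    """Generator loss terms used at a given 0-based epoch."""
    if epoch < init_epochs:
        return ("loss_MSE",)                       # initialization: pixel MSE only
    return ("loss_MSE", "loss_vgg", "loss_GAN")    # complete-model training

# Parameters of VGG19 stay frozen; it only supplies the perceptual loss.
TRAINABLE = {"generator": True, "discriminator": True, "vgg19": False}

phase_at_start = active_losses(0)
phase_after_init = active_losses(100)
```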

In the overall training, the loss function of the generator comprises three parts: the MSE generator loss, the perceptual loss and the adversarial loss. Their weighted sum forms the generator loss function for the overall training:

loss_G＝loss_MSE+loss_vgg+loss_GAN

where loss_MSE is the loss function used in the initialization training, and loss_vgg is the perceptual loss:

loss_vgg＝10^-6×(loss_{mse_conv2_2}+loss_{mse_conv3_4}+loss_{mse_conv4_4})

loss_{mse_conv2_2}＝MSE(f_{i_conv2_2},f_{t_conv2_2})

loss_{mse_conv3_4}＝MSE(f_{i_conv3_4},f_{t_conv3_4})

loss_{mse_conv4_4}＝MSE(f_{i_conv4_4},f_{t_conv4_4})

loss_GAN is the adversarial loss function:

loss_GAN＝10^-4×cross_entropy(I_{D_GH},True)

cross_entropy(I_{D_GH},True)＝-log(D(I_{D_GH}))

where D(·) is the discriminator.
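Putting the three terms together, the generator loss for the overall training can be sketched as below. The sketch takes the cross-entropy against the True label in its usual negative-log form and assumes that the discriminator output is a probability in (0, 1]; the function and argument names are illustrative.

```python
import math

def generator_loss(loss_mse, feature_mses, d_prob_generated):
    """loss_G = loss_MSE + loss_vgg + loss_GAN with the weights 1e-6 and 1e-4."""
    loss_vgg = 1e-6 * sum(feature_mses)              # conv2_2, conv3_4, conv4_4 terms
    loss_gan = 1e-4 * -math.log(d_prob_generated)    # cross-entropy against the True label
    return loss_mse + loss_vgg + loss_gan

total = generator_loss(loss_mse=0.02, feature_mses=(1.0, 2.0, 3.0), d_prob_generated=0.5)
```

The small weights keep the pixel-level MSE dominant, while the perceptual and adversarial terms act as corrections on top of it.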

The discriminator loss function for the overall training is defined as:

loss_D＝loss₁+loss₂

loss₁＝sigmoid_cross_entropy(I_{D_GH},False)

loss₂＝sigmoid_cross_entropy(I_{D_H},True)

where loss_D is the discriminator loss; the loss and the corresponding gradient are calculated and back-propagated to adjust the parameters of the discriminator.

S5, training for 200 epochs until convergence under the above settings and saving the model; the trained generator is used for the subsequent denoising, and the resulting denoised image I_{D_GH} serves as the input to the subsequent image super-resolution. The satellite image super-resolution model is then defined;

the super-resolution model likewise comprises three main parts: a generator, a perceptual model and a discriminator. The perceptual model and the discriminator have the same structure as in the denoising model.

Defining the generator of the image super-resolution model: the main structure of the generator is again a residual network, that is, a residual module is constructed and several residual modules are stacked to form the main body of the network, after which sub-pixel convolution layers (sub-pixel) are used to enlarge the image; the specific structure is shown in FIG. 5. The structure of the super-resolution generator is similar to that of the denoising generator defined above and likewise stacks several residual modules, whose convolution layers all use 3 × 3 convolution kernels, 64 kernels per layer. The sub-pixel convolution layers that follow, and the convolution layers connected to them, all use 256 convolution kernels of size 3 × 3. In the super-resolution model implementing ×2 enlargement, the scale of the first sub-pixel convolution layer is 1 and the scale of the second is 2.
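The rearrangement performed by a sub-pixel convolution layer can be sketched in NumPy as follows. The channel ordering follows the common pixel-shuffle convention, which is an assumption here, as is the function name; the convolution that precedes the rearrangement is omitted.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange feature maps of shape (C*r*r, H, W) into (C, H*r, W*r)."""
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

features = np.arange(256 * 4 * 4, dtype=float).reshape(256, 4, 4)
upscaled = pixel_shuffle(features, r=2)   # 256 channels at 4 x 4 become 64 channels at 8 x 8
```

With a scale of 2, the 256 channels produced by the preceding convolution fold into 64 channels at twice the spatial size, which matches the 64-kernel width of the residual body.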

The generator of the super-resolution model is trained on aerial data: the input is an image pair formed by a low-resolution level-16 aerial image I_{SR_L} and the corresponding high-resolution level-17 aerial image I_{SR_H}, and the output of the generator is I_{SR_GH}.

The loss function of the generator is defined as:

loss_{MSE_SR}＝MSE(I_{SR_GH},I_{SR_H})

S6, repeating steps S3 to S5 to complete the training of the super-resolution network, and then generating the super-resolution image I_{SR_GH} with the trained denoising and super-resolution models; in order to further incorporate the clear prior of the aerial images into the satellite image, the quality of the generated super-resolution satellite image is further improved by a method in which an external prior dictionary constructed with a Gaussian Mixture Model (GMM) guides the reconstruction of the internal image, combined with image sharpening;

S7, constructing a GMM external prior dictionary to guide the internal image so that a clearer satellite image is reconstructed (this technique was originally used for image denoising). Because the aerial images and the satellite images cannot form image pairs, the generative adversarial network model proposed above cannot be trained on them directly; by constructing a GMM external prior dictionary, however, the rich details of the clear aerial images can be introduced indirectly into the satellite image generated by super-resolution. To construct the GMM external prior dictionary, the clear 17-level aerial images are divided into 15 × 15 blocks, and the blocks are then preliminarily grouped according to Euclidean distance, as shown in fig. 6;

S702, dividing I_{SR_GH}, the output of the preceding super-resolution model, into 15 × 15 blocks as the internal image input, and guiding the clustering of these blocks with the GMM model obtained when constructing the external prior dictionary;

S703, using the dictionary formed from the external prior to guide the internal image blocks in constructing an internal dictionary in turn;

S8, reconstructing the satellite image from the reconstructed groups of internal image blocks, and performing an image sharpening operation on the reconstructed satellite image so that the edges in the image become clearer, to obtain the final result image.
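The block division and preliminary grouping used in these steps can be sketched as follows. The block size of 15 × 15, the step of 3 and the group size of 10 are the values given in the experimental parameter settings; the single-channel input, the random test image and the function names `extract_patches` and `group_nearest` are assumptions made for illustration.

```python
import numpy as np

def extract_patches(image, size=15, step=3):
    """Return all size x size blocks taken every `step` pixels, one flattened block per row."""
    h, w = image.shape
    rows = range(0, h - size + 1, step)
    cols = range(0, w - size + 1, step)
    return np.stack([image[r:r + size, c:c + size].ravel() for r in rows for c in cols])

def group_nearest(patches, ref_index, group_size=10):
    """Indices of the `group_size` blocks closest (Euclidean distance) to the reference block."""
    dist = np.linalg.norm(patches - patches[ref_index], axis=1)
    return np.argsort(dist, kind="stable")[:group_size]

rng = np.random.default_rng(0)
image = rng.random((45, 45))          # stand-in for a single-channel image tile
patches = extract_patches(image)
group = group_nearest(patches, ref_index=0)
print(patches.shape, group.shape)     # (121, 225) (10,)
```

The reference block is always the first member of its own group, since its distance to itself is zero.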

The invention combines a multi-scale perceptual loss with a generative adversarial network and realizes super-resolution of satellite images under restricted conditions. A network for denoising is trained with satellite images and a network for image super-resolution is trained with aerial images; the super-resolved image is then further reconstructed in combination with the clear feature prior that a Gaussian mixture model extracts from the aerial images. Finally, the edges in the image are sharpened by one pass of Gaussian filtering, producing a clearer satellite image.

The invention addresses the improvement of image resolution and image quality under restricted conditions. The multi-scale perceptual loss imposes multi-scale constraints on the feature domain of the generated image, so that images of better quality are generated.

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

A. Conditions of the experiment

1. Database for experimental use

The experimental data are satellite image data and aerial image data provided in a satellite image super-resolution project. They are not a public data set, and only part of them is shown here. The data include:

Data type 1: satellite images extracted at level 16, which contain obvious granular noise and are not high in definition;

Data type 2: satellite images extracted at level 18, in which granular noise is not noticeable and which are slightly sharper. When satellite images extracted at level 18 are down-sampled to level 16, the result is clearer than satellite images extracted directly at level 16. However, satellite images extracted at level 18 are costly and generally difficult to obtain in large quantities, whereas satellite images extracted at level 16 are common. The project therefore has considerable research significance and value: an image super-resolution model is trained with only a small number of satellite images extracted at level 18, and image super-resolution, assisted by clear aerial images, is then applied to the low-definition satellite images extracted at level 16 to obtain images similar to, or even better than, satellite images extracted at level 18. Little data of this type is available here, but its coverage overlaps with that of data type 1, so a small number of image pairs can be constructed for model training.

Data type 3: clear aerial images, which are sharper than satellite images because of the shooting height and shooting mode. At the same level, an aerial image is much clearer than a satellite image and contains abundant texture information. However, the coverage and sources of the aerial images are limited: aerial images of the same locations and of a similar time period as data type 1 and data type 2 cannot be obtained, so no image pairs formed by satellite images and aerial images exist and the aerial images cannot be used directly for paired training. The data distribution is shown in table 1.

Table 1. Data sets and their distribution by level

Data type	Level 15	Level 16	Level 17	Level 18	Total
Data type 1	12989	51956	None	None	64945
Data type 2	1583	6332	25328	101302	134555
Data type 3	1689	7104	27988	111952	148733

2. Experimental requirements

The experiment was divided into three sections: denoising model training, image super-resolution model training and image post-processing experiments.

Denoising model training: image pairs are formed from a satellite image extracted at level 16 (containing noise) and a satellite image extracted at level 18 and down-sampled to level 16 (free of noise but of low definition). These pairs are used as training data for the generative adversarial network proposed in the present scheme. After training, the generator model takes a noisy satellite image as input and outputs a noise-free satellite image. To verify the robustness of the model, the test satellite images all come from urban areas different from those used in training and are likewise noisy images extracted at level 16.

Image super-resolution model training: image pairs are constructed from a 17-level aerial image and the 16-level aerial image obtained by down-sampling it, and are used for model training. After the generative adversarial network proposed in the scheme has been trained, the generator model takes a noise-free 16-level satellite image as input and generates the corresponding 17-level high-resolution image. The visual effects of the resulting high-resolution images are then compared.

Image post-processing experiment: image post-processing is applied to the satellite image that has undergone denoising and image super-resolution, to further improve image quality. First, a GMM external prior dictionary is obtained by training on clear 17-level aerial images and serves as guidance; the 17-level satellite image obtained by super-resolution is input, an internal dictionary is constructed under the guidance of the external prior, and the image is reconstructed, yielding a satellite image that incorporates the clear prior of the aerial images. On this basis, the image is sharpened using Gaussian filtering to obtain the final post-processed image. The resulting image is compared with the original image in terms of clarity and visual effect.
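The construction of a training pair from a 17-level aerial image can be sketched as below. The source does not state the down-sampling method, so 2 × 2 block averaging is an assumption here, as are the function names `downsample_2x` and `make_training_pair`; one level is taken to correspond to a factor of 2 in height and width.

```python
import numpy as np

def downsample_2x(image):
    """Halve height and width by averaging each 2x2 block (assumed method)."""
    h, w = image.shape[:2]
    h, w = h - h % 2, w - w % 2                  # crop to an even size
    x = image[:h, :w].astype(np.float64)
    return x.reshape(h // 2, 2, w // 2, 2, *x.shape[2:]).mean(axis=(1, 3))

def make_training_pair(level17_tile):
    """Return (low-resolution input, high-resolution target) for one aerial tile."""
    return downsample_2x(level17_tile), level17_tile

tile = np.arange(16.0).reshape(4, 4)             # stand-in for a 17-level aerial tile
low, high = make_training_pair(tile)
print(low.shape, high.shape)                     # (2, 2) (4, 4)
```

The same function also accepts colour tiles of shape (H, W, 3), in which case each channel is averaged independently.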
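One plausible way for the external GMM to guide the clustering of internal blocks is to assign each group of blocks to the Gaussian component under which it is most probable. The sketch below illustrates that idea only; the source does not spell out the assignment rule, and the function names, the toy two-dimensional data and the two-component model (the experiment uses 32 components on 15 × 15 blocks) are assumptions.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log density of each row of x under the Gaussian N(mean, cov)."""
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def select_component(group, weights, means, covs):
    """Index of the component with the highest joint log posterior for the group."""
    scores = [np.log(w) + log_gaussian(group, m, c).sum()
              for w, m, c in zip(weights, means, covs)]
    return int(np.argmax(scores))

# Toy external model with two components in two dimensions.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
covs = np.array([np.eye(2), np.eye(2)])
group = np.array([[4.9, 5.1], [5.2, 4.8]])
print(select_component(group, weights, means, covs))  # 1
```

Once a group is assigned to a component, that component's statistics can serve as the external prior when the internal dictionary for the group is built.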

3. Experimental parameter settings

The same settings are used for training the denoising model and the image super-resolution model. The generator is first trained on its own for initialization, with an initial learning rate of 0.0001 and a training period of 100 epochs (one epoch is one pass over all the training data). When the whole network is trained, the initial learning rate is again set to 0.0001 and the training period to 200 epochs, and the learning rate is decayed once, to 0.00001, when training reaches the halfway point.

In image post-processing, the parameters for constructing the external prior dictionary of the GMM model are as follows: the block step is 3, the block size is 15 × 15, the 10 image blocks closest in Euclidean distance are selected as a group during clustering, and the GMM comprises 32 Gaussian components, that is, 32 categories are fitted. Gaussian filtering is used for image sharpening, with the filtering radius set to 1.5 and the sharpening intensity set to 2.
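The learning-rate schedule for training the whole network can be written as a single step function. This is a minimal sketch assuming zero-based epoch indices; the function name `learning_rate` is hypothetical.

```python
def learning_rate(epoch, total_epochs=200, initial=1e-4, decayed=1e-5):
    """Learning rate for a zero-based epoch index: one step decay at the halfway point."""
    return initial if epoch < total_epochs // 2 else decayed

print(learning_rate(0), learning_rate(100))
```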
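Sharpening by Gaussian filtering is commonly done as unsharp masking: the image is blurred and the difference between the image and its blurred copy is added back with a gain. The sketch below assumes that this is the intended operation, that the filtering radius of 1.5 is the standard deviation of the Gaussian, and that the sharpening intensity of 2 is the gain; the function names are hypothetical and the input is taken to be a single-channel image.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at three standard deviations."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with edge replication at the borders."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(image.astype(np.float64), pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def unsharp_mask(image, sigma=1.5, amount=2.0):
    """Add back `amount` times the detail removed by the blur (no clipping applied)."""
    blurred = gaussian_blur(image, sigma)
    return image + amount * (image - blurred)

edge = np.zeros((16, 16))
edge[:, 8:] = 1.0                      # vertical step edge
sharpened = unsharp_mask(edge)
print(sharpened.shape)                 # (16, 16)
```

Flat regions are left unchanged, while values on either side of an edge are pushed apart; in practice the result would be clipped to the valid pixel range before saving.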

B. Evaluation criteria for experimental results

Since the actual test input is a noisy satellite image extracted at level 16, no corresponding sharp 17-level satellite image exists, and general metrics such as PSNR and SSIM cannot be applied directly. The effectiveness of the scheme is therefore illustrated by comparing images from some of the test results.

C. Comparative test protocol

Referring to fig. 7 and fig. 8, the test results show the effectiveness of the proposed scheme in a practical situation. Because of the restrictive conditions in the background of the scheme, a general image super-resolution algorithm cannot be trained and applied directly, and the expected effect can be achieved only by means of a series of image processing algorithms. Starting from the original noisy 16-level satellite image, the final generated image not only has the noise removed but is also super-resolved to level 17 (that is, its length and width are each multiplied by 2). In addition, with the aid of clear aerial images (not of the same location), the definition of the generated 17-level satellite image is improved.

The above-mentioned contents are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims

1. A satellite image super-resolution method combining countermeasure network with aerial image prior, characterized in that a denoising model is trained with image pairs formed by 16-level noise-containing images and the corresponding 16-level noise-free images, and an image super-resolution model is then trained with aerial data; an external prior dictionary of a GMM model is constructed from aerial images and guides the reconstruction of the internally unclear satellite image, completing the post-processing of the generated super-resolution image; the image is sharpened by Gaussian filtering, and a high-resolution image of the original satellite image is finally obtained, so that the visual quality of the image is improved on the basis of the original satellite image; the method comprises the following steps:

S1, defining a generator, a decider and a multi-scale perceptual loss network in the generative adversarial network, wherein the generator is defined as follows: a residual network is used as the generator, the residual network comprising 16 residual modules, each of which comprises three convolutional layers;

the multi-scale perceptual loss is defined as follows: a VGG19 network pre-trained on the 1000-class ImageNet classification database is used as the perceptual loss network, and the multi-scale perceptual loss is constructed from the multi-scale feature maps of the conv2_2, conv3_4 and conv4_4 layers;

S3, performing initialization training of the generator in the denoising model with the image pairs formed in step S2, wherein the mean square error is used as the loss function in the initialization training: the pixel-wise mean square error between the image generated by the generator and its corresponding target image is calculated to obtain the MSE generator loss function loss_MSE, and the gradient is calculated and back-propagated to adjust the model parameters;

S4, after 100 epochs of initialization training, training the complete model: the loss and the corresponding gradient are calculated and back-propagated to adjust the model parameters of the generator and the decider, while the parameters of the perceptual loss network VGG19 are not adjusted; during the training of the whole model, the generator loss function is the weighted sum of the MSE generator loss function loss_MSE, the perceptual loss function loss_vgg and the loss function loss_GAN, as follows:

loss_G＝loss_MSE+loss_vgg+loss_GAN；

S5, training for 200 epochs until convergence and saving the model, wherein the trained generator is used for denoising and the denoised image I_{D_GH} obtained serves as the input to image super-resolution; defining an image super-resolution model comprising a generator, a perception model and a decider, wherein the perception model and the decider have the same structure as those used in the denoising model, and the generator in the image super-resolution model is defined as follows:

constructing a residual module, then stacking a plurality of residual modules to form the main body of the network, and enlarging the image through sub-pixel convolution layers;

2. The satellite image super-resolution method combining countermeasure network with aerial image prior as claimed in claim 1, characterized in that, in step S3, the MSE generator loss function loss_MSE is as follows:

loss_MSE＝MSE(I_{D_GH},I_{D_H})。

3. The satellite image super-resolution method combining countermeasure network with aerial image prior as claimed in claim 1, characterized in that, in step S4, the perceptual loss function loss_vgg is as follows:

loss_vgg＝10^-6×(loss_{mse_conv2_2}+loss_{mse_conv3_4}+loss_{mse_conv4_4})

loss_{mse_conv2_2}＝MSE(f_{i_conv2_2},f_{t_conv2_2})

loss_{mse_conv3_4}＝MSE(f_{i_conv3_4},f_{t_conv3_4})

loss_{mse_conv4_4}＝MSE(f_{i_conv4_4},f_{t_conv4_4})

wherein f_{i_conv2_2}, f_{i_conv3_4} and f_{i_conv4_4} are the feature maps of the conv2_2, conv3_4 and conv4_4 layers obtained by inputting the generated image into the perception model, and f_{t_conv2_2}, f_{t_conv3_4} and f_{t_conv4_4} are the feature maps of the conv2_2, conv3_4 and conv4_4 layers obtained by inputting the target image corresponding to the generated image into the perception model;

the loss function loss_GAN is as follows:

loss_GAN＝10^-4×cross_entropy(I_{D_GH},True)

cross_entropy(I_{D_GH},True)＝log(D(I_{D_GH}))

wherein D(·) is the decider.

4. The satellite image super-resolution method combining countermeasure network with aerial image prior as claimed in claim 1, characterized in that, in step S4, the decider loss function loss_D used in the training of the whole model is defined as:

loss_D＝loss₁+loss₂

loss₁＝sigmoid_cross_entropy(I_{D_GH},False)

loss₂＝sigmoid_cross_entropy(I_{D_H},True)。

5. The satellite image super-resolution method combining countermeasure network with aerial image prior as claimed in claim 1, characterized in that the data used to train the generator of the super-resolution model is aerial data, the input being an image pair formed by a low-resolution 16-level aerial image I_{SR_L} and the corresponding high-resolution 17-level aerial image I_{SR_H}, the output of the generator being I_{SR_GH}, and the loss function of the generator being defined as follows:

loss_{MSE_SR}＝MSE(I_{SR_GH},I_{SR_H})。

6. The satellite image super-resolution method combining countermeasure network with aerial image prior as claimed in claim 1, characterized in that step S7 is specifically as follows: