CN113920015A - Infrared image edge preserving super-resolution reconstruction method based on generation countermeasure network - Google Patents
- Publication number: CN113920015A (application CN202111269585.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- stage
- edge
- network
- resolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T3/4046 — Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06T3/4053 — Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T7/13 — Edge detection
- G06T2207/10048 — Infrared image
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
Abstract
The invention discloses an infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network (GAN). The method comprises two stacked GANs, which decompose the image generation process into two stages. First-stage GAN: the low-resolution image is taken as input and sent to the first-stage generator to generate a fake image; the generated image and the real image are then sent to the first-stage discriminator, which judges real versus fake. Second-stage GAN: the image generated in the first stage is taken as input and sent to the second-stage generator to generate a further image; the generated image is sent to the second-stage discriminator together with the real image, and authenticity is again discriminated. High-resolution images with photorealistic details can be obtained using the method of the invention.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to an infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network.
Background
Infrared imaging technology is a passive, non-contact means of detection and identification, with advantages such as good concealment, strong penetration capability, resistance to electromagnetic interference, and low-light/night-vision capability. Besides its primary military applications, it is widely used in civil fields such as industry, agriculture, medical treatment, and public-security reconnaissance. However, infrared images suffer from a number of shortcomings, such as low resolution, low contrast, and edge blurring.
Improving the hardware performance of an infrared imaging system requires improving the manufacturing process of the infrared detector, which demands huge investment of manpower and financial resources and is difficult to achieve in the short term. Improving infrared image quality by digital signal processing is therefore an economical and effective method.
Super-resolution reconstruction recovers a high-resolution image or sequence from a single frame or multiple frames of low-resolution images (Protter M, Elad M, Takeda H, et al., "Generalizing the Nonlocal-Means to Super-Resolution Reconstruction," IEEE Transactions on Image Processing, 2009, 18(1):36). Methods fall into three types: interpolation-based, reconstruction-based, and example-learning-based. Example-learning-based methods have a flexible algorithm structure, recover more detail at high magnification, and have become a research hotspot of super-resolution reconstruction in recent years.
Super-resolution reconstruction of visible-light images has been realized with convolutional neural networks (CNNs), which learn the mapping between low-resolution and high-resolution images through training on large amounts of data. For example, C. Dong, C. C. Loy, K. He, and X. Tang, "Image Super-Resolution Using Deep Convolutional Networks," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295-307, 2014, uses perceptual loss instead of minimum mean-square error, and learned upsampling instead of bicubic interpolation, to achieve better results. Researchers have since proposed deeper, more hierarchical network architectures in pursuit of better results, e.g. J. Kim, J. Lee, and K. Lee, "Deeply-Recursive Convolutional Network for Image Super-Resolution," 2016, pp. 1637-1645, and W. Shi et al., "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network," 2016. In the field of machine learning, the study of generative models has long been a difficult problem.
Generative adversarial networks (GANs; I. Goodfellow et al., "Generative Adversarial Nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672-2680) were proposed to meet the demand of many research and application fields for generative models, using only back-propagation and avoiding complex Markov chains. Moreover, GANs adopt an unsupervised learning mode and generate clearer, more realistic samples. C. Ledig et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," 2016, proposed a generative adversarial network for image super-resolution capable of recovering photo-realistic natural images from 4× downsampling. However, the details generated by this method after magnification are usually accompanied by unpleasant artifacts; to further improve visual quality, X. Wang et al., "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks," 2018, proposed the Residual-in-Residual Dense Block (RRDB) network unit and improved the perceptual-domain loss. Q. Mao, S. Wang, X. Zhang, and S. Ma, "Enhanced Image Decoding via Edge-Preserving Generative Adversarial Networks," in 2018 IEEE International Conference on Multimedia and Expo (ICME), 2018, proposed a new generative adversarial framework to better recover the edge structure and texture information of compressed images. ESRGAN+ (N. C. Rakotonirina and A. Rasoanaivo, "ESRGAN+: Further Improving Enhanced Super-Resolution Generative Adversarial Network," in ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 3637-3641) designed a network architecture with novel basic blocks to replace the basic structure used by the original ESRGAN.
In the field of infrared images, researchers have mostly implemented super-resolution reconstruction with sparse-coding methods, e.g.:
C. Kraich and S. Pumrin, "Performance analysis on multi-frame image super-resolution via sparse representation," in 2014 International Electrical Engineering Congress (iEECON), 2014.
Sun Yibao, Wei Shihui, Xiao Liang, Zheng Rong, and Lu Zhanli, "Polymorphic sparsity regularized image super-resolution algorithm," Acta Electronica Sinica, vol. 38, no. 12, pp. 2898-2903, 2010.
Lian Qiusheng and Zhang Wei, "Super-resolution reconstruction algorithm based on sparse representation with image-block classification," Acta Electronica Sinica, vol. 40, no. 5, pp. 920-925, 2012.
S. Yang, M. Wang, Y. Chen, and Y. Sun, "Single-Image Super-Resolution Reconstruction via Learned Geometric Dictionaries and Clustered Sparse Coding," IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4016-4028, 2012.
Y. Tang, Y. Yuan, P. Yan, and X. Li, "Greedy regression in sparse coding space for single-image super-resolution," Journal of Visual Communication & Image Representation, vol. 24, no. 2, pp. 148-159, 2013.
In the prior art, convolutional neural networks have achieved good results in single-image super-resolution reconstruction; however, because infrared images lack detail, have poor contrast, and suffer from edge blurring, super-resolution reconstruction of infrared images with a clear edge structure and good visual quality remains challenging.
Disclosure of Invention
The invention aims to provide an infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network, which mainly solves the technical problems described in the background, can recover vivid, sharp-edged images from 4× downsampled infrared images, and improves the perceptual quality of reconstructed images by better preserving edge structure and predicting visually pleasing details.
The perceptual loss function proposed by the invention comprises: adversarial loss, image fidelity loss, feature fidelity loss, and edge fidelity loss. Experimental results show that the method can recover vivid, sharp-edged images from 4× downsampled infrared images.
In order to achieve the purpose of the invention, the technical scheme provided by the invention is as follows:
An infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network comprises two stacked GANs; the network decomposes the image generation process into two stages. First-stage GAN: in the first stage, a low-resolution image I_LR0 is taken as input and sent to the first-stage generator G0, which generates a fake image I_SR0; the generated image I_SR0 and the real image I_HR0 are both sent to the first-stage discriminator D0, which judges real versus fake. Second-stage GAN: in the second stage, the generated image I_SR0 is taken as input and sent to the second-stage generator G1, which generates an image I_SR1; the generated image I_SR1, together with the real image I_HR1, is sent to the second-stage discriminator D1, which again judges real versus fake.
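To make the two-stage data flow concrete, the following sketch traces image shapes through both stages with NumPy arrays; `upsample2x` and `discriminate` are hypothetical stand-ins for the trained networks G0/G1 and D0/D1, not the patent's actual models:

```python
import numpy as np

def upsample2x(img):
    # Placeholder generator body: nearest-neighbor 2x upscaling
    # stands in for a learned generator (G0 or G1).
    return img.repeat(2, axis=0).repeat(2, axis=1)

def discriminate(img):
    # Placeholder discriminator: returns a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-img.mean()))

# Stage 1: 128x128 LR input -> 256x256 SR estimate
i_lr0 = np.random.rand(128, 128).astype(np.float32)
i_sr0 = upsample2x(i_lr0)          # G0 output
p_fake0 = discriminate(i_sr0)      # D0 judges the generated image

# Stage 2: the stage-1 output becomes the stage-2 input -> 512x512
i_sr1 = upsample2x(i_sr0)          # G1 output
p_fake1 = discriminate(i_sr1)      # D1 judges the generated image

print(i_sr0.shape, i_sr1.shape)    # (256, 256) (512, 512)
```

In the real method each generator is trained against its own discriminator; only the resolution bookkeeping (128 → 256 → 512) is shown here.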
As a further preference, the first stage employs a reconstruction loss function.
As a further preference, the first-stage reconstruction loss function comprises three parts: adversarial loss, image fidelity loss, and edge fidelity loss.
As a further preference, the formula (1) adopted by the first-stage reconstruction loss function is:

L1 = l1·L_adv + l2·L_mse + l3·L_edge    (1)
The three parts of the reconstruction loss function capture different perceptual characteristics of the reconstructed image, with the aim of obtaining a visually more satisfactory reconstruction.
The weights {l_i} are trade-off parameters that balance the loss components. The first part, L_adv, is the adversarial loss between the GAN generator G0 and the discriminator D0; it encourages the generator to produce more vivid high-resolution images by attempting to fool the discriminator network.
As a further preference, the formula (2) adopted for L_adv is:

L_adv = −log D0(G0(I_LR0))    (2)

where D0(G0(I_LR0)) is the estimated probability that the reconstructed image G0(I_LR0) is discriminated as a true HR image. To obtain better gradients, the invention minimizes −log D0(G0(I_LR0)) instead of minimizing log(1 − D0(G0(I_LR0))).
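The gradient argument behind formula (2) can be illustrated numerically. The exact formula appears only as an image in the source, so the sketch below assumes the standard non-saturating GAN generator loss that the surrounding text describes:

```python
import numpy as np

def adv_loss_nonsaturating(d_fake):
    # Generator loss -log(D(G(x))): large when the discriminator
    # confidently rejects the fake (d_fake near 0), giving a
    # strong training signal.
    return -np.log(d_fake)

def adv_loss_saturating(d_fake):
    # Original minimax form log(1 - D(G(x))): nearly flat (close
    # to 0) exactly where the generator most needs a gradient.
    return np.log(1.0 - d_fake)

d_fake = 0.01  # discriminator almost certain the sample is fake
print(adv_loss_nonsaturating(d_fake), adv_loss_saturating(d_fake))
```

At `d_fake = 0.01` the non-saturating loss is about 4.6 while the saturating form is about −0.01, which is the "better gradients" motivation stated above.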
As a further preference, the formula (3) adopted for L_mse is:

L_mse = (1 / (W·H·C)) Σ_{x,y,c} (I_HR0(x, y, c) − I_SR0(x, y, c))²    (3)

where W, H and C are the height, width and number of channels of the image. L_mse uses pixel-level MSE loss to ensure the fidelity of the restored image.
As a further preference, the formula (4) adopted for L_edge is:

L_edge = (1 / (W·H)) Σ_{x,y} (I_E(x, y) − Î_E(x, y))²    (4)

where W and H are the width and height of the image. L_edge is the edge fidelity loss, used to reproduce sharp edge information.
The labeled edge map I_E is extracted by a specific edge filter from the real 256 × 256 image I_HR0; Î_E is extracted by the same edge filter from the 256 × 256 image I_SR0 generated by the generator G0.
In the verification experiments of the invention, the Canny edge-detection operator is selected; by minimizing the edge fidelity loss, the network can continuously guide edge recovery.
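The edge fidelity idea can be sketched as follows. The patent selects the Canny operator; here a simple gradient-magnitude edge map stands in for it (an assumption, to keep the example dependency-free), and the loss is the MSE between the edge maps of the real and generated images:

```python
import numpy as np

def grad_edges(img):
    # Central-difference gradient magnitude: a simple stand-in
    # for the Canny operator the patent selects.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    return np.hypot(gx, gy)

def edge_loss(i_hr, i_sr):
    # Edge fidelity loss: MSE between the edge maps of the
    # real and the generated image.
    e_hr, e_sr = grad_edges(i_hr), grad_edges(i_sr)
    return np.mean((e_hr - e_sr) ** 2)

# A sharp vertical step edge vs. a blurred version of it
sharp = np.zeros((8, 8)); sharp[:, 4:] = 1.0
blurred = np.zeros((8, 8))
blurred[:, 3:6] = [0.25, 0.5, 0.75]; blurred[:, 6:] = 1.0
print(edge_loss(sharp, blurred))  # > 0: blurring changes the edge map
```

Minimizing this term penalizes exactly the blurring that a plain pixel MSE tolerates.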
As a further preference, the network structure of generator G0 comprises three convolution blocks and six residual blocks. The first convolution block comprises a convolutional layer and a PReLU layer and is followed by the six residual blocks; each residual block contains two convolutional layers with kernel size 3 × 3 and 64 feature maps, two batch-normalization layers and one PReLU layer. The convolution block following the residual blocks comprises a convolutional layer and a batch-normalization layer; the last block contains a convolutional layer, an upsampling layer, and a PReLU layer.
Generator G0 extracts features from the image using a stack of 6 residual blocks, each containing two convolutional layers with kernel size 3 × 3 and 64 feature maps, two batch-normalization layers and one PReLU layer.
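Under the stated structure (a first conv block, six residual blocks with size-preserving 3 × 3 convolutions, a conv+BN block, and a final block with 2× upsampling), the output resolution of G0 can be traced layer by layer; stride and padding values other than the stated 3 × 3 kernels are assumptions:

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    # Spatial size after a convolution ("same" padding for k=3, s=1).
    return (size + 2 * pad - kernel) // stride + 1

def g0_output_size(lr_size):
    # Trace the spatial size through G0 as described above.
    s = conv_out(lr_size)             # first conv block (Conv + PReLU)
    for _ in range(6):                # six residual blocks
        s = conv_out(conv_out(s))     # two 3x3 convs each
    s = conv_out(s)                   # conv + BN block
    s = conv_out(s) * 2               # last block: conv + 2x upsample
    return s

print(g0_output_size(128))  # 256: G0 doubles the resolution
```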
As a further preference, the parameters and output formats of the layers of the generator G0 network structure are XXX, [XXX, XXX, XXX, XXX]; XXX is an integer.
As a further preference, the discriminator network includes 10 convolution blocks; except for the first block, each block comprises a convolutional layer, a batch-normalization layer, and a LeakyReLU layer, and the number of filter kernels increases continuously from 64 (as in the VGG network) to 1024. VGG is described in K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Computer Science, 2014.
To distinguish generated SR samples from true HR samples, the invention trains a discriminator network D0 whose overall framework follows the architecture summarized by A. Radford, L. Metz, and S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," 2015. LeakyReLU activation (A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, 2013, vol. 30, no. 1, p. 3) is used, max pooling is avoided throughout the network, and strided convolutions reduce the image resolution each time the number of features increases. A special residual block is then connected, containing two convolutional layers and a LeakyReLU layer; the output of the last convolution unit is fed into a dense layer with a sigmoid activation function to obtain the real/fake result.
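The filter-count/resolution progression of a DCGAN-style discriminator as described (filters growing from 64 to 1024, with a strided convolution halving the resolution at each doubling) can be sketched as follows; the exact block layout is an assumption:

```python
def d0_progression(input_size, base_filters=64, max_filters=1024):
    # Each time the number of filters doubles (64 -> ... -> 1024),
    # a strided convolution halves the spatial resolution.
    stages = []
    filters, size = base_filters, input_size
    while filters <= max_filters:
        stages.append((filters, size))
        filters *= 2
        size //= 2
    return stages

print(d0_progression(256))
# [(64, 256), (128, 128), (256, 64), (512, 32), (1024, 16)]
```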
The network layer structure and parameters of discriminator D0 are shown in the following table.
As a further preference, generator G1 contains 16 residual blocks.
As a further preference, the output format and parameters of each layer of generator G1 are as follows.
As a further preference, the network structure of the second-stage discriminator network D1 is similar to that of discriminator D0.
As a further preference, the layer structure and network parameters of discriminator network D1 are as follows.
as a further preference, the second-stage reconstruction loss function includes three partial countermeasure losses, an image fidelity loss, and a feature fidelity loss.
As a further preference, the formula (5) adopted by the second-stage reconstruction loss function is:

L2 = l′1·L_adv1 + l′2·L_mse1 + l′3·L_feature    (5)
where the weights {l′_i} are trade-off parameters that balance the loss components.
The first term L_adv1 is the adversarial loss between the GAN generator G1 and the discriminator D1; the second term L_mse1 is the image fidelity loss; the third term L_feature is the feature fidelity loss. Following J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," 2016, the invention defines the feature-space distance as the feature fidelity loss, to encourage the reconstructed image to retain features similar to those of the original image:

L_feature = (1 / (W·H·C)) Σ (F(I_HR1) − F(I_SR1))²

where W, H, and C are the height, width, and number of channels of the image, and F(x) denotes the feature-space function: a pre-trained VGG-19 network that maps the image into feature space, see K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014. The fourth pooling layer is used to calculate the L2 distance between feature activations as the feature fidelity loss.
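The structure of the feature fidelity loss can be sketched independently of the actual VGG-19 weights; `fake_features` below is a hypothetical stand-in (plain average pooling) for the feature map a pre-trained network would produce:

```python
import numpy as np

def fake_features(img, pool=4):
    # Hypothetical stand-in for a pre-trained VGG-19 feature map:
    # here, simple average pooling over pool x pool patches.
    h, w = img.shape[0] // pool, img.shape[1] // pool
    return img[:h * pool, :w * pool].reshape(h, pool, w, pool).mean(axis=(1, 3))

def feature_loss(i_hr, i_sr, extract=fake_features):
    # Feature fidelity loss: L2 distance in feature space, as in
    # the Johnson et al. perceptual loss referenced above.
    f_hr, f_sr = extract(i_hr), extract(i_sr)
    return np.mean((f_hr - f_sr) ** 2)

hr = np.random.rand(32, 32)
print(feature_loss(hr, hr))  # identical images -> 0.0
```

Swapping `extract` for a real VGG-19 forward pass (e.g. truncated at the fourth pooling layer) recovers the loss the patent describes.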
A two-stage generative adversarial network framework is proposed that reconstructs super-resolution images through restoration of edge-structure information and retention of feature information. In the first stage, the invention combines image fidelity loss, adversarial loss, and edge fidelity loss to preserve the edges of the image. In the second stage, the invention mines image visual features by combining adversarial loss, image fidelity loss, and feature fidelity loss. Edge-preserving infrared image super-resolution reconstruction is realized by iteratively updating the generator network and the discriminator network. Extensive experimental verification shows that, compared with many image reconstruction methods, the proposed method better reconstructs infrared super-resolution images.
By combining image fidelity loss, adversarial loss, feature fidelity loss, and edge fidelity loss, a multi-constraint loss function is designed; a high-resolution, sharp-edged reconstructed image is obtained by iteratively updating the networks to minimize this loss function.
The English nouns used in the present invention are explained as follows:
and (3) GAN: a Generative Adaptive Networks (GAN) is a deep learning model, and is one of the most promising methods for unsupervised learning in complex distribution in recent years. The model passes through (at least) two modules in the framework: the mutual game learning of the Generative Model (Generative Model) and the Discriminative Model (Discriminative Model) yields a reasonably good output. In the original GAN theory, it is not required that G and D are both neural networks, but only that functions that can be generated and discriminated correspondingly are fitted. Deep neural networks are generally used as G and D in practice. An excellent GAN application requires a good training method, otherwise the output may be unsatisfactory due to the freedom of neural network models.
VGG: Proposed by the Oxford University computer vision group (Visual Geometry Group). Its main contribution is investigating the influence of convolutional network depth on recognition accuracy on large-scale image sets: it constructs convolutional neural networks of various depths using small (3 × 3) convolution kernels and evaluates them, finally showing that network depths of 16-19 layers achieve better recognition accuracy. VGG-16 and VGG-19 are also commonly used to extract image features.
BN: batch standardization of Batch Normalization.
Compared with the prior art, the invention has the following beneficial effects:
high-resolution images with photorealistic details can be obtained;
the generative adversarial network is enhanced to better restore edge structure while maintaining infrared image detail information; to maintain the features and edge information of the image, a multi-constraint loss function for super-resolution reconstruction is provided. The proposed method is validated using images from publicly available datasets, and the performance of the invention is compared with other popular methods. The results prove that, compared with other methods, the network of the invention obtains infrared super-resolution reconstructed images that are more vivid and have clearer edges.
Drawings
FIG. 1 is a schematic block diagram of the present invention;
FIG. 2 is a network architecture diagram of the generator G0;
FIG. 3 is a diagram of a network structure of a discriminator D0;
FIG. 4 is a network architecture diagram of the generator G1;
in the figure: conv (convolution convolutional layer), prlu (linear rectification unit/linear rectification function), DeConv (convolution upsampling convolution), BN (batch normalization layer), prlu (Parametric rectification unit, linear rectification function with parameters), Elementwise Sum, pixelshuffle, Tanh (hyperbaric convolution, Hyperbolic tangent function), Restoration (reconstruction), LeakyReLU (modified linear rectification unit).
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-4, the detailed technical solution provided by the present invention is:
An infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network comprises two stacked GANs; the network decomposes the image generation process into two stages. First-stage GAN: in the first stage, a 128 × 128 low-resolution image ILR0 is input to the first-stage generator G0 to generate a fake 256 × 256 image ISR0, and the generated 256 × 256 image ISR0 is fed to the first-stage discriminator D0 together with the true 256 × 256 image IHR0 to discriminate real from fake. Second-stage GAN: in the second stage, the generated 256 × 256 image ISR0 is input to the second-stage generator G1 to generate a 512 × 512 image ISR1; the generated 512 × 512 image ISR1 is sent to the second-stage discriminator D1 together with the real 512 × 512 image IHR1, and real versus fake is discriminated. Inspired by H. Zhang, T. Xu, and H. Li, "StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks," in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, and H. Zhang et al., "StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks," IEEE Transactions on Pattern Analysis & Machine Intelligence, pp. 1-1, 2018, the invention proposes a simple and effective two-layer generative adversarial network. The invention follows the network design of Q. Mao, S. Wang, S. Wang, X. Zhang, and S. Ma, "Enhanced Image Decoding via Edge-Preserving Generative Adversarial Networks," in 2018 IEEE International Conference on Multimedia and Expo (ICME), 2018, and constructs a generator model G1 containing 16 residual blocks. The first stage adopts a reconstruction loss function comprising three parts: adversarial loss, image fidelity loss, and edge fidelity loss. The formula (1) adopted by the first-stage reconstruction loss function is:
L1 = l1·L_adv + l2·L_mse + l3·L_edge    (1)
The three parts of the reconstruction loss function capture different perceptual characteristics of the reconstructed image, with the aim of obtaining a visually more satisfactory reconstruction. The weights {l_i} are trade-off parameters that balance the loss components; the first part, L_adv, is the adversarial loss between the GAN generator G0 and the discriminator D0, and encourages the generator to produce more realistic high-resolution images by attempting to fool the discriminator network.
The formula (2) adopted for L_adv is:

L_adv = −log D0(G0(I_LR0))    (2)

where D0(G0(I_LR0)) is the estimated probability that the reconstructed image G0(I_LR0) is discriminated as a true HR image; to obtain better gradients, the invention minimizes −log D0(G0(I_LR0)) instead of minimizing log(1 − D0(G0(I_LR0))).
As a further preference, the formula (3) adopted for L_mse is:

L_mse = (1 / (W·H·C)) Σ_{x,y,c} (I_HR0(x, y, c) − I_SR0(x, y, c))²    (3)

where W, H and C are the height, width and number of channels of the image; L_mse uses pixel-level MSE loss to ensure the fidelity of the restored image;
as a further preference, the formula (4) adopted for L_edge is:

L_edge = (1 / (W·H)) Σ_{x,y} (I_E(x, y) − Î_E(x, y))²    (4)

where W and H are the width and height of the image; L_edge is the edge fidelity loss, used to reproduce sharp edge information;
The labeled edge map IE is extracted from the real 256 × 256 image IHR0 by a specific edge filter; the corresponding edge map ÎE is extracted from the generated 256 × 256 image ISR0 by the same edge filter;
In the verification experiments of the invention, the Canny edge detection operator is selected; by minimizing the edge fidelity loss, the network continuously guides edge recovery;
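The edge fidelity loss can be sketched as follows; note that a simple Sobel gradient magnitude is substituted here for the Canny operator used in the patent's experiments, so the edge filter is an illustrative stand-in rather than the exact operator:

```python
import numpy as np

def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Gradient-magnitude edge map; a simple stand-in for the
    Canny operator selected in the verification experiments."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for y in range(1, h - 1):          # naive valid-region convolution
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            gx[y, x] = np.sum(patch * kx)
            gy[y, x] = np.sum(patch * ky)
    return np.hypot(gx, gy)

def edge_loss(i_hr: np.ndarray, i_sr: np.ndarray) -> float:
    """L_edge: mean squared difference between the two edge maps."""
    return float(np.mean((sobel_edges(i_hr) - sobel_edges(i_sr)) ** 2))

img = np.zeros((8, 8))
img[:, 4:] = 1.0                       # vertical step edge
print(edge_loss(img, img))             # 0.0 — identical edges, zero loss
```

A reconstruction that blurs the step edge would produce a weaker gradient magnitude and therefore a positive Ledge, which is exactly what the loss penalizes.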
The generator G0 comprises three convolution blocks and six residual blocks. The first convolution block comprises a convolutional layer and a PReLU layer; it is followed by the six residual blocks; each residual block contains two convolutional layers with kernel size 3 × 3 and 64 feature maps, two batch normalization layers and one PReLU layer; the convolution block following the residual blocks comprises a convolutional layer and a batch normalization layer; the last block contains a convolutional layer, an upsampling layer and a PReLU layer.
Features are extracted from the image using a stack of 6 residual blocks, each comprising two convolutional layers with kernel size 3 × 3 and 64 feature maps, two batch normalization layers and one PReLU layer. The invention follows the network design of C. Ledig et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," 2016, and introduces skip connections, which have been proved effective in training deep neural networks; the residual blocks proposed in K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," 2016, pp. 770-778, are adopted to construct the neural network.
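Under the residual-block configuration described above (two 3 × 3 convolutions with 64 feature maps plus two batch normalization layers), the parameter count can be estimated with simple bookkeeping; the per-channel PReLU slopes are omitted, so this is a sketch rather than an exact accounting of the patent's network:

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Weight (+ bias) count of one 2-D convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def residual_block_params(channels: int = 64, k: int = 3) -> int:
    """Two 3x3 convs with 64 feature maps + two batch-norm layers,
    matching the residual block described above (PReLU slopes omitted)."""
    conv = conv_params(k, channels, channels)
    bn = 2 * channels                  # scale (gamma) and shift (beta)
    return 2 * conv + 2 * bn

print(conv_params(3, 64, 64))          # 36928
print(residual_block_params())         # 74112 per residual block
```

Six such blocks therefore contribute roughly 445k parameters to the feature-extraction stack.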
The discriminator network comprises 10 convolution blocks; except for the first block, each block comprises a convolutional layer, a batch normalization layer and a LeakyReLU layer; the number of filter kernels increases continuously from 64, as in the VGG network [26] (K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014), to 1024 kernels;
To distinguish the generated SR samples from the true HR samples, the invention trains a discriminator network D0 whose overall framework follows the architecture summarized by A. Radford, L. Metz, and S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," 2015: LeakyReLU activation (A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, 2013, vol. 30, no. 1, p. 3) is used, max-pooling is avoided throughout the network, and strided convolutions reduce the image resolution each time the number of features increases. A special residual block is then connected, containing two convolutional layers and a LeakyReLU layer; the output of the last convolution unit is fed into dense layers with a sigmoid activation function to obtain the true/fake result.
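Two structural properties of this discriminator — the filter kernels doubling from 64 up to 1024, and strided convolutions halving the spatial resolution in place of max-pooling — can be sketched as follows; the kernel size, stride and padding values are assumed typical defaults and are not figures stated in the text:

```python
def filter_progression(start: int = 64, end: int = 1024) -> list:
    """Filter kernel counts doubling from 64 to 1024, as in the
    VGG-style discriminator described above."""
    counts = [start]
    while counts[-1] < end:
        counts.append(counts[-1] * 2)
    return counts

def strided_out(size: int, kernel: int = 3, stride: int = 2, pad: int = 1) -> int:
    """Output spatial size of a strided convolution, used here
    instead of max-pooling to reduce resolution."""
    return (size + 2 * pad - kernel) // stride + 1

print(filter_progression())            # [64, 128, 256, 512, 1024]
print(strided_out(256))                # 128 — one strided conv halves H and W
```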
The network layer structure and parameters of the discriminator D0 are shown in the following table,
As a further preference, the generator G1 contains 16 residual blocks.
As a further preference, the output format and parameters of each layer of the generator G1 are,
As a further preference, the second-stage discriminator network D1 adopts a network structure similar to that of the discriminator D0.
As a further preference, the layer structure and network parameters of the discriminator network D1 are,
As a further preference, the second-stage reconstruction loss function includes three parts: adversarial loss, image fidelity loss and feature fidelity loss.
As a further preference, the formula (5) adopted by the second-stage reconstruction loss function is:
L2 = l′1·Ladv1 + l′2·Lmse1 + l′3·Lfeature (5)
wherein the weights {l′i} are trade-off parameters that balance the loss components.
The first term Ladv1 is the adversarial loss between the generator G1 and the discriminator D1 of the GAN; the second term Lmse1 is the image fidelity loss; the third term Lfeature is the feature fidelity loss. With reference to J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," 2016, the invention defines a feature-space distance as the feature fidelity loss to encourage the reconstructed image to retain features similar to the original image:
Lfeature = 1/(W·H·C) Σ (f(IHR1) − f(ISR1))²

where W, H and C are the width, height and number of channels of the image, respectively, and f(x) is a feature-space function that maps the image into feature space — here a pre-trained VGG-19 network (K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014); the L2 distance of the feature activations at the fourth pooling layer is used as the feature fidelity loss function.
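A sketch of the feature fidelity loss; since the pre-trained VGG-19 extractor is not reproduced here, a simple average-pooling function stands in for f(x) — an explicit substitution that keeps the example dependency-free while preserving the structure of the loss:

```python
import numpy as np

def avg_pool_features(img: np.ndarray, k: int = 2) -> np.ndarray:
    """Stand-in feature map f(x); the patent uses VGG-19 features at
    the fourth pooling layer, substituted here by average pooling."""
    h, w = img.shape
    cropped = img[:h - h % k, :w - w % k]
    return cropped.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def feature_loss(i_hr: np.ndarray, i_sr: np.ndarray) -> float:
    """L_feature: mean squared L2 distance in feature space."""
    diff = avg_pool_features(i_hr) - avg_pool_features(i_sr)
    return float(np.mean(diff ** 2))

a = np.arange(16, dtype=np.float64).reshape(4, 4)
print(feature_loss(a, a))              # 0.0 — identical features
print(feature_loss(a, a + 1.0))        # 1.0 — uniform offset survives pooling
```

With a real feature extractor, images that differ in texture but share structure would score lower than a pixel loss alone suggests, which is the point of the perceptual term.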
A two-stage generative adversarial network framework is proposed that reconstructs super-resolution images by restoring edge structure information and retaining feature information. In the first stage, the invention combines image fidelity loss, adversarial loss and edge fidelity loss to preserve the edges of the image. In the second stage, the invention mines the visual features of the image by combining adversarial loss, image fidelity loss and feature fidelity loss. Edge-preserving infrared image super-resolution reconstruction is realized by iteratively updating the generator network and the discriminator network. Extensive experimental results show that the proposed method reconstructs infrared super-resolution images better than several existing image reconstruction methods.
By combining image fidelity loss, adversarial loss, feature fidelity loss and edge fidelity loss, a multi-constraint loss function is designed; a reconstructed image with high resolution and sharp edges is obtained by iteratively minimizing this loss function.
The English nouns used in the present invention are explained as follows:
GAN: A Generative Adversarial Network (GAN) is a deep learning model and one of the most promising methods for unsupervised learning on complex distributions in recent years. The model produces good output through the mutual game between (at least) two modules in the framework: the generative model and the discriminative model. The original GAN theory does not require G and D to be neural networks, only that they fit the corresponding generating and discriminating functions; in practice, deep neural networks are generally used for both. A good GAN application also requires a good training method, otherwise the output may be unsatisfactory due to the freedom of the neural network model.
VGG: The main contribution of the Oxford University Visual Geometry Group is to study the influence of convolutional neural network depth on recognition accuracy for large-scale image sets; using small 3 × 3 convolution kernels, network structures of various depths were constructed and evaluated, finally proving that depths of 16-19 layers achieve better recognition accuracy. VGG networks such as VGG-16 and VGG-19 are also commonly used to extract image features.
In the training process, 8862 pictures are selected from the training set of the thermal sensor data set FLIR_ADAS_1_3 released by the sensor system developer FLIR in 2018. First, all experimental data are down-sampled by a factor of 4×, reducing the HR images to obtain the LR images. All experiments were performed on a desktop computer with a 2.20 GHz × 40 Intel Xeon(R) Silver 4114 CPU, a GeForce GTX 1080Ti and 64 GiB of memory. The batch size is set to 4 and ADAM [28] with momentum term b = 0.9 is used as the optimizer. To keep the loss terms on the same order of magnitude and thus better balance the loss components, in formula (1) l1 is set to 10^-3 and l2 and l3 are set to 1; in formula (5) l′1 is set to 10^-3, l′2 is set to 1 and l′3 is set to 10^-6. When training the first-stage GAN, the learning rate is set to 10^-4, and it is reduced to 10^-5 for the second-stage GAN training period.
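The weighted multi-part losses can be sketched with the weight settings given above; only the weights are taken from the text, and the per-component loss magnitudes in the example are hypothetical:

```python
# Weights from the experimental settings; component magnitudes below
# are hypothetical placeholders.
weights_stage1 = {"adv": 1e-3, "mse": 1.0, "edge": 1.0}
weights_stage2 = {"adv": 1e-3, "mse": 1.0, "feature": 1e-6}

def total_loss(components: dict, weights: dict) -> float:
    """Weighted sum L = sum_i w_i * L_i; the small weight on the
    adversarial term keeps all terms on a comparable magnitude."""
    return sum(weights[k] * components[k] for k in weights)

stage1 = {"adv": 2.0, "mse": 0.01, "edge": 0.005}
print(total_loss(stage1, weights_stage1))  # ≈ 0.017
```

Note how the raw adversarial term (2.0) would otherwise dominate the pixel and edge terms by two orders of magnitude; the 10^-3 weight restores the balance.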
In order to verify the efficiency of the method proposed by the invention, the invention performs experimental verification on two public data sets: the verification set of FLIR _ ADAS _1_3 and the Itir _ v1_0 data set; the method of the invention and the most advanced methods SRCNN document D.Chao, C.L.Chen, K.He, and X.Tang, "Learning a derived responsive Network for Image Super-Resolution," in ECCV,2014, ESPCN W.Shi et al, "Real-Time Single Image and Video Super-Resolution Using electronic Sub-Pixel responsive Network New Network,"2016, SRGAN C.Ledi et al, "" Photo-reactive Single Image Super-Resolution Using a generic responsive additive Network, "2016, ESRGAN Q.Mao, S.Wang, S.S.S.S.S.converting, X.Zhang, S.M.processing," engineering-obtaining A derived responsive Network and intermediate application, IEEE 2018 + IEEE sample application, and version of the first version of the invention, 2020, pp.3637-3641; three images were selected from the FLIR _ ADAS _1_3 validation set and the Itir _ v1_0 data set, respectively, as shown under the subjective results of several methods of reconstruction; it is not difficult to see from the reconstruction result that the reconstruction result of the method provided by the invention generates finer texture and edge details;
table 3 comparison of the reconstruction results of images in the validation set using FLIR _ ADAS _1_ 3:
In Table 3, the first row is the original image, the second row is the reconstruction result of the SRCNN method, the third row that of the ESPCN method, the fourth row that of the SRGAN method, the fifth row that of the ESRGAN method, the sixth row that of the ESRGAN+ method, and the last row (Ours) is the reconstruction result of the proposed method;
table 4 comparison of reconstruction results using images in the Itir _ v1_0 dataset:
In Table 4, the first row is the original image, the second row is the reconstruction result of the SRCNN method, the third row that of the ESPCN method, the fourth row that of the SRGAN method, the fifth row that of the ESRGAN method, the sixth row that of the ESRGAN+ method, and the last row (Ours) is the reconstruction result of the proposed method;
For a fair quantitative comparison, the conventional objective indices PSNR (C. Yim and A. C. Bovik, "Quality Assessment of Deblocked Images," IEEE Trans. Image Process., vol. 20, no. 1, pp. 88-98, 2011) and SSIM (Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Trans. Image Process., vol. 13, no. 4, 2004) are used to evaluate the reconstructed image quality. Table 5 below shows the quantitative comparison of the different reconstruction methods, from which it can be concluded that the proposed method is superior to the SRCNN, ESPCN, SRGAN, ESRGAN and ESRGAN+ methods on both data sets;
table 5:
Table 5 compares the quantitative results of SRCNN, ESPCN, SRGAN, ESRGAN, ESRGAN+ and the proposed method on the FLIR_ADAS_1_3 validation set and the TNO data set.
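The PSNR index used in these comparisons can be computed directly from a pixel MSE; this is the standard definition, with an assumed peak value of 255 for 8-bit images:

```python
import math

def psnr(mse: float, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB computed from a pixel MSE."""
    if mse == 0:
        return float("inf")            # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

print(psnr(0.0))                       # inf — perfect reconstruction
print(psnr(255.0 ** 2 / 100.0))        # 20.0 dB
```

Higher PSNR indicates lower pixel error; SSIM, by contrast, compares local luminance, contrast and structure, which is why the two indices are reported together.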
The super-resolution results were further compared using high-level visual tasks:
Basic visual tasks, including image super-resolution reconstruction, serve high-level visual tasks. To further verify the method, the super-resolution images generated by several methods are matched against the real high-resolution images. The Scale-Invariant Feature Transform (SIFT) represents Gaussian image gradient statistics in the neighborhood of feature points and is a common local image feature extraction algorithm. In the matching results, the number of matching points can be used as a criterion of matching quality, and the corresponding matching points also indicate the similarity of the local features of the two images. Table 6 below shows the results of matching the super-resolution reconstructed images with the original high-resolution images by the SIFT algorithm; quantitatively, the reconstructed images generated by the proposed method obtain more correct matching pairs than the other methods;
table 6 super-resolution reconstructed image matching results, the left images are all the original high-resolution images:
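SIFT matching with Lowe's ratio test, the usual criterion for counting correct matching pairs, can be sketched on toy descriptors; the two-dimensional vectors below stand in for real 128-dimensional SIFT descriptors:

```python
import numpy as np

def match_descriptors(desc_a: np.ndarray, desc_b: np.ndarray,
                      ratio: float = 0.8) -> list:
    """Nearest-neighbour matching with Lowe's ratio test: accept a
    match only if the best distance is clearly smaller than the
    second-best, which filters ambiguous correspondences."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

a = np.array([[0.0, 0.0], [10.0, 10.0]])           # descriptors, image A
b = np.array([[0.1, 0.0], [10.0, 10.1], [50.0, 50.0]])  # descriptors, image B
print(match_descriptors(a, b))         # [(0, 0), (1, 1)]
```

A sharper super-resolution result yields more repeatable descriptors and therefore more pairs surviving this ratio test, which is what Table 6 counts.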
In the experiments, image target detection is performed using the classic YOLO method (J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," 2015). As can be seen from Table 7, the super-resolution reconstructed images generated by the proposed method yield better detection results, and more targets can be detected;
table 7 target detection results of super-resolution reconstructed images:
The recombination of the above implementation steps achieves the technical effects expected by the present invention.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. An infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network, characterized by comprising two layers of generative adversarial networks, wherein the networks decompose the generation process of an image into two stages. First-stage GAN: in the first stage, the low-resolution image is taken as input and sent to the first-stage generator to generate a fake image; the generated fake image and a real image are sent to the first-stage discriminator to discriminate true from fake. Second-stage GAN: in the second stage, the image generated in the first stage is taken as input and sent to the second-stage generator to generate an image; the generated image is sent to the second-stage discriminator together with the real image to discriminate authenticity.
2. The method for edge-preserving super-resolution reconstruction of infrared images based on generation of countermeasure networks as claimed in claim 1, wherein the first stage employs a reconstruction loss function.
3. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein the first-stage reconstruction loss function comprises three parts: adversarial loss, image fidelity loss and edge fidelity loss.
4. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 3, wherein the formula adopted by the first-stage reconstruction loss function is:
L1 = l1·Ladv + l2·Lmse + l3·Ledge,
wherein the weights {li} are trade-off parameters that balance the loss components; the first part Ladv is the adversarial loss between the generator G0 and the discriminator D0 of the GAN.
5. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 4, wherein the formula used for Ladv is: Ladv = −Σ log D0(ISR0), where D0(ISR0) is the estimated probability that the reconstructed image is discriminated as a true HR image.
6. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 4, wherein the formula used for Lmse is: Lmse = 1/(W·H·C) Σ (IHR0 − ISR0)², where W, H and C are the width, height and number of channels of the image, respectively; Lmse is a pixel-level MSE loss used to ensure the fidelity of the restored image.
7. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 4, wherein the formula used for Ledge is: Ledge = 1/(W·H) Σ (IE − ÎE)², where W and H are the width and height of the image; Ledge is the edge fidelity loss used to reproduce sharp edge information.
8. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein the network structure of the generator G0 comprises three convolution blocks and six residual blocks; the first convolution block comprises a convolutional layer and a PReLU layer; it is followed by the six residual blocks; each residual block contains two convolutional layers with kernel size 3 × 3 and 64 feature maps, two batch normalization layers and one PReLU layer; the convolution block following the residual blocks comprises a convolutional layer and a batch normalization layer; the last block contains a convolutional layer, an upsampling layer and a PReLU layer.
9. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein the parameters and output formats of the layers of the generator G0 network structure are XXX, [XXX, XXX, XXX, XXX]; XXX is an integer.
10. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein the discriminator network comprises 10 convolution blocks; except for the first block, each block comprises a convolutional layer, a batch normalization layer and a LeakyReLU layer; the number of filter kernels increases continuously from 64, as in the VGG network, to 1024 kernels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111269585.3A CN113920015A (en) | 2021-10-29 | 2021-10-29 | Infrared image edge preserving super-resolution reconstruction method based on generation countermeasure network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111269585.3A CN113920015A (en) | 2021-10-29 | 2021-10-29 | Infrared image edge preserving super-resolution reconstruction method based on generation countermeasure network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113920015A true CN113920015A (en) | 2022-01-11 |
Family
ID=79243463
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111269585.3A Pending CN113920015A (en) | 2021-10-29 | 2021-10-29 | Infrared image edge preserving super-resolution reconstruction method based on generation countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113920015A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114463449A (en) * | 2022-01-12 | 2022-05-10 | 武汉大学 | Hyperspectral image compression method based on edge guide |
-
2021
- 2021-10-29 CN CN202111269585.3A patent/CN113920015A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110119780B (en) | Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network | |
Lei et al. | Coupled adversarial training for remote sensing image super-resolution | |
CN110706157B (en) | Face super-resolution reconstruction method for generating confrontation network based on identity prior | |
CN106952228B (en) | Super-resolution reconstruction method of single image based on image non-local self-similarity | |
Wang et al. | Ultra-dense GAN for satellite imagery super-resolution | |
CN112734646B (en) | Image super-resolution reconstruction method based on feature channel division | |
CN112001847A (en) | Method for generating high-quality image by relatively generating antagonistic super-resolution reconstruction model | |
CN113096017B (en) | Image super-resolution reconstruction method based on depth coordinate attention network model | |
Huang et al. | Deep hyperspectral image fusion network with iterative spatio-spectral regularization | |
Hayat | Super-resolution via deep learning | |
CN106920214B (en) | Super-resolution reconstruction method for space target image | |
CN110136060B (en) | Image super-resolution reconstruction method based on shallow dense connection network | |
CN111640059B (en) | Multi-dictionary image super-resolution method based on Gaussian mixture model | |
CN108765280A (en) | A kind of high spectrum image spatial resolution enhancement method | |
Li et al. | Image super-resolution with parametric sparse model learning | |
CN109272452A (en) | Learn the method for super-resolution network in wavelet field jointly based on bloc framework subband | |
Chen et al. | MICU: Image super-resolution via multi-level information compensation and U-net | |
CN109949217B (en) | Video super-resolution reconstruction method based on residual learning and implicit motion compensation | |
CN111489405B (en) | Face sketch synthesis system for generating confrontation network based on condition enhancement | |
Bao et al. | SCTANet: A spatial attention-guided CNN-transformer aggregation network for deep face image super-resolution | |
Zhang et al. | Learning stacking regressors for single image super-resolution | |
Xia et al. | Meta-learning-based degradation representation for blind super-resolution | |
CN110097499B (en) | Single-frame image super-resolution reconstruction method based on spectrum mixing kernel Gaussian process regression | |
Hua et al. | Dynamic scene deblurring with continuous cross-layer attention transmission | |
CN113920015A (en) | Infrared image edge preserving super-resolution reconstruction method based on generation countermeasure network |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||