CN113920015A - Infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network - Google Patents

Infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network

Info

Publication number
CN113920015A
Authority
CN
China
Prior art keywords
image
stage
edge
network
resolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111269585.3A
Other languages
Chinese (zh)
Inventor
汪洪桥
付光远
赵玉清
伍明
岳敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rocket Force University of Engineering of PLA
Original Assignee
Rocket Force University of Engineering of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rocket Force University of Engineering of PLA filed Critical Rocket Force University of Engineering of PLA
Priority to CN202111269585.3A priority Critical patent/CN113920015A/en
Publication of CN113920015A publication Critical patent/CN113920015A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4046 Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10048 Infrared image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network. The method comprises two cascaded generative adversarial networks, which decompose the image generation process into two stages. First-stage GAN: the low-resolution image is fed as input to the first-stage generator, which produces a fake image; the generated image and the real image are fed to the first-stage discriminator, which judges real versus fake. Second-stage GAN: the image generated in the first stage is fed as input to the second-stage generator, which produces a further image; the generated image, together with the real image, is fed to the second-stage discriminator, which again judges real versus fake. High-resolution images with photo-realistic details can be obtained using the method of the invention.

Description

Infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network
Technical Field
The invention relates to the technical field of image processing, in particular to an infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network.
Background
Infrared imaging is a passive, non-contact detection and recognition technology with advantages such as good concealment, strong penetration capability, resistance to electromagnetic interference, and low-light and night-vision capability. Besides its primary military applications, it is widely used in civil fields such as industry, agriculture, medicine, and police reconnaissance. However, infrared images suffer from a number of shortcomings, such as low resolution, low contrast, and blurred edges.
Because improving the hardware performance of an infrared imaging system requires improving the manufacturing process of the infrared detector, which demands enormous investments of manpower and capital and is difficult to achieve in the short term, improving infrared image quality by digital signal processing is an economical and effective approach.
Super-resolution reconstruction recovers a high-resolution image or sequence from a single low-resolution image or from multiple low-resolution frames (M. Protter, M. Elad, H. Takeda, et al., "Generalizing the Nonlocal-Means to Super-Resolution Reconstruction," IEEE Transactions on Image Processing, vol. 18, no. 1, p. 36, 2009). The methods fall into three categories: interpolation-based, reconstruction-based, and example-learning-based. Example-learning-based methods offer a flexible algorithm structure and richer detail at high magnification, and have become a research hotspot of super-resolution reconstruction in recent years.
Super-resolution reconstruction of visible-light images has been realized with convolutional neural networks (CNNs), which learn the mapping between low-resolution and high-resolution images through training on large amounts of data. For example, C. Dong, C. C. Loy, K. He, and X. Tang ("Image Super-Resolution Using Deep Convolutional Networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295-307, 2016) use a perceptual loss instead of the minimum mean-square error and learned upsampling instead of bicubic interpolation to achieve better results. Subsequently, researchers have proposed deeper, more hierarchical network architectures in pursuit of better results, such as the deeply-recursive convolutional network (J. Kim, J. K. Lee, and K. M. Lee, "Deeply-Recursive Convolutional Network for Image Super-Resolution," 2016, pp. 1637-1645) and the efficient sub-pixel convolutional network (W. Shi et al., "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network," 2016). In the field of machine learning, the study of generative models has long been a difficult problem.
The generative adversarial network (GAN; I. Goodfellow et al., "Generative adversarial networks," in Advances in Neural Information Processing Systems, 2014, pp. 2672-2680) was proposed to meet the need of many research and application fields for generative models; it uses only back-propagation, avoiding complex Markov chains. GANs are trained in an unsupervised fashion and generate clearer, more realistic samples. C. Ledig et al. ("Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," 2016) proposed a generative adversarial network for image super-resolution that can recover photo-realistic natural images from 4x downsampling; however, the details generated after magnification are usually accompanied by unpleasant artifacts. To further improve visual quality, X. Wang et al. ("ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks," 2018) proposed the Residual-in-Residual Dense Block (RRDB) network unit and improved the perceptual-domain loss. Q. Mao, S. Wang, S. Wang, X. Zhang, and S. Ma ("Enhanced Image Decoding via Edge-Preserving Generative Adversarial Networks," in 2018 IEEE International Conference on Multimedia and Expo (ICME), 2018) proposed a new generative adversarial framework to better recover the edge structure and texture information of compressed images. ESRGAN+ (N. C. Rakotonirina and A. Rasoanaivo, "ESRGAN+: Further Improving Enhanced Super-Resolution Generative Adversarial Network," in ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 3637-3641) designed a network architecture with a novel basic block to replace the basic structure used by the original ESRGAN. In the field of infrared images, researchers have mostly used sparse-coding methods to realize super-resolution reconstruction, e.g. C. Kraich and S. Pumrin, "Performance analysis on multi-frame image super-resolution," in 2014 International Electrical Engineering Congress (iEECON), 2014; Sun Yubao, Wei Zhihui, Xiao Liang, Zhang Zhengrong, and Lü Zhanli, "Image super-resolution algorithm with multi-form sparsity regularization," Acta Electronica Sinica, vol. 38, no. 12, pp. 2898-2903, 2010; Lian Qiusheng and Zhang Wei, "Super-resolution reconstruction algorithm based on classified sparse representation of image patches," Acta Electronica Sinica, vol. 40, no. 5, pp. 920-925, 2012; S. Yang, M. Wang, Y. Chen, and Y. Sun, "Single-Image Super-Resolution Reconstruction via Learned Geometric Dictionaries and Clustered Sparse Coding," IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4016-4028, 2012; Y. Tang, Y. Yuan, P. Yan, and X. Li, "Greedy regression in sparse coding space for single-image super-resolution," Journal of Visual Communication and Image Representation, vol. 24, no. 2, pp. 148-159, 2013.
In the prior art, although convolutional neural networks have achieved good results in single-image super-resolution reconstruction, the lack of detail, poor contrast, and blurred edges of infrared images mean that super-resolution reconstruction of infrared images with preserved edge structure and good visual quality remains challenging.
Disclosure of Invention
The invention aims to provide an infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network, which mainly solves the technical problems identified in the background art: it can recover vivid images with clear edges from 4x-downsampled infrared images, and improves the perceptual quality of the reconstructed image by better preserving the edge structure and predicting visually pleasing details.
The perceptual loss function proposed by the invention comprises: adversarial loss, image fidelity loss, feature loss, and edge fidelity loss. Experimental results show that the method can recover vivid, sharp-edged images from 4x-downsampled infrared images.
In order to achieve the purpose of the invention, the technical scheme provided by the invention is as follows:
an infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network comprises two cascaded generative adversarial networks, which decompose the image generation process into two stages. First-stage GAN: in the first stage, a low-resolution image I_LR0 is fed as input to the first-stage generator G0, which generates a fake image I_SR0; the generated image I_SR0 and the real image I_HR0 are both fed to the first-stage discriminator D0, which discriminates real from fake. Second-stage GAN: in the second stage, the generated image I_SR0 is fed as input to the second-stage generator G1, which generates an image I_SR1; the generated image I_SR1, together with the real image I_HR1, is fed to the second-stage discriminator D1, which discriminates real from fake.
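To make the data flow concrete, the cascade can be sketched in PyTorch as follows. This is a minimal illustration assuming single-channel 128x128 inputs; the stand-in generator bodies (plain bilinear upsampling) and all names are placeholders, not the patented architecture.

```python
import torch
import torch.nn as nn

class TwoStageSRGAN(nn.Module):
    """Cascade of two GAN generators: 128x128 -> 256x256 -> 512x512 (sketch).

    G0 and G1 stand in for the first- and second-stage generators of the
    description; during training each stage's output is judged by its own
    discriminator (D0, D1), which is omitted here.
    """
    def __init__(self, G0: nn.Module, G1: nn.Module):
        super().__init__()
        self.G0, self.G1 = G0, G1

    def forward(self, I_LR0: torch.Tensor):
        I_SR0 = self.G0(I_LR0)   # first stage: 2x upscale, edge-preserving losses
        I_SR1 = self.G1(I_SR0)   # second stage: 2x upscale, feature-preserving losses
        return I_SR0, I_SR1

# Hypothetical usage with trivial stand-in generators:
model = TwoStageSRGAN(
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
)
lr = torch.randn(1, 1, 128, 128)   # single-channel infrared input
sr0, sr1 = model(lr)               # shapes: (1, 1, 256, 256) and (1, 1, 512, 512)
```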
As a further preference, the first stage employs a reconstruction loss function.
As a further preference, the first-stage reconstruction loss function comprises three parts: adversarial loss, image fidelity loss, and edge fidelity loss.
As a further preference, the first-stage reconstruction loss function adopts formula (1):

$$L_1 = l_1 L_{adv} + l_2 L_{mse} + l_3 L_{edge} \qquad (1)$$

The three parts of the reconstruction loss capture different perceptual characteristics of the reconstructed image, with the aim of obtaining a more visually satisfactory result. The weights {l_i} are trade-off parameters that balance the loss components. The first part, L_adv, is the adversarial loss between the GAN's generator G0 and discriminator D0; it encourages the generator to produce more vivid high-resolution images by attempting to fool the discriminator network.
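As a rough sketch, the weighted combination of formula (1) translates directly into code. The weight values shown are those reported later in the embodiment (l_1 = 10^-3, l_2 = l_3 = 1); the tensor arguments and helper names are assumptions.

```python
import torch

def stage1_generator_loss(D0_fake: torch.Tensor,
                          I_SR0: torch.Tensor, I_HR0: torch.Tensor,
                          E_SR0: torch.Tensor, E_HR0: torch.Tensor,
                          l1: float = 1e-3, l2: float = 1.0, l3: float = 1.0):
    """L1 = l1*Ladv + l2*Lmse + l3*Ledge (formula (1)), as a sketch.

    D0_fake: discriminator outputs D0(G0(I_LR)) in (0, 1);
    E_SR0 / E_HR0: edge maps of the generated / real images (e.g. Canny).
    """
    L_adv = -torch.log(D0_fake + 1e-8).mean()   # minimize -log D(G(I_LR))
    L_mse = torch.mean((I_HR0 - I_SR0) ** 2)    # pixel-level image fidelity
    L_edge = torch.mean((E_HR0 - E_SR0) ** 2)   # edge fidelity
    return l1 * L_adv + l2 * L_mse + l3 * L_edge
```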
As a further preference, L_adv adopts formula (2):

$$L_{adv} = \sum_{n=1}^{N} -\log D\big(G(I_{LR,n})\big) \qquad (2)$$

where $D(G(I_{LR}))$ is the estimated probability that the reconstructed image $G(I_{LR})$ is a true HR image. To obtain better gradients, the invention minimizes $-\log D(G(I_{LR}))$ instead of minimizing $\log\big(1 - D(G(I_{LR}))\big)$.
As a further preference, L_mse adopts formula (3):

$$L_{mse} = \frac{1}{WHC}\sum_{x=1}^{W}\sum_{y=1}^{H}\sum_{z=1}^{C}\big(I_{HR,(x,y,z)} - G(I_{LR})_{(x,y,z)}\big)^{2} \qquad (3)$$

where W, H and C are the width, height and number of channels of the image. L_mse exploits a pixel-level MSE loss to ensure the fidelity of the restored image.
As a further preference, L_edge adopts formula (4):

$$L_{edge} = \frac{1}{WH}\sum_{x=1}^{W}\sum_{y=1}^{H}\big(I_{E,(x,y)} - \hat{I}_{E,(x,y)}\big)^{2} \qquad (4)$$

where W and H are the width and height of the image. L_edge is the edge fidelity loss, used to reproduce sharp edge information. The edge map $I_E$ is extracted by a specific edge filter from the real 256x256 image I_HR0, and $\hat{I}_E$ is extracted by the same edge filter from the 256x256 image I_SR0 generated by the generator G0. In the verification experiments of the invention the Canny edge detection operator is selected; by minimizing the edge fidelity loss, the network continuously guides the recovery of edges.
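A minimal OpenCV/PyTorch sketch of the edge fidelity term follows. Note that Canny is not differentiable, so a training implementation would either precompute and detach the edge maps or substitute a differentiable operator such as Sobel; the description does not specify this detail, and the thresholds below are arbitrary assumptions.

```python
import cv2
import numpy as np
import torch

def canny_edge_map(img: torch.Tensor, lo: int = 100, hi: int = 200) -> torch.Tensor:
    """Edge map I_E of a single-channel image tensor scaled to [0, 1] (sketch)."""
    arr = (img.squeeze().detach().cpu().numpy() * 255).astype(np.uint8)
    edges = cv2.Canny(arr, lo, hi)                # binary edge map, values 0 or 255
    return torch.from_numpy(edges.astype(np.float32) / 255.0)

def edge_fidelity_loss(I_SR: torch.Tensor, I_HR: torch.Tensor) -> torch.Tensor:
    """L_edge = mean squared difference of the two edge maps (formula (4))."""
    E_SR, E_HR = canny_edge_map(I_SR), canny_edge_map(I_HR)
    return torch.mean((E_HR - E_SR) ** 2)
```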
As a further preference, the network structure of the generator G0 comprises three convolution blocks and six residual blocks. The first convolution block comprises a convolutional layer and a PReLU layer and is followed by the six residual blocks; each residual block contains two convolutional layers with kernel size 3x3 and 64 feature maps, two batch normalization layers and one PReLU layer. The convolution block following the residual blocks comprises a convolutional layer and a batch normalization layer; the last convolution block contains a convolutional layer, an upsampling layer and a PReLU layer. The generator G0 extracts features from the image using the six residual blocks as a stack.
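The residual block described here (two 3x3 convolutions with 64 feature maps, two batch normalization layers, one PReLU, plus an element-wise skip connection) might be written in PyTorch as follows; this is a sketch under those stated assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block as described: Conv3x3-BN-PReLU-Conv3x3-BN plus skip."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)   # element-wise sum (skip connection)

# G0 stacks six of these blocks between the convolution blocks described above:
trunk = nn.Sequential(*[ResidualBlock(64) for _ in range(6)])
```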
As a further preference, the parameters and output formats of the layers of the generator G0 network structure are of the form XXX, [XXX, XXX, XXX, XXX], where XXX is an integer. (The layer table is rendered as images in the original document.)
As a further preference, the discriminator network comprises 10 convolution blocks; except for the first block, each block comprises a convolutional layer, a batch normalization layer and a LeakyReLU layer, and the number of filter kernels increases continuously from the 64 kernels of the VGG network to 1024 kernels. VGG is described in K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Computer Science, 2014.
To distinguish generated SR samples from true HR samples, the invention trains a discriminator network D0 whose overall framework follows the architectural guidelines summarized by A. Radford, L. Metz, and S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," 2015. LeakyReLU activation (A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, 2013, vol. 30, no. 1, p. 3) is used, max pooling is avoided throughout the network, and strided convolutions reduce the image resolution each time the number of features increases. A special residual block, containing two convolutional layers and a LeakyReLU layer, is then connected; the output of the last convolution unit is fed into a dense layer with a sigmoid activation function to obtain the real/fake result.
The layer structure and parameters of the discriminator D0 network are shown in a table in the original document (rendered as images).
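Because the layer table itself is not recoverable, the following PyTorch sketch is only a plausible reconstruction of D0 from the prose: ten convolution blocks, the first without batch normalization, filter counts growing from 64 to 1024, strided convolutions instead of max pooling, and a sigmoid dense head. The exact filter schedule is an assumption.

```python
import torch
import torch.nn as nn

def conv_block(c_in: int, c_out: int, stride: int) -> nn.Sequential:
    """Conv-BN-LeakyReLU block; stride 2 halves the resolution (no max pooling)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.2, inplace=True),
    )

class Discriminator(nn.Module):
    """Plausible D0: first block without BN, filters 64 -> 1024, sigmoid head."""
    def __init__(self, in_channels: int = 1):
        super().__init__()
        layers = [nn.Conv2d(in_channels, 64, 3, 1, 1), nn.LeakyReLU(0.2, True)]
        c = 64
        # Stride 2 whenever the channel count repeats, stride 1 when it grows.
        for c_next in (64, 128, 128, 256, 256, 512, 512, 1024, 1024):
            layers.append(conv_block(c, c_next, stride=2 if c_next == c else 1))
            c = c_next
        self.features = nn.Sequential(*layers)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(c, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))   # probability that x is a real HR image
```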
as a further preference, the generator G1Containing 16 residual blocks.
As a further preference, the output format and parameters of each layer of the generator G1 are shown in a table in the original document (rendered as images).
as a further preference, the discriminator network D of the second stage1Network architecture of (1) adopts and discriminators D0Similar network structure.
As a further preference, the layer structure and network parameters of the discriminator network D1 are shown in a table in the original document (rendered as images).
as a further preference, the second-stage reconstruction loss function includes three partial countermeasure losses, an image fidelity loss, and a feature fidelity loss.
As a further preference, the second-stage reconstruction loss function adopts formula (5):

$$L_2 = l'_1 L_{adv1} + l'_2 L_{mse1} + l'_3 L_{feature} \qquad (5)$$

where the weights {l'_i} are trade-off parameters that balance the loss components.
The first term, L_adv1, is the adversarial loss between the GAN's generator G1 and discriminator D1; the second term, L_mse1, is the image fidelity loss; the third term, L_feature, is the feature fidelity loss. Following J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," 2016, the invention defines a feature-space distance as the feature fidelity loss, to encourage the reconstructed image to retain features similar to those of the original image:

$$L_{feature} = \frac{1}{WHC}\sum_{x=1}^{W}\sum_{y=1}^{H}\sum_{z=1}^{C}\big(F(I_{HR})_{(x,y,z)} - F(I_{SR})_{(x,y,z)}\big)^{2}$$

where W, H, and C are the width, height, and number of channels of the feature maps, and F(x) denotes the feature-space function: a pre-trained VGG-19 network that maps the image into feature space (K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014). The fourth pooling layer is used, and the L2 distance between feature activations serves as the feature fidelity loss function.
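A hedged torchvision sketch of the feature fidelity term follows: features are taken from a pre-trained VGG-19 truncated after its fourth pooling layer (index 27 of torchvision's feature stack; the torchvision >= 0.13 weights API is assumed) and compared with an L2 distance.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

class FeatureFidelityLoss(nn.Module):
    """L_feature: L2 distance between VGG-19 activations (sketch)."""
    def __init__(self):
        super().__init__()
        vgg = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features
        # Keep layers up to and including the 4th max-pooling layer (index 27).
        self.extractor = nn.Sequential(*list(vgg.children())[:28]).eval()
        for p in self.extractor.parameters():
            p.requires_grad_(False)   # VGG stays frozen during training

    def forward(self, I_SR: torch.Tensor, I_HR: torch.Tensor) -> torch.Tensor:
        # VGG expects 3 channels; replicate the single infrared channel.
        if I_SR.shape[1] == 1:
            I_SR, I_HR = I_SR.repeat(1, 3, 1, 1), I_HR.repeat(1, 3, 1, 1)
        return torch.mean((self.extractor(I_HR) - self.extractor(I_SR)) ** 2)
```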
A two-stage generative adversarial network framework is thus proposed, which reconstructs super-resolution images through the restoration of edge structure information and the retention of feature information. In the first stage, the invention combines image fidelity loss, adversarial loss and edge fidelity loss to preserve the edges of the image. In the second stage, the invention mines the visual features of the image by combining adversarial loss, image fidelity loss and feature fidelity loss. Edge-preserving infrared image super-resolution reconstruction is realized by iteratively updating the generator network and the discriminator network. Extensive experimental results show that, compared with several image reconstruction methods, the proposed method reconstructs infrared super-resolution images better.
By combining image fidelity loss, adversarial loss, feature fidelity loss and edge fidelity loss, a multi-constraint loss function is designed; a reconstructed image with high resolution and sharp edges is obtained by iteratively minimizing this loss function.
The English terms used in the invention are explained as follows:
GAN: Generative Adversarial Network, a deep learning model and one of the most promising methods for unsupervised learning on complex distributions in recent years. The model produces good output through the mutual game-playing of (at least) two modules in its framework: the generative model and the discriminative model. The original GAN theory does not require G and D to be neural networks, only functions able to fit generation and discrimination; in practice, deep neural networks are generally used as G and D. A good GAN application requires a good training method, otherwise the freedom of the neural network model may make the output unsatisfactory.
VGG: the Visual Geometry Group of the University of Oxford. Its main contribution was to study the influence of convolutional network depth on the recognition accuracy for large-scale image sets, constructing and evaluating convolutional neural networks of various depths using small (3x3) convolution kernels, and finally showing that network depths of 16 to 19 layers obtain better recognition accuracy. VGG networks such as VGG-16 and VGG-19 are also commonly used to extract image features.
BN: Batch Normalization.
Compared with the prior art, the invention has the following beneficial effects:
high-resolution images with photo-realistic details can be obtained;
the generative adversarial network is enhanced to better restore the edge structure while maintaining the detail information of the infrared image, and a multi-constraint loss function for super-resolution reconstruction is proposed to maintain the features and edge information of the image. The proposed method is validated using images from publicly available datasets, and its performance is compared with other popular methods. The results show that, compared with other methods, the network of the invention obtains infrared super-resolution reconstructed images that are more vivid and have clearer edges.
Drawings
FIG. 1 is a schematic block diagram of the present invention;
FIG. 2 is a network architecture diagram of the generator G0;
FIG. 3 is a diagram of a network structure of a discriminator D0;
FIG. 4 is a network architecture diagram of the generator G1;
in the figure: conv (convolution convolutional layer), prlu (linear rectification unit/linear rectification function), DeConv (convolution upsampling convolution), BN (batch normalization layer), prlu (Parametric rectification unit, linear rectification function with parameters), Elementwise Sum, pixelshuffle, Tanh (hyperbaric convolution, Hyperbolic tangent function), Restoration (reconstruction), LeakyReLU (modified linear rectification unit).
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-4, the detailed technical solution provided by the present invention is:
an infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network comprises two cascaded generative adversarial networks, which decompose the image generation process into two stages. First-stage GAN: in the first stage, a 128x128 low-resolution image I_LR0 is input to the first-stage generator G0, which generates a fake 256x256 image I_SR0; the generated 256x256 image I_SR0 is fed to the first-stage discriminator D0 together with the true 256x256 image I_HR0, which discriminates real from fake. Second-stage GAN: in the second stage, the generated 256x256 image I_SR0 is input to the second-stage generator G1, which generates a 512x512 image I_SR1; the generated 512x512 image I_SR1 is fed to the second-stage discriminator D1 together with the real 512x512 image I_HR1, which discriminates real from fake. Inspired by StackGAN (H. Zhang, T. Xu, H. Li, et al., "StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks," in 2017 IEEE International Conference on Computer Vision (ICCV), 2017) and StackGAN++ (H. Zhang et al., "StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks," IEEE Transactions on Pattern Analysis & Machine Intelligence, pp. 1-1, 2018), the invention proposes a simple and effective two-layer generative adversarial network. Following the network design of Q. Mao, S. Wang, S. Wang, X. Zhang, and S. Ma, "Enhanced Image Decoding via Edge-Preserving Generative Adversarial Networks," in 2018 IEEE International Conference on Multimedia and Expo (ICME), 2018, a generator model G1 containing 16 residual blocks is constructed. The first stage adopts a reconstruction loss function comprising three parts: adversarial loss, image fidelity loss, and edge fidelity loss; the formula (1) adopted is:

$$L_1 = l_1 L_{adv} + l_2 L_{mse} + l_3 L_{edge} \qquad (1)$$

The three parts of the reconstruction loss capture different perceptual characteristics of the reconstructed image, with the aim of obtaining a more visually satisfactory result. The weights {l_i} are trade-off parameters that balance the loss components. The first part, L_adv, is the adversarial loss between the GAN's generator G0 and discriminator D0; it encourages the generator to produce more realistic high-resolution images by attempting to fool the discriminator network.
L_adv adopts formula (2):

$$L_{adv} = \sum_{n=1}^{N} -\log D\big(G(I_{LR,n})\big) \qquad (2)$$

where $D(G(I_{LR}))$ is the estimated probability that the reconstructed image $G(I_{LR})$ is a true HR image; to obtain better gradients, the invention minimizes $-\log D(G(I_{LR}))$ instead of minimizing $\log\big(1 - D(G(I_{LR}))\big)$.
As a further preference, L_mse adopts formula (3):

$$L_{mse} = \frac{1}{WHC}\sum_{x=1}^{W}\sum_{y=1}^{H}\sum_{z=1}^{C}\big(I_{HR,(x,y,z)} - G(I_{LR})_{(x,y,z)}\big)^{2} \qquad (3)$$

where W, H and C are the width, height and number of channels of the image; L_mse exploits a pixel-level MSE loss to ensure the fidelity of the restored image.
as a further preference, said LedgeThe formula (4) used is:
Figure BDA0003328222790000167
where W, H are the width and height of the image;
in the above formula, LedgeEdge fidelity loss to reproduce sharp edge information;
edge map of the mark IEFrom a real 256 x 256 picture IHR0Extracting the specific edge filter;
Figure BDA0003328222790000168
specific edge filter extraction on the 256 × 256 image ISR0 generated by the generator G0;
in the verification experiment of the invention, a Canny edge detection operator is selected, and the network continuously guides the edge recovery by minimizing the loss of the edge fidelity;
the generator G0Comprises three volume blocks and six residualsAnd (7) difference blocks. Wherein the first volume block comprises a convolutional layer and a PReLU layer; the first volume block is followed by six residual errors; each residual block contains two convolutional layers with a kernel size of 3 × 3 and 64 signatures, two bulk normalization layers and one PReLU layer; the convolution block following the residual block comprises a convolution layer and a batch standardization layer. The last convolutional layer contains a convolutional layer, an upsampling layer, and a PReLU layer.
Extracting features from the image using 6 residual blocks as a stack, each residual block comprising two convolution layers having a kernel size of 3 × 3 and 64 feature maps, two bulk normalization layers and one PReLU layer; the invention follows the Network design made by C.Ledig et al in C.Ledig et al, "Photo-reactive Single Image Super-Resolution Using a genetic adaptive Network,"2016, and introduces skip connection, which has been proved to be effective in training deep neural Network; the invention adopts the Residual blocks proposed in K.He, X.Zhang, S.ren, and J.Sun, Deep Residual Learning for Image registration.2016, pp.770-778 to construct the neural network; the invention here uses 6 residual blocks as a stack to extract features from the image;
the discriminator network comprises 10 convolution blocks, except the first block, the other blocks comprise convolution layers, batch normalization layers and LeakyReLU layers, the number of filter kernels is increased continuously from 64 of VGG network [26] K.Simony and A.Zisserman, version default conditional networks for large-scale image retrieval, and arXiv prediction arXiv:1409.1556,2014 to 1024 kernels;
to distinguish the generated SR samples from the true HR samples, the present invention trains a discriminator network D0The overall framework of (1) is an architecture framework conforming to RadFord et al A.Radford, L.Metz, and S.Chintala, "Unsupervised reconstruction of left with Deep relational generic adaptive Networks,"2015 summary, using LeakyReLU documents A.L.Maas, A.Y.Handun, and A.Y.Ng, "Rectifier nonlinear network resource models," in Proc.icll, 2013, vol.30, No.1, p.3. activation to avoid maximum pooling of the entire network, using fragmentation when the number increases, each time the number increasesConvolution to reduce image resolution; then connecting a special residual block, wherein the residual block is respectively provided with two convolution layers and a LeakyReLU layer; the output of the last convolution unit is fed into the dense layer with the S-shaped activation function to obtain the true and false results.
The layer structure and parameters of the discriminator D0 network are shown in a table in the original document (rendered as images).
as a further preference, the generator G1Containing 16 residual blocks.
As a further preference, the output format and parameters of each layer of the generator G1 are shown in a table in the original document (rendered as images).
as a further preference, the discriminator network D of the second stage1Network architecture of (1) adopts and discriminators D0Similar network structure.
As a further preference, the layer structure and network parameters of the discriminator network D1 are shown in a table in the original document (rendered as images).
as a further preference, the second-stage reconstruction loss function includes three partial countermeasure losses, an image fidelity loss, and a feature fidelity loss.
As a further preference, the second-stage reconstruction loss function adopts formula (5):

$$L_2 = l'_1 L_{adv1} + l'_2 L_{mse1} + l'_3 L_{feature} \qquad (5)$$

where the weights {l'_i} are trade-off parameters that balance the loss components.
The first term, L_adv1, is the adversarial loss between the GAN's generator G1 and discriminator D1; the second term, L_mse1, is the image fidelity loss; the third term, L_feature, is the feature fidelity loss. Following J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," 2016, the invention defines a feature-space distance as the feature fidelity loss, to encourage the reconstructed image to retain features similar to those of the original image:

$$L_{feature} = \frac{1}{WHC}\sum_{x=1}^{W}\sum_{y=1}^{H}\sum_{z=1}^{C}\big(F(I_{HR})_{(x,y,z)} - F(I_{SR})_{(x,y,z)}\big)^{2}$$

where W, H, and C are the width, height, and number of channels of the feature maps, and F(x) denotes the feature-space function: a pre-trained VGG-19 network that maps the image into feature space (K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014). The fourth pooling layer is used, and the L2 distance between feature activations serves as the feature fidelity loss function.
A two-stage generative adversarial network framework is thus proposed, which reconstructs super-resolution images through the restoration of edge structure information and the retention of feature information. In the first stage, the invention combines image fidelity loss, adversarial loss and edge fidelity loss to preserve the edges of the image. In the second stage, the invention mines the visual features of the image by combining adversarial loss, image fidelity loss and feature fidelity loss. Edge-preserving infrared image super-resolution reconstruction is realized by iteratively updating the generator network and the discriminator network. Extensive experimental results show that, compared with several image reconstruction methods, the proposed method reconstructs infrared super-resolution images better.
By combining image fidelity loss, adversarial loss, feature fidelity loss and edge fidelity loss, a multi-constraint loss function is designed; a reconstructed image with high resolution and sharp edges is obtained by iteratively minimizing this loss function.
The English terms used in the invention are explained as follows:
GAN: Generative Adversarial Network, a deep learning model and one of the most promising methods for unsupervised learning on complex distributions in recent years. The model produces good output through the mutual game-playing of (at least) two modules in its framework: the generative model and the discriminative model. The original GAN theory does not require G and D to be neural networks, only functions able to fit generation and discrimination; in practice, deep neural networks are generally used as G and D. A good GAN application requires a good training method, otherwise the freedom of the neural network model may make the output unsatisfactory.
VGG: the Visual Geometry Group of the University of Oxford. Its main contribution was to study the influence of convolutional network depth on the recognition accuracy for large-scale image sets, constructing and evaluating convolutional neural networks of various depths using small (3x3) convolution kernels, and finally showing that network depths of 16 to 19 layers obtain better recognition accuracy. VGG networks such as VGG-16 and VGG-19 are also commonly used to extract image features.
In the training process, 8862 pictures are selected from the training dataset of the thermal sensor dataset FLIR_ADAS_1_3 released by the sensor developer FLIR in 2018. First, all experimental data are downsampled by a factor of 4x: the HR images are reduced to obtain the LR images. All experiments were performed on a desktop computer with a 2.20 GHz x 40 Intel Xeon(R) Silver 4114 CPU, a GeForce GTX 1080Ti, and 64 GiB of memory. The batch size is set to 4, and ADAM with momentum term b = 0.9 is used as the optimizer. To keep the loss terms in the same order of magnitude and thus better balance the loss components, in formula (1) l_1 is set to 10^-3 and l_2, l_3 are set to 1; in formula (5), l'_1 is set to 10^-3, l'_2 is set to 1, and l'_3 is set to 10^-6. When training the first-stage GAN, the invention sets the learning rate to 10^-4, and reduces the learning rate for the second-stage GAN training period to 10^-5.
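Restated as code, the training configuration above might look as follows; beyond the stated batch size, momentum term and learning rates, the Adam construction details are assumptions.

```python
import torch

def make_optimizers(G: torch.nn.Module, D: torch.nn.Module, stage: int):
    """Adam with beta1 = 0.9; lr = 1e-4 for the stage-1 GAN, 1e-5 for stage 2."""
    lr = 1e-4 if stage == 1 else 1e-5
    opt_G = torch.optim.Adam(G.parameters(), lr=lr, betas=(0.9, 0.999))
    opt_D = torch.optim.Adam(D.parameters(), lr=lr, betas=(0.9, 0.999))
    return opt_G, opt_D

BATCH_SIZE = 4
STAGE1_WEIGHTS = dict(l1=1e-3, l2=1.0, l3=1.0)    # formula (1): adv / mse / edge
STAGE2_WEIGHTS = dict(l1=1e-3, l2=1.0, l3=1e-6)   # formula (5): adv / mse / feature
```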
To verify the efficiency of the proposed method, experimental verification was performed on two public datasets: the validation set of FLIR_ADAS_1_3 and the Itir_v1_0 dataset. The method of the invention is compared with the state-of-the-art methods SRCNN (D. Chao, C. L. Chen, K. He, and X. Tang, "Learning a Deep Convolutional Network for Image Super-Resolution," in ECCV, 2014), ESPCN (W. Shi et al., "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network," 2016), SRGAN (C. Ledig et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," 2016), ESRGAN (X. Wang et al., "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks," 2018) and ESRGAN+ (N. C. Rakotonirina and A. Rasoanaivo, "ESRGAN+: Further Improving Enhanced Super-Resolution Generative Adversarial Network," in ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 3637-3641). Three images were selected from the FLIR_ADAS_1_3 validation set and from the Itir_v1_0 dataset, respectively, and reconstructed by the several methods for subjective comparison; from the reconstruction results it is easy to see that the proposed method produces finer texture and edge details.
Table 3. Comparison of reconstruction results for images from the FLIR_ADAS_1_3 validation set. (The entries are result images, rendered in the original document.)
In Table 3, the first row is the original image, the second row is the reconstruction result of the SRCNN method, the third row that of the ESPCN method, the fourth row that of the SRGAN method, the fifth row that of the ESRGAN method, the sixth row that of the ESRGAN+ method, and the last row (Ours) is the reconstruction result of the proposed method.
Table 4. Comparison of reconstruction results for images from the Itir_v1_0 dataset. (The entries are result images, rendered in the original document.)
In Table 4, the first row is the original image, the second row is the reconstruction result of the SRCNN method, the third row that of the ESPCN method, the fourth row that of the SRGAN method, the fifth row that of the ESRGAN method, the sixth row that of the ESRGAN+ method, and the last row (Ours) is the reconstruction result of the proposed method.
For a fair quantitative comparison, the conventional objective indices PSNR (C. Yim and A. C. Bovik, "Quality Assessment of Deblocked Images," IEEE Transactions on Image Processing, vol. 20, no. 1, pp. 88-98, 2011) and SSIM (Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, 2004) were used to evaluate the quality of the reconstructed images. Table 5 below shows the quantitative comparison of the different reconstruction methods; it can be concluded that the proposed method is superior to the SRCNN, ESPCN, SRGAN, ESRGAN and ESRGAN+ methods on both datasets.
Table 5. Quantitative comparison of SRCNN, ESPCN, SRGAN, ESRGAN, ESRGAN+ and the proposed method on the FLIR_ADAS_1_3 validation set and the TNO dataset. (The numerical entries are rendered as images in the original document.)
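For reference, the PSNR and SSIM indices can be computed with scikit-image; a minimal sketch, assuming 8-bit grayscale images:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(sr: np.ndarray, hr: np.ndarray) -> tuple[float, float]:
    """PSNR and SSIM between a reconstructed and a ground-truth uint8 image."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, data_range=255)
    return psnr, ssim
```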
The super-resolution results were further compared using high-level vision tasks:
basic vision tasks, including image super-resolution reconstruction, ultimately serve high-level vision tasks. To further verify the method, the super-resolution images generated by several methods were matched against the real high-resolution images. The Scale-Invariant Feature Transform (SIFT) represents Gaussian image-gradient statistics in the neighborhood of feature points and is a common local feature extraction algorithm. In the matching results, the number of matching points can serve as a criterion of matching quality, and the corresponding matching points also indicate the similarity of local features between the two images. Table 6 below shows the results of matching the super-resolution reconstructed images with the original high-resolution images by the SIFT algorithm; quantitatively, the reconstructed images generated by the proposed method obtain more correct matching pairs than the other methods.
Table 6. Matching results for the super-resolution reconstructed images; the left image in each pair is the original high-resolution image. (The entries are rendered as images in the original document.)
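A minimal OpenCV sketch of such SIFT-based matching follows; the Lowe ratio-test threshold of 0.75 is a common default, not a value from the patent.

```python
import cv2

def count_sift_matches(img_a, img_b, ratio: float = 0.75) -> int:
    """Number of ratio-test SIFT matches between two grayscale uint8 images."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des_a, des_b, k=2)
    # Lowe's ratio test: keep a match only if it is clearly better than the runner-up.
    return sum(1 for pair in matches
               if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance)
```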
In the experiments, image target detection is performed using the classic YOLO method (J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," 2015). As can be seen from Table 7, the super-resolution reconstructed images generated by the proposed method yield better detection results, with more targets detected.
Table 7. Target detection results on the super-resolution reconstructed images. (The entries are rendered as images in the original document.)
the recombination of the above implementation steps is the technical effect which can be expected by the present invention.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. An infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network, characterized by comprising two cascaded generative adversarial networks, which decompose the image generation process into two stages: a first-stage GAN, in which the low-resolution image is fed as input to a first-stage generator to generate a fake image, and the generated fake image and the real image are fed to a first-stage discriminator to discriminate real from fake; and a second-stage GAN, in which the generated image is fed as input to a second-stage generator to generate an image, and the generated image is fed, together with the real image, to a second-stage discriminator to discriminate real from fake.
2. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein the first stage employs a reconstruction loss function.
3. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein the first-stage reconstruction loss function comprises three parts: adversarial loss, image fidelity loss and edge fidelity loss.
4. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 3, wherein the first-stage reconstruction loss function adopts the formula:

$$L_1 = l_1 L_{adv} + l_2 L_{mse} + l_3 L_{edge}$$

wherein the weights {l_i} are trade-off parameters to balance the loss components; the first part, L_adv, is the adversarial loss between the generator G0 and the discriminator D0 of the GAN.
5. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 4, wherein L_adv adopts the formula:

$$L_{adv} = \sum_{n=1}^{N} -\log D\big(G(I_{LR,n})\big)$$

where $D(G(I_{LR}))$ is the estimated probability that the reconstructed image $G(I_{LR})$ is a true HR image; to obtain better gradients, $-\log D(G(I_{LR}))$ is minimized instead of $\log\big(1 - D(G(I_{LR}))\big)$.
6. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 4, wherein L_mse adopts the formula:

$$L_{mse} = \frac{1}{WHC}\sum_{x=1}^{W}\sum_{y=1}^{H}\sum_{z=1}^{C}\big(I_{HR,(x,y,z)} - G(I_{LR})_{(x,y,z)}\big)^{2}$$

where W, H and C are the width, height and number of channels of the image; L_mse exploits a pixel-level MSE loss to ensure the fidelity of the restored image.
7. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 4, wherein L_edge adopts the formula:

$$L_{edge} = \frac{1}{WH}\sum_{x=1}^{W}\sum_{y=1}^{H}\big(I_{E,(x,y)} - \hat{I}_{E,(x,y)}\big)^{2}$$

where W and H are the width and height of the image; L_edge is the edge fidelity loss, used to reproduce sharp edge information.
8. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein the network structure of the generator G0 comprises three convolution blocks and six residual blocks; the first convolution block comprises a convolutional layer and a PReLU layer and is followed by the six residual blocks; each residual block contains two convolutional layers with kernel size 3x3 and 64 feature maps, two batch normalization layers and one PReLU layer; the convolution block following the residual blocks comprises a convolutional layer and a batch normalization layer; the last convolution block contains a convolutional layer, an upsampling layer and a PReLU layer.
9. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein the parameters and output formats of the layers of the generator G0 network structure are of the form XXX, [XXX, XXX, XXX, XXX], where XXX is an integer.
10. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein the discriminator network comprises 10 convolution blocks; except for the first block, each block comprises a convolutional layer, a batch normalization layer and a LeakyReLU layer, and the number of filter kernels increases continuously from the 64 kernels of the VGG network to 1024 kernels.
CN202111269585.3A 2021-10-29 2021-10-29 Infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network Pending CN113920015A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111269585.3A CN113920015A (en) 2021-10-29 2021-10-29 Infrared image edge preserving super-resolution reconstruction method based on generation countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111269585.3A CN113920015A (en) 2021-10-29 2021-10-29 Infrared image edge preserving super-resolution reconstruction method based on generation countermeasure network

Publications (1)

Publication Number Publication Date
CN113920015A true CN113920015A (en) 2022-01-11

Family

ID=79243463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111269585.3A 2021-10-29 Infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network

Country Status (1)

Country Link
CN (1) CN113920015A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114463449A (en) * 2022-01-12 2022-05-10 武汉大学 Hyperspectral image compression method based on edge guide


Similar Documents

Publication Publication Date Title
CN110119780B (en) Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network
Lei et al. Coupled adversarial training for remote sensing image super-resolution
CN110706157B (en) Face super-resolution reconstruction method for generating confrontation network based on identity prior
CN106952228B (en) Super-resolution reconstruction method of single image based on image non-local self-similarity
Wang et al. Ultra-dense GAN for satellite imagery super-resolution
CN112734646B (en) Image super-resolution reconstruction method based on feature channel division
CN112001847A (en) Method for generating high-quality image by relatively generating antagonistic super-resolution reconstruction model
CN113096017B (en) Image super-resolution reconstruction method based on depth coordinate attention network model
Huang et al. Deep hyperspectral image fusion network with iterative spatio-spectral regularization
Hayat Super-resolution via deep learning
CN106920214B (en) Super-resolution reconstruction method for space target image
CN110136060B (en) Image super-resolution reconstruction method based on shallow dense connection network
CN111640059B (en) Multi-dictionary image super-resolution method based on Gaussian mixture model
CN108765280A (en) A kind of high spectrum image spatial resolution enhancement method
Li et al. Image super-resolution with parametric sparse model learning
CN109272452A (en) Learn the method for super-resolution network in wavelet field jointly based on bloc framework subband
Chen et al. MICU: Image super-resolution via multi-level information compensation and U-net
CN109949217B (en) Video super-resolution reconstruction method based on residual learning and implicit motion compensation
CN111489405B (en) Face sketch synthesis system for generating confrontation network based on condition enhancement
Bao et al. SCTANet: A spatial attention-guided CNN-transformer aggregation network for deep face image super-resolution
Zhang et al. Learning stacking regressors for single image super-resolution
Xia et al. Meta-learning-based degradation representation for blind super-resolution
CN110097499B (en) Single-frame image super-resolution reconstruction method based on spectrum mixing kernel Gaussian process regression
Hua et al. Dynamic scene deblurring with continuous cross-layer attention transmission
CN113920015A (en) Infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination