CN113920015A - Infrared image edge preserving super-resolution reconstruction method based on generation countermeasure network - Google Patents
- Publication number: CN113920015A (application CN202111269585.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- stage
- edge
- network
- resolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T3/4046 — Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06T3/4053 — Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T7/13 — Edge detection
- G06T2207/10048 — Infrared image
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
Abstract
The invention discloses an infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network (GAN). The method comprises two stacked GANs, which decompose the image generation process into two stages. First-stage GAN: the low-resolution image is taken as input and sent to the first-stage generator to generate a fake image; the generated image and the real image are then sent to the first-stage discriminator, which judges real versus fake. Second-stage GAN: the image generated in the first stage is taken as input and sent to the second-stage generator to generate a further image; the generated image is sent to the second-stage discriminator together with the real image, and authenticity is again discriminated. High-resolution images with photorealistic details can be obtained using the method of the invention.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to an infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network.
Background
Infrared imaging technology is a passive, non-contact means of detection and identification, with advantages such as good concealment, strong penetration capability, resistance to electromagnetic interference, and low-light/night-vision capability. Besides its primary military applications, it is widely used in civil fields such as industry, agriculture, medical treatment, and public-security reconnaissance. However, infrared images suffer from a number of shortcomings, such as low resolution, low contrast, and edge blurring.
Improving the hardware performance of an infrared imaging system requires improving the manufacturing process of the infrared detector, which demands huge investment of manpower and financial resources and is difficult to achieve in the short term. Improving infrared image quality by digital signal processing is therefore an economical and effective method.
Super-resolution reconstruction recovers a high-resolution image or sequence from a single frame or multiple frames of low-resolution images (Protter M, Elad M, Takeda H, et al., "Generalizing the Nonlocal-Means to Super-Resolution Reconstruction," IEEE Transactions on Image Processing, 2009, 18(1):36). Methods fall into three types: interpolation-based, reconstruction-based, and example-learning-based. Example-learning-based methods have a flexible algorithm structure, recover more detail at high magnification, and have become a research hotspot of super-resolution reconstruction in recent years.
Super-resolution reconstruction of visible-light images has been realized with convolutional neural networks (CNNs), which learn the mapping between low-resolution and high-resolution images through training on large amounts of data. For example, C. Dong, C. C. Loy, K. He, and X. Tang, "Image Super-Resolution Using Deep Convolutional Networks," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295-307, 2014, uses perceptual loss instead of minimum mean-square error, and learned upsampling instead of bicubic interpolation, to achieve better results. Researchers have since proposed deeper, more hierarchical network architectures in pursuit of better results, e.g. J. Kim, J. Lee, and K. Lee, "Deeply-Recursive Convolutional Network for Image Super-Resolution," 2016, pp. 1637-1645, and W. Shi et al., "Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network," 2016. In the field of machine learning, the study of generative models has long been a difficult problem.
Generative adversarial networks (GANs; I. Goodfellow et al., "Generative Adversarial Nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672-2680) were proposed to meet the demand of many research and application fields for generative models, using only back-propagation and avoiding complex Markov chains. Moreover, GANs adopt an unsupervised learning mode and generate clearer, more realistic samples. C. Ledig et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," 2016, proposed a generative adversarial network for image super-resolution capable of recovering photo-realistic natural images from 4× downsampling. However, the details generated by this method after magnification are usually accompanied by unpleasant artifacts; to further improve visual quality, X. Wang et al., "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks," 2018, proposed the Residual-in-Residual Dense Block (RRDB) network unit and improved the perceptual-domain loss. Q. Mao, S. Wang, X. Zhang, and S. Ma, "Enhanced Image Decoding via Edge-Preserving Generative Adversarial Networks," in 2018 IEEE International Conference on Multimedia and Expo (ICME), 2018, proposed a new generative adversarial framework to better recover the edge structure and texture information of compressed images. ESRGAN+ (N. C. Rakotonirina and A. Rasoanaivo, "ESRGAN+: Further Improving Enhanced Super-Resolution Generative Adversarial Network," in ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 3637-3641) designed a network architecture with novel basic blocks to replace the basic structure used by the original ESRGAN.
In the field of infrared images, researchers have mostly implemented super-resolution reconstruction with sparse-coding methods, e.g.:
C. Kraich and S. Pumrin, "Performance analysis on multi-frame image super-resolution via sparse representation," in 2014 International Electrical Engineering Congress (iEECON), 2014.
Sun Yibao, Wei Shihui, Xiao Liang, Zheng Rong, and Lu Zhanli, "Polymorphic sparsity regularized image super-resolution algorithm," Acta Electronica Sinica, vol. 38, no. 12, pp. 2898-2903, 2010.
Lian Qiusheng and Zhang Wei, "Super-resolution reconstruction algorithm based on sparse representation with image-block classification," Acta Electronica Sinica, vol. 40, no. 5, pp. 920-925, 2012.
S. Yang, M. Wang, Y. Chen, and Y. Sun, "Single-Image Super-Resolution Reconstruction via Learned Geometric Dictionaries and Clustered Sparse Coding," IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4016-4028, 2012.
Y. Tang, Y. Yuan, P. Yan, and X. Li, "Greedy regression in sparse coding space for single-image super-resolution," Journal of Visual Communication & Image Representation, vol. 24, no. 2, pp. 148-159, 2013.
In the prior art, convolutional neural networks have achieved good results in single-image super-resolution reconstruction; however, because infrared images lack detail, have poor contrast, and suffer from edge blurring, super-resolution reconstruction of infrared images with a clear edge structure and good visual quality remains challenging.
Disclosure of Invention
The invention aims to provide an infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network, which mainly solves the technical problems described in the background, can recover vivid, sharp-edged images from 4× downsampled infrared images, and improves the perceptual quality of reconstructed images by better preserving edge structure and predicting visually pleasing details.
The perceptual loss function proposed by the invention comprises: adversarial loss, image fidelity loss, feature fidelity loss, and edge fidelity loss. Experimental results show that the method can recover vivid, sharp-edged images from 4× downsampled infrared images.
In order to achieve the purpose of the invention, the technical scheme provided by the invention is as follows:
An infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network comprises two stacked GANs; the network decomposes the image generation process into two stages. First-stage GAN: in the first stage, a low-resolution image I_LR0 is taken as input and sent to the first-stage generator G0, which generates a fake image I_SR0; the generated image I_SR0 and the real image I_HR0 are both sent to the first-stage discriminator D0, which judges real versus fake. Second-stage GAN: in the second stage, the generated image I_SR0 is taken as input and sent to the second-stage generator G1, which generates an image I_SR1; the generated image I_SR1, together with the real image I_HR1, is sent to the second-stage discriminator D1, which again judges real versus fake.
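To make the two-stage data flow concrete, the following sketch traces image shapes through both stages with NumPy arrays; `upsample2x` and `discriminate` are hypothetical stand-ins for the trained networks G0/G1 and D0/D1, not the patent's actual models:

```python
import numpy as np

def upsample2x(img):
    # Placeholder generator body: nearest-neighbor 2x upscaling
    # stands in for a learned generator (G0 or G1).
    return img.repeat(2, axis=0).repeat(2, axis=1)

def discriminate(img):
    # Placeholder discriminator: returns a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-img.mean()))

# Stage 1: 128x128 LR input -> 256x256 SR estimate
i_lr0 = np.random.rand(128, 128).astype(np.float32)
i_sr0 = upsample2x(i_lr0)          # G0 output
p_fake0 = discriminate(i_sr0)      # D0 judges the generated image

# Stage 2: the stage-1 output becomes the stage-2 input -> 512x512
i_sr1 = upsample2x(i_sr0)          # G1 output
p_fake1 = discriminate(i_sr1)      # D1 judges the generated image

print(i_sr0.shape, i_sr1.shape)    # (256, 256) (512, 512)
```

In the real method each generator is trained against its own discriminator; only the resolution bookkeeping (128 → 256 → 512) is shown here.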
As a further preference, the first stage employs a reconstruction loss function.
As a further preference, the first-stage reconstruction loss function comprises three parts: adversarial loss, image fidelity loss, and edge fidelity loss.
As a further preference, the formula (1) adopted by the first-stage reconstruction loss function is:

L1 = l1·L_adv + l2·L_mse + l3·L_edge    (1)
The three parts of the reconstruction loss function capture different perceptual characteristics of the reconstructed image, with the aim of obtaining a visually more satisfactory reconstruction.
The weights {l_i} are trade-off parameters that balance the loss components. The first part, L_adv, is the adversarial loss between the GAN generator G0 and the discriminator D0; it encourages the generator to produce more vivid high-resolution images by attempting to fool the discriminator network.
As a further preference, the formula (2) adopted for L_adv is:

L_adv = −log D0(G0(I_LR0))    (2)

where D0(G0(I_LR0)) is the estimated probability that the reconstructed image G0(I_LR0) is discriminated as a true HR image. To obtain better gradients, the invention minimizes −log D0(G0(I_LR0)) instead of minimizing log(1 − D0(G0(I_LR0))).
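The gradient argument behind formula (2) can be illustrated numerically. The exact formula appears only as an image in the source, so the sketch below assumes the standard non-saturating GAN generator loss that the surrounding text describes:

```python
import numpy as np

def adv_loss_nonsaturating(d_fake):
    # Generator loss -log(D(G(x))): large when the discriminator
    # confidently rejects the fake (d_fake near 0), giving a
    # strong training signal.
    return -np.log(d_fake)

def adv_loss_saturating(d_fake):
    # Original minimax form log(1 - D(G(x))): nearly flat (close
    # to 0) exactly where the generator most needs a gradient.
    return np.log(1.0 - d_fake)

d_fake = 0.01  # discriminator almost certain the sample is fake
print(adv_loss_nonsaturating(d_fake), adv_loss_saturating(d_fake))
```

At `d_fake = 0.01` the non-saturating loss is about 4.6 while the saturating form is about −0.01, which is the "better gradients" motivation stated above.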
As a further preference, the formula (3) adopted for L_mse is:

L_mse = (1 / (W·H·C)) Σ_{x,y,c} (I_HR0(x, y, c) − I_SR0(x, y, c))²    (3)

where W, H and C are the height, width and number of channels of the image. L_mse uses pixel-level MSE loss to ensure the fidelity of the restored image.
As a further preference, the formula (4) adopted for L_edge is:

L_edge = (1 / (W·H)) Σ_{x,y} (I_E(x, y) − Î_E(x, y))²    (4)

where W and H are the width and height of the image. L_edge is the edge fidelity loss, used to reproduce sharp edge information.
The labeled edge map I_E is extracted by a specific edge filter from the real 256 × 256 image I_HR0; Î_E is extracted by the same edge filter from the 256 × 256 image I_SR0 generated by the generator G0.
In the verification experiments of the invention, the Canny edge-detection operator is selected; by minimizing the edge fidelity loss, the network can continuously guide edge recovery.
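The edge fidelity idea can be sketched as follows. The patent selects the Canny operator; here a simple gradient-magnitude edge map stands in for it (an assumption, to keep the example dependency-free), and the loss is the MSE between the edge maps of the real and generated images:

```python
import numpy as np

def grad_edges(img):
    # Central-difference gradient magnitude: a simple stand-in
    # for the Canny operator the patent selects.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    return np.hypot(gx, gy)

def edge_loss(i_hr, i_sr):
    # Edge fidelity loss: MSE between the edge maps of the
    # real and the generated image.
    e_hr, e_sr = grad_edges(i_hr), grad_edges(i_sr)
    return np.mean((e_hr - e_sr) ** 2)

# A sharp vertical step edge vs. a blurred version of it
sharp = np.zeros((8, 8)); sharp[:, 4:] = 1.0
blurred = np.zeros((8, 8))
blurred[:, 3:6] = [0.25, 0.5, 0.75]; blurred[:, 6:] = 1.0
print(edge_loss(sharp, blurred))  # > 0: blurring changes the edge map
```

Minimizing this term penalizes exactly the blurring that a plain pixel MSE tolerates.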
As a further preference, the network structure of generator G0 comprises three convolution blocks and six residual blocks. The first convolution block comprises a convolutional layer and a PReLU layer and is followed by the six residual blocks; each residual block contains two convolutional layers with kernel size 3 × 3 and 64 feature maps, two batch-normalization layers and one PReLU layer. The convolution block following the residual blocks comprises a convolutional layer and a batch-normalization layer; the last block contains a convolutional layer, an upsampling layer, and a PReLU layer.
Generator G0 extracts features from the image using a stack of 6 residual blocks, each containing two convolutional layers with kernel size 3 × 3 and 64 feature maps, two batch-normalization layers and one PReLU layer.
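Under the stated structure (a first conv block, six residual blocks with size-preserving 3 × 3 convolutions, a conv+BN block, and a final block with 2× upsampling), the output resolution of G0 can be traced layer by layer; stride and padding values other than the stated 3 × 3 kernels are assumptions:

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    # Spatial size after a convolution ("same" padding for k=3, s=1).
    return (size + 2 * pad - kernel) // stride + 1

def g0_output_size(lr_size):
    # Trace the spatial size through G0 as described above.
    s = conv_out(lr_size)             # first conv block (Conv + PReLU)
    for _ in range(6):                # six residual blocks
        s = conv_out(conv_out(s))     # two 3x3 convs each
    s = conv_out(s)                   # conv + BN block
    s = conv_out(s) * 2               # last block: conv + 2x upsample
    return s

print(g0_output_size(128))  # 256: G0 doubles the resolution
```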
As a further preference, the parameters and output formats of the layers of the generator G0 network structure are XXX, [XXX, XXX, XXX, XXX]; XXX is an integer.
As a further preference, the discriminator network includes 10 convolution blocks; except for the first block, each block comprises a convolutional layer, a batch-normalization layer, and a LeakyReLU layer, and the number of filter kernels increases continuously from 64 (as in the VGG network) to 1024. VGG is described in K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Computer Science, 2014.
To distinguish generated SR samples from true HR samples, the invention trains a discriminator network D0 whose overall framework follows the architecture summarized by A. Radford, L. Metz, and S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," 2015. LeakyReLU activation (A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, 2013, vol. 30, no. 1, p. 3) is used, max pooling is avoided throughout the network, and strided convolutions reduce the image resolution each time the number of features increases. A special residual block is then connected, containing two convolutional layers and a LeakyReLU layer; the output of the last convolution unit is fed into a dense layer with a sigmoid activation function to obtain the real/fake result.
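The filter-count/resolution progression of a DCGAN-style discriminator as described (filters growing from 64 to 1024, with a strided convolution halving the resolution at each doubling) can be sketched as follows; the exact block layout is an assumption:

```python
def d0_progression(input_size, base_filters=64, max_filters=1024):
    # Each time the number of filters doubles (64 -> ... -> 1024),
    # a strided convolution halves the spatial resolution.
    stages = []
    filters, size = base_filters, input_size
    while filters <= max_filters:
        stages.append((filters, size))
        filters *= 2
        size //= 2
    return stages

print(d0_progression(256))
# [(64, 256), (128, 128), (256, 64), (512, 32), (1024, 16)]
```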
The network layer structure and parameters of discriminator D0 are shown in the following table.
As a further preference, generator G1 contains 16 residual blocks.
As a further preference, the output format and parameters of each layer of generator G1 are as follows.
As a further preference, the network structure of the second-stage discriminator network D1 is similar to that of discriminator D0.
As a further preference, the layer structure and network parameters of discriminator network D1 are as follows.
as a further preference, the second-stage reconstruction loss function includes three partial countermeasure losses, an image fidelity loss, and a feature fidelity loss.
As a further preference, the formula (5) adopted by the second-stage reconstruction loss function is:

L2 = l′1·L_adv1 + l′2·L_mse1 + l′3·L_feature    (5)
where the weights {l′_i} are trade-off parameters that balance the loss components.
The first term L_adv1 is the adversarial loss between the GAN generator G1 and the discriminator D1; the second term L_mse1 is the image fidelity loss; the third term L_feature is the feature fidelity loss. Following J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," 2016, the invention defines the feature-space distance as the feature fidelity loss, to encourage the reconstructed image to retain features similar to those of the original image:

L_feature = (1 / (W·H·C)) Σ (F(I_HR1) − F(I_SR1))²

where W, H, and C are the height, width, and number of channels of the image, and F(x) denotes the feature-space function: a pre-trained VGG-19 network that maps the image into feature space, see K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014. The fourth pooling layer is used to calculate the L2 distance between feature activations as the feature fidelity loss.
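The structure of the feature fidelity loss can be sketched independently of the actual VGG-19 weights; `fake_features` below is a hypothetical stand-in (plain average pooling) for the feature map a pre-trained network would produce:

```python
import numpy as np

def fake_features(img, pool=4):
    # Hypothetical stand-in for a pre-trained VGG-19 feature map:
    # here, simple average pooling over pool x pool patches.
    h, w = img.shape[0] // pool, img.shape[1] // pool
    return img[:h * pool, :w * pool].reshape(h, pool, w, pool).mean(axis=(1, 3))

def feature_loss(i_hr, i_sr, extract=fake_features):
    # Feature fidelity loss: L2 distance in feature space, as in
    # the Johnson et al. perceptual loss referenced above.
    f_hr, f_sr = extract(i_hr), extract(i_sr)
    return np.mean((f_hr - f_sr) ** 2)

hr = np.random.rand(32, 32)
print(feature_loss(hr, hr))  # identical images -> 0.0
```

Swapping `extract` for a real VGG-19 forward pass (e.g. truncated at the fourth pooling layer) recovers the loss the patent describes.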
A two-stage generative adversarial network framework is proposed that reconstructs super-resolution images through restoration of edge-structure information and retention of feature information. In the first stage, the invention combines image fidelity loss, adversarial loss, and edge fidelity loss to preserve the edges of the image. In the second stage, the invention mines image visual features by combining adversarial loss, image fidelity loss, and feature fidelity loss. Edge-preserving infrared image super-resolution reconstruction is realized by iteratively updating the generator network and the discriminator network. Extensive experimental verification shows that, compared with many image reconstruction methods, the proposed method better reconstructs infrared super-resolution images.
By combining image fidelity loss, adversarial loss, feature fidelity loss, and edge fidelity loss, a multi-constraint loss function is designed; a high-resolution, sharp-edged reconstructed image is obtained by iteratively updating the networks to minimize this loss function.
The English nouns used in the present invention are explained as follows:
and (3) GAN: a Generative Adaptive Networks (GAN) is a deep learning model, and is one of the most promising methods for unsupervised learning in complex distribution in recent years. The model passes through (at least) two modules in the framework: the mutual game learning of the Generative Model (Generative Model) and the Discriminative Model (Discriminative Model) yields a reasonably good output. In the original GAN theory, it is not required that G and D are both neural networks, but only that functions that can be generated and discriminated correspondingly are fitted. Deep neural networks are generally used as G and D in practice. An excellent GAN application requires a good training method, otherwise the output may be unsatisfactory due to the freedom of neural network models.
VGG: Proposed by the Oxford University computer vision group (Visual Geometry Group). Its main contribution is investigating the influence of convolutional network depth on recognition accuracy on large-scale image sets: it constructs convolutional neural networks of various depths using small (3 × 3) convolution kernels and evaluates them, finally showing that network depths of 16-19 layers achieve better recognition accuracy. VGG-16 and VGG-19 are also commonly used to extract image features.
BN: batch standardization of Batch Normalization.
Compared with the prior art, the invention has the following beneficial effects:
high-resolution images with photorealistic details can be obtained;
the generative adversarial network is enhanced to better restore edge structure while maintaining infrared image detail information; to maintain the features and edge information of the image, a multi-constraint loss function for super-resolution reconstruction is provided. The proposed method is validated using images from publicly available datasets, and the performance of the invention is compared with other popular methods. The results prove that, compared with other methods, the network of the invention obtains infrared super-resolution reconstructed images that are more vivid and have clearer edges.
Drawings
FIG. 1 is a schematic block diagram of the present invention;
FIG. 2 is a network architecture diagram of the generator G0;
FIG. 3 is a diagram of a network structure of a discriminator D0;
FIG. 4 is a network architecture diagram of the generator G1;
in the figure: conv (convolution convolutional layer), prlu (linear rectification unit/linear rectification function), DeConv (convolution upsampling convolution), BN (batch normalization layer), prlu (Parametric rectification unit, linear rectification function with parameters), Elementwise Sum, pixelshuffle, Tanh (hyperbaric convolution, Hyperbolic tangent function), Restoration (reconstruction), LeakyReLU (modified linear rectification unit).
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-4, the detailed technical solution provided by the present invention is:
An infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network comprises two stacked GANs; the network decomposes the image generation process into two stages. First-stage GAN: in the first stage, a 128 × 128 low-resolution image ILR0 is input to the first-stage generator G0 to generate a fake 256 × 256 image ISR0, and the generated 256 × 256 image ISR0 is fed to the first-stage discriminator D0 together with the true 256 × 256 image IHR0 to discriminate real from fake. Second-stage GAN: in the second stage, the generated 256 × 256 image ISR0 is input to the second-stage generator G1 to generate a 512 × 512 image ISR1; the generated 512 × 512 image ISR1 is sent to the second-stage discriminator D1 together with the real 512 × 512 image IHR1, and real versus fake is discriminated. Inspired by H. Zhang, T. Xu, and H. Li, "StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks," in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, and H. Zhang et al., "StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks," IEEE Transactions on Pattern Analysis & Machine Intelligence, pp. 1-1, 2018, the invention proposes a simple and effective two-layer generative adversarial network. The invention follows the network design of Q. Mao, S. Wang, S. Wang, X. Zhang, and S. Ma, "Enhanced Image Decoding via Edge-Preserving Generative Adversarial Networks," in 2018 IEEE International Conference on Multimedia and Expo (ICME), 2018, and constructs a generator model G1 containing 16 residual blocks. The first stage adopts a reconstruction loss function comprising three parts: adversarial loss, image fidelity loss, and edge fidelity loss. The formula (1) adopted by the first-stage reconstruction loss function is:
L1 = l1·L_adv + l2·L_mse + l3·L_edge    (1)
The three parts of the reconstruction loss function capture different perceptual characteristics of the reconstructed image, with the aim of obtaining a visually more satisfactory reconstruction. The weights {l_i} are trade-off parameters that balance the loss components; the first part, L_adv, is the adversarial loss between the GAN generator G0 and the discriminator D0, and encourages the generator to produce more realistic high-resolution images by attempting to fool the discriminator network.
The formula (2) adopted for L_adv is:

L_adv = −log D0(G0(I_LR0))    (2)

where D0(G0(I_LR0)) is the estimated probability that the reconstructed image G0(I_LR0) is discriminated as a true HR image; to obtain better gradients, the invention minimizes −log D0(G0(I_LR0)) instead of minimizing log(1 − D0(G0(I_LR0))).
As a further preference, the formula (3) adopted for L_mse is:

L_mse = (1 / (W·H·C)) Σ_{x,y,c} (I_HR0(x, y, c) − I_SR0(x, y, c))²    (3)

where W, H and C are the height, width and number of channels of the image; L_mse uses pixel-level MSE loss to ensure the fidelity of the restored image;
as a further preference, the formula (4) adopted for L_edge is:

L_edge = (1 / (W·H)) Σ_{x,y} (I_E(x, y) − Î_E(x, y))²    (4)

where W and H are the width and height of the image; L_edge is the edge fidelity loss, used to reproduce sharp edge information;
The labeled edge map IE is extracted from the real 256 × 256 image IHR0 by a specific edge filter; the corresponding edge map ÎE is extracted from the generated 256 × 256 image ISR0 by the same edge filter;
In the verification experiments of the invention, the Canny edge detection operator is selected; by minimizing the edge fidelity loss, the network continuously guides edge recovery;
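The edge fidelity loss can be sketched as follows; note that a simple Sobel gradient magnitude is substituted here for the Canny operator used in the patent's experiments, so the edge filter is an illustrative stand-in rather than the exact operator:

```python
import numpy as np

def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Gradient-magnitude edge map; a simple stand-in for the
    Canny operator selected in the verification experiments."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for y in range(1, h - 1):          # naive valid-region convolution
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            gx[y, x] = np.sum(patch * kx)
            gy[y, x] = np.sum(patch * ky)
    return np.hypot(gx, gy)

def edge_loss(i_hr: np.ndarray, i_sr: np.ndarray) -> float:
    """L_edge: mean squared difference between the two edge maps."""
    return float(np.mean((sobel_edges(i_hr) - sobel_edges(i_sr)) ** 2))

img = np.zeros((8, 8))
img[:, 4:] = 1.0                       # vertical step edge
print(edge_loss(img, img))             # 0.0 — identical edges, zero loss
```

A reconstruction that blurs the step edge would produce a weaker gradient magnitude and therefore a positive Ledge, which is exactly what the loss penalizes.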
The generator G0 comprises three convolution blocks and six residual blocks. The first convolution block comprises a convolutional layer and a PReLU layer; it is followed by the six residual blocks; each residual block contains two convolutional layers with kernel size 3 × 3 and 64 feature maps, two batch normalization layers and one PReLU layer; the convolution block following the residual blocks comprises a convolutional layer and a batch normalization layer; the last block contains a convolutional layer, an upsampling layer and a PReLU layer.
Features are extracted from the image using a stack of 6 residual blocks, each comprising two convolutional layers with kernel size 3 × 3 and 64 feature maps, two batch normalization layers and one PReLU layer. The invention follows the network design of C. Ledig et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," 2016, and introduces skip connections, which have been proved effective in training deep neural networks; the residual blocks proposed in K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," 2016, pp. 770-778, are adopted to construct the neural network.
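Under the residual-block configuration described above (two 3 × 3 convolutions with 64 feature maps plus two batch normalization layers), the parameter count can be estimated with simple bookkeeping; the per-channel PReLU slopes are omitted, so this is a sketch rather than an exact accounting of the patent's network:

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Weight (+ bias) count of one 2-D convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def residual_block_params(channels: int = 64, k: int = 3) -> int:
    """Two 3x3 convs with 64 feature maps + two batch-norm layers,
    matching the residual block described above (PReLU slopes omitted)."""
    conv = conv_params(k, channels, channels)
    bn = 2 * channels                  # scale (gamma) and shift (beta)
    return 2 * conv + 2 * bn

print(conv_params(3, 64, 64))          # 36928
print(residual_block_params())         # 74112 per residual block
```

Six such blocks therefore contribute roughly 445k parameters to the feature-extraction stack.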
The discriminator network comprises 10 convolution blocks; except for the first block, each block comprises a convolutional layer, a batch normalization layer and a LeakyReLU layer; the number of filter kernels increases continuously from 64, as in the VGG network [26] (K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014), to 1024 kernels;
To distinguish the generated SR samples from the true HR samples, the invention trains a discriminator network D0 whose overall framework follows the architecture summarized by A. Radford, L. Metz, and S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks," 2015: LeakyReLU activation (A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, 2013, vol. 30, no. 1, p. 3) is used, max-pooling is avoided throughout the network, and strided convolutions reduce the image resolution each time the number of features increases. A special residual block is then connected, containing two convolutional layers and a LeakyReLU layer; the output of the last convolution unit is fed into dense layers with a sigmoid activation function to obtain the true/fake result.
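Two structural properties of this discriminator — the filter kernels doubling from 64 up to 1024, and strided convolutions halving the spatial resolution in place of max-pooling — can be sketched as follows; the kernel size, stride and padding values are assumed typical defaults and are not figures stated in the text:

```python
def filter_progression(start: int = 64, end: int = 1024) -> list:
    """Filter kernel counts doubling from 64 to 1024, as in the
    VGG-style discriminator described above."""
    counts = [start]
    while counts[-1] < end:
        counts.append(counts[-1] * 2)
    return counts

def strided_out(size: int, kernel: int = 3, stride: int = 2, pad: int = 1) -> int:
    """Output spatial size of a strided convolution, used here
    instead of max-pooling to reduce resolution."""
    return (size + 2 * pad - kernel) // stride + 1

print(filter_progression())            # [64, 128, 256, 512, 1024]
print(strided_out(256))                # 128 — one strided conv halves H and W
```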
The network layer structure and parameters of the discriminator D0 are shown in the following table,
As a further preference, the generator G1 contains 16 residual blocks.
As a further preference, the output format and parameters of each layer of the generator G1 are,
As a further preference, the second-stage discriminator network D1 adopts a network structure similar to that of the discriminator D0.
As a further preference, the layer structure and network parameters of the discriminator network D1 are,
As a further preference, the second-stage reconstruction loss function includes three parts: adversarial loss, image fidelity loss and feature fidelity loss.
As a further preference, the formula (5) adopted by the second-stage reconstruction loss function is:
L2 = l′1·Ladv1 + l′2·Lmse1 + l′3·Lfeature (5)
wherein the weights {l′i} are trade-off parameters that balance the loss components.
The first term Ladv1 is the adversarial loss between the generator G1 and the discriminator D1 of the GAN; the second term Lmse1 is the image fidelity loss; the third term Lfeature is the feature fidelity loss. With reference to J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," 2016, the invention defines a feature-space distance as the feature fidelity loss to encourage the reconstructed image to retain features similar to the original image:
Lfeature = 1/(W·H·C) Σ (f(IHR1) − f(ISR1))²

where W, H and C are the width, height and number of channels of the image, respectively, and f(x) is a feature-space function that maps the image into feature space — here a pre-trained VGG-19 network (K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014); the L2 distance of the feature activations at the fourth pooling layer is used as the feature fidelity loss function.
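A sketch of the feature fidelity loss; since the pre-trained VGG-19 extractor is not reproduced here, a simple average-pooling function stands in for f(x) — an explicit substitution that keeps the example dependency-free while preserving the structure of the loss:

```python
import numpy as np

def avg_pool_features(img: np.ndarray, k: int = 2) -> np.ndarray:
    """Stand-in feature map f(x); the patent uses VGG-19 features at
    the fourth pooling layer, substituted here by average pooling."""
    h, w = img.shape
    cropped = img[:h - h % k, :w - w % k]
    return cropped.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def feature_loss(i_hr: np.ndarray, i_sr: np.ndarray) -> float:
    """L_feature: mean squared L2 distance in feature space."""
    diff = avg_pool_features(i_hr) - avg_pool_features(i_sr)
    return float(np.mean(diff ** 2))

a = np.arange(16, dtype=np.float64).reshape(4, 4)
print(feature_loss(a, a))              # 0.0 — identical features
print(feature_loss(a, a + 1.0))        # 1.0 — uniform offset survives pooling
```

With a real feature extractor, images that differ in texture but share structure would score lower than a pixel loss alone suggests, which is the point of the perceptual term.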
A two-stage generative adversarial network framework is proposed that reconstructs super-resolution images by restoring edge structure information and retaining feature information. In the first stage, the invention combines image fidelity loss, adversarial loss and edge fidelity loss to preserve the edges of the image. In the second stage, the invention mines the visual features of the image by combining adversarial loss, image fidelity loss and feature fidelity loss. Edge-preserving infrared image super-resolution reconstruction is realized by iteratively updating the generator network and the discriminator network. Extensive experimental results show that the proposed method reconstructs infrared super-resolution images better than several existing image reconstruction methods.
By combining image fidelity loss, adversarial loss, feature fidelity loss and edge fidelity loss, a multi-constraint loss function is designed; a reconstructed image with high resolution and sharp edges is obtained by iteratively minimizing this loss function.
The English nouns used in the present invention are explained as follows:
GAN: A Generative Adversarial Network (GAN) is a deep learning model and one of the most promising methods for unsupervised learning on complex distributions in recent years. The model produces good output through the mutual game between (at least) two modules in the framework: the generative model and the discriminative model. The original GAN theory does not require G and D to be neural networks, only that they fit the corresponding generating and discriminating functions; in practice, deep neural networks are generally used for both. A good GAN application also requires a good training method, otherwise the output may be unsatisfactory due to the freedom of the neural network model.
VGG: The main contribution of the Oxford University Visual Geometry Group is to study the influence of convolutional neural network depth on recognition accuracy for large-scale image sets; using small 3 × 3 convolution kernels, network structures of various depths were constructed and evaluated, finally proving that depths of 16-19 layers achieve better recognition accuracy. VGG networks such as VGG-16 and VGG-19 are also commonly used to extract image features.
In the training process, 8862 pictures are selected from the training set of the thermal sensor data set FLIR_ADAS_1_3 released by the sensor system developer FLIR in 2018. First, all experimental data are down-sampled by a factor of 4×, reducing the HR images to obtain the LR images. All experiments were performed on a desktop computer with a 2.20 GHz × 40 Intel Xeon(R) Silver 4114 CPU, a GeForce GTX 1080Ti and 64 GiB of memory. The batch size is set to 4 and ADAM [28] with momentum term b = 0.9 is used as the optimizer. To keep the loss terms on the same order of magnitude and thus better balance the loss components, in formula (1) l1 is set to 10^-3 and l2 and l3 are set to 1; in formula (5) l′1 is set to 10^-3, l′2 is set to 1 and l′3 is set to 10^-6. When training the first-stage GAN, the learning rate is set to 10^-4, and it is reduced to 10^-5 for the second-stage GAN training period.
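The weighted multi-part losses can be sketched with the weight settings given above; only the weights are taken from the text, and the per-component loss magnitudes in the example are hypothetical:

```python
# Weights from the experimental settings; component magnitudes below
# are hypothetical placeholders.
weights_stage1 = {"adv": 1e-3, "mse": 1.0, "edge": 1.0}
weights_stage2 = {"adv": 1e-3, "mse": 1.0, "feature": 1e-6}

def total_loss(components: dict, weights: dict) -> float:
    """Weighted sum L = sum_i w_i * L_i; the small weight on the
    adversarial term keeps all terms on a comparable magnitude."""
    return sum(weights[k] * components[k] for k in weights)

stage1 = {"adv": 2.0, "mse": 0.01, "edge": 0.005}
print(total_loss(stage1, weights_stage1))  # ≈ 0.017
```

Note how the raw adversarial term (2.0) would otherwise dominate the pixel and edge terms by two orders of magnitude; the 10^-3 weight restores the balance.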
In order to verify the efficiency of the method proposed by the invention, the invention performs experimental verification on two public data sets: the verification set of FLIR _ ADAS _1_3 and the Itir _ v1_0 data set; the method of the invention and the most advanced methods SRCNN document D.Chao, C.L.Chen, K.He, and X.Tang, "Learning a derived responsive Network for Image Super-Resolution," in ECCV,2014, ESPCN W.Shi et al, "Real-Time Single Image and Video Super-Resolution Using electronic Sub-Pixel responsive Network New Network,"2016, SRGAN C.Ledi et al, "" Photo-reactive Single Image Super-Resolution Using a generic responsive additive Network, "2016, ESRGAN Q.Mao, S.Wang, S.S.S.S.S.converting, X.Zhang, S.M.processing," engineering-obtaining A derived responsive Network and intermediate application, IEEE 2018 + IEEE sample application, and version of the first version of the invention, 2020, pp.3637-3641; three images were selected from the FLIR _ ADAS _1_3 validation set and the Itir _ v1_0 data set, respectively, as shown under the subjective results of several methods of reconstruction; it is not difficult to see from the reconstruction result that the reconstruction result of the method provided by the invention generates finer texture and edge details;
table 3 comparison of the reconstruction results of images in the validation set using FLIR _ ADAS _1_ 3:
In Table 3, the first row is the original image, the second row is the reconstruction result of the SRCNN method, the third row that of the ESPCN method, the fourth row that of the SRGAN method, the fifth row that of the ESRGAN method, the sixth row that of the ESRGAN+ method, and the last row (Ours) is the reconstruction result of the proposed method;
table 4 comparison of reconstruction results using images in the Itir _ v1_0 dataset:
In Table 4, the first row is the original image, the second row is the reconstruction result of the SRCNN method, the third row that of the ESPCN method, the fourth row that of the SRGAN method, the fifth row that of the ESRGAN method, the sixth row that of the ESRGAN+ method, and the last row (Ours) is the reconstruction result of the proposed method;
For a fair quantitative comparison, the conventional objective indices PSNR (C. Yim and A. C. Bovik, "Quality Assessment of Deblocked Images," IEEE Trans. Image Process., vol. 20, no. 1, pp. 88-98, 2011) and SSIM (Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Trans. Image Process., vol. 13, no. 4, 2004) are used to evaluate the reconstructed image quality. Table 5 below shows the quantitative comparison of the different reconstruction methods, from which it can be concluded that the proposed method is superior to the SRCNN, ESPCN, SRGAN, ESRGAN and ESRGAN+ methods on both data sets;
table 5:
Table 5 compares the quantitative results of SRCNN, ESPCN, SRGAN, ESRGAN, ESRGAN+ and the proposed method on the FLIR_ADAS_1_3 validation set and the TNO data set.
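The PSNR index used in these comparisons can be computed directly from a pixel MSE; this is the standard definition, with an assumed peak value of 255 for 8-bit images:

```python
import math

def psnr(mse: float, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB computed from a pixel MSE."""
    if mse == 0:
        return float("inf")            # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

print(psnr(0.0))                       # inf — perfect reconstruction
print(psnr(255.0 ** 2 / 100.0))        # 20.0 dB
```

Higher PSNR indicates lower pixel error; SSIM, by contrast, compares local luminance, contrast and structure, which is why the two indices are reported together.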
The super-resolution results were further compared using high-level visual tasks:
Basic visual tasks, including image super-resolution reconstruction, serve high-level visual tasks. To further verify the method, the super-resolution images generated by several methods are matched against the real high-resolution images. The Scale-Invariant Feature Transform (SIFT) represents Gaussian image gradient statistics in the neighborhood of feature points and is a common local image feature extraction algorithm. In the matching results, the number of matching points can be used as a criterion of matching quality, and the corresponding matching points also indicate the similarity of the local features of the two images. Table 6 below shows the results of matching the super-resolution reconstructed images with the original high-resolution images by the SIFT algorithm; quantitatively, the reconstructed images generated by the proposed method obtain more correct matching pairs than the other methods;
table 6 super-resolution reconstructed image matching results, the left images are all the original high-resolution images:
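SIFT matching with Lowe's ratio test, the usual criterion for counting correct matching pairs, can be sketched on toy descriptors; the two-dimensional vectors below stand in for real 128-dimensional SIFT descriptors:

```python
import numpy as np

def match_descriptors(desc_a: np.ndarray, desc_b: np.ndarray,
                      ratio: float = 0.8) -> list:
    """Nearest-neighbour matching with Lowe's ratio test: accept a
    match only if the best distance is clearly smaller than the
    second-best, which filters ambiguous correspondences."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

a = np.array([[0.0, 0.0], [10.0, 10.0]])           # descriptors, image A
b = np.array([[0.1, 0.0], [10.0, 10.1], [50.0, 50.0]])  # descriptors, image B
print(match_descriptors(a, b))         # [(0, 0), (1, 1)]
```

A sharper super-resolution result yields more repeatable descriptors and therefore more pairs surviving this ratio test, which is what Table 6 counts.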
In the experiments, image target detection is performed using the classic YOLO method (J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," 2015). As can be seen from Table 7, the super-resolution reconstructed images generated by the proposed method yield better detection results, and more targets can be detected;
table 7 target detection results of super-resolution reconstructed images:
The recombination of the above implementation steps achieves the technical effects expected by the present invention.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. An infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network, characterized by comprising two layers of generative adversarial networks, wherein the networks decompose the generation process of an image into two stages. First-stage GAN: in the first stage, the low-resolution image is taken as input and sent to the first-stage generator to generate a fake image; the generated fake image and a real image are sent to the first-stage discriminator to discriminate true from fake. Second-stage GAN: in the second stage, the image generated in the first stage is taken as input and sent to the second-stage generator to generate an image; the generated image is sent to the second-stage discriminator together with the real image to discriminate authenticity.
2. The method for edge-preserving super-resolution reconstruction of infrared images based on generation of countermeasure networks as claimed in claim 1, wherein the first stage employs a reconstruction loss function.
3. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein the first-stage reconstruction loss function comprises three parts: adversarial loss, image fidelity loss and edge fidelity loss.
4. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 3, wherein the formula adopted by the first-stage reconstruction loss function is:
L1 = l1·Ladv + l2·Lmse + l3·Ledge,
wherein the weights {li} are trade-off parameters that balance the loss components; the first part Ladv is the adversarial loss between the generator G0 and the discriminator D0 of the GAN.
5. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 4, wherein the formula used for Ladv is: Ladv = −Σ log D0(ISR0), where D0(ISR0) is the estimated probability that the reconstructed image is discriminated as a true HR image.
6. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 4, wherein the formula used for Lmse is: Lmse = 1/(W·H·C) Σ (IHR0 − ISR0)², where W, H and C are the width, height and number of channels of the image, respectively; Lmse is a pixel-level MSE loss used to ensure the fidelity of the restored image.
7. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 4, wherein the formula used for Ledge is: Ledge = 1/(W·H) Σ (IE − ÎE)², where W and H are the width and height of the image; Ledge is the edge fidelity loss used to reproduce sharp edge information.
8. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein the network structure of the generator G0 comprises three convolution blocks and six residual blocks; the first convolution block comprises a convolutional layer and a PReLU layer; it is followed by the six residual blocks; each residual block contains two convolutional layers with kernel size 3 × 3 and 64 feature maps, two batch normalization layers and one PReLU layer; the convolution block following the residual blocks comprises a convolutional layer and a batch normalization layer; the last block contains a convolutional layer, an upsampling layer and a PReLU layer.
9. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein the parameters and output formats of the layers of the generator G0 network structure are XXX, [XXX, XXX, XXX, XXX]; XXX is an integer.
10. The infrared image edge-preserving super-resolution reconstruction method based on a generative adversarial network of claim 1, wherein the discriminator network comprises 10 convolution blocks; except for the first block, each block comprises a convolutional layer, a batch normalization layer and a LeakyReLU layer; the number of filter kernels increases continuously from 64, as in the VGG network, to 1024 kernels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111269585.3A CN113920015A (en) | 2021-10-29 | 2021-10-29 | Infrared image edge preserving super-resolution reconstruction method based on generation countermeasure network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111269585.3A CN113920015A (en) | 2021-10-29 | 2021-10-29 | Infrared image edge preserving super-resolution reconstruction method based on generation countermeasure network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113920015A true CN113920015A (en) | 2022-01-11 |
Family
ID=79243463
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111269585.3A Pending CN113920015A (en) | 2021-10-29 | 2021-10-29 | Infrared image edge preserving super-resolution reconstruction method based on generation countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113920015A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114463449A (en) * | 2022-01-12 | 2022-05-10 | 武汉大学 | Hyperspectral image compression method based on edge guide |
-
2021
- 2021-10-29 CN CN202111269585.3A patent/CN113920015A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110119780B (en) | Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network | |
Lei et al. | Coupled adversarial training for remote sensing image super-resolution | |
CN110706157B (en) | Face super-resolution reconstruction method for generating confrontation network based on identity prior | |
CN106952228B (en) | Super-resolution reconstruction method of single image based on image non-local self-similarity | |
Wang et al. | Ultra-dense GAN for satellite imagery super-resolution | |
CN112734646B (en) | Image super-resolution reconstruction method based on feature channel division | |
CN112001847A (en) | Method for generating high-quality image by relatively generating antagonistic super-resolution reconstruction model | |
CN113096017B (en) | Image super-resolution reconstruction method based on depth coordinate attention network model | |
Huang et al. | Deep hyperspectral image fusion network with iterative spatio-spectral regularization | |
Hayat | Super-resolution via deep learning | |
CN106920214B (en) | Super-resolution reconstruction method for space target image | |
CN110136060B (en) | Image super-resolution reconstruction method based on shallow dense connection network | |
CN111640059B (en) | Multi-dictionary image super-resolution method based on Gaussian mixture model | |
CN108765280A (en) | A kind of high spectrum image spatial resolution enhancement method | |
Li et al. | Image super-resolution with parametric sparse model learning | |
CN109272452A (en) | Learn the method for super-resolution network in wavelet field jointly based on bloc framework subband | |
Chen et al. | MICU: Image super-resolution via multi-level information compensation and U-net | |
CN109949217B (en) | Video super-resolution reconstruction method based on residual learning and implicit motion compensation | |
CN111489405B (en) | Face sketch synthesis system for generating confrontation network based on condition enhancement | |
Bao et al. | SCTANet: A spatial attention-guided CNN-transformer aggregation network for deep face image super-resolution | |
Zhang et al. | Learning stacking regressors for single image super-resolution | |
Xia et al. | Meta-learning-based degradation representation for blind super-resolution | |
CN110097499B (en) | Single-frame image super-resolution reconstruction method based on spectrum mixing kernel Gaussian process regression | |
Hua et al. | Dynamic scene deblurring with continuous cross-layer attention transmission | |
CN113920015A (en) | Infrared image edge preserving super-resolution reconstruction method based on generation countermeasure network |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||