CN110570353A

CN110570353A - Dense connection generation countermeasure network single image super-resolution reconstruction method

Info

Publication number: CN110570353A
Application number: CN201910797707.2A
Authority: CN
Inventors: 李素梅; 陈圣
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2019-08-27
Filing date: 2019-08-27
Publication date: 2019-12-13
Anticipated expiration: 2039-08-27
Also published as: CN110570353B

Abstract

The invention belongs to the field of video and image processing, and aims to further improve the reconstruction effect and the reconstruction precision of a high-resolution image and promote the structure of a generated countermeasure network and the improvement of a loss function, the method for reconstructing the super-resolution of a single image of the intensively connected and generated countermeasure network comprises two parts, namely a generation network and a countermeasure network, wherein the generation network adopts a basic framework of a residual error intensive network (RDN), the countermeasure network adopts deep convolution to generate a network frame of the countermeasure network DCGAN discriminator, the low-resolution image is used as input and sent to the generation network to be processed, the obtained output is sent to the countermeasure network for judgment, the judgment result is fed back to the generation network through the loss function, and circulating the steps until the confrontation network is judged to be qualified, generating a clear image by using the generation network, and then completing super-resolution reconstruction of the low-resolution image by using the trained generation network. The invention is mainly applied to the image processing occasion.

Description

dense connection generation countermeasure network single image super-resolution reconstruction method

Technical Field

the method belongs to the field of video and image processing, and relates to the improvement of an image super-resolution reconstruction algorithm, the fusion of a deep learning theory and image super-resolution reconstruction, the realization and the application of a dense residual convolution neural network and a generation countermeasure network in the field of high-resolution image reconstruction. In particular to a method for generating single image super-resolution reconstruction of a countermeasure network based on dense connection.

Background

Image super-resolution refers to the process of obtaining a corresponding high resolution image by using a sequence of single or multiple low resolution degraded images. In many practical applications in the field of image processing, people often want to acquire a high-resolution original image, because the high-resolution image means a higher pixel density, and can provide richer high-frequency detail information, thereby establishing a good foundation for post-processing of the image and accurate extraction and utilization of the image information. However, in reality, due to limitations of hardware imaging equipment and lighting conditions, interference of human or natural factors, and the like, different types of noise may be introduced during imaging, transmission, storage, and the like, and these factors directly affect the quality of an image, and it is often difficult to obtain a desired high-resolution image. Therefore, how to improve the quality of the acquired image and obtain a high-resolution image meeting the application requirements becomes a key research topic in image processing. Meanwhile, as a practical technology with strong specialty, the image super-resolution reconstruction has a very wide application prospect in the fields of biomedical [1], satellite remote sensing [2], medical images and public safety [3] and national defense military and science and technology, and is increasingly paid attention to by people. For example: the super-resolution reconstruction technology is adopted in a high-definition digital television signal application system, so that the transmission cost of signals can be further reduced, and meanwhile, the definition and the quality of pictures can be ensured. The method can often acquire multi-frame images about the same region in military images and satellite observation images, and can realize image observation higher than the system resolution ratio by adopting a super-resolution reconstruction technology based on the multi-frame images, thereby improving the target observation accuracy. The super-resolution technology is utilized in a medical imaging system (CT, Magnetic Resonance Imaging (MRI)) to improve the image quality, clearly present the details of a lesion target and assist the treatment of a patient. In public places such as banks, traffic intersections, markets and the like, more detailed information can be captured through super-resolution reconstruction of key parts of the monitoring images, and important clues are provided for processing of public safety events.

image super-resolution reconstruction is an image processing method with great practical application value, and the concept of the image super-resolution reconstruction is essentially originated from related research in the optical field, wherein the super-resolution means restoring image information beyond the diffraction limit of a spectrum. The concept of super-resolution was first explicitly proposed in the literature on radar research by Toraldo di Francia, while the complex principle of super-resolution for images was first proposed by Harris and Goodman in a method that becomes the spectral extrapolation of Harris-Goodman. Since the initial image super-resolution research is mainly performed on a single-frame image, and the super-resolution effect of the image is greatly limited, although many scholars propose methods for image restoration, the methods only achieve a good simulation effect under certain premise assumption conditions, and the effect is not ideal in practical application. In 1984, Tsai and Huang firstly propose a super-resolution reconstruction method based on multi-frame or sequence low-resolution images and a reconstruction method based on frequency domain approximation, so that the research of multi-frame image super-resolution reconstruction is greatly improved and developed. After decades of research and exploration, many kinds of specific reconstruction methods have emerged in the field of image super-resolution. The super-resolution reconstruction of images can be classified into a reconstruction method based on a single frame image and a reconstruction method based on a multi-frame sequential image according to the difference in the number of processed original low-resolution images. The former mainly uses the prior information of a single frame image to recover the lost high-frequency information when the image is acquired. The latter not only utilizes the prior information of a single frame image, but also considers fully utilizing complementary information among different frame images, and provides more complete and sufficient characteristic data for restoring high-frequency information of the image, so that the super-resolution restoration effect of the super-resolution restoration method is usually superior to that of the former. However, in most practical situations, it is sometimes difficult to acquire multi-frame captured images of the same scene, and the super-resolution research based on multi-frame images is also based on processing a single-frame image, so the super-resolution research based on a single-frame image has been a research hotspot in the field of image super-resolution. According to a specific implementation method, super-resolution reconstruction of an image can be divided into a frequency domain method and a spatial domain method. The frequency domain method is to remove aliasing of the spectrum in the frequency domain, thereby improving spatial resolution accordingly. The frequency domain methods that are currently more popular include energy continuous degradation and antialiasing reconstruction methods. The frequency domain method has the advantages of simple theory, low operation complexity and easy realization of parallel processing, and has the defects that the theoretical premise of the method is too ideal, the method cannot be effectively applied to most practical occasions, only can be limited to be applied to a simpler degradation model, and the included airspace priori knowledge is limited. The spatial domain method is wide in application range, has strong capacity of containing space prior constraints, and mainly comprises an iterative back projection method, a set theory method, a statistical recovery method and the like. And for an objective evaluation system of the image super-resolution reconstruction effect, as the most key evaluation indexes in the field of image super-resolution reconstruction, a peak signal-to-noise ratio (PSNR) and a Structural Similarity (SSIM) are key parameters for measuring and comparing the final reconstruction effect. The peak signal-to-noise ratio is the pixel value deviation of the high-resolution image obtained by accumulating calculation reconstruction pixel-by-pixel and the original real high-resolution image, so that the deviation degree of the high-resolution image and the original real high-resolution image on the whole pixel value is reflected, and the measurement unit is dB. The structural similarity focuses on comparing the similarity of the reconstructed image and the original image in the aspects of texture features, structural features and the like, the measurement result is a real number between 0 and 1, and generally, the closer to 1, the better the recovery performance of the reconstruction method on the image structure and the texture is, and the structural similarity of the reconstructed image and the original high-resolution image can be maintained to a greater extent. In addition, when the reconstruction effect of the reconstructed image is subjectively evaluated, the quality of various algorithms is evaluated by adopting a method of amplifying a local key area of the image and observing and comparing the recovery degree of different algorithms on image details and high-frequency information.

Currently, image super-resolution technology is mainly divided into three research directions: interpolation-based methods [16,17], reconstruction-based methods [18,19,20] and learning-based methods [21,22,23 ]. The interpolation-based method typically includes bilinear interpolation, bicubic interpolation, and the like. The method is simple and easy to implement, has relatively low complexity, and has relatively poor recovery effect on high-frequency effective information. The reconstruction-based method mainly comprises a convex set projection method, a Bayes analysis method, an iterative back projection method, a maximum posterior probability method, a regularization method, a mixing method and the like. The learning-based method mainly comprises an Example-based method, a neighborhood embedding method (neighbor embedding), a support vector regression method (SVR) and a sparse representation method (sparse representation).

The learning-based image reconstruction method can often obtain high-level information of more images and is beneficial to restoring high-frequency information of the images, so that the method is easier to obtain more ideal reconstruction results. Particularly, in recent years, with the emergence of artificial intelligence wave, the application of deep learning theory in the field of classical image processing is increasing, and excellent results superior to those of the traditional algorithm are obtained continuously. In view of this, relevant scientific researchers have conducted intensive research on the application of deep learning in the field of image super-resolution, and have obtained a great deal of research results. On the European computer vision conference (ECCV) in 2014, the director et al from the university of Chinese in hong Kong first proposed the idea of applying a Convolutional Neural Network (CNN) to the field of image super-resolution reconstruction, and realized the complete end-to-end mapping from a low-resolution image to a high-resolution image by constructing a simple three-layer CNN network SRCNN [4], wherein the feature extraction of the image, the nonlinear mapping of the low-resolution and high-resolution images, and the construction and output of the final image are all completely completed by a network, and the model parameters of the network are obtained by learning based on a large number of data sets. SRCNN 4 obtains an experimental effect obviously superior to that of the traditional super-resolution algorithm, proves a good application prospect of deep learning in the super-resolution field, and also indicates a new direction for the super-resolution research of images. After SRCNN 4, it aims at the problems of relatively shallow SRCNN 4 network layer, weak feature extraction and mapping capability, insufficient receptive field, slow convergence speed, etc. Kim et al, Korea seoul university, proposed a very deep super-resolution reconstruction network (VDSR 5) comprising 20 convolutional layers, which greatly increased the network's receptive field and enhanced the network's learning ability. And a global residual error structure is introduced into the network, so that the learning target of the network is changed into a residual error image, thereby greatly reducing the learning difficulty of the network and accelerating the convergence rate. Meanwhile, in order to control the number of parameters of the network, Kim et al try to utilize a recursive structure in DRCN [6] (deep recursive convolutional network), ensuring that the depth of the network is increased by widely using recursive blocks without introducing new parameters to be learned, and improving the reconstruction effect. In order to realize the scale enlargement of low-resolution images and effectively reduce the computation amount of the network, the dungeon et al leads the deconvolution network to be introduced into the improved FSRCNN [11] network, and the scale enlargement of the images is realized by the self parameter learning of the network. With the increase of the network depth, the learning effect of the network is seriously influenced by the reduction of the learning efficiency caused by the network degradation problem, and the ReNet [7] proposes a local residual error structure by He Kming et al, and effectively avoids the effect degradation caused by the over-depth of the network by creating a short shortcut connection in a local residual error block, thereby further enhancing the training speed and the learning effect of the network. On the basis of fully analyzing the networks of ResNet [7], DRCN [6], VDSR [5], Tai et al, the advantages of local and global residual errors and recursive structure are combined, a deep recursive residual error network (DRRN [8]) is proposed, and the reconstruction effect is remarkably improved. SRCNN, DRCN and DRRN all require preprocessing outside the network, and cannot realize end-to-end reconstruction from low resolution images to high resolution images, reducing the efficiency of the network. Sub-pixel convolution layers (sub-pixel volumetric layers) are introduced to achieve end-to-end image reconstruction Wenzhe Shi [9], and the like, and an up-sampling process is included in a network, so that the efficiency of a model is greatly improved. In 2017, Jia-Bin Huang [10] and the like utilize a deep Laplacian pyramid to realize rapid and accurate image super-resolution, the article combines the traditional Laplacian pyramid with a convolutional neural network, the network realizes the gradual amplification of low-resolution images, and parameter sharing among the pyramids of different levels is realized through recursion, so that the calculation amount is reduced, and the precision is effectively improved. With the extensive research of deep learning, the image reconstruction algorithm based on the convolutional neural network has greatly improved the precision and the speed. However, the reconstruction result has poor effect in areas with repeated textures, boundaries, corners and the like, and cannot meet the subjective vision of human eyes. SRGAN [11] appeared in 2016, and authors adopted an architecture based on generation of a countermeasure network and introduced a visual loss function. From the quantitative evaluation results, the evaluation value obtained by SRGAN is not very high. But subjectively the high resolution image generated by the SRGAN appears more realistic. The generation of SRGAN raises the research enthusiasm of generation of an antagonistic network in the field of image over-segmentation, Bingzhe Wu et al [12] in 2017 propose SRPGAN, construct a more stable visual loss function based on a discrimination network, and utilize a Charbonnier loss function as the content loss of a model. SRPGAN greatly improves the SSIM value of the reconstruction result. Wang [13] improved SRGAN, which uses Residual-in-Residual Dense Block (RRDB) with BN layer removed as a generation network, and the visual loss of SRGAN is based on an output feature map after an activation function, and in the model, authors adopt the feature map before activation to calculate the visual loss. The improved model has improved brightness and repeated texture area performance.

Disclosure of Invention

in order to overcome the defects of the prior art, the invention aims to utilize the capacity that the generation of the countermeasure network can effectively recover an eye-comfortable result and fully utilize a dense residual error structure closely related with the residual errors to realize the fast and accurate learning of the high-frequency characteristics of the image, further improve the reconstruction effect and the reconstruction precision of the high-resolution image, and simultaneously promote the structure of the generation of the countermeasure network and the improvement of a loss function to a certain extent and the deep application and development of the structure in the field of the super-resolution reconstruction of the image. The method comprises the steps of generating a network and a countermeasure network, wherein the network is generated by adopting a basic frame of a residual dense network RDN, 5 dense connecting block DCB blocks are used as basic modules, the countermeasure network is generated by adopting deep convolution to generate a network frame of a countermeasure network DCGAN discriminator, a low-resolution image is used as an input and sent to the generated network, the obtained output is sent to the countermeasure network for judgment after the network is processed, the judgment result is fed back to the generated network through a loss function, the process is circulated until the countermeasure network is judged to be qualified, the generated network can generate a clear image, and then the trained generated network is used for completing super-resolution reconstruction of the low-resolution image.

The training set is required to be manufactured and data is required to be preprocessed:

Firstly, down-sampling processing is carried out on an original high-resolution color image to obtain a corresponding low-resolution image, the low-resolution image obtained under the real condition is simulated by the low-resolution image, the low-resolution image is used as input, and then the down-sampling processing is carried out on the high-resolution image by utilizing a bicubic interpolation formula:

I_lr＝W(x)*I_hr

Wherein I_lrFor downsampling of the resulting low-resolution image, I_hrFor high resolution images, W (x) is a weight matrix for bicubic interpolation, according to I_lrAnd I_hrCalculating the distance x of the middle corresponding pixel point:

then the low resolution image I obtained by down sampling_lrAnd carrying out data normalization processing on the high-resolution image to obtain a normalized image matrix I_lrb:I_lrb＝I_lr/255，I_hrb:I_hrb＝I_hrAnd 255, randomly cutting the low-resolution image and the corresponding high-resolution image, using the cut of the low-resolution image which is manufactured finally for the input of the cascade residual error network, using the cut of the high-resolution image as a label of the network, and finishing the training of the neural network by using the manufactured training set.

the first two layers of the generated network basic framework are shallow feature extraction layers, and the size and the number of the cores are (3, 64); the middle is a feature extraction layer which consists of 5 DCB modules, the output of each module is sent to a concatant (Concatenated layer) layer, and the concatant layer is followed by a bottleneck layer with the core size and the number of (1, 64); then, making a residual error between the output of the bottleneck layer and the output of the first layer; and finally, an upper sampling layer is formed, and the size, the step length and the number of the cores are (6,2,2 and 3).

Each DCB block contains four convolutional layers Conv1, 2,3,4 and a bottleneck layer Conv5, after each convolutional layer there is a cascade operation to achieve dense connection in the residual, the bottleneck layer at the end of the DCB is a local feature fusion layer used for fusing a large number of feature maps;

The convolution kernel size of the four convolutional layers in the DCB is set to 3 x 3, the kernel size of the last bottleneck layer is set to 1 x 1,Suppose the input and output of the D-th DCB block are D, respectively_d-1And D_d，D_cDenoted as the 4 th concat [29]]The output of the layer, then:

D_c＝f_cat4f_cr4(f_cat3f_cr3(f_cat2f_cr2(f_cat1f_cr1(D_d-1)))) (1)

Wherein f is_criDenotes the convolution of the i (i ═ 1,2,3,4) th convolution layer with the ReLU layer, the ReLU activation operation, f_catiConcat [29] representing the ith (1,2,3,4) convolutional layer]Cascade operation, using f_borepresenting the convolution operation in the bottleneck layer, the output of the DCB is represented as:

D_d＝f_bo(D_c) (2)

The bottleneck layer in DCB is a local feature fusion operation for adaptively fusing D_d-1Features of the model and outputs of all convolutional layers in the current model.

In a Deep generation countermeasure network DCGAN (Deep rational generated adaptive Networks), a long-step convolution is used for replacing an upsampling layer, a normalization layer normalizes the output of a characteristic layer together, an activation function is adjusted in a discriminator to prevent gradient sparsity, the countermeasure network based on the DCGAN is composed of a convolution block, 6 CBL blocks and a Dense connection, LeakyReLU is used as an activation function delta in the CBL blocks, a full connection layer Dense1024 with the output of 1024 and a full connection layer Dense1 with the output of 1 are realized by the convolution layer, finally, an output value is obtained through a sigmoid function, the sizes of convolution kernels in the network are all 3 multiplied by 3, and the filling is all 1.

Loss functionThe method is formed by weighted combination of three parts:

The first part is l_{im age}Is based on the L1 norm loss function of the pixel points,Wherein G isⁱ(x) Representing the i-th low-resolution image X which is input and is obtained after passing through the generator and the resolution is improved, XⁱN represents the number of images for the corresponding original image; convolutional neural network VGG16[20]Content loss function l_VGG[23]，Result G obtained by training modelⁱ(x) From the original sharp image XⁱAre fed into pre-trained VGG16[ 20)]In the network of (1), the Euclidean distance, φ, between feature maps obtained by the kth convolutional layer is calculated_k,jRepresents VGG16[20]N represents the total amount of the feature maps output by the kth convolutional layer, and the content loss function can ensure that the contents of the two images are similar. Against loss l_D，

The invention has the characteristics and beneficial effects that:

The loss function of the SRGAN is improved, using the L1 norm for generating losses and the L2 norm for sensing losses. The dense residual structure is proposed as a generating network, and not only can the high-frequency abstract features of the picture be fully extracted, but also the low-level features can be reserved, so that the result can better meet the visual requirement. As shown in fig. 2, the test results on the reference data set show that the model comparison SRGAN achieves more excellent results in both objective indicators and subjective visual effects.

Description of the drawings:

Fig. 1 is based on generating an image super-resolution reconstruction model of a countermeasure network.

Figure 2 comparison of 4 times reconstruction results for different loss functions.

FIG. 3.4 times magnification compares our reconstruction results with LapSRN, VDSR, and SRGAN. We use color boxes to highlight sub-regions that contain rich details. We enlarge the sub-area in the lower box to show more detail. As can be seen from the subregion image, the method has strong capability of recovering high-frequency details and sharp edges.

FIG. 4 a dense connection module (DCB) architecture.

FIG. 5A specific structure of a CBL cell.

Detailed Description

Compared with the SRGAN, the method for generating the network adopts the Residual-in-Residual detect Block to extract high-level features. In contrast to SRPGAN, the content loss function employs a feature-based 1 norm. In contrast to ESRGAN, the generation network uses a global feature fusion layer before upsampling, and the activation function of the RRDB module uses relu. The experimental result shows that the generated picture has better visual effect.

As a classic topological structure in an artificial neural network, a convolutional neural network has extremely wide application in the fields of pattern recognition, image and voice information analysis and processing and the like. In the super-resolution reconstruction field of images, the Dong-super et al first proposes SRCNN 4 network, and after successfully applying Convolutional Neural Network (CNN) to the recovery and reconstruction of high-resolution images, many improved CNNs are proposed one after another and all the improved CNNs are obviously improved on key reconstruction effect evaluation indexes. However, the reconstruction result has poor effect in areas with repeated textures, boundaries, corners and the like, and cannot meet the subjective vision of human eyes. SRGAN [11] appeared in 2016, and authors adopted an architecture based on generation of a countermeasure network and introduced a perceptual loss function. From the quantitative evaluation results, the evaluation value obtained by SRGAN is not very high. But subjectively the high resolution image generated by the SRGAN appears more realistic.

Super resolution generation countermeasure networks (SRGANs) are a pioneering effort that can generate realistic textures during super resolution of a single image. However, since the loss function takes a pixel-based L2 norm, the details that create the illusion are often accompanied by unpleasant artifacts. To this end, we propose to generate a countermeasure network based on dense connections, as shown in fig. 1.

The model combines a Residual Dense Network (RDN) [14] basic framework model with a deep convolution-generated countermeasure network (DCGAN) [28] network. The network is generated by using the basic framework of the RDN [14] as reference and using 5 Dense Connection Block (DCB) blocks as basic modules, and the network framework of the DCGAN discriminator [28] is used as reference in the countermeasure network. Specific implementation procedures will be described below. The input and output of the model are all color images.

The image super-resolution reconstruction method based on the dense connection generation countermeasure network mainly relates to the following contents: the generation network of the model is based on a densely connected residual error structure, and the high-frequency characteristics of the input image are quickly and accurately learned by utilizing the close relation among the residual errors. Our countermeasure network is modeled as that of DCGAN [28 ]. The generation loss function of the generation network is adjusted to be L1 norm, the L1 cost function can obtain real texture characteristics conforming to human eye subjective characteristics, the perception loss function based on VGG is still based on L2 norm, and the combination of the two loss functions ensures that the reconstruction result is very close to the target image in low-level pixel value, high-level abstract characteristics and the whole. The loss function of the countermeasure network removes the original logarithm operation and ensures that the generator obtains the same distribution as the original data. The final reconstruction effect is greatly improved by generating the game of the network and the game of the countermeasure network, the work flow of the network is introduced in the following specific implementation link of the scheme, the detailed structure of the generated network is shown, and the final reconstruction effect is compared and analyzed.

Training a sample: the disclosed database VOC2012[24] is used herein for training of networks, the data set being a benchmark test applied to classification recognition and detection of visual objects, the picture set comprising 20 catalogs. The data set has good image quality and complete labels, and is very suitable for testing the performance of the algorithm. From this data set, 16,700 images were selected for training of the network and 100 images were selected for the validation set of the network. Experiments achieved 4 times up-sampling, and 22 × 22 low-resolution patches were obtained by processing the randomly cropped 88 × 88 sharp color image using bicubic interpolation as input to the network.

Testing a sample: the Set5[25], Set14[26] and BSD100[27] are used as a test data Set, the model directly processes three-channel input (RGB) images, and results show that the model can not only reconstruct and obtain results which accord with human eyes subjectivity when sampling factors are 2,4 and 8, but also objective evaluation indexes greatly exceed those of the conventional GAN network, and the model has great practical application value.

the method is explained in detail below with reference to the technical scheme:

After the model construction is completed, a proper optimization algorithm needs to be selected to minimize a loss function to obtain an optimal parameter, the weight and deviation of the model are updated by the model through an Adaptive Moment Estimation method (Adam: Adaptive motion Estimation), and the Adam algorithm is different from the traditional random gradient descent SGD. The stochastic gradient descent keeps a single learning rate updating all weights, and the learning rate does not change during the training process. Adam, in turn, designs independent adaptive learning rates for different parameters by computing first and second order moment estimates of the gradient. The algorithm parameters include: step size ε (default 0.001), the exponential decay rate ρ of the moment estimate₁And ρ₂(default 0.9 and 0.999), small constant δ for numerical stability (default 10: 10)^-8). Our implementation is based on a pytorech. We trained 3 models separately with scaling factors of 2,4, and 8, respectively.

The method comprises the following specific steps:

1 training set preparation and data preprocessing

Firstly, downsampling processing is carried out on an original high-resolution color image to obtain a corresponding low-resolution image, and the low-resolution image acquired under a real condition is simulated by using the low-resolution image as input. Then, utilizing a bicubic interpolation formula to perform downsampling processing on the high-resolution image:

I_lr＝W(x)*I_hr

Wherein I_lrFor downsampling of the resulting low-resolution image, I_hrfor high resolution images, W (x) is a weight matrix for bicubic interpolation, which can be based on I_lrAnd I_hrcalculating the distance x of the middle corresponding pixel point:

Since the image data is fed into the neural network for training, the low resolution image I obtained by down-sampling is required_lrAnd carrying out data normalization processing on the high-resolution image to obtain a normalized image matrix I_lrb:I_lrb＝I_lr/255，I_hrb:I_hrb＝I_hrAnd/255, randomly slicing the low resolution image and the corresponding high resolution image, wherein in our embodiment, all the low resolution image slices are set to 22 × 22, and the corresponding high resolution image is also sliced into small blocks of a predetermined size according to the magnification, for example, the high resolution image slices are 44 × 44 in the case of 2-magnification reconstruction, and 88 × 88 in the case of 4-magnification reconstruction. And finally, the manufactured cut blocks of the low-resolution images are used for inputting the cascade residual error network, the cut blocks of the high-resolution images are used as labels of the network, and the manufactured training set is used for finishing the training of the neural network.

2 generating network and dense connectivity Module (DCB) Structure analysis and training Process

The basic framework of the generated network is the same as RDN [14], namely the first two layers are shallow feature extraction layers, and the size and the number of cores are (3, 64); the middle is a feature extraction layer which is composed of 5 DCB modules, the output of each module is sent to a concat [29] layer, and the concat [29] layer is followed by a bottleneck layer with the core size and the number of (1, 64); then, making a residual error between the output of the bottleneck layer and the output of the first layer; and finally, an upper sampling layer is formed, and the size, the step length and the number of the cores are (6,2,2 and 3).

Let us now explain the details of the DCB block. As shown in fig. 4. Our DCB blocks each contain four convolutional layers (Conv1, 2,3,4) and one bottleneck layer (Conv 5). After each convolutional layer, there is a cascading operation to achieve dense connections in the residual, which means that the output feature maps of all previous convolutional layers are cascaded and fused. The bottleneck layer at the end of the DCB is a local feature fusion layer used for fusing a large number of feature maps.

for each layer setup in the DCB, we set the convolution kernel size of all four convolutional layers to 3 × 3, and finally the kernel size of the bottleneck layer to 1 × 1. Suppose the input and output of the D-th DCB block are D, respectively_d-1And D_dThen we can express the relationship between them as follows. First, D_cdenoted as the 4 th concat [29]]Output of the layer:

D_c＝f_cat4f_cr4(f_cat3f_cr3(f_cat2f_cr2(f_cat1f_cr1(D_d-1)))) (1)

Wherein f is_criDenotes the convolution of the i (i ═ 1,2,3,4) th convolution layer with the ReLU layer, the ReLU activation operation, f_catiConcat [29] representing the ith (1,2,3,4) convolutional layer]The cascade operation. Next, we use f_boRepresenting the convolution operation in the bottleneck layer, the output of the DCB can be expressed as:

D_d＝f_bo(D_c) (2)

in effect, the bottleneck layer in DCB is a local feature fusion operation for adaptively fusing D_d-1features of the model and outputs of all convolutional layers in the current model. Through feature fusion, not only are feature mappings of different levels fused, but also the growth rate of the bottleneck layer is set to 64, so that the computational complexity is effectively reduced. We use a 1 x 1 convolutional layer to control the output of information.

3 antagonistic network structure analysis and training process

the GAN can get clearer samples than the traditional network structure. GAN has been studied extensively as soon as it appears, resulting in a large number of excellent networks. The generation of the network herein is based on the more influential DCGAN [28 ].

compared with the original GAN, DCGAN [28] almost completely uses convolution layer to replace the full connection layer, the whole network has no pooling layer and upsampling layer, the long-step convolution is used to replace the upsampling layer, the output of the characteristic layer is normalized together through the normalization layer, the network convergence is accelerated, the training stability is improved, the activation function is adjusted in the discriminator, and the gradient sparseness is prevented. Although DCGAN [28] has a good architecture, it still cannot balance the training process between the generation network and the discrimination network, and there is a case where the training is unstable. The DCGAN [28] based countermeasure network in the model is composed of a volume block, 6 CBL blocks and a Dense connection, the structure of the CBL block is shown in figure five, here, LeakyReLU is used as an activation function delta, the expression of the LeakyReLU is the same as that of the PReLU, only alpha in the formula is not a learnable coefficient but a fixed small constant 0.2, Dense1024 and Dense1 in figure 2 are realized by a volume layer, finally, an output value is obtained through a sigmoid function, the sizes of convolution kernels in the network are all 3 x 3, and padding is all 1.

4 loss function

The loss function is used for measuring the difference of data distribution obtained by the model between real data distribution, and the mean square error function is adopted as the loss function in most models in the image reconstruction field. The objective evaluation index of the result obtained by utilizing the pixel point-based function reconstruction is higher, but the phenomena of high-frequency information loss and over-smoothness exist. This is because the sensitivity of the human eye to errors is not absolute, and the perception results are affected by many factors, such as the human eye being more sensitive to brightness and less concerned about other details. The method improves the loss function and provides a new loss functionThe method is formed by weighted combination of three parts.

The first part is l_image，Is based on the L1 norm loss function of the pixel points. Wherein G isⁱ(x) Representing the i-th low-resolution image X which is input and is obtained after passing through the generator and the resolution is improved, Xⁱn represents the number of images for the corresponding original image. The second part isBased on VGG16[20]content loss function l_VGG[23]，result G obtained by training modelⁱ(x) From the original sharp image XⁱAre fed into pre-trained VGG16[ 20)]In the network of (1), the Euclidean distance, φ, between feature maps obtained by the kth convolutional layer is calculated_k,jRepresents VGG16[20]n represents the total amount of the feature maps output by the kth convolutional layer, and the content loss function can ensure that the contents of the two images are similar. Against loss l_D，Where the loss function is not logarithmic compared to conventional GAN, countering the loss ensures that the generator gets the same distribution as the original data.

5 evaluation of reconstitution Effect

the obtained results are subjectively and objectively evaluated, 15 scorers are selected for subjective quality evaluation of the images to respectively Score reconstruction results obtained by different algorithms on set5, set14 and BSD100, a subjective quality scoring method (MOS: Mean Opinion Score) is adopted to measure the subjective quality of the images, and the scorers need to give a Score between 1 and 5 points to the result obtained by each method, wherein 5 points represent that the images are clear and have good quality; score 1 indicates that the picture is very blurred and severely hampers viewing. A scorer objectively scores 12 versions of models on set5, set14 and BSD100, an objective evaluation index uses a peak signal-to-noise ratio (PSNR) and a Structural Similarity Index (SSIM) as evaluation criteria, the PSNR mainly measures differences among images according to differences among corresponding image pixel points, and table 1 shows all comparison results under 2-fold, 4-fold and 8-fold sampling factors. Table 2 shows the MOS index at four times amplification.

TABLE 1 different algorithms reconstruct the resulting MOS values at × 4 magnification

TABLE 2 average PSNR/SSIM values obtained by carrying out x 2, x 4 and x 8 multiplying power reconstruction on three test sets by various algorithms

From the objective quality evaluation in table 1, it can be seen that the PSNR and SSIM of the recovered picture are somewhat different than those of the CNN-based network, but exceed those of the srna-based network. From table 2 it can be seen that our subjective quality assessment (MOS) exceeded the previous framework.

The countermeasure loss function in the traditional network generation loss function is the Minimum Square Error (MSE), the objective evaluation index of the result obtained by utilizing the pixel point-based function reconstruction is higher, but the phenomena of high-frequency information loss and over-smoothness exist. We use a loss function based on the 1 norm as the content loss function. We present some of the results of subjective and objective quality assessment of different loss functions in table 3.

TABLE 3X 4 subjective and objective evaluation indexes of different loss functions

From fig. 2, it can be seen that L1 has better perceptual quality than the result of MSE reconstruction, and from the result of local magnification, (b), (d), (f), (h), (j), (L) obtains more texture details, which can generate a result subjectively closer to the original image, and the experimental result also proves that the subjective and objective evaluation indexes have certain discrepancy.

We also performed a series of experiments to demonstrate the effectiveness of our proposed SISR framework and the loss function. In fig. 3 we compare some advanced ideas. We only show the 3 4-fold amplified alignments of LapSRN [10], VDSR [5], and SRGAN [11 ]. To better show the effectiveness of our method, we have selected small areas in the picture that are not easily recoverable to zoom in. From fig. 3 we can see that the picture we have reconstructed is clearer at some texture details. For example, the color at the beak and the texture of the beak are clearer relative to the laprn, VDSR recovery. Of course, we also compared other CNN-based methods, such as SRCNN [4 ]. In contrast, our approach generates richer texture details over other advanced approaches.

In order to generate a result which is subjectively more similar to an original image, a super-resolution model based on generation of a confrontation network image is built, and a test result on a public standard data set shows that the algorithm obtains a more vivid result on the super-resolution of a general image, the details of texture, color and the like of the generated image are more in line with the watching habit of human eyes, and the highest MOS value is obtained compared with the traditional super-resolution algorithm based on the convolutional neural network. The continuous game of the discriminator and the generator in the GAN network enables the details of the generated image to be enriched and closer to the real image, but the obtained details cannot be guaranteed to be the details of the real image, and noise generated by the network can be doped, so that the PSNR value is not high, and the algorithm is not recommended to be adopted in the field of medical images, but the algorithm has great application value in most fields of image reconstruction. In the following, intensive research on the antagonistic training mechanism of GAN has been conducted, and it has been expected that a model which is more excellent in subjective and objective performances is obtained.

Reference to the literature

[1]W.Shi,J.Caballero,C.Ledig,X.Zhuang,W.Bai,K.Bhatia,A.Marvao,T.Dawes,D.ORegan,and D.Rueckert.Cardiac image super-resolution with globalcorrespondence using multi-atlas patchmatch[C].In MICCAI,2013.

[2]M.W.Thornton,P.M.Atkinson,and D.a.Holland.Subpixel mapping of rural land cover objects from fine spatial resolution satellite sensorimagery using super-resolution pixel-swapping[J].International Journal of Remote Sensing,27(3):473–491,2006.1

[3]W.Zou and P.C.Yuen.Very low resolution face recognition problem[J].IEEE Transactions on image processing,21(1):327–340,2012.1

[4]J.Kim,J.K.Lee,and K.M.Lee.Accurate image super resolution using very deep convolutional networks[C].In CVPR,2016.1,2,3,5,6,7,8

[5]C.Dong,C.C.Loy,K.He and X.Tang,"Image Super-Resolution Using Deep Convolutional Networks,"in IEEE Transactions on Pattern Analysis and MachineIntelligence,vol.38,no.2,pp.295-307,Feb.2016.

[6]Kim J,Lee J K,Lee KM.Deeply-Recursive Convolutional Network for Image Super-Resolution[C].2016IEEE Conference on Computer Vision and Pattern Recognition(CVPR),pp.1637-1645,June.2016.

[7]Kim J,Lee J K,Lee K M.Accurate image super-resolution using very deep convolutional networks[C].2016IEEE Conference on Computer Vision and Pattern Recognition(CVPR),pp.1646-1654,June.2016.

[8]Tai Y,Yang J,Liu X.Image Super-Resolution via Deep Recursive Residual Network[C]//2017IEEE Conference on Computer Vision and Pattern Recognition(CVPR).IEEE Computer Society,pp.2790-2798,July.2017.

[9]Shi W,Caballero J,Huszár,Ferenc,et al.Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional NeuralNetwork[J].2016.

[10]Lai W S,Huang J B,Ahuja N,et al.Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution[C]//IEEE Conference on Computer Vision&Pattern Recognition.pp.5835-5843,July.2017.

[11]Ledig C,Theis L,Huszar F,et al.Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network[C].2017IEEE Conference on Computer Vision and Pattern Recognition(CVPR).pp.105-114,July.2017.

[12]Wu B,Duan H,Liu Z,et al.SRPGAN:Perceptual Generative Adversarial Network for Single Image Super Resolution.arXiv:1712.05927v2[cs.CV].pp.1-9,Dec.2017.

[13]Wang X,Yu K,Wu S,et al.ESRGAN:Enhanced Super-Resolution Generative Adversarial Networks.arXiv:1809.00219v2[cs.CV].pp.1-23,Sep.2018.

[14]Zhang Y,Tian Y,Kong Y,et al.Residual Dense Network for Image Super-Resolution[C].2018IEEE/CVF Conference on Computer Vision and Pattern Recognition.pp.2472-2481,June.2018.

[15]Simonyan K,Zisserman A.Very Deep Convolutional Networks for Large-Scale Image Recognition.arXiv:1409.1556v6[cs.CV].pp.1-14,Apr.2015.

[16]H.Chang,D.-Y.Yeung,and Y.Xiong.Super-resolution through neighbor embedding[C].In CVPR,2004.1,8

[17]C.G.Marco Bevilacqua,Aline Roumy and M.-L.A.Morel.Low-complexity single-image super-resolution based on nonnegative neighbor embedding[C].In BMVC,2012.1,2,6,8

[18]D.Glasner,S.Bagon,and M.Irani.Super-resolution from a single image[C].In ICCV,2009.1

[19]J.Yang,J.Wright,T.Huang,and Y.Ma.Image super resolution via sparse representation[J].IEEE Transactions on image processing,19(11):2861–2873,2010.1,5

[20]R.Zeyde,M.Elad,and M.Protter.On single image scale-up using sparse-representations[J].In Curves and Surfaces,pages 711–730.Springer,2012.1,5,8

[21]E.Perez-Pellitero,J.Salvador,J.Ruiz-Hidalgo,and B.Rosenhahn.PSyCo:Manifold span reduction for super resolution[C].In CVPR,2016.1,6,7,8

[22]S.Schulter,C.Leistner,and H.Bischof.Fast and accurate image upscaling with super-resolution forests[C].In CVPR,2015.1,5,6,7,8

[23]R.Timofte,V.D.Smet,and L.V.Gool.A+:Adjusted anchored neighborhood regression for fast super-resolution[C].In ACCV,2014.1,8

[24]http://cvlab.postech.ac.kr/～mooyeol/pascal_voc_2012/

[25]C.M.Bevilacqua,A.Roumy,and M.Morel.Low-complexity single image super-resolution based on non negative neighbor embedding[C].British Machine Vision Conference,2012

[26]R.Zeyde,M.Elad,M.Protter,On single image scale-up using sparse-representations[C].International conference on curves and surfaces.Springer,2010:711-730.

[27]D.Martin,C.Fowlkes,D.Tal,and J.Malik.Adatabase of human segmented natural images and its application to evaluating segmentation algorithms andmeasuring ecological statistics[C].In ICCV,2001.5。

[28]Radford,Alec,L.Metz,and S.Chintala."Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks."Computer Science(2015).

[29]Ronneberger O,Fischer P,Brox T.U-Net:Convolutional Networks for Biomedical Image Segmentation[J].2015。

Claims

1. A method for reconstructing a super-resolution single image of an anti-network generated by dense connection is characterized by comprising a generation network and an anti-network, wherein the generation network adopts a basic frame of a residual dense network RDN, 5 dense connection block DCB blocks are used as basic modules, the anti-network adopts deep convolution to generate a network frame of an anti-network DCGAN discriminator, a low-resolution image is used as an input and sent to the generation network to be processed, the obtained output is sent to the anti-network for judgment, the judgment result is fed back to the generation network through a loss function, the process is repeated until the anti-network is judged to be qualified, the generation network can generate a clear image, and then the trained generation network is used for completing the reconstruction of the super-resolution of the low-resolution image.

2. The method for reconstructing the super-resolution of the single image of the dense connection generation countermeasure network as claimed in claim 1, wherein the training set is manufactured and data is preprocessed:

I_lr＝W(x)*I_hr

3. The method for reconstructing the super-resolution of the single image of the dense connection generation countermeasure network as claimed in claim 1, wherein the first two layers of the network basic framework are shallow feature extraction layers, and the kernel size and number are (3, 64); the middle is a feature extraction layer which consists of 5 DCB modules, the output of each module is sent to a concat (connected layer) layer, and the concat layer is followed by a bottleneck layer with the core size and the number of (1, 64); then, making a residual error between the output of the bottleneck layer and the output of the first layer; and finally, an upper sampling layer is formed, and the size, the step length and the number of the cores are (6,2,2 and 3).

4. The super-resolution reconstruction method for the dense connection generation countermeasure network single image as claimed in claim 3, wherein each DCB block comprises four convolution layers Conv1, 2,3,4 and a bottleneck layer Conv5, after each convolution layer, there is a cascade operation to realize the dense connection in the residual error, and the bottleneck layer at the end of DCB is a local feature fusion layer for fusing a large number of feature maps;

The convolution kernel size of the four convolutional layers in the DCB is set to 3 × 3, the kernel size of the last bottleneck layer is set to 1 × 1, and it is assumed that the input and output of the D-th DCB block are D respectively_d-1And D_d，D_cDenoted as the 4 th concat [29]]The output of the layer, then:

D_c＝f_cat4f_cr4(f_cat3f_cr3(f_cat2f_cr2(f_cat1f_cr1(D_d-1)))) (1)

D_d＝f_bo(D_c)

(2)

5. The super-resolution reconstruction method for the Dense connection generation countermeasure network single image as claimed in claim 3, characterized in that a long-step convolution is used to replace an upsampling layer in a Deep generation countermeasure network DCGAN (Deep probabilistic generic adaptive Networks), a normalization layer normalizes outputs of feature layers together, an activation function is adjusted in a discriminator to prevent gradient sparseness, the countermeasure network based on DCGAN is composed of a convolution block, 6 CBL blocks and a Dense connection, a LeakyReLU is used in a CBL block as an activation function δ, a full connection layer density 1024 with an output of 1024 and a full connection layer density 1 with an output of 1 are realized by convolution layers, and finally an output value is obtained through a sigmoid function, the sizes of convolution kernels in the network are all 3 x 3, and the fillings are all 1;

loss functionThe method is formed by weighted combination of three parts:

The first part is l_imageIs based on the L1 norm loss function of the pixel points,Wherein G isⁱ(x) Representing the i-th low-resolution image X which is input and is obtained after passing through the generator and the resolution is improved, XⁱN represents the number of images for the corresponding original image; convolutional neural network VGG16[20]Content loss function lVGG [23 ]]，Result G obtained by training modelⁱ(x) From the original sharp image XⁱAre fed into pre-trained VGG16[ 20)]In the network of (1), the Euclidean distance, φ, between feature maps obtained by the kth convolutional layer is calculated_k,jrepresents VGG16[20]n represents the total amount of the feature maps output by the kth convolutional layer, and the content loss function can ensure that the contents of the two images are similar. Against loss l_D，