CN114841856A - Image super-pixel reconstruction method of dense connection network based on depth residual channel space attention - Google Patents

Image super-pixel reconstruction method of dense connection network based on depth residual channel space attention

Info

Publication number
CN114841856A
CN114841856A (application CN202210214578.1A)
Authority
CN
China
Prior art keywords
image
network
resolution
channel
reconstruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210214578.1A
Other languages
Chinese (zh)
Inventor
潘杰 (Pan Jie)
王旭 (Wang Xu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology (CUMT)
Priority to CN202210214578.1A
Publication of CN114841856A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses an image super-pixel reconstruction method using a densely connected network based on depth residual channel-spatial attention, which comprises the following steps: inputting a low-resolution image to be reconstructed; establishing a deep network model for super-resolution reconstruction of the image; propagating the data forward; extracting initial features from the input image; obtaining weighted outputs with a channel-spatial balanced attention module; and refining the low-resolution features with upsampling blocks composed of convolutional layers and pixel shuffle layers to generate a high-resolution image close to the desired target. The method uses skip connections between layers to improve information propagation in the convolutional neural network, and models the interdependencies between feature channels and spatial positions so as to selectively emphasize informative features; these mechanisms act jointly on the network model and improve its super-pixel reconstruction capability.

Description

Image super-pixel reconstruction method of dense connection network based on depth residual channel space attention
Technical Field
The invention belongs to the field of deep learning, and particularly relates to an image super-pixel reconstruction method of a dense connection network based on depth residual channel space attention.
Background
The purpose of image super-resolution reconstruction is to recover a high-resolution picture from a corresponding low-resolution picture; it is an important branch of computer vision with wide practical applications, such as medical imaging, surveillance and satellite imagery. However, image super-resolution is an ill-posed problem, since one low-resolution image always corresponds to multiple plausible high-resolution solutions. With the rapid development of deep learning, super-resolution networks based on deep learning have achieved remarkable results.
To address this ambiguity, earlier scholars proposed a variety of super-resolution reconstruction methods, including interpolation-based, reconstruction-based and learning-based methods. Interpolation-based methods rest mainly on an assumption of image continuity; because no additional information is introduced, the edges and contours of the result are generally blurry and textures cannot be repaired well. In learning-based methods, each reconstructed high-resolution image block comes only from the nearest sample in the training set, so the expressive capability of the model is very limited, which greatly restricts the quality of the reconstructed image. Reconstruction-based methods restore the high-resolution image from low-level image features, and the limited ability of such features to express the edges, contours and textures of a high-resolution image likewise restricts the reconstruction effect.
With the rapid development of deep learning and GPU technology, these methods have become widely applicable, bringing new opportunities to image super-resolution reconstruction. In recent years the field has absorbed deep-learning ideas and achieved substantial results. In 2014 the SRCNN image super-resolution network, containing three convolutional layers, was proposed; drawing on the sparse-coding view of image SR, it completes the three steps of image-block feature extraction, nonlinear feature mapping and reconstruction through convolution operations. By replacing traditional hand-crafted feature extraction with convolutions, it directly learns an end-to-end mapping between low-resolution and high-resolution images and improves greatly over traditional methods, with a simple structure and fast training, without using any prior knowledge. However, it is only suitable for a single magnification factor, the network is too small and structurally simple, and the pixel neighbourhood usable during learning is too small, so few features are extracted and the reconstruction capability is limited. The Deep Recursive Convolutional Network (DRCN) for image super-resolution was proposed subsequently and represents a significant advance over SRCNN. To further improve the visual quality of reconstructed images, generative adversarial networks (GAN) were introduced into the super-resolution field, for example SRGAN, ESRGAN and Bi-GANs-ST: the GAN generator network produces high-resolution images, and an additional discriminator is defined to judge whether an input image is the super-resolution image produced by the generator or a true high-resolution image, i.e. the discriminator predicts the probability that the super-resolved image is real. However, since SRGAN is built from ordinary convolutions and applies the same processing to all extracted image information, its computational efficiency is low and it easily produces false textures inconsistent with the original image.
Given that the channel attention mechanism can compute channel weights between 0 and 1 through a trainable scaling layer, and has successfully improved the classification performance of ResNet, VGG16 and Inception, it can be applied well to super-resolution reconstruction networks; channel attention has been used together with residual-in-residual learning and achieved a certain effect.
In recent years, methods combining ResNet and DenseNet have been used for single-image super-resolution reconstruction, such as the Residual Dense Network (RDN) composed of Residual Dense Blocks (RDBs). The densely connected residual network (DRNet) is similar to RDN and is likewise built from residual dense connection blocks. The residual-in-residual dense block (RRDB) in the ESRGAN architecture also combines residual learning with dense connections. However, in image reconstruction these combined architectures do not perform as well as the Residual Channel Attention Network (RCAN), which uses only a residual network.
In general, innovation in the design of deep-learning super-resolution reconstruction networks comes from the following aspects: (1) the network architecture; (2) the loss function; (3) the training strategy and principle. However, simply stacking trainable layers rarely yields good peak signal-to-noise ratio (PSNR) and perceptual quality, so it remains a challenge to exploit deeper networks for better results while avoiding the gradient explosion or vanishing caused by excessive depth.
In this research, residual networks are used to increase network depth, allowing hundreds of trainable layers without suffering from vanishing or exploding gradients and achieving the desired effect. Most super-resolution reconstruction methods based on convolutional neural networks lack the capability to process information across feature channels and spatial positions, which hinders the representational power of deep networks. This research therefore proposes a residual channel-spatial attention mechanism to build a very deep trainable network: by modeling the interdependencies between feature channels, the weight of each channel is adapted, and the attention mechanism lets the network concentrate on more useful channel-spatial information, strengthening its discriminative learning capability. To let the deep network take fuller effect while reducing the drawbacks of excessive depth, a residual structure is introduced. Unlike previous methods, where dense connections link individual convolutions inside residual connections, the method proposed here uses tightly connected residual groups: the residual blocks within each group fully refine the feature values, which are then intermittently refined further through the dense connections between residual groups.
With respect to the peak signal-to-noise ratio (PSNR), removing the BN layer is considered to improve network performance, reduce computational complexity and improve the generalization capability of the network. Meanwhile, the PReLU nonlinear activation function, which has stronger discriminative capability, replaces the piecewise linear ReLU to obtain a better feature extraction effect, and hence higher-quality super-resolution images at reconstruction time.
Disclosure of Invention
Aiming at the performance degradation caused by excessive channel dimensions as the number of dense connections grows in existing super-resolution reconstruction methods, an image super-pixel reconstruction method using a densely connected network based on depth residual channel-spatial attention is provided, mainly solving the problem of reconstructing a low-resolution picture into a high-resolution picture.
To achieve this aim, the invention adopts the following technical scheme:
a super-pixel reconstruction method for an image based on a dense connection network of depth residual channel spatial attention comprises the following steps:
(1) initialization: obtaining low-resolution pictures by bicubic downsampling of the original high-resolution images, and preparing a training data set through one or more geometric transformations;
(2) inputting a low-resolution picture to be reconstructed: the low-resolution picture data X_LR is fed into the network together with the preset parameters: the initial learning rate lr and the Adam optimizer coefficients β₁, β₂ and ε;
(3) establishing a deep network model: the model is established based on a deep convolutional neural network and comprises a feature extraction network formed by convolutional layers, a channel space attention mechanism network for feature nonlinear mapping and a reconstruction network;
(4) forward propagation of data: the feature extraction mapping network densely extracts the image features f, which are then propagated to higher layers of the network along two paths: one path feeds the channel-spatial attention network for feature extraction and mapping, the other carries feature information through short skip connections; finally, the mapped features are recombined by the upsampling layer into a high-resolution image;
(5) constructing a loss function: establishing a loss function for updating the parameters of the deep network model;
(5.1) calculating the function L(θ) evaluating the perceptual quality difference between the original high-resolution image and the image obtained by super-pixel reconstruction of the low-resolution image;
(5.2) calculating the peak signal-to-noise ratio PSNR;
(5.3) calculating the structural similarity index SSIM between images;
(6) model training process: presetting a suitable batch size and updating the parameters with mini-batch stochastic gradient descent; the feature extractor, the channel-spatial attention network used for nonlinear feature mapping, and the upsampling layer are trained jointly by back-propagation.
Further, the initialization in step (1) means downsampling the real pictures of the data set to the required low-resolution pictures with a bicubic downsampling method, and enriching the original data set by random cropping, rotation and flipping; the low-resolution pictures are obtained from the real pictures by two-fold and four-fold downsampling.
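As an illustration of this preparation step, the following minimal Python sketch produces low-resolution inputs and augmented patches; the use of Pillow and the crop size of 192 are assumptions for illustration, not part of the patent:

import random
from PIL import Image

def make_lr(hr: Image.Image, scale: int) -> Image.Image:
    # Bicubic downsampling of a ground-truth high-resolution picture (x2 or x4)
    w, h = hr.size
    return hr.resize((w // scale, h // scale), Image.BICUBIC)

def augment(img: Image.Image, crop: int = 192) -> Image.Image:
    # Enrich the data set with random cropping, rotation and flipping
    w, h = img.size
    x, y = random.randint(0, w - crop), random.randint(0, h - crop)
    img = img.crop((x, y, x + crop, y + crop))
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_TOP_BOTTOM)
    if random.random() < 0.5:
        img = img.transpose(Image.ROTATE_90)
    return img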
Further, the specific parameter values in step (2) are: initial learning rate lr = 10⁻⁴, halved after every 200 rounds of training; Adam optimizer coefficients β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸.
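In PyTorch these settings correspond to the following sketch; the placeholder model is an assumption used only to make the snippet runnable:

import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)  # placeholder for the deep network model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), eps=1e-8)
# halve the learning rate every 200 training rounds
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)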
Further, the feature extraction mapping network in step (4) is a channel-spatial attention module, which is divided into an average-pooling channel attention mechanism and a maximum-pooling spatial attention mechanism.
Further, the purpose of the channel attention mechanism is to assign greater weight to the more important channels of the input feature map; for the single-image super-resolution reconstruction task, the high-frequency details of the high-resolution image are recovered as far as possible, and channel attention is used to find the channels carrying more important information in the input feature map.
Over the H × W spatial dimensions of each channel, channel information is extracted with average pooling and passed through a multilayer perceptron consisting of two point-wise convolution layers; the average channel attention module formula ACAM is as follows:
ACAM(x) = Sigmoid( f_up( PReLU( f_down( AvgPool(x) ) ) ) )    (1)
wherein f_down denotes a convolutional layer with a 1 × 1 kernel, n input channels and n/r output channels, f_up denotes the corresponding 1 × 1 convolutional layer mapping n/r channels back to n, and r denotes the scaling factor between the input and output channels.
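A minimal PyTorch sketch of the average channel attention module of equation (1); the reduction ratio r = 16, the sigmoid gating and the class name are assumptions for illustration:

import torch
import torch.nn as nn

class ACAM(nn.Module):
    # Average-pooling channel attention: squeeze each channel with global
    # average pooling, then pass it through two point-wise (1 x 1) convolutions.
    def __init__(self, n: int, r: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),          # (B, n, H, W) -> (B, n, 1, 1)
            nn.Conv2d(n, n // r, 1),          # f_down: n -> n/r channels
            nn.PReLU(),
            nn.Conv2d(n // r, n, 1),          # f_up: n/r -> n channels
            nn.Sigmoid(),                     # channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)                   # one weight per channel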
Further, the maximum-pooling spatial attention mechanism generates weights over the spatial positions of the input feature map in order to find the regions that contribute most to the high-resolution reconstruction and to apply more weight to them; these regions contain high-frequency details that appear as extrema along the channel dimension, hence maximum pooling is used. The maximum pooling spatial attention module formula MSAM:
MSAM(x) = Sigmoid( f( PReLU( f( MaxPool_c(x) ) ) ) )    (2)
wherein MaxPool_c denotes maximum pooling along the channel dimension and f denotes a point-wise (1 × 1) convolutional layer; PReLU and Sigmoid are defined as:
PReLU(x) = x for x > 0, and PReLU(x) = a·x for x ≤ 0    (3)
Sigmoid(x) = 1 / (1 + e^(−x))    (4)
wherein a is a constant coefficient;
the channel-spatial attention mechanism formula CSAM:
CSAM(x) = (ACAM(x) · MSAM(x)) × x    (5)
wherein x is the input feature map; since the outputs of ACAM and MSAM have different dimensions, they are first fused by broadcast multiplication (·), and the result is then multiplied element-wise (×) with the input feature map to obtain the final attention output.
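Continuing the sketch, one possible realisation of equations (2) and (5); the channel-wise max pooling follows the description above, while the 1 × 1 layer sizes are assumptions, and CSAM reuses the ACAM sketch above:

import torch
import torch.nn as nn

class MSAM(nn.Module):
    # Max-pooling spatial attention: pool extrema across channels, then weight
    # each spatial position with a small point-wise convolutional perceptron.
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 1, 1)
        self.prelu = nn.PReLU()
        self.conv2 = nn.Conv2d(1, 1, 1)
        self.sig = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = x.max(dim=1, keepdim=True).values      # (B, 1, H, W) channel-wise max
        return self.sig(self.conv2(self.prelu(self.conv1(s))))

class CSAM(nn.Module):
    # Equation (5): broadcast-multiply the two attention maps, then rescale
    # the input feature map element-wise.
    def __init__(self, n: int, r: int = 16):
        super().__init__()
        self.acam, self.msam = ACAM(n, r), MSAM()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.acam(x) * self.msam(x)            # (B,n,1,1) x (B,1,H,W) broadcast
        return x * w                               # element-wise rescaling of x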
Further, the expression of the perceptual loss function L(θ) used in step (5.1) to evaluate the perceptual quality difference between different images is:
L(θ) = (1 / (3·h·w)) · Σ_{i=1..h} Σ_{j=1..w} Σ_{k=1..3} ‖ H_DRCSA(I_LR)(i,j,k) − I_HR(i,j,k) ‖₁    (6)
wherein I_LR(i,j,k) and I_HR(i,j,k) are the input low-resolution image and the original high-resolution image respectively, H_DRCSA is the network function of the generated model, h and w are the height and width of the high-resolution image, 3 is the number of image channels, and ‖·‖₁ is the ℓ1 norm.
Further, the calculation formula for the peak signal-to-noise ratio PSNR in step (5.2) is:
PSNR = 10 · log10( x_max² / ( (1/(3·h·w)) · ‖ I_HR − H_DRCSA(I_LR) ‖₂² ) )    (7)
wherein x_max is the maximum pixel value in the reference image and ‖·‖₂ denotes the ℓ2 norm; minimizing the perceptual loss function L(θ) is equivalent to maximizing the peak signal-to-noise ratio PSNR.
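A sketch of equations (6) and (7) on tensors scaled to [0, 255]; the mean over all 3·h·w values matches the normalisation in the formulas above:

import torch

def l1_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    # Equation (6): mean absolute difference over all 3*h*w pixel values
    return (sr - hr).abs().mean()

def psnr(sr: torch.Tensor, hr: torch.Tensor, x_max: float = 255.0) -> torch.Tensor:
    # Equation (7): peak signal-to-noise ratio in dB
    mse = ((sr - hr) ** 2).mean()
    return 10.0 * torch.log10(x_max ** 2 / mse)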
Further, the structural similarity index SSIM between images in step (5.3) is determined jointly by the luminance, contrast and structure of the images, and is calculated as:
SSIM(LR, HR) = l(LR,HR)^α · c(LR,HR)^β · s(LR,HR)^γ    (8)
where α = β = γ = 1, and l, c and s are calculated by the following formulas:
l(LR,HR) = (2·μ_LR·μ_HR + c₁) / (μ_LR² + μ_HR² + c₁)    (9)
c(LR,HR) = (2·σ_LR·σ_HR + c₂) / (σ_LR² + σ_HR² + c₂)    (10)
s(LR,HR) = (σ_LHR + c₃) / (σ_LR·σ_HR + c₃)    (11)
wherein c₁ = (k₁·L)², c₂ = (k₂·L)², c₃ = c₂/2, usually L = 2⁸ − 1, k₁ = 0.01, k₂ = 0.03; μ_LR and μ_HR, σ_LR and σ_HR, and σ_LHR are the means, standard deviations and covariance of the super-resolution result and the high-resolution ground-truth image, calculated by the following equations:
μ_LR = f_mean(H(I_LR))    (12)
μ_HR = f_mean(I_HR)    (13)
σ_LR² = f_var(H(I_LR))    (14)
σ_HR² = f_var(I_HR)    (15)
σ_LHR = f_cov(H(I_LR), I_HR)    (16)
wherein f_mean, f_var and f_cov denote the mean, variance and covariance of the super-resolution result and the high-resolution ground-truth image, respectively.
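A global (single-window) SSIM sketch following equations (8)-(16) with α = β = γ = 1; practical SSIM implementations usually slide an 11 × 11 Gaussian window over the image, a refinement this simplified sketch omits:

import torch

def ssim(sr: torch.Tensor, hr: torch.Tensor,
         L: float = 255.0, k1: float = 0.01, k2: float = 0.03) -> torch.Tensor:
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    c3 = c2 / 2
    mu_x, mu_y = sr.mean(), hr.mean()                      # equations (12)-(13)
    var_x, var_y = sr.var(), hr.var()                      # equations (14)-(15)
    cov = ((sr - mu_x) * (hr - mu_y)).mean()               # equation (16)
    l = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)  # luminance, eq. (9)
    c = (2 * var_x.sqrt() * var_y.sqrt() + c2) / (var_x + var_y + c2)  # eq. (10)
    s = (cov + c3) / (var_x.sqrt() * var_y.sqrt() + c3)    # structure, eq. (11)
    return l * c * s                                       # equation (8)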
Further, the batch size in step (6) is set to 16; the reconstruction part uses upsampling layers, with one upsampling layer for two-fold reconstruction of the image and two upsampling layers for four-fold reconstruction.
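The upsampling arrangement can be sketched as follows; the feature width of 64 is an assumption:

import torch.nn as nn

class UpsampleBlock(nn.Module):
    # One x2 step: a convolution expands channels by a factor of 4, then
    # PixelShuffle rearranges them into a map of twice the height and width.
    def __init__(self, n_feats: int):
        super().__init__()
        self.conv = nn.Conv2d(n_feats, 4 * n_feats, 3, padding=1)
        self.shuffle = nn.PixelShuffle(2)

    def forward(self, x):
        return self.shuffle(self.conv(x))

up_x2 = UpsampleBlock(64)                                    # x2 reconstruction
up_x4 = nn.Sequential(UpsampleBlock(64), UpsampleBlock(64))  # x4 reconstruction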
The invention has the following advantageous effects:
the invention comprehensively considers the factors of image characteristic information extraction and characteristic nonlinear mapping information, combines a dense connection network and a residual learning network, and simultaneously embeds a deep neural network together with an average pooling channel attention mechanism and a maximum pooling space attention mechanism. On one hand, the characteristic values are subjected to addition refinement and splicing storage through a dense connection network and a residual learning network, on the other hand, the crosstalk of error accumulation and the adverse effect of redundant information are reduced through the parallel connection of an average pooling channel attention mechanism and a maximum pooling space attention mechanism, the structure gives consideration to the extraction and fusion of the characteristic information, the adverse effect caused by information errors and redundancy is reduced, and a better super-resolution reconstruction effect can be realized.
The method integrates image feature extraction and nonlinear feature mapping, and balances the characteristics of the average-pooling channel attention mechanism and the maximum-pooling spatial attention mechanism. By combining the densely connected and residual learning networks and embedding both attention mechanisms into the deep neural network, it realizes additive refinement and concatenated storage of feature values while reducing feature information errors and redundancy, so that the two mechanisms complement each other and the accuracy of super-resolution reconstruction improves. The effectiveness of the proposed model is demonstrated by experiments on the Set5, Set14, BSD100, Urban100 and other data sets.
Drawings
Fig. 1 is a diagram of a network system structure according to the method of the present invention.
Fig. 2 is a visualization analysis diagram of the method of the present invention (Scale ×2).
Fig. 3 is a visualization analysis diagram of the method of the present invention (Scale ×4).
Detailed Description
The invention is further described with reference to the following drawings and specific embodiments.
Single-image super-resolution reconstruction is one of the most challenging problems in computer vision, and methods based on convolutional neural networks have proved very effective in many image applications. The goal is to reconstruct a high-resolution image from the corresponding low-resolution image. Early single-image methods were based mainly on traditional interpolation algorithms, but such algorithms generally struggle to recover the high-frequency details of the image, producing blurry reconstructions with complex computation and poor real-time performance. Hence, in recent years, with the successful application of deep learning to image super-resolution reconstruction, super-resolution algorithms based on convolutional neural networks have gradually become mainstream. The present method comprehensively considers picture feature extraction and nonlinear feature mapping, combines a densely connected network with a residual learning network, and embeds an average-pooling channel attention mechanism and a maximum-pooling spatial attention mechanism into the deep neural network. On the one hand, feature values are additively refined and stored by concatenation through the densely connected and residual learning networks; on the other hand, the parallel average-pooling channel attention and maximum-pooling spatial attention reduce the crosstalk of accumulated errors and the adverse effect of redundant information. The structure balances the extraction and fusion of feature information, reduces the adverse effects of information errors and redundancy, and achieves a better super-resolution reconstruction effect.
The method comprises the following steps:
(1) initialization: a training data set is prepared by bicubic downsampling of the original high-resolution images followed by one or more geometric transformations (random cropping, rotation and flipping); that is, the real pictures of the data set are downsampled with a bicubic method to the required low-resolution pictures (two-fold and four-fold), and the original data set is enriched by random cropping, rotation and flipping;
(2) inputting a low-resolution picture to be reconstructed: the low-resolution picture data X_LR is fed into the network together with the preset initial learning rate lr and Adam optimizer coefficients β₁, β₂ and ε; the specific values are: initial learning rate lr = 10⁻⁴, halved after every 200 rounds; Adam optimizer coefficients β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸.
(3) Establishing a deep network model: the model is established based on a deep convolutional neural network and comprises a feature extraction network formed by convolutional layers, a channel space attention mechanism network for feature nonlinear mapping and a reconstruction network;
(4) forward propagation of data: the feature extraction mapping network densely extracts the image features f, which are then propagated to higher layers of the network along two paths: one path feeds the channel-spatial attention network for feature extraction and mapping, while the other carries feature information through short skip connections; finally the mapped features are recombined by the upsampling layer into a high-resolution image. The purpose of the channel attention mechanism is to assign greater weight to the more important channels of the input feature map. For the single-image super-resolution reconstruction task, as many high-frequency details of the high-resolution image as possible must be recovered, so channel attention is used to find the channels carrying more important information in the input feature map. Because the H × W values within a channel are quite likely to contain abnormal extreme values, only average pooling is used to extract the channel information, which is then passed through a multilayer perceptron consisting of two point-wise convolution layers. The average channel attention module formula (ACAM) is as follows:
ACAM(x) = Sigmoid( f_up( PReLU( f_down( AvgPool(x) ) ) ) )    (1)
wherein f_down denotes a convolutional layer with a 1 × 1 kernel, n input channels and n/r output channels, f_up denotes the corresponding 1 × 1 convolutional layer mapping n/r channels back to n, and r denotes the scaling factor between the input and output channels.
The maximum-pooling spatial attention mechanism generates weights over the spatial positions of the input feature map in order to find the regions that contribute most to the high-resolution reconstruction and to apply more weight to them; these regions contain high-frequency details that appear as extrema along the channel dimension, hence maximum pooling is used. The maximum pooling spatial attention module formula (MSAM):
MSAM(x) = Sigmoid( f( PReLU( f( MaxPool_c(x) ) ) ) )    (2)
wherein MaxPool_c denotes maximum pooling along the channel dimension and f denotes a point-wise (1 × 1) convolutional layer; PReLU and Sigmoid may be defined as:
PReLU(x) = x for x > 0, and PReLU(x) = a·x for x ≤ 0    (3)
Sigmoid(x) = 1 / (1 + e^(−x))    (4)
The channel-spatial attention mechanism formula (CSAM):
CSAM(x) = (ACAM(x) · MSAM(x)) × x    (5)
where x is the input feature map and a is a constant coefficient. Since the outputs of ACAM and MSAM have different dimensions, they are first fused by broadcast multiplication, and the result is then multiplied element-wise with the input feature map to obtain the final attention output.
(5) Constructing a loss function: establishing a loss function for updating the parameters of the deep network model;
(5.1) calculating the function L(θ) measuring the difference between the real reference image and the image after super-pixel reconstruction; the expression of L(θ) is:
L(θ) = (1 / (3·h·w)) · Σ_{i=1..h} Σ_{j=1..w} Σ_{k=1..3} ‖ H_DRCSA(I_LR)(i,j,k) − I_HR(i,j,k) ‖₁    (6)
wherein I_LR(i,j,k) and I_HR(i,j,k) are the input low-resolution image and the target high-resolution image respectively, H_DRCSA is the network function of the generated model, h and w are the height and width of the high-resolution image, 3 is the number of image channels, and ‖·‖₁ is the ℓ1 norm.
(5.2) calculating the peak signal-to-noise ratio PSNR, whose calculation formula is:
PSNR = 10 · log10( x_max² / ( (1/(3·h·w)) · ‖ I_HR − H_DRCSA(I_LR) ‖₂² ) )    (7)
wherein x_max is the maximum pixel value in the reference image and ‖·‖₂ denotes the ℓ2 norm; minimizing the perceptual loss function L(θ) is equivalent to maximizing the peak signal-to-noise ratio PSNR.
(5.3) calculating the structural similarity index SSIM between images, which is determined jointly by the luminance, contrast and structure of the images; its calculation formula is:
SSIM(LR, HR) = l(LR,HR)^α · c(LR,HR)^β · s(LR,HR)^γ    (8)
where α = β = γ = 1, and l, c and s can be calculated from the following equations:
l(LR,HR) = (2·μ_LR·μ_HR + c₁) / (μ_LR² + μ_HR² + c₁)    (9)
c(LR,HR) = (2·σ_LR·σ_HR + c₂) / (σ_LR² + σ_HR² + c₂)    (10)
s(LR,HR) = (σ_LHR + c₃) / (σ_LR·σ_HR + c₃)    (11)
wherein c₁ = (k₁·L)², c₂ = (k₂·L)², c₃ = c₂/2, usually L = 2⁸ − 1, k₁ = 0.01, k₂ = 0.03; μ_LR and μ_HR, σ_LR and σ_HR, and σ_LHR are the means, standard deviations and covariance of the super-resolution result and the high-resolution ground-truth image, calculated by the following formulas:
μ_LR = f_mean(H(I_LR))    (12)
μ_HR = f_mean(I_HR)    (13)
σ_LR² = f_var(H(I_LR))    (14)
σ_HR² = f_var(I_HR)    (15)
σ_LHR = f_cov(H(I_LR), I_HR)    (16)
wherein f_mean, f_var and f_cov denote the mean, variance and covariance of the super-resolution result and the high-resolution ground-truth image, respectively.
(6) model training process: the batch size is preset to 16 and the parameters are updated with mini-batch stochastic gradient descent; the feature extractor, the channel-spatial attention network used for nonlinear feature mapping, and the upsampling layer of the reconstruction network are all trained by back-propagation; the reconstruction part uses upsampling layers, with one upsampling layer for two-fold reconstruction of the image and two upsampling layers for four-fold reconstruction.
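A minimal training-loop sketch for this step; model, train_set, l1_loss, optimizer and scheduler refer to the earlier sketches and are illustrative assumptions, not the patent's own code:

from torch.utils.data import DataLoader

loader = DataLoader(train_set, batch_size=16, shuffle=True)  # mini-batches of 16
for epoch in range(1000):
    for lr_img, hr_img in loader:
        sr = model(lr_img)                # forward propagation
        loss = l1_loss(sr, hr_img)        # equation (6)
        optimizer.zero_grad()
        loss.backward()                   # back-propagate through the extractor,
        optimizer.step()                  # attention network and upsampler
    scheduler.step()                      # StepLR halves lr every 200 epochs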
(7) experiment preparation: the training set used herein is the DIV2K data set; after training, the trained model.pt file is output, and testing is then performed on four standard data sets: Set5, Set14, BSD100 and Urban100. For all deep network models, the downsampled low-resolution image data serve as network input, and the high-resolution image reconstructed from the low resolution by the proposed deep-learning network is the model output.
(8) The data sets were as follows:
(8.1) DIV2K is an open data set commonly used for training in super-resolution reconstruction; it contains 1000 high-definition pictures of 2K resolution, of which 900 are used here: 800 as the training set and 100 as the test set for evaluating the trained model.
(8.2) The Set5 data set is a public test set first used by M. Bevilacqua, A. Roumy et al. in the article Low-Complexity Single-Image Super-Resolution based on Nonnegative Neighbor Embedding. It contains five pictures: Baby, Bird, Butterfly, Head and Woman. Each is downsampled by bicubic downsampling to 1/2 and 1/4 of its resolution, and the reconstructed results are compared with the original images.
(8.3) Set14 is a more challenging test set for super-resolution reconstruction, proposed in On Single Image Scale-Up Using Sparse-Representations by R. Zeyde, M. Elad and M. Protter in 2010. It contains 14 pictures: baboon, barbara, bridge, coastguard, comic, face, flowers, foreman, lenna, man, monarch, pepper, ppt3 and zebra, three of which are single-channel. Compared with Set5, the pictures in Set14 are more complex and diverse in form, so test results on this data set describe the model's effect more convincingly.
(8.4) BSD100 is another classical test set, proposed by David Martin et al. in 2001, and much larger than the first two. It contains 100 test images covering a wider range of content, from natural landscapes to specific objects such as animals, plants and buildings, and involves more complicated situations. Obtaining good results on it therefore places higher quality demands on the super-resolution deep network model, and test results on it demonstrate the model's effect more fully.
(8.5) The Urban100 data set is the public test set first used by J.-B. Huang et al. in Single Image Super-Resolution from Transformed Self-Exemplars in 2015. It consists of 100 high-resolution pictures of urban scenes, such as tall buildings, glass curtain walls, libraries, urban bridges and large screens, ranging from clean and tidy environments to complex scenes affected by interference factors such as light intensity, shooting angle, weather and time of day; its patterns are complex and changeable, with high dissimilarity between images.
(9) comparison methods: representative classical super-resolution reconstruction methods are selected for comparison: bicubic interpolation (Bicubic), the Very Deep Super-Resolution convolutional network (VDSR), the Laplacian Pyramid Network for SR (LapSRN), the Super-Resolution Convolutional Neural Network (SRCNN), the Fast Super-Resolution CNN (FSRCNN), the Deep Persistent Memory Network (MemNet), the multiple-degradation super-resolution network (SRMDNF), the Deeply-Recursive Convolutional Network (DRCN) and the Residual Encoder-Decoder network (RED). Bicubic interpolation rests mainly on the assumption of image continuity. VDSR directly learns the residual between high and low resolution. LapSRN reconstructs and restores the low-resolution image with residual learning and progressive amplification. SRCNN performs feature-map extraction, nonlinear mapping and reconstruction through convolution operations to realize the reconstruction from a low-resolution image to a high-resolution one. FSRCNN feeds the low-resolution image directly into the network to extract features and learn a feature map of the high-resolution image; it changes the feature dimensions with smaller convolution kernels and more mapping layers, realizing multi-channel nonlinear mapping and feature fusion. MemNet is a deep persistent memory network whose basic units are memory blocks composed of recursive and gating units. SRMDNF concatenates the low-resolution image and a degradation map as input, performs nonlinear mapping with 3 × 3 convolutional layers, and adds a sub-pixel convolution layer after the last convolutional layer. DRCN applies a recursive neural network and residual learning to image super-resolution reconstruction; its recursive supervision reduces network parameters and alleviates the problems of gradient vanishing and explosion. In RED, each convolutional layer is connected to a deconvolutional layer by a skip connection: the convolutional layers capture the abstract content of the image, while the deconvolutional layers enlarge the feature size and restore image details.
(10) parameter settings: the experiments follow the standard evaluation protocol, use ResNet and attention blocks as the basic framework, and train on the mapping between the original high-resolution images and the low-resolution images obtained by downsampling them. In the proposed scheme, tightly connected residual groups are used instead of tightly connected convolutional layers, and to prevent data overload at the transition-layer junctions caused by too many dense connections, 5-6 residual groups are used to construct the overall network framework. The first convolutional layer of the framework extracts the initial features of the input low-resolution image; these features are densely connected to all residual groups and to the last projection layer, and a long skip connection similar to that of most residual-based networks is used. In the upsampling module, convolutional and pixel shuffle layers integrate and amplify the extracted features, with one upsampling block for two-fold amplification and two for four-fold.
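Putting the pieces together, the overall framework described above can be sketched as follows; the single residual block per group, the 1 × 1 transition convolutions and all layer widths are assumptions made to keep the sketch short, and it reuses the CSAM and UpsampleBlock sketches above:

import torch
import torch.nn as nn

class ResidualGroup(nn.Module):
    # A 1x1 transition layer absorbs the growing concatenated input, then a
    # residual block with channel-spatial attention refines the features.
    def __init__(self, in_ch: int, n_feats: int):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, n_feats, 1)
        self.conv1 = nn.Conv2d(n_feats, n_feats, 3, padding=1)
        self.act = nn.PReLU()
        self.conv2 = nn.Conv2d(n_feats, n_feats, 3, padding=1)
        self.csam = CSAM(n_feats)

    def forward(self, x):
        r = self.reduce(x)
        return r + self.csam(self.conv2(self.act(self.conv1(r))))  # short skip

class DRCSANet(nn.Module):
    def __init__(self, n_feats: int = 64, n_groups: int = 6, scale: int = 2):
        super().__init__()
        self.head = nn.Conv2d(3, n_feats, 3, padding=1)     # initial features
        self.groups = nn.ModuleList(
            ResidualGroup((g + 1) * n_feats, n_feats) for g in range(n_groups))
        self.fuse = nn.Conv2d((n_groups + 1) * n_feats, n_feats, 1)
        self.tail = nn.Sequential(
            *[UpsampleBlock(n_feats) for _ in range(scale // 2)],
            nn.Conv2d(n_feats, 3, 3, padding=1))

    def forward(self, x):
        feats = [self.head(x)]
        for g in self.groups:                       # dense connections: every
            feats.append(g(torch.cat(feats, 1)))    # group sees all predecessors
        body = self.fuse(torch.cat(feats, 1))
        return self.tail(body + feats[0])           # long skip connection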
The peak signal-to-noise ratio results of the AttentionBlock- and ResNet-based super-resolution on the data sets Set5, Set14, BSD100 and Urban100 are shown in Table 1. For a fair comparison, the results of all comparative experiments are taken from the original papers of each method. The AttentionBlock- and ResNet-based method is superior to almost all compared methods. For quantitative comparison, the peak signal-to-noise ratio (PSNR) and the Structural Similarity Index (SSIM) are used, computed only on the Y channel of the YCbCr color space; the proposed method is compared quantitatively with the other classical methods at two-fold and four-fold amplification. Since Set5, Set14 and BSD100 usually consist of natural scenes of small height and width, the latest methods differ little in performance from the classical ones on these data sets; by contrast, the performance gap between the latest and classical methods on Urban100 is much larger.
Table 1. Comparison of the proposed super-resolution reconstruction method with classical methods on the peak signal-to-noise ratio (PSNR) index.
The structural similarity results of the AttentionBlock- and ResNet-based super-resolution on the data sets Set5, Set14, BSD100 and Urban100 are shown in Table 2. This metric serves as another full-reference measure whose reference value is comparable to the peak signal-to-noise ratio (PSNR), so it can be regarded as a second measurement of the task evaluated in Table 1. In Table 2 the proposed method is compared with the classical methods above under the Structural Similarity (SSIM) metric.
comparison of super-resolution reconstruction method and classical method in Structural Similarity (SSIM) index provided in Table 2
The method integrates image feature extraction and nonlinear feature mapping, balances the characteristics of the average-pooling channel attention mechanism and the maximum-pooling spatial attention mechanism, combines a densely connected network with a residual learning network, and embeds both attention mechanisms into the deep neural network. Feature values are additively refined and stored by concatenation through the densely connected and residual learning networks, while the parallel average-pooling channel attention and maximum-pooling spatial attention reduce feature information errors and redundancy, lessening their adverse effects, complementing each other's capabilities and improving the accuracy of super-resolution reconstruction. The structure balances the extraction and fusion of feature information, reduces the adverse effects of information errors and redundancy, and achieves a better super-resolution reconstruction effect.

Claims (10)

1. An image superpixel reconstruction method based on a dense connection network of depth residual channel spatial attention is characterized by comprising the following steps:
(1) initialization: obtaining low-resolution pictures by bicubic downsampling of the original high-resolution images, and preparing a training data set through one or more geometric transformations;
(2) inputting a low-resolution picture to be reconstructed: the low-resolution picture data X_LR is fed into the network together with the preset parameters: the initial learning rate lr and the Adam optimizer coefficients β₁, β₂ and ε;
(3) establishing a deep network model: the model is established based on a deep convolutional neural network and comprises a feature extraction network formed by convolutional layers, a channel space attention mechanism network for feature nonlinear mapping and a reconstruction network;
(4) forward propagation of data: the feature extraction mapping network densely extracts the image features f, which are then propagated to higher layers of the network along two paths: one path feeds the channel-spatial attention network for feature extraction and mapping, the other carries feature information through short skip connections; finally, the mapped features are recombined by the upsampling layer into a high-resolution image;
(5) constructing a loss function: establishing a loss function for updating the parameters of the deep network model;
(5.1) calculating the function L(θ) evaluating the perceptual quality difference between the original high-resolution image and the image obtained by super-pixel reconstruction of the low-resolution image;
(5.2) calculating the peak signal-to-noise ratio PSNR;
(5.3) calculating the structural similarity index SSIM between images;
(6) model training process: presetting a suitable batch size and updating the parameters with mini-batch stochastic gradient descent; the feature extractor, the channel-spatial attention network used for nonlinear feature mapping, and the upsampling layer are trained jointly by back-propagation.
2. The image super-pixel reconstruction method based on a dense connection network of depth residual channel spatial attention as claimed in claim 1, wherein the initialization procedure in step (1) means downsampling the real pictures of the used data set to the required low-resolution pictures with a bicubic downsampling method, and enriching the original data set by random cropping, rotation and flipping; the low-resolution pictures are obtained from the real pictures in the data set by two-fold and four-fold downsampling.
3. The image super-pixel reconstruction method based on a dense connection network of depth residual channel spatial attention as claimed in claim 1, wherein the specific parameter values in step (2) are: initial learning rate lr = 10⁻⁴, halved after every 200 rounds of training; Adam optimizer coefficients β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸.
4. The method for image superpixel reconstruction based on the dense connection network of depth residual channel space attention as claimed in claim 1, wherein said feature extraction mapping network in step (4) is a channel space attention mechanism module, which is divided into an average pooling channel attention mechanism and a maximum pooling space attention mechanism.
5. The image super-pixel reconstruction method based on a dense connection network of depth residual channel spatial attention as claimed in claim 4, wherein the purpose of said channel attention mechanism is to assign greater weight to the more important channels of the input feature map; for the single-image super-resolution reconstruction task, the high-frequency details of the high-resolution image are recovered as far as possible, and channel attention is used to find the channels carrying more important information in the input feature map;
over the H × W spatial dimensions of each channel, channel information is extracted with average pooling and passed through a multilayer perceptron consisting of two point-wise convolution layers; the average channel attention module formula ACAM is as follows:
ACAM(x) = Sigmoid( f_up( PReLU( f_down( AvgPool(x) ) ) ) )    (1)
wherein f_down denotes a convolutional layer with a 1 × 1 kernel, n input channels and n/r output channels, f_up denotes the corresponding 1 × 1 convolutional layer mapping n/r channels back to n, and r denotes the scaling factor between the input and output channels.
6. The method of claim 4, wherein the maximum-pooling spatial attention mechanism generates weights over the spatial positions of the input feature map in order to find the regions that contribute most to the high-resolution reconstruction and to apply more weight to them; these regions contain high-frequency details that appear as extrema along the channel dimension, hence maximum pooling is used; the maximum pooling spatial attention module formula MSAM:
MSAM(x) = Sigmoid( f( PReLU( f( MaxPool_c(x) ) ) ) )    (2)
wherein MaxPool_c denotes maximum pooling along the channel dimension and f denotes a point-wise (1 × 1) convolutional layer; PReLU and Sigmoid are defined as:
PReLU(x) = x for x > 0, and PReLU(x) = a·x for x ≤ 0    (3)
Sigmoid(x) = 1 / (1 + e^(−x))    (4)
wherein a is a constant coefficient;
the channel-spatial attention mechanism formula CSAM:
CSAM(x) = (ACAM(x) · MSAM(x)) × x    (5)
wherein x is the input feature map; since the outputs of ACAM and MSAM have different dimensions, they are first fused by broadcast multiplication, and the result is then multiplied element-wise with the input feature map to obtain the final attention output.
7. The image super-pixel reconstruction method based on a dense connection network of depth residual channel spatial attention as claimed in claim 1, wherein the expression of the perceptual loss function L(θ) used in step (5.1) to evaluate the perceptual quality difference between different images is:
L(θ) = (1 / (3·h·w)) · Σ_{i=1..h} Σ_{j=1..w} Σ_{k=1..3} ‖ H_DRCSA(I_LR)(i,j,k) − I_HR(i,j,k) ‖₁    (6)
wherein I_LR(i,j,k) and I_HR(i,j,k) are the input low-resolution image and the original high-resolution image respectively, H_DRCSA is the network function of the generated model, h and w are the height and width of the high-resolution image, 3 is the number of image channels, and ‖·‖₁ is the ℓ1 norm.
8. The image super-pixel reconstruction method based on a dense connection network of depth residual channel spatial attention as claimed in claim 1, wherein the calculation formula for the peak signal-to-noise ratio PSNR in step (5.2) is:
PSNR = 10 · log10( x_max² / ( (1/(3·h·w)) · ‖ I_HR − H_DRCSA(I_LR) ‖₂² ) )    (7)
wherein x_max is the maximum pixel value in the reference image and ‖·‖₂ denotes the ℓ2 norm; minimizing the perceptual loss function L(θ) is equivalent to maximizing the peak signal-to-noise ratio PSNR.
9. The image super-pixel reconstruction method based on a dense connection network of depth residual channel spatial attention as claimed in claim 1, wherein the structural similarity index SSIM between images in step (5.3) is determined jointly by the luminance, contrast and structure of the images, and its calculation formula is:
SSIM(LR, HR) = l(LR,HR)^α · c(LR,HR)^β · s(LR,HR)^γ    (8)
where α = β = γ = 1, and l, c and s are calculated by the following formulas:
l(LR,HR) = (2·μ_LR·μ_HR + c₁) / (μ_LR² + μ_HR² + c₁)    (9)
c(LR,HR) = (2·σ_LR·σ_HR + c₂) / (σ_LR² + σ_HR² + c₂)    (10)
s(LR,HR) = (σ_LHR + c₃) / (σ_LR·σ_HR + c₃)    (11)
wherein c₁ = (k₁·L)², c₂ = (k₂·L)², c₃ = c₂/2, usually L = 2⁸ − 1, k₁ = 0.01, k₂ = 0.03; μ_LR and μ_HR, σ_LR and σ_HR, and σ_LHR are the means, standard deviations and covariance of the super-resolution result and the high-resolution ground-truth image, calculated by the following equations:
μ_LR = f_mean(H(I_LR))    (12)
μ_HR = f_mean(I_HR)    (13)
σ_LR² = f_var(H(I_LR))    (14)
σ_HR² = f_var(I_HR)    (15)
σ_LHR = f_cov(H(I_LR), I_HR)    (16)
wherein f_mean, f_var and f_cov denote the mean, variance and covariance of the super-resolution result and the high-resolution ground-truth image, respectively.
10. The image super-pixel reconstruction method based on a dense connection network of depth residual channel spatial attention as claimed in claim 1, wherein the batch size in step (6) is set to 16, and the reconstruction part uses upsampling layers: two-fold reconstruction of the image uses one upsampling layer, and four-fold reconstruction uses two upsampling layers.
CN202210214578.1A 2022-03-07 2022-03-07 Image super-pixel reconstruction method of dense connection network based on depth residual channel space attention Pending CN114841856A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210214578.1A CN114841856A (en) 2022-03-07 2022-03-07 Image super-pixel reconstruction method of dense connection network based on depth residual channel space attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210214578.1A CN114841856A (en) 2022-03-07 2022-03-07 Image super-pixel reconstruction method of dense connection network based on depth residual channel space attention

Publications (1)

Publication Number Publication Date
CN114841856A 2022-08-02

Family

ID=82562835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210214578.1A Pending CN114841856A (en) 2022-03-07 2022-03-07 Image super-pixel reconstruction method of dense connection network based on depth residual channel space attention

Country Status (1)

Country Link
CN CN114841856A


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170916A (en) * 2022-09-06 2022-10-11 南京信息工程大学 Image reconstruction method and system based on multi-scale feature fusion
CN115170916B (en) * 2022-09-06 2023-01-31 南京信息工程大学 Image reconstruction method and system based on multi-scale feature fusion
CN115272084A (en) * 2022-09-27 2022-11-01 成都信息工程大学 High-resolution image reconstruction method and device
CN115272084B (en) * 2022-09-27 2022-12-16 成都信息工程大学 High-resolution image reconstruction method and device
CN115546032A (en) * 2022-12-01 2022-12-30 泉州市蓝领物联科技有限公司 Single-frame image super-resolution method based on feature fusion and attention mechanism
CN115546032B (en) * 2022-12-01 2023-04-21 泉州市蓝领物联科技有限公司 Single-frame image super-resolution method based on feature fusion and attention mechanism
CN116071239A (en) * 2023-03-06 2023-05-05 之江实验室 CT image super-resolution method and device based on mixed attention model

Similar Documents

Publication Publication Date Title
CN110119780B (en) Hyper-spectral image super-resolution reconstruction method based on generation countermeasure network
CN107123089B (en) Remote sensing image super-resolution reconstruction method and system based on depth convolution network
CN114841856A (en) Image super-pixel reconstruction method of dense connection network based on depth residual channel space attention
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN110020989B (en) Depth image super-resolution reconstruction method based on deep learning
CN111192200A (en) Image super-resolution reconstruction method based on fusion attention mechanism residual error network
CN110232653A (en) The quick light-duty intensive residual error network of super-resolution rebuilding
CN103413286B (en) United reestablishing method of high dynamic range and high-definition pictures based on learning
CN111709895A (en) Image blind deblurring method and system based on attention mechanism
CN113139898B (en) Light field image super-resolution reconstruction method based on frequency domain analysis and deep learning
CN109146787B (en) Real-time reconstruction method of dual-camera spectral imaging system based on interpolation
Cheng et al. Zero-shot image super-resolution with depth guided internal degradation learning
CN110349087B (en) RGB-D image high-quality grid generation method based on adaptive convolution
CN110288524B (en) Deep learning super-resolution method based on enhanced upsampling and discrimination fusion mechanism
CN107767357B (en) Depth image super-resolution method based on multi-direction dictionary
CN108830812A (en) A kind of high frame per second of video based on network deep learning remakes method
CN112801904B (en) Hybrid degraded image enhancement method based on convolutional neural network
CN112819737A (en) Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution
CN111833261A (en) Image super-resolution restoration method for generating countermeasure network based on attention
CN112767283A (en) Non-uniform image defogging method based on multi-image block division
CN112767253A (en) Multi-scale feature fusion binocular image super-resolution reconstruction method
CN112699844A (en) Image super-resolution method based on multi-scale residual error level dense connection network
CN112580473A (en) Motion feature fused video super-resolution reconstruction method
CN114418850A (en) Super-resolution reconstruction method with reference image and fusion image convolution
CN115147271A (en) Multi-view information attention interaction network for light field super-resolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination