CN112200724A - Single-image super-resolution reconstruction system and method based on feedback mechanism - Google Patents

Single-image super-resolution reconstruction system and method based on feedback mechanism

Info

Publication number
CN112200724A
CN112200724A
Authority
CN
China
Prior art keywords
image
resolution
super
features
deep
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011139130.5A
Other languages
Chinese (zh)
Other versions
CN112200724B (en)
Inventor
Wang Jin
Wu Yiming
Wang Liu
Chen Zeyu
Chen Yuantao
Zhang Jingyu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha University of Science and Technology filed Critical Changsha University of Science and Technology
Priority to CN202011139130.5A priority Critical patent/CN112200724B/en
Publication of CN112200724A publication Critical patent/CN112200724A/en
Application granted granted Critical
Publication of CN112200724B publication Critical patent/CN112200724B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a single-image super-resolution reconstruction system and method based on a feedback mechanism. A feedback mechanism is adopted: the first iteration over the low-resolution image is formed by a shallow feature extraction module, a first deep feature extraction module, and a first reconstruction module; the second iteration is formed by a feature refinement module, a second deep feature extraction module, and a second reconstruction module. The feature refinement module refines the deep feature maps extracted in the first iteration into the shallow feature maps of the second iteration, so that deeper features of the low-resolution image can be extracted without deepening the network, thereby improving the training effect of the image network model.

Description

Single-image super-resolution reconstruction system and method based on feedback mechanism
Technical Field
The invention relates to the technical field of computer image super-resolution processing, in particular to a single image super-resolution reconstruction system and method based on a feedback mechanism.
Background
Single-image super-resolution (SISR) reconstruction algorithms aim to restore a low-resolution picture to a high-resolution image with good visual quality through a series of algorithms. In fact, single-image super-resolution is an ill-posed problem: for any low-resolution image, there may be an infinite number of corresponding high-resolution images. Conventional single-image super-resolution methods include interpolation-based methods, reconstruction-model-based methods, and learning-based methods. Interpolation-based methods use a basis function or interpolation kernel to approximate the lost high-frequency image information; common interpolation methods include nearest-neighbor interpolation, bilinear interpolation, and bicubic interpolation. Reconstruction-model-based methods make the ill-posed problem solvable by injecting prior knowledge of the image into the super-resolution reconstruction process; examples include kernel estimation methods and methods that use the sparsity of image gradients as prior knowledge. Learning-based methods learn the mapping between low-resolution and high-resolution images from a training image data set in order to predict the high-frequency information missing from the low-resolution image and thus reconstruct a high-resolution image; examples include machine learning, sparse representation, and coupled dictionary training.
Although traditional single-image super-resolution methods can realize high-resolution image reconstruction, as the magnification factor increases, artificially defined prior knowledge and observation models provide less and less of the high-frequency information needed for reconstruction, so it is difficult for traditional methods to improve the reconstruction quality further. The earliest deep-learning-based single-image super-resolution algorithm is the convolutional-neural-network-based super-resolution reconstruction (SRCNN) method proposed by Dong et al. in 2015, which uses a three-layer convolutional structure to map from low resolution to high resolution and achieves better reconstruction than traditional methods. New algorithms have been proposed continuously since then, but current deep-learning-based single-image super-resolution algorithms share the following shortcoming: feature learning considers only forward propagation; even in RNN-style networks such as DRCN, the transfer from shallow to deep features is feed-forward and does not exploit the feedback mechanisms that are ubiquitous in the human visual system.
Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art. Therefore, the invention provides a single-image super-resolution reconstruction system and method based on a feedback mechanism.
In a first aspect of the present invention, a single-image super-resolution reconstruction system based on a feedback mechanism is provided, which is used for training an image network model, and includes:
the shallow feature extraction module is used for extracting shallow features of the low-resolution image subjected to mean shift segmentation;
a first deep feature extraction module for extracting deep features of the low resolution image from the shallow features;
the first reconstruction module is used for reconstructing the deep features output by the first deep feature extraction module to obtain a first super-resolution image, and the first super-resolution image is used for calculating a loss function of the image network model;
the feature refinement module is used for performing concatenation and convolution operations on the shallow features and the deep features extracted by the first deep feature extraction module to obtain refined features;
a second deep feature extraction module for extracting deep features of the low resolution image from the refined features;
and the second reconstruction module is used for reconstructing the deep features output by the second deep feature extraction module to obtain a second super-resolution image, wherein the second super-resolution image is used for calculating a loss function of the image network model and is output as the super-resolution image of the image network model.
According to the embodiment of the invention, at least the following technical effects are achieved:
the system adopts a feedback mechanism, and forms the first iteration of the low-resolution image through a shallow layer feature extraction module, a first deep layer feature extraction module and a first reconstruction module; the system can refine the deep feature mapping extracted by the first iteration into the shallow feature mapping of the second iteration by the aid of the second iteration of the low-resolution image formed by the feature refining module, the second deep feature extraction module and the second reconstruction modeling module, and can extract deeper features of the low-resolution image under the condition of not deepening the network depth, so that the training effect of the image network model is improved.
In a second aspect of the present invention, a method for reconstructing a single-image super-resolution based on a feedback mechanism is provided, which includes the following steps:
the first iteration:
extracting shallow features of the low-resolution image subjected to mean shift segmentation through convolution operation;
extracting deep features of the low-resolution image from the shallow features, and performing first reconstruction on the deep features to obtain a first super-resolution image, wherein the first super-resolution image is used for calculating a loss function of the image network model;
and (3) second iteration:
performing concatenation and convolution operations on the deep features and the shallow features from the first iteration to obtain refined features;
and extracting deep features of the low-resolution image from the refined features, and performing a second reconstruction on the deep features to obtain a second super-resolution image, wherein the second super-resolution image is used for calculating a loss function of the image network model and is output as the super-resolution image of the image network model.
According to the embodiment of the invention, at least the following technical effects are achieved:
the method adopts a feedback mechanism, can refine the deep feature mapping extracted by the first iteration into the shallow feature mapping extracted by the second iteration, and can extract deeper features of the low-resolution image under the condition of not deepening the network depth, thereby improving the training effect of the image network model.
In a third aspect of the present invention, an image network model is provided, which uses the feedback mechanism-based single image super-resolution reconstruction system according to the first aspect of the present invention or the feedback mechanism-based single image super-resolution reconstruction method according to the second aspect of the present invention during training.
According to the embodiment of the invention, at least the following technical effects are achieved:
in the model, the deep-layer extracted feature mapping can refine the shallow feature mapping of the next iteration, and deeper features can be extracted under the condition of not deepening the network depth.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic structural diagram of a single-image super-resolution reconstruction system based on a feedback mechanism according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an implementation of the system of FIG. 1;
fig. 3 is a schematic structural diagram of a Ghost residual dense module according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a Ghost Module according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an attention module according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a feature refinement module provided in an embodiment of the present invention;
fig. 7 is a schematic diagram of an experimental result provided in an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
Referring to fig. 1 to 6, an embodiment of the present invention provides a single image super-resolution reconstruction system based on a feedback mechanism, for training an image network model, including: the shallow layer feature extraction module, the first deep layer feature extraction module, the first reconstruction module, the feature refining module, the second deep layer feature extraction module and the second reconstruction module are specifically as follows:
the shallow feature extraction module is used for extracting shallow features of the low-resolution image after mean shift segmentation.
The red, green and blue channels of the low-resolution image are separated into three feature channels through mean shift, and the separated feature channels are input into the shallow feature extraction module for shallow feature extraction. In addition, the mean-shifted feature channels are upsampled with a bilinear interpolation algorithm in preparation for the subsequent image reconstruction.
As an alternative implementation, the shallow feature extraction module performs two convolution operations: the convolution kernel of the first convolutional layer is 3×3, that of the second convolutional layer is 1×1, and both layers have 64 feature channels. This can be represented by the following formula:
F_0 = f_shallow(I_LR)  (1)
where I_LR denotes the input low-resolution picture, f_shallow(·) denotes the shallow feature extraction function, and F_0 denotes the output of the shallow feature extraction module.
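A minimal PyTorch sketch of this shallow feature extraction module follows (an illustration, not the patented implementation; the class name and the padding choice that preserves spatial size are assumptions):

```python
import torch
import torch.nn as nn

class ShallowFeatureExtractor(nn.Module):
    """F_0 = f_shallow(I_LR), formula (1): a 3x3 then a 1x1 convolution,
    both with 64 feature channels (padding assumed to preserve size)."""
    def __init__(self, in_channels=3, num_features=64):
        super().__init__()
        self.conv3 = nn.Conv2d(in_channels, num_features, kernel_size=3, padding=1)
        self.conv1 = nn.Conv2d(num_features, num_features, kernel_size=1)

    def forward(self, i_lr):
        return self.conv1(self.conv3(i_lr))
```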
The first deep feature extraction module is used for extracting deep features of the low-resolution image from the shallow features.
As an alternative implementation, the first deep feature extraction module may be composed of conventional RDB (residual dense block) modules from the deep feature extraction field.
As an optional implementation, the first deep feature extraction module comprises a plurality of interconnected Ghost residual dense modules, each of which contains a plurality of densely connected Ghost Modules, and extracts deep features of the low-resolution image from the shallow features. For ease of understanding, the Ghost residual dense module is denoted GRDB hereinafter and in the drawings. The GRDB is designed on the basis of the conventional RDB module; the GRDB provided in this embodiment replaces the 3×3 convolution structure in the RDB with a Ghost Module. As shown in fig. 3, the primary role of the GRDB is to extract edge and texture details in the feature maps, which can be represented by the following formula:
F_k^1 = f_GRDB_k(F_{k-1}^1), k = 1, …, m, with F_0^1 = F_0  (2)
where F_1^1, …, F_m^1 denote the outputs of the first through last GRDB in the first deep feature extraction module. In this embodiment m is set to 8 (8 GRDBs are taken as an example, but the number is not limited to 8), and the number of Ghost Modules in one GRDB is likewise set to 8 (also an example, not a limitation). Because the 8 Ghost Modules are densely connected, i.e. each module receives the concatenated feature maps of all preceding modules, the final output of the block is a 64-layer feature map.
As shown in fig. 4, each Ghost Module is composed of a 1×1 convolution and a 3×3 convolution. Its aim is to remove redundant channels from the feature maps, learn the discarded redundant channels from the retained feature maps by convolution, and finally concatenate the learned feature maps with the original ones to restore the input channel count. This not only discards similar feature-map channels produced by convolution and concentrates on the useful ones, but also reduces the parameter count and computation of the network, making it suitable for a lightweight network. For example: a 64-layer input feature map passes through a 1×1 convolution to obtain a 32-layer feature map with half of the redundancy removed (Feat1); this 32-layer feature map is then processed by a 3×3 grouped convolution with 32 groups, outputting another 32-layer feature map (Feat2); finally, the redundancy-removed 32-layer map (Feat1) and the 32-layer map from the 3×3 convolution (Feat2) are concatenated, so the output feature map keeps the same channel count as the input. The above process can be represented by the following formulas:
F_GM = concat(Feat1, Feat2)  (3)
Feat1 = W_1×1(I_F)  (4)
Feat2 = W_3×3(Feat1)  (5)
where Feat1 denotes the primary convolution (primary-conv), consisting of a convolution with kernel size 1×1 and a ReLU activation function; I_F denotes the input feature map of the Ghost Module; Feat2 denotes the cheap operation, consisting of a convolution with kernel size 3×3 and a ReLU activation function; and F_GM denotes the output of the Ghost Module.
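A sketch of this Ghost Module in PyTorch (equations (3)-(5)); the class name is illustrative, and the input width is parameterized as an assumption because the dense connections inside a GRDB widen each module's input:

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """A 1x1 primary conv halves the channels (Feat1); a 3x3 grouped conv
    generates the remaining maps (Feat2); concatenation restores the output
    channel count: F_GM = concat(Feat1, Feat2)."""
    def __init__(self, in_channels=64, out_channels=64):
        super().__init__()
        half = out_channels // 2
        self.primary = nn.Sequential(               # Feat1, formula (4)
            nn.Conv2d(in_channels, half, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(                 # Feat2, formula (5)
            nn.Conv2d(half, half, kernel_size=3, padding=1, groups=half),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        feat1 = self.primary(x)
        feat2 = self.cheap(feat1)
        return torch.cat([feat1, feat2], dim=1)     # formula (3)
```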
As an alternative embodiment, an attention module (SCM) integrating spatial and channel attention mechanisms is added at the end of each GRDB. The purpose of adding the attention module is to let the network focus more on adjusting the useful information and high-frequency information in space and channels, enhance the expressive power of the feature maps, effectively recover more high-frequency details such as textures and contours, and obtain a better high-resolution image reconstruction effect. By scope of application, attention mechanisms can be divided into channel attention and spatial attention, and this embodiment combines both in the present system. The output of the 8th Ghost Module is compressed by a 1×1 convolution and sent to the attention module; the attention module's output feature map is multiplied point-wise by a residual weight factor, the product is added element-wise to the input of the GRDB, and the sum is taken as the output of the GRDB, which can be represented by the following formula:
F_Ghost = α · F_SCM(W_1×1(G_d)) + G_0  (6)
where G_0 denotes the input of the GRDB; W_1×1(G_d) denotes a convolution with kernel size 1×1, including a ReLU activation function; F_SCM denotes the attention module; α denotes the residual weight factor, with α = 0.2; and G_d denotes the output of the last Ghost Module, which can be represented by the following formula:
G_d = concat(F_GM(G_{d-1}), G_{d-1}, …, G_1, G_0)  (7)
where G_1 to G_{d-1} denote the outputs of the 1st to (d-1)-th Ghost Modules, F_GM denotes the Ghost Module operation, and concat denotes the concatenation operation.
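Combining formulas (6) and (7), a sketch of one GRDB using the GhostModule class above; the SCM attention module is sketched after the attention description below, and the dense-concatenation layout here is one plausible reading of formula (7):

```python
import torch
import torch.nn as nn

class GRDB(nn.Module):
    """d densely connected Ghost Modules, a 1x1 fusion convolution, an SCM
    attention module, and a residual connection scaled by alpha = 0.2."""
    def __init__(self, channels=64, num_ghost=8, alpha=0.2, attention=None):
        super().__init__()
        # Each Ghost Module sees the block input concatenated with all
        # previous Ghost Module outputs, so its input width grows.
        self.ghosts = nn.ModuleList(
            GhostModule(channels * (i + 1), channels) for i in range(num_ghost)
        )
        self.fuse = nn.Sequential(                  # W_1x1 in formula (6)
            nn.Conv2d(channels * (num_ghost + 1), channels, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        self.attention = attention if attention is not None else nn.Identity()
        self.alpha = alpha

    def forward(self, g0):
        feats = [g0]
        for ghost in self.ghosts:
            # G_d = concat(F_GM(G_{d-1}), G_{d-1}, ..., G_1, G_0), formula (7)
            feats.append(ghost(torch.cat(feats, dim=1)))
        fused = self.fuse(torch.cat(feats, dim=1))
        # F_Ghost = alpha * F_SCM(W_1x1(G_d)) + G_0, formula (6)
        return self.alpha * self.attention(fused) + g0
```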
As shown in fig. 4 and 5, the GRDB integrates spatial attention with channel attention using 3×3 and 1×1 convolutions. Channel attention first applies global average pooling to the input feature maps, pooling maps of original size 48×48×64 into 64-channel single-pixel values of size 1×1×64; a 1×1 convolution then compresses the channel count from 64 layers to 4, which are passed through a linear rectification function (ReLU); finally another 1×1 convolution expands the compressed 4 layers back to the original 64, so the output of channel attention is a channel weight vector of size 1×1×64. Spatial attention first applies a grouped convolution with a 3×3 kernel and 64 groups to the incoming feature map, followed by a further 1×1 convolution. The spatial and channel attention outputs are added element-wise, and a sigmoid function adjusts the weight of each channel and each pixel into the range 0-1. The sigmoid output is then multiplied point-wise with the input of the attention module so as to retain the important channels and spatial pixels. The above process can be represented by the following formulas:
F_SCM = x_c · σ(F_SA + F_CA)  (8)
F_SA = W_p(W_g(x_c))  (9)
F_CA = W_U(δ(W_D(F_GAP(x_c))))  (10)
where F_SCM denotes the output of the attention module, x_c the input of the attention module, and σ the sigmoid function. F_SA denotes the output of the spatial attention operation, W_p a point-wise convolution with kernel size 1×1, and W_g a grouped convolution with kernel size 3×3 and 64 groups. F_CA denotes the output of the channel attention operation, W_D the 1×1 convolution that compresses the channel count to 4, W_U the 1×1 convolution that expands the channel count back to 64, and δ the ReLU activation function. F_GAP denotes the global average pooling operation, which can be represented by the following formula:
F_GAP(x_c) = (1 / (H × W)) · Σ_{i=1}^{H} Σ_{j=1}^{W} x_c(i, j)  (11)
where H and W denote the height and width of the input feature map, and i and j denote the horizontal and vertical coordinates of the pixels.
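A sketch of the SCM attention module of formulas (8)-(11), assuming a 64-channel input as in the example above (class and attribute names are illustrative):

```python
import torch
import torch.nn as nn

class SCM(nn.Module):
    """Channel attention (GAP -> 1x1 conv 64->4 -> ReLU -> 1x1 conv 4->64)
    plus spatial attention (3x3 grouped conv with 64 groups -> 1x1 conv),
    summed, gated by a sigmoid, and multiplied point-wise with the input."""
    def __init__(self, channels=64, reduced=4):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)                       # F_GAP, (11)
        self.w_d = nn.Conv2d(channels, reduced, kernel_size=1)   # 64 -> 4
        self.relu = nn.ReLU(inplace=True)
        self.w_u = nn.Conv2d(reduced, channels, kernel_size=1)   # 4 -> 64
        self.w_g = nn.Conv2d(channels, channels, kernel_size=3,
                             padding=1, groups=channels)         # grouped 3x3
        self.w_p = nn.Conv2d(channels, channels, kernel_size=1)  # point-wise

    def forward(self, x):
        f_ca = self.w_u(self.relu(self.w_d(self.gap(x))))        # formula (10)
        f_sa = self.w_p(self.w_g(x))                             # formula (9)
        # F_SCM = x * sigmoid(F_SA + F_CA), formula (8); the 1x1x64 channel
        # weights broadcast over the spatial map when added.
        return x * torch.sigmoid(f_sa + f_ca)
```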
The first reconstruction module is used for reconstructing the deep features output by the first deep feature extraction module to obtain a first super-resolution image, and the first super-resolution image is used for calculating a loss function of the image network model.
The first reconstruction module mainly combines the deep features output by the first deep feature extraction module with the low-resolution image upsampled by bilinear interpolation to reconstruct a first super-resolution image. The reconstruction process comprises one transposed convolution operation and one convolutional layer with a 3×3 kernel; the extracted features are then added to the interpolation-upsampled low-resolution picture, and the result is output as the reconstructed super-resolution picture, which can be represented by the following formula:
I_SR^t = f_RB(F_m^t) + f_UP(I_LR)  (12)
where I_LR denotes the input low-resolution picture, f_UP the interpolation upsampling operation, F_m^t the output of the deep feature extraction module, f_RB the image reconstruction function, and I_SR^t the super-resolution picture output by the t-th iteration, with t = 1 for the first reconstruction module and t = 2 for the second reconstruction module.
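A sketch of this reconstruction module for formula (12); choosing kernel size equal to stride for the transposed convolution is an assumption made so the output is exactly `scale` times larger:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconstructionModule(nn.Module):
    """A transposed convolution upsamples the deep features, a 3x3 convolution
    maps them to RGB, and the bilinearly upsampled LR image is added:
    I_SR^t = f_RB(F^t) + f_UP(I_LR)."""
    def __init__(self, channels=64, out_channels=3, scale=2):
        super().__init__()
        self.scale = scale
        self.up = nn.ConvTranspose2d(channels, channels,
                                     kernel_size=scale, stride=scale)
        self.conv = nn.Conv2d(channels, out_channels, kernel_size=3, padding=1)

    def forward(self, deep_features, i_lr):
        features = self.conv(self.up(deep_features))               # f_RB
        upsampled = F.interpolate(i_lr, scale_factor=self.scale,
                                  mode='bilinear', align_corners=False)  # f_UP
        return features + upsampled                                # formula (12)
```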
Calculation of the loss function: the super-resolution picture outputs of the two iterations are each compared against the original high-resolution picture with an L1 loss, and the two losses are averaged, which can be represented by the following formula:
L(Θ) = (1/T) · Σ_{t=1}^{T} || I_HR − I_SR^t ||_1  (13)
where L(Θ) is the L1 loss function, Θ denotes the network parameters of the image network model, T = 2 is the total number of iterations, t is the iteration index, and I_HR and I_SR^t respectively denote the original high-resolution picture (the picture used for training the image network model) and the super-resolution reconstructed picture. The image network model and the loss function are not described in further detail here.
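A sketch of formula (13) with T = 2, as a plain function (the name is illustrative):

```python
import torch

def l1_feedback_loss(sr_outputs, i_hr):
    """sr_outputs: [I_SR^1, I_SR^2], the SR pictures of the two iterations;
    i_hr: the original high-resolution picture. Returns the averaged L1 loss."""
    return sum(torch.mean(torch.abs(sr - i_hr)) for sr in sr_outputs) / len(sr_outputs)
```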
The feature refinement module is used for performing concatenation and convolution operations on the shallow features and the deep features extracted by the first deep feature extraction module to obtain refined features.
As an alternative embodiment, as shown in fig. 6, the feature refinement module performs two concatenations and two convolutions: the first concatenation joins the deep features output by several GRDBs and feeds the result to the first convolutional layer; the second concatenation joins the output of the first convolutional layer with the shallow features and feeds the result to the second convolutional layer.
Based on the above embodiment, the outputs of the last 4 GRDBs of the previous iteration are fed into the concatenation operation (4 is taken as an example; the number is not limited to 4, nor to the last 4), and the concatenated feature map undergoes a 1×1 convolution that compresses the channel count from the 64×4 layers after concatenation to 64 layers. The convolved feature map is then passed into the current iteration and concatenated with the current iteration's shallow feature output; the concatenated map undergoes another 1×1 convolution that compresses the channel count from 64×2 layers to 64 layers, and the result is finally passed to the second deep feature extraction module of the current iteration. This can be represented by the following formula:
F_GFM = f_refine(F_{m-b}^1, …, F_m^1, F_0)  (14)
where F_{m-b}^1, …, F_m^1 denote the outputs of the (m-b)-th through last GRDB of the first deep feature extraction module in the first iteration, f_refine(·) denotes the feature refinement function, and F_GFM denotes the output of the feature refinement module.
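A sketch of this feature refinement module under the embodiment's numbers (4 fed-back GRDB outputs, 64 channels each; the class name is illustrative):

```python
import torch
import torch.nn as nn

class FeatureRefinement(nn.Module):
    """Concatenate the fed-back deep features and compress 64x4 -> 64 with a
    1x1 conv, then concatenate with the shallow features and compress
    64x2 -> 64 with another 1x1 conv, yielding F_GFM of formula (14)."""
    def __init__(self, channels=64, num_feedback=4):
        super().__init__()
        self.compress_deep = nn.Conv2d(channels * num_feedback, channels,
                                       kernel_size=1)
        self.compress_all = nn.Conv2d(channels * 2, channels, kernel_size=1)

    def forward(self, deep_feats, shallow):
        # deep_feats: list of the last `num_feedback` GRDB outputs (iteration 1)
        fused = self.compress_deep(torch.cat(deep_feats, dim=1))
        return self.compress_all(torch.cat([fused, shallow], dim=1))  # F_GFM
```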
The second deep feature extraction module is used for extracting deep features of the low-resolution image from the refined features.
As an optional implementation, the second deep feature extraction module has the same structure as the first: it comprises a plurality of interconnected GRDBs, each containing a plurality of densely connected Ghost Modules. This can be represented by the following formula:
F_k^2 = f_GRDB_k(F_{k-1}^2), k = 1, …, m, with F_0^2 = F_GFM  (15)
where F_1^2, …, F_m^2 denote the outputs of the first through last GRDB in the second deep feature extraction module.
Likewise, each GRDB incorporates an attention module (SCM) integrating spatial and channel attention, which is not described again here.
The second reconstruction module is used for reconstructing the deep features output by the second deep feature extraction module to obtain a second super-resolution image; the second super-resolution image is used for calculating the loss function of the image network model and is output as the super-resolution image of the image network model.
The loss function is calculated as in formula (13). Like the first reconstruction module, the second reconstruction module comprises one transposed convolution operation and one convolutional layer, both with 3×3 kernels; the extracted features are then added to the interpolation-upsampled low-resolution picture, and the result is output as the reconstructed super-resolution picture. The process and its advantages are not described again here.
The system of this embodiment improves on the trade-off among visual quality, parameter count, and running time of the reconstructed image that is ubiquitous in existing super-resolution reconstruction algorithms, with the following beneficial effects:
(1) The system adds a feedback mechanism: the first iteration over the low-resolution image is formed by the shallow feature extraction module, the first deep feature extraction module, and the first reconstruction module; the second iteration is formed by the feature refinement module, the second deep feature extraction module, and the second reconstruction module, which refines the deep feature maps extracted in the first iteration into the shallow feature maps of the second iteration. Deeper features of the low-resolution image can thus be extracted without deepening the network, improving the training effect of the image network model.
(2) The system designs a GRDB composed of densely connected Ghost Modules, and then builds the deep feature extraction modules from interconnected GRDBs, so the system achieves the feature extraction capability of ordinary convolution while removing redundant feature channels and reducing the parameter count of the network.
(3) The system adds an attention mechanism to each GRDB, so the network can concentrate on adjusting the useful information and high-frequency information in space and channels, enhancing the expressive power of the feature maps, effectively recovering more high-frequency details such as textures and contours, and obtaining a better high-resolution image reconstruction effect. Moreover, thanks to the Ghost Module, the parameter increase introduced by the attention mechanism is negligible.
(4) Compared with existing models such as VDSR, DRRN, CARN, and IMDN, the system not only achieves the best visual effect but also surpasses these super-resolution models on the objective evaluation metrics PSNR and SSIM.
The embodiment of the invention provides a single-image super-resolution reconstruction method based on a feedback mechanism, which is used for training an image network model and comprises the following steps:
s100, first iteration:
extracting shallow features of the low-resolution image subjected to mean shift segmentation through convolution operation;
extracting deep features of the low-resolution image from the shallow features, and performing first reconstruction on the deep features to obtain a first super-resolution image, wherein the first super-resolution image is used for calculating a loss function of the image network model;
s200, second iteration:
performing concatenation and convolution operations on the deep features and the shallow features from the first iteration to obtain refined features;
and extracting deep features of the low-resolution image from the refined features, and performing a second reconstruction on the deep features to obtain a second super-resolution image, wherein the second super-resolution image is used for calculating a loss function of the image network model and is output as the super-resolution image of the image network model.
As an alternative embodiment, the extracting deep features of the low-resolution image from the shallow features includes: and extracting deep features of the low-resolution image from the shallow features through a plurality of interconnected GRDBs, wherein each GRDB comprises a plurality of densely connected Ghost modules.
As an alternative embodiment, the extracting deep features of the low resolution image from the refined features includes: and extracting deep features of the low-resolution image from the refined features through a plurality of interconnected GRDBs, wherein each GRDB comprises a plurality of densely connected Ghost modules.
As an optional implementation, before the first reconstruction and the second reconstruction, the method further includes: applying an attention mechanism that integrates spatial and channel attention to the deep features.
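To make the two-iteration flow concrete, the following sketch composes the illustrative classes defined in the system embodiment above (ShallowFeatureExtractor, GhostModule, GRDB, SCM, ReconstructionModule, FeatureRefinement) into a full forward pass. Whether the two deep feature extraction modules share weights is not stated, so separate modules are instantiated here as the claims suggest; all names are illustrative:

```python
import torch
import torch.nn as nn

class FeedbackSRNet(nn.Module):
    def __init__(self, num_grdb=8, num_feedback=4, scale=2):
        super().__init__()
        self.shallow = ShallowFeatureExtractor()
        self.deep1 = nn.ModuleList(GRDB(attention=SCM()) for _ in range(num_grdb))
        self.deep2 = nn.ModuleList(GRDB(attention=SCM()) for _ in range(num_grdb))
        self.refine = FeatureRefinement(num_feedback=num_feedback)
        self.rebuild1 = ReconstructionModule(scale=scale)
        self.rebuild2 = ReconstructionModule(scale=scale)
        self.num_feedback = num_feedback

    @staticmethod
    def _run(blocks, feats):
        outputs = []
        for block in blocks:
            feats = block(feats)
            outputs.append(feats)
        return outputs

    def forward(self, i_lr):
        f0 = self.shallow(i_lr)
        outs1 = self._run(self.deep1, f0)                       # first iteration
        sr1 = self.rebuild1(outs1[-1], i_lr)
        refined = self.refine(outs1[-self.num_feedback:], f0)   # feedback
        outs2 = self._run(self.deep2, refined)                  # second iteration
        sr2 = self.rebuild2(outs2[-1], i_lr)
        return sr1, sr2   # both enter the loss; sr2 is the final output
```

During training, `l1_feedback_loss([sr1, sr2], i_hr)` from the sketch above would then implement formula (13).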
It should be noted that the present embodiment and the above system embodiment are based on the same inventive concept, and therefore, the related contents of the above system embodiment are also applicable to the present embodiment, and are not described herein again.
An embodiment of the present invention provides an image network model, and the model uses the feedback mechanism-based single image super-resolution reconstruction method described in the above method embodiment or the feedback mechanism-based single image super-resolution reconstruction system described in the above system embodiment during training. The training process and the testing process (including the processing of the training set) of the image network model provided by the embodiment are abstracted as follows:
1. training an image network model;
1.1, downsampling the high-resolution picture data set by an interpolation-based method to obtain a corresponding low-resolution picture data set.
1.2, setting the magnification factor (2 times, 3 times, 4 times and the like) of the image network model and the target path of the high-resolution and low-resolution picture data sets.
1.3. Input the processed picture data set into the image network model in blocks and perform nearest-neighbor interpolation upscaling.
And 1.4, inputting the picture data set into the system of the embodiment, and executing corresponding operation to obtain a super-resolution image.
1.5. Train the convolution operators with the loss function, using the corresponding high-resolution picture data set as supervision for the super-resolution images.
And 1.6, obtaining a network model with corresponding magnification through a plurality of rounds of training.
2. Using a network model;
and 2.1, reconstructing the low-resolution image into a high-resolution image by using the trained corresponding magnification image network model according to the magnification to be amplified.
As shown in fig. 7, based on the above embodiments, the embodiments of the present invention provide a set of simulation experiments, which are specifically as follows:
simulating an environment;
the platform used was the ubantu16.08 operating system, the memory size was 128GB, the CPU used intel to strong E5-2670, the GPU used intemada TITANX, and trained in the pytorch0.4.0 deep learning environment of the GPU version. In the embodiment, a network parameter weight is initialized by adopting a method of Hommin et al, an Adam algorithm is adopted to optimize network parameters, the batch processing size is set to be 16, the image block size is set to be 48x48, the initial learning rate is 10-4, the learning rate is reduced to half of the original rate every 200 times of iterative training, and 1000 times of total iteration is performed.
Simulation data set;
the adopted training set is DIV2K, the data set is published in NITRE challenge in 2017 and is used as a high-quality image data set for an image repairing task, and each image achieves the standard of 2K resolution ratio through 800 training sets, 100 verification sets and 100 test set pictures. The picture types of the DIV2K include characters, handmade products, environments (cities, villages), animals and plants, natural scenery, and the like. This example uses only 800 training set pictures for training and performs data enhancement before training. Data enhancement adopts three modes, namely 1, randomly rotating the picture by 90 degrees, 180 degrees and 270 degrees; 2. horizontally or vertically turning the picture; 3. the original image is reduced by a factor of reduction factors of 0.9, 0.8, 0.7 and 0.6. The training set after data enhancement is 10 times of the original picture, i.e. 8000 pictures. And finally, carrying out bilinear downsampling operation of different multiples (2 times, 3 times and 4 times) on the high-definition picture after data enhancement to obtain a low-resolution picture, and forming a training data pair with the original high-definition picture.
The test sets adopted are Set5, Set14, BSD100, Urban100, and Manga109, five widely used super-resolution benchmark test sets for model performance evaluation. Urban100 contains 100 challenging urban scene pictures with dense high-frequency details. Manga109 consists of 109 manga cover pictures containing high-frequency and low-frequency information as well as text, and tests the model's comprehensive ability to process text and pictures.
Experimental results;
the test set is adopted for testing, and common performance evaluation indexes are used: peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM), and selecting Y channel for performance evaluation under YCbCr color coding format. In this embodiment, the models trained with different magnifications are used to perform the test of the corresponding magnification, and the test result is compared with the super-resolution model advanced in the coming year, including Bicubic (Bicubic), srncn, DRRN, IDN, cari, IMDN, and other models. Table 1 below shows the results of quantitative comparison of different super-resolution models using PSNR and SSIM at three different magnifications x2, x3, and x4, where bold and dash lines indicate the best results in the above algorithm:
[Table 1: quantitative PSNR/SSIM comparison of the super-resolution models at ×2, ×3, and ×4 magnification on the five benchmark test sets; rendered as an image in the original publication.]
As can be seen from Table 1, on most of the above test sets the PSNR and SSIM of the model of this embodiment exceed those of the other super-resolution models or reach second-best performance. The difficulty of image super-resolution reconstruction grows with the magnification factor, and the model of this embodiment is superior to most models at 3× and 4× magnification. At 4× magnification on the Manga109 test set, the PSNR of this embodiment is 2.99 dB higher than the classical SRCNN and 0.13 dB higher than the recent IMDN. The model can recover high-frequency details that are hard to learn at high magnification because the feedback mechanism introduces the deeply extracted features of the previous iteration into the current one, deepening the learning of high-frequency information and thus yielding a good reconstruction effect at high magnification. The gain on the Urban100 test set is clearly higher than on the other data sets because Urban100 contains pictures of urban buildings with more high-frequency details; the attention mechanism in this model can screen and retain the channels and spatial locations carrying more high-frequency information, giving a better reconstruction effect on Urban100. The GRDB in this model uses densely connected Ghost Modules, which remove redundant feature channels and reduce the parameter count of the network.
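The Y-channel evaluation protocol described above can be sketched as follows (BT.601 luma conversion as commonly used in super-resolution evaluation; SSIM omitted for brevity; inputs assumed to be float arrays in [0, 255]):

```python
import numpy as np

def rgb_to_y(img):
    # ITU-R BT.601 luma: the Y channel of YCbCr as used in SR evaluation.
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 + (65.738 * r + 129.057 * g + 25.064 * b) / 256.0

def psnr_y(sr, hr):
    mse = np.mean((rgb_to_y(sr) - rgb_to_y(hr)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```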
Model                DRCN    IDN     SRMDNF  CARN    SRRAM   This example
Parameters at ×3 (K) 1774    553     1528    1592    1127    1197
PSNR (dB)            27.15   27.42   27.57   28.06   28.12   28.23

TABLE 2
As can be seen from Table 2, at 3× magnification on the Urban100 test set, the model of this embodiment obtains the best PSNR score while keeping the parameter count low, indicating that, on top of its better reconstruction effect, the model is well suited to mobile devices with limited storage space.
To further illustrate the experimental effect, this embodiment selects pictures from the Urban100 data set for comparison; the data set contains information about cities and buildings, is rich in high-frequency details, and is challenging for super-resolution reconstruction. The pictures are reconstructed with VDSR, DRCN, DRRN, LapSRN, MemNet, IDN, CARN, IMDN, and the model of this embodiment (named FGRDN in the figure); the results are shown in fig. 7.
As shown in fig. 7, the 2×-magnified img_67 picture shows a building with a glass frame structure containing many horizontal and oblique high-frequency details. Most reconstruction models fail to recover clear horizontal black lines; VDSR, DRCN, DRRN, LapSRN, MemNet, IDN, CARN, and IMDN can recover the horizontal black lines, but some lines look jagged and the details are insufficiently restored. The model of this embodiment reconstructs more details, and the horizontal black lines are clearer. The 3×-magnified img_76 picture shows a building combined with a face display; its vertical and oblique high-frequency details are denser, making high-frequency extraction more challenging. Most reconstructions cannot clearly recover the oblique display frame lines; the IDN model, for instance, mistakes low-frequency details for high-frequency ones and wrongly restores a light-colored region of the display as the display frame, seriously harming the visual impression. The model of this embodiment recovers more high-frequency details and shows a clear black border of the display. On both selected pictures, the model of this embodiment achieves good results on the PSNR and SSIM evaluation metrics as well as in visual comparison with existing models.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples" or the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A single-image super-resolution reconstruction system based on a feedback mechanism is used for training an image network model and is characterized by comprising the following components:
the shallow feature extraction module is used for extracting shallow features of the low-resolution image subjected to mean shift segmentation;
a first deep feature extraction module for extracting deep features of the low resolution image from the shallow features;
the first reconstruction module is used for reconstructing the deep features output by the first deep feature extraction module to obtain a first super-resolution image, and the first super-resolution image is used for calculating a loss function of the image network model;
the feature refinement module is used for performing concatenation and convolution operations on the shallow features and the deep features extracted by the first deep feature extraction module to obtain refined features;
a second deep feature extraction module for extracting deep features of the low resolution image from the refined features;
and the second reconstruction module is used for reconstructing the deep features output by the second deep feature extraction module to obtain a second super-resolution image, wherein the second super-resolution image is used for calculating a loss function of the image network model and is output as the super-resolution image of the image network model.
2. The single image super-resolution reconstruction system based on the feedback mechanism of claim 1, wherein the first deep feature extraction Module comprises a plurality of connected Ghost residual dense modules, each Ghost residual dense Module comprises a plurality of densely connected Ghost modules.
3. The single image super-resolution reconstruction system based on the feedback mechanism of claim 1, wherein the second deep feature extraction Module comprises a plurality of connected Ghost residual dense modules, each Ghost residual dense Module comprises a plurality of densely connected Ghost modules.
4. The single image super-resolution reconstruction system based on the feedback mechanism as claimed in claim 2 or 3, wherein each Ghost residual dense module incorporates an attention module integrating spatial and channel attention mechanisms.
5. The single image super-resolution reconstruction system based on the feedback mechanism as claimed in claim 2, wherein the feature refinement module performs two concatenations and two convolutions: the first concatenation joins the deep features output by a plurality of Ghost residual dense modules and inputs the result to a first convolutional layer; the second concatenation joins the output of the first convolutional layer with the shallow features and inputs the result to a second convolutional layer.
6. A single image super-resolution reconstruction method based on a feedback mechanism is used for training an image network model and is characterized by comprising the following steps:
the first iteration:
extracting shallow features of the low-resolution image subjected to mean shift segmentation through convolution operation;
extracting deep features of the low-resolution image from the shallow features, and performing first reconstruction on the deep features to obtain a first super-resolution image, wherein the first super-resolution image is used for calculating a loss function of the image network model;
and (3) second iteration:
performing concatenation and convolution operations on the deep features and the shallow features from the first iteration to obtain refined features;
and extracting deep features of the low-resolution image from the refined features, and performing a second reconstruction on the deep features to obtain a second super-resolution image, wherein the second super-resolution image is used for calculating a loss function of the image network model and is output as the super-resolution image of the image network model.
7. The method for single-image super-resolution reconstruction based on the feedback mechanism of claim 6, wherein the extracting deep features of the low-resolution image from the shallow features comprises: extracting deep features of the low-resolution image from the shallow features through a plurality of connected Ghost residual dense modules, wherein each Ghost residual dense module comprises a plurality of densely connected Ghost Modules.
8. The method for single-image super-resolution reconstruction based on the feedback mechanism of claim 7, wherein the extracting deep features of the low-resolution image from the refined features comprises: extracting deep features of the low-resolution image from the refined features through a plurality of connected Ghost residual dense modules, wherein each Ghost residual dense module comprises a plurality of densely connected Ghost Modules.
9. The method for single-image super-resolution reconstruction based on the feedback mechanism of claim 8, wherein before the first reconstruction and the second reconstruction, the method further comprises: an attention mechanism process that integrates spatial and channel attention is performed on the deep features.
10. An image network model, characterized in that the image network model uses the feedback mechanism-based single image super-resolution reconstruction system according to any one of claims 1 to 5 or the feedback mechanism-based single image super-resolution reconstruction method according to any one of claims 6 to 9 when training.
CN202011139130.5A 2020-10-22 2020-10-22 Single-image super-resolution reconstruction system and method based on feedback mechanism Active CN112200724B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011139130.5A CN112200724B (en) 2020-10-22 2020-10-22 Single-image super-resolution reconstruction system and method based on feedback mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011139130.5A CN112200724B (en) 2020-10-22 2020-10-22 Single-image super-resolution reconstruction system and method based on feedback mechanism

Publications (2)

Publication Number Publication Date
CN112200724A true CN112200724A (en) 2021-01-08
CN112200724B CN112200724B (en) 2023-04-07

Family

ID=74010824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011139130.5A Active CN112200724B (en) 2020-10-22 2020-10-22 Single-image super-resolution reconstruction system and method based on feedback mechanism

Country Status (1)

Country Link
CN (1) CN112200724B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112819771A (en) * 2021-01-27 2021-05-18 东北林业大学 Wood defect detection method based on improved YOLOv3 model
CN113409191A (en) * 2021-06-02 2021-09-17 广东工业大学 Lightweight image super-resolution method and system based on attention feedback mechanism
CN113658046A (en) * 2021-08-18 2021-11-16 中科天网(广东)科技有限公司 Super-resolution image generation method, device, equipment and medium based on feature separation
CN113658044A (en) * 2021-08-03 2021-11-16 长沙理工大学 Method, system, device and storage medium for improving image resolution
CN116503506A (en) * 2023-06-25 2023-07-28 南方医科大学 Image reconstruction method, system, device and storage medium
WO2024021081A1 (en) * 2022-07-29 2024-02-01 宁德时代新能源科技股份有限公司 Method and apparatus for detecting defect on surface of product

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101485206A (en) * 2006-04-30 2009-07-15 惠普开发有限公司 Robust and efficient compression/decompression providing for adjustable division of computational complexity between encoding/compression and decoding/decompression
CN109272452A (en) * 2018-08-30 2019-01-25 北京大学 Learn the method for super-resolution network in wavelet field jointly based on bloc framework subband
CN109903228A (en) * 2019-02-28 2019-06-18 合肥工业大学 A kind of image super-resolution rebuilding method based on convolutional neural networks
CN110599401A (en) * 2019-08-19 2019-12-20 中国科学院电子学研究所 Remote sensing image super-resolution reconstruction method, processing device and readable storage medium
CN111192200A (en) * 2020-01-02 2020-05-22 南京邮电大学 Image super-resolution reconstruction method based on fusion attention mechanism residual error network
CN111353940A (en) * 2020-03-31 2020-06-30 成都信息工程大学 Image super-resolution reconstruction method based on deep learning iterative up-down sampling
CN111583107A (en) * 2020-04-03 2020-08-25 长沙理工大学 Image super-resolution reconstruction method and system based on attention mechanism
CN111612695A (en) * 2020-05-19 2020-09-01 华侨大学 Super-resolution reconstruction method for low-resolution face image

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101485206A (en) * 2006-04-30 2009-07-15 惠普开发有限公司 Robust and efficient compression/decompression providing for adjustable division of computational complexity between encoding/compression and decoding/decompression
CN109272452A (en) * 2018-08-30 2019-01-25 北京大学 Learn the method for super-resolution network in wavelet field jointly based on bloc framework subband
CN109903228A (en) * 2019-02-28 2019-06-18 合肥工业大学 A kind of image super-resolution rebuilding method based on convolutional neural networks
CN110599401A (en) * 2019-08-19 2019-12-20 中国科学院电子学研究所 Remote sensing image super-resolution reconstruction method, processing device and readable storage medium
CN111192200A (en) * 2020-01-02 2020-05-22 南京邮电大学 Image super-resolution reconstruction method based on fusion attention mechanism residual error network
CN111353940A (en) * 2020-03-31 2020-06-30 成都信息工程大学 Image super-resolution reconstruction method based on deep learning iterative up-down sampling
CN111583107A (en) * 2020-04-03 2020-08-25 长沙理工大学 Image super-resolution reconstruction method and system based on attention mechanism
CN111612695A (en) * 2020-05-19 2020-09-01 华侨大学 Super-resolution reconstruction method for low-resolution face image

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
STELLAR STREAM: "Ghost Module/GhostNet: a lightweight module/network for model compression (paper reading)", https://blog.csdn.net/qq_34923437/article/details/106248103 *
Li Zheng; Zhang Tong; Zhu Guotao; Wang Xin; Wang Wei: "An image super-resolution reconstruction method based on deep learning"
Wu Yukun; Chen Yuantao: "An image matching algorithm applying super-resolution reconstruction"

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112819771A (en) * 2021-01-27 2021-05-18 东北林业大学 Wood defect detection method based on improved YOLOv3 model
CN113409191A (en) * 2021-06-02 2021-09-17 广东工业大学 Lightweight image super-resolution method and system based on attention feedback mechanism
CN113658044A (en) * 2021-08-03 2021-11-16 长沙理工大学 Method, system, device and storage medium for improving image resolution
WO2023010831A1 (en) * 2021-08-03 2023-02-09 长沙理工大学 Method, system and apparatus for improving image resolution, and storage medium
CN113658044B (en) * 2021-08-03 2024-02-27 长沙理工大学 Method, system, device and storage medium for improving image resolution
CN113658046A (en) * 2021-08-18 2021-11-16 中科天网(广东)科技有限公司 Super-resolution image generation method, device, equipment and medium based on feature separation
WO2024021081A1 (en) * 2022-07-29 2024-02-01 宁德时代新能源科技股份有限公司 Method and apparatus for detecting defect on surface of product
CN116503506A (en) * 2023-06-25 2023-07-28 南方医科大学 Image reconstruction method, system, device and storage medium
CN116503506B (en) * 2023-06-25 2024-02-06 南方医科大学 Image reconstruction method, system, device and storage medium

Also Published As

Publication number Publication date
CN112200724B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN112200724B (en) Single-image super-resolution reconstruction system and method based on feedback mechanism
CN110033410B (en) Image reconstruction model training method, image super-resolution reconstruction method and device
CN109410239B (en) Text image super-resolution reconstruction method based on condition generation countermeasure network
CN108537733B (en) Super-resolution reconstruction method based on multi-path deep convolutional neural network
CN106683067B (en) Deep learning super-resolution reconstruction method based on residual sub-images
CN111915487B (en) Face super-resolution method and device based on hierarchical multi-scale residual fusion network
CN110232653A (en) The quick light-duty intensive residual error network of super-resolution rebuilding
CN111325165B (en) Urban remote sensing image scene classification method considering spatial relationship information
CN109035146A (en) A kind of low-quality image oversubscription method based on deep learning
CN111768340B (en) Super-resolution image reconstruction method and system based on dense multipath network
CN111696033B (en) Real image super-resolution model and method based on angular point guided cascade hourglass network structure learning
Shen et al. Convolutional neural pyramid for image processing
TWI719512B (en) Method and system for algorithm using pixel-channel shuffle convolution neural network
CN112017116B (en) Image super-resolution reconstruction network based on asymmetric convolution and construction method thereof
CN112381722A (en) Single-image hyper-segmentation and perception image enhancement joint task learning method
CN115358932A (en) Multi-scale feature fusion face super-resolution reconstruction method and system
CN115953294A (en) Single-image super-resolution reconstruction method based on shallow channel separation and aggregation
CN115713462A (en) Super-resolution model training method, image recognition method, device and equipment
Shen et al. RSHAN: Image super-resolution network based on residual separation hybrid attention module
CN117237190A (en) Lightweight image super-resolution reconstruction system and method for edge mobile equipment
Li et al. RGSR: A two-step lossy JPG image super-resolution based on noise reduction
CN116029905A (en) Face super-resolution reconstruction method and system based on progressive difference complementation
CN111539434B (en) Infrared weak and small target detection method based on similarity
CN113538505A (en) Motion estimation system and method of single picture based on deep learning
Li et al. Adversarial feature hybrid framework for steganography with shifted window local loss

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant