CN110992270A - Multi-scale residual attention network image super-resolution reconstruction method based on attention - Google Patents
- Publication number
- CN110992270A (application number CN201911319741.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention belongs to the technical field of image super-resolution reconstruction and discloses an attention-based multi-scale residual attention network method for image super-resolution reconstruction. The method comprises: selecting a public image data set as the image set to be tested, dividing it into an image training set and an image test set, and performing image preprocessing; designing a multi-scale residual structural unit module, introducing a channel attention mechanism, and building a channel-attention-based multi-scale residual attention neural network model; inputting the preprocessed image training set into the model for training; and inputting the preprocessed image test set into the trained model for testing to obtain the final reconstructed high-resolution image. The method makes the basic unit concentrate on extracting high-frequency information, better highlights the important feature-map information within each channel, better extracts the important information in an image, and reduces the reconstruction error.
Description
Technical Field
The invention belongs to the technical field of image super-resolution reconstruction, and particularly relates to an attention-based multi-scale residual attention network image super-resolution reconstruction method.
Background
Image super-resolution reconstruction is a technique that generates a high-resolution output image from a low-resolution input image. It belongs to the field of image processing and has important application prospects in military affairs, computer vision, medical diagnosis, public safety, satellite imagery, and other areas.
The reconstruction algorithms currently applied to image super-resolution can be mainly divided into interpolation-based, reconstruction-based, and learning-based methods.
Among interpolation-based methods, the common interpolation schemes are nearest-neighbor, bilinear, and bicubic interpolation. These methods exploit the correlation between adjacent pixels of a single low-resolution input image and solve for the values of unknown pixels by mathematical interpolation, thereby reconstructing a high-resolution image. However, interpolation-based methods do not fully consider the global information of the image: the reconstructed high-resolution image is overly smooth and loses most image details, exhibits ringing where the gray level changes sharply, recovers detail poorly, and suffers from severe edge artifacts; in particular, the loss of high-frequency information is serious.
Reconstruction-based methods derive the dependency between pixels of the high-resolution and low-resolution images from the registration correspondence between them, use it as prior knowledge about the image to be reconstructed, and reconstruct the target high-resolution image from that prior. These methods assume that the low-resolution input signal can predict the original high-resolution signal well: an LR image is obtained from a known degradation model, key pixel-level feature information is extracted from it, a prior constraint is imposed on the HR image to be generated, and the prior knowledge is combined to obtain the corresponding reconstructed high-resolution image. However, the image prior knowledge obtainable by reconstruction-based methods is limited, so more high-frequency detail cannot be recovered for images with complex content.
Learning-based methods guide the reconstruction of the high-resolution image by learning the mapping between high and low resolution and exploiting the learned image priors. They mainly comprise neighbor-embedding methods, sparse-representation methods, and deep-learning-based methods. Yang et al. proposed a super-resolution reconstruction method based on sparse representation and dictionary learning, which reconstructs the image from learned over-complete dictionary pairs of LR image blocks and their corresponding HR image blocks. However, the learning requirements on the high/low-resolution over-complete dictionary pairs are demanding, and the practicality of the reconstructed images is poor. Timofte et al. combined neighbor embedding with sparse dictionaries and proposed the anchored neighborhood regression (ANR) algorithm and its improved version (A+); although these improve computational efficiency during reconstruction, the recovery of high-frequency detail remains poor and the reconstructed high-resolution image is not much improved. In recent years, with the vigorous development of neural networks, the ability of convolutional neural networks to efficiently extract feature information has been widely applied to image super-resolution reconstruction. In 2014, Dong et al. first adopted a three-layer convolutional neural network (SRCNN), consisting of an input layer, a feature-extraction layer, and a reconstruction layer, to extract the high-frequency features of an image. The network is simple in structure, shallow, and easy to implement, and its results surpass the traditional interpolation-based and reconstruction-based methods.
However, owing to its small number of layers, small receptive field, poor generalization ability, and the limited amount of high-frequency information it can extract, SRCNN does not capture the deep high-frequency features of the image, and its reconstruction quality is mediocre.
It is known that as the number of network layers increases, deeper feature information of the image can be extracted. The 20-layer convolutional VDSR network model extracts image features by increasing the network depth and enlarging the receptive field. However, as depth grows, phenomena such as vanishing and exploding gradients appear, making the network difficult to train and slow to converge. Inspired by the residual network structure, introducing residual structures while deepening the network alleviates this problem. Yet simply stacking more layers increases the computational burden, and the network still converges with difficulty. Meanwhile, a single-scale residual structure cannot extract features at different scales, and such networks treat the feature information of every channel equally, handling channels rich in high-frequency information the same as those dominated by low-frequency information; this wastes limited network computing resources and loses high-frequency information.
Disclosure of Invention
The invention aims to provide an image super-resolution reconstruction method using a multi-scale residual network based on a channel attention mechanism, so as to solve the problems in the prior art.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a multi-scale residual attention network image super-resolution reconstruction method based on attention comprises the following steps:
selecting a public image data set as an image set to be tested, dividing the image set to be tested into an image training set and an image testing set according to a certain proportion, and performing image preprocessing operation;
designing a multi-scale residual structural unit module, introducing a channel attention mechanism, and building a channel-attention-based multi-scale residual attention neural network model;
inputting the preprocessed image training set into the channel-attention-based multi-scale residual attention neural network model for model training, to obtain the trained model;
and inputting the preprocessed image test set into the trained channel-attention-based multi-scale residual attention neural network model for testing, to obtain the final reconstructed high-resolution image.
Further, the method for selecting the public image data set as the image set to be tested, dividing the image set to be tested into an image training set and an image testing set according to a certain proportion, and performing image preprocessing operation comprises the following steps:
adopting the DIV2K data set as the experimental image set, randomly selecting N images from the high-resolution images as the experimental training set, and taking the remaining M images as the experimental test set; down-sampling the original high-resolution images of the experimental training and test sets by bicubic interpolation with down-sampling factor k to obtain the corresponding LR experimental training and test sets, where k = 2, 3, 4, indicating that the images are reduced by factors of 2, 3, and 4;
cutting the LR experimental training set into blocks of size I_LR × I_LR and the high-resolution images corresponding to the LR experimental training set into blocks of size I_HR × I_HR, where the cut LR and HR sizes satisfy I_HR × I_HR = kI_LR × kI_LR; the actual image tensor sizes of the LR and HR images are H × W × C and kH × kW × C, respectively;
and taking the cut LR experimental training set as the input of the training network and the cut HR image-block data set as the data labels of the training network.
Further, the method for designing the multi-scale residual structural unit module, introducing a channel attention mechanism, and building the channel-attention-based multi-scale residual attention neural network model comprises:
constructing a multi-scale residual structural unit module from residual structures with convolution kernel sizes of 3 × 3 and 5 × 5, and introducing an attention mechanism at the output of the multi-scale residual structural unit module;
and constructing the channel-attention-based multi-scale residual attention network from several convolution layers, the multi-scale residual structural unit modules, and sub-pixel convolution, and optimizing the network with a least absolute deviation loss function.
Further, the attention mechanism consists of three processes: squeeze, excitation, and matrix product, wherein:
the extrusion process comprises the following steps: performing global average pooling on input image features with tensor H W C, so that the tensor size of the input image features is 1W 1C, wherein a squeezing function corresponding to the global average pooling is as follows:
where H × W denotes the tensor size, fsq(uc) The function represents the global average pooling operation, uc(i, j) denotes the c-th feature ucValue at (i, j), ucRepresenting the original input tensor of the channel attention block;
the excitation process is as follows: adaptively calibrating the weight of each channel by using an excitation function, wherein the excitation function is as follows:
ec=fex(z,W)=σ(g(z,W))=σ(W2δ(W1z)
wherein the delta function represents a ReLU activation function, sigma is a sigmoid activation function, W1、W2Are respectively expressed asAnd
the matrix multiplication process is as follows: the tensor with the weight score of 1 × 1 × C, which is subjected to the squeezing and exciting part, is multiplied by the original input tensor, and is expressed as:
Xc=f(uc,ec)=ecuc
wherein, XcRepresenting the output of the entire channel attention Block, ecRepresenting the output tensor, u, of the excited partcRepresenting the original input tensor of the channel attention block.
Further, the least absolute deviation loss function is:

L_LAD = (1/N) Σ_{i=1}^{N} ‖F(I_i^LR) − I_i^HR‖_1

where L_LAD denotes the least absolute deviation loss, i indexes the sample blocks in the training set, F(I_i^LR) denotes the high-resolution image block reconstructed from the i-th LR block, N denotes the total number of samples in the training set, and I_i^HR denotes the original true high-resolution image block.
Compared with the prior art, the technical scheme provided by the invention has the following beneficial effects or advantages:
according to the attention-based multi-scale residual attention network image super-resolution reconstruction method, a single-scale residual structure is designed to be a residual structure with different scales, so that image characteristic information under different scales is extracted, image characteristic information under different scales is fused, a channel attention mechanism is introduced to the tail end of a basic unit to be constructed, the basic unit is more focused on extracting high-frequency information, meanwhile, the channel attention mechanism is introduced, important characteristic diagram information in a channel can be better highlighted, the important information in the image can be better extracted, and reconstruction errors are reduced.
Specific embodiments of the present invention are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not so limited in scope. The embodiments of the invention include many variations, modifications and equivalents within the spirit and scope of the appended claims.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments, in combination with or instead of the features of the other embodiments.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, integers, steps or components but does not preclude the presence or addition of one or more other features, integers, steps or components.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a flowchart of the attention-based multi-scale residual attention network image super-resolution reconstruction method according to an embodiment of the present invention;
FIG. 2 is a block diagram of the multi-scale residual structural unit module in an embodiment of the present invention;
FIG. 3 is a block diagram of the network structure of the attention mechanism in an embodiment of the present invention;
fig. 4 is a structural block diagram of the channel-attention-based multi-scale residual attention neural network model built in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Examples
As shown in fig. 1, an embodiment of the present invention provides a super-resolution reconstruction method for a multi-scale residual attention network image based on attention, which includes the following steps:
step S1: selecting a public image data set as an image set to be tested, dividing the image set to be tested into an image training set and an image testing set according to a certain proportion, and performing image preprocessing operation.
In a specific implementation process, a public image data set is selected as an image set to be tested, the image set to be tested is divided into an image training set and an image testing set according to a certain proportion, and the method for performing image preprocessing operation comprises the following steps:
firstly, a DIV2K data set is used as an image set of an experiment, N pictures are randomly selected from a plurality of high-resolution images to be used as an experiment training set, and M pictures are left to be used as an experiment testing set. For example, 900 pictures are randomly selected from 1000 high-resolution images to be used as an experimental training set, and the rest 100 pictures are used as an experimental testing set.
Then, the original high-resolution images of the experimental training and test sets are down-sampled by bicubic interpolation with down-sampling factor k to obtain the corresponding LR experimental training and test sets, where k = 2, 3, 4, indicating that the images are reduced by factors of 2, 3, and 4.
Then, the LR experimental training set is cut into blocks of size I_LR × I_LR, and the high-resolution images corresponding to the LR experimental training set are cut into blocks of size I_HR × I_HR, where the cut LR and HR sizes satisfy I_HR × I_HR = kI_LR × kI_LR; the actual image tensor sizes of the LR and HR images are H × W × C and kH × kW × C, respectively.
Finally, the cut LR experimental training set is taken as the input of the training network and the cut HR image-block data set as its data labels. The training set after image preprocessing is represented as {(I_i^LR, I_i^HR)}, i = 1, …, N.
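The cutting step above produces aligned LR/HR patch pairs satisfying I_HR = k·I_LR. A minimal sketch, assuming H × W × C numpy arrays and a random-crop strategy; the function name, patch size, and crop policy are illustrative, not the patent's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def crop_pair(lr, hr, i_lr, k):
    """Cut one aligned patch pair: an I_LR x I_LR block from the LR image and
    the corresponding I_HR x I_HR (= k*I_LR x k*I_LR) block from the HR image."""
    h, w, _ = lr.shape
    y = int(rng.integers(0, h - i_lr + 1))   # top-left corner in LR coordinates
    x = int(rng.integers(0, w - i_lr + 1))
    lr_patch = lr[y:y + i_lr, x:x + i_lr]
    # The HR crop is the same region scaled by k, keeping the pair registered.
    hr_patch = hr[k * y:k * (y + i_lr), k * x:k * (x + i_lr)]
    return lr_patch, hr_patch

# Example: down-sampling factor k = 2, 48x48 LR patches
lr = rng.random((100, 120, 3))    # stand-in LR image, H x W x C
hr = rng.random((200, 240, 3))    # stand-in HR image, kH x kW x C
lr_p, hr_p = crop_pair(lr, hr, i_lr=48, k=2)
```

The HR slice indices are simply the LR indices multiplied by k, which is exactly the I_HR × I_HR = kI_LR × kI_LR relation from the text.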
After the image preprocessing is completed, step S2 is executed: designing the multi-scale residual structural unit module, introducing the channel attention mechanism, and building the channel-attention-based multi-scale residual attention neural network model.
In a specific implementation, the method of designing the multi-scale residual structural unit module and introducing the channel attention mechanism to build the channel-attention-based multi-scale residual attention neural network model comprises the following steps:
first, a multi-scale residual structure unit module is constructed with residual structures of convolution sizes of 3 × 3 and 5 × 5, as shown in fig. 2, and a mechanism of attention is introduced at an output portion of the multi-scale residual structure unit module. Specifically, the attention mechanism in the embodiment of the present invention is specifically an SE module (squeeze-and-excitation blocks), and the SE module specifically includes three processes of squeezing, exciting, and matrix product, as shown in fig. 3, where:
the extrusion process comprises the following steps: performing global average pooling on input image features with tensor H W C, so that the tensor size of the input image features is 1W 1C, wherein a squeezing function corresponding to the global average pooling is as follows:
where H × W denotes the tensor size, fsq(uc) The function represents the global average pooling operation, uc(i, j) denotes the c-th feature ucValue at (i, j), ucRepresenting the original input tensor of the channel attention block.
The squeeze function above performs a global averaging operation on the input tensor: it sums all feature values of each channel and then takes the average.
The squeeze process prepares for the subsequent excitation process. When a convolution kernel operates on a feature map, it convolves the original feature map through a local receptive field, so feature-map information outside that receptive field cannot be used in the convolution, and the global information of the feature map is not fully exploited. This problem is especially pronounced in the early stages of the network. The squeeze operation of the SE structure solves it well: by globally pooling each channel's feature map, the information from all positions of that map is fused, avoiding the situation in which, because convolution only sees a local receptive field, the extracted information cannot represent the whole channel's feature map during channel-weight evaluation, leaving the reference information insufficient and the evaluation inaccurate.
To make full use of the information after global average pooling, excitation is required. The excitation process in the embodiment of the present invention is as follows: the weight of each channel is adaptively calibrated with an excitation function:

e_c = f_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z))

where δ denotes the ReLU activation function, σ the sigmoid activation function, and W_1 and W_2 the weights of the two fully-connected layers (the dimension-reducing and dimension-restoring layers, respectively).
taking the feature graph after the full-play average pooling as the input of an excitation part, achieving the purpose of reducing the dimension through a full-connection layer and a ReLU activation function, achieving the original dimension through the full-connection layer and a sigmoid activation function, and simultaneously evaluating the weight score of the channel, wherein the weight score of each channel is in the range of [0,1], and the more the channel weight score is close to 1, the more important the channel information is represented.
After excitation, the 1 × 1 × C tensor of weight scores produced by the squeeze and excitation parts is multiplied with the original input tensor, i.e., the matrix product process, expressed as:

X_c = f(u_c, e_c) = e_c · u_c

where X_c denotes the output of the whole channel attention block, e_c the output tensor of the excitation part, and u_c the original input tensor of the channel attention block.
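The squeeze-excite-scale pipeline above can be sketched in a few lines of numpy. This is a minimal illustration of the forward pass only, assuming given fully-connected weights `w1` and `w2` with a reduction ratio `r` (the ratio and weight shapes are standard SE-style assumptions, not taken from the patent):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(u, w1, w2):
    """SE-style channel attention for one feature tensor u of shape (C, H, W).
    w1: (C//r, C) dimension-reducing FC weights; w2: (C, C//r) restoring FC."""
    z = u.mean(axis=(1, 2))                    # squeeze: z_c = average over H x W
    e = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # excite: sigma(W2 * ReLU(W1 * z))
    return e[:, None, None] * u                # matrix product: X_c = e_c * u_c

C, H, W, r = 8, 4, 4, 2
rng = np.random.default_rng(1)
u = rng.random((C, H, W))                      # non-negative stand-in features
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
x = channel_attention(u, w1, w2)
```

Because each e_c lies in (0, 1), every channel of the output is the input channel scaled down by its learned importance, which is exactly the recalibration the text describes.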
After the channel-attention-based multi-scale residual attention neural network model is built, a least absolute deviation loss function is adopted to optimize the network. The least absolute deviation loss function in the embodiment of the present invention is:

L_LAD = (1/N) Σ_{i=1}^{N} ‖F(I_i^LR) − I_i^HR‖_1

where L_LAD denotes the least absolute deviation loss, i indexes the sample blocks in the training set, F(I_i^LR) denotes the high-resolution image block reconstructed from the i-th LR block, N denotes the total number of samples in the training set, and I_i^HR denotes the original true high-resolution image block.
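The loss above is the per-block 1-norm of the reconstruction error, averaged over the N training samples. A direct numpy transcription (the function name is ours):

```python
import numpy as np

def lad_loss(reconstructed, target):
    """Least absolute deviation loss: the 1-norm of the difference of each
    reconstructed/true block pair, averaged over the N sample blocks."""
    return float(np.mean([np.sum(np.abs(r - t))
                          for r, t in zip(reconstructed, target)]))

# Two toy "image blocks": the 1-norms are 3.0 and 2.0, so the loss is 2.5
loss = lad_loss([np.array([1.0, 2.0]), np.array([0.0, 0.0])],
                [np.array([0.0, 0.0]), np.array([1.0, 1.0])])
```

Compared with a squared-error (L2) loss, this L1 form penalizes large residuals less aggressively, which is commonly credited with producing sharper super-resolved images.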
The constructed channel-attention-based multi-scale residual attention neural network model is shown in fig. 4 and comprises an image feature information extraction part, a convergence layer part, and a reconstruction part.
The image feature information extraction part: to keep the input and output sizes of every layer in the network equal, padding is used; the multi-scale residual structural unit consists of residual structures built from 3 × 3 and 5 × 5 convolutions, the corresponding 3 × 3 and 5 × 5 convolution layers, and activation layers.
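A minimal PyTorch sketch of one possible multi-scale residual unit matching this description (parallel 3 × 3 and 5 × 5 branches with size-preserving padding, activation layers, and a residual skip). The channel count and the 1 × 1 fusion of the two branches are our assumptions, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

class MultiScaleResidualUnit(nn.Module):
    """Two parallel convolution branches (3x3 and 5x5, padded so the spatial
    size is preserved), 1x1 fusion of their features, and a residual skip."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f3 = self.act(self.branch3(x))            # 3x3-scale features
        f5 = self.act(self.branch5(x))            # 5x5-scale features
        fused = self.fuse(torch.cat([f3, f5], dim=1))
        return x + fused                          # residual skip keeps sizes equal

x = torch.randn(1, 64, 16, 16)
y = MultiScaleResidualUnit(64)(x)
```

In the full model the channel attention block from fig. 3 would be applied to `fused` before the skip connection; it is omitted here to keep the sketch focused on the multi-scale structure.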
The convergence layer part: the feature information extracted by each multi-scale residual attention structural unit is fused through a convergence layer, which performs feature fusion with 1 × 1 convolutions.
The reconstruction part: to magnify the image by a factor of k, a sub-pixel convolution layer is used to up-sample the output of the feature extraction.
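Sub-pixel convolution ends with a pixel-shuffle rearrangement: a convolution first produces C·k² feature channels at LR resolution, then each group of k² channels is folded into the k × k sub-pixel grid of one output channel. A numpy sketch of just the rearrangement, following the channel ordering used by common deep-learning frameworks (an assumption here):

```python
import numpy as np

def pixel_shuffle(x, k):
    """Rearrange a (C*k*k, H, W) tensor into (C, k*H, k*W): each group of
    k*k channels supplies the k x k sub-pixel grid of one output channel."""
    ck2, h, w = x.shape
    c = ck2 // (k * k)
    x = x.reshape(c, k, k, h, w)          # split channels into (C, k, k)
    x = x.transpose(0, 3, 1, 4, 2)        # interleave: (C, H, k, W, k)
    return x.reshape(c, h * k, w * k)

# Four 1x1 channels become one 2x2 output plane
out = pixel_shuffle(np.arange(4).reshape(4, 1, 1), k=2)
```

This is the operation PyTorch exposes as `nn.PixelShuffle`; no pixels are computed here, only rearranged, which is why sub-pixel convolution up-samples cheaply at the very end of the network.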
After the channel-attention-based multi-scale residual attention neural network model is built, step S3 is executed: the preprocessed image training set is input into the model for training, yielding the trained channel-attention-based multi-scale residual attention neural network model.
After training is completed, step S4 is executed: the preprocessed image test set is input into the trained channel-attention-based multi-scale residual attention neural network model for testing, yielding the final reconstructed high-resolution image.
According to the attention-based multi-scale residual attention network image super-resolution reconstruction method provided by the embodiment of the invention, a single residual structure unit is designed to contain residual branches of different scales, so that image feature information at different scales is extracted and then fused. A channel attention mechanism is introduced at the end of each basic unit, so that the basic unit focuses more on extracting high-frequency information. The channel attention mechanism also better highlights the important feature map information within the channels, so that the important information in the image is better extracted and the reconstruction error is reduced.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (5)
1. A multi-scale residual attention network image super-resolution reconstruction method based on attention is characterized by comprising the following steps:
selecting a public image data set as an image set to be tested, dividing the image set to be tested into an image training set and an image testing set according to a certain proportion, and performing image preprocessing operation;
designing a multi-scale residual structure unit module, introducing a channel attention mechanism, and building a multi-scale residual attention neural network model based on channel attention;
inputting the preprocessed image training set into the channel attention-based multi-scale residual attention neural network model for model training to obtain a trained channel attention-based multi-scale residual attention neural network model;
and inputting the preprocessed image test set into the trained channel attention-based multi-scale residual attention neural network model for testing to obtain a finally reconstructed high-resolution image.
2. The attention-based multi-scale residual attention network image super-resolution reconstruction method of claim 1, wherein selecting a public image data set as the image set to be tested, dividing the image set to be tested into an image training set and an image test set according to a certain proportion, and performing the image preprocessing operation comprises the following steps:
adopting the DIV2K data set as the image set of the experiment, randomly selecting N images from its high-resolution images as the experimental training set, and taking the remaining M images as the experimental test set; respectively down-sampling the original high-resolution images of the experimental training set and the test set by a bicubic interpolation method with down-sampling factor k to obtain the corresponding LR experimental training set and LR experimental test set, wherein k = 2, 3, 4, indicating that the images are reduced by factors of 2, 3 and 4 respectively;
cutting the LR experimental training set into image blocks of size I_LR × I_LR, and cutting the high-resolution images corresponding to the LR experimental training set into image blocks of size I_HR × I_HR, wherein the sizes of the cut LR and HR image blocks satisfy the relation I_HR × I_HR = kI_LR × kI_LR, and the actual image tensor sizes of the LR and HR images are H × W × C and kH × kW × C, respectively;
and taking the LR image block data set obtained by cutting as the input of the training network, and taking the HR image block data set obtained by cutting as the labels of the training network.
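The block-cutting step of this claim can be sketched as follows (illustrative helper names, grayscale images as nested lists): aligned LR/HR block pairs are produced that satisfy the k-times size relation above.

```python
def crop_pairs(lr, hr, size_lr, k):
    """Cut an LR image and its HR counterpart into aligned block pairs.

    LR blocks are size_lr x size_lr; the matching HR blocks are
    (k * size_lr) x (k * size_lr), taken from the corresponding region.
    """
    size_hr = k * size_lr
    pairs = []
    for y in range(0, len(lr) - size_lr + 1, size_lr):
        for x in range(0, len(lr[0]) - size_lr + 1, size_lr):
            lr_blk = [row[x:x + size_lr] for row in lr[y:y + size_lr]]
            hr_blk = [row[k * x:k * x + size_hr]
                      for row in hr[k * y:k * y + size_hr]]
            pairs.append((lr_blk, hr_blk))
    return pairs
```

The LR blocks then serve as network inputs and the HR blocks as training labels, matching the input/label split described in the claim.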
3. The attention-based multi-scale residual attention network image super-resolution reconstruction method according to claim 2, wherein designing the multi-scale residual structure unit module and introducing the channel attention mechanism to build the channel attention-based multi-scale residual attention neural network model comprises the following steps:
constructing the multi-scale residual structure unit module from residual branches with convolution kernel sizes of 3 × 3 and 5 × 5, and introducing an attention mechanism at the output of the multi-scale residual structure unit module;
and constructing a multi-scale residual attention network based on the channel attention mechanism from a plurality of convolution layers, the multi-scale residual structure unit modules and sub-pixel convolution, and optimizing the multi-scale residual attention network based on the channel attention mechanism by adopting a least absolute deviation loss function.
4. The attention-based multi-scale residual attention network image super-resolution reconstruction method according to claim 3, wherein the attention mechanism consists of three processes: squeezing, excitation and matrix multiplication, wherein,
the squeezing process comprises the following steps: performing global average pooling on input image features with tensor size H × W × C, so that the resulting tensor size is 1 × 1 × C, wherein the squeezing function corresponding to the global average pooling is:

$$z_c = f_{sq}(u_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} u_c(i, j)$$

wherein H × W denotes the spatial size of the tensor, $f_{sq}(u_c)$ denotes the global average pooling operation, $u_c(i, j)$ denotes the value of the c-th feature map $u_c$ at position (i, j), and $u_c$ denotes the original input tensor of the channel attention block;
the excitation process comprises the following steps: adaptively calibrating the weight of each channel by using an excitation function, wherein the excitation function is:

$$e_c = f_{ex}(z, W) = \sigma(g(z, W)) = \sigma\left(W_2\,\delta(W_1 z)\right)$$

wherein δ denotes the ReLU activation function, σ denotes the sigmoid activation function, and $W_1$ and $W_2$ denote the weight matrices of the two fully connected layers of the excitation step, respectively;
the matrix multiplication process is as follows: the tensor with the weight score of 1 × 1 × C, which is subjected to the squeezing and exciting part, is multiplied by the original input tensor, and is expressed as:
Xc=f(uc,ec)=ecuc
wherein, XcRepresenting the output of the entire channel attention Block, ecRepresenting the output tensor, u, of the excited partcRepresenting the original input tensor of the channel attention block.
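The three processes of this claim (squeeze, excitation, channel-wise product) can be sketched in plain Python as follows; the weight matrices `w1` and `w2` are illustrative stand-ins for the learned fully connected layers:

```python
import math

def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(u, w1, w2):
    """Squeeze -> excitation -> channel-wise product on a [C][H][W] input.

    w1 (hdim x C) and w2 (C x hdim) stand in for the weights of the two
    fully connected layers; a real network learns them during training.
    """
    C, H, W = len(u), len(u[0]), len(u[0][0])
    # squeeze: global average pooling gives one scalar z_c per channel
    z = [sum(sum(row) for row in u[c]) / (H * W) for c in range(C)]
    # excitation: e = sigmoid(W2 . relu(W1 . z))
    hdim = len(w1)
    h = [max(0.0, sum(w1[i][c] * z[c] for c in range(C))) for i in range(hdim)]
    e = [sigmoid(sum(w2[c][i] * h[i] for i in range(hdim))) for c in range(C)]
    # matrix product: X_c = e_c * u_c rescales each channel by its weight
    return [[[e[c] * v for v in row] for row in u[c]] for c in range(C)]
```

Channels with strong pooled responses receive weights near 1 and pass through almost unchanged, while weak channels are suppressed, which is how the block highlights the important feature map information.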
5. The attention-based multi-scale residual attention network image super-resolution reconstruction method of claim 4, wherein the least absolute deviation loss function is:

$$L_{LAD} = \frac{1}{N}\sum_{i=1}^{N}\left\|F\left(I_i^{LR}\right) - I_i^{HR}\right\|_1$$

wherein $L_{LAD}$ represents the least absolute deviation loss function, $i$ represents the $i$-th sample block in the training set, $F(I_i^{LR})$ represents the high-resolution image block reconstructed from the $i$-th LR image, $N$ represents the total number of samples in the training set, and $I_i^{HR}$ represents the original ground-truth high-resolution image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911319741.5A CN110992270A (en) | 2019-12-19 | 2019-12-19 | Multi-scale residual attention network image super-resolution reconstruction method based on attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110992270A (en) | 2020-04-10 |
Family
ID=70065762
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911319741.5A Pending CN110992270A (en) | 2019-12-19 | 2019-12-19 | Multi-scale residual attention network image super-resolution reconstruction method based on attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110992270A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109903228A (en) * | 2019-02-28 | 2019-06-18 | 合肥工业大学 | A kind of image super-resolution rebuilding method based on convolutional neural networks |
Non-Patent Citations (4)
Title |
---|
JUNCHENG LI ET AL: "Multi-scale Residual Network for Image Super-Resolution", ECCV *
YULUN ZHANG: "Image Super-Resolution Using Very Deep Residual Channel Attention Networks", ECCV 2018: Computer Vision *
WANG DONGFEI: "Application of Channel Attention-Based Convolutional Neural Networks in Image Super-Resolution Reconstruction", Radio & TV Broadcast Engineering *
CHEN SHUZHEN ET AL: "Image Super-Resolution Algorithm Using Multi-Scale Convolutional Neural Networks", Journal of Signal Processing *
Cited By (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111738055A (en) * | 2020-04-24 | 2020-10-02 | 浙江大学城市学院 | Multi-class text detection system and bill form detection method based on same |
CN111738055B (en) * | 2020-04-24 | 2023-07-18 | 浙江大学城市学院 | Multi-category text detection system and bill form detection method based on same |
CN111598778A (en) * | 2020-05-13 | 2020-08-28 | 云南电网有限责任公司电力科学研究院 | Insulator image super-resolution reconstruction method |
CN111598778B (en) * | 2020-05-13 | 2023-11-21 | 云南电网有限责任公司电力科学研究院 | Super-resolution reconstruction method for insulator image |
CN111667445A (en) * | 2020-05-29 | 2020-09-15 | 湖北工业大学 | Image compressed sensing reconstruction method based on Attention multi-feature fusion |
CN111783792B (en) * | 2020-05-31 | 2023-11-28 | 浙江大学 | Method for extracting significant texture features of B-ultrasonic image and application thereof |
CN111783792A (en) * | 2020-05-31 | 2020-10-16 | 浙江大学 | Method for extracting significant texture features of B-ultrasonic image and application thereof |
CN111814863A (en) * | 2020-07-03 | 2020-10-23 | 南京信息工程大学 | Detection method for light-weight vehicles and pedestrians |
CN111915487A (en) * | 2020-08-04 | 2020-11-10 | 武汉工程大学 | Face super-resolution method and device based on hierarchical multi-scale residual fusion network |
CN111915487B (en) * | 2020-08-04 | 2022-05-10 | 武汉工程大学 | Face super-resolution method and device based on hierarchical multi-scale residual fusion network |
CN112085756A (en) * | 2020-09-23 | 2020-12-15 | 清华大学苏州汽车研究院(相城) | Road image multi-scale edge detection model and method based on residual error network |
CN112085756B (en) * | 2020-09-23 | 2023-11-07 | 清华大学苏州汽车研究院(相城) | Road image multi-scale edge detection model and method based on residual error network |
CN112215755A (en) * | 2020-10-28 | 2021-01-12 | 南京信息工程大学 | Image super-resolution reconstruction method based on back projection attention network |
CN112381164A (en) * | 2020-11-20 | 2021-02-19 | 北京航空航天大学杭州创新研究院 | Ultrasound image classification method and device based on multi-branch attention mechanism |
CN112347977B (en) * | 2020-11-23 | 2021-07-20 | 深圳大学 | Automatic detection method, storage medium and device for induced pluripotent stem cells |
CN112347977A (en) * | 2020-11-23 | 2021-02-09 | 深圳大学 | Automatic detection method, storage medium and device for induced pluripotent stem cells |
CN112419155A (en) * | 2020-11-26 | 2021-02-26 | 武汉大学 | Super-resolution reconstruction method for fully-polarized synthetic aperture radar image |
CN112419155B (en) * | 2020-11-26 | 2022-04-15 | 武汉大学 | Super-resolution reconstruction method for fully-polarized synthetic aperture radar image |
CN112686297A (en) * | 2020-12-29 | 2021-04-20 | 中国人民解放军海军航空大学 | Radar target motion state classification method and system |
CN112950570A (en) * | 2021-02-25 | 2021-06-11 | 昆明理工大学 | Crack detection method combining deep learning and dense continuous central point |
CN112950570B (en) * | 2021-02-25 | 2022-05-17 | 昆明理工大学 | Crack detection method combining deep learning and dense continuous central point |
CN112862688A (en) * | 2021-03-08 | 2021-05-28 | 西华大学 | Cross-scale attention network-based image super-resolution reconstruction model and method |
CN112862688B (en) * | 2021-03-08 | 2021-11-23 | 西华大学 | Image super-resolution reconstruction system and method based on cross-scale attention network |
CN113095398A (en) * | 2021-04-08 | 2021-07-09 | 西南石油大学 | Fracturing data cleaning method of BP neural network based on genetic algorithm optimization |
CN113095398B (en) * | 2021-04-08 | 2022-07-12 | 西南石油大学 | Fracturing data cleaning method of BP neural network based on genetic algorithm optimization |
CN113066013A (en) * | 2021-05-18 | 2021-07-02 | 广东奥普特科技股份有限公司 | Method, system, device and storage medium for generating visual image enhancement |
CN113379598B (en) * | 2021-05-20 | 2023-07-14 | 山东省科学院自动化研究所 | Terahertz image reconstruction method and system based on residual channel attention network |
CN113379598A (en) * | 2021-05-20 | 2021-09-10 | 山东省科学院自动化研究所 | Terahertz image reconstruction method and system based on residual channel attention network |
CN113298717A (en) * | 2021-06-08 | 2021-08-24 | 浙江工业大学 | Medical image super-resolution reconstruction method based on multi-attention residual error feature fusion |
CN113421187B (en) * | 2021-06-10 | 2023-01-03 | 山东师范大学 | Super-resolution reconstruction method, system, storage medium and equipment |
CN113421187A (en) * | 2021-06-10 | 2021-09-21 | 山东师范大学 | Super-resolution reconstruction method, system, storage medium and equipment |
CN113436155A (en) * | 2021-06-16 | 2021-09-24 | 复旦大学附属华山医院 | Ultrasonic brachial plexus image identification method based on deep learning |
CN113436155B (en) * | 2021-06-16 | 2023-12-19 | 复旦大学附属华山医院 | Deep learning-based ultrasonic brachial plexus image recognition method |
CN113538235A (en) * | 2021-06-30 | 2021-10-22 | 北京百度网讯科技有限公司 | Training method and device of image processing model, electronic equipment and storage medium |
CN113538235B (en) * | 2021-06-30 | 2024-01-09 | 北京百度网讯科技有限公司 | Training method and device for image processing model, electronic equipment and storage medium |
CN114066727A (en) * | 2021-07-28 | 2022-02-18 | 华侨大学 | Multi-stage progressive image super-resolution method |
CN113616209A (en) * | 2021-08-25 | 2021-11-09 | 西南石油大学 | Schizophrenia patient discrimination method based on space-time attention mechanism |
CN113616209B (en) * | 2021-08-25 | 2023-08-04 | 西南石油大学 | Method for screening schizophrenic patients based on space-time attention mechanism |
CN113793267B (en) * | 2021-09-18 | 2023-08-25 | 中国石油大学(华东) | Self-supervision single remote sensing image super-resolution method based on cross-dimension attention mechanism |
CN113793267A (en) * | 2021-09-18 | 2021-12-14 | 中国石油大学(华东) | Self-supervision single remote sensing image super-resolution method based on cross-dimension attention mechanism |
CN113902617A (en) * | 2021-09-27 | 2022-01-07 | 中山大学·深圳 | Super-resolution method, device, equipment and medium based on reference image |
CN114331830A (en) * | 2021-11-04 | 2022-04-12 | 西安理工大学 | Super-resolution reconstruction method based on multi-scale residual attention |
CN114066873A (en) * | 2021-11-24 | 2022-02-18 | 袁兰 | Method and device for detecting osteoporosis by utilizing CT (computed tomography) image |
CN114494022A (en) * | 2022-03-31 | 2022-05-13 | 苏州浪潮智能科技有限公司 | Model training method, super-resolution reconstruction method, device, equipment and medium |
CN114494022B (en) * | 2022-03-31 | 2022-07-29 | 苏州浪潮智能科技有限公司 | Model training method, super-resolution reconstruction method, device, equipment and medium |
CN114821449A (en) * | 2022-06-27 | 2022-07-29 | 松立控股集团股份有限公司 | License plate image processing method based on attention mechanism |
CN115330635B (en) * | 2022-08-25 | 2023-08-15 | 苏州大学 | Image compression artifact removing method, device and storage medium |
CN115330635A (en) * | 2022-08-25 | 2022-11-11 | 苏州大学 | Image compression artifact removing method and device and storage medium |
CN117151990A (en) * | 2023-06-28 | 2023-12-01 | 西南石油大学 | Image defogging method based on self-attention coding and decoding |
CN117151990B (en) * | 2023-06-28 | 2024-03-22 | 西南石油大学 | Image defogging method based on self-attention coding and decoding |
CN118052811A (en) * | 2024-04-10 | 2024-05-17 | 南京航空航天大学 | NAM-DSSD model-based aircraft skin defect detection method |
CN118052811B (en) * | 2024-04-10 | 2024-06-11 | 南京航空航天大学 | NAM-DSSD model-based aircraft skin defect detection method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110992270A (en) | Multi-scale residual attention network image super-resolution reconstruction method based on attention | |
CN111369440B (en) | Model training and image super-resolution processing method, device, terminal and storage medium | |
CN111260586B (en) | Correction method and device for distorted document image | |
CN110059768B (en) | Semantic segmentation method and system for fusion point and region feature for street view understanding | |
CN115482241A (en) | Cross-modal double-branch complementary fusion image segmentation method and device | |
CN111476719B (en) | Image processing method, device, computer equipment and storage medium | |
CN112507997A (en) | Face super-resolution system based on multi-scale convolution and receptive field feature fusion | |
CN112215755B (en) | Image super-resolution reconstruction method based on back projection attention network | |
Zhang et al. | Image super-resolution reconstruction based on sparse representation and deep learning | |
CN112541864A (en) | Image restoration method based on multi-scale generation type confrontation network model | |
CN112258526A (en) | CT (computed tomography) kidney region cascade segmentation method based on dual attention mechanism | |
CN113706388B (en) | Image super-resolution reconstruction method and device | |
CN111768340B (en) | Super-resolution image reconstruction method and system based on dense multipath network | |
CN111914654B (en) | Text layout analysis method, device, equipment and medium | |
CN111402128A (en) | Image super-resolution reconstruction method based on multi-scale pyramid network | |
CN115564649B (en) | Image super-resolution reconstruction method, device and equipment | |
Cao et al. | New architecture of deep recursive convolution networks for super-resolution | |
CN112001928A (en) | Retinal vessel segmentation method and system | |
Li et al. | Deep recursive up-down sampling networks for single image super-resolution | |
Chen et al. | RBPNET: An asymptotic Residual Back-Projection Network for super-resolution of very low-resolution face image | |
CN113689517A (en) | Image texture synthesis method and system of multi-scale channel attention network | |
CN117576402B (en) | Deep learning-based multi-scale aggregation transducer remote sensing image semantic segmentation method | |
CN114998671A (en) | Visual feature learning device based on convolution mask, acquisition device and storage medium | |
CN117593187A (en) | Remote sensing image super-resolution reconstruction method based on meta-learning and transducer | |
CN111428809B (en) | Crowd counting method based on spatial information fusion and convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20200410 |