CN111709895B - Image blind deblurring method and system based on attention mechanism - Google Patents
- Publication number
- CN111709895B (application number CN202010553157A)
- Authority
- CN
- China
- Prior art keywords
- attention
- image
- network
- scale
- loss
- Prior art date
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/73—Deblurring; Sharpening
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides an image blind deblurring method and system based on an attention mechanism, comprising the following steps: a multi-scale attention network directly restores a sharp image in an end-to-end manner; the multi-scale attention network adopts an asymmetric encoder-decoder structure, and the encoding side uses residual dense network blocks to complete the feature extraction and representation of the input image; a plurality of attention modules are arranged on the decoding side, each attention module outputs a preliminary restored image, and the preliminary restored images form an image-pyramid multi-scale structure; each attention module also outputs an attention feature map that models the relationships between distant regions from a global perspective so that blurred images can be processed; a dark channel prior loss and a multi-scale content loss form the loss function, which is used to optimize the network through back-propagation.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to an image blind deblurring method and system based on an attention mechanism.
Background
Dynamic scene blur is a common phenomenon caused during image acquisition by factors such as camera shake, movement of target objects at different depths of field, and camera defocus. Blurred images not only degrade visual quality but also harm subsequent computer vision tasks such as object detection and semantic segmentation; deblurring has therefore long been a fundamental and important problem in image processing.
Typically, a blurred image is modeled mathematically as:

$$I_{Blur}=I_{Sharp}\otimes K+N\qquad(1)$$

where $\otimes$ denotes two-dimensional spatial convolution. The blurred image I_Blur is obtained by convolving the sharp image I_Sharp with the blur kernel K and superimposing noise N. Depending on whether the blur kernel is known, the problem is divided into non-blind and blind deblurring. In addition, depending on whether the blur kernel is the same at every pixel position of the image, blur is classified as uniform (spatially invariant) or non-uniform (spatially variant).
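For illustration only, the degradation model in equation (1) can be simulated with a simple synthetic kernel; the following Python sketch (NumPy/SciPy) is not part of the patent, and the kernel shape and noise level are arbitrary assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

def motion_kernel(length=15):
    """Illustrative horizontal linear-motion blur kernel (not taken from the patent)."""
    k = np.zeros((length, length))
    k[length // 2, :] = 1.0
    return k / k.sum()

def synthesize_blur(sharp, kernel, noise_sigma=0.01):
    """I_Blur = I_Sharp (*) K + N, following equation (1)."""
    blurred = convolve2d(sharp, kernel, mode="same", boundary="symm")
    noise = np.random.normal(0.0, noise_sigma, sharp.shape)
    return np.clip(blurred + noise, 0.0, 1.0)

sharp = np.random.rand(256, 256)   # stand-in for a sharp grayscale image in [0, 1]
blurred = synthesize_blur(sharp, motion_kernel())
```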
Blurred image restoration is the reconstruction of the latent sharp image from the blurred image. In general, blur in a real dynamic scene is unpredictable and spatially varying, so non-uniform blind deblurring is of great research value.

Early researchers, following the mathematical model of equation (1), first estimated the blur kernel K and then estimated the latent sharp image I_Sharp; blind deblurring is thus an ill-posed inverse problem whose solution is not unique. To handle this ill-posedness, different types of blur are modeled under specific assumptions, and various priors on natural images and camera trajectories are introduced to constrain the solution space of the sharp image. In recent years, with the development of deep learning, many neural-network-based methods have been applied to blur kernel estimation and to the deconvolution of sharp images.
However, the quality of the images restored by the above methods depends heavily on the accuracy of the blur kernel estimation, and the iterative solution incurs a large computational overhead. Recent work therefore treats image restoration as an image-to-image translation problem and directly outputs a sharp image in an end-to-end manner, avoiding the complex and time-consuming blur kernel estimation; many improved models have since been proposed by researchers.
However, existing blind deblurring methods based on convolutional neural networks cannot efficiently handle complex dynamic scene blur, and the texture details and edge structures of the output image remain unclear. Multi-scale deep learning models often repeat several sub-network modules and rely on recurrent training; although this improves deblurring to some extent, such models are complex, have large numbers of parameters, are difficult to train and are slow at inference, and the relationship between the parameters at different scales must also be considered. The local connectivity of convolutional layers limits the growth of the network's receptive field, so the relationships between blurred regions cannot be exploited from a global perspective. In addition, intermediate layers treat the different kinds of information in a feature map indiscriminately, which limits the feature representation and learning capability of the network to some extent.
Disclosure of Invention
The invention aims to provide an image blind deblurring method and system based on an attention mechanism, which are used for solving the problem that the existing blind deblurring method based on a convolutional neural network cannot efficiently process complex dynamic scene blurring.
In order to solve the technical problems, the invention provides an attention mechanism-based image blind deblurring method, which comprises the following steps:
a multi-scale attention network directly restores a sharp image in an end-to-end manner;
the multi-scale attention network adopts an asymmetric encoder-decoder structure, and the encoding side uses residual dense network blocks to complete the feature extraction and representation of the input image;
a plurality of attention modules are arranged on the decoding side; each attention module outputs a preliminary restored image, and the preliminary restored images form an image-pyramid multi-scale structure;
each attention module also outputs an attention feature map that models the relationships between distant regions from a global perspective to process blurred images;
a dark channel prior loss and a multi-scale content loss form the loss function, which is used to optimize the network through back-propagation.
Optionally, in the image blind deblurring method based on the attention mechanism, the asymmetric codec structure includes:
the encoding side consists of four convolution modules, and each scale comprises a convolution layer with a stride of 2 and three residual dense network blocks;
the decoding side consists of three attention modules and reconstruction modules, wherein the attention modules and the reconstruction modules complement each other, and adaptively recalibrating the importance of the feature maps benefits the reconstruction of information; each reconstruction module comprises a transposed convolution layer and three bottleneck residual blocks;
each bottleneck residual block consists of two convolution layers, in which the channel dimension is increased through a 1×1 convolution layer, the target output is then obtained through a 3×3 convolution kernel, and a nonlinear activation function follows each convolution layer;
skip connections are added between attention feature maps of the same size on the encoding and decoding sides, and the fusion of low-level and high-level features is completed by addition.
Optionally, in the image blind deblurring method based on the attention mechanism, the residual dense network block includes:
residual learning is completed through an identity mapping between the input image and the output image; in image restoration, where the input and output are highly similar, the identity mapping directly connects the input image to the output image;
the residual dense network block passes the output of the previous block to the input of each convolution layer in the current block, completing continuous feature transmission;
the output of each convolution layer is passed to the input of each subsequent convolution layer in a densely connected manner, by concatenation along the channel dimension;
the concatenated dense features undergo a nonlinear transformation through a convolution layer to fuse the multi-channel features;
residual scaling is used in the local residual learning of the dense network block;
the residual dense network block integrates the residual network block and the dense network block.
Optionally, in the image blind deblurring method based on the attention mechanism, the attention module includes:
the attention module applies a 1×1 convolution (nonlinear transformation) to the input features to generate an attention feature map with 64 channels;
the number of channels is then compressed to 3 by a 5×5 prediction convolution layer, and a sharp image at the current resolution is output;
the output image is up-sampled by nearest-neighbor interpolation, and a global feature map is obtained through a nonlinear transformation;
the global feature map contains shallow features extracted from the preliminary restored image, and these are concatenated along the channel dimension with the deep features obtained by the decoder-side reconstruction module to adjust the importance of features at different spatial positions.
Optionally, in the image blind deblurring method based on the attention mechanism, the loss function includes:
$$L_{total}=\lambda_{1}L_{darkchannel}+\lambda_{2}L_{sub\_content}+\lambda_{3}L_{content}\qquad(2)$$

the dark channel prior loss coefficient λ1 is 25, the content loss coefficient λ2 of the attention modules is 5, and the content loss coefficient λ3 of the output is 10.
Optionally, in the image blind deblurring method based on the attention mechanism, the multi-scale content loss module includes:
in the network optimization process, the mean square error is selected as the content loss function, obtained by computing the squared pixel-wise error between the network output image and the real sharp image:

$$L_{content}=\frac{1}{rWH}\sum_{x,y}\left(F\left(I^{Blur};\theta_{\sigma}\right)_{x,y}-I^{sharp}_{x,y}\right)^{2}$$

where the sum runs over the r images and all W×H pixels, θ_σ denotes the network parameters, I^sharp is the real sharp image, I^Blur is the blurred image, F(I^Blur; θ_σ) is the restored image, and r, W, H are the number of images, the image width and the image height, respectively;
an optimized multi-scale mean square error function is adopted in the training process;
the extraction and conversion of global features in the attention modules are supervised by the mean square error between each decoder-side restored image and the real sharp image at the corresponding scale;
the multi-scale content loss function is obtained by weighting the content loss of each scale, where each weight is given by the ratio of the original image size to the current size.
Optionally, in the image blind deblurring method based on the attention mechanism, the dark channel prior loss module includes:
the dark channel prior loss imposes a sparsity constraint on the dark channel of the sharp image;
for an image I, the dark channel at pixel p is defined as:

$$D(I)(p)=\min_{q\in N(p)}\left(\min_{c\in\{r,g,b\}}I^{c}(q)\right)$$

where p and q are pixel positions, N(p) denotes an image block centered on pixel p, and I^c(q) is the image on the c-th channel, so the dark channel describes the minimum value within the image block;
in deblurring work based on convolutional neural networks, convolution is the basic operation of feature extraction, and the convolution computation on each channel can be expressed as:

$$a_{i,j}=\sum_{m}\sum_{n}w_{m,n}\,x_{i+m,\,j+n}+w_{b}$$

where x_{i,j} denotes the input element in the i-th row and j-th column, w_{m,n} is the weight in the m-th row and n-th column of the convolution kernel, w_b is the bias, and a_{i,j} is the element in the i-th row and j-th column of the feature map obtained on this channel;
the dark channel prior loss module uses the mean square error to compute the Euclidean distance between the dark channels of the pixels in the real sharp image and in the restored image:

$$L_{darkchannel}=\frac{1}{WH}\sum_{p}\left(D\left(I^{sharp}\right)(p)-D\left(F\left(I^{Blur};\theta_{\sigma}\right)\right)(p)\right)^{2}$$
the invention also provides an image blind deblurring system based on an attention mechanism, which comprises:
a multi-scale attention network that directly restores a sharp image in an end-to-end manner;
the multi-scale attention network adopts an asymmetric encoder-decoder structure, and the encoding side uses residual dense network blocks to complete the feature extraction and representation of the input image;
a plurality of attention modules are arranged on the decoding side; each attention module outputs a preliminary restored image, and the preliminary restored images form an image-pyramid multi-scale structure;
each attention module also outputs an attention feature map that models the relationships between distant regions from a global perspective to process blurred images;
a dark channel prior loss and a multi-scale content loss form the loss function, which is used to optimize the network through back-propagation.
In the image blind deblurring method and system based on the attention mechanism, to address the problem that blind deblurring cannot effectively handle complex dynamic scene blur, an image deblurring algorithm based on the attention mechanism is provided, in which a multi-scale attention network restores a sharp image directly, end to end. The network employs an asymmetric encoder-decoder structure. First, the encoding side adopts residual dense network blocks, which improves the network's ability to extract and represent features of the input image. Second, several simple and efficient attention modules are designed on the decoding side; the preliminary restored images output by these modules realize an image-pyramid multi-scale structure, and the output attention feature maps model the relationships between distant regions from a global perspective, so that severely blurred images can be handled better. Finally, a loss function composed of a dark channel prior loss and a multi-scale content loss is proposed. Experimental results show that the images restored on the GOPRO dataset outperform the comparison algorithms in both evaluation indices and visual quality. Compared with the multi-scale recurrent network, the peak signal-to-noise ratio and the structural similarity on the GOPRO test set are improved by 1.97% and 1.16%, respectively, and the restored images have sharper edges and clearer textures. The method effectively improves the restoration of dynamic-scene blurred images, streamlines the training process and shortens the restoration time.
The invention designs an asymmetric encoder-decoder network based on an attention mechanism to realize blind restoration of dynamic-scene blurred images, and achieves the following beneficial effects:

(1) Inspired by the attention mechanism, an attention module is designed that extracts and converts global features and outputs attention feature maps and restored images at different spatial resolutions. The attention module is integrated into the decoding side of the network and optimized through end-to-end training; it effectively captures global context information while realizing a "progressive" multi-scale structure.

(2) The network consists of an asymmetric encoder-decoder structure. To improve feature extraction and representation, the encoding side is composed of residual dense network blocks; on the decoding side, the reconstruction features are up-sampled, the feature maps of the backbone network and of the attention modules are concatenated along the channel dimension, and the features extracted on the decoding side are adaptively recalibrated.

(3) During training, the loss function consists of a dark channel prior loss and a multi-scale mean-square-error loss, guiding the network to generate images with sharp edges and clear textures and further improving network performance.

(4) Test results on the public synthetic dataset show that the algorithm of the invention improves upon recent methods in evaluation indices and restoration time, and the restored images also have clearer edge structures and rich texture details. Blurred images collected in real dynamic scenes are used for experiments to verify the practical value of the algorithm.
Drawings
FIG. 1 is a schematic diagram of a network architecture based on an attention mechanism according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a bottleneck module on a decoding side according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a residual dense network block architecture according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an attention module structure according to an embodiment of the present invention;
FIG. 5 is a schematic view of an attention feature map and of the result of applying it element by element to the corresponding feature map according to an embodiment of the present invention;
FIG. 6 is a schematic view of blurred images and features at different spatial resolutions in accordance with an embodiment of the invention;
FIG. 7 is a graph showing a comparison of the recovery effect of GOPRO test data sets according to an embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating texture detail comparison of a restored image and an SRN restored image according to an embodiment of the present invention;
fig. 9 is a schematic diagram showing a comparison of restoration effects of a true blurred image in accordance with an embodiment of the present invention.
Detailed Description
The image blind deblurring method and the system based on the attention mechanism provided by the invention are further described in detail below with reference to the accompanying drawings and specific embodiments. Advantages and features of the invention will become more apparent from the following description and from the claims. It should be noted that the drawings are in a very simplified form and are all to a non-precise scale, merely for convenience and clarity in aiding in the description of embodiments of the invention.
The invention provides an image blind deblurring method and system based on an attention mechanism, which aims to solve the problem that the existing blind deblurring method based on a convolutional neural network cannot efficiently process complex dynamic scene blurring.
In order to achieve the above idea, the present invention provides an image blind deblurring method and system based on an attention mechanism, wherein the image blind deblurring method based on an attention mechanism includes: a multi-scale attention network directly restores a sharp image in an end-to-end manner; the multi-scale attention network adopts an asymmetric encoder-decoder structure, and the encoding side uses residual dense network blocks to complete the feature extraction and representation of the input image; a plurality of attention modules are arranged on the decoding side, each attention module outputs a preliminary restored image, and the preliminary restored images form an image-pyramid multi-scale structure; each attention module also outputs an attention feature map that models the relationships between distant regions from a global perspective to process blurred images; a dark channel prior loss and a multi-scale content loss form the loss function, which is used to optimize the network through back-propagation.
Example 1
This embodiment provides an image blind deblurring method based on an attention mechanism and an asymmetric encoder-decoder network structure. The encoder-decoder structure is simple and compact and is widely used in various computer vision tasks. Recently, in the field of image restoration, many learning-based end-to-end methods have been developed around this structure: an encoder-decoder network with skip connections has been used for video deblurring, and an encoder-decoder network based on residual blocks, proposed for single-image dynamic scene deblurring, is among the deep-learning deblurring methods with fewer parameters and better performance, adopting a multi-scale recurrent structure to progressively estimate the sharp image from coarse to fine.
The encoding side and the decoding side of the encoder-decoder network have different tasks: the encoding side extracts and down-samples features from the blurred image I_Blur, while the decoding side up-samples the extracted features to reconstruct each pixel value of the predicted sharp image. The present invention therefore designs an asymmetric network structure, called a Multi-Scale Attention Network (MSANet), as shown in FIG. 1.
The encoding side consists of four convolution modules; each scale comprises a convolution layer with a stride of 2 and three residual dense network blocks. The decoding side consists of three spatial attention fusion modules and reconstruction modules; the attention modules and the reconstruction modules complement each other, and adaptively recalibrating the importance of the feature maps benefits the reconstruction of information. Each reconstruction module comprises a transposed convolution layer and three bottleneck residual blocks. As shown in FIG. 2, each bottleneck residual block consists of two convolution layers: the channel dimension is increased through a 1×1 convolution layer, the target output is then obtained through a 3×3 convolution kernel, and a nonlinear activation function follows each convolution layer.
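As an illustration of the bottleneck residual block of FIG. 2, the following sketch is written in PyTorch for brevity (the patent's implementation uses TensorFlow); the channel count, expansion factor and choice of activation are assumptions.

```python
import torch
import torch.nn as nn

class BottleneckResidualBlock(nn.Module):
    """Two convolution layers: a 1x1 conv that changes the channel dimension, a 3x3 conv
    that produces the target output, a nonlinear activation after each conv, and an
    identity shortcut (kernel sizes 1 and 3 as described; other choices are assumed)."""
    def __init__(self, channels=64, expansion=2):
        super().__init__()
        hidden = channels * expansion
        self.conv1 = nn.Conv2d(channels, hidden, kernel_size=1)
        self.conv2 = nn.Conv2d(hidden, channels, kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        out = self.act(self.conv1(x))
        out = self.act(self.conv2(out))
        return x + out                      # residual (identity) connection

x = torch.randn(1, 64, 32, 32)
y = BottleneckResidualBlock()(x)            # output has the same shape as x
```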
Similar to U-Net, skip connections are added between feature maps of the same size on the encoding and decoding sides, and low-level and high-level features are fused by addition, which improves deblurring performance. In addition, normalization operations are omitted from the network to avoid artifacts and checkerboard effects in the restored image.
Physically, dynamic scene blur arises because, during the exposure time, information of the same scene target, smeared over a certain distance, is dispersed over the surrounding pixels, finally forming a blurred image. The blurred image therefore contains the information of the sharp image, and the similarity between the blurred and sharp images is high. The invention uses residual dense network blocks to improve the feature extraction and representation capability of the encoding side, which is close to the input image; the structure is shown in FIG. 3.
First, the identity mapping between the input and the output realizes residual learning; in image restoration, where the input and output are highly similar, the identity mapping connects them directly, reducing training difficulty.

Second, within the residual dense network block, the output of the previous block is passed to the input of every convolution layer in the current block, realizing continuous feature transmission. The output of each convolution layer is also passed to the inputs of the subsequent layers in a densely connected manner; these connections are realized by concatenation along the channel dimension.

Finally, the concatenated dense features undergo a nonlinear transformation through a convolution layer that fuses the multi-channel features. To further improve performance, residual scaling is also used in the local residual learning of the dense network blocks.

The residual dense network block thus integrates the residual network block and the dense network block: the multiple dense connections enable cross-layer reuse of features and improve feature learning and representation capacity, while residual learning reduces training difficulty and avoids problems such as vanishing gradients in deep networks.
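The residual dense block of FIG. 3 can be sketched as follows (PyTorch, illustrative only); the growth rate and residual-scaling factor are assumed, while the four 3×3 convolutions with Leaky ReLU and the 1×1 fusion convolution follow the implementation details given later in this description.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Four 3x3 conv + LeakyReLU layers with dense (concatenated) connections,
    a 1x1 conv that fuses the multi-channel features, residual scaling, and an
    identity shortcut. Growth rate and scaling factor are assumed values."""
    def __init__(self, channels=64, growth=32, num_layers=4, res_scale=0.2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels + i * growth, growth, kernel_size=3, padding=1),
                nn.LeakyReLU(0.2, inplace=True))
            for i in range(num_layers))
        self.fuse = nn.Conv2d(channels + num_layers * growth, channels, kernel_size=1)
        self.res_scale = res_scale

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))   # dense connections
        fused = self.fuse(torch.cat(feats, dim=1))          # 1x1 fusion conv
        return x + self.res_scale * fused                   # residual scaling + shortcut

y = ResidualDenseBlock()(torch.randn(1, 64, 64, 64))        # same shape as the input
```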
The importance of the feature information of a blurred image captured in a dynamic scene differs across spatial positions, for example between the background (sky, buildings, etc.) and foreground objects (pedestrians, vehicles). Existing deblurring convolutional neural networks enlarge the receptive field only locally by using large convolution kernels and deepening the network, and cannot exploit the relationships between blurred regions from a global perspective. The attention mechanism, a recent trend in network design, mimics the way human attention allocates computing resources to important features and has performed excellently in many natural language processing and computer vision tasks; in super-resolution work in particular, introducing different attention modules effectively enhances high-frequency features and suppresses noise in smooth regions, achieving better results than before.
Therefore, the encoder-decoder network integrates spatial attention modules on the decoding side, which estimate sharp images and extract global context features at different scales. The module structure is shown in FIG. 4.

First, the attention module applies a 1×1 convolution (nonlinear transformation) to the input features to generate a feature map with 64 channels. The number of channels is then compressed to 3 by a 5×5 prediction convolution layer, and a sharp image at the current resolution is output. Feature extraction and output of the restored image at this scale are thus realized from a global perspective.

Second, the output image is up-sampled by nearest-neighbor interpolation, and a global feature map is obtained through a nonlinear transformation. These are shallow features extracted from the restored image; they are concatenated along the channel dimension with the deep features obtained by the decoder-side reconstruction module, adjusting the importance of features at different spatial positions.
Usually, an attention feature map is applied to the original feature map by element-wise multiplication or addition. Deblurring, however, differs from high-level computer vision tasks: more of the information in the image must be recovered, and element-wise adjustment can destroy useful information in the original feature map and cause information loss.

As can be seen from FIG. 5(b), abnormal pixel values appear in low-frequency smooth areas (e.g., clothes) and in high-frequency details (e.g., ground texture), and the restored image quality is poor. The invention therefore concatenates the up-sampled feature map on the decoding side and the feature map output by the attention module along the channel dimension, fusing the high-level semantic features of the backbone network with the shallow image context information output by the attention module. More information is preserved, and the importance of the spatial positions of the feature map is adjusted.
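The attention module of FIG. 4 and the concatenation described above can be sketched as follows (PyTorch, illustrative only); the activations and the layer that produces the global feature map are assumptions, while the 1×1 attention convolution, the 5×5 prediction convolution, the nearest-neighbor up-sampling and the channel-wise concatenation follow the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionModule(nn.Module):
    """Decoder-side attention module sketch: attention feature map (64 ch), restored
    image at the current scale (3 ch), nearest-neighbor up-sampling, shallow global
    feature map, and channel-wise concatenation with the decoder features."""
    def __init__(self, in_channels=64):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(in_channels, 64, kernel_size=1),
                                  nn.LeakyReLU(0.2, inplace=True))
        self.predict = nn.Conv2d(64, 3, kernel_size=5, padding=2)
        self.global_feat = nn.Sequential(nn.Conv2d(3, 64, kernel_size=3, padding=1),
                                         nn.LeakyReLU(0.2, inplace=True))

    def forward(self, feat, decoder_feat):
        attn_map = self.attn(feat)                                    # attention feature map
        restored = self.predict(attn_map)                             # preliminary restored image
        up = F.interpolate(restored, scale_factor=2, mode="nearest")  # nearest-neighbor up-sampling
        shallow = self.global_feat(up)                                # global (shallow) features
        return torch.cat([shallow, decoder_feat], dim=1), restored    # concatenation, not element-wise

feat = torch.randn(1, 64, 32, 32)               # attention-module input at the current scale
dec = torch.randn(1, 64, 64, 64)                # reconstruction-module features one scale up
fused, restored = AttentionModule()(feat, dec)  # fused: 128 channels; restored: 3 x 32 x 32
```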
To deal with different degrees of blur, recent deblurring networks focus on acquiring rich context information through large receptive fields and on progressively estimating the latent sharp image with a multi-scale strategy, so that the network has different receptive fields. However, simply stacking identical modules brings limited performance improvement, while the relationship between model parameters and scales must also be considered.

As shown in FIG. 6, (1) is the original image, (2) is the image down-sampled by a factor of 2, and (3) is the image down-sampled by a factor of 4. For an image containing blurred edges, the blur visually decreases as the scale shrinks and the corresponding features change, whereas sharp edges hardly change during down-sampling. Therefore, in a multi-scale network structure, sharing the parameters of the encoding-side feature extraction modules cannot distinguish sharp and blurred features well in images of different resolutions, whereas the module parameters at different resolutions can be shared when reconstructing sharp images on the decoding side.

For this reason, the first convolution layer in the attention module of the present invention has the same number of output channels, 64, as the number of feature channels before the final image is produced by the last layer of the network. Because the prediction convolution layer in the module and the last layer of the network both reconstruct a sharp image, their parameters are shared across the whole decoding side. This significantly reduces the number of trainable parameters and the complexity of the model; at the same time, since the objective being solved is consistent, the weights of the shared convolution layers transfer useful information across scales, and the recurrent training is equivalent to data augmentation over scales, helping to recover sharp images.
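The parameter sharing described above can be sketched as follows (PyTorch, wiring is illustrative only): reusing the same 5×5 prediction convolution instance in every attention module and as the final output layer means that all scales train and use the same weights.

```python
import torch.nn as nn

# One shared 64 -> 3 prediction convolution; every module that holds this instance
# uses (and trains) the same weights.
shared_predict = nn.Conv2d(64, 3, kernel_size=5, padding=2)

class ScaleHead(nn.Module):
    """Per-scale head that maps 64-channel decoder features to an RGB image
    through the shared prediction layer."""
    def __init__(self, predict):
        super().__init__()
        self.predict = predict             # stored by reference, not copied

    def forward(self, feat):
        return self.predict(feat)

heads = [ScaleHead(shared_predict) for _ in range(3)]    # three decoder scales
final_output_layer = shared_predict                      # last layer of the network
assert heads[0].predict.weight is heads[1].predict.weight
```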
In order to optimize the network model, the loss function used by the present invention consists of a content loss and a dark channel prior loss:

$$L_{total}=\lambda_{1}L_{darkchannel}+\lambda_{2}L_{sub\_content}+\lambda_{3}L_{content}\qquad(2)$$

In the experiments, the dark channel prior loss coefficient λ1 is 25, the content loss coefficient λ2 of the attention modules is 5, and the content loss coefficient λ3 of the output is 10.
In general, the mean square error is selected as the content loss function during network optimization; it is obtained by computing the squared pixel-wise error between the network output image and the real sharp image:

$$L_{content}=\frac{1}{rWH}\sum_{x,y}\left(F\left(I^{Blur};\theta_{\sigma}\right)_{x,y}-I^{sharp}_{x,y}\right)^{2}$$

where the sum runs over the r images and all W×H pixels, θ_σ denotes the network parameters, I^sharp is the real sharp image, I^Blur is the blurred image, F(I^Blur; θ_σ) is the restored image, and r, W, H are the number of images, the image width and the image height, respectively.
The restored image is the final output of the network, but a multi-scale network with only a single loss function cannot evaluate the reliability of the intermediate restoration results. The invention therefore adopts an optimized multi-scale mean-square-error function during training: the extraction and conversion of global features in the attention modules are supervised by the mean square error between each decoder-side restored image and the real sharp image at the corresponding scale.

The content losses of the scales are weighted to obtain the multi-scale content loss function; each weight is given by the ratio of the original image size to the current size, so lower resolutions, which contain fewer pixels, receive larger weights.
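A sketch of the multi-scale content loss described above (PyTorch, illustrative only); it assumes three preliminary outputs and reads "the ratio of the original image to the current size" as the ratio of the original height to the current height.

```python
import torch
import torch.nn.functional as F

def multi_scale_content_loss(preliminary, final_restored, sharp):
    """Returns (L_content, L_sub_content): the full-resolution MSE plus the weighted sum
    of per-scale MSEs between each preliminary restored image and the sharp image
    down-sampled to the same size, with weight = original size / current size."""
    l_content = F.mse_loss(final_restored, sharp)
    full_h = sharp.shape[-2]
    l_sub = sharp.new_zeros(())
    for img in preliminary:                               # e.g. 32x32, 64x64, 128x128 outputs
        target = F.interpolate(sharp, size=img.shape[-2:], mode="bilinear",
                               align_corners=False)
        l_sub = l_sub + (full_h / img.shape[-2]) * F.mse_loss(img, target)
    return l_content, l_sub

# Combined as in equation (2): L_total = 25 * L_darkchannel + 5 * L_sub_content + 10 * L_content
```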
Although a pixel-level error function performs well on the objective evaluation indices mean square error (MSE) and peak signal-to-noise ratio (PSNR), the resulting images tend to be overly smooth, and blurred edges remain insufficiently sharp after restoration. To further improve restored image quality, prior work has shown, in theory and in practice, that the dark channel of a sharp image is sparse and can effectively distinguish sharp from blurred images, and an L0-norm constraint term has been proposed for the non-uniform motion blur estimation problem. The invention introduces a dark channel prior loss to impose the dark channel sparsity constraint of the sharp image.
For an image I, the dark channel at pixel p is defined as

$$D(I)(p)=\min_{q\in N(p)}\left(\min_{c\in\{r,g,b\}}I^{c}(q)\right)$$

where p and q are pixel positions, N(p) denotes an image block centered on pixel p, and I^c(q) is the image on the c-th channel, so the dark channel describes the minimum value within the image block.
In deblurring work based on convolutional neural networks, convolution is the basic operation of feature extraction, and the convolution computation on each channel can be expressed as

$$a_{i,j}=\sum_{m}\sum_{n}w_{m,n}\,x_{i+m,\,j+n}+w_{b}$$

where x_{i,j} denotes the input element in the i-th row and j-th column, w_{m,n} is the weight in the m-th row and n-th column of the convolution kernel, w_b is the bias, and a_{i,j} is the element in the i-th row and j-th column of the feature map obtained on this channel. Each pixel of the output feature map is thus a weighted sum of the corresponding input pixel and its neighborhood, which means that the dark channel pixel values will increase. Introducing the dark channel sparsity prior can therefore effectively guide the network to output images with clear edges and textures.
Prior work adopts the L0 norm to count the number of non-zero elements of the dark channel in an image; although this constrains the output image well, it is non-differentiable and therefore cannot be optimized by back-propagation as an objective function. The dark channel prior loss designed by the invention therefore adopts the mean square error, i.e., the Euclidean distance between the dark channels of the pixels in the real sharp image and in the restored image is computed:

$$L_{darkchannel}=\frac{1}{WH}\sum_{p}\left(D\left(I^{sharp}\right)(p)-D\left(F\left(I^{Blur};\theta_{\sigma}\right)\right)(p)\right)^{2}$$
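A sketch of the dark channel prior loss described above (PyTorch, illustrative only); the minimum over the image block N(p) is implemented as a negated max-pooling, and the patch size is an assumption.

```python
import torch
import torch.nn.functional as F

def dark_channel(img, patch=15):
    """D(I)(p): minimum over the color channels followed by the minimum over an image
    block N(p) centered at each pixel (a negated max-pool). Patch size is assumed."""
    min_over_channels = img.min(dim=1, keepdim=True).values
    return -F.max_pool2d(-min_over_channels, kernel_size=patch, stride=1,
                         padding=patch // 2)

def dark_channel_prior_loss(restored, sharp, patch=15):
    """Mean square error (squared Euclidean distance) between the dark channels of the
    restored image and the real sharp image -- a differentiable stand-in for counting
    non-zero dark-channel elements with the L0 norm."""
    return F.mse_loss(dark_channel(restored, patch), dark_channel(sharp, patch))
```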
Dynamic scene motion deblurring based on deep learning requires a large amount of paired training data. Early datasets were synthesized by researchers from sharp images and different blur kernels, but such a simplified blur source cannot simulate the complex motion blur of a real dynamic scene. Researchers therefore proposed averaging multiple consecutive sharp frames captured with high-speed cameras, which closely simulates the non-uniform blur caused by camera shake and the movement of target objects.

The GOPRO dataset, proposed by Nah et al. in 2017, is generated by averaging 7 to 15 sharp frames captured with a GoPro camera, with the middle frame serving as the corresponding real sharp image. It contains 3214 sharp-blurred image pairs at 720×1280 resolution, of which 2103 pairs are used for training and the remaining 1111 pairs form the test set.
The algorithm of the present invention was implemented in Python with the TensorFlow framework, trained on a workstation equipped with an Nvidia GeForce GTX 1080Ti GPU, and tested on a MacBook Pro with a 2.7 GHz Intel Core i5 CPU.

The batch size during training is 4, the initial learning rate is 1×10⁻⁴ with an exponential decay coefficient of 0.3, the optimizer is Adam, and the number of training epochs is 2000, which is sufficient for the network model to converge. The down-sampling convolution layers of the encoder-decoder network use a stride of 2 and a kernel size of 5; the corresponding up-sampling transposed convolution layers also use a stride of 2, with a kernel size of 4. Each residual dense network block on the encoding side is formed by four convolution layers with kernel size 3 alternating with Leaky ReLU nonlinear activation functions, and the output convolution layer that fuses the features has a kernel size of 1×1; the two convolution layers of the decoder-side bottleneck residual block have kernel sizes of 1 and 3, respectively. Unlike most previous work, which uses large convolution kernels, most convolution kernels in the present invention have size 3.

During training, the input image is randomly cropped to 256×256 and randomly flipped up and down or rotated by 90° to prevent overfitting. The preliminary restored images of the multi-scale attention modules are 32×32, 64×64 and 128×128, respectively, and the final restored image has the same size as the input. During testing, the input and output images are at the initial resolution of 720×1280.
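The training-time augmentation described above (random 256×256 crop, random up-down flip or 90° rotation) can be sketched as follows on NumPy arrays; this is illustrative and not the patent's actual data pipeline.

```python
import numpy as np

def augment_pair(blur, sharp, crop=256, rng=None):
    """Random 256x256 crop plus a random up-down flip or 90-degree rotation,
    applied identically to a blurred/sharp pair, as described above."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = blur.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    b = blur[top:top + crop, left:left + crop]
    s = sharp[top:top + crop, left:left + crop]
    if rng.random() < 0.5:                     # random up-down flip
        b, s = b[::-1], s[::-1]
    if rng.random() < 0.5:                     # random 90-degree rotation
        b, s = np.rot90(b), np.rot90(s)
    return b.copy(), s.copy()

blur = np.random.rand(720, 1280, 3)            # stand-in for a GOPRO blurred frame
sharp = np.random.rand(720, 1280, 3)
b_patch, s_patch = augment_pair(blur, sharp)   # both 256x256x3
```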
The comparison methods in the experiments include the traditional non-uniform blurred-image blind restoration algorithm proposed by Whyte et al., a method that combines deep learning by estimating the blur kernel with a CNN (From Motion Blur to Motion Flow, MBMF), and end-to-end deep-learning-based methods, including the deep multi-scale convolutional neural network (MS-CNN), DeblurGAN based on a conditional generative adversarial network, the scale-recurrent network (SRN) and the spatially variant recurrent neural network (SVRNN).
The present invention selects the commonly used peak signal-to-noise ratio (peak signal to noise ratio, PSNR), structural similarity (structure similarity index measure, SSIM) and recovery time as quantitative evaluation indexes, and Table 1 shows comparison results.
Table 1 comparison of GOPRO test set with evaluation index of existing algorithm
Method | PSNR/dB | SSIM | Recovery time t/s |
Whyte et al [31] Method | 24.65 | 0.7566 | 700 |
MBMF | 26.15 | 0.8050 | 210 |
MS-CNN | 28.66 | 0.8554 | 6 |
DeblurGAN | 26.74 | 0.8704 | 0.207 |
SVRNN | 29.33 | 0.8808 | 0.304 |
SRN | 30.41 | 0.9040 | 0.358 |
The algorithm of the invention | 31.02 | 0.9145 | 0.296 |
Traditional methods cannot model spatially varying blur in dynamic scenes, so the method of Whyte et al. and MBMF do not perform well. MS-CNN and DeblurGAN significantly improve the deblurring effect; however, MS-CNN has a complex structure and a long restoration time, and while DeblurGAN has a simple network structure, its output image quality leaves room for improvement. The improved models SRN and SVRNN further improve deblurring performance: SRN performs best among the existing methods, and SVRNN achieves a better balance between restoration quality and time overhead.
The algorithm of the invention obtains the best result in objective evaluation indexes, improves PSNR by 0.61dB compared with an SRN method, and shortens recovery time compared with SVRNN.
The deblurring results on the GOPRO test set are shown in FIG. 7, where (1) shows motion blur and (2) shows camera shake blur. The conventional method is not suited to non-uniformly blurred dynamic scenes, and its restored images remain blurred. DeblurGAN, based on a conditional generative adversarial network, removes spatially varying blur and further improves restoration efficiency, but significant artifacts in its images lead to poor visual quality. MS-CNN removes blur at different scales to a certain extent, but still suffers from insufficiently sharp edges and missing texture details in the output image.

Compared with the other methods, in the result of the algorithm of the invention in FIG. 7(h), the high-speed car restored from the motion-blurred image (1) has a sharp outline with no obvious blurred edges, and the window texture restored from the camera-shake image (2) shows clear detail, so the non-uniform blur is effectively removed.

In addition, as shown in FIG. 8, different types of blurred images selected from the GOPRO test set include motion blur caused by moving subjects at different depths of field ((1) and (2)), camera shake blur when shooting a stationary car ((3) and (4)), and severe blur (5). In the images restored by SRN, the numbers and letters on the license plate are deformed, the road zebra crossing is blurred, and the facial details show artificial restoration marks. In the results of the proposed algorithm, the high-frequency details are effectively restored, the digit structures are clear, the zebra crossing has sharp edges, the facial details are more plausible, and the results are closer to the real sharp image shown on the right.
The invention also collects blurred images from real dynamic scenes for experiments; these tests are carried out on a MacBook Pro with a 2.7 GHz Intel Core i5 CPU.

First, the average restoration time of the method of the present invention is more than 10 s shorter than that of SRN. Second, in the forward inference that produces the restored image from an input picture at the initial resolution, the number of floating point operations (FLOPs) required by the model is clearly lower than that of SRN, so the computational cost is low.
Table 2 comparison of real pictures with SRN method recovery time
Method | Average recovery time/s | Floating point count (FLOPs) |
SRN [16] | 29.9628 | 2,349G |
The method of the invention | 18.1362 | 1,686G |
In addition, as shown in FIG. 9, each visualized result shows, from left to right, the blurred image, the image restored by SRN, and the result of the algorithm of the present invention. The wall surface in (1) and the pattern texture in (2) are clearer than in the blurred image, and the edges of the chairs and books in the severely blurred image (3) are restored.
To remove complex dynamic scene blur from a single picture, the invention provides a multi-scale attention network. The network is based on an asymmetric encoder-decoder structure: the encoding side uses residual dense network blocks to improve feature extraction and representation; attention modules are embedded at the different spatial resolutions of the decoding side, the preliminary restored images they generate realize an image-pyramid multi-scale structure that enlarges the receptive field of the network, and the output attention feature maps strengthen the decoder's ability to restore high-frequency details. In addition, the final output convolution layer of the network shares parameters with the prediction convolution layers in the attention modules, reducing network complexity. In the loss function, a dark channel prior loss guides the network to reconstruct images with clearer details, and the multi-scale content loss realizes supervised learning at different scales.
Experimental results show that the multi-scale attention network provided by the invention can effectively solve the problem of complex dynamic scene blurring. Compared with other algorithms, the objective evaluation index (PSNR, SSIM) is better, the edge of the restored image is sharper, the texture is clearer, and the restoration effect of the true blurred image is better. The method has better performance in recovery precision and efficiency.
In summary, the above embodiments describe in detail different configurations of the image blind deblurring method and system based on the attention mechanism, and of course, the present invention includes, but is not limited to, the configurations listed in the above implementation, and any contents of transformation based on the configurations provided in the above embodiments fall within the scope of protection of the present invention. One skilled in the art can recognize that the above embodiments are illustrative.
Example 2
This embodiment provides an attention mechanism-based image blind deblurring system, which comprises: a multi-scale attention network that directly restores a sharp image in an end-to-end manner; the multi-scale attention network adopts an asymmetric encoder-decoder structure, and the encoding side uses residual dense network blocks to complete the feature extraction and representation of the input image; a plurality of attention modules arranged on the decoding side, where each attention module outputs a preliminary restored image and the preliminary restored images form an image-pyramid multi-scale structure; each attention module also outputs an attention feature map that models the relationships between distant regions from a global perspective to process blurred images; and a dark channel prior loss and a multi-scale content loss that form the loss function, which is used to optimize the network through back-propagation.
In the image blind deblurring method and system based on the attention mechanism, to address the problem that blind deblurring cannot effectively handle complex dynamic scene blur, an image deblurring algorithm based on the attention mechanism is provided, in which a multi-scale attention network restores a sharp image directly, end to end. The network employs an asymmetric encoder-decoder structure. First, the encoding side adopts residual dense network blocks, which improves the network's ability to extract and represent features of the input image. Second, several simple and efficient attention modules are designed on the decoding side; the preliminary restored images output by these modules realize an image-pyramid multi-scale structure, and the output attention feature maps model the relationships between distant regions from a global perspective, so that severely blurred images can be handled better. Finally, a loss function composed of a dark channel prior loss and a multi-scale content loss is proposed. Experimental results show that the images restored on the GOPRO dataset outperform the comparison algorithms in both evaluation indices and visual quality. Compared with the multi-scale recurrent network, the peak signal-to-noise ratio and the structural similarity on the GOPRO test set are improved by 1.97% and 1.16%, respectively, and the restored images have sharper edges and clearer textures. The method effectively improves the restoration of dynamic-scene blurred images, streamlines the training process and shortens the restoration time.
The invention designs an asymmetric encoder-decoder network based on an attention mechanism to realize blind restoration of dynamic-scene blurred images, and achieves the following beneficial effects:

(1) Inspired by the attention mechanism, an attention module is designed that extracts and converts global features and outputs attention feature maps and restored images at different spatial resolutions. The attention module is integrated into the decoding side of the network and optimized through end-to-end training; it effectively captures global context information while realizing a "progressive" multi-scale structure.

(2) The network consists of an asymmetric encoder-decoder structure. To improve feature extraction and representation, the encoding side is composed of residual dense network blocks; on the decoding side, the reconstruction features are up-sampled, the feature maps of the backbone network and of the attention modules are concatenated along the channel dimension, and the features extracted on the decoding side are adaptively recalibrated.

(3) During training, the loss function consists of a dark channel prior loss and a multi-scale mean-square-error loss, guiding the network to generate images with sharp edges and clear textures and further improving network performance.

(4) Test results on the public synthetic dataset show that the algorithm of the invention improves upon recent methods in evaluation indices and restoration time, and the restored images also have clearer edge structures and rich texture details. Blurred images collected in real dynamic scenes are used for experiments to verify the practical value of the algorithm.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, the description is relatively simple because of corresponding to the method disclosed in the embodiment, and the relevant points refer to the description of the method section.
The above description is only illustrative of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention, and any alterations and modifications made by those skilled in the art based on the above disclosure shall fall within the scope of the appended claims.
Claims (5)
1. An attention mechanism-based image blind deblurring method, which is characterized by comprising the following steps of:
a multi-scale attention network restores sharp images directly in an end-to-end manner;
the multi-scale attention network adopts an asymmetric encoding-decoding structure, and the encoding side of the structure uses residual dense network blocks to complete the feature extraction and representation of the input image;
a plurality of attention modules are arranged on the decoding side of the encoding-decoding structure; the attention modules output preliminary restored images, and these preliminary restored images form an image-pyramid multi-scale structure;
the attention modules also output attention feature maps, which model relationships between distant regions from a global perspective in order to process blurred images;
the dark channel prior loss and the multi-scale content loss form a loss function, and the loss function is used to optimize the network through back-propagation;
the loss function includes:
L_total = λ1·L_dark_channel + λ2·L_sub_content + λ3·L_content    (1)
where the coefficient λ1 of the dark channel prior loss is 25, the content loss coefficient λ2 of the attention modules is 5, and the content loss coefficient λ3 of the final output is 10;
the multi-scale content loss module includes:
in the network optimization process, the mean square error is selected as the content loss function, and the squared pixel-wise error between the network output image and the real sharp image is computed:
L_content = (1 / (N·W·H)) · Σ_{x,y} ‖ I^sharp_{x,y} − G(I^Blur; θ_G)_{x,y} ‖²    (2)
where θ_G denotes the network parameters, I^sharp_{x,y} is the real sharp image, I^Blur is the blurred image, G(I^Blur; θ_G) is the restored image, and N, W and H are respectively the number of images, the image width and the image height;
an optimized multi-scale mean square error function is adopted in the training process;
the extraction and transformation of global features inside the attention module are supervised by the mean square error between the restored image on the decoding side and the real sharp image at the corresponding scale;
after weighting the content loss of each scale, the multi-scale content loss function is obtained, where the weight of each scale is determined by the ratio between the original image size and the current size;
the dark channel prior loss module comprises:
the dark channel prior loss imposes a sparsity constraint on the dark channel of the sharp image;
for an image I, the dark channel at pixel p is defined as:
D(I)(p) = min_{q∈N(p)} ( min_{c∈{r,g,b}} I^c(q) )    (3)
where p and q are pixel positions, N(p) denotes an image block centered on pixel p, and I^c(q) is the image on the c-th color channel, so the dark channel describes the minimum value within the image block;
in deblurring work based on convolutional neural networks, convolution is the basic operation of feature extraction, and the convolution computation on each channel can be expressed as:
a_{i,j} = Σ_m Σ_n ( w_{m,n} · x_{i+m, j+n} ) + w_b    (4)
where x_{i,j} denotes the element in the i-th row and j-th column of the input, w_{m,n} denotes the weight in the m-th row and n-th column of the convolution kernel, w_b denotes the bias, and a_{i,j} denotes the element in the i-th row and j-th column of the feature map obtained on that channel;
the dark channel prior loss module uses the mean square error to compute the Euclidean distance between the dark channel of the real sharp image and that of the restored image:
L_dark_channel = (1 / (N·W·H)) · Σ_{x,y} ‖ D(I^sharp)_{x,y} − D(G(I^Blur; θ_G))_{x,y} ‖²    (5)
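To make the loss design of claim 1 concrete, the following is a minimal PyTorch sketch of the combined objective: a dark channel prior term plus multi-scale content (mean square error) terms. The helper names, the patch size of the dark channel operator, and the exact per-scale weighting are illustrative assumptions rather than the patent's implementation; the coefficients follow the values stated above.

```python
import torch
import torch.nn.functional as F

def dark_channel(img, patch_size=35):
    # min over channels, then min over the local patch (via negated max-pool)
    m, _ = img.min(dim=1, keepdim=True)
    return -F.max_pool2d(-m, patch_size, stride=1, padding=patch_size // 2)

def total_loss(outputs, sharp, lambdas=(25.0, 5.0, 10.0)):
    """outputs: list of restored images from coarse to fine, e.g.
    [1/4 scale, 1/2 scale, full scale]; sharp: full-resolution ground truth."""
    lam_dark, lam_sub, lam_content = lambdas

    final = outputs[-1]
    # Content loss at the output resolution (Eq. 2).
    l_content = F.mse_loss(final, sharp)

    # Multi-scale content loss on the intermediate attention-module outputs,
    # each supervised by a down-sampled ground truth of matching size.
    l_sub = 0.0
    for out in outputs[:-1]:
        target = F.interpolate(sharp, size=out.shape[-2:], mode='bilinear',
                               align_corners=False)
        # weight taken as the size ratio between the original image and the
        # current scale, as described in the claim (an assumption here)
        weight = sharp.shape[-1] / out.shape[-1]
        l_sub = l_sub + weight * F.mse_loss(out, target)

    # Dark channel prior loss: MSE between dark channels (Eq. 5).
    l_dark = F.mse_loss(dark_channel(final), dark_channel(sharp))

    return lam_dark * l_dark + lam_sub * l_sub + lam_content * l_content
```

In training, this scalar would simply be back-propagated through the multi-scale attention network by a standard optimizer.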
2. The image blind deblurring method based on an attention mechanism according to claim 1, wherein the asymmetric encoding-decoding structure comprises:
the encoding side consists of four convolution modules, and each scale comprises one convolution layer with a stride of 2 and three residual dense network blocks;
the decoding side consists of three attention modules and reconstruction modules, wherein the attention module and the reconstruction module complement each other, and adaptively recalibrating the importance of the feature maps is beneficial to reconstructing information; each reconstruction module comprises a transposed convolution layer and three bottleneck residual blocks;
the bottleneck residual block consists of two convolution layers: the channel dimension is first increased by a 1×1 convolution layer, the target output is then obtained through a 3×3 convolution kernel, and a nonlinear activation function follows each convolution layer;
skip connections are added between feature maps of the same size on the encoding and decoding sides, and the fusion of low-level features and high-level features is completed through addition.
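As one possible reading of the decoder components in claim 2, the sketch below implements a bottleneck residual block (a 1×1 channel-expanding convolution, then a 3×3 convolution, each followed by a nonlinearity, plus a residual connection) and a reconstruction module (a transposed convolution followed by three such blocks). The channel counts, the expansion factor, and the transposed-convolution kernel size are assumptions for illustration.

```python
import torch
import torch.nn as nn

class BottleneckResidualBlock(nn.Module):
    """1x1 conv increases the channel dimension, 3x3 conv produces the
    target output; each convolution is followed by a nonlinearity, and the
    block input is added back through a residual connection."""
    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        hidden = channels * expansion
        self.body = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)

class ReconstructionModule(nn.Module):
    """Transposed convolution up-samples the features, followed by three
    bottleneck residual blocks (decoder side of the asymmetric codec)."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_channels, out_channels,
                                     kernel_size=4, stride=2, padding=1)
        self.blocks = nn.Sequential(*[BottleneckResidualBlock(out_channels)
                                      for _ in range(3)])

    def forward(self, x):
        return self.blocks(self.up(x))
```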
3. The image blind deblurring method based on an attention mechanism according to claim 1, wherein the residual dense network block comprises:
residual learning is completed according to the identity mapping between the input image and the output image; in the image restoration process, when the similarity between the input image and the output image reaches a threshold, the identity mapping connects the input image directly to the output image;
the residual dense network block passes the output of the previous module to the input of each convolution layer in the current module, so that features are transmitted continuously;
the output of each convolution layer is passed to the input of every subsequent convolution layer in a densely connected manner by concatenation along the channel dimension;
the concatenated dense features undergo a nonlinear transformation through a convolution layer so as to fuse the multi-channel features;
residual scaling is used in the local residual learning of the dense network block;
the residual dense network block thus integrates the residual network block and the dense network block.
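A minimal sketch, under assumed hyper-parameters, of a residual dense network block in the spirit of claim 3: each convolution layer receives the concatenation of the block input and all previous layer outputs, a 1×1 convolution fuses the concatenated features, and local residual learning with residual scaling adds the fused result back to the block input. The growth rate, layer count, and scaling factor are illustrative and not fixed by the patent.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    def __init__(self, channels: int = 64, growth: int = 32,
                 num_layers: int = 4, res_scale: float = 0.2):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True)))
            in_ch += growth            # dense connectivity: channels accumulate
        # 1x1 convolution fuses the concatenated multi-channel features
        self.fuse = nn.Conv2d(in_ch, channels, kernel_size=1)
        self.res_scale = res_scale

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            # each layer sees the concatenation of all previous outputs
            feats.append(layer(torch.cat(feats, dim=1)))
        fused = self.fuse(torch.cat(feats, dim=1))
        # local residual learning with residual scaling
        return x + self.res_scale * fused
```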
4. The image blind deblurring method based on an attention mechanism according to claim 1, wherein the attention module comprises:
the attention module applies a 1×1 nonlinear transformation to the input features to generate an attention feature map with 64 channels;
the number of channels is then compressed to 3 by a 5×5 prediction convolution layer, and a sharp image at the current resolution is output;
the output image is up-sampled by nearest-neighbor interpolation, and a global feature map is obtained through a nonlinear transformation;
the global feature map contains the shallow features extracted from the preliminary restored image, and these shallow features are concatenated along the channel dimension with the deep features obtained by the reconstruction module on the decoding side, so as to adjust the importance of features at different spatial positions.
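The sketch below gives one possible reading of the attention module in claim 4: a 1×1 nonlinear transformation produces a 64-channel attention feature map, a 5×5 prediction convolution compresses it to a 3-channel restored image at the current resolution, and the restored image is up-sampled by nearest-neighbor interpolation and passed through a further nonlinear transformation to give a global (shallow) feature map for concatenation with the decoder's deep features. Layer choices beyond those stated in the claim are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionModule(nn.Module):
    def __init__(self, in_channels: int, global_channels: int = 64):
        super().__init__()
        # 1x1 nonlinear transformation -> 64-channel attention feature map
        self.attn = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=1),
            nn.ReLU(inplace=True))
        # 5x5 prediction convolution -> 3-channel restored image
        self.predict = nn.Conv2d(64, 3, kernel_size=5, padding=2)
        # nonlinear transformation of the up-sampled restored image
        # -> global (shallow) feature map for fusion with decoder features
        self.global_feat = nn.Sequential(
            nn.Conv2d(3, global_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, x):
        attn_map = self.attn(x)
        restored = self.predict(attn_map)            # image at current scale
        up = F.interpolate(restored, scale_factor=2, mode='nearest')
        shallow = self.global_feat(up)               # global feature map
        return restored, shallow

# In the decoder, the shallow features from this module would be concatenated
# (torch.cat) with the deep features of the reconstruction module along the
# channel dimension to recalibrate the importance of spatial positions.
```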
5. An image blind deblurring system based on an attention mechanism for implementing the method of claim 1, wherein the image blind deblurring system comprises:
a multi-scale attention network that restores sharp images directly in an end-to-end manner;
the multi-scale attention network adopts an asymmetric encoding-decoding structure, and the encoding side of the structure uses residual dense network blocks to complete the feature extraction and representation of the input image;
a plurality of attention modules are arranged on the decoding side of the encoding-decoding structure; the attention modules output preliminary restored images, and these preliminary restored images form an image-pyramid multi-scale structure;
the attention modules also output attention feature maps, which model relationships between distant regions from a global perspective in order to process blurred images;
the dark channel prior loss and the multi-scale content loss form a loss function, and the loss function is used to optimize the network through back-propagation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010553157.2A CN111709895B (en) | 2020-06-17 | 2020-06-17 | Image blind deblurring method and system based on attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111709895A CN111709895A (en) | 2020-09-25 |
CN111709895B true CN111709895B (en) | 2023-05-16 |
Family
ID=72540753
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010553157.2A Active CN111709895B (en) | 2020-06-17 | 2020-06-17 | Image blind deblurring method and system based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111709895B (en) |
Families Citing this family (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112164011B (en) * | 2020-10-12 | 2023-02-28 | 桂林电子科技大学 | Motion image deblurring method based on self-adaptive residual error and recursive cross attention |
CN112200752B (en) * | 2020-10-28 | 2022-06-03 | 西华大学 | Multi-frame image deblurring system and method based on ER network |
CN112116545A (en) * | 2020-10-30 | 2020-12-22 | 中国科学技术大学 | Image and video restoration method based on characteristic domain distortion decomposition |
CN114463189A (en) * | 2020-11-09 | 2022-05-10 | 中国科学院沈阳自动化研究所 | Image information analysis modeling method based on dense residual UNet |
CN112381795A (en) * | 2020-11-16 | 2021-02-19 | 山西三友和智慧信息技术股份有限公司 | Tree ring automatic detection method based on deep learning |
CN112435164B (en) * | 2020-11-23 | 2024-04-30 | 浙江工业大学 | Simultaneous super-resolution and denoising method for generating low-dose CT lung image based on multiscale countermeasure network |
CN112529081B (en) * | 2020-12-11 | 2023-11-07 | 大连大学 | Real-time semantic segmentation method based on efficient attention calibration |
CN112560864B (en) * | 2020-12-22 | 2024-06-18 | 苏州超云生命智能产业研究院有限公司 | Image semantic segmentation method and device and training method of image semantic segmentation model |
CN112634163B (en) * | 2020-12-29 | 2024-10-15 | 南京大学 | Method for removing image motion blur based on improved cyclic generation countermeasure network |
CN112767264B (en) * | 2021-01-08 | 2023-02-03 | 中国科学院计算技术研究所 | Image deblurring method and system based on graph convolution neural network |
CN112801901B (en) * | 2021-01-21 | 2023-08-18 | 北京交通大学 | Image deblurring algorithm based on segmented multi-scale convolutional neural network |
CN113096032B (en) * | 2021-03-18 | 2024-03-19 | 西北工业大学 | Non-uniform blurring removal method based on image region division |
CN113052068B (en) * | 2021-03-24 | 2024-04-30 | 深圳威富云数科技有限公司 | Image processing method, device, computer equipment and storage medium |
CN113012072A (en) * | 2021-03-30 | 2021-06-22 | 华南理工大学 | Image motion deblurring method based on attention network |
CN112927171A (en) * | 2021-04-15 | 2021-06-08 | 重庆邮电大学 | Single image deblurring method based on generation countermeasure network |
CN113177895B (en) * | 2021-05-20 | 2022-04-22 | 中国人民解放军国防科技大学 | Two-stage image restoration method based on context multi-feature fusion network |
CN113256526A (en) * | 2021-05-25 | 2021-08-13 | 烟台艾睿光电科技有限公司 | Infrared image enhancement method, device, equipment and storage medium |
CN113139922B (en) * | 2021-05-31 | 2022-08-02 | 中国科学院长春光学精密机械与物理研究所 | Image defogging method and defogging device |
CN113344773B (en) * | 2021-06-02 | 2022-05-06 | 电子科技大学 | Single picture reconstruction HDR method based on multi-level dual feedback |
CN113486716B (en) * | 2021-06-04 | 2022-06-14 | 电子科技大学长三角研究院(衢州) | Airport scene target segmentation method and system thereof |
CN113469906B (en) * | 2021-06-24 | 2023-02-07 | 湖南大学 | Cross-layer global and local perception network method for image restoration |
CN113592736B (en) * | 2021-07-27 | 2024-01-12 | 温州大学 | Semi-supervised image deblurring method based on fused attention mechanism |
CN113436118B (en) * | 2021-08-10 | 2022-09-27 | 安徽工程大学 | Low-dose CT image restoration method based on multi-scale convolutional coding network |
CN113643207A (en) * | 2021-08-20 | 2021-11-12 | Oppo广东移动通信有限公司 | Image processing method and device and electronic equipment |
CN113902647B (en) * | 2021-11-19 | 2024-08-06 | 东北师范大学 | Image deblurring method based on double closed-loop network |
CN114283078B (en) * | 2021-12-09 | 2024-06-18 | 北京理工大学 | Self-adaptive fusion image defogging method based on two-way convolutional neural network |
CN114331913B (en) * | 2022-01-06 | 2024-07-02 | 福州大学 | Motion blurred image restoration method based on residual attention block |
CN114967121B (en) * | 2022-05-13 | 2023-02-03 | 哈尔滨工业大学 | Design method of end-to-end single lens imaging system |
CN115034982B (en) * | 2022-05-27 | 2024-08-09 | 大连海事大学 | Underwater image enhancement method based on multi-scale attention mechanism fusion |
CN114972036B (en) * | 2022-06-10 | 2024-09-24 | 上海交通大学 | Blind image super-resolution reconstruction method and system based on fusion degradation priori |
CN114820388B (en) * | 2022-06-22 | 2022-09-06 | 合肥工业大学 | Image defogging method based on codec structure |
CN114821449B (en) * | 2022-06-27 | 2022-09-20 | 松立控股集团股份有限公司 | License plate image processing method based on attention mechanism |
CN114998156B (en) * | 2022-06-30 | 2023-06-20 | 同济大学 | Image motion deblurring method based on multi-patch multi-scale network |
CN115690704B (en) * | 2022-09-27 | 2023-08-22 | 淮阴工学院 | LG-CenterNet model-based complex road scene target detection method and device |
CN116823656B (en) * | 2023-06-27 | 2024-06-28 | 华东师范大学 | Image blind deblurring method and system based on frequency domain local feature attention mechanism |
CN117726549B (en) * | 2024-02-07 | 2024-04-30 | 中国科学院长春光学精密机械与物理研究所 | Image deblurring method based on event guidance |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020015167A1 (en) * | 2018-07-17 | 2020-01-23 | 西安交通大学 | Image super-resolution and non-uniform blur removal method based on fusion network |
CN110782399A (en) * | 2019-08-22 | 2020-02-11 | 天津大学 | Image deblurring method based on multitask CNN |
CN110969589A (en) * | 2019-12-03 | 2020-04-07 | 重庆大学 | Dynamic scene fuzzy image blind restoration method based on multi-stream attention countermeasure network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7440634B2 (en) * | 2003-06-17 | 2008-10-21 | The Trustees Of Columbia University In The City Of New York | Method for de-blurring images of moving objects |
Non-Patent Citations (1)
Title |
---|
Blind restoration of noisy blurred images with multi-scale convolutional neural networks; Liu Pengfei et al.; Infrared and Laser Engineering (No. 04); full text *
Also Published As
Publication number | Publication date |
---|---|
CN111709895A (en) | 2020-09-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111709895B (en) | Image blind deblurring method and system based on attention mechanism | |
Wu et al. | Light field reconstruction using deep convolutional network on EPI | |
Gajjar et al. | New learning based super-resolution: use of DWT and IGMRF prior | |
CN106600538A (en) | Human face super-resolution algorithm based on regional depth convolution neural network | |
CN113658040B (en) | Human face super-resolution method based on priori information and attention fusion mechanism | |
Marinč et al. | Multi-kernel prediction networks for denoising of burst images | |
CN110796622B (en) | Image bit enhancement method based on multi-layer characteristics of series neural network | |
CN111091503A (en) | Image out-of-focus blur removing method based on deep learning | |
CN113902647B (en) | Image deblurring method based on double closed-loop network | |
Nair et al. | Confidence guided network for atmospheric turbulence mitigation | |
Tomosada et al. | GAN-based image deblurring using DCT loss with customized datasets | |
CN114463218A (en) | Event data driven video deblurring method | |
CN116468605A (en) | Video super-resolution reconstruction method based on time-space layered mask attention fusion | |
CN116385283A (en) | Image deblurring method and system based on event camera | |
CN114331913B (en) | Motion blurred image restoration method based on residual attention block | |
Karthick et al. | Deep RegNet-150 Architecture for Single Image Super Resolution of Real-time Unpaired Image Data | |
CN117114984A (en) | Remote sensing image super-resolution reconstruction method based on generation countermeasure network | |
Deshpande et al. | SURVEY OF SUPER RESOLUTION TECHNIQUES. | |
CN117408924A (en) | Low-light image enhancement method based on multiple semantic feature fusion network | |
CN115272072A (en) | Underwater image super-resolution method based on multi-feature image fusion | |
Chen et al. | Attention-based broad self-guided network for low-light image enhancement | |
Tang et al. | Structure-embedded ghosting artifact suppression network for high dynamic range image reconstruction | |
Cui et al. | Multi-stream attentive generative adversarial network for dynamic scene deblurring | |
CN116433516A (en) | Low-illumination image denoising and enhancing method based on attention mechanism | |
Esmaeilzehi et al. | DSegAN: A deep light-weight segmentation-based attention network for image restoration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |