CN111028177B - Edge-based deep learning image motion blur removing method - Google Patents

Edge-based deep learning image motion blur removing method

Info

Publication number
CN111028177B
Authority
CN
China
Prior art keywords
image
edge
loss function
multiplied
deblurring
Prior art date
Legal status
Active
Application number
CN201911275632.8A
Other languages
Chinese (zh)
Other versions
CN111028177A (en)
Inventor
姚剑 (Yao Jian)
蒋佳芹 (Jiang Jiaqin)
李俐俐 (Li Lili)
龚烨 (Gong Ye)
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University (WHU)
Priority to CN201911275632.8A
Publication of CN111028177A
Application granted
Publication of CN111028177B
Legal status: Active


Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 5/00: Image enhancement or restoration
            • G06T 5/73: Deblurring; Sharpening
          • G06T 7/00: Image analysis
            • G06T 7/10: Segmentation; Edge detection
              • G06T 7/13: Edge detection
          • G06T 2207/00: Indexing scheme for image analysis or image enhancement
            • G06T 2207/10: Image acquisition modality
              • G06T 2207/10024: Color image
            • G06T 2207/20: Special algorithmic details
              • G06T 2207/20081: Training; Learning
              • G06T 2207/20084: Artificial neural networks [ANN]
              • G06T 2207/20172: Image enhancement details
                • G06T 2207/20201: Motion blur correction
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
          • Y02T 10/00: Road transport of goods or passengers
            • Y02T 10/10: Internal combustion engine [ICE] based vehicles
              • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to image restoration technology, in particular to an edge-based deep learning method for removing image motion blur. A trained HED network extracts edges from the blurred image, and convolution layers then extract edge feature information that guides the motion deblurring process. The deblurring backbone network extracts multi-scale feature information from the blurred image, spatial feature transformation layers integrate the image features and the edge features at each scale, and the decoding part gradually recovers the latent sharp image from the deepest image features. Blurred-sharp image pairs serve as the training sample set, the total loss function is defined as the sum of a mean squared error loss and a perceptual loss, and the deblurring backbone network is trained with this total loss until it converges to optimal accuracy. Inputting a motion-blurred image into the trained backbone network yields the deblurred result. The method effectively integrates image features and edge features, and its deblurring effect is marked.

Description

Edge-based deep learning image motion blur removing method
Technical Field
The invention belongs to the technical field of image restoration, and particularly relates to an edge-based deep learning image motion blur removal method.
Background
During photographing, relative motion between the imaging device and objects in the scene produces motion blur, and the captured image loses important detail information. Recovering the latent sharp image from such a degraded blurred image is called deblurring. Motion deblurring can recover sharp edges from images blurred by camera shake, fast-moving vehicles in the scene, and the like; it not only improves perceived visual quality but also benefits subsequent high-level applications such as character recognition and object detection, and therefore has considerable research value and application prospects.
Existing image deblurring algorithms can generally be divided into conventional methods based on energy optimization and methods based on deep learning. Conventional energy-optimization methods can be further subdivided into globally uniform deblurring and globally non-uniform deblurring.
In conventional approaches, a motion-blurred image is modeled as the convolution of a blur kernel with a sharp image plus additive noise. Energy-optimization deblurring comprises two stages: blur kernel estimation and image deconvolution. In the kernel estimation stage, the degradation model of the motion-blurred image is analyzed, an energy equation is established by combining prior statistical knowledge of the blur kernel and the sharp image, and the kernel estimate is obtained by minimizing this equation. Once the kernel is obtained, the degradation model and priors on the sharp image are again combined and solved to estimate the sharp image. Globally uniform blur is typically caused by in-plane translation when a camera shoots a static scene; the blur kernel is shared across the full view, so an image pyramid can be built and the kernel recovered from coarse to fine. The causes of globally non-uniform blur are more complex, including camera rotation, dynamic objects in a static scene, and depth-of-field changes; it is usually assumed that each small region of the blurred image shares one kernel, and a library of linear blur kernels is built to fit the kernels of the small regions. Globally uniform deblurring works well, but the uniform-blur assumption is over-idealized. Non-uniform blur with complex causes is closer to the real world and of greater practical value, but under the conventional framework its modeling and solving are complicated and the results are not very satisfactory.
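A compact statement of this standard degradation model (the notation here is supplied for illustration; the patent text describes the model only in words) is

$$B = K \otimes S + N$$

where $B$ is the observed blurred image, $K$ the blur kernel, $\otimes$ the convolution operator, $S$ the latent sharp image, and $N$ additive noise. Energy-optimization methods estimate first $K$ and then $S$ by minimizing a data-fidelity term derived from this model plus prior terms on $K$ and $S$.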
In recent years, deep neural networks have shown strong learning ability in computer vision and have produced many results in image motion deblurring. Deep learning methods are data-driven and do not strictly distinguish between globally uniform and non-uniform deblurring. Early work used the strong feature-expression capability of networks to learn an estimate of the blur kernel and then deconvolved the image with conventional methods, but the improvement in deblurring was limited. Subsequent end-to-end deblurring frameworks learn a direct mapping from the blurred image to the sharp image. Compared with conventional energy optimization, existing deep-learning-based non-uniform deblurring methods greatly improve model building, model solving, and deblurring results, but deblurring at image edges remains incomplete.
Disclosure of Invention
The invention aims to provide a deep learning image motion blur removal method that uses edge information as auxiliary guidance.
In order to achieve the above purpose, the invention adopts the following technical scheme: an edge-based deep learning image motion blur removing method comprises the following steps:
Step 1, extract edges from the blurred image with a trained HED network, then use convolution layers to extract the edge feature information that guides the motion deblurring process;
Step 2, the deblurring backbone network extracts multi-scale feature information from the blurred image, spatial feature transformation layers integrate the image features and the edge features at each scale, and the decoding part gradually recovers the latent sharp image from the deepest image features;
Step 3, take blurred-sharp image pairs as the training sample set, define the total loss function as the sum of a mean squared error loss and a perceptual loss, and train the deblurring backbone network with this total loss until it converges to optimal accuracy;
Step 4, input the motion-blurred image into the backbone network trained in step 3 to obtain the deblurred result.
In the above edge-based deep learning image motion blur removal method, obtaining the edge feature information in step 1 comprises the following sub-steps:
Step 1.1, acquire the blurred image edge map: input the color blurred image of size W×H×3 into an HED network loaded with pre-training weights to obtain an edge map of size W×H×1, where W is the width of the original image and H is its height;
Step 1.2, mine deep feature information of the edge map: taking the edge map output in step 1.1 as input, extract high-level edge feature information from the blurred edges through a series of convolution and nonlinear activation operations. The kernel size of the first convolution is 1×1 and that of the subsequent four convolutions is 3×3, so the spatial resolution is kept unchanged throughout; the nonlinear activation uses a leaky rectified linear unit (LeakyReLU), and the final output is high-dimensional edge feature information of size W×H×128.
In the above edge-based deep learning image motion blur removal method, step 2 comprises the following sub-steps:
Step 2.1, extract features of the blurred image: input the color blurred image of size W×H×3 into convolution layers composed of convolution and nonlinear activation. The encoding stage can be divided into 4 processing blocks whose feature maps have sizes kW×kH×l, where W is the width of the original image, H is its height, k = 1, 0.5, 0.25, and l = 32/k;
Step 2.2, integrate the blurred image features and the edge information at different scales: use a spatial feature transformation residual block, comprising a spatial feature transformation layer, a convolution layer, another spatial feature transformation layer, and another convolution layer, to integrate the edge features output in step 1.2 with the image features of the current scale obtained in step 2.1;
Step 2.3, further mine the blurred image features: combine dilated (hole) convolutions with different dilation rates to enlarge the receptive field and mine more feature information, using 2 serial dilated-convolution residual blocks and 1 parallel dilated-convolution residual block;
Step 2.4, gradually reconstruct the deblurred image from the deep image features.
In the above edge-based deep learning image motion blur removal method, step 3 comprises the following sub-steps:
Step 3.1, define the mean squared error loss function L_mse and the perceptual loss function L_p respectively, where I_c and I_d are the ground-truth sharp image and the deblurred image, respectively; m and n are the horizontal and vertical pixel indices; φ_{i,j} is the feature map of the jth convolution before the ith max-pooling layer of a VGG19 network loaded with weights pre-trained on ImageNet; W_{i,j} and H_{i,j} are the dimensions of that feature map; i = 3, j = 3 is typically set;
Step 3.2, define the total loss function:
L_total = L_mse + λ × L_p
where λ is the weight of the perceptual loss function, set to 0.01;
Train the network with the total loss function L_total until the entire network converges to optimal accuracy.
The invention has the following beneficial effects. 1. Strong feature learning and generalization capability. Using a convolutional-neural-network-based deep learning method, an end-to-end model is trained so that, given a motion-blurred image as input, a sharp image of the same resolution is obtained. No manually designed features need to be given in advance; the network learns the required features from the training data and exploits them appropriately, so the method generalizes well and performs stably even in severely blurred scenes.
2. Simple network structure, easy to train. The deblurring backbone network is single-scale; no additional sub-network (such as an edge extraction network) needs to be trained. Only an existing edge extraction network is used to acquire edges, and the guidance the edges provide is exploited through the edge-feature and image-feature integration module. The network designed by the invention is therefore simple in structure and easy to train.
3. Improved deblurring accuracy, especially at edges. By introducing edge information and integrating it with image information at the feature level through spatial feature transformation layers, the deblurring effect is markedly improved, especially at edges.
Drawings
FIG. 1 is a flow chart of the edge-based deep learning image motion blur removal method according to one embodiment of the present invention;
FIG. 2 (a) is the motion blur removal backbone network in the architecture of the edge-based deep learning motion blur removal network according to one embodiment of the present invention;
FIG. 2 (b) is the edge-based feature integration module in the architecture of the edge-based deep learning motion blur removal network according to one embodiment of the present invention;
FIG. 2 (c) is the parallel dilated-convolution residual block in the architecture of the edge-based deep learning motion blur removal network according to one embodiment of the present invention;
FIG. 3 (a) is a blurred image of an experiment on a GOPRO test dataset in accordance with one embodiment of the present invention;
FIG. 3 (b) is a deblurred image of an experiment on a GOPRO test dataset according to one embodiment of the invention;
FIG. 4 (a) is a first blurred image on the Stereo Blur Dataset test data set in accordance with one embodiment of the invention;
FIG. 4 (b) is a first deblurred image on a Stereo Blur Dataset test dataset according to an embodiment of the invention;
FIG. 4 (c) is a second blurred image on the Stereo Blur Dataset test data set in accordance with one embodiment of the invention;
fig. 4 (d) is a second deblurred image on the Stereo Blur Dataset test dataset according to an embodiment of the invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Considering the weakness of existing end-to-end deep-learning non-uniform deblurring methods at image edges, and the demonstrated feasibility, in conventional methods, of introducing image priors to shrink the solution space and obtain effective results, this embodiment provides a deep learning image deblurring method that uses edge information as auxiliary guidance, further improving the deblurring effect at edges. Compared with multi-scale deblurring network architectures, this embodiment uses only a single-scale architecture, greatly reducing network complexity and parameter count. Compared with most single-scale deblurring architectures, it introduces edge information so that the deblurring process focuses more on edge regions. In contrast to existing single-scale architectures that consider edge information, this embodiment directly uses an existing edge extraction network to obtain edges and integrates image and edge information at the feature level with spatial feature transformation.
The embodiment is realized by the following technical scheme: an edge-based deep learning image motion blur removal method whose overall network structure is shown in FIGS. 2 (a), 2 (b) and 2 (c). It consists of two modules, edge information preprocessing and the motion deblurring backbone network. Unless otherwise specified, all convolution kernels used in this embodiment are 3×3 in size. The specific motion deblurring steps are as follows:
step one: extracting edges from the blurred image using a trained HED (Holistcally-Nested Edge Detection) network, and then extracting edge feature information guiding a motion deblurring process using a convolution layer;
step two: the image deblurring backbone network of the encoding-decoding structure extracts multi-scale feature information from the blurred image, the spatial feature transformation layer is used for integrating the image features and the edge features on each scale, and the decoding part gradually recovers potential clear images from the deepest image features;
step three: the fuzzy-clear image pair is used as a training sample set, the sum of the mean square error loss function and the perception loss function is defined as a total loss function, and the deblurring main network is trained by using the total loss function until the deblurring main network converges to the optimal precision;
step four: and inputting the motion blurred image into a trained network to obtain a deblurred result.
In a specific implementation, the edge-based deep learning image deblurring network framework, shown in FIGS. 2 (a), 2 (b) and 2 (c), includes the following steps, summarized in FIG. 1:
S1, acquire the edge feature information, comprising the following sub-steps:
S1.1, acquire the blurred image edge map. The HED (Holistically-Nested Edge Detection) network is a neural network framework for extracting image edges; once trained, it outputs from an input color image an edge probability map of the same resolution, with pixel values between 0 and 1. The color blurred image of size W×H×3 is input into the HED network loaded with pre-training weights to obtain a W×H×1 edge map, where W is the width of the original image and H is its height.
S1.2, mine deep feature information of the edge map. Taking the edge map output in step S1.1 as input, high-level edge feature information is extracted from the blurred edges through a series of convolution and nonlinear activation operations: the kernel size of the first convolution is 1×1 and that of the subsequent four convolutions is 3×3, so the spatial resolution is kept unchanged throughout; the nonlinear activation uses a leaky rectified linear unit (LeakyReLU), and the final output is high-dimensional edge feature information of size W×H×128. A minimal sketch is given below.
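A minimal sketch of this edge feature extractor, assuming PyTorch; the internal channel widths, the LeakyReLU slope, and the class name are assumptions, while the 1×1 first convolution, the four 3×3 convolutions, and the W×H×128 output follow the text:

```python
import torch
import torch.nn as nn

class EdgeFeatureExtractor(nn.Module):
    """Sketch of S1.2: 1x1 conv, then four 3x3 convs with LeakyReLU."""
    def __init__(self, out_channels: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=1),              # 1x1 conv on the W x H x 1 edge map
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),  # four 3x3 convs; padding keeps W x H
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, out_channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, edge_map: torch.Tensor) -> torch.Tensor:
        # edge_map: (N, 1, H, W) HED probability map -> (N, 128, H, W) edge features
        return self.net(edge_map)
```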
S2, the deblurring backbone network performs end-to-end deblurring of the input blurred image, as shown in FIG. 2 (a), comprising the following sub-steps:
S2.1, extract features of the blurred image. The color blurred image of size W×H×3 is input into convolution layers composed of convolution and nonlinear activation. The encoding stage can be divided into 4 processing blocks whose feature maps have sizes kW×kH×l (k = 1, 0.5, 0.25; l = 32/k), as sketched below.
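A sketch of this 4-block encoding stage, assuming PyTorch. The text lists three scales (k = 1, 0.5, 0.25) for four blocks, so it is assumed here that two blocks run at full resolution; block composition beyond convolution plus nonlinear activation is also an assumption:

```python
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int, stride: int = 1) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
    )

encoder = nn.Sequential(
    conv_block(3, 32),              # block 1, k = 1:    W x H x 32
    conv_block(32, 32),             # block 2, k = 1:    W x H x 32
    conv_block(32, 64, stride=2),   # block 3, k = 0.5:  W/2 x H/2 x 64
    conv_block(64, 128, stride=2),  # block 4, k = 0.25: W/4 x H/4 x 128
)
# In the full network the intermediate outputs are kept as skip connections
# for the decoder (see S2.4).
```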
S2.2, integrate the image features and the edge information at different scales. A spatial feature transformation residual block, comprising a spatial feature transformation layer, a convolution layer, another spatial feature transformation layer, and another convolution layer, integrates the edge features obtained in step S1.2 with the image features of the current scale obtained in step S2.1, as shown in FIG. 2 (b).
Taking the 1st spatial feature transformation residual operation at the original resolution scale as an example (k = 1, image feature size W×H×32):
(1) A spatial feature transformation layer first derives from the edge features a gain parameter γ and a bias parameter β for adjusting the image features; the spatial resolution and channel count of both parameters match the current image features. Each parameter branch contains 2 convolutions: the 1st convolution takes W×H×32 as input and outputs W×H×32 at the original resolution; the 2nd takes W×H×32 and outputs W×H×l (k = 1, l = 32/k). The gain γ is then multiplied with the image features pixel by pixel, and the bias β is added pixel by pixel to the result of that step, yielding adjusted image features that pay more attention to the deblurring effect at edges.
(2) A convolution is applied with unchanged channel count and spatial resolution.
(3) As in (1), new gain and bias parameters γ and β are learned to further adjust the image features.
(4) As in (2), a convolution with unchanged channel count and spatial resolution is applied.
The input current-scale image features are then added to the features produced by this spatial feature transformation residual operation, completing one integration of image features and edge information. Three such integrations are performed at the original resolution (k = 1). The image features are then downsampled to 1/2 (k = 0.5) by a stride-2 convolution, giving size kW×kH×64, while the edge features are reduced to 1/2 of the original size by a stride-1 convolution layer followed by a stride-2 convolution layer, and 3 integrations are performed at this resolution. The image features are downsampled by 1/2 again (k = 0.25) by a stride-2 convolution, giving kW×kH×128, the edge features are reduced to 1/4 of the original size by two stride-2 convolution layers, and 3 more integrations are performed at this resolution. A minimal sketch of this block follows.
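A minimal sketch of the spatial feature transformation residual block, assuming PyTorch; hidden widths and class names are assumptions, and the edge features are assumed to have already been downsampled to the matching scale as described above:

```python
import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    """Predicts per-pixel gain (gamma) and bias (beta) from edge features,
    then applies feature * gamma + beta, as in step (1)."""
    def __init__(self, feat_ch: int, edge_ch: int = 128):
        super().__init__()
        self.gamma = nn.Sequential(  # 2 convs per parameter branch
            nn.Conv2d(edge_ch, feat_ch, 3, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )
        self.beta = nn.Sequential(
            nn.Conv2d(edge_ch, feat_ch, 3, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )

    def forward(self, feat: torch.Tensor, edge: torch.Tensor) -> torch.Tensor:
        return feat * self.gamma(edge) + self.beta(edge)

class SFTResBlock(nn.Module):
    """SFT layer, conv, SFT layer, conv, plus the residual skip."""
    def __init__(self, feat_ch: int, edge_ch: int = 128):
        super().__init__()
        self.sft1 = SFTLayer(feat_ch, edge_ch)
        self.conv1 = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)
        self.sft2 = SFTLayer(feat_ch, edge_ch)
        self.conv2 = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)

    def forward(self, feat: torch.Tensor, edge: torch.Tensor) -> torch.Tensor:
        out = self.conv1(self.sft1(feat, edge))
        out = self.conv2(self.sft2(out, edge))
        return feat + out  # residual skip completes one integration
```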
S2.3, further mine the blurred image features. Dilated convolutions with different dilation rates are combined to enlarge the receptive field and thereby mine more feature information, using 2 serial dilated-convolution residual blocks and 1 parallel dilated-convolution residual block. A standard residual module adds its input to the result of applying convolution, nonlinear activation, and convolution to that input; both the serial and parallel dilated-convolution residual blocks are variations on this standard residual. The 2 serial dilated-convolution residual blocks change the dilation rate of the 1st convolution from 1 to 2 and 3 respectively, with input and output of size kW×kH×128 (k = 0.25). In the parallel dilated-convolution block, shown in FIG. 2 (c), the input passes through four parallel dilated convolutions with dilation rates 1, 2, 3, 4, each with input and output kW×kH×128 (k = 0.25); the 4 results are then concatenated along the channel dimension, giving features of size kW×kH×512 (k = 0.25); a convolution with dilation rate 1 then reduces the channel count to 128, with input kW×kH×512 and output kW×kH×128 (k = 0.25); finally, the result of this operation is added to the block input. A sketch follows.
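A sketch of the parallel dilated-convolution residual block, assuming PyTorch; the 3×3 kernel matches the document's stated default, and padding equal to the dilation rate keeps the spatial size unchanged:

```python
import torch
import torch.nn as nn

class ParallelDilatedResBlock(nn.Module):
    """Four parallel branches with dilation rates 1, 2, 3, 4, channel
    concatenation, a dilation-1 conv back to 128 channels, residual skip."""
    def __init__(self, channels: int = 128):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in (1, 2, 3, 4)  # padding = dilation keeps kW x kH
        ])
        self.fuse = nn.Conv2d(4 * channels, channels, 3, padding=1)  # 512 -> 128

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.cat([b(x) for b in self.branches], dim=1)  # kW x kH x 512
        return x + self.fuse(out)  # residual skip
```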
S2.4, gradually reconstruct the deblurred image from the deep image features. First, at the lowest-resolution scale kW×kH×128 (k = 0.25), the image features are processed by 1 convolution followed by 3 residual modules. The image features are then upsampled to kW×kH×64 (k = 0.5) by a transposed convolution, followed by the same operation of 1 convolution plus 3 residual modules, except that the input of the convolution is the channel-wise concatenation of the current-scale image features and the encoder features of the corresponding scale. Likewise, the image features are next upsampled to kW×kH×32 (k = 1) by a transposed convolution, followed by 1 convolution plus 3 residual modules, the convolution input again being the concatenation of the current-scale features and the corresponding encoder features. A convolution then turns the kW×kH×32 (k = 1) image features into a W×H×3 color image representing the difference between the blurred and sharp images learned by the network; finally, this difference map is added to the input to obtain the final deblurred image. A condensed sketch follows.
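A condensed sketch of this decoder, assuming PyTorch; ResBlock here is a standard convolution-activation-convolution residual module as described in S2.3, and the transposed-convolution hyperparameters are assumptions chosen to double the spatial resolution:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

def up_stage(in_ch: int, out_ch: int) -> nn.Module:
    # transposed convolution that doubles the spatial resolution
    return nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.res0 = nn.Sequential(nn.Conv2d(128, 128, 3, padding=1),
                                  *[ResBlock(128) for _ in range(3)])
        self.up1 = up_stage(128, 64)
        # conv input is the channel-wise concat of current features and the
        # encoder skip at the same scale, hence 64 + 64 = 128 input channels
        self.res1 = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1),
                                  *[ResBlock(64) for _ in range(3)])
        self.up2 = up_stage(64, 32)
        self.res2 = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1),
                                  *[ResBlock(32) for _ in range(3)])
        self.out = nn.Conv2d(32, 3, 3, padding=1)  # learned difference map

    def forward(self, deep_feat, skip_half, skip_full, blurred):
        # deep_feat: k=0.25 features; skip_half / skip_full: encoder skips
        x = self.res0(deep_feat)
        x = self.res1(torch.cat([self.up1(x), skip_half], dim=1))
        x = self.res2(torch.cat([self.up2(x), skip_full], dim=1))
        return blurred + self.out(x)  # add difference map to the input
```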
S3, define the loss functions and train the network on the training sample set, comprising the following sub-steps:
S3.1, use the mean squared error loss function L_mse to guarantee the fidelity of the deblurred result and the perceptual loss function L_p to improve the quality of its details. The two loss functions are defined as follows:
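The formula images do not survive in this text rendering; a standard reconstruction consistent with the symbol definitions that follow (an assumption, not the patent's verbatim formulas) is

$$L_{mse} = \frac{1}{W \times H} \sum_{m=1}^{W} \sum_{n=1}^{H} \big( I_c(m,n) - I_d(m,n) \big)^2$$

$$L_p = \frac{1}{W_{i,j} \times H_{i,j}} \sum_{m=1}^{W_{i,j}} \sum_{n=1}^{H_{i,j}} \big( \phi_{i,j}(I_c)(m,n) - \phi_{i,j}(I_d)(m,n) \big)^2$$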
where I_c and I_d are the ground-truth sharp image and the deblurred image, respectively; m and n are the horizontal and vertical pixel indices; φ_{i,j} is the feature map of the jth convolution before the ith max-pooling layer of a VGG19 network loaded with weights pre-trained on ImageNet; W_{i,j} and H_{i,j} are the dimensions of that feature map; i = 3, j = 3 is typically set.
S3.2, define the total loss function:
L_total = L_mse + λ × L_p
where λ is the weight of the perceptual loss function, typically set to 0.01. The network is trained with the total loss function L_total until the entire network converges to optimal accuracy. A sketch of this loss follows.
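A sketch of this total loss, assuming PyTorch and torchvision; the VGG19 feature slice features[:15] (through relu3_3, the activation of the 3rd convolution before the 3rd max-pooling layer) and the class name are assumptions rather than the patent's code:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class TotalLoss(nn.Module):
    """L_total = L_mse + lambda * L_p, with a frozen VGG19 as phi_{3,3}."""
    def __init__(self, lam: float = 0.01):
        super().__init__()
        self.lam = lam
        self.mse = nn.MSELoss()
        vgg = vgg19(pretrained=True).features[:15].eval()  # up to relu3_3
        for p in vgg.parameters():
            p.requires_grad_(False)  # the perceptual network stays frozen
        self.phi = vgg

    def forward(self, deblurred: torch.Tensor, sharp: torch.Tensor) -> torch.Tensor:
        # ImageNet normalization of the VGG inputs is omitted for brevity
        l_mse = self.mse(deblurred, sharp)
        l_p = self.mse(self.phi(deblurred), self.phi(sharp))
        return l_mse + self.lam * l_p
```

In training, TotalLoss would be evaluated on each deblurred output against its ground-truth sharp image and backpropagated through the backbone only, since the VGG19 branch is frozen.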
S4, input the motion-blurred image into the network trained in S3 to obtain a more thoroughly deblurred result.
In this embodiment, deblurring is performed on part of the experimental data; the results are shown in FIGS. 3 (a), 3 (b), 4 (a), 4 (b), 4 (c) and 4 (d). It can be seen that the embodiment deblurs images with different degrees of blur stably and accurately.
It should be understood that parts of the specification not described in detail herein belong to the prior art.
While particular embodiments of the present invention have been described above with reference to the accompanying drawings, it will be understood by those skilled in the art that these are by way of example only, and that various changes and modifications may be made to these embodiments without departing from the principles and spirit of the invention. The scope of the invention is limited only by the appended claims.

Claims (1)

1. An edge-based deep learning image motion blur removal method, characterized by comprising the following steps:
Step 1, extract edges from the blurred image with a trained HED network, then use convolution layers to extract the edge feature information that guides the motion deblurring process;
Step 2, the deblurring backbone network extracts multi-scale feature information from the blurred image, spatial feature transformation layers integrate the image features and the edge features at each scale, and the decoding part gradually recovers the latent sharp image from the deepest image features;
Step 3, take blurred-sharp image pairs as the training sample set, define the total loss function as the sum of a mean squared error loss and a perceptual loss, and train the deblurring backbone network with this total loss until it converges to optimal accuracy;
Step 4, input the motion-blurred image into the backbone network trained in step 3 to obtain the deblurred result;
the obtaining of the edge characteristic information in the step 1 comprises the following substeps:
step 1.1, acquiring a blurred image edge map; inputting a color blurred image with the size W multiplied by H multiplied by 3 into an HED network loaded with pre-training weights to obtain an edge map with the size W multiplied by H multiplied by 1, wherein W is the width of an original map, and H is the height of the original map;
step 1.2, excavating deep characteristic information of an edge map; taking the edge map output in the step 1.1 as input, extracting high-level edge characteristic information from the fuzzy edge through a series of convolution and nonlinear activation operations: the convolution kernel size of the first convolution is 1 multiplied by 1, and the convolution kernel size of the subsequent four convolutions is 3 multiplied by 3, so that the spatial resolution of the image is kept unchanged in the whole process; the nonlinear activation adopts a hole correction linear unit, and finally outputs high-dimensional edge characteristic information with the size of W multiplied by H multiplied by 128;
the implementation of step 2 comprises the following sub-steps:
step 2.1, extracting the characteristics of the blurred image; the method comprises the steps of inputting a color blurred image with the size W multiplied by H multiplied by 3 into a convolution layer formed by convolution and nonlinear activation, wherein the encoding stage can be divided into 4 processing blocks, and the characteristic diagram of each block is respectively kW multiplied by kH multiplied by l, wherein W is the width of an original diagram, and H is the height of the original diagram; k=1, 0.5,0.25, l=32/k;
step 2.2, integrating the fuzzy image features and the edge information in different scales; integrating the edge characteristics output in the step 1.2 and the image characteristics of the current scale obtained in the step 2.1 by adopting a spatial characteristic transformation residual block, wherein the spatial characteristic transformation residual block comprises a spatial characteristic transformation layer, a convolution layer, a spatial characteristic transformation layer and a convolution layer;
step 2.3, further mining the fuzzy image characteristics; the method comprises the steps of combining hole convolutions with different hole rates, and increasing receptive fields so as to further mine characteristic information, wherein the characteristic information comprises 2 serial-hole convolutions residual blocks and 1 parallel-hole convolutions residual blocks;
step 2.4, gradually reconstructing an image with blur removed from deep image features;
the implementation of step 3 comprises the following sub-steps:
Step 3.1, define the mean squared error loss function L_mse and the perceptual loss function L_p respectively, where I_c and I_d are the ground-truth sharp image and the deblurred image, respectively; m and n are the horizontal and vertical pixel indices; φ_{i,j} is the feature map of the jth convolution before the ith max-pooling layer of a VGG19 network loaded with weights pre-trained on ImageNet; W_{i,j} and H_{i,j} are the dimensions of that feature map; i = 3, j = 3 is typically set;
Step 3.2, define the total loss function:
L_total = L_mse + λ × L_p
where λ is the weight of the perceptual loss function, set to 0.01;
the network is trained with the total loss function L_total until the entire network converges to optimal accuracy.
CN201911275632.8A 2019-12-12 2019-12-12 Edge-based deep learning image motion blur removing method Active CN111028177B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911275632.8A CN111028177B (en) 2019-12-12 2019-12-12 Edge-based deep learning image motion blur removing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911275632.8A CN111028177B (en) 2019-12-12 2019-12-12 Edge-based deep learning image motion blur removing method

Publications (2)

Publication Number Publication Date
CN111028177A CN111028177A (en) 2020-04-17
CN111028177B true CN111028177B (en) 2023-07-21

Family

ID=70206421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911275632.8A Active CN111028177B (en) 2019-12-12 2019-12-12 Edge-based deep learning image motion blur removing method

Country Status (1)

Country Link
CN (1) CN111028177B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111583143A (en) * 2020-04-30 2020-08-25 广州大学 Complex image deblurring method
CN111695421B (en) * 2020-04-30 2023-09-22 北京迈格威科技有限公司 Image recognition method and device and electronic equipment
CN111986102B (en) * 2020-07-15 2024-02-27 万达信息股份有限公司 Digital pathological image deblurring method
CN111815536B (en) * 2020-07-15 2022-10-14 电子科技大学 Motion blur restoration method based on contour enhancement strategy
CN112488946B (en) * 2020-12-03 2024-04-09 重庆邮电大学 Single-scale motion blurred image frame restoration method for cab environment
CN112634153B (en) * 2020-12-17 2023-10-20 中山大学 Image deblurring method based on edge enhancement
CN112465730A (en) * 2020-12-18 2021-03-09 辽宁石油化工大学 Motion video deblurring method
CN112767277B (en) * 2021-01-27 2022-06-07 同济大学 Depth feature sequencing deblurring method based on reference image
CN112991194B (en) * 2021-01-29 2022-06-24 电子科技大学 Infrared thermal wave image deblurring method based on depth residual error network
CN113205464B (en) * 2021-04-30 2023-05-05 作业帮教育科技(北京)有限公司 Image deblurring model generation method, image deblurring method and electronic equipment
CN113191984B (en) * 2021-05-24 2023-04-18 清华大学深圳国际研究生院 Deep learning-based motion blurred image joint restoration and classification method and system
CN114187191B (en) * 2021-11-20 2024-02-27 西北工业大学 Image deblurring method based on high-frequency-low-frequency information fusion
CN114359082B (en) * 2021-12-24 2023-01-06 复旦大学 Gastroscope image deblurring algorithm based on self-built data pair
CN114549361B (en) * 2022-02-28 2023-06-30 齐齐哈尔大学 Image motion blur removing method based on improved U-Net model
CN114998156B (en) * 2022-06-30 2023-06-20 同济大学 Image motion deblurring method based on multi-patch multi-scale network
CN117593591B (en) * 2024-01-16 2024-05-31 天津医科大学总医院 Tongue picture classification method based on medical image segmentation


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9076205B2 (en) * 2012-11-19 2015-07-07 Adobe Systems Incorporated Edge direction and curve based image de-blurring
US9430817B2 (en) * 2013-11-12 2016-08-30 Microsoft Technology Licensing, Llc Blind image deblurring with cascade architecture
US10360494B2 (en) * 2016-11-30 2019-07-23 Altumview Systems Inc. Convolutional neural network (CNN) system based on resolution-limited small-scale CNN modules
US11593632B2 (en) * 2016-12-15 2023-02-28 WaveOne Inc. Deep learning based on image encoding and decoding

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198319A (en) * 2013-04-11 2013-07-10 武汉大学 Method of extraction of corner of blurred image in mine shaft environment
CN103761710A (en) * 2014-01-08 2014-04-30 西安电子科技大学 Image blind deblurring method based on edge self-adaption
WO2018045602A1 (en) * 2016-09-07 2018-03-15 华中科技大学 Blur kernel size estimation method and system based on deep learning
CN107871310A (en) * 2017-10-26 2018-04-03 武汉大学 A kind of single image for being become more meticulous based on fuzzy core is blind to go motion blur method
CN109035149A (en) * 2018-03-13 2018-12-18 杭州电子科技大学 A kind of license plate image based on deep learning goes motion blur method
CN109087256A (en) * 2018-07-19 2018-12-25 北京飞搜科技有限公司 A kind of image deblurring method and system based on deep learning
CN110033415A (en) * 2019-03-20 2019-07-19 东南大学 A kind of image deblurring method based on Retinex algorithm
CN110060215A (en) * 2019-04-16 2019-07-26 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium
CN110378844A (en) * 2019-06-14 2019-10-25 杭州电子科技大学 Motion blur method is gone based on the multiple dimensioned Image Blind for generating confrontation network is recycled

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Motion Blur Kernel Estimation via Deep Learning; Xiangyu Xu et al.; IEEE Transactions on Image Processing; entire document *
Image deblurring based on fast convolutional neural networks (基于快速卷积神经网络的图像去模糊); Ren Jingjing, Fang Xianyong, Chen Shangwen, Wang Linbo, Zhou Jian; Journal of Computer-Aided Design & Computer Graphics, No. 08; entire document *
Image denoising combining deep residual learning and perceptual loss (结合深度残差学习和感知损失的图像去噪); Wu Congzhong, Chen Xi, Ji Dong, Zhan Shu; Journal of Image and Graphics, No. 10; entire document *

Also Published As

Publication number Publication date
CN111028177A (en) 2020-04-17

Similar Documents

Publication Publication Date Title
CN111028177B (en) Edge-based deep learning image motion blur removing method
Yue et al. Supervised raw video denoising with a benchmark dataset on dynamic scenes
CN111915530B (en) End-to-end-based haze concentration self-adaptive neural network image defogging method
Li et al. Learning a deep dual attention network for video super-resolution
CN111709895A (en) Image blind deblurring method and system based on attention mechanism
US20240062530A1 (en) Deep perceptual image enhancement
CN111091503A (en) Image out-of-focus blur removing method based on deep learning
CN112241939B (en) Multi-scale and non-local-based light rain removal method
Li et al. A maximum a posteriori estimation framework for robust high dynamic range video synthesis
CN116681584A (en) Multistage diffusion image super-resolution algorithm
CN114463218A (en) Event data driven video deblurring method
Zhang et al. Deep motion blur removal using noisy/blurry image pairs
Zhao et al. Deep pyramid generative adversarial network with local and nonlocal similarity features for natural motion image deblurring
CN117274059A (en) Low-resolution image reconstruction method and system based on image coding-decoding
Huang et al. Deep Gaussian scale mixture prior for image reconstruction
Chen et al. Attention-based broad self-guided network for low-light image enhancement
Guo et al. Image blind deblurring using an adaptive patch prior
CN116208812A (en) Video frame inserting method and system based on stereo event and intensity camera
Han et al. MPDNet: An underwater image deblurring framework with stepwise feature refinement module
CN110648291B (en) Unmanned aerial vehicle motion blurred image restoration method based on deep learning
CN113902647A (en) Image deblurring method based on double closed-loop network
CN114511487A (en) Image fusion method and device, computer readable storage medium and terminal
CN115393491A (en) Ink video generation method and device based on instance segmentation and reference frame
Zhao et al. Single image super-resolution via blind blurring estimation and anchored space mapping
CN108665412B (en) Method for performing multi-frame image super-resolution reconstruction by using natural image priori knowledge

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant