CN114677304A

CN114677304A - Image deblurring algorithm based on knowledge distillation and deep neural network

Info

Publication number: CN114677304A
Application number: CN202210313655.9A
Authority: CN
Inventors: 李春国; 李武斌; 刘周勇; 杨绿溪
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2022-03-28
Filing date: 2022-03-28
Publication date: 2022-06-28

Abstract

The invention discloses an image deblurring algorithm based on knowledge distillation and a deep neural network, which converts blurred images into sharp images with high quality. The algorithm comprises the following steps: constructing a training set, a validation set and a test set from a public image data set; constructing an autoencoding deep neural network based on a Unet structure, namely a teacher model; performing supervised training on the teacher model with the sharp (high-definition) images of the data set until convergence; constructing a deep neural network model based on a Unet structure and deformable convolution, namely a student model; performing supervised training on the deblurring student model with the complete blurred image data set, using knowledge distillation from the converged teacher network, to obtain a converged deep deblurring model; and evaluating the deblurring performance of the student network model on the blurred image test set. The invention achieves better deblurring performance, obtaining higher structural similarity together with a higher peak signal-to-noise ratio.

Description

Image deblurring algorithm based on knowledge distillation and deep neural network

Technical Field

The invention belongs to the technical field of image deblurring in computer vision, and particularly relates to an image deblurring algorithm based on knowledge distillation and a deep neural network.

Background

Digital images are an important information carrier and are widely used in fields such as digital audio and video, remote sensing and traffic management. However, digital images are easily degraded during acquisition, transmission and storage, which lowers their quality, and blur is one of the main forms of degradation. Research on deblurring digital images is therefore of considerable significance and value.

Traditional image deblurring algorithms mainly model the blurring process as a linear convolution, and solve for the sharp image by minimizing an energy function derived from a maximum a posteriori (MAP) model or a maximum likelihood estimation (MLE) model. However, traditional deblurring algorithms are restricted to a limited range of scenes with spatially uniform blur, the image priors they use are not robust, and the non-convex, non-linear constraint functions introduced into the optimization model make the optimization computationally expensive and the restoration slow, so their practical value is limited.

With the great success of deep neural networks in computer vision, many studies have applied them to image deblurring. Because most real scenes exhibit complex, spatially non-uniform blur for which a suitable degradation model is difficult to establish, researchers use the strong fitting capability of deep neural networks to learn a non-linear mapping from blurred images to sharp images directly. This approach is often applied to complex spatially non-uniform deblurring problems (such as dynamic scene blur) and achieves good restoration results. Current deep-neural-network-based image deblurring algorithms mainly include multi-scale convolutional neural networks, multi-patch hierarchical networks, GAN-based deep neural networks and image-prior-based deep neural networks. However, existing deep-neural-network-based deblurring algorithms have a large number of parameters, simply deepening the network makes it harder to train, and the training process lacks supervision and constraints on intermediate features, all of which limit further improvement of deblurring performance. Moreover, their restoration quality remains limited in the presence of complex and variable spatially non-uniform blur.

Disclosure of Invention

The invention aims to provide an image deblurring algorithm based on knowledge distillation and a deep neural network, in order to solve two technical problems: existing traditional image deblurring algorithms have little practical application value, and existing deep-neural-network-based image deblurring algorithms offer only limited improvement in deblurring performance.

In order to solve the technical problems, the specific technical scheme of the invention is as follows:

an image deblurring algorithm based on knowledge distillation and a deep neural network comprises the following steps:

step S1: constructing an end-to-end deep neural network model based on a Unet structure, namely a teacher network model;

step S2: acquiring a blurred image data set, and performing supervised training on the teacher network model using the sharp images in the blurred image data set to obtain a converged teacher network model, wherein the converged teacher network model takes a sharp image as input, outputs a plurality of intermediate features through a forward pass, and produces a restored sharp image;

step S3: constructing an end-to-end image deblurring model based on a Unet structure and deformable convolution, namely a student network model;

step S4: based on the blurred image data set of step S2, using the teacher network model converged in step S2 to supervise the intermediate feature outputs of the student network model during training, so as to obtain a converged student network model, wherein the converged student network model takes a blurred image as input and outputs a reconstructed deblurred image through a forward pass.

Further, the step S2 specifically includes the following steps:

step S201: acquiring the public blurred image data set GOPRO, and dividing it into a training set, a validation set and a test set comprising 2000 training image pairs, 103 validation image pairs and 1111 test image pairs, respectively; training the teacher network model using only the real sharp images of the training set;

step S202: in the data preprocessing stage, randomly cropping image patches with a resolution of 256 × 256 from the original real sharp images, applying random horizontal and vertical flipping to the patches as data augmentation, and then normalizing the patches and inputting them into the teacher network model for training;

step S203: extracting features from the input sharp image with the encoder network, reconstructing the features with the decoder network, and finally fusing the features to obtain a reconstructed sharp image;

step S204: performing supervised training on the teacher network using an L1 loss function;

step S205: performing a forward pass of the trained teacher network model on an input sharp image to obtain a plurality of decoded feature maps of the sharp image at different scales and, finally, a restored sharp image, wherein the intermediate decoded feature maps are used for the subsequent supervised training of the student network model.
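The preprocessing of step S202 can be sketched as follows. This is a minimal NumPy illustration rather than the implementation of the invention: the function name `random_crop_flip_normalize`, the flip probability of 0.5 and the normalization by division by 255 are assumptions, since the text does not specify them.

```python
import numpy as np

def random_crop_flip_normalize(image, patch=256, rng=None):
    """Randomly crop a patch, randomly flip it, and scale it to [0, 1].

    image: uint8 array of shape (H, W, C) with H, W >= patch.
    The flip probability (0.5) and the /255 scaling are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - patch + 1))
    left = int(rng.integers(0, w - patch + 1))
    out = image[top:top + patch, left:left + patch]
    if rng.random() < 0.5:   # random horizontal flip
        out = out[:, ::-1]
    if rng.random() < 0.5:   # random vertical flip
        out = out[::-1, :]
    return out.astype(np.float32) / 255.0
```

Passing a seeded `rng` makes the crop and flips reproducible, which is convenient when checking a data pipeline.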

Further, the step S4 specifically includes the following steps:

step S401: using the data set GOPRO as divided in step S2 into a training set, a validation set and a test set, wherein the training set includes 2000 pairs of blurred and real sharp images, the validation set includes 103 such pairs, and the test set includes 1111 such pairs;

step S402: in the data preprocessing stage, randomly cropping 256 × 256 image pairs from the original image pairs, applying random horizontal and vertical flipping to the image pairs as data augmentation, and finally inputting the normalized image pairs into the student network model while inputting the real sharp image of each pair into the converged teacher network model;

step S403: extracting features from the input blurred image with the encoder network, reconstructing the features with the decoder network, and finally performing feature fusion to output a reconstructed deblurred image; meanwhile, the teacher network model performs a forward pass on the 256 × 256 real sharp images to obtain a plurality of intermediate feature maps of the sharp images, which are used to supervise the decoded feature outputs of the student network during training;

step S404: performing supervised training on the output image of the student network using an L1 loss function and a frequency-domain L1 loss function;

step S405: performing a forward pass of the optimized, usable student network model on the blurred images of the test set to obtain the restored deblurred images.
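For the paired data of step S402, the blurred image and its sharp counterpart have to receive the same crop window and the same flips so that they stay pixel-aligned. The sketch below illustrates this under stated assumptions: the helper name `paired_crop_flip`, the flip probability of 0.5 and the division by 255 are hypothetical choices not given in the text.

```python
import numpy as np

def paired_crop_flip(blurred, sharp, patch=256, rng=None):
    """Apply one shared random crop and flip to a blurred/sharp pair.

    Both inputs are uint8 arrays of shape (H, W, C) with identical H and W.
    Sharing the random draws keeps the two patches pixel-aligned.
    """
    assert blurred.shape == sharp.shape
    rng = np.random.default_rng() if rng is None else rng
    h, w = blurred.shape[:2]
    top = int(rng.integers(0, h - patch + 1))
    left = int(rng.integers(0, w - patch + 1))
    flip_h = rng.random() < 0.5
    flip_v = rng.random() < 0.5

    def transform(img):
        out = img[top:top + patch, left:left + patch]
        if flip_h:
            out = out[:, ::-1]
        if flip_v:
            out = out[::-1, :]
        return out.astype(np.float32) / 255.0

    # The blurred patch feeds the student; the sharp patch is both the
    # student's target and the input of the converged teacher.
    return transform(blurred), transform(sharp)
```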

Further, step S204 specifically includes the following steps:

during training, an Adam optimizer is used and the learning rate is decayed continuously by cosine annealing; training stops when the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) on the validation set no longer increase, yielding an optimized, usable teacher network model.
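Cosine annealing can be illustrated with its usual closed form, lr(t) = lr_min + 0.5 (lr_max - lr_min)(1 + cos(pi t / T)). The sketch below is only an illustration: the function name and the values `lr_max=2e-4` and `lr_min=1e-6` are assumptions, as the text gives no learning-rate values.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max=2e-4, lr_min=1e-6):
    """Learning rate at `step` under cosine annealing from lr_max to lr_min.

    lr_max and lr_min are illustrative defaults, not values from the text.
    """
    step = min(max(step, 0), total_steps)
    cos_term = 1.0 + math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term
```

The rate starts at `lr_max`, falls slowly at first, fastest around the midpoint, and flattens out as it approaches `lr_min` at the final step.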

Further, step S404 specifically includes the following steps: the overall loss function $L_{total}$ of the student network is:

$L_{total} = L_1 + L_{fft1} + L_{feats}$

where $L_1$ denotes the reconstruction loss, $L_{fft1}$ denotes the frequency-domain reconstruction loss, and $L_{feats}$ denotes the intermediate feature reconstruction loss; during training, an Adam optimizer is used and the learning rate is decayed continuously by cosine annealing; the student network stops training when the peak signal-to-noise ratio and the structural similarity on the validation set no longer increase, yielding an optimized, usable image deblurring network model.
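The three-term loss can be sketched in NumPy as shown below. The text does not define the individual terms, so several choices here are assumptions: the frequency-domain term is taken as the mean absolute difference of the real and imaginary parts of the 2-D Fourier transforms, the feature term as a sum of L1 distances between student and teacher feature maps of matching shape, and all function names are hypothetical.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between two arrays of the same shape."""
    return float(np.mean(np.abs(pred - target)))

def fft_l1_loss(pred, target):
    """L1 distance between 2-D Fourier spectra (over the last two axes)."""
    diff = np.fft.fft2(pred) - np.fft.fft2(target)
    return float(np.mean(np.abs(diff.real) + np.abs(diff.imag)))

def total_loss(pred, target, student_feats, teacher_feats):
    """L_total = L_1 + L_fft1 + L_feats with unit weights, as in the formula.

    student_feats / teacher_feats: lists of same-shaped feature maps,
    one entry per decoder scale (an assumed pairing).
    """
    l_feats = sum(l1_loss(s, t) for s, t in zip(student_feats, teacher_feats))
    return l1_loss(pred, target) + fft_l1_loss(pred, target) + l_feats
```

In this sketch the teacher feature maps act as fixed targets; in an actual training framework they would be computed without gradient tracking so that only the student is updated.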

Further, in step S1, the teacher network is composed of an encoder network and a decoder network. The encoder network is composed of a plurality of residual modules with a channel attention mechanism (ResCAB) and downsampling layers; each time the feature map is downsampled, the feature resolution decreases while the network deepens and the number of extracted feature channels increases. The decoder network is composed of a plurality of ResCAB modules and upsampling layers; each time the feature map is upsampled, the feature resolution increases and the number of feature channels decreases. In addition, skip connections are added between encoder and decoder feature layers of the same resolution, which reduces the learning and training difficulty of the encoder-decoder network. Finally, a ResCAB module fuses and reconstructs the features and outputs the reconstructed image.

Further, the student network model in step S3 is composed of an encoder network, a decoder network and a feature fusion network; the encoder network consists of a plurality of residual modules with channel attention (ResCAB) and downsampling layers; the decoder network consists of improved residual modules, deformable convolution modules and the like, and skip connections link the layers of the encoder network and the decoder network at the different scales; finally, the feature fusion network performs cross-scale fusion of the shallow features and the decoded features at different scales.

Further, the method comprises the step of evaluating the deblurring performance of the student network model on the blurred image test set.

Further, the peak signal-to-noise ratio and the structural similarity are calculated from the deblurred images produced by the forward pass of the converged student network and the real sharp images in the blurred image data set, and the image deblurring performance of the algorithm is evaluated together with the number of network parameters and the forward-pass time.

The image deblurring algorithm based on knowledge distillation and a deep neural network has the following advantages. Building on knowledge distillation and a deep neural network architecture, the method adds a teacher network model to ordinary supervised training, which strengthens the supervision and constraints on intermediate features during training and yields higher-quality deblurred images. Compared with current mainstream image deblurring algorithms, the method improves the deblurring performance of the network model while keeping the number of network parameters small and the forward-pass time short, obtains a higher peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), and restores sharper images. For the field of deep learning, the invention shows that knowledge distillation can improve the performance of a deep neural network in a specific domain.

Drawings

FIG. 1 is a flow chart of the image deblurring algorithm based on knowledge distillation and a deep neural network of the present invention;

FIG. 2 is a diagram of a teacher network model architecture of the present invention;

FIG. 3 is a flow chart of teacher network model training and prediction of the present invention;

FIG. 4 is an overall block diagram of a teacher network and student network model of the present invention;

FIG. 5 is a flow chart of student network model training and prediction of the present invention;

FIG. 6 is a comparison of the deblurring results of the method of the present invention with those of current mainstream image deblurring algorithms.

Detailed Description

In order to better understand the purpose, structure and function of the present invention, an image deblurring algorithm based on knowledge distillation and deep neural network is described in further detail below with reference to the attached drawings.

Example 1

Referring to FIGS. 1-6, this embodiment provides an image deblurring algorithm based on knowledge distillation and a deep neural network.

As shown in fig. 1, the algorithm specifically includes:

The specific structure of the teacher network model is shown in FIG. 2. The teacher network mainly comprises an encoder network and a decoder network. The encoder network mainly consists of a plurality of residual modules with a channel attention mechanism (ResCAB) and downsampling layers; each time the feature map is downsampled, the feature resolution decreases while the network deepens and the number of extracted feature channels increases. The decoder network consists of a plurality of ResCAB modules and upsampling layers; each time the feature map is upsampled, the feature resolution increases and the number of feature channels decreases. In addition, skip connections are added between encoder and decoder feature layers of the same resolution, which reduces the learning and training difficulty of the network. Finally, a ResCAB module fuses and reconstructs the features and outputs the reconstructed image.
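The text names the ResCAB module but does not give its internal layout. As a rough illustration only, the sketch below follows the widely used squeeze-and-excitation form of channel attention: global average pooling, a two-layer bottleneck, and a sigmoid gate that rescales each channel, wrapped in a residual connection. The function names, the weight shapes and the `body` callable standing in for the module's convolution layers are all assumptions.

```python
import numpy as np

def channel_attention(feat, w_down, w_up):
    """Rescale each channel of `feat` (C, H, W) by a gate in (0, 1).

    w_down: (C // r, C) and w_up: (C, C // r) play the role of two 1x1
    convolutions, which act as dense layers on the pooled vector.
    """
    pooled = feat.mean(axis=(1, 2))                 # global average pool -> (C,)
    hidden = np.maximum(w_down @ pooled, 0.0)       # ReLU bottleneck -> (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w_up @ hidden)))   # sigmoid gate -> (C,)
    return feat * gate[:, None, None]

def res_cab(feat, body, w_down, w_up):
    """Residual block: input plus the channel-attended output of `body`.

    `body` is a hypothetical callable standing in for convolution layers.
    """
    return feat + channel_attention(body(feat), w_down, w_up)
```

The residual path lets the block fall back to an identity-like mapping, which is one reason such modules are easier to train when stacked deeply.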

the teacher network model training and predicting process is shown in fig. 3, and includes:

Step S201: acquiring the public blurred image data set GOPRO, and dividing it into a training set, a validation set and a test set comprising 2000 training image pairs, 103 validation image pairs and 1111 test image pairs, respectively; the teacher network model is trained using only the real sharp images of the training set.

Step S202: in the data preprocessing stage, image patches with a resolution of 256 × 256 are randomly cropped from the original real sharp images, data augmentation such as random horizontal and vertical flipping is applied to the patches, and the patches are then normalized and input into the teacher network model for training.

Step S203: features of the input sharp image are extracted by the encoder network and reconstructed by the decoder network, and feature fusion is finally performed to obtain a reconstructed sharp image.

Step S204: the teacher network is trained under supervision with an L1 loss function. During training, an Adam optimizer is used and the learning rate is decayed continuously by cosine annealing; training stops when the PSNR and SSIM on the validation set no longer increase, yielding an optimized, usable teacher network model.
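The stopping rule of step S204 (stop once the validation PSNR and SSIM no longer increase) can be expressed as a simple patience check. The sketch below is one possible reading: the function name `should_stop` and the `patience` and `min_delta` parameters are assumptions, since the text does not say how many evaluations without improvement trigger the stop.

```python
def should_stop(history, patience=10, min_delta=0.0):
    """Return True when neither PSNR nor SSIM improved in the last
    `patience` validation runs.

    history: list of (psnr, ssim) tuples, one per validation run,
    oldest first. `patience` and `min_delta` are assumed parameters.
    """
    if len(history) <= patience:
        return False
    earlier, recent = history[:-patience], history[-patience:]
    best_psnr = max(p for p, _ in earlier)
    best_ssim = max(s for _, s in earlier)
    psnr_up = any(p > best_psnr + min_delta for p, _ in recent)
    ssim_up = any(s > best_ssim + min_delta for _, s in recent)
    return not (psnr_up or ssim_up)
```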

Step S3: constructing an end-to-end image deblurring model based on a Unet structure and deformable convolution, namely a student neural network model;

The specific structure of the student network model is shown in FIG. 4. The student network model mainly comprises an encoder network, a decoder network and a feature fusion network. The encoder network consists of a plurality of residual modules with channel attention (ResCAB) and downsampling layers; the decoder network consists of improved residual modules, deformable convolution modules and the like, and skip connections link the layers of the encoder network and the decoder network at the different scales. Finally, the feature fusion network performs cross-scale fusion of the shallow features and the decoded features at different scales.

The training and predicting process of the student network model is shown in fig. 5, and includes:

Step S401: the data set GOPRO as divided in step S2 is used as the training set, validation set and test set, wherein the training set includes 2000 pairs of blurred and real sharp images, the validation set includes 103 such pairs, and the test set includes 1111 such pairs.

Step S402: in the data preprocessing stage, image pairs with a resolution of 256 × 256 are randomly cropped from the original image pairs and random horizontal and vertical flipping is applied as data augmentation; finally, the normalized image pairs are input into the student network model, while the real sharp image of each pair is input into the converged teacher network model.

Step S403: features of the input blurred image are extracted by the encoder network and reconstructed by the decoder network, and feature fusion is finally performed to output a reconstructed deblurred image; meanwhile, the teacher network model performs a forward pass on the 256 × 256 real sharp images to obtain a plurality of intermediate feature maps of the sharp images, which are used to supervise the decoded feature outputs of the student network during training.

Step S404: the output image of the student network is trained under supervision with an L1 loss function and a frequency-domain L1 loss function. The overall loss function $L_{total}$ of the student network is:

$L_{total} = L_1 + L_{fft1} + L_{feats}$ (1)

where $L_1$ denotes the reconstruction loss, $L_{fft1}$ denotes the frequency-domain reconstruction loss, and $L_{feats}$ denotes the intermediate feature reconstruction loss. During training, an Adam optimizer is used and the learning rate is decayed continuously by cosine annealing; the student network stops training when the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) on the validation set essentially stop increasing, yielding an optimized, usable image deblurring network model.

Step S405: a forward pass of the optimized student network model, with its parameters fixed, is performed on the blurred images of the test set to obtain the restored deblurred images.

It should be noted that the teacher network model and the student network model provided in this embodiment are not limited to the PyTorch framework; they only need to be trained on the data set GOPRO until they converge after a sufficient number of training iterations, so that deblurred images can finally be reconstructed.

Finally, the deblurring performance of the student network model is evaluated on the blurred image test set. More specifically, the peak signal-to-noise ratio and the structural similarity are calculated from the deblurred images produced by the forward pass of the converged student network and the real sharp images in the blurred image data set, and the image deblurring performance of the algorithm is evaluated together with the number of network parameters and the forward-pass time. The larger the peak signal-to-noise ratio and the structural similarity, the better the quality of the restored image.
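For reference, the peak signal-to-noise ratio follows the standard definition PSNR = 10 log10(MAX^2 / MSE), where MSE is the mean squared error between the restored and reference images. The sketch below assumes images scaled to [0, 1] (so `max_value=1.0`); the function name is illustrative. SSIM is omitted here because its windowed computation is considerably longer.

```python
import numpy as np

def psnr(restored, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    max_value is the largest possible pixel value (1.0 assumes images
    normalized to [0, 1]; use 255 for 8-bit images).
    """
    restored = np.asarray(restored, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    mse = np.mean((restored - reference) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

A smaller pixel-wise error gives a larger PSNR, which is why higher values indicate a restored image closer to the real sharp image.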

The following table compares the deblurring performance of the method of the present invention with that of current mainstream image deblurring algorithms. According to the test results in the table, the method obtains the highest PSNR and SSIM values on the GOPRO test set, which indicates that it achieves the best final deblurring effect. Its PSNR is 0.56 dB higher than that of HINet, the best-performing of the other algorithms in the table, which shows a substantial improvement in deblurring performance.

Compared with the MPRNet and HINet algorithms, which achieve relatively good deblurring results, the method has fewer network parameters and a lighter network. Its forward-pass running time also remains short.

FIG. 6 shows an example comparison of the deblurred images produced by the method of the present invention and by current mainstream image deblurring algorithms. The first row shows, from left to right, the original blurred image and the images deblurred by DeepDeblur, DMPHN and MPRNet; the second row shows, from left to right, the images deblurred by MIMO-Unet, HINet and the method of the invention, followed by the real sharp image. In terms of both overall image quality and the recovery of local image details, the comparison shows visually that the method deblurs better than the other algorithms.

Details of the present invention that are not described herein are well known to those skilled in the art.

It is to be understood that the present invention has been described with reference to certain embodiments, and that various changes in the features and embodiments, or equivalent substitutions may be made therein by those skilled in the art without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims

1. An image deblurring algorithm based on knowledge distillation and a deep neural network is characterized by comprising the following steps:

2. The knowledge distillation and deep neural network based image deblurring algorithm of claim 1, wherein the step S2 specifically comprises the steps of:

step S201: acquiring the public blurred image data set GOPRO, and dividing it into a training set, a validation set and a test set comprising 2000 training image pairs, 103 validation image pairs and 1111 test image pairs, respectively; training the teacher network model using only the real sharp images of the training set;

3. The knowledge distillation and deep neural network based image deblurring algorithm of claim 2, wherein the step S4 specifically comprises the steps of:

step S401: using the data set GOPRO as divided in step S2 into a training set, a validation set and a test set, wherein the training set includes 2000 pairs of blurred and real sharp images, the validation set includes 103 such pairs, and the test set includes 1111 such pairs;

4. The knowledge distillation and deep neural network based image deblurring algorithm of claim 3, wherein the step S204 specifically comprises the following steps:

during training, an Adam optimizer is used and the learning rate is decayed continuously by cosine annealing; training stops when the peak signal-to-noise ratio and the structural similarity on the validation set no longer increase, yielding an optimized, usable teacher network model.

5. The image deblurring algorithm based on knowledge distillation and deep neural network of claim 4, wherein the step S404 comprises the following steps: the overall loss function $L_{total}$ of the student network is:

$L_{total} = L_1 + L_{fft1} + L_{feats}$

6. The knowledge distillation and deep neural network based image deblurring algorithm of claim 1, wherein in step S1, the teacher network is composed of an encoder network and a decoder network, wherein the encoder network is composed of a plurality of residual modules with a channel attention mechanism (ResCAB) and downsampling layers, and each time the feature map is downsampled, the feature resolution decreases while the network deepens and the number of extracted feature channels increases; the decoder network is composed of a plurality of ResCAB modules and upsampling layers, and each time the feature map is upsampled, the feature resolution increases and the number of feature channels decreases; in addition, skip connections are added between encoder and decoder feature layers of the same resolution, which reduces the learning and training difficulty of the encoder-decoder network; and finally, a ResCAB module fuses and reconstructs the features and outputs the reconstructed image.

7. The knowledge distillation and deep neural network based image deblurring algorithm of claim 1, wherein the student network model in step S3 is composed of an encoder network, a decoder network and a feature fusion network; the encoder network consists of a plurality of residual modules with channel attention (ResCAB) and downsampling layers; the decoder network consists of improved residual modules, deformable convolution modules and the like, and skip connections link the layers of the encoder network and the decoder network at the different scales; and finally, the feature fusion network performs cross-scale fusion of the shallow features and the decoded features at different scales.

8. The knowledge distillation and deep neural network based image deblurring algorithm of claim 1, further comprising evaluating deblurring performance of the student network model on a test set of blurred images.

9. The knowledge distillation and deep neural network based image deblurring algorithm of claim 8, wherein the peak signal-to-noise ratio and the structural similarity are calculated from the deblurred images produced by the forward pass of the converged student network and the real sharp images in the blurred image data set, and the image deblurring performance of the algorithm is evaluated together with the number of network parameters and the forward-pass time.