CN112508792A

CN112508792A - Single-image super-resolution method and system of deep neural network integration model based on online knowledge migration

Info

Publication number: CN112508792A
Application number: CN202011531087.7A
Authority: CN
Inventors: 张泊宇; 罗喜伶; 金晨; 王雪檬
Original assignee: Hangzhou Innovation Research Institute of Beihang University
Current assignee: Hangzhou Innovation Research Institute of Beihang University
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2021-03-16

Abstract

The invention discloses a single-image super-resolution method and system based on a deep neural network integration model with online knowledge migration. The integration model is built from a plurality of DNN-based base learners, each of which has an up-sampling module and a refinement network. The up-sampling modules of different base learners use different up-sampling methods. The base learners are connected by an extended Cross-stitch unit, and their outputs are combined into the final output HR image through learnable weights. The invention actively introduces differences into the base learners, i.e. it actively enhances the diversity among them, which ensures that the integrated model performs well. The extended Cross-stitch unit strengthens beneficial inputs from other base learners and weakens unfavorable ones. The learnable weights adaptively combine the outputs of the different base learners, which gives better results.

Description

Single-image super-resolution method and system of deep neural network integration model based on online knowledge migration

Technical Field

The invention relates to a single-image super-resolution method and a single-image super-resolution system of a deep neural network integration model based on online knowledge migration, which are particularly suitable for scenes with high requirements on details of reconstructed high-resolution images.

Background

Single Image Super-Resolution (SISR) is a classical problem in computer vision. It aims to reconstruct a High-Resolution (HR) image, larger and richer in detail, from a single given Low-Resolution (LR) image. The technology is widely used in remote sensing imaging, video surveillance and medical imaging. Many SISR methods have been proposed, and they fall into three groups: interpolation-based, reconstruction-based and learning-based methods. Interpolation-based methods reconstruct the HR image from the LR image by neighborhood interpolation, such as nearest-neighbor or bicubic interpolation; they are fast and straightforward but lack accuracy. Reconstruction-based methods constrain the space of possible HR solutions with carefully designed explicit priors (in the form of distributions, energy functions, etc.), but their performance drops rapidly as the upsampling factor increases. Learning-based methods learn a mapping function from LR to HR images using pairs of LR images and their corresponding HR images. Among current SISR methods, learning-based methods are widely used, in particular Deep Neural Network (DNN)-based SISR models, and they outperform interpolation- and reconstruction-based models.

SISR recovers a large amount of missing information from limited information, so it is an ill-posed problem without a unique solution: SISR methods that use different recovery mechanisms generate HR images with different details from the same LR image. Combining the HR images produced by these different SISR methods in an integration (ensemble) model can yield HR images with finer details. Two conditions are key to good ensemble performance: each base learner must perform well, and the base learners must be sufficiently diverse. The invention provides an end-to-end trainable integrated model framework for SISR with two newly designed components. The difference introduction module actively enhances the diversity among base learners. The online knowledge migration module improves every base learner during training. Together they improve the performance of the integrated model.

Ensemble models have already been applied to the SISR problem and perform better than single learners; examples include ESCN and MSCN. These methods first randomly initialize several DNNs with the same structure, train each of them on the training set, and build an ensemble that uses these DNNs as base learners. As shown in FIG. 1, the model first enlarges the LR image I^LR to the target size with an upsampling module (Up, usually bicubic interpolation). It then feeds the enlarged image to every base learner to obtain a set of outputs, and finally combines these outputs with equal weights into the final HR image I^HR.

However, DNN-based integrated models have two problems. 1) Learning each DNN base learner of the ensemble is a highly complex non-convex optimization problem, and base learners trained by gradient descent easily fall into poor local minima, which degrades the whole ensemble. 2) Even if every base learner is well trained, the strong learning capacity of DNNs reduces the diversity among base learners, and this diversity is what guarantees good ensemble performance. These problems prevent integrated models from achieving a significant breakthrough in SISR.

Disclosure of Invention

The invention aims to provide a novel DNN (deep neural network)-based integrated model framework for the SISR (single image super-resolution) problem. It addresses two problems of conventional DNN ensembles for SISR, insufficient diversity among base learners and difficult base-learner training, and thereby improves the quality of the HR images reconstructed by the model.

The technical scheme of the invention is as follows:

the invention firstly discloses a single-image super-resolution method of a deep neural network integration model based on online knowledge migration, which comprises the following steps:

1) constructing a DNN-based integrated model, wherein the integrated model comprises a plurality of DNN-based base learners and each base learner has an up-sampling module and a refinement network; the up-sampling modules of different base learners use different up-sampling methods; the base learners are connected by an extended Cross-stitch unit, which linearly combines the activation maps {x_1, ..., x_N} of all base learners at the same layer to obtain the inputs {x̃_1, ..., x̃_N} of the next layer of these base learners; and the outputs of the base learners are combined into the final output HR image through learnable weights;

2) training the integrated model on a training set that contains LR images and, as labels, the corresponding HR images, so as to obtain a trained integrated model that can super-resolve LR images into HR images;

3) performing super-resolution on the low-resolution images in the test data set with the trained integrated model.

The invention also discloses an online knowledge migration-based deep neural network integration system for single-image super-resolution processing, which comprises:

a plurality of DNN-based base learners, each provided with an up-sampling module and a refinement network. The up-sampling modules of different base learners use different up-sampling methods to up-sample the input LR image. The refinement network consists of a feature extraction network, a plurality of residual blocks, a reconstruction network and a skip connection from the coarse input to the output. The feature extraction network uses a convolutional layer to extract features from the coarse input; each residual block comprises a convolutional layer with a Leaky ReLU activation function and a convolutional layer without an activation function; and the reconstruction network uses two convolutional layers to reconstruct a residual image;

an extended Cross-stitch unit, which linearly combines the activation maps {x_1, ..., x_N} of all base learners at the same layer to obtain the inputs {x̃_1, ..., x̃_N} of the next layer of these base learners; and

a learnable base-learner output merging unit, which combines the outputs of the base learners into the final output HR image with learnable weights.

Compared with the prior art, the invention has the following beneficial effects:

(1) Each base learner has its own up-sampling module, and the modules use different up-sampling methods that produce different reconstructed image details. Differences are thus actively introduced into the base learners, i.e. their diversity is actively enhanced, which ensures that the integrated model performs well. Moreover, advanced (DNN-based) up-sampling methods replace traditional interpolation as the up-sampling modules. The refinement network therefore receives a higher-quality coarse input, which makes it easier to train.

(2) The invention extends the Cross-stitch unit into an online knowledge migration module. The extended Cross-stitch unit linearly combines the activation maps {x_1, ..., x_N} of all DNNs at the same layer to obtain the inputs {x̃_1, ..., x̃_N} of the next layer of these DNNs. In this way, beneficial inputs from other base learners are strengthened and unfavorable inputs are weakened.

(3) The invention replaces the traditional equal-weight method with a learnable base-learner output merging mechanism. The learnable weights w adaptively combine the outputs of the different base learners, which gives better results. In addition, the whole integrated model can be trained end to end, so it is easier to optimize the model as a whole.

Drawings

FIG. 1 is a conventional DNN-based integration model framework;

FIG. 2 is an integrated model framework for DNN proposed by the present invention;

FIG. 3 is a DNN integration model based on online knowledge migration;

FIG. 4 is a network structure of a base learner;

FIG. 5 is an extended Cross-stitch unit;

FIG. 6 is an integration model building process;

FIG. 7 is a comparison of the performance of KTDE and KTDE-VDSR, KTDE-DRRN, KTDE-MS-LapSRN;

FIG. 8 shows the training loss curves of KTDE and KTDE-w/o KT during training;

FIG. 9 shows the PSNR of KTDE and KTDE-w/o KT on Set5 after Gaussian noise of different levels has been added to the models;

FIG. 10 compares the restored image details of KTDE and KTDE-w/o KT after Gaussian noise of different levels has been added to the models.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in FIG. 1 and FIG. 2, the integrated model framework designed by the present invention (FIG. 2) adds two components to the traditional DNN-based framework (FIG. 1): a "difference upsampling module" and an "online knowledge migration module". The former gives each base learner a different upsampling module Up_k, so each base learner receives inputs with different details; in other words, the diversity among the base learners is actively enhanced. The latter connects the base learners with one another, so that during training the error gradient signal can pass from one base learner to the others, which helps improve overall performance.

To verify the validity of this framework, the invention designs an end-to-end trainable DNN integration model, KTDE (FIG. 3). The model comprises 3 base learners, each with a "difference upsampling module". The base learners are connected by "online knowledge migration modules", and their outputs are combined through learnable weights w into the final output, i.e. the reconstructed HR image.

The present invention uses three DNN-based base learners to build the integration model KTDE. In the proposed framework, each base learner has an upsampling module and a refinement network; FIG. 4 shows the network structure of the designed base learner. Given an input LR image I^LR, the base learner first uses its own upsampling module Up_k to enlarge it to a coarse input I'^LR of the target size. Different upsampling methods produce different reconstructed image details, so this module actively introduces diversity into the base learners. Here the three base learners use the pre-trained DNN-based SISR models VDSR, DRRN and MS-LapSRN, respectively, as their upsampling modules. These advanced upsampling methods replace traditional interpolation and give the refinement network a higher-quality coarse input, which makes the refinement network easier to train.
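The following minimal sketch illustrates the diversity argument. It uses two classical 2× upsamplers, nearest-neighbour and bilinear interpolation, purely as stand-ins for the pre-trained VDSR, DRRN and MS-LapSRN modules; the helper names and the toy 2 × 2 image are hypothetical. It only shows that two different Up_k produce coarse inputs of the same target size but with different details.

```python
# Stand-in "difference upsampling": two classical 2x upsamplers used only to
# show that different Up_k give different coarse inputs I'^LR.
# (The embodiment uses pre-trained VDSR, DRRN and MS-LapSRN instead.)

def upsample_nearest(img, s=2):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor s."""
    return [[img[i // s][j // s] for j in range(len(img[0]) * s)]
            for i in range(len(img) * s)]

def upsample_bilinear(img, s=2):
    """Bilinear upsampling (corner-aligned) of a 2-D list by an integer factor s."""
    h, w = len(img), len(img[0])
    H, W = h * s, w * s
    out = []
    for i in range(H):
        # map the output row to a fractional input row in [0, h - 1]
        y = i * (h - 1) / (H - 1) if H > 1 else 0.0
        y0 = int(y); y1 = min(y0 + 1, h - 1); dy = y - y0
        row = []
        for j in range(W):
            x = j * (w - 1) / (W - 1) if W > 1 else 0.0
            x0 = int(x); x1 = min(x0 + 1, w - 1); dx = x - x0
            top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
            bot = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
            row.append(top * (1 - dy) + bot * dy)
        out.append(row)
    return out

def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized 2-D lists."""
    n = len(a) * len(a[0])
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n

lr = [[0.0, 1.0], [1.0, 0.0]]              # toy 2x2 LR "image"
coarse_1 = upsample_nearest(lr)             # coarse input of base learner 1
coarse_2 = upsample_bilinear(lr)            # coarse input of base learner 2
print(len(coarse_1), len(coarse_1[0]))      # 4 4: same target size
print(mean_abs_diff(coarse_1, coarse_2) > 0)  # True: the details differ
```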

The refinement network that follows (FIG. 4) consists of a feature extraction network (FENet), three residual blocks (ResBlock), a reconstruction network (RecNet) and a skip connection from the coarse input to the output. FENet uses one convolutional layer (Conv) to extract features from the coarse input. Each residual block consists of a convolutional layer with a Leaky ReLU activation function and a convolutional layer without an activation function. Finally, RecNet uses two convolutional layers to reconstruct the residual image. The skip connection eases training and also preserves the diversity introduced by the "difference upsampling module". All convolutional layers have 64 channels, except the last one, which has a single channel for reconstruction. The filter size is 3 × 3, and the stride is 1 so that the input and output sizes are the same.
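As a cross-check of the layer description above, the sketch below lists the nine convolutions of one refinement network and counts their parameters. It assumes three details that the source does not state: a single-channel input (consistent with the one-channel last layer and the skip connection from the coarse input), zero padding of 1 pixel, and a bias in every convolution. The helper names are hypothetical.

```python
# Hypothetical layer table for one refinement network, following the text:
# FENet = 1 conv, 3 ResBlocks x 2 convs, RecNet = 2 convs, 64 channels,
# 3x3 kernels, stride 1, and a 1-channel last layer.
# ASSUMPTIONS: 1-channel input, zero padding 1, bias in every convolution.

def conv_out_size(n, k=3, stride=1, pad=1):
    """Spatial output size of a convolution along one axis."""
    return (n + 2 * pad - k) // stride + 1

def conv_params(c_in, c_out, k=3, bias=True):
    """Number of learnable parameters in a k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)

def refinement_net_layers(in_ch=1, width=64, n_resblocks=3):
    """List of (name, in_channels, out_channels) for every convolution."""
    layers = [("FENet", in_ch, width)]
    for b in range(n_resblocks):
        layers.append((f"ResBlock{b + 1}.conv1 (Leaky ReLU)", width, width))
        layers.append((f"ResBlock{b + 1}.conv2 (no activation)", width, width))
    layers.append(("RecNet.conv1", width, width))
    layers.append(("RecNet.conv2", width, in_ch))  # 1-channel residual image
    return layers

layers = refinement_net_layers()
total = sum(conv_params(c_in, c_out) for _, c_in, c_out in layers)
print(len(layers), total)   # 9 convolutions, 259713 parameters
print(conv_out_size(41))    # 41: stride 1 with padding 1 keeps the size
```

The residual skip connections add no parameters, so under these assumptions each refinement network holds roughly 0.26 M weights, in addition to the frozen upsampling module.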

For online knowledge transfer among multiple base learners, we extend the Cross-stitch unit (I. Misra, A. Shrivastava, A. Gupta, and M. Hebert, "Cross-stitch networks for multi-task learning," in Computer Vision and Pattern Recognition, 2016, pp. 3994-4003) so that it can be applied to more than two DNNs. As shown in equation (1), the extended Cross-stitch unit (FIG. 5) linearly combines the activation maps {x_1, ..., x_N} of all DNNs at the same layer to obtain the inputs {x̃_1, ..., x̃_N} of the next layer of these DNNs:

$$\begin{bmatrix}\tilde{x}_1\\ \vdots\\ \tilde{x}_N\end{bmatrix}=A\begin{bmatrix}x_1\\ \vdots\\ x_N\end{bmatrix},\qquad A=\left[a_{ij}\right]\in\mathbb{R}^{N\times N}\qquad(1)$$

During training, each DNN can obtain information from the other DNNs in both forward and backward propagation, which helps them find better solutions jointly.

The matrix A is learned with the back-propagation algorithm, so the Cross-stitch unit adaptively decides how much of the output of a given hidden layer of each base learner is passed to the other base learners. For each base learner, beneficial inputs from other base learners are thus strengthened and unfavorable inputs are weakened.
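The following is a minimal sketch of the forward and backward passes of the extended Cross-stitch unit in equation (1). It assumes scalar entries a_ij of A that are shared over all positions of an activation map. The maps are flattened Python lists, and the function names are hypothetical rather than taken from the patent.

```python
# Sketch of the extended Cross-stitch unit of equation (1):
#   x_tilde_i = sum_j A[i][j] * x_j   for N base learners.
# Maps are flattened lists here; a real model would use C x H x W tensors
# and let automatic differentiation compute the gradient of A.

def cross_stitch(A, maps):
    """Forward pass: mix N same-shaped activation maps with the N x N matrix A."""
    n, size = len(maps), len(maps[0])
    if len(A) != n or any(len(row) != n for row in A):
        raise ValueError("A must be N x N for N activation maps")
    return [[sum(A[i][j] * maps[j][p] for j in range(n)) for p in range(size)]
            for i in range(n)]

def cross_stitch_grad_A(grad_out, maps):
    """Backward pass for A: dL/dA[i][j] = sum_p dL/dx_tilde_i[p] * x_j[p]."""
    n, size = len(maps), len(maps[0])
    return [[sum(grad_out[i][p] * maps[j][p] for p in range(size))
             for j in range(n)]
            for i in range(n)]

A = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]
x = [[1.0, 2.0], [0.0, 0.0], [10.0, 10.0]]   # toy maps from 3 base learners
x_tilde = cross_stitch(A, x)                  # inputs to the next layer
```

The backward function shows how A is learned: the gradient of a_ij pairs the error signal arriving at learner i with the activations of learner j. By the same rule, the gradient with respect to x_j is Σ_i a_ij ∂L/∂x̃_i, which is the path along which error signals travel from one base learner to the others.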

For an input LR image I^LR, the outputs {Î_1, ..., Î_N} of all base learners are combined by linear weighting, as shown in equation (2), and the weights w are updated during training with the back-propagation algorithm:

$$I^{HR}=\sum_{k=1}^{N} w_k\,\hat{I}_k\qquad(2)$$

Unlike the traditional equal-weight method, the learned weights w adaptively combine the outputs of the different base learners, which gives better results. In addition, the whole integrated model can be trained end to end, so it is easier to optimize the model as a whole.

The present invention will be described in further detail with reference to fig. 6 and an exemplary embodiment. This example describes how to build an integration model and train. The method comprises the following specific steps:

Step one: constructing a training data set. The model KTDE designed by the invention is a supervised learning model, so it needs a data set of LR images paired with their corresponding HR images as labels. Such pairs are difficult to obtain directly. This example therefore collects a large number of high-quality natural images as HR images and down-samples each of them by a factor of m with bicubic interpolation to obtain the LR images. To improve the generalization ability of the model, the training data are augmented in two ways: 1) each image pair is rotated by 90, 180 and 270 degrees simultaneously, which expands the data volume to 4 times the original; 2) the data from the previous step are flipped horizontally, which expands the data volume further to 8 times the original. This example also collects images that do not overlap with the training data set (from the public data sets Set5, Set14, BSDS100 and Urban100). It builds test sets from them in the same way, but without data augmentation, to evaluate the trained model.
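The eight-fold augmentation can be sketched on 2-D lists as below. It applies four rotations (0°, 90°, 180°, 270°), each with and without a horizontal flip, identically to the LR image and its HR label so the pair stays aligned. The helper names are hypothetical, and the bicubic ×m downsampling that produces the LR images is not shown.

```python
# Sketch of the 8x augmentation: 4 rotations x {no flip, horizontal flip},
# applied identically to an LR image and its HR label (2-D lists of pixels).

def rot90(img):
    """Rotate a 2-D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]

def hflip(img):
    """Mirror a 2-D list left-to-right."""
    return [row[::-1] for row in img]

def augment_pair(lr, hr):
    """Return the 8 aligned (LR, HR) variants of one training pair."""
    pairs = []
    for _ in range(4):                      # 0, 90, 180, 270 degrees
        pairs.append((lr, hr))
        lr, hr = rot90(lr), rot90(hr)
    pairs += [(hflip(a), hflip(b)) for a, b in pairs]   # flipped copies
    return pairs

variants = augment_pair([[1, 2], [3, 4]], [[1, 2], [3, 4]])
print(len(variants))   # 8
```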

Step two: constructing and initializing the integration model. The pre-trained VDSR, DRRN and MS-LapSRN models are first obtained from the open-source community, and the integrated model shown in FIG. 2 is then constructed. Because the model uses the Leaky ReLU activation function, this example initializes the parameters of each base learner with an initialization method suited to that activation function. To make training more stable, the diagonal entries of the matrix A are initialized to 0.8 and the remaining entries to 0.1. For the ensemble weights w, every base learner is treated as equally important at initialization, i.e. each weight is initialized to 1/3.
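The source does not name the initialization method. A common choice for Leaky ReLU networks is He (Kaiming) initialization, which the sketch below assumes, together with a hypothetical negative slope of 0.2. The initial values of A and w follow the embodiment directly. With three learners, each row of A sums to 0.8 + 0.1 + 0.1 = 1, so the initial cross-stitch mixing preserves the overall activation scale.

```python
import math
import random

# ASSUMPTION: He (Kaiming) normal initialization with Leaky ReLU slope 0.2;
# the source only says "a method suited to Leaky ReLU".

def kaiming_std(fan_in, negative_slope):
    """He normal std for Leaky ReLU: sqrt(2 / (1 + a^2)) / sqrt(fan_in)."""
    return math.sqrt(2.0 / (1.0 + negative_slope ** 2)) / math.sqrt(fan_in)

def init_conv(c_in, c_out, k=3, negative_slope=0.2, seed=0):
    """Flat list of c_out * c_in * k * k Gaussian weights (biases omitted)."""
    rng = random.Random(seed)
    std = kaiming_std(c_in * k * k, negative_slope)
    return [rng.gauss(0.0, std) for _ in range(c_out * c_in * k * k)]

def init_ensemble(n_learners):
    """Cross-stitch matrix A (0.8 on the diagonal, 0.1 elsewhere) and w = 1/N."""
    A = [[0.8 if i == j else 0.1 for j in range(n_learners)]
         for i in range(n_learners)]
    w = [1.0 / n_learners] * n_learners
    return A, w

A, w = init_ensemble(3)
weights = init_conv(64, 64)   # one 3x3, 64->64 convolution of a refinement net
```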

Step three: training the model. This example uses the MAE (mean absolute error) as the loss function and trains the model with Adam, a stochastic gradient-based optimizer. The Adam learning rate is 10^-3 for the model parameters and 10^-2 for the matrix A, to speed up its convergence. The momentum coefficients β₁ and β₂ of Adam are 0.9 and 0.999, respectively. The model is trained for 4 × 10⁵ steps in total, with a batch of 32 training samples per step. The pre-trained upsampling models are not updated during training.
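The optimizer settings above can be sketched with a plain-Python Adam that keeps one instance per parameter group: 10^-3 for the model parameters and 10^-2 for A. The value ε = 10^-8 is the usual Adam default and an assumption here, as is grouping w with the model parameters. The class is an illustration, not the patent's code.

```python
import math

class Adam:
    """Plain-Python Adam for a flat list of scalar parameters."""

    def __init__(self, lr, beta1=0.9, beta2=0.999, eps=1e-8):  # eps: assumed default
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = None, None, 0

    def step(self, params, grads):
        """Return updated parameters after one bias-corrected Adam step."""
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        new = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            new.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new

opt_net = Adam(lr=1e-3)   # refinement networks (and, by assumption, w)
opt_A = Adam(lr=1e-2)     # cross-stitch matrices A
TOTAL_STEPS, BATCH_SIZE = 4 * 10**5, 32   # 12.8 million samples seen in total
```

On the first step, Adam moves each parameter by about its learning rate in the direction of -sign(g), so A initially moves ten times faster than the network weights. In PyTorch the same setup would typically be one `torch.optim.Adam` with two parameter groups, each a dict with its own `params` and `lr`, and `betas=(0.9, 0.999)`. The frozen upsamplers would be left out of both groups.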

Step four: testing the model. After training, the model reconstructs the LR images in the test sets into HR images. The similarity between the reconstructed HR images and the real HR images is evaluated with the PSNR and SSIM metrics. PSNR (Peak Signal-to-Noise Ratio) is given by equation (3). Its range is (0, +∞), and a larger value means the reconstructed image is closer to the real HR image. SSIM (Structural Similarity) is given by equation (4). Its maximum value is 1, and the closer it is to 1, the closer the reconstructed image is to the real HR image.

$$\mathrm{PSNR}=10\log_{10}\frac{(2^{n}-1)^{2}}{\mathrm{MSE}}\qquad(3)$$

$$\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)}{(\mu_x^{2}+\mu_y^{2}+c_1)(\sigma_x^{2}+\sigma_y^{2}+c_2)}\qquad(4)$$

Here n is the number of bits per pixel (generally 8), and MSE is the mean squared error between the current image x and the reference image y. μ_x and μ_y are the means of images x and y, σ_x² and σ_y² are their variances, and σ_xy is their covariance. c₁ = (k₁L)² and c₂ = (k₂L)² are constants that keep the computation stable, where L is the dynamic range of the pixel values (generally 255), k₁ = 0.01 and k₂ = 0.03.

The importance of the differential upsampling module of the present invention is illustrated below:

For comparison, the experiment designs three additional integrated models: KTDE-VDSR, KTDE-DRRN and KTDE-MS-LapSRN. Each is identical to the model designed by the invention, except that all of its base learners use the same upsampling module (VDSR, DRRN and MS-LapSRN, respectively). With an upsampling factor of 2, the experiment uses the chi-square distance to compute the average difference between the outputs of the different base learners of the same integrated model on the same test set, i.e. equation (5).

The results are shown in Table 1. When each base learner uses a different upsampling module (KTDE), the outputs differ far more than in the integrated models whose base learners share the same upsampling module, by roughly two to four orders of magnitude.

The experiment also tracks the performance of the 4 models on the Set5 data set during training. The results are shown in FIG. 7, where each curve shows the PSNR of one model on Set5 at an upsampling factor of ×2 over the course of training. FIG. 7 shows that KTDE outperforms the integrated models whose base learners all use the same upsampling module, which confirms the importance of the "difference upsampling module".

Table 1 Mean chi-square distance between the outputs of the base learners of KTDE-VDSR, KTDE-DRRN, KTDE-MS-LapSRN and KTDE on different test sets

Test set   KTDE-VDSR    KTDE-DRRN    KTDE-MS-LapSRN   KTDE
Set5       2.07×10^-6   2.44×10^-6   2.43×10^-6       1.89×10^-2
Set14      8.27×10^-6   9.63×10^-6   9.53×10^-6       2.07×10^-2
BSDS100    2.37×10^-6   2.63×10^-6   2.57×10^-6       1.95×10^-2
Urban100   1.79×10^-5   2.01×10^-5   4.90×10^-5       1.20×10^-2

The importance of the online knowledge migration module is explained below:

Second, the experiment verifies the importance of the "online knowledge migration module". It compares, during training and testing, KTDE with KTDE-w/o KT, a version of the model without the online knowledge migration module. FIG. 8 shows that KTDE, with the "online knowledge migration module", converges faster than KTDE-w/o KT during training.

The experiment also adds Gaussian noise with a standard deviation σ from 0 to 0.01 to the trained KTDE and KTDE-w/o KT models and measures the effect on each model's PSNR on Set5, to verify the robustness of the models. As shown in FIG. 9, the performance of KTDE-w/o KT drops markedly as the noise level increases, whereas the performance of KTDE declines only slowly. The experiment also shows this resistance to Gaussian noise visually. As shown in FIG. 10, ring-shaped ghosting appears at edges in the HR images reconstructed by KTDE-w/o KT as the noise level increases, whereas the reconstructions of KTDE show no obvious change. These experiments demonstrate the importance of the "online knowledge migration module".
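The robustness test can be sketched as a sweep. For each σ in [0, 0.01], it adds zero-mean Gaussian noise to a copy of the trained parameters and re-evaluates the model. The sketch assumes the noise is added to the model parameters, following the wording "added to the model". The five noise levels, the seed and the `evaluate` callback (standing in for PSNR on Set5) are hypothetical. Reusing the same seed for every σ scales one fixed noise pattern, which makes the resulting curves easier to compare.

```python
import random

def perturb(params, sigma, seed=0):
    """Copy of a flat parameter list with i.i.d. N(0, sigma^2) noise added."""
    rng = random.Random(seed)
    return [p + rng.gauss(0.0, sigma) for p in params]

def noise_sweep(params, evaluate, sigmas=(0.0, 0.0025, 0.005, 0.0075, 0.01), seed=0):
    """Evaluate a trained model after adding weight noise at each level sigma.
    `evaluate` is a hypothetical callback, e.g. mean PSNR on a test set."""
    return [(s, evaluate(perturb(params, s, seed))) for s in sigmas]
```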

The method of the invention is compared with the prior art methods as follows:

Finally, the experiment compares the model with other SISR models, including SISR ensemble models and single-learner models, at different magnification factors on the 4 test sets. The results are shown in Table 2. The model designed by the invention outperforms the other models on both PSNR and SSIM, which demonstrates its effectiveness.

Table 2 PSNR/SSIM comparison of KTDE with existing SISR ensemble models and single-learner models on different test sets and upsampling factors

The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A single-image super-resolution method of a deep neural network integration model based on online knowledge migration is characterized by comprising the following steps:

1) constructing a DNN-based integrated model, wherein the integrated model comprises a plurality of DNN-based base learners, each base learner being provided with an up-sampling module and a refinement network; the up-sampling modules of different base learners use different up-sampling methods; the base learners are connected by an extended Cross-stitch unit, which linearly combines the activation maps {x_1, ..., x_N} of all base learners at the same hidden layer to obtain the inputs {x̃_1, ..., x̃_N} of the next hidden layer of these base learners

2. The online knowledge migration-based single-image super-resolution method for a deep neural network integration model according to claim 1, wherein, for a given input LR image I^LR, the up-sampling modules of the base learners use different up-sampling methods to obtain different coarse input images I'^LR of the target size, each of which serves as the input of the corresponding refinement network.

3. The online knowledge migration-based single-image super-resolution method for a deep neural network integration model according to claim 1, wherein the refinement network consists of a feature extraction network, a plurality of residual blocks, a reconstruction network and a skip connection from the coarse input to the output; the feature extraction network uses a convolutional layer to extract features from the coarse input; each residual block comprises a convolutional layer with a Leaky ReLU activation function and a convolutional layer without an activation function; and the reconstruction network uses two convolutional layers to reconstruct the residual image.

4. The online knowledge migration-based single-image super-resolution method for a deep neural network integration model according to claim 1, wherein the extended Cross-stitch unit linearly combines the activation maps {x_1, ..., x_N} of all DNNs at the same hidden layer to obtain the inputs {x̃_1, ..., x̃_N} of the next hidden layer of these DNNs, as shown in equation (1):

$$[\tilde{x}_1,\dots,\tilde{x}_N]^{\top}=A\,[x_1,\dots,x_N]^{\top}\qquad(1)$$

and wherein the matrix A is learned with the back-propagation algorithm, so that the extended Cross-stitch unit adaptively determines how much of the output of a given hidden layer of each base learner is passed to the other base learners.

5. The online knowledge migration-based single-image super-resolution method for a deep neural network integration model according to claim 1, wherein, for an input LR image I^LR, the outputs {Î_1, ..., Î_N} of all base learners are combined by linear weighting as shown in equation (2), and the weights w are updated during training with the back-propagation algorithm:

$$I^{HR}=\sum_{k=1}^{N} w_k\,\hat{I}_k\qquad(2)$$

6. The online knowledge migration-based single-image super-resolution method for a deep neural network integration model according to claim 1, wherein the up-sampling module uses a pre-trained model, and the pre-trained model is not trained as part of the integrated model.

7. An online knowledge migration-based deep neural network integration system for super-resolution processing of a single image, characterized by comprising: