CN110232653A - Fast lightweight dense residual network for super-resolution reconstruction

Info

Publication number: CN110232653A
Application number: CN201811515913.1A
Authority: CN
Inventors: 李素梅; 石永莲
Original assignee: Tianjin University Marine Technology Research Institute
Current assignee: Tianjin University Marine Technology Research Institute
Priority date: 2018-12-12
Filing date: 2018-12-12
Publication date: 2019-09-13

Abstract

A fast, lightweight dense residual network for super-resolution reconstruction: a dual-channel deep residual network (FLSR) based on convolutional networks. The deep channel mainly learns the high-frequency texture of the image, while the shallow channel learns its low-frequency information. Residual connections are added to the structure to accelerate convergence; they also pass the image detail information of earlier convolutional layers directly to later ones, which helps reconstruct higher-quality images. The structure additionally uses dense connections, which help mitigate vanishing gradients and improve model performance. The network raises image quality metrics such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and information fidelity criterion (IFC) while reducing the model's parameter count and computational complexity and increasing reconstruction speed, so it can be applied in practice.

Description

Fast lightweight dense residual network for super-resolution reconstruction

Technical field

The invention belongs to the field of image super-resolution reconstruction and concerns the improvement and application of image super-resolution reconstruction methods, and more particularly a fast, lightweight dense residual network for super-resolution reconstruction.

Background

With the rapid development of society and of science and technology, the kinds of information people obtain have become more diverse and the ways of obtaining it keep multiplying. More than 70% of that information is obtained through vision, and images and video are currently the main carriers of visual information. As a result, image processing research is increasingly broad and image processing techniques are increasingly widespread.

Image resolution refers to the number of pixels per inch of an image and indicates how well scene detail can be resolved. It can be divided into spatial, temporal, spectral, and radiometric resolution, among others; in what follows, resolution always means spatial resolution. The higher the spatial resolution of an image, the denser its pixels, the richer its texture detail, and the sharper it appears to the human eye. Many fields now require high-resolution images for tasks such as target recognition and tracking, and analysis and detection. In video surveillance, hardware limitations result in images of low sharpness and limited detail, which degrades monitoring quality; in satellite remote sensing, high-resolution images allow ground objects to be identified more reliably; in medicine, high-resolution images let doctors analyze and assess a patient's condition and improve the accuracy of treatment.

There are two common ways to increase image resolution. The first is hardware-based: improve the performance of the imaging device, which is costly. The second is software-based: use improved algorithms to convert the low-resolution images captured by the hardware into corresponding high-resolution images, that is, image super-resolution (SR) reconstruction. This approach sidesteps hardware limitations and lowers device cost; it has strong practical value and has become the common way to increase image resolution.

Many super-resolution methods have been proposed, including interpolation-based methods, reconstruction-based methods, and example-based (learning-based) methods. Interpolation-based methods, such as bilinear, bicubic, and nearest-neighbor interpolation, are simple and fast, but the reconstructed images are blurry because the results lack detail. Reconstruction-based methods, such as iterative back-projection and maximum a posteriori estimation, can quickly recover high-resolution images when sufficient prior information is available. Learning-based methods, such as neighbor embedding and sparse representation, learn the mapping between HR and LR images and use it to reconstruct a high-resolution image from a low-resolution one. Because the first two kinds of method often perform poorly at large scale factors, recent SR methods are almost all of the third kind, learning prior knowledge from pairs of LR and HR images.

In recent years, inspired by many computer vision tasks, deep networks have achieved major breakthroughs in SR. The basic idea of deep learning is to build more hidden layers on top of a traditional artificial neural network and, through multiple layers of transformation, map features from the original space to a new space, obtaining better feature representations that can fit more complex functional relationships and ultimately improve classification or regression accuracy. Deep learning now outperforms traditional algorithms in image processing, object detection, speech recognition, face recognition, and other fields, and it has also proved highly effective for image super-resolution. Thanks to its strong feature extraction ability, deep learning can produce more accurate super-resolution images, so this research has considerable scientific and practical value.

Although deep networks perform well, most still have several drawbacks. Deepening or widening the network has become the prevailing design trend, but such networks require large amounts of computation and memory and are hard to apply directly in practice. In addition, most current methods based on convolutional neural networks (CNNs) perform super-resolution reconstruction with a single-channel shallow or deep network and require preprocessing, which may introduce new noise. Shallow channels tend to lose the high-frequency information of the image, while deep networks converge slowly during reconstruction and are prone to exploding or vanishing gradients.

Summary of the invention

To solve the above problems, the fast, lightweight dense residual network for super-resolution reconstruction of the present invention is a dual-channel deep residual network (FLSR) based on convolutional networks. The deep channel mainly learns the high-frequency texture of the image, while the shallow channel learns its low-frequency information. Residual connections are added to the structure to accelerate convergence; they also pass the image detail information of earlier convolutional layers directly to later ones, which helps reconstruct higher-quality images. The structure additionally uses dense connections, which help mitigate vanishing gradients and improve model performance. The network raises image quality metrics such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and information fidelity criterion (IFC) while reducing the model's parameter count and computational complexity and increasing reconstruction speed, so it can be applied in practice.

To perform single-image super-resolution reconstruction, we designed a dual-channel convolutional network structure based on dense residual connections. The network has a lightweight parameter count and computational complexity, as shown in Fig. 1; the network framework is shown in Fig. 2. The FLSR structure contains no pooling or fully connected layers, only convolutional and deconvolution layers. It consists of a 3-layer shallow channel and a 29-layer deep channel, and at the end of the network a convolutional layer fuses the reconstruction information of the two channels to obtain the HR image. The shallow channel mainly restores the overall outline of the image and retains its original information. The deep channel mainly learns the high-frequency texture of the image and comprises four parts: a feature extraction layer, a nonlinear mapping layer, an upsampling layer, and a multi-scale reconstruction layer. First, in the feature extraction stage, the deep channel alternates dense and residual connections, which accelerates convergence. Second, the whole network upsamples the image directly with deconvolution, which avoids image preprocessing, reduces the complexity of super-resolution reconstruction, and lets the network be trained more efficiently. Finally, in the reconstruction stage, the deep channel uses multi-scale reconstruction, which extracts short- and long-range texture information simultaneously for image reconstruction. In addition, FLSR-G, the enhanced model of FLSR, incorporates group convolution in the feature extraction stage, greatly reducing the network's parameters and computational complexity while keeping results at the same level; the feature extraction block of the improved model (Dense Block-G) is shown in Fig. 3(b).
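The parameter saving that group convolution brings can be illustrated with a short count. This is a minimal sketch, not the patent's configuration: the 3x3 kernel size and the group count of 4 are assumptions chosen for illustration, since the text states only that each feature extraction layer has 64 filters.

```python
def conv_params(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Count the learnable parameters of a 2-D convolution layer.

    With `groups` > 1, each filter sees only in_channels / groups input
    channels, so the weight count shrinks by a factor of `groups`.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = out_channels * (in_channels // groups) * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# ASSUMPTION: 3x3 kernels and 4 groups are illustrative values only.
standard = conv_params(64, 64, 3)
grouped = conv_params(64, 64, 3, groups=4)
print(standard, grouped)  # 36928 9280
```

Under these assumed values, the grouped layer keeps the same number of output feature maps with roughly a quarter of the weights.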

The key to CNN-based single-image super-resolution reconstruction is finding the mapping between LR and HR images. The deep channel of the network uses densely connected convolutional layers to continually extract and iterate on latent features, so it can accurately restore details such as edge information. The shallow channel simply upsamples and retains the overall outline information of the original image; its nonlinear mapping uses only a single convolutional layer.

Let $Y$ denote the original high-resolution image and $X$ the LR image obtained by downsampling. The training set contains $N$ pairs of corresponding low- and high-resolution images $\{X_i, Y_i\}_{i=1}^{N}$, and the SR image reconstructed by the convolutional network is denoted $\hat{Y}_i$. The mean square error (MSE) is used as the loss function $L$ of the network, expressed as follows:

$$L = \frac{1}{N}\sum_{i=1}^{N} \lVert \hat{Y}_i - Y_i \rVert^2$$

Using MSE as the loss function helps achieve a higher peak signal-to-noise ratio (PSNR). PSNR is widely used to evaluate image quality quantitatively and reflects the perceptual quality of an image to some extent.
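The link between MSE and PSNR can be made concrete with a small calculation. The sketch below assumes 8-bit images (peak value 255) stored as nested lists of pixel values; the function names are hypothetical and are not taken from the patent.

```python
import math


def mse(a, b):
    """Mean square error between two equally sized 2-D images."""
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


hr = [[100, 110], [120, 130]]   # toy ground-truth patch
sr = [[101, 109], [122, 130]]   # toy reconstruction
print(mse(hr, sr))              # 1.5
print(round(psnr(hr, sr), 2))
```

Because PSNR is a decreasing function of MSE, minimizing the MSE loss directly raises PSNR.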

1 Feature extraction

The deeper the convolutional neural network, the better the super-resolution reconstruction, but as depth increases, the vanishing-gradient problem common in deep networks makes the network hard to converge. Dense connections are therefore added to the deep channel; they maximize information flow through the network, reduce the vanishing gradients that deep networks are prone to during training, and reconstruct high-frequency texture more effectively. In addition, residual connections are used in the deep channel to accelerate convergence of the deep network.

The proposed network makes extensive use of dense and residual connections in the feature extraction stage. Dense connections link every pair of layers in the feature extraction stage in a feed-forward fashion, so each convolutional layer in the stage receives the feature maps of all preceding layers as input; this plays an important role in mitigating vanishing gradients and makes the network easier to train. Meanwhile, residual connections create shorter paths between later and earlier convolutional layers and allow signals to be back-propagated; they reduce the loss of high-frequency information as it passes to later layers of a deep convolutional neural network, and so also compensate for high-frequency information. Dense connectivity in convolutional neural networks was first proposed for image recognition; the dense block (Dense Block) proposed here is shown in Fig. 3(a).
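The dense connection pattern described above can be sketched on a toy example. Here each "feature map" is reduced to a list of channel values at a single pixel, and each layer is a hypothetical callable standing in for a convolution; the sketch shows only how every layer receives the concatenation of all earlier outputs, not the patent's actual block layout.

```python
def dense_block(x, layers):
    """Apply layers with dense connectivity.

    x      : list of channel values (the block input)
    layers : callables mapping a channel list to a new channel list

    Every layer sees the channel-wise concatenation of the block input
    and the outputs of all earlier layers.
    """
    features = [x]
    for layer in layers:
        concatenated = [c for f in features for c in f]
        features.append(layer(concatenated))
    return [c for f in features for c in f]


# Toy layers (assumption): each emits one channel, the sum of its inputs.
toy_layers = [lambda v: [sum(v)], lambda v: [sum(v)]]
print(dense_block([1.0, 2.0], toy_layers))  # [1.0, 2.0, 3.0, 6.0]
```

The second toy layer receives three channels (the two inputs plus the first layer's output), which is the property that keeps gradients flowing to early layers.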

Feature extraction extracts (overlapping) patches from the original LR image $X$ and represents each patch as a high-dimensional vector. These vectors comprise a set of feature maps, whose number equals the dimension of the vectors. The feature extraction step comprises 11 convolutional layers, each with 64 filters of size $f_l \times f_l$. A convolutional layer can be expressed as:

$$\hat{F}_l = w_l * F_{l-1}$$

where $l$ denotes the $l$-th convolutional layer, $w_l$ denotes the filters of layer $l$, $\hat{F}_l$ is the extracted feature map that the convolution outputs, and $*$ denotes the convolution operation. $w_l$ contains $n_l \times f_l \times f_l$ parameters, where $f_l$ is the size of a filter's convolution kernel and $n_l$ is the number of filters. Each convolutional layer is followed by an activation function. SRCNN uses the rectified linear unit (ReLU), $\max(0, x)$, as its activation function. ReLU saturates hard for $x < 0$. Because its derivative is 1 for $x > 0$, ReLU keeps the gradient from decaying there, which avoids the vanishing-gradient problem to some extent. Here PReLU is used as the activation function instead. PReLU is a modified ReLU with an added parameter: the coefficient $a_i$ of the negative half-axis is learnable, which improves the model's fitting ability near zero while retaining fast convergence. PReLU can be defined as a general activation function:

$$f(x_i) = \max(x_i, 0) + a_i \min(0, x_i)$$

where $x_i$ is the input signal of the activation function on the $i$-th layer. The output of the activation function can be described as:

$$F_l = f(\hat{F}_l + b_l) = \max(w_l * F_{l-1} + b_l, 0) + a_l \min(0, w_l * F_{l-1} + b_l)$$

where $F_l$ is the final output feature map and $b_l$ is the bias of layer $l$. To address the degradation problem, identity-mapping shortcut connections are used for each group of network layers in feature extraction; the residual structure also makes the network converge faster.
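The difference between ReLU and PReLU described above is easiest to see on scalar inputs. This is a minimal sketch; the negative-side coefficient of 0.25 is an assumed illustrative value, since in the network it is a learned parameter.

```python
def relu(x):
    """Rectified linear unit: zero for negative inputs."""
    return max(0.0, x)


def prelu(x, a):
    """Parametric ReLU: identity for x > 0, slope `a` for x < 0."""
    return max(x, 0.0) + a * min(0.0, x)


a = 0.25  # ASSUMPTION: illustrative value; learned during training
for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), prelu(x, a))
```

For negative inputs ReLU outputs zero and passes no gradient, while PReLU keeps a small slope, which is why it fits better near zero.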

2 Nonlinear mapping layer

The nonlinear mapping layer consists of 5 convolutional layers. Nonlinear mapping maps each high-dimensional vector nonlinearly onto another high-dimensional vector; each mapped vector is conceptually the representation of a high-resolution patch, and these vectors comprise another set of feature maps. The network also upsamples the image with deconvolution, which avoids the new noise that image preprocessing would introduce.
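How a deconvolution (transposed convolution) layer enlarges a feature map can be checked with the standard output-size relation. The kernel, stride, and padding values below are assumptions for illustration; the patent does not state the deconvolution hyperparameters.

```python
def deconv_output_size(in_size, kernel, stride, pad=0):
    """Spatial output size of a transposed convolution along one axis."""
    return stride * (in_size - 1) + kernel - 2 * pad


# ASSUMPTION: illustrative settings that give exact x2 and x3 upsampling.
print(deconv_output_size(32, kernel=4, stride=2, pad=1))  # 64
print(deconv_output_size(32, kernel=9, stride=3, pad=3))  # 96
```

Setting the stride equal to the upscaling factor lets the layer learn the upsampling itself, so the LR image can be fed to the network without interpolation beforehand.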

3 Fusion of the deep and shallow channels

In the network, the deep channel restores the high-frequency detail of the high-resolution image, while the shallow channel restores only the outline information of the image. At the end of the network, a single convolutional layer without an activation function fuses the features of the two channels, using a 1x1 convolution to reduce dimensionality. The fusion layer can be formulated as:

$$F_{\text{fusion}} = w_{1 \times 1} * F_{\text{in}}$$

where $F_{\text{fusion}}$ is the output high-resolution fused feature map, $w_{1 \times 1}$ is the 1x1 convolution operation, and $F_{\text{in}}$ is the input feature map.
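A 1x1 convolution is a per-pixel weighted sum across channels, which is what lets it merge the two channels' feature maps and reduce their number. The sketch below works on nested lists; the equal weights of 0.5 are an assumption for illustration, whereas in the network they are learned.

```python
def conv1x1(feature_maps, weights, bias=None):
    """Apply a 1x1 convolution.

    feature_maps : list of C_in maps, each an H x W nested list
    weights      : C_out rows of C_in coefficients
    bias         : optional list of C_out offsets
    Returns C_out maps of size H x W.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    out = []
    for o, row_w in enumerate(weights):
        b = bias[o] if bias else 0.0
        out.append([
            [b + sum(w * fm[y][x] for w, fm in zip(row_w, feature_maps))
             for x in range(width)]
            for y in range(height)
        ])
    return out


shallow = [[1.0, 2.0], [3.0, 4.0]]      # toy shallow-channel output
deep = [[10.0, 20.0], [30.0, 40.0]]     # toy deep-channel output
fused = conv1x1([shallow, deep], weights=[[0.5, 0.5]])  # 2 maps -> 1 map
print(fused)  # [[[5.5, 11.0], [16.5, 22.0]]]
```

No activation follows the call, matching the description of the fusion layer as a convolution without an activation function.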

4 Image reconstruction

Unlike traditional reconstruction with a single-scale filter, the reconstruction stage of the network uses multi-scale reconstruction, so the network obtains features at several scales simultaneously for reconstruction. This operation aggregates the high-resolution patch representations above to generate the final high-resolution image. The final reconstructed image is expected to be similar to the input LR image.

The fast, lightweight dense residual network for super-resolution reconstruction uses dense residual blocks to extract a large number of effective features and to accelerate convergence, and reconstructs the HR image with convolution kernels of different scales. Combining it further with group convolution effectively reduces network parameters and computational complexity and yields a new network structure, FLSR-G. Competitive results are achieved on the PSNR, SSIM, and IFC metrics, with reconstruction times far shorter than those of the current best methods such as DRRN and MemNet. A network of such lightweight structure can be widely used in practice.

Brief description of the drawings

Fig. 1 compares performance, computational complexity, and parameter count across different models;

Fig. 2 is a schematic diagram of the proposed FLSR structure;

Fig. 3 is a schematic diagram of the dense residual block and the enhanced dense residual block in the structure.

Detailed description of embodiments

The fast, lightweight dense residual network for super-resolution reconstruction is divided into two channels. The shallow channel contains only 3 convolutional layers and one deconvolution layer and restores the outline information of the image. The main task of reconstructing the HR image is performed by the deep channel, which comprises four parts: feature extraction, nonlinear mapping, upsampling, and multi-scale reconstruction. The implementation of the network is described in detail below.

1 Datasets

1.1 Training dataset

General-100, the 91-image set of Yang et al., and 200 images from the Berkeley Segmentation Dataset (BSD) are used as training data. General-100 contains 100 uncompressed images in BMP format, and the 91-image set contains 91 images, so the raw dataset has 391 images in total. More data generally yields better training results, so to make full use of the training data we use 3 kinds of data augmentation: (1) rotating each image by a set of fixed angles; (2) flipping images horizontally; (3) scaling each image by factors of 0.9, 0.8, 0.7, and 0.6. The final training dataset is therefore 40 times the size of the raw data, that is, 391 × 40 = 15,640 training images.
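The 40-fold figure can be reproduced by enumerating the augmentation combinations. The rotation angles are not legible in this text, so the four orientations below (0, 90, 180, and 270 degrees) are an assumption chosen because they make the count come out to 40 together with the stated flip and scale options.

```python
from itertools import product

# ASSUMPTION: the rotation angles are not given in the text; four
# orientations are assumed here so that the total matches the stated 40x.
ROTATIONS = (0, 90, 180, 270)
FLIPS = (False, True)                 # original and horizontally flipped
SCALES = (1.0, 0.9, 0.8, 0.7, 0.6)    # original size plus the four stated factors


def augmentation_variants():
    """Every (rotation, flip, scale) combination applied to one image."""
    return list(product(ROTATIONS, FLIPS, SCALES))


RAW_IMAGES = 100 + 91 + 200           # General-100 + 91-image set + BSD
variants = augmentation_variants()
total_images = RAW_IMAGES * len(variants)
print(len(variants), total_images)    # 40 15640
```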

1.2 Test dataset

For upscaling factors k of 2, 3, and 4, peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) are used as objective evaluation metrics, and model performance is assessed on 4 standard benchmark datasets: Set5, Set14, BSD100, and Urban100. Of these, Set5, Set14, and BSD100 contain natural scenes, while Urban100 contains challenging urban scenes with details in different frequency bands. Downsampling the original images with bicubic interpolation yields LR/HR image pairs for the training and test datasets. Each color image is converted to the YCbCr color space and only the Y channel is processed; the color components are upscaled with bicubic interpolation.
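Extracting the Y channel can be sketched as follows. The patent does not say which YCbCr variant is used; the sketch assumes the ITU-R BT.601 "studio swing" luma commonly used in super-resolution benchmarks, which maps 8-bit RGB to the range 16 to 235.

```python
def rgb_to_y(r, g, b):
    """Luma of one 8-bit RGB pixel (ASSUMPTION: ITU-R BT.601, range 16..235)."""
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def y_channel(rgb_image):
    """Convert an H x W image of (r, g, b) tuples to its Y channel."""
    return [[rgb_to_y(*pixel) for pixel in row] for row in rgb_image]


image = [[(0, 0, 0), (255, 255, 255)],
         [(255, 0, 0), (0, 0, 255)]]
for row in y_channel(image):
    print([round(v, 2) for v in row])
```

Only this single channel would be passed through the network; the Cb and Cr channels are enlarged with bicubic interpolation, as the text states.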

Compared with networks of the same depth, the fast, lightweight dense residual network for super-resolution reconstruction has computational complexity and parameter counts of a lower order of magnitude, as shown in Fig. 1. In terms of reconstruction quality, the network achieves competitive results on the image quality metrics PSNR, SSIM, and IFC, and performs particularly well on the 2x upsampling task: the PSNR/SSIM is 37.78/0.9597 on Set5 and 31.35/0.9195 on Urban100. The remaining experimental results are shown in Table 1. In addition, the network's reconstruction time is far shorter than that of the current best methods such as DRRN and MemNet, as shown in Table 2, so a network of such lightweight structure can be widely used in practice.

2 Implementation details

To prepare training samples, the original HR images are first downsampled by a factor S (S = 2, 3, 4) using bicubic interpolation to generate the corresponding LR images, and the LR images are then cropped into a series of fixed-size sub-images that form the training set. The corresponding HR images are cropped into sub-images of the corresponding size. Given the size of the training images, and to improve training, LR images are randomly cropped into image blocks, which together with the corresponding HR image blocks are fed to the network for training. The model is trained with Caffe with a batch size of 64; the initial learning rate is 0.01 and is divided by 10 in the fine-tuning stage. The filters in the convolutional layers are all randomly initialized from a Gaussian distribution, and the network is optimized with stochastic gradient descent. Training the whole model takes about half a day on an NVIDIA GeForce GTX 1080 GPU (12G memory). Considering the trade-off among training time, computational complexity, and reconstruction quality, group convolution is added to the first four convolutional layers of the two blocks of the original network, eight group convolutions in total; the modified block is shown in Fig. 2 and the structure is denoted FLSR-G. In the test stage, the open-source code of the algorithms used for comparison was run on a machine with an i5-7500 CPU, 16G of RAM, and an NVIDIA GeForce GTX 1080Ti GPU (12G memory). Because the official MemNet and DRRN implementations ran out of GPU memory, all test work was moved to a machine with a 3.5GHz Intel E5-2367 CPU (64G RAM) and an NVIDIA TITAN X (Pascal) GPU (12G memory), on which training and testing completed without problems.
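Random cropping of aligned LR/HR blocks can be sketched as follows. The patch size is not legible in this text, so it is left as a parameter; the function name and the nested-list image representation are hypothetical.

```python
import random


def random_patch_pair(lr, hr, patch, scale, rng=random):
    """Crop a random LR block and the HR block covering the same region.

    lr, hr : nested lists (H x W and H*scale x W*scale)
    patch  : LR block side length (ASSUMPTION: value not given in the text)
    scale  : downsampling factor S
    """
    height, width = len(lr), len(lr[0])
    if len(hr) != height * scale or len(hr[0]) != width * scale:
        raise ValueError("hr must be exactly `scale` times the size of lr")
    top = rng.randrange(height - patch + 1)
    left = rng.randrange(width - patch + 1)
    lr_patch = [row[left:left + patch] for row in lr[top:top + patch]]
    hr_patch = [row[left * scale:(left + patch) * scale]
                for row in hr[top * scale:(top + patch) * scale]]
    return lr_patch, hr_patch


# Toy images whose pixels record their own coordinates.
lr_img = [[(y, x) for x in range(4)] for y in range(4)]
hr_img = [[(y, x) for x in range(8)] for y in range(8)]
lr_p, hr_p = random_patch_pair(lr_img, hr_img, patch=2, scale=2)
print(len(lr_p), len(hr_p))  # 2 4
```

Scaling the crop offsets by S keeps each HR block aligned with its LR block, so the pair remains a valid input and target.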

Claims

1. A fast, lightweight dense residual network for super-resolution reconstruction, characterized in that: the entire FLSR structure contains no pooling or fully connected layers, only convolutional and deconvolution layers; the structure consists of a 3-layer shallow channel and a 29-layer deep channel, and at the end of the network a convolutional layer fuses the reconstruction information of the two channels to obtain the HR image; the shallow channel mainly restores the overall outline of the image and retains its original information, and the deep channel mainly learns the high-frequency texture information of the image and comprises four parts: a feature extraction layer, a nonlinear mapping layer, an upsampling layer, and a multi-scale reconstruction layer; first, in the feature extraction stage, the deep channel alternates dense and residual connections, which accelerates convergence of the network; second, the whole network upsamples the image directly with deconvolution, which avoids image preprocessing, reduces the complexity of super-resolution reconstruction, and lets the network be trained more efficiently; finally, in the reconstruction stage, the deep channel uses multi-scale reconstruction, which extracts short- and long-range texture information simultaneously for image reconstruction; in addition, FLSR-G, the enhanced model of FLSR, incorporates group convolution in the feature extraction stage, greatly reducing the network's parameters and computational complexity while keeping results at the same level.

2. The fast, lightweight dense residual network for super-resolution reconstruction according to claim 1, characterized in that the deep channel mainly comprises:

1 Feature extraction layer

Dense and residual connections are used extensively in the feature extraction stage; dense connections link every pair of layers in the feature extraction stage in a feed-forward fashion, so each convolutional layer in the stage receives the feature maps of all preceding layers as input, which plays an important role in mitigating vanishing gradients and makes the network easier to train; residual connections create shorter paths between later and earlier convolutional layers and allow signals to be back-propagated, reduce the loss of high-frequency information as it passes to later layers of a deep convolutional neural network, and so also compensate for high-frequency information;

Feature extraction extracts a large number of patches from the original LR image $X$ and represents each patch as a high-dimensional vector; these vectors comprise a set of feature maps, whose number equals the dimension of the vectors; the feature extraction step comprises 11 convolutional layers, each with 64 filters of size $f_l \times f_l$; a convolutional layer can be expressed as:

$$\hat{F}_l = w_l * F_{l-1}$$

where $l$ denotes the $l$-th convolutional layer, $w_l$ denotes the filters of layer $l$, $\hat{F}_l$ is the extracted feature map that the convolution outputs, and $*$ denotes the convolution operation; $w_l$ contains $n_l \times f_l \times f_l$ parameters, where $f_l$ is the size of a filter's convolution kernel and $n_l$ is the number of filters; each convolutional layer is followed by an activation function; SRCNN uses the rectified linear unit (ReLU), $\max(0, x)$, as its activation function, and ReLU saturates hard for $x < 0$; because its derivative is 1 for $x > 0$, ReLU keeps the gradient from decaying there, which avoids the vanishing-gradient problem to some extent; PReLU is used as the activation function; PReLU is a modified ReLU with an added parameter, in which the coefficient $a_i$ of the negative half-axis is learnable, which improves the model's fitting ability near zero while retaining fast convergence; PReLU can be defined as a general activation function:

$$f(x_i) = \max(x_i, 0) + a_i \min(0, x_i)$$

and the output of the activation function can be described as:

$$F_l = f(\hat{F}_l + b_l) = \max(w_l * F_{l-1} + b_l, 0) + a_l \min(0, w_l * F_{l-1} + b_l)$$

where $F_l$ is the final output feature map and $b_l$ is the bias of layer $l$; to address the degradation problem, identity-mapping shortcut connections are used for each group of network layers in feature extraction, and the residual structure also makes the network converge faster;

2 Nonlinear mapping layer

The nonlinear mapping layer consists of 5 convolutional layers; nonlinear mapping maps each high-dimensional vector nonlinearly onto another high-dimensional vector; each mapped vector is conceptually the representation of a high-resolution patch, and these vectors comprise another set of feature maps; the network also upsamples the image with deconvolution, which avoids the new noise that image preprocessing would introduce;

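3 Fusion of the deep and shallow channels

$$F_{\text{fusion}} = w_{1 \times 1} * F_{\text{in}}$$

where $F_{\text{fusion}}$ is the output high-resolution fused feature map, $w_{1 \times 1}$ is the 1x1 convolution operation, and $F_{\text{in}}$ is the input feature map;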

4 Image reconstruction

Multi-scale reconstruction is used in the network structure, so features at several scales are obtained simultaneously for reconstruction; this operation aggregates the high-resolution patch representations above to generate the final high-resolution image, and the final reconstructed image is expected to be similar to the input LR image.