CN110782396B - Light-weight image super-resolution reconstruction network and reconstruction method - Google Patents


Info

Publication number
CN110782396B
CN110782396B (application CN201911166995.8A)
Authority
CN
China
Prior art keywords
resolution
network
image
depth separable
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911166995.8A
Other languages
Chinese (zh)
Other versions
CN110782396A (en)
Inventor
赵国盛
范赐恩
邹炼
田胜
杨烨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201911166995.8A
Publication of CN110782396A
Application granted
Publication of CN110782396B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T3/4046 Scaling the whole image or part thereof using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a light-weight image super-resolution reconstruction network and reconstruction method that convert an input low-resolution image into an output high-resolution image preserving both image structure and perceptual quality. Network parameters are sparsified by pruning optimization, weight sharing is realized with a cyclic structure, and depth separable convolution and depth separable deconvolution reduce the computation of the network while compressing its parameters; a saturation cut-off quantization algorithm based on the KL distance converts floating point operations into fixed point operations, raising the forward inference speed of the network. The accuracy of the lightweight network, i.e., the image super-resolution quality, is maintained while the computation of the network is reduced and its forward inference is accelerated.

Description

Light-weight image super-resolution reconstruction network and reconstruction method
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a light-weight image super-resolution reconstruction network and a reconstruction method.
Background
Humans obtain far more information through image carriers than through carriers such as sound, and image resolution is an important measure of how well image content is conveyed; image super-resolution reconstruction is an emerging technique that can effectively increase image resolution. In recent years, learning-based image super-resolution reconstruction has developed well in terms of accuracy, but the large parameter counts that learning-based methods entail have become a key obstacle to practical application of the technology.
Prior patents on accelerating image super-resolution reconstruction include:
1) Chinese patent application No. CN201811576216.7, a fast image super-resolution reconstruction method based on deep learning, which adopts a shallow network structure to increase the computation speed of the network and a nested network to improve its nonlinear representation capability, thereby improving the reconstruction quality; however, because the network has few convolution blocks, its super-resolution reconstruction quality is poor when images are amplified by large magnification factors.
2) Chinese patent application No. CN201910272182.0, a super-resolution image reconstruction method based on a lightweight network, which improves the original EDSR (enhanced deep super-resolution) network structure with the lightweight, efficient ShuffleNet unit designed for mobile convolutional neural networks, and applies network pruning, weight sharing and Huffman coding to quantize the network parameters, greatly reducing their number; however, the input of each layer remains floating point during forward inference, so the method places high demands on hardware and is unfavorable for practical application and popularization.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a light-weight image super-resolution reconstruction network and reconstruction method that reduce the number of parameters and increase the computation speed while maintaining the accuracy of the lightweight network.
The technical scheme adopted by the invention to solve this problem is as follows: a light-weight image super-resolution reconstruction network comprising a low-resolution feature extraction module, a depth separable circulation module and a reconstruction module; the low-resolution feature extraction module converts the input image from m channels to n channels, where m < n, and extracts features that it sends to the depth separable circulation module; the depth separable circulation module performs T cyclic operations on the received image to extract its low-resolution and high-resolution information, which it sends to the reconstruction module; the reconstruction module integrates the channels of the received image into m channels and deconvolves them into a high-resolution image;
according to the scheme, the depth separable circulation module comprises a depth separable convolution and a depth separable deconvolution; the size of the convolution kernel for the depth-separable convolution and depth-separable deconvolution corresponds to the magnification of the image, with a 2-fold magnification corresponding to a convolution kernel of 6x6, a 3-fold magnification corresponding to a convolution kernel of 7x7, and a 4-fold magnification corresponding to a convolution kernel of 8x8.
Further, the depth separable convolution includes channel-by-channel convolution and point-by-point convolution; the depth separable deconvolution includes channel-by-channel deconvolution and point-by-point deconvolution.
Further, the depth separable circulation module further comprises a 1x1 convolution block; the 1x1 convolution block is used for converting the low-resolution features and high-resolution features of the input multi-channel image into r channels, where m < r < n, which are then sent into the depth separable convolution and depth separable deconvolution to extract features.
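For concreteness, here is a minimal PyTorch sketch of the depth separable convolution and deconvolution blocks described above (channel-by-channel followed by point-by-point, with the kernel size tied to the magnification); the class names are illustrative, and the stride/padding choice is an assumption consistent with strided up/down projection at the stated kernel sizes:

import torch.nn as nn

# kernel size follows the magnification: x2 -> 6x6, x3 -> 7x7, x4 -> 8x8
KERNEL = {2: 6, 3: 7, 4: 8}

class DepthSeparableConv(nn.Module):
    """Channel-by-channel convolution followed by point-by-point (1x1) convolution."""
    def __init__(self, channels, scale):
        super().__init__()
        k = KERNEL[scale]
        # groups=channels gives one filter per channel; stride=scale downsamples
        self.depthwise = nn.Conv2d(channels, channels, k, stride=scale,
                                   padding=(k - scale) // 2, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class DepthSeparableDeconv(nn.Module):
    """Channel-by-channel deconvolution followed by point-by-point convolution."""
    def __init__(self, channels, scale):
        super().__init__()
        k = KERNEL[scale]
        # transposed convolution with stride=scale upsamples by the magnification
        self.depthwise = nn.ConvTranspose2d(channels, channels, k, stride=scale,
                                            padding=(k - scale) // 2, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

With these stride/padding values the output of DepthSeparableConv is exactly 1/scale of the input size and the output of DepthSeparableDeconv is exactly scale times the input size, for all three magnifications.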
A light-weight image super-resolution reconstruction method comprises the following steps:
s1: constructing an image super-resolution reconstruction network comprising a low-resolution feature extraction module, a depth separable circulation module and a reconstruction module;
s2: light-weighting the image super-resolution reconstruction network;
s3: and inputting a test low-resolution image into a light-weighted image super-resolution reconstruction network, and generating a high-resolution image through one-time forward propagation calculation.
Further, in the step S2, the specific steps are as follows:
s21: training the network obtained in the step S1 and storing a network model;
s22: carrying out pruning optimization on the network obtained in the step S21;
s23: circularly training the network obtained in the step S22 and storing a network model;
s24: and quantifying the weight of the network obtained in the step S23 and the input of the specific layer to obtain the light-weight image super-resolution reconstruction network.
Further, in the step S21, the specific steps are:
s211: setting the magnification factor as S, cropping the length and width of a group of high-resolution images to the nearest integer multiple of S to form the target image data set, with target image size C_h × C_w; down-sampling all target images by a factor of S to obtain the low-resolution image data set, in which the input image size is (C_h/S) × (C_w/S);
taking the two image data sets as the training data set;
s212: putting the low-resolution images and the target images in one-to-one correspondence; taking a small block y of the target image with size P_h × P_w, where the size of y corresponds to the magnification of the image: 2 times magnification corresponds to P_h = P_w = 60, 3 times to P_h = P_w = 50, and 4 times to P_h = P_w = 40; taking the small block x of the low-resolution image corresponding to the target small block y, with size (P_h/S) × (P_w/S);
taking y and x as a training sample pair;
s213: sequentially inputting all training sample pairs of the training data set into the network obtained in step S1 and gradually updating the network parameters by the forward and backward propagation algorithm; setting the loss function to L1 and iterating over the whole training set epoch by epoch until the model parameters and the total loss of the network converge, then storing the model parameters of the whole network.
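As an illustration of S211-S213, a hedged PyTorch sketch of the patch pairing and L1 training loop; the optimizer, learning rate, and epoch count are assumptions not specified in the text:

import random
import torch
import torch.nn.functional as F

PATCH = {2: 60, 3: 50, 4: 40}   # HR patch side P_h = P_w per magnification

def make_pair(hr, lr, scale):
    """Crop a target patch y and the matching low-resolution patch x."""
    p = PATCH[scale]
    top = random.randrange(0, hr.shape[-2] - p + 1, scale)    # stay on the S-grid
    left = random.randrange(0, hr.shape[-1] - p + 1, scale)
    y = hr[..., top:top + p, left:left + p]
    x = lr[..., top // scale:(top + p) // scale, left // scale:(left + p) // scale]
    return x, y

def train(net, sample_pairs, epochs=100):
    opt = torch.optim.Adam(net.parameters(), lr=1e-4)   # optimizer is an assumption
    for _ in range(epochs):                             # iterate until convergence
        for x, y in sample_pairs:
            loss = F.l1_loss(net(x), y)                 # L1 loss, as in S213
            opt.zero_grad()
            loss.backward()
            opt.step()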
Further, step S22 is specifically: setting to zero the 50% of the weights of the network obtained in step S21 with the smallest absolute values.
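A minimal sketch of this magnitude-pruning step, assuming per-parameter-tensor thresholds (whether the 50% is computed per layer or globally is not specified in the text):

import torch

def prune_half(net):
    """Zero the 50% of weights with the smallest absolute values."""
    with torch.no_grad():
        for param in net.parameters():
            threshold = param.abs().median()      # 50th percentile of |w|
            param.mul_(param.abs() >= threshold)  # pruned weights become zero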
Further, in the step S24, the specific steps are:
s241: letting the original weight be W_float, the maximum absolute value of the original weights be |W_float|_max, the quantized weight be W_int8, and the round-toward-zero function be int(), performing unsaturated quantization on the weights of the network obtained in step S23, quantizing each weight from a floating point number to a fixed point number:
W_int8 = int(W_float / |W_float|_max × 127);
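This unsaturated quantization can be sketched in a few lines (numpy for illustration; np.fix rounds toward zero):

import numpy as np

def quantize_weights(w_float):
    """W_int8 = int(W_float / |W_float|_max * 127)."""
    scale = np.abs(w_float).max()                     # |W_float|_max
    return np.fix(w_float / scale * 127).astype(np.int8)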
s242: performing KL-distance-based saturation cut-off quantization on the input of the specific layers of the network obtained in step S241, using N pictures; letting the maximum-absolute-value vector of each layer's input data be H and its average maximum be H̄; the vector length is T·N for the depth separable circulation module and N for non-cyclic modules; with sum() denoting the sum of all elements of a vector, quantizing layer by layer gives the average maximum of a non-cyclic module as:
H̄ = sum(H) / N;
and the average maximum of the cyclic module as:
H̄ = sum(H) / (T·N);
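In practice H can be gathered with forward hooks during calibration; a hedged sketch (the hook mechanics and layer registry are assumptions, not from the patent):

import numpy as np
import torch

def collect_average_max(net, layers, images):
    """Record max(|input|) per forward pass; the circulation module fires T
    times per picture, so it accumulates T*N samples while others get N."""
    maxima = {name: [] for name in layers}
    hooks = [m.register_forward_pre_hook(
                 lambda mod, inp, name=name: maxima[name].append(inp[0].abs().max().item()))
             for name, m in layers.items()]
    with torch.no_grad():
        for img in images:                   # N calibration pictures
            net(img)
    for h in hooks:
        h.remove()
    return {name: float(np.mean(v)) for name, v in maxima.items()}   # H-bar per layer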
dividing each layer of data into 1024 groups bin[0], bin[1], bin[2], …, bin[1023], and taking the midpoint of each group as the representative value of the group; with the rounding-up function ceil(), the width w of each group of data is:
w = ceil(H̄ / 1024);
setting the optimal threshold at which each group of data is mapped from floating point to fixed point as th, and traversing candidate thresholds over groups 128 to 1023, in turn assuming that the i-th group is optimal, where 127 < i ≤ 1023; constructing the reference array P = [bin[0], bin[1], bin[2], …, bin[i-1]] and, to avoid losing the data beyond the boundary, adding all of it to the last entry of P, so that the modified reference array P′ satisfies:
P′[i-1] = P[i-1] + sum(bin[i], …, bin[1023]), with P′[j] = P[j] for 0 ≤ j < i-1;
normalizing the modified reference array P′:
P′ = P′ / sum(P′);
setting the candidate array Q as the result of mapping the corrected reference array P′ onto 128 groups of data (so that Q contains 128 entries), then expanding Q back to i entries so that Q has the same length as P′, and normalizing Q:
Q = Q / sum(Q);
setting the KL distance function of the quantized data and the original data as KL (), wherein the KL distance d between the corrected reference array P' and the candidate array Q is as follows:
d=KL(P′,Q);
storing the KL distance of each candidate obtained in the traversal; letting m be the index of the minimum KL distance d_min, the optimal threshold th at which each group of data is mapped from floating point to fixed point is:
th = (m + 0.5) · w;
letting the original data be I_float and the rounding-down function be fix(), the quantized fixed point data I_int8 is:
I_int8 = fix(I_float / th × 127), where inputs with absolute value beyond th are first saturated (cut off) to ±th.
further, in step S242, the specific layers include a 1 × 1 convolutional layer of the low resolution feature extraction module, a depth separable convolutional layer and a depth separable deconvolution layer of the depth separable loop module, and a depth separable deconvolution layer of the reconstruction module.
The invention has the beneficial effects that:
according to the light-weight image super-resolution reconstruction network and the reconstruction method, the image super-resolution reconstruction network based on the depth separable cycle module is constructed, the mapping relation from the low-resolution image to the high-resolution image is trained, and the trained network is subjected to light-weight operation to obtain the light-weight image super-resolution reconstruction network, so that the accuracy level of the light-weight network, namely the image super-resolution effect, is ensured, the calculated amount of the network is reduced, and the forward reasoning speed of the network is improved.
Drawings
FIG. 1 is a functional block diagram of an embodiment of the present invention.
Fig. 2 is a flow chart of an embodiment of the present invention.
Fig. 3 is a flow chart of forward propagation computation according to an embodiment of the present invention.
Fig. 4 is a functional block diagram of a deep separable loop module of an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Referring to fig. 1, the light-weight image super-resolution reconstruction network of the invention comprises a low-resolution feature extraction module, a depth separable circulation module and a reconstruction module. The depth separable circulation module comprises depth separable convolution, depth separable deconvolution and 1x1 convolution blocks; the kernel sizes of the depth separable convolution and deconvolution are determined by the image magnification: 2 times magnification corresponds to a 6x6 kernel, 3 times to 7x7, and 4 times to 8x8. The depth separable convolution comprises channel-by-channel convolution and point-by-point convolution, and the depth separable deconvolution comprises channel-by-channel deconvolution and point-by-point deconvolution. Referring to fig. 3, the low-resolution feature extraction module converts the input low-resolution image from 3 channels to 128 channels and extracts features that it sends to the depth separable circulation module. Referring to fig. 4, the depth separable circulation module converts the multi-channel low-resolution and high-resolution features into 32 channels through a 1x1 convolution block, extracts features through depth separable convolution and depth separable deconvolution, and performs 4 cyclic operations on the received image to extract its low-resolution and high-resolution information; this information improves the reconstruction accuracy of the network and is then sent to the reconstruction module. The reconstruction module integrates the channels of the received image into 3 channels and deconvolves them into a high-resolution image that preserves both image structure and human eye perception.
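To make the data flow concrete, a simplified PyTorch skeleton of the three modules with the channel counts of this embodiment (3 -> 128 -> 32 channels, T = 4 cycles), reusing KERNEL, DepthSeparableConv and DepthSeparableDeconv from the earlier sketch; the cross-cycle skip connections and activations are omitted, and the residual update is an assumption, so this is a hedged sketch rather than the exact topology:

import torch.nn as nn

class LightSRNet(nn.Module):
    def __init__(self, scale=2, T=4, n=128, r=32):
        super().__init__()
        k = KERNEL[scale]                             # 6/7/8 as above
        self.extract = nn.Conv2d(3, n, 3, padding=1)  # LR feature extraction: 3 -> n channels
        self.reduce = nn.Conv2d(n, r, 1)              # 1x1 block: n -> r channels
        self.up = DepthSeparableDeconv(r, scale)      # high-resolution projection
        self.down = DepthSeparableConv(r, scale)      # back to low resolution
        self.expand = nn.Conv2d(r, n, 1)
        self.T = T
        # reconstruction: deconvolution to the HR grid, then 3x3 convolution to 3 channels
        self.up_final = nn.ConvTranspose2d(n, n, k, stride=scale, padding=(k - scale) // 2)
        self.recon = nn.Conv2d(n, 3, 3, padding=1)

    def forward(self, x):
        feat = self.extract(x)
        for _ in range(self.T):                       # T = 4 cyclic operations
            z = self.reduce(feat)
            hr = self.up(z)                           # high-resolution information
            lr = self.down(hr)                        # low-resolution information
            feat = feat + self.expand(lr)             # residual update (an assumption)
        return self.recon(self.up_final(feat))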
Referring to fig. 2, a light-weighted image super-resolution reconstruction method includes the following steps:
s1: and constructing an image super-resolution reconstruction network comprising a low-resolution feature extraction module, a depth separable circulation module and a reconstruction module.
S2: light-weighting the image super-resolution reconstruction network, which comprises:
s21: training the network obtained in the step S1 and storing a network model:
s211: setting the magnification factor as S, cropping the length and width of a group of high-resolution images to the nearest integer multiple of S to form the target image data set, with target image size C_h × C_w; down-sampling all target images by a factor of S to obtain the low-resolution image data set, in which the input image size is (C_h/S) × (C_w/S);
taking the two image data sets as the training data set;
s212: putting the low-resolution images and the target images in one-to-one correspondence; taking a small block y of the target image with size P_h × P_w, where the size of y corresponds to the magnification of the image: 2 times magnification corresponds to P_h = P_w = 60, 3 times to P_h = P_w = 50, and 4 times to P_h = P_w = 40; taking the small block x of the low-resolution image corresponding to the target small block y, with size (P_h/S) × (P_w/S);
taking y and x as a training sample pair;
s213: sequentially inputting all training sample pairs of the training data set into the network obtained in step S1 and gradually updating the network parameters by the forward and backward propagation algorithm; setting the loss function to L1 and iterating over the whole training set epoch by epoch until the model parameters and the total loss of the network converge, then storing the model parameters of the whole network.
S22: pruning optimization is carried out on the network obtained in step S21, namely, the 50% of its weights with the smallest absolute values are set to zero;
s23: circularly training the network obtained in the step S22 and storing a network model;
s24: and quantifying the weight of the network obtained in the step S23 and the input of the specific layer to obtain a light-weight image super-resolution reconstruction network:
s241: letting the original weight be W_float, the maximum absolute value of the original weights be |W_float|_max, the quantized weight be W_int8, and the round-toward-zero function be int(), performing unsaturated quantization on the weights of the network obtained in step S23, quantizing each weight from a floating point number to a fixed point number:
W_int8 = int(W_float / |W_float|_max × 127);
s242: taking 50 pictures and performing saturation cut-off quantization based on the KL distance on the input of the specific layers of the network obtained in step S241, the specific layers comprising the 1x1 convolution layer of the low-resolution feature extraction module, the depth separable convolution layer and depth separable deconvolution layer of the depth separable circulation module, and the depth separable deconvolution layer of the reconstruction module; letting the maximum-absolute-value vector of each layer's input data be H and the average maximum be H̄; since the depth separable circulation module cycles 4 times during forward inference, its vector length is 200 while that of a non-cyclic module is 50; with sum() denoting the sum of all elements of a vector, quantizing layer by layer gives the average maximum of a non-cyclic module as:
H̄ = sum(H) / 50;
and the average maximum of the cyclic module as:
H̄ = sum(H) / 200;
dividing each layer of data into 1024 groups bin[0], bin[1], bin[2], …, bin[1023], and taking the midpoint of each group as the representative value of the group; with the rounding-up function ceil(), the width w of each group of data is:
w = ceil(H̄ / 1024);
setting the optimal threshold at which each group of data is mapped from floating point to fixed point as th, and traversing candidate thresholds over groups 128 to 1023, in turn assuming that the i-th group is optimal, where 127 < i ≤ 1023; constructing the reference array P = [bin[0], bin[1], bin[2], …, bin[i-1]] and, to avoid losing the data beyond the boundary, adding all of it to the last entry of P, so that the modified reference array P′ satisfies:
P′[i-1] = P[i-1] + sum(bin[i], …, bin[1023]), with P′[j] = P[j] for 0 ≤ j < i-1;
normalizing the modified reference array P′:
P′ = P′ / sum(P′);
setting the candidate array Q as the result of mapping the corrected reference array P′ onto 128 groups of data (so that Q contains 128 entries), then expanding Q back to i entries so that Q has the same length as P′, and normalizing Q:
Q = Q / sum(Q);
setting the KL distance function of the quantized data and the original data as KL (), wherein the KL distance d between the corrected reference array P' and the candidate array Q is as follows:
d=KL(P′,Q);
storing the KL distance of each candidate obtained in the traversal; letting m be the index of the minimum KL distance d_min, the optimal threshold th at which each group of data is mapped from floating point to fixed point is:
th = (m + 0.5) · w;
the original data is I_float; with the rounding function fix(), the quantized fixed point data I_int8 is:
I_int8 = fix(I_float / th × 127), where inputs with absolute value beyond th are first saturated (cut off) to ±th.
the code of the quantization algorithm is as follows:
Input:FP32 histogram H with 1024bins:bin[0],…,bin[1023]
for i in range(128,1024):
P=[bin[0],...,bin[i-1]]
Figure BDA0002287722810000082
P[i-1]+=S
P/=∑(P)
Q=quantize[bin[0],…,bin[i-1]]into 128levels
Q=Q expend to‘i’bins
Q/=∑(Q)
divergence[i]=KL_divergence(P,Q)
end for
m=index(divergence,min(divergence))
th=(m+0.5)*w
s3: and inputting a test low-resolution image into a light-weighted image super-resolution reconstruction network, and generating a high-resolution image through one-time forward propagation calculation.
The above embodiments are only used for illustrating the design idea and features of the present invention, and the purpose of the present invention is to enable those skilled in the art to understand the content of the present invention and implement the present invention accordingly, and the protection scope of the present invention is not limited to the above embodiments. Therefore, all equivalent changes and modifications made in accordance with the principles and concepts disclosed herein are intended to be included within the scope of the present invention.

Claims (7)

1. A light-weight image super-resolution reconstruction network, characterized in that: it comprises a low-resolution feature extraction module, a depth separable circulation module and a reconstruction module;
the low-resolution feature extraction module is used for converting the input image from m channels to n channels, wherein m < n, and for extracting features and sending them to the depth separable circulation module;
the depth separable circulation module is used for performing T cyclic operations on the received image to extract its low-resolution information and high-resolution information and sending them to the reconstruction module;
the depth separable cycle module comprises a depth separable convolution and a depth separable deconvolution; the size of the convolution kernel of the depth separable convolution and the depth separable deconvolution corresponds to the magnification of the image, the convolution kernel corresponding to 2 times of magnification is 6x6, the convolution kernel corresponding to 3 times of magnification is 7x7, and the convolution kernel corresponding to 4 times of magnification is 8x8;
the depth separable convolution includes channel-by-channel convolution and point-by-point convolution; the depth separable deconvolution includes channel-by-channel deconvolution and point-by-point deconvolution;
the depth separable cyclic module further comprises a 1x1 convolution block; the 1x1 convolution block is used for converting the low-resolution features and high-resolution features of the input multi-channel image into r channels, wherein m < r < n, which are then sent into the depth separable convolution and depth separable deconvolution to extract features;
the depth separable cycle module is used for performing 4 cyclic operations on the received image in the order: 1x1 convolution, depth separable deconvolution, depth separable convolution, 1x1 convolution, depth separable deconvolution, 1x1 convolution, depth separable convolution, wherein the output of the first 1x1 convolution is additionally input into the second and fourth 1x1 convolutions, the output of the first depth separable deconvolution is additionally input into the third and fifth 1x1 convolutions, and the output of the first depth separable convolution is additionally input into the fourth and sixth 1x1 convolutions, so as to extract the low-resolution information and high-resolution information of the received image; the reconstruction accuracy of the network is improved by utilizing this information, which is then sent to the reconstruction module;
the reconstruction module is used for integrating the channels of the received image into m channels and deconvolving them into a high-resolution image;
the reconstruction module comprises a deconvolution and a 3x3 convolution; with m = 3, the reconstruction module integrates the channels of the received image into 3 channels and deconvolves them into a high-resolution image that preserves both image structure and human eye perception.
2. The reconstruction method of the light-weighted image super-resolution reconstruction network according to claim 1, wherein the reconstruction method comprises the following steps: the method comprises the following steps:
s1: constructing an image super-resolution reconstruction network comprising a low-resolution feature extraction module, a depth separable circulation module and a reconstruction module;
s2: light-weighting the image super-resolution reconstruction network;
s3: and inputting a test low-resolution image into a light-weighted image super-resolution reconstruction network, and generating a high-resolution image through one-time forward propagation calculation.
3. The reconstruction method according to claim 2, characterized in that: in the step S2, the specific steps are as follows:
s21: training the network obtained in the step S1 and storing a network model;
s22: pruning optimization is carried out on the network obtained in the step S21;
s23: circularly training the network obtained in the step S22 and storing a network model;
s24: and quantifying the weight of the network obtained in the step S23 and the input of the specific layer to obtain the light-weight image super-resolution reconstruction network.
4. The reconstruction method according to claim 3, characterized in that: in the step S21, the specific steps are as follows:
s211: setting the magnification factor as S, cropping the length and width of a group of high-resolution images to the nearest integer multiple of S to form the target image data set, with target image size C_h × C_w; down-sampling all target images by a factor of S to obtain the low-resolution image data set, in which the input image size is (C_h/S) × (C_w/S);
taking the two image data sets as the training data set;
s212: putting the low-resolution images and the target images in one-to-one correspondence; taking a small block y of the target image with size P_h × P_w, where the size of y corresponds to the magnification of the image: 2 times magnification corresponds to P_h = P_w = 60, 3 times to P_h = P_w = 50, and 4 times to P_h = P_w = 40; taking the small block x of the low-resolution image corresponding to the target small block y, with size (P_h/S) × (P_w/S);
taking y and x as a training sample pair;
s213: sequentially inputting all training sample pairs of the training data set into the network obtained in step S1 and gradually updating the network parameters by the forward and backward propagation algorithm; setting the loss function to L1 and iterating over the whole training set epoch by epoch until the model parameters and the total loss of the network converge, then storing the model parameters of the whole network.
5. The reconstruction method according to claim 4, characterized in that: step S22 is specifically: the 50% of the weights of the network obtained in step S21 with the smallest absolute values are set to zero.
6. The reconstruction method according to claim 5, wherein: in the step S24, the specific steps are as follows:
s241: letting the original weight be W_float, the maximum absolute value of the original weights be |W_float|_max, the quantized weight be W_int8, and the round-toward-zero function be int(), performing unsaturated quantization on the weights of the network obtained in step S23, quantizing each weight from a floating point number to a fixed point number:
W_int8 = int(W_float / |W_float|_max × 127);
s242: performing KL-distance-based saturation cut-off quantization on the input of the specific layers of the network obtained in step S241, using N pictures; letting the maximum-absolute-value vector of each layer's input data be H and its average maximum be H̄; the vector length is T·N for the depth separable circulation module and N for non-cyclic modules; with sum() denoting the sum of all elements of a vector, quantizing layer by layer gives the average maximum of a non-cyclic module as:
H̄ = sum(H) / N;
and the average maximum of the cyclic module as:
H̄ = sum(H) / (T·N);
dividing each layer of data into 1024 groups bin[0], bin[1], bin[2], …, bin[1023], and taking the midpoint of each group as the representative value of the group; with the rounding-up function ceil(), the width w of each group of data is:
w = ceil(H̄ / 1024);
setting the optimal threshold at which each group of data is mapped from floating point to fixed point as th, and traversing candidate thresholds over groups 128 to 1023, in turn assuming that the i-th group is optimal, where 127 < i ≤ 1023; constructing the reference array P = [bin[0], bin[1], bin[2], …, bin[i-1]] and, to avoid losing the data beyond the boundary, adding all of it to the last entry of P, so that the modified reference array P′ satisfies:
P′[i-1] = P[i-1] + sum(bin[i], …, bin[1023]), with P′[j] = P[j] for 0 ≤ j < i-1;
normalizing the modified reference array P′:
P′ = P′ / sum(P′);
setting the candidate array Q as the result of mapping the corrected reference array P′ onto 128 groups of data (so that Q contains 128 entries), then expanding Q back to i entries so that Q has the same length as P′, and normalizing the candidate array Q:
Q = Q / sum(Q);
setting the KL distance function of the quantized data and the original data as KL (), wherein the KL distance d between the corrected reference array P' and the candidate array Q is as follows:
d=KL(P′,Q);
storing the KL distance of each candidate obtained in the traversal; letting m be the index of the minimum KL distance d_min, the optimal threshold th at which each group of data is mapped from floating point to fixed point is:
th = (m + 0.5) · w;
letting the original data be I_float and the rounding-down function be fix(), the quantized fixed point data I_int8 is:
I_int8 = fix(I_float / th × 127), where inputs with absolute value beyond th are first saturated (cut off) to ±th.
7. The reconstruction method according to claim 6, characterized in that: in step S242, the specific layers include a 1 × 1 convolutional layer of the low resolution feature extraction module, a depth separable convolutional layer and a depth separable deconvolution layer of the depth separable cyclic module, and a depth separable deconvolution layer of the reconstruction module.
CN201911166995.8A 2019-11-25 2019-11-25 Light-weight image super-resolution reconstruction network and reconstruction method Active CN110782396B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911166995.8A CN110782396B (en) 2019-11-25 2019-11-25 Light-weight image super-resolution reconstruction network and reconstruction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911166995.8A CN110782396B (en) 2019-11-25 2019-11-25 Light-weight image super-resolution reconstruction network and reconstruction method

Publications (2)

Publication Number Publication Date
CN110782396A CN110782396A (en) 2020-02-11
CN110782396B true CN110782396B (en) 2023-03-28

Family

ID=69392380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911166995.8A Active CN110782396B (en) 2019-11-25 2019-11-25 Light-weight image super-resolution reconstruction network and reconstruction method

Country Status (1)

Country Link
CN (1) CN110782396B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860770A (en) * 2020-06-18 2020-10-30 苏州浪潮智能科技有限公司 Model compression method and system integrating clipping and quantization
CN115239557B (en) * 2022-07-11 2023-10-24 河北大学 Light X-ray image super-resolution reconstruction method

Citations (1)

Publication number Priority date Publication date Assignee Title
CN106991646A (en) * 2017-03-28 2017-07-28 福建帝视信息科技有限公司 A kind of image super-resolution method based on intensive connection network

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN104866900B (en) * 2015-01-29 2018-01-19 北京工业大学 A kind of deconvolution neural network training method
CN108108811B (en) * 2017-12-18 2021-07-30 南京地平线机器人技术有限公司 Convolution calculation method in neural network and electronic device
CN108717732B (en) * 2018-05-21 2022-05-17 电子科技大学 Expression tracking method based on MobileNet model
CN109118432B (en) * 2018-09-26 2022-09-13 福建帝视信息科技有限公司 Image super-resolution reconstruction method based on rapid cyclic convolution network


Also Published As

Publication number Publication date
CN110782396A (en) 2020-02-11

Similar Documents

Publication Publication Date Title
CN109241972B (en) Image semantic segmentation method based on deep learning
CN106991646B (en) Image super-resolution method based on dense connection network
CN107340993B (en) Arithmetic device and method
CN110009565A (en) A kind of super-resolution image reconstruction method based on lightweight network
CN110782396B (en) Light-weight image super-resolution reconstruction network and reconstruction method
CN112508125A (en) Efficient full-integer quantization method of image detection model
CN111147862B (en) End-to-end image compression method based on target coding
CN107395211B (en) Data processing method and device based on convolutional neural network model
CN110533591B (en) Super-resolution image reconstruction method based on codec structure
CN111489305B (en) Image enhancement method based on reinforcement learning
CN111882053B (en) Neural network model compression method based on splicing convolution
CN110781912A (en) Image classification method based on channel expansion inverse convolution neural network
CN115759237A (en) End-to-end deep neural network model compression and heterogeneous conversion system and method
CN115100039B (en) Lightweight image super-resolution reconstruction method based on deep learning
CN113595993A (en) Vehicle-mounted sensing equipment joint learning method for model structure optimization under edge calculation
CN113807497B (en) Unpaired image translation method for enhancing texture details
CN111461978A (en) Attention mechanism-based resolution-by-resolution enhanced image super-resolution restoration method
CN115022637A (en) Image coding method, image decompression method and device
CN109063834B (en) Neural network pruning method based on convolution characteristic response graph
CN111882028B (en) Convolution operation device for convolution neural network
CN117151178A (en) FPGA-oriented CNN customized network quantification acceleration method
CN116385454A (en) Medical image segmentation method based on multi-stage aggregation
CN112446461A (en) Neural network model training method and device
CN115983343A (en) YOLOv4 convolutional neural network lightweight method based on FPGA
CN115147283A (en) Image reconstruction method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant