CN112381720A - Construction method of super-resolution convolutional neural network model - Google Patents

Construction method of super-resolution convolutional neural network model

Info

Publication number
CN112381720A
CN112381720A (Application CN202011380940.XA)
Authority
CN
China
Prior art keywords: model, neural network, convolutional neural, network model, super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011380940.XA
Other languages
Chinese (zh)
Inventor
刘明亮
王晓航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heilongjiang University
Original Assignee
Heilongjiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Heilongjiang University
Priority to CN202011380940.XA
Publication of CN112381720A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4046 Scaling the whole image or part thereof using neural networks

Abstract

The invention discloses a construction method of a super-resolution convolutional neural network model based on a portable system. Step 1: construct a convolutional neural network model; Step 2: train the convolutional neural network model from Step 1; Step 3: deploy the convolutional neural network model trained in Step 2 to realize the image super-resolution function. The invention adopts the up-sampling algorithm of the sub-pixel convolution layer of ESPCN, modifies the network structure before the sub-pixel convolution layer, and introduces residual blocks to construct the network.

Description

Construction method of super-resolution convolutional neural network model
Technical Field
The invention belongs to the technical field of image super-resolution; in particular, it relates to a construction method of a super-resolution convolutional neural network model.
Background
The input of a traditional super-resolution convolutional neural network, such as SRCNN, is usually a pseudo high-resolution image produced by a bicubic interpolation algorithm: the resolution is increased, but the image remains quite blurry, and its quality must be further improved by the convolutional neural network. In recent years, many network structures that perform the upsampling inside the network have been proposed, such as FSRCNN, which upsamples through a deconvolution layer, and ESPCN, which upsamples through a sub-pixel convolution layer; both reduce the size of the tensors flowing through the network.
Disclosure of Invention
The invention provides a construction method of a super-resolution convolutional neural network model, which adopts the up-sampling algorithm of the sub-pixel convolution layer of ESPCN, modifies the network structure before the sub-pixel convolution layer, and introduces residual blocks to construct the network.
The invention is realized by the following technical scheme:
a construction method of a super-resolution convolutional neural network model comprises the following steps
Step 1: constructing a convolutional neural network model;
step 2: training the convolution neural network model in the step 1;
and step 3: and (3) deploying the convolutional neural network model trained in the step (2) to realize the super-resolution function of the image.
Further, Step 1 specifically consists of constructing the convolutional neural network model using two techniques, namely convolution with 5 residual blocks and sub-pixel convolution.
Further, Step 2 specifically consists of training with 100 images of size 1920x1080x3, comprising 50 landscape photographs and 50 paintings, with a 9:1 split between training data and validation data. The input for training the convolutional neural network model is obtained by downscaling each image to 640x360x3 and feeding it into the model constructed in Step 1, while the original 1920x1080x3 image serves as the network's label; the finally trained convolutional neural network model is then saved.
Further, Step 3 specifically consists of performing model freezing, model quantization and model compilation on the convolutional neural network model trained in Step 2.
Further, model freezing specifically means combining the model's computation-graph definition and the model weights into a single file.
Furthermore, model quantization means correcting the quantized model with a quantization data set; the model can be conveniently quantized by calling the decent_q tool provided with the DNNDK toolkit from a script.
Further, model compilation compiles the quantized model into a model that the DPU can run.
The invention has the following beneficial effects:
1. The speed of the network inference stage is greatly improved.
2. The network inference stage consumes less memory.
3. The network's inference output metrics are not noticeably degraded.
Drawings
FIG. 1 is a schematic diagram of the convolutional neural network structure of the present invention.
FIG. 2 is a schematic diagram of the structure of the residual block of the present invention.
FIG. 3 is a comparative illustration of an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in FIGS. 1-2, a construction method of a super-resolution convolutional neural network model comprises the following steps:
Step 1: constructing a convolutional neural network model;
Step 2: training the convolutional neural network model from Step 1;
Step 3: deploying the convolutional neural network model trained in Step 2 to realize the image super-resolution function.
Further, Step 1 specifically consists of constructing the convolutional neural network model using two techniques, namely convolution with 5 residual blocks and sub-pixel convolution.
Further, the sub-pixel convolution construction specifically converts a low-resolution feature map (640x360x27), which has small spatial dimensions and a deep channel dimension, into a high-resolution feature map (1920x1080x3), which has large spatial dimensions and a shallow channel dimension, by means of pixel shuffling.
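The patent does not give layer-by-layer code for this architecture, so the following is only a minimal sketch of the described design: a feature-extraction convolution, 5 residual blocks with 64 kernels per convolutional layer (the figures given for Network A later in the description), a convolution producing 27 channels, and a sub-pixel (pixel-shuffle) layer that rearranges 640x360x27 into 1920x1080x3. The internal layout of the residual block (conv-ReLU-conv plus skip connection) and the use of TensorFlow/Keras are assumptions, not taken from the patent.

```python
import tensorflow as tf

def residual_block(x, filters=64):
    """One residual block; the conv-ReLU-conv plus skip layout is an assumption
    (FIG. 2 of the patent is not reproduced here)."""
    y = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    return tf.keras.layers.Add()([x, y])

def build_sr_model(scale=3, num_blocks=5, filters=64):
    """Feature extraction, 5 residual blocks, then sub-pixel (pixel-shuffle) upsampling."""
    inp = tf.keras.Input(shape=(None, None, 3))   # low-resolution RGB input
    x = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(inp)
    for _ in range(num_blocks):
        x = residual_block(x, filters)
    # project to scale^2 * 3 channels (27 for 3x), then shuffle pixels to high resolution
    x = tf.keras.layers.Conv2D(3 * scale * scale, 3, padding="same")(x)
    out = tf.keras.layers.Lambda(lambda t: tf.nn.depth_to_space(t, scale))(x)
    return tf.keras.Model(inp, out)

model = build_sr_model()
# A 640x360x27 feature map is rearranged to 1920x1080x3: each group of 9 channels
# becomes a 3x3 block of output pixels.
print(model(tf.zeros([1, 360, 640, 3])).shape)   # (1, 1080, 1920, 3)
```

Here tf.nn.depth_to_space performs exactly the pixel-shuffling rearrangement described above.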
Further, Step 2 specifically consists of training with 100 images of size 1920x1080x3, comprising 50 landscape photographs and 50 paintings, with a 9:1 split between training data and validation data. The input for training the convolutional neural network model is obtained by downscaling each image to 640x360x3 (i.e. by bicubic interpolation) and feeding it into the model constructed in Step 1, while the original 1920x1080x3 image serves as the network's label; the finally trained convolutional neural network model is then saved.
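As an illustration of this training setup, the sketch below builds (input, label) pairs by bicubic downscaling of 1920x1080 images to 640x360 and splits 100 images 90/10. The file paths, PNG format, MSE loss, Adam optimizer and epoch count are assumptions made for the sketch; the patent only specifies the image sizes, the landscape/painting mix, the 9:1 split and the bicubic reduction.

```python
import glob
import tensorflow as tf

# Hypothetical file layout; the patent only specifies 100 images of 1920x1080x3
# (50 landscapes, 50 paintings) with a 9:1 training/validation split.
paths = sorted(glob.glob("dataset/*.png"))
train_paths, val_paths = paths[:90], paths[90:]

def make_pair(path):
    """Return (low-resolution input, high-resolution label) for one image."""
    hr = tf.image.decode_png(tf.io.read_file(path), channels=3)
    hr = tf.image.convert_image_dtype(hr, tf.float32)           # 1920x1080x3 label
    lr = tf.image.resize(hr, [360, 640], method="bicubic")      # 640x360x3 input
    return lr, hr

train_ds = (tf.data.Dataset.from_tensor_slices(train_paths)
            .map(make_pair, num_parallel_calls=tf.data.AUTOTUNE).batch(1))
val_ds = (tf.data.Dataset.from_tensor_slices(val_paths)
          .map(make_pair, num_parallel_calls=tf.data.AUTOTUNE).batch(1))

model = build_sr_model()                      # from the sketch above
model.compile(optimizer="adam", loss="mse")   # optimizer/loss are assumptions
model.fit(train_ds, validation_data=val_ds, epochs=100)
model.save("sr_model")                        # store the trained model
```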
Further, Step 3 specifically consists of performing model freezing, model quantization and model compilation on the convolutional neural network model trained in Step 2.
Furthermore, model freezing specifically means combining the model's computation-graph definition and the model weights into a single file, which facilitates deployment of the model.
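A minimal sketch of such a freezing step is shown below, assuming the trained model is available as a TensorFlow 1.x checkpoint (the graph format the DNNDK tools consume); the checkpoint filenames and the output node name "sr_output" are placeholders, not values from the patent.

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Fold the trained variables into the graph definition as constants, so the
# computation-graph definition and the weights end up in a single .pb file.
# Checkpoint paths and the output node name are placeholders.
with tf.Session() as sess:
    saver = tf.train.import_meta_graph("sr_model.ckpt.meta")
    saver.restore(sess, "sr_model.ckpt")
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, output_node_names=["sr_output"])
    with tf.io.gfile.GFile("frozen_sr_model.pb", "wb") as f:
        f.write(frozen.SerializeToString())
```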
Further, model quantization converts the convolutional neural network model, whose parameters are of type Float32, to Int8 by a quantization operation; the quantized model is corrected with a quantization data set (which minimizes the adverse effect of the precision loss caused by quantization), and the model can be conveniently quantized by calling the decent_q tool provided with the DNNDK toolkit from a script.
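The patent names only the decent_q tool; the calibration input function and the command shown in the comments below follow the general DNNDK TensorFlow quantization flow and should be read as an assumption that may differ between DNNDK versions. The node names, image paths and preprocessing are placeholders.

```python
# graph_input_fn.py -- calibration input function consumed by decent_q.
# decent_q calls this once per calibration iteration and feeds the returned
# arrays into the frozen graph to estimate the Int8 quantization ranges.
import glob
import numpy as np
from PIL import Image

CALIB_PATHS = sorted(glob.glob("calib_images/*.png"))   # quantization data set (assumed path)
BATCH = 1

def calib_input(iter):
    imgs = []
    for path in CALIB_PATHS[iter * BATCH:(iter + 1) * BATCH]:
        img = Image.open(path).resize((640, 360), Image.BICUBIC)
        imgs.append(np.asarray(img, dtype=np.float32) / 255.0)
    return {"sr_input": np.stack(imgs)}   # key must match the graph's input node name

# Typical invocation (flags follow the DNNDK documentation and may vary by version):
#   decent_q quantize --input_frozen_graph frozen_sr_model.pb \
#       --input_nodes sr_input --input_shapes ?,360,640,3 --output_nodes sr_output \
#       --input_fn graph_input_fn.calib_input --calib_iter 100 --output_dir quantize_results
```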
Further, model compilation compiles the quantized model into a model that the DPU can run. The model can be compiled into an ELF file by calling the DNNC tool provided with the DNNDK toolkit from a script; this file is then packaged into a shared-object (.so) file so that it can be conveniently invoked through the DNNDK Python API.
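For completeness, the sketch below shows how the compiled model might then be driven from Python on the target board. The n2cube function names are written from memory of the DNNDK Python examples and, together with the kernel and node names, should be treated as assumptions rather than the patent's actual deployment code.

```python
import numpy as np
from dnndk import n2cube   # DNNDK Python bindings on the target board

KERNEL, INPUT_NODE, OUTPUT_NODE = "sr_kernel", "sr_input", "sr_output"   # placeholders

n2cube.dpuOpen()
kernel = n2cube.dpuLoadKernel(KERNEL)      # loads the model packaged from the DNNC output
task = n2cube.dpuCreateTask(kernel, 0)

lr = np.random.rand(360, 640, 3).astype(np.float32)          # a 640x360x3 input frame
n2cube.dpuSetInputTensorInHWCFP32(task, INPUT_NODE, lr, lr.size)
n2cube.dpuRunTask(task)                                       # inference runs on the DPU

size = n2cube.dpuGetOutputTensorSize(task, OUTPUT_NODE)
sr = np.array(n2cube.dpuGetOutputTensorInHWCFP32(task, OUTPUT_NODE, size))
sr = sr.reshape(1080, 1920, 3)                                # 1920x1080x3 output image

n2cube.dpuDestroyTask(task)
n2cube.dpuDestroyKernel(kernel)
n2cube.dpuClose()
```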
Example 2
Step 1: PSNR (peak signal-to-noise ratio) and SSIM (structural similarity) are metrics for evaluating the quality of the images output by the neural network; the higher they are, the better, while a smaller parameter count and a shorter training time are also better.
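For reference, PSNR and SSIM can be computed directly with TensorFlow's built-in image metrics; this helper is an illustration and is not part of the patent.

```python
import tensorflow as tf

def evaluate(hr_ref, sr_out):
    """PSNR (dB) and SSIM between a reference high-resolution image and a network
    output; both are float tensors in [0, 1] with shape [H, W, 3]."""
    return (float(tf.image.psnr(hr_ref, sr_out, max_val=1.0)),
            float(tf.image.ssim(hr_ref, sr_out, max_val=1.0)))
```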
Because sub-pixel convolution is used, the images flowing through the network are low-resolution images. The final effects are: the speed of the network inference stage is greatly improved; memory consumption in the inference stage is reduced; and the network's final output quality is improved.
For example, for a 4x super-resolution task with a low-resolution image of size 100x100 and an original high-resolution image of size 400x400: without sub-pixel convolution, the network input is an image pre-enlarged by bicubic interpolation, i.e. 400x400; with sub-pixel convolution, the input does not need pre-enlargement, i.e. it stays 100x100. This results in the following:
the speed of the network reasoning phase is greatly improved.
Through calculation, compared with the method that the sub-pixel convolution is not used, the common convolution with the output of 3 channels is used for replacement, the calculation amount is reduced by about 160 times; making the inference time decrease proportionally.
The network reasoning phase consumes less memory.
Through calculation, compared with the method that the sub-pixel convolution is not used, the common convolution with the output of 3 channels is used for replacement, and the memory consumption is reduced by about 16 times.
The net end effect is improved.
Experimental conditions: Network A (sub-pixel convolution) and Network B (ordinary convolution) are trained on the same training set for the same number of iterations and evaluated on the same test set, giving the following results:
           PSNR   SSIM
Network A  32.10  0.8958
Network B  31.23  0.8901
Compared with Network B, Network A improves PSNR (peak signal-to-noise ratio) by 2.79% and SSIM (structural similarity) by 0.64%, so the final inference result of the network is improved.
Because residual learning is used, the vanishing-gradient problem of deep networks during the training stage is alleviated; the final effect is that the network's convergence speed during training is greatly improved.
the experimental conditions are as follows: network a (using residual learning), network B (not using residual learning), other conditions being the same;
the experimental results are shown in fig. 3;
as is apparent from fig. 3, the network a using the residual learning technique has a significantly improved network convergence speed compared to the network B not using residual learning, which means that in the model training stage, the network a using the residual learning technique can obtain a better result with a smaller number of training iterations, and reduces the time and computational cost consumed by training the network.
Because of the lightweight design of the network structure, the network model is smaller and more compact. The final effects are: network inference speed is increased; memory consumption is reduced; and output quality degrades only slightly.
Experimental conditions: Network A uses 5 residual blocks with 64 convolution kernels per convolutional layer; Network B uses 10 residual blocks with 128 convolution kernels per convolutional layer; all other conditions are the same.
           PSNR   SSIM    Parameters  Training time
Network A  32.10  0.8958  101179      2657 s
Network B  32.81  0.9039  755931      8684 s
As can be seen from the table above, although the output quality of the 10-residual-block design is higher, its parameter count is far larger. At the cost of only a 2.16% reduction in PSNR and a 0.89% reduction in SSIM, the 5-residual-block design reduces the parameter count and training time by 86.61% and 69.40% respectively. A network model with a large parameter count and long training time is very disadvantageous for deployment on an embedded platform, since it greatly increases model inference time and memory consumption and raises cost through its demands on the hardware platform. Repeated tests show that the structure given in this invention achieves a good balance between performance and quality.
Step 3: Because model quantization is used, the network weights are converted from the Float32 type to the Int8 type. The final effects are: the speed of the network inference stage is greatly improved; memory consumption in the inference stage is reduced; and the network's inference output metrics are not noticeably degraded.
The speed of the network inference stage is greatly improved.
Before quantization the data type is Float32 (32-bit floating point), whose arithmetic consumes a large amount of DSP slice resources on the FPGA; because these resources are very limited, a batch of data must be split into many sub-batches that are computed separately. Computation with Int8 (8-bit integers) consumes far fewer resources, so a batch only needs to be split into a few sub-batches, and the computation time is greatly reduced.
The network inference stage consumes less memory.
Since the weights are converted to Int8, the feature maps computed inside the network are also of type Int8; compared with Float32, using Int8 directly reduces memory consumption by a factor of 4.
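The sketch below illustrates the idea behind Int8 quantization and the 4x storage saving; it is a toy symmetric-quantization example, whereas the actual decent_q tool calibrates quantization ranges per layer using the quantization data set.

```python
import numpy as np

def quantize_int8(x):
    """Toy symmetric Int8 quantization: map the observed range onto [-127, 127].
    The real decent_q tool calibrates ranges per layer from the quantization data set."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(3, 3, 64, 64).astype(np.float32)   # a Float32 weight tensor
q, s = quantize_int8(w)
print(w.nbytes / q.nbytes)    # 4.0 -- Int8 storage is one quarter of Float32
```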
The degradation of the network's inference output metrics is not noticeable.
Experimental conditions: Network A uses quantization; Network B does not.
           PSNR   SSIM
Network A  32.10  0.8958
Network B  32.06  0.8957
From these data, the evaluation metrics of the quantized network's output images decrease only slightly; the resulting drop in output-image quality is almost imperceptible to the human eye, while inference speed is greatly improved and memory consumption greatly reduced. Quantization is therefore very important for deploying the neural network model on a portable platform with low computing power and low power consumption.

Claims (5)

1. A construction method of a super-resolution convolutional neural network model, characterized by comprising the following steps:
Step 1: constructing a convolutional neural network model using two techniques, namely 5 residual blocks and sub-pixel convolution;
Step 2: training the convolutional neural network model from Step 1, using 100 images of size 1920x1080x3 comprising 50 landscape photographs and 50 paintings, with a 9:1 split between training data and validation data; obtaining the training input of the convolutional neural network model by downscaling the images to 640x360x3 and feeding them into the model constructed in Step 1, using the original 1920x1080x3 images as the network's labels, and saving the finally trained convolutional neural network model;
Step 3: deploying the convolutional neural network model trained in Step 2 to realize the image super-resolution function.
2. The method for constructing a super-resolution convolutional neural network model according to claim 1, characterized in that Step 3 specifically consists of performing model freezing, model quantization and model compilation on the convolutional neural network model trained in Step 2.
3. The method for constructing a super-resolution convolutional neural network model according to claim 2, characterized in that model freezing specifically means combining the model's computation-graph definition and the model weights into a single file.
4. The method for constructing a super-resolution convolutional neural network model according to claim 2, characterized in that model quantization means correcting the quantized model with a quantization data set, and the model can be conveniently quantized by calling the decent_q tool provided with the DNNDK toolkit from a script.
5. The method for constructing a super-resolution convolutional neural network model according to claim 2, characterized in that model compilation compiles the quantized model into a model that the DPU can run.
CN202011380940.XA 2020-11-30 2020-11-30 Construction method of super-resolution convolutional neural network model Pending CN112381720A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011380940.XA CN112381720A (en) 2020-11-30 2020-11-30 Construction method of super-resolution convolutional neural network model


Publications (1)

Publication Number Publication Date
CN112381720A true CN112381720A (en) 2021-02-19

Family

ID=74589361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011380940.XA Pending CN112381720A (en) 2020-11-30 2020-11-30 Construction method of super-resolution convolutional neural network model

Country Status (1)

Country Link
CN (1) CN112381720A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871099A (en) * 2016-09-23 2018-04-03 北京眼神科技有限公司 Face detection method and apparatus
CN109993702A (en) * 2019-04-10 2019-07-09 大连民族大学 Based on the language of the Manchus image super-resolution rebuilding method for generating confrontation network
CN110009568A (en) * 2019-04-10 2019-07-12 大连民族大学 The generator construction method of language of the Manchus image super-resolution rebuilding
CN111105352A (en) * 2019-12-16 2020-05-05 佛山科学技术学院 Super-resolution image reconstruction method, system, computer device and storage medium
CN112001847A (en) * 2020-08-28 2020-11-27 徐州工程学院 Method for generating high-quality image by relatively generating antagonistic super-resolution reconstruction model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
谢旺 (Xie Wang): "Research on Learning-Based Video Super-Resolution Reconstruction" [基于学习的视频超分辨率重建研究], China Excellent Master's and Doctoral Theses Full-text Database (Master's), Information Science and Technology Series *

Similar Documents

Publication Publication Date Title
CN106991646B (en) Image super-resolution method based on dense connection network
CN107123089B (en) Remote sensing image super-resolution reconstruction method and system based on depth convolution network
CN110490082B (en) Road scene semantic segmentation method capable of effectively fusing neural network features
Zhu et al. Efficient single image super-resolution via hybrid residual feature learning with compact back-projection network
CN110223234A (en) Depth residual error network image super resolution ratio reconstruction method based on cascade shrinkage expansion
CN111353939B (en) Image super-resolution method based on multi-scale feature representation and weight sharing convolution layer
CN113538234A (en) Remote sensing image super-resolution reconstruction method based on lightweight generation model
WO2022262660A1 (en) Pruning and quantization compression method and system for super-resolution network, and medium
CN112884650B (en) Image mixing super-resolution method based on self-adaptive texture distillation
CN113256496B (en) Lightweight progressive feature fusion image super-resolution system and method
CN115955563A (en) Satellite-ground combined multispectral remote sensing image compression method and system
Yang et al. An image super-resolution network based on multi-scale convolution fusion
CN114626984A (en) Super-resolution reconstruction method for Chinese text image
CN110223224A (en) A kind of Image Super-resolution realization algorithm based on information filtering network
CN110782396B (en) Light-weight image super-resolution reconstruction network and reconstruction method
CN112381720A (en) Construction method of super-resolution convolutional neural network model
Liu et al. A fast and accurate super-resolution network using progressive residual learning
CN111179171A (en) Image super-resolution reconstruction method based on residual module and attention mechanism
CN110956669A (en) Image compression coding method and system
CN113838104B (en) Registration method based on multispectral and multimodal image consistency enhancement network
CN116071441A (en) Remote sensing image compression method based on end-to-end convolutional neural network
CN112581366B (en) Portable image super-resolution system and system construction method
Wang et al. C 3 Net: A Cross-Channel Cross-Scale and Cross-Stage Network for Single Image Super-Resolution
CN113506215B (en) Super-resolution image reconstruction method and device based on wide activation and electronic equipment
Esmaeilzehi et al. PHMNet: A Deep Super Resolution Network using Parallel and Hierarchical Multi-scale Residual Blocks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination