CN115965527A - Mobile terminal lightweight image super-resolution reconstruction method based on convolutional neural network - Google Patents

Mobile terminal lightweight image super-resolution reconstruction method based on convolutional neural network

Info

Publication number
CN115965527A
CN115965527A (application CN202211647267.0A)
Authority
CN
China
Prior art keywords
resolution
network
mobile terminal
training
operator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211647267.0A
Other languages
Chinese (zh)
Inventor
周洲
晁佳豪
高洪帆
龚嘉礼
杨争峰
曾振柄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN202211647267.0A priority Critical patent/CN115965527A/en
Publication of CN115965527A publication Critical patent/CN115965527A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a mobile-terminal lightweight image super-resolution reconstruction method based on a convolutional neural network. A training data set is first obtained. A lightweight image super-resolution network suitable for the mobile terminal is then constructed, comprising a training-time network and an inference-time network, where the inference-time network is obtained from the training-time network by an equivalent transformation method. The equivalent transformation method replaces operators that are time-consuming on the mobile end with less time-consuming convolutions. During training, using the training data set and the constructed lightweight super-resolution network, the generated high-resolution pictures are compared against the original pictures in the data set to compute a loss, and back-propagation is performed based on this loss until training is finished. During inference, the simpler, equivalently transformed network is used; the model is small and the output speed is high. The advantages of the invention are: the method is optimized and adapted for mobile-terminal scenarios, and a simple and efficient super-resolution network is constructed using an equivalent transformation method.

Description

Mobile terminal lightweight image super-resolution reconstruction method based on convolutional neural network
Technical Field
The invention relates to the technical field of digital image processing, in particular to a mobile-end lightweight image super-resolution reconstruction method only depending on a convolutional neural network and based on an equivalent transformation technology.
Background
Image and video super-resolution converts low-resolution content to high resolution and has wide application in many fields, including medical imaging, satellite imaging, and the like. For example, companies have upgraded Standard Definition Television (SDTV) to High Definition Television (HDTV) using video super-resolution techniques based on deep learning. In recent years, with the popularization and performance improvement of mobile-terminal devices, it has become very important to optimize existing image super-resolution models and deploy them to the mobile terminal.
Image super-resolution models divide into traditional super-resolution models, based on interpolation such as bilinear and bicubic interpolation, which are simple to implement but give poor results, and deep-learning-based super-resolution models, which are complex and require substantial cost and computing resources but give good super-resolution results. However, existing super-resolution models are rarely optimized and adapted to the hardware of the terminal device.
Convolutional Neural Networks (CNNs) generally consist of convolutional layers, pooling layers, and fully-connected layers, and can extract picture features while reducing a large number of parameters to a small number. At present, convolutional neural networks achieve excellent results in the field of image super-resolution. In recent years, increasingly complex convolutional networks (i.e., deeper, with more convolutional layers) have emerged for image super-resolution, along with increasingly strong performance. For example, the Multi-scale Deep Super-Resolution (MDSR) network by Lim et al. (Enhanced Deep Residual Networks for Single Image Super-Resolution. 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)) has 160 convolutional layers, while the originally proposed Super-Resolution Convolutional Neural Network (SRCNN) by Dong et al. (Image Super-Resolution Using Deep Convolutional Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2), 295-307. https://doi.org/10.1109/tpami.2015.2439281) has only 3 convolutional layers. Such approaches have the following disadvantages:
1. Most deep-learning-based image super-resolution methods recover images accurately, but the complexity, storage, and time consumption of model training and prediction are high. For example, the SwinIR model based on the Swin Transformer (SwinIR: Image Restoration Using Swin Transformer. 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 1833-1844. https://doi.org/10.1109/ICCVW54120.2021.00210) achieves a good image super-resolution effect, but the model has about 12M parameters and is not suitable for the mobile terminal.
2. Some smaller convolutional-neural-network super-resolution networks can achieve near-real-time speed on mobile-terminal devices, but their super-resolution accuracy as measured by PSNR is relatively limited.
The structural re-parameterization technique uses a larger model during training and, through an equivalent transformation of the parameters, converts it into another set of parameters for inference, so that the model used at inference is smaller and consumes fewer resources while retaining the accuracy of the large model. Structural re-parameterization has good application in mobile-terminal scenarios. For example, Zhang et al. (Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices. Proceedings of the 29th ACM International Conference on Multimedia, 4034-4043. https://doi.org/10.1145/3474085.3475291) propose a re-parameterization module, ECB, for this task, which contains 3x3 and 1x1 convolutions together with edge-oriented gradient information; at inference these are folded into a single 3x3 convolution, reducing the volume of the module at inference time and further speeding up inference on the mobile end. It has the following disadvantages:
1. The re-parameterization technique combines several convolutions into one convolution and is not applicable when a nonlinear layer such as a ReLU lies between the convolutions.
2. Existing re-parameterization techniques do not specifically analyze the operators of existing mobile-terminal devices, so they are not well suited to mobile-terminal scenarios. For example, no relevant optimization is performed for the int8 quantization models of existing smart-TV platforms.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a mobile-terminal lightweight image super-resolution reconstruction method based on a convolutional neural network. Relying only on a convolutional neural network and on the equivalent transformation technique, the model is small, the image recovery quality is good, the training speed is high, and the method is suitable for mobile-terminal scenarios.
To achieve this purpose, the technical scheme adopted by the invention is as follows:
a mobile terminal lightweight image super-resolution reconstruction method based on a convolutional neural network comprises the following steps:
s1: using a data set DIV2K, wherein the DIV2K data set comprises hundreds of pictures, and generating low-resolution pictures from the high-resolution pictures by bicubic downsampling to obtain a training data set;
s2: constructing an image super-resolution network for a mobile terminal
S21: constructing an image super-resolution network in a training stage, wherein the network comprises the following components:
the characteristic extraction part is used for extracting the characteristics of the picture by using the convolution layer and the re-parameterization module;
an image reconstruction unit that reconstructs the extracted features using pixel reconstruction and adds a global residual; finally, the features are transformed into the [0,255] range using the operator Clip to suit int8 quantization scenarios;
s22: equivalently converting the super-resolution network in the training stage in the step S21 to obtain an image super-resolution network in the inference stage, that is, an image super-resolution network for the mobile terminal, specifically including:
for the operator repeat, since
I ⊛ x = x
we obtain
repeat(x, n) = repeat(I, n) ⊛ x
and the operator repeat is replaced with a convolution whose convolution kernel is repeat(I, n), where x is the input tensor, I is the identity matrix, repeat(·, n) denotes repeating the input n times, and ⊛ denotes the convolution operation;
for the operator add, there are two convolution branches in the network in the training phase, i.e.
W1 ⊛ x + b1
and
W2 ⊛ y + b2
where W1 and W2 are two different convolution kernels, x and y are two different input tensors, and b1 and b2 are the biases corresponding to the convolution kernels;
the operator add is converted such that the convolution kernel becomes [W1, W2] and the bias becomes b1 + b2:
(W1 ⊛ x + b1) + (W2 ⊛ y + b2) = [W1, W2] ⊛ [x, y] + (b1 + b2)
for the operator concat, which is preceded by a Conv2d_ReLU layer in the trained network, the transformation is as follows: the convolution kernel becomes [W1; W2] and the bias becomes [b1; b2], both stacked along the output-channel dimension:
concat(ReLU(W1 ⊛ x + b1), ReLU(W2 ⊛ x + b2)) = ReLU([W1; W2] ⊛ x + [b1; b2])
for the operator clip, according to the equivalent transformation relationship between the operator clip and the ReLU:
clip(x) = ReLU(-ReLU(-x + 255) + 255)
the operator clip is equivalently converted into two convolution layers, each with convolution kernel -I and bias 255:
clip(x) = ReLU((-I) ⊛ ReLU((-I) ⊛ x + 255) + 255)
S3: image super-resolution network for training mobile terminal
Inputting the training data set obtained in step S1 into the constructed mobile-terminal image super-resolution network and outputting high-resolution pictures; randomly rotating and flipping the pictures in the data set, computing the loss between the original pictures in the data set and the generated high-resolution pictures, and performing back-propagation based on this loss until training is finished; the loss function is the L1 loss, i.e., MAE
L1(SR, HR) = (1/N) Σ_{i=1}^{N} |SR_i - HR_i|, where SR denotes the generated high-resolution picture and HR the corresponding original picture.
Compared with the prior art, the invention has the advantages that:
1) On the basis of maintaining good super-resolution accuracy as measured by peak signal-to-noise ratio (PSNR), the speed on mobile-terminal devices is quite high: a single picture can be super-resolved at scale factors of 2 and 3 within 30 ms, and the method can run on mobile portable devices.
2) Compared with prior models of the same parameter count, the method shows a larger improvement in accuracy as measured by PSNR: on the x3 test of the Set5 data set it achieves a PSNR of 31.1 at an inference speed of 14.6 ms, a clear improvement over the prior ECBSR method optimized for mobile-terminal devices, which achieves a PSNR of 30.8 at an inference speed of 13.3 ms.
3) Compared with re-parameterization techniques and the like, the method performs an equivalent transformation on the convolution layers together with the ReLU, converting the model into a simpler network at inference time.
4) The Clip operator is optimized; that is, the method is correspondingly optimized and adapted for the int8 quantization models of current smart-TV platforms.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of a super-resolution image model according to the present invention;
fig. 3 is a diagram of an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings by way of examples.
Referring to fig. 1, aiming at the defects of the prior art, the present invention provides a mobile-terminal lightweight image super-resolution reconstruction method based on a convolutional neural network. Relying only on a convolutional neural network and on the equivalent transformation technique, the model is small, the image recovery quality is good, the training speed is high, and the method is suitable for mobile-terminal scenarios. Specifically, the method comprises the following steps:
s1: using a data set DIV2K, wherein the DIV2K data set comprises hundreds of pictures, and generating low-resolution pictures from the high-resolution pictures by bicubic downsampling to obtain a training data set;
s2: and constructing an image super-resolution network suitable for the mobile terminal.
S3: and training the image super-resolution network of the mobile terminal, inputting the training set into the constructed image super-resolution network, outputting high-resolution pictures, and randomly rotating and overturning the pictures in the data set.
S4: and comparing the loss of the original picture in the data set with the loss of the generated high-resolution picture, and performing back propagation calculation based on the loss until the training is finished. The loss function is L1 loss, i.e., MAE
L1(SR, HR) = (1/N) Σ_{i=1}^{N} |SR_i - HR_i|, where SR denotes the generated high-resolution picture and HR the corresponding original picture.
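The L1 loss above can be sketched as follows (a minimal numpy illustration; the function name l1_loss is ours, not the patent's):

```python
import numpy as np

def l1_loss(sr, hr):
    # Mean absolute error (MAE) between generated and original pictures.
    return np.mean(np.abs(sr - hr))

# Tiny worked example: errors of 1 and 2 average to 1.5.
loss = l1_loss(np.array([0.0, 2.0]), np.array([1.0, 0.0]))
```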
Referring to fig. 2, the construction of the image super-resolution network applicable to the mobile terminal disclosed by the present invention specifically includes:
s22: the image super-resolution network in the training stage mainly comprises the following parts. The characteristic extraction part is used for extracting the characteristics of the picture by using the convolution layer and the re-parameterization module; in the image reconstruction part, the extracted features are reconstructed using pixel reconstruction, and a global residual is added. Finally, the features are transformed into the appropriate range using the operator clip to fit into the int8 quantization model.
S23: and the image super-resolution model in the inference stage is obtained by equivalent transformation of the image super-resolution model obtained in the step S22, and the operator repeat, the operator add, the operator concat and the operator clip are respectively subjected to equivalent transformation. Specific equivalent transformation procedures are as follows. For the operator repeat, the
Figure BDA0004010229200000042
Can obtain
Figure BDA0004010229200000043
Then the operator repeat can be replaced with a convolution with the convolution kernel repeat (I, n), where x is the input tensor, I is the identity matrix, n represents the input tensor repeated n times,
Figure BDA0004010229200000051
is a convolution operation;
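This repeat-to-convolution equivalence can be checked numerically; the sketch below writes a 1x1 convolution as a per-pixel channel mix (the name conv1x1 is illustrative, not from the patent):

```python
import numpy as np

def conv1x1(x, w):
    # A 1x1 convolution over a (C_in, H, W) tensor mixes channels per pixel.
    return np.einsum('oc,chw->ohw', w, x)

c, n = 4, 3
rng = np.random.default_rng(0)
x = rng.standard_normal((c, 8, 8))

kernel = np.tile(np.eye(c), (n, 1))   # repeat(I, n): identity stacked n times
out_conv = conv1x1(x, kernel)         # convolution with kernel repeat(I, n)
out_repeat = np.tile(x, (n, 1, 1))    # the operator repeat itself
```

Both paths produce the same (n*c, H, W) tensor, which is exactly what allows the operator repeat to be absorbed into a convolution at inference time.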
for the operator add, preceding the network in the training phase there are two convolution branches, i.e.
W1 ⊛ x + b1
and
W2 ⊛ y + b2
where W1 and W2 are two different convolution kernels, x and y are two different input tensors, and b1 and b2 are the biases corresponding to the convolution kernels. The operator add can then be converted as follows: the convolution kernel becomes [W1, W2] and the bias becomes b1 + b2:
(W1 ⊛ x + b1) + (W2 ⊛ y + b2) = [W1, W2] ⊛ [x, y] + (b1 + b2)
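The add transformation can likewise be checked numerically (a sketch using 1x1 convolutions; names are illustrative):

```python
import numpy as np

def conv1x1(x, w, b):
    # 1x1 convolution with bias over a (C_in, H, W) tensor.
    return np.einsum('oc,chw->ohw', w, x) + b[:, None, None]

rng = np.random.default_rng(1)
c_in, c_out = 3, 5
x = rng.standard_normal((c_in, 6, 6))
y = rng.standard_normal((c_in, 6, 6))
W1, W2 = rng.standard_normal((2, c_out, c_in))
b1, b2 = rng.standard_normal((2, c_out))

# Training-time: two convolution branches followed by the operator add.
out_add = conv1x1(x, W1, b1) + conv1x1(y, W2, b2)

# Inference-time: one convolution with kernel [W1, W2] (concatenated along
# the input-channel axis) and bias b1 + b2, applied to [x, y].
out_merged = conv1x1(np.concatenate([x, y], axis=0),
                     np.concatenate([W1, W2], axis=1), b1 + b2)
```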
For the operator concat, preceded in the trained network by a Conv2d_ReLU layer, we can transform as follows: the convolution kernel becomes [W1; W2] and the bias becomes [b1; b2], both stacked along the output-channel dimension:
concat(ReLU(W1 ⊛ x + b1), ReLU(W2 ⊛ x + b2)) = ReLU([W1; W2] ⊛ x + [b1; b2])
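A numerical check of the concat transformation, relying on the fact that ReLU is element-wise and therefore commutes with channel concatenation (again an illustrative 1x1-convolution sketch):

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)

def conv1x1(x, w, b):
    # 1x1 convolution with bias over a (C_in, H, W) tensor.
    return np.einsum('oc,chw->ohw', w, x) + b[:, None, None]

rng = np.random.default_rng(2)
x = rng.standard_normal((3, 6, 6))
W1, W2 = rng.standard_normal((2, 4, 3))
b1, b2 = rng.standard_normal((2, 4))

# Training-time: concat of two Conv2d_ReLU branches.
out_concat = np.concatenate([relu(conv1x1(x, W1, b1)),
                             relu(conv1x1(x, W2, b2))], axis=0)

# Inference-time: one Conv2d_ReLU with kernels and biases stacked along
# the output-channel dimension.
out_merged = relu(conv1x1(x, np.concatenate([W1, W2], axis=0),
                          np.concatenate([b1, b2], axis=0)))
```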
For the operator clip, the equivalent transformation relation between the operator clip and the ReLU is
clip(x)=ReLU(-ReLU(-x+255)+255)
so the operator clip can be equivalently converted into two convolution layers, each with convolution kernel -I and bias 255:
clip(x) = ReLU((-I) ⊛ ReLU((-I) ⊛ x + 255) + 255)
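The clip-to-ReLU identity can be verified directly on a few sample values:

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)
x = np.array([-50.0, 0.0, 100.0, 255.0, 300.0])

# clip(x) = ReLU(-ReLU(-x + 255) + 255): two layers with kernel -I, bias 255.
clipped = relu(-relu(-x + 255.0) + 255.0)
# clipped == [0, 0, 100, 255, 255], i.e. x clamped to the int8-friendly [0, 255] range.
```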
Through the equivalent transformation of these four operators, the model used during training can be transformed into the model used at inference, which greatly accelerates the inference speed.
Examples
Referring to fig. 3, the present embodiment specifically includes the following steps:
s1: and obtaining a training data set, wherein the data set DIV2K is used, the DIV2K data set comprises 800 pictures, and the low-resolution pictures are generated by double-triple downsampling of the high-resolution pictures. Specifically, (a) in fig. 3 is a real picture, and (b) in fig. 3 is a low resolution picture generated using bicubic downsampling;
s2: constructing an image super-resolution network suitable for a mobile terminal;
s3: and training an image super-resolution network of the mobile terminal, inputting the training set into the constructed image super-resolution network, outputting high-resolution pictures, randomly rotating and overturning the pictures in the data set, comparing the loss of the original pictures in the data set with the loss of the generated high-resolution pictures, and performing back propagation calculation based on the loss until the training is finished. The loss function is L1 loss, i.e., MAE
L1(SR, HR) = (1/N) Σ_{i=1}^{N} |SR_i - HR_i|, where SR denotes the generated high-resolution picture and HR the corresponding original picture.
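The random rotation and flipping used in step s3 can be sketched as below (a minimal numpy illustration; the exact augmentation policy is an assumption):

```python
import numpy as np

def augment(img, rng):
    # Random 90-degree rotation plus random vertical/horizontal flips,
    # as in the training step s3.
    img = np.rot90(img, k=int(rng.integers(4)), axes=(0, 1))
    if rng.integers(2):
        img = np.flip(img, axis=0)
    if rng.integers(2):
        img = np.flip(img, axis=1)
    return img

rng = np.random.default_rng(3)
img = np.arange(16.0).reshape(4, 4)
aug = augment(img, rng)  # same pixels, rearranged
```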
S4: when the picture to be tested is input into the inference network, the low-resolution picture generated by using the bicubic downsampling (fig. 3 b) is input into the inference network, so that the picture (c) can be obtained, and the picture (c) in fig. 3 realizes better PSNR and better visual effect compared with the picture (b) in fig. 3, and is closer to the real picture (a). When better PSNR and visual effect are obtained, the model optimizes the mobile terminal, so the speed is higher and the weight is lighter.
It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the manner in which the invention is practiced, and it is to be understood that the scope of the invention is not limited to such specifically recited statements and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims (1)

1. A mobile terminal lightweight image super-resolution reconstruction method based on a convolutional neural network is characterized by comprising the following steps:
s1: using a data set DIV2K, wherein the DIV2K data set comprises hundreds of pictures, and generating low-resolution pictures from the high-resolution pictures by bicubic downsampling to obtain a training data set;
s2: constructing an image super-resolution network for a mobile terminal
S21: constructing an image super-resolution network in a training stage, wherein the network comprises the following components:
the characteristic extraction part is used for extracting the characteristics of the picture by using the convolution layer and the re-parameterization module;
an image reconstruction unit that reconstructs the extracted features using pixel reconstruction and adds a global residual; finally, the features are transformed into the [0,255] range using the operator Clip to suit int8 quantization scenarios;
s22: equivalently converting the super-resolution network in the training stage in the step S21 to obtain an image super-resolution network in the inference stage, that is, an image super-resolution network for the mobile terminal, specifically including:
for the operator repeat, since
I ⊛ x = x
we obtain
repeat(x, n) = repeat(I, n) ⊛ x
and the operator repeat is replaced with a convolution whose convolution kernel is repeat(I, n), where x is the input tensor, I is the identity matrix, repeat(·, n) denotes repeating the input n times, and ⊛ denotes the convolution operation;
for the operator add, there are two convolution branches in the network in the training phase, i.e.
W1 ⊛ x + b1
and
W2 ⊛ y + b2
where W1 and W2 are two different convolution kernels, x and y are two different input tensors, and b1 and b2 are the biases corresponding to the convolution kernels;
the operator add is converted such that the convolution kernel becomes [W1, W2] and the bias becomes b1 + b2:
(W1 ⊛ x + b1) + (W2 ⊛ y + b2) = [W1, W2] ⊛ [x, y] + (b1 + b2)
for the operator concat, which is preceded by a Conv2d_ReLU layer in the trained network, the transformation is such that the convolution kernel becomes [W1; W2] and the bias becomes [b1; b2], both stacked along the output-channel dimension:
concat(ReLU(W1 ⊛ x + b1), ReLU(W2 ⊛ x + b2)) = ReLU([W1; W2] ⊛ x + [b1; b2])
for the operator clip, according to the equivalent transformation relationship between the operator clip and the ReLU:
clip(x)=ReLU(-ReLU(-x+255)+255)
the operator clip is equivalently converted into two convolution layers, each with convolution kernel -I and bias 255:
clip(x) = ReLU((-I) ⊛ ReLU((-I) ⊛ x + 255) + 255)
S3: image super-resolution network for training mobile terminal
Inputting the training data set obtained in step S1 into the constructed mobile-terminal image super-resolution network and outputting high-resolution pictures; randomly rotating and flipping the pictures in the data set, computing the loss between the original pictures in the data set and the generated high-resolution pictures, and performing back-propagation based on this loss until training is finished; the loss function is the L1 loss, i.e., MAE
L1(SR, HR) = (1/N) Σ_{i=1}^{N} |SR_i - HR_i|, where SR denotes the generated high-resolution picture and HR the corresponding original picture.
CN202211647267.0A 2022-12-21 2022-12-21 Mobile terminal lightweight image super-resolution reconstruction method based on convolutional neural network Pending CN115965527A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211647267.0A CN115965527A (en) 2022-12-21 2022-12-21 Mobile terminal lightweight image super-resolution reconstruction method based on convolutional neural network

Publications (1)

Publication Number Publication Date
CN115965527A true CN115965527A (en) 2023-04-14

Family

ID=87362621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211647267.0A Pending CN115965527A (en) 2022-12-21 2022-12-21 Mobile terminal lightweight image super-resolution reconstruction method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN115965527A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116205284A (en) * 2023-05-05 2023-06-02 北京蔚领时代科技有限公司 Super-division network, method, device and equipment based on novel re-parameterized structure



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination