CN115965527A - Mobile terminal lightweight image super-resolution reconstruction method based on convolutional neural network - Google Patents

Mobile terminal lightweight image super-resolution reconstruction method based on convolutional neural network

Info

Publication number
CN115965527A
CN115965527A (application CN202211647267.0A)
Authority
CN
China
Prior art keywords
resolution
network
mobile terminal
training
operator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211647267.0A
Other languages
Chinese (zh)
Inventor
周洲
晁佳豪
高洪帆
龚嘉礼
杨争峰
曾振柄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN202211647267.0A priority Critical patent/CN115965527A/en
Publication of CN115965527A publication Critical patent/CN115965527A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a mobile-terminal lightweight image super-resolution reconstruction method based on a convolutional neural network. A training data set is first obtained. A lightweight image super-resolution network suitable for the mobile terminal is then constructed, comprising a training-time network and an inference-time network, where the inference-time network is obtained from the training-time network by an equivalent transformation method. The equivalent transformation method replaces operators that are time-consuming on the mobile end with less time-consuming convolutions. During training, using the training data set and the constructed lightweight super-resolution network, the generated high-resolution pictures are compared against the original pictures in the data set to compute a loss, and back-propagation is performed based on this loss until training is finished. During inference, the simpler, equivalently transformed network is used; the model is small and the output speed is high. The advantages of the invention are: the method is optimized and adapted for mobile-terminal scenarios, and a simple and efficient super-resolution network is constructed using an equivalent transformation method.

Description

Mobile terminal lightweight image super-resolution reconstruction method based on convolutional neural network
Technical Field
The invention relates to the technical field of digital image processing, in particular to a mobile-end lightweight image super-resolution reconstruction method only depending on a convolutional neural network and based on an equivalent transformation technology.
Background
Image and video super-resolution converts low-resolution content to high resolution and has wide application in many fields, including medical imaging, satellite imaging, and the like. For example, companies have upgraded Standard Definition Television (SDTV) to High Definition Television (HDTV) using video super-resolution techniques based on deep learning. In recent years, with the popularization and performance improvement of mobile-terminal devices, it has become very important to optimize existing image super-resolution models and deploy them to the mobile terminal.
Image super-resolution models divide into traditional super-resolution models, based on interpolation such as bilinear and bicubic interpolation, which are simple to implement but give poor results, and deep-learning-based super-resolution models, which are complex and require substantial cost and computing resources but give good super-resolution results. However, existing super-resolution models are rarely optimized and adapted to the hardware of the terminal device.
Convolutional Neural Networks (CNNs) generally consist of convolutional layers, pooling layers, and fully-connected layers, and can extract picture features while reducing a large number of parameters to a small number. At present, convolutional neural networks achieve excellent results in the field of image super-resolution. In recent years, increasingly complex convolutional networks (i.e., deeper, with more convolutional layers) have emerged for image super-resolution, along with increasingly strong performance. For example, the Multi-scale Deep Super-Resolution (MDSR) network by Lim et al. (Enhanced Deep Residual Networks for Single Image Super-Resolution. 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)) has 160 convolutional layers, while the originally proposed Super-Resolution Convolutional Neural Network (SRCNN) by Dong et al. (Image Super-Resolution Using Deep Convolutional Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2), 295-307. https://doi.org/10.1109/tpami.2015.2439281) has only 3 convolutional layers. Such approaches have the following disadvantages:
1. Most deep-learning-based image super-resolution methods recover images accurately, but the complexity, storage, and time consumption of model training and prediction are high. For example, the SwinIR model based on the Swin Transformer (SwinIR: Image Restoration Using Swin Transformer. 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 1833-1844. https://doi.org/10.1109/ICCVW54120.2021.00210) achieves a good image super-resolution effect, but the model has about 12M parameters and is not suitable for the mobile terminal.
2. Some smaller convolutional-neural-network super-resolution networks can achieve near-real-time speed on mobile-terminal devices, but their super-resolution accuracy as measured by PSNR is relatively limited.
The structural re-parameterization technique uses a larger model during training and, through an equivalent transformation of the parameters, converts it into another set of parameters for inference, so that the model used at inference is smaller and consumes fewer resources while retaining the accuracy of the large model. Structural re-parameterization has good application in mobile-terminal scenarios. For example, Zhang et al. (Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices. Proceedings of the 29th ACM International Conference on Multimedia, 4034-4043. https://doi.org/10.1145/3474085.3475291) propose a re-parameterization module, ECB, for this task, which contains 3x3 and 1x1 convolutions together with edge-oriented gradient information; at inference these are folded into a single 3x3 convolution, reducing the volume of the module at inference time and further speeding up inference on the mobile end. It has the following disadvantages:
1. The re-parameterization technique combines several convolutions into one convolution and is not applicable when a nonlinear layer such as a ReLU lies between the convolutions.
2. Existing re-parameterization techniques do not specifically analyze the operators of existing mobile-terminal devices, so they are not well suited to mobile-terminal scenarios. For example, no relevant optimization is performed for the int8 quantization models of existing smart-TV platforms.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a mobile-terminal lightweight image super-resolution reconstruction method based on a convolutional neural network. Relying only on a convolutional neural network and on the equivalent transformation technique, the model is small, the image recovery quality is good, the training speed is high, and the method is suitable for mobile-terminal scenarios.
To achieve this purpose, the technical scheme adopted by the invention is as follows:
a mobile terminal lightweight image super-resolution reconstruction method based on a convolutional neural network comprises the following steps:
s1: using a data set DIV2K, wherein the DIV2K data set comprises hundreds of pictures, and generating low-resolution pictures from the high-resolution pictures by bicubic downsampling to obtain a training data set;
s2: constructing an image super-resolution network for a mobile terminal
S21: constructing an image super-resolution network in a training stage, wherein the network comprises the following components:
the characteristic extraction part is used for extracting the characteristics of the picture by using the convolution layer and the re-parameterization module;
an image reconstruction unit that reconstructs the extracted features using pixel reconstruction and adds a global residual; finally, the features are transformed into the [0,255] range using the operator Clip to suit int8 quantization scenarios;
s22: equivalently converting the super-resolution network in the training stage in the step S21 to obtain an image super-resolution network in the inference stage, that is, an image super-resolution network for the mobile terminal, specifically including:
for the operator repeat, since
I ⊛ x = x
we obtain
repeat(x, n) = repeat(I, n) ⊛ x
and the operator repeat is replaced with a convolution whose convolution kernel is repeat(I, n), where x is the input tensor, I is the identity matrix, repeat(·, n) denotes repeating the input n times, and ⊛ denotes the convolution operation;
for the operator add, there are two convolution branches in the network in the training phase, i.e.
W1 ⊛ x + b1
and
W2 ⊛ y + b2
where W1 and W2 are two different convolution kernels, x and y are two different input tensors, and b1 and b2 are the biases corresponding to the convolution kernels;
the operator add is converted such that the convolution kernel becomes [W1, W2] and the bias becomes b1 + b2:
(W1 ⊛ x + b1) + (W2 ⊛ y + b2) = [W1, W2] ⊛ [x, y] + (b1 + b2)
for the operator concat, which is preceded by a Conv2d_ReLU layer in the trained network, the transformation is as follows: the convolution kernel becomes [W1; W2] and the bias becomes [b1; b2], both stacked along the output-channel dimension:
concat(ReLU(W1 ⊛ x + b1), ReLU(W2 ⊛ x + b2)) = ReLU([W1; W2] ⊛ x + [b1; b2])
for the operator clip, according to the equivalent transformation relationship between the operator clip and the ReLU:
clip(x) = ReLU(-ReLU(-x + 255) + 255)
the operator clip is equivalently converted into two convolution layers, each with convolution kernel -I and bias 255:
clip(x) = ReLU((-I) ⊛ ReLU((-I) ⊛ x + 255) + 255)
S3: image super-resolution network for training mobile terminal
Inputting the training data set obtained in step S1 into the constructed mobile-terminal image super-resolution network and outputting high-resolution pictures; randomly rotating and flipping the pictures in the data set, computing the loss between the original pictures in the data set and the generated high-resolution pictures, and performing back-propagation based on this loss until training is finished; the loss function is the L1 loss, i.e., MAE
L1(SR, HR) = (1/N) Σ_{i=1}^{N} |SR_i - HR_i|, where SR denotes the generated high-resolution picture and HR the corresponding original picture.
Compared with the prior art, the invention has the advantages that:
1) On the basis of maintaining good super-resolution accuracy as measured by peak signal-to-noise ratio (PSNR), the speed on mobile-terminal devices is quite high: a single picture can be super-resolved at scale factors of 2 and 3 within 30 ms, and the method can run on mobile portable devices.
2) Compared with prior models of the same parameter count, the method shows a larger improvement in accuracy as measured by PSNR: on the x3 test of the Set5 data set it achieves a PSNR of 31.1 at an inference speed of 14.6 ms, a clear improvement over the prior ECBSR method optimized for mobile-terminal devices, which achieves a PSNR of 30.8 at an inference speed of 13.3 ms.
3) Compared with re-parameterization techniques and the like, the method performs an equivalent transformation on the convolution layers together with the ReLU, converting the model into a simpler network at inference time.
4) The Clip operator is optimized; that is, the method is correspondingly optimized and adapted for the int8 quantization models of current smart-TV platforms.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of a super-resolution image model according to the present invention;
fig. 3 is a diagram of an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings by way of examples.
Referring to fig. 1, aiming at the defects of the prior art, the present invention provides a mobile-terminal lightweight image super-resolution reconstruction method based on a convolutional neural network. Relying only on a convolutional neural network and on the equivalent transformation technique, the model is small, the image recovery quality is good, the training speed is high, and the method is suitable for mobile-terminal scenarios. Specifically, the method comprises the following steps:
s1: using a data set DIV2K, wherein the DIV2K data set comprises hundreds of pictures, and generating low-resolution pictures from the high-resolution pictures by bicubic downsampling to obtain a training data set;
s2: and constructing an image super-resolution network suitable for the mobile terminal.
S3: and training the image super-resolution network of the mobile terminal, inputting the training set into the constructed image super-resolution network, outputting high-resolution pictures, and randomly rotating and overturning the pictures in the data set.
S4: and comparing the loss of the original picture in the data set with the loss of the generated high-resolution picture, and performing back propagation calculation based on the loss until the training is finished. The loss function is L1 loss, i.e., MAE
L1(SR, HR) = (1/N) Σ_{i=1}^{N} |SR_i - HR_i|, where SR denotes the generated high-resolution picture and HR the corresponding original picture.
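The L1 loss above can be sketched as follows (a minimal numpy illustration; the function name l1_loss is ours, not the patent's):

```python
import numpy as np

def l1_loss(sr, hr):
    # Mean absolute error (MAE) between generated and original pictures.
    return np.mean(np.abs(sr - hr))

# Tiny worked example: errors of 1 and 2 average to 1.5.
loss = l1_loss(np.array([0.0, 2.0]), np.array([1.0, 0.0]))
```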
Referring to fig. 2, the construction of the image super-resolution network applicable to the mobile terminal disclosed by the present invention specifically includes:
s22: the image super-resolution network in the training stage mainly comprises the following parts. The characteristic extraction part is used for extracting the characteristics of the picture by using the convolution layer and the re-parameterization module; in the image reconstruction part, the extracted features are reconstructed using pixel reconstruction, and a global residual is added. Finally, the features are transformed into the appropriate range using the operator clip to fit into the int8 quantization model.
S23: and the image super-resolution model in the inference stage is obtained by equivalent transformation of the image super-resolution model obtained in the step S22, and the operator repeat, the operator add, the operator concat and the operator clip are respectively subjected to equivalent transformation. Specific equivalent transformation procedures are as follows. For the operator repeat, the
Figure BDA0004010229200000042
Can obtain
Figure BDA0004010229200000043
Then the operator repeat can be replaced with a convolution with the convolution kernel repeat (I, n), where x is the input tensor, I is the identity matrix, n represents the input tensor repeated n times,
Figure BDA0004010229200000051
is a convolution operation;
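This repeat-to-convolution equivalence can be checked numerically; the sketch below writes a 1x1 convolution as a per-pixel channel mix (the name conv1x1 is illustrative, not from the patent):

```python
import numpy as np

def conv1x1(x, w):
    # A 1x1 convolution over a (C_in, H, W) tensor mixes channels per pixel.
    return np.einsum('oc,chw->ohw', w, x)

c, n = 4, 3
rng = np.random.default_rng(0)
x = rng.standard_normal((c, 8, 8))

kernel = np.tile(np.eye(c), (n, 1))   # repeat(I, n): identity stacked n times
out_conv = conv1x1(x, kernel)         # convolution with kernel repeat(I, n)
out_repeat = np.tile(x, (n, 1, 1))    # the operator repeat itself
```

Both paths produce the same (n*c, H, W) tensor, which is exactly what allows the operator repeat to be absorbed into a convolution at inference time.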
for the operator add, preceding the network in the training phase there are two convolution branches, i.e.
W1 ⊛ x + b1
and
W2 ⊛ y + b2
where W1 and W2 are two different convolution kernels, x and y are two different input tensors, and b1 and b2 are the biases corresponding to the convolution kernels. The operator add can then be converted as follows: the convolution kernel becomes [W1, W2] and the bias becomes b1 + b2:
(W1 ⊛ x + b1) + (W2 ⊛ y + b2) = [W1, W2] ⊛ [x, y] + (b1 + b2)
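The add transformation can likewise be checked numerically (a sketch using 1x1 convolutions; names are illustrative):

```python
import numpy as np

def conv1x1(x, w, b):
    # 1x1 convolution with bias over a (C_in, H, W) tensor.
    return np.einsum('oc,chw->ohw', w, x) + b[:, None, None]

rng = np.random.default_rng(1)
c_in, c_out = 3, 5
x = rng.standard_normal((c_in, 6, 6))
y = rng.standard_normal((c_in, 6, 6))
W1, W2 = rng.standard_normal((2, c_out, c_in))
b1, b2 = rng.standard_normal((2, c_out))

# Training-time: two convolution branches followed by the operator add.
out_add = conv1x1(x, W1, b1) + conv1x1(y, W2, b2)

# Inference-time: one convolution with kernel [W1, W2] (concatenated along
# the input-channel axis) and bias b1 + b2, applied to [x, y].
out_merged = conv1x1(np.concatenate([x, y], axis=0),
                     np.concatenate([W1, W2], axis=1), b1 + b2)
```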
For the operator concat, preceded in the trained network by a Conv2d_ReLU layer, we can transform as follows: the convolution kernel becomes [W1; W2] and the bias becomes [b1; b2], both stacked along the output-channel dimension:
concat(ReLU(W1 ⊛ x + b1), ReLU(W2 ⊛ x + b2)) = ReLU([W1; W2] ⊛ x + [b1; b2])
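A numerical check of the concat transformation, relying on the fact that ReLU is element-wise and therefore commutes with channel concatenation (again an illustrative 1x1-convolution sketch):

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)

def conv1x1(x, w, b):
    # 1x1 convolution with bias over a (C_in, H, W) tensor.
    return np.einsum('oc,chw->ohw', w, x) + b[:, None, None]

rng = np.random.default_rng(2)
x = rng.standard_normal((3, 6, 6))
W1, W2 = rng.standard_normal((2, 4, 3))
b1, b2 = rng.standard_normal((2, 4))

# Training-time: concat of two Conv2d_ReLU branches.
out_concat = np.concatenate([relu(conv1x1(x, W1, b1)),
                             relu(conv1x1(x, W2, b2))], axis=0)

# Inference-time: one Conv2d_ReLU with kernels and biases stacked along
# the output-channel dimension.
out_merged = relu(conv1x1(x, np.concatenate([W1, W2], axis=0),
                          np.concatenate([b1, b2], axis=0)))
```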
For the operator clip, the equivalent transformation relation between the operator clip and the ReLU is
clip(x)=ReLU(-ReLU(-x+255)+255)
so the operator clip can be equivalently converted into two convolution layers, each with convolution kernel -I and bias 255:
clip(x) = ReLU((-I) ⊛ ReLU((-I) ⊛ x + 255) + 255)
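The clip-to-ReLU identity can be verified directly on a few sample values:

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)
x = np.array([-50.0, 0.0, 100.0, 255.0, 300.0])

# clip(x) = ReLU(-ReLU(-x + 255) + 255): two layers with kernel -I, bias 255.
clipped = relu(-relu(-x + 255.0) + 255.0)
# clipped == [0, 0, 100, 255, 255], i.e. x clamped to the int8-friendly [0, 255] range.
```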
Through the equivalent transformation of these four operators, the model used during training can be transformed into the model used at inference, which greatly accelerates the inference speed.
Examples
Referring to fig. 3, the present embodiment specifically includes the following steps:
s1: and obtaining a training data set, wherein the data set DIV2K is used, the DIV2K data set comprises 800 pictures, and the low-resolution pictures are generated by double-triple downsampling of the high-resolution pictures. Specifically, (a) in fig. 3 is a real picture, and (b) in fig. 3 is a low resolution picture generated using bicubic downsampling;
s2: constructing an image super-resolution network suitable for a mobile terminal;
s3: and training an image super-resolution network of the mobile terminal, inputting the training set into the constructed image super-resolution network, outputting high-resolution pictures, randomly rotating and overturning the pictures in the data set, comparing the loss of the original pictures in the data set with the loss of the generated high-resolution pictures, and performing back propagation calculation based on the loss until the training is finished. The loss function is L1 loss, i.e., MAE
L1(SR, HR) = (1/N) Σ_{i=1}^{N} |SR_i - HR_i|, where SR denotes the generated high-resolution picture and HR the corresponding original picture.
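The random rotation and flipping used in step s3 can be sketched as below (a minimal numpy illustration; the exact augmentation policy is an assumption):

```python
import numpy as np

def augment(img, rng):
    # Random 90-degree rotation plus random vertical/horizontal flips,
    # as in the training step s3.
    img = np.rot90(img, k=int(rng.integers(4)), axes=(0, 1))
    if rng.integers(2):
        img = np.flip(img, axis=0)
    if rng.integers(2):
        img = np.flip(img, axis=1)
    return img

rng = np.random.default_rng(3)
img = np.arange(16.0).reshape(4, 4)
aug = augment(img, rng)  # same pixels, rearranged
```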
S4: when the picture to be tested is input into the inference network, the low-resolution picture generated by using the bicubic downsampling (fig. 3 b) is input into the inference network, so that the picture (c) can be obtained, and the picture (c) in fig. 3 realizes better PSNR and better visual effect compared with the picture (b) in fig. 3, and is closer to the real picture (a). When better PSNR and visual effect are obtained, the model optimizes the mobile terminal, so the speed is higher and the weight is lighter.
It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the manner in which the invention is practiced, and it is to be understood that the scope of the invention is not limited to such specifically recited statements and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims (1)

1. A mobile terminal lightweight image super-resolution reconstruction method based on a convolutional neural network is characterized by comprising the following steps:
s1: using a data set DIV2K, wherein the DIV2K data set comprises hundreds of pictures, and generating low-resolution pictures from the high-resolution pictures by bicubic downsampling to obtain a training data set;
s2: constructing an image super-resolution network for a mobile terminal
S21: constructing an image super-resolution network in a training stage, wherein the network comprises the following components:
the characteristic extraction part is used for extracting the characteristics of the picture by using the convolution layer and the re-parameterization module;
an image reconstruction unit that reconstructs the extracted features using pixel reconstruction and adds a global residual; finally, the features are transformed into the [0,255] range using the operator Clip to suit int8 quantization scenarios;
s22: equivalently converting the super-resolution network in the training stage in the step S21 to obtain an image super-resolution network in the inference stage, that is, an image super-resolution network for the mobile terminal, specifically including:
for the operator repeat, since
I ⊛ x = x
we obtain
repeat(x, n) = repeat(I, n) ⊛ x
and the operator repeat is replaced with a convolution whose convolution kernel is repeat(I, n), where x is the input tensor, I is the identity matrix, repeat(·, n) denotes repeating the input n times, and ⊛ denotes the convolution operation;
for the operator add, there are two convolution branches in the network in the training phase, i.e.
W1 ⊛ x + b1
and
W2 ⊛ y + b2
where W1 and W2 are two different convolution kernels, x and y are two different input tensors, and b1 and b2 are the biases corresponding to the convolution kernels;
the operator add is converted such that the convolution kernel becomes [W1, W2] and the bias becomes b1 + b2:
(W1 ⊛ x + b1) + (W2 ⊛ y + b2) = [W1, W2] ⊛ [x, y] + (b1 + b2)
for the operator concat, which is preceded by a Conv2d_ReLU layer in the trained network, the transformation is such that the convolution kernel becomes [W1; W2] and the bias becomes [b1; b2], both stacked along the output-channel dimension:
concat(ReLU(W1 ⊛ x + b1), ReLU(W2 ⊛ x + b2)) = ReLU([W1; W2] ⊛ x + [b1; b2])
for the operator clip, according to the equivalent transformation relationship between the operator clip and the ReLU:
clip(x)=ReLU(-ReLU(-x+255)+255)
the operator clip is equivalently converted into two convolution layers, each with convolution kernel -I and bias 255:
clip(x) = ReLU((-I) ⊛ ReLU((-I) ⊛ x + 255) + 255)
S3: image super-resolution network for training mobile terminal
Inputting the training data set obtained in step S1 into the constructed mobile-terminal image super-resolution network and outputting high-resolution pictures; randomly rotating and flipping the pictures in the data set, computing the loss between the original pictures in the data set and the generated high-resolution pictures, and performing back-propagation based on this loss until training is finished; the loss function is the L1 loss, i.e., MAE
L1(SR, HR) = (1/N) Σ_{i=1}^{N} |SR_i - HR_i|, where SR denotes the generated high-resolution picture and HR the corresponding original picture.
CN202211647267.0A 2022-12-21 2022-12-21 Mobile terminal lightweight image super-resolution reconstruction method based on convolutional neural network Pending CN115965527A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211647267.0A CN115965527A (en) 2022-12-21 2022-12-21 Mobile terminal lightweight image super-resolution reconstruction method based on convolutional neural network

Publications (1)

Publication Number Publication Date
CN115965527A true CN115965527A (en) 2023-04-14

Family

ID=87362621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211647267.0A Pending CN115965527A (en) 2022-12-21 2022-12-21 Mobile terminal lightweight image super-resolution reconstruction method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN115965527A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116205284A (en) * 2023-05-05 2023-06-02 北京蔚领时代科技有限公司 Super-division network, method, device and equipment based on novel re-parameterized structure



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination