CN115965527A - Mobile terminal lightweight image super-resolution reconstruction method based on convolutional neural network - Google Patents
- Publication number
- CN115965527A (application CN202211647267.0A)
- Authority
- CN
- China
- Prior art keywords
- resolution
- network
- mobile terminal
- training
- operator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
- Image Processing (AREA)
Abstract
The invention discloses a mobile terminal lightweight image super-resolution reconstruction method based on a convolutional neural network. A lightweight image super-resolution network suitable for the mobile terminal is constructed, comprising a training-time network and an inference-time network, where the inference-time network is obtained from the training-time network by an equivalent transformation method. The equivalent transformation method replaces operators that are time-consuming on the mobile terminal with less time-consuming convolutions. During training, using the training data set and the constructed lightweight super-resolution network, the loss between the original pictures in the data set and the generated high-resolution pictures is computed, and back propagation is performed based on this loss until training is finished. During inference, the more concise, equivalently transformed network is used, so the model is small and the output speed is high. The advantages of the invention are: the method is optimized and adapted for mobile terminal scenarios, and a simple and efficient super-resolution network is constructed using an equivalent transformation method.
Description
Technical Field
The invention relates to the technical field of digital image processing, and in particular to a mobile terminal lightweight image super-resolution reconstruction method that relies only on a convolutional neural network and is based on an equivalent transformation technique.
Background
Image and video super-resolution converts low-resolution content to high-resolution content and has wide application in many fields, including medical imaging, satellite imaging, and the like. For example, companies have upgraded Standard Definition Television (SDTV) to High Definition Television (HDTV) using deep-learning-based video super-resolution techniques. In recent years, with the popularization and performance improvement of mobile terminal devices, it has become very important to optimize existing image super-resolution models and deploy them on mobile terminals.
Image super-resolution models fall into two classes: traditional models, based on interpolation such as bilinear and bicubic interpolation, which are simple to implement but perform poorly; and deep-learning-based models, which are complex and require substantial cost and computing resources but achieve good super-resolution quality. However, existing super-resolution models are rarely optimized and adapted to the hardware of the terminal device.
Convolutional Neural Networks (CNNs) generally consist of convolutional layers, pooling layers, and fully connected layers, and can extract picture features while reducing a large number of parameters to a small number. At present, convolutional neural networks achieve excellent results in the field of image super-resolution. In recent years, increasingly complex convolutional networks for image super-resolution (i.e., deeper, with more convolutional layers) have emerged, along with increasingly strong performance. For example, the Multi-scale Deep Super-Resolution (MDSR) network of Lim et al. (Enhanced Deep Residual Networks for Single Image Super-Resolution. 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)) has 160 convolutional layers, while the originally proposed Super-Resolution Convolutional Neural Network (SRCNN) of Dong et al. (Image Super-Resolution Using Deep Convolutional Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2), 295-307. https://doi.org/10.1109/tpami.2015.2439281) has only 3 convolutional layers. These approaches have the following disadvantages:
1. Most deep-learning-based image super-resolution methods recover images with good accuracy, but the complexity, storage, and time consumption of model training and prediction are high. For example, the SwinIR model, based on the Swin Transformer (SwinIR: Image Restoration Using Swin Transformer. 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 1833-1844. https://doi.org/10.1109/ICCVW54120.2021.00210), achieves good image super-resolution, but it has about 12M parameters and is not suitable for the mobile terminal.
2. Some smaller convolutional super-resolution networks can reach near-real-time speed on mobile terminal devices, but their super-resolution accuracy, as measured by PSNR, is relatively limited.
The structural re-parameterization technique uses a larger model during training and, through equivalent conversion of the parameters, transforms it into another set of parameters for inference, so that the model used at inference is smaller and consumes fewer resources while retaining the accuracy of the large model. Structural re-parameterization is well suited to mobile terminal scenarios. For example, Zhang et al. (Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices. Proceedings of the 29th ACM International Conference on Multimedia, 4034-4043. https://doi.org/10.1145/3474085.3475291) propose a re-parameterization module, ECB, for this task, which contains 3 × 3 and 1 × 1 convolutions and related gradient information that are folded into a single 3 × 3 convolution at inference, thereby reducing the volume of the module at inference time and further speeding up inference on the mobile terminal. It has the following disadvantages:
1. The re-parameterization technique merges several convolutions into one convolution, and is therefore not applicable when a nonlinear layer such as ReLU lies between the convolutions.
2. Existing re-parameterization techniques do not specifically analyze the operators available on existing mobile terminal devices, and are therefore not well suited to mobile terminal scenarios. For example, no optimization is performed for the int8 quantization models of existing smart-TV platforms.
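The ECB-style folding mentioned above relies on the linearity of convolution: a parallel 3 × 3 branch and 1 × 1 branch can be folded into a single 3 × 3 kernel by zero-padding the 1 × 1 kernel into the centre. The NumPy sketch below (illustrative only; the `conv2d_same` helper is a naive stand-in for a real convolution layer) checks this numerically. A ReLU between convolutions would break such folding, which is exactly disadvantage 1.

```python
import numpy as np

def conv2d_same(img, kernel):
    """Single-channel 'same' cross-correlation, stride 1 — a minimal
    stand-in for a conv layer, for demonstration only."""
    kh, kw = kernel.shape
    pad = kh // 2
    padded = np.pad(img, pad)
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
img = rng.normal(size=(6, 6))
k3 = rng.normal(size=(3, 3))   # 3x3 branch
k1 = rng.normal(size=(1, 1))   # 1x1 branch

# Fold the 1x1 kernel into the centre of a 3x3 kernel, then add branches.
k1_as_3x3 = np.zeros((3, 3))
k1_as_3x3[1, 1] = k1[0, 0]
folded = k3 + k1_as_3x3

two_branch = conv2d_same(img, k3) + conv2d_same(img, k1)
one_conv = conv2d_same(img, folded)
assert np.allclose(two_branch, one_conv)  # folding is exact: conv is linear
```

Because convolution is linear, the fold is exact; inserting a ReLU after either branch makes the sum nonlinear in the input, and no single kernel can reproduce it.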
Disclosure of Invention
The invention aims to provide a mobile terminal lightweight image super-resolution reconstruction method based on a convolutional neural network, addressing the defects of the prior art. Relying only on a convolutional neural network and an equivalent transformation technique, the model is small, the image recovery quality is good, the training speed is high, and the method is suitable for mobile terminal scenarios.
In order to achieve this purpose, the technical scheme adopted by the invention is as follows:
a mobile terminal lightweight image super-resolution reconstruction method based on a convolutional neural network comprises the following steps:
s1: using the data set DIV2K, which comprises hundreds of pictures, low-resolution pictures are generated from the high-resolution pictures by bicubic downsampling to obtain a training data set;
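Step S1's HR-to-LR pair generation can be sketched as follows (an illustrative example assuming the Pillow library; the 48 × 48 synthetic picture and the function name `make_lr` are assumptions, not from the patent):

```python
import numpy as np
from PIL import Image

def make_lr(hr_img: Image.Image, scale: int) -> Image.Image:
    """Generate a low-resolution picture by bicubic downsampling,
    as in the DIV2K-style HR->LR pair generation of step S1."""
    w, h = hr_img.size
    return hr_img.resize((w // scale, h // scale), Image.BICUBIC)

# A synthetic stand-in for one DIV2K high-resolution picture.
hr_array = np.random.default_rng(0).integers(0, 256, size=(48, 48, 3), dtype=np.uint8)
hr = Image.fromarray(hr_array)
lr = make_lr(hr, 3)   # x3 downsampling: 48x48 -> 16x16
```

In practice the same routine would be applied to each of the DIV2K high-resolution pictures to build the (LR, HR) training pairs.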
s2: constructing an image super-resolution network for a mobile terminal
S21: constructing an image super-resolution network in a training stage, wherein the network comprises the following components:
a feature extraction part, which extracts the features of the picture using convolution layers and the re-parameterization module;
an image reconstruction part, which reconstructs the extracted features by using pixel reconstruction and adds a global residual; finally, the features are converted into the [0,255] range using the operator clip so as to suit int8 quantization scenarios;
s22: equivalently converting the super-resolution network in the training stage in the step S21 to obtain an image super-resolution network in the inference stage, that is, an image super-resolution network for the mobile terminal, specifically including:
for the operator repeat, since repeat(x, n) = repeat(I, n) ⊛ x, the operator repeat is replaced with a convolution whose kernel is repeat(I, n), where x is the input tensor, I is the identity matrix, repeat(·, n) denotes repeating n times along the channel dimension, and ⊛ is the convolution operation;
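The repeat-to-convolution identity can be checked numerically. A 1 × 1 convolution acts per pixel as a matrix multiply over channels, so plain NumPy suffices for an illustrative sketch (not the patent's implementation):

```python
import numpy as np

C, n = 3, 2                      # channels, repeat factor
x = np.array([1.0, 2.0, 3.0])    # one pixel's channel vector

# Kernel repeat(I, n): the C x C identity stacked n times -> (n*C, C).
# Applied as a 1x1 convolution, it is a per-pixel matrix multiply.
K = np.tile(np.eye(C), (n, 1))

assert np.allclose(K @ x, np.tile(x, n))  # equals repeat(x, n)
```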
for the operator add, there are two convolutions in the training-phase network, i.e. W1 ⊛ x + b1 and W2 ⊛ y + b2, where W1 and W2 are two different convolution kernels, x and y are two different input tensors, and b1 and b2 are the bias tensors of the corresponding convolutions;
the operator add is converted so that the convolution kernel becomes [W1, W2] (concatenated along the input-channel dimension), applied to the concatenation of x and y, and the bias becomes b1 + b2;
for the operator concat, whose inputs in the training network are Conv2d_ReLU layers W1 ⊛ x + b1 and W2 ⊛ x + b2, the transformation is as follows: the convolution kernel becomes [W1; W2] (stacked along the output-channel dimension) and the bias becomes [b1; b2]; since ReLU acts element-wise, it commutes with the concatenation;
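The concat transformation — stacking kernels along the output-channel axis and concatenating the biases — can likewise be checked with 1 × 1 convolutions modeled as per-pixel matrix multiplies; ReLU, being element-wise, commutes with concatenation. An illustrative NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
C_in, C_out = 3, 2
W1, b1 = rng.normal(size=(C_out, C_in)), rng.normal(size=C_out)
W2, b2 = rng.normal(size=(C_out, C_in)), rng.normal(size=C_out)
x = rng.normal(size=C_in)
relu = lambda v: np.maximum(v, 0.0)

# Stack kernels along the output-channel axis, concatenate the biases.
W_stack = np.concatenate([W1, W2], axis=0)            # (2*C_out, C_in)
b_stack = np.concatenate([b1, b2])

branches = np.concatenate([relu(W1 @ x + b1), relu(W2 @ x + b2)])
fused = relu(W_stack @ x + b_stack)                   # ReLU commutes with concat
assert np.allclose(branches, fused)
```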
for the operator clip, according to the equivalent transformation relationship between the operator clip and the ReLU:
clip(x) = ReLU(-ReLU(-x + 255) + 255)
the operator clip is equivalently converted into two convolution layers, each with convolution kernel -I and bias 255 and each followed by a ReLU;
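The clip/ReLU identity can be checked numerically (an illustrative NumPy sketch): each stage corresponds to a convolution with kernel -I (negation) and bias 255, followed by a ReLU.

```python
import numpy as np

relu = lambda v: np.maximum(v, 0.0)
x = np.array([-50.0, 0.0, 100.0, 255.0, 300.0])

# Stage 1: kernel -I, bias 255, then ReLU; Stage 2: the same again.
clip_via_relu = relu(-relu(-x + 255.0) + 255.0)
assert np.allclose(clip_via_relu, np.clip(x, 0.0, 255.0))
```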
S3: image super-resolution network for training mobile terminal
Inputting the training data set obtained in step S1 into the constructed image super-resolution network of the mobile terminal and outputting high-resolution pictures; the pictures in the data set are randomly rotated and flipped; the loss between the original pictures in the data set and the generated high-resolution pictures is computed, and back propagation is performed based on this loss until training is finished; the loss function is the L1 loss, i.e., MAE.
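The training step above — random rotation/flip augmentation and the L1 (MAE) loss — can be sketched as follows (an illustrative NumPy sketch; actual training would use a deep-learning framework, and the function names are assumptions):

```python
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random rotation (multiples of 90 degrees) and horizontal flip, as in S3."""
    img = np.rot90(img, k=int(rng.integers(4)))
    if rng.random() < 0.5:
        img = np.fliplr(img)
    return img

def l1_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """L1 loss, i.e. mean absolute error (MAE)."""
    return float(np.mean(np.abs(pred - target)))

rng = np.random.default_rng(0)
patch = rng.normal(size=(8, 8, 3))     # stand-in for a training patch
aug = augment(patch, rng)
loss = l1_loss(np.array([1.0, 2.0]), np.array([0.0, 4.0]))  # (1 + 2) / 2 = 1.5
```

In the actual method, the loss computed this way would drive the back-propagation update of the training-phase network's parameters.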
Compared with the prior art, the invention has the advantages that:
1) While maintaining good super-resolution accuracy as measured by peak signal-to-noise ratio (PSNR), the speed on mobile terminal devices is very high: a single picture can be super-resolved at scale factors of 2 and 3 within 30 ms, and the method runs on mobile portable equipment.
2) Compared with prior models of the same parameter count, the method shows a large improvement in accuracy as measured by peak signal-to-noise ratio (PSNR): on the ×3 Set5 test it achieves a PSNR of 31.1 with an inference time of 14.6 ms, a clear improvement over the prior ECBSR method optimized for mobile terminal devices, which achieves a PSNR of 30.8 with an inference time of 13.3 ms.
3) Compared with re-parameterization techniques and the like, the method performs equivalent transformation on both convolution layers and ReLU, converting the model into a simpler network at inference time.
4) The clip operator is optimized; that is, the method is correspondingly optimized and adapted for the int8 quantization models of current smart-TV platforms.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of a super-resolution image model according to the present invention;
fig. 3 is a diagram of an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings by way of examples.
Referring to fig. 1, the present invention provides a mobile terminal lightweight image super-resolution reconstruction method based on a convolutional neural network, addressing the defects of the prior art. Relying only on a convolutional neural network and an equivalent transformation technique, the model is small, the image recovery quality is good, the training speed is high, and the method is suitable for mobile terminal scenarios. Specifically, the method comprises the following steps:
s1: using the data set DIV2K, which comprises hundreds of pictures, low-resolution pictures are generated from the high-resolution pictures by bicubic downsampling to obtain a training data set;
s2: and constructing an image super-resolution network suitable for the mobile terminal.
S3: training the image super-resolution network of the mobile terminal: the training set is input into the constructed image super-resolution network and high-resolution pictures are output; the pictures in the data set are randomly rotated and flipped.
S4: the loss between the original pictures in the data set and the generated high-resolution pictures is computed, and back propagation is performed based on this loss until training is finished. The loss function is the L1 loss, i.e., MAE.
Referring to fig. 2, the construction of the image super-resolution network applicable to the mobile terminal disclosed by the present invention specifically includes:
s22: the image super-resolution network in the training stage mainly comprises the following parts: the feature extraction part, which extracts the features of the picture using convolution layers and the re-parameterization module; and the image reconstruction part, which reconstructs the extracted features using pixel reconstruction and adds a global residual. Finally, the features are transformed into the appropriate range using the operator clip to fit the int8 quantization model.
S23: the image super-resolution model in the inference stage is obtained by equivalent transformation of the image super-resolution model obtained in step S22; the operator repeat, the operator add, the operator concat, and the operator clip are each equivalently transformed. The specific equivalent transformation procedures are as follows. For the operator repeat, since repeat(x, n) = repeat(I, n) ⊛ x, the operator repeat can be replaced with a convolution whose kernel is repeat(I, n), where x is the input tensor, I is the identity matrix, repeat(·, n) denotes repeating n times along the channel dimension, and ⊛ is the convolution operation;
for the operator add, there are two convolutions preceding it in the training-phase network, i.e. W1 ⊛ x + b1 and W2 ⊛ y + b2, where W1 and W2 are two different convolution kernels, x and y are two different input tensors, and b1 and b2 are the bias tensors of the corresponding convolutions. The operator add can then be converted so that the convolution kernel becomes [W1, W2] (concatenated along the input-channel dimension), applied to the concatenation of x and y, and the bias becomes b1 + b2;
for the operator concat, which is preceded in the training network by Conv2d_ReLU layers W1 ⊛ x + b1 and W2 ⊛ x + b2, the transformation is as follows: the convolution kernel becomes [W1; W2] (stacked along the output-channel dimension) and the bias becomes [b1; b2]; since ReLU acts element-wise, it commutes with the concatenation.
For the operator clip, the equivalent transformation relation between the operator clip and the ReLU is
clip(x) = ReLU(-ReLU(-x + 255) + 255)
so the operator clip can be equivalently converted into two convolution layers, each with convolution kernel -I and bias 255 and each followed by a ReLU.
Through the equivalent transformation of these four operators, the model used during training can be transformed into the model used at inference, greatly accelerating the inference speed.
Examples
Referring to fig. 3, the present embodiment specifically includes the following steps:
s1: obtain a training data set: the data set DIV2K is used, which comprises 800 pictures, and low-resolution pictures are generated from the high-resolution pictures by bicubic downsampling. Specifically, (a) in fig. 3 is a real picture, and (b) in fig. 3 is a low-resolution picture generated using bicubic downsampling;
s2: constructing an image super-resolution network suitable for a mobile terminal;
s3: train the image super-resolution network of the mobile terminal: the training set is input into the constructed image super-resolution network and high-resolution pictures are output; the pictures in the data set are randomly rotated and flipped; the loss between the original pictures in the data set and the generated high-resolution pictures is computed, and back propagation is performed based on this loss until training is finished. The loss function is the L1 loss, i.e., MAE.
S4: the picture to be tested is input into the inference network: the low-resolution picture generated by bicubic downsampling (fig. 3(b)) is input into the inference network to obtain fig. 3(c). Compared with fig. 3(b), fig. 3(c) achieves better PSNR and better visual effect and is closer to the real picture, fig. 3(a). While obtaining better PSNR and visual effect, the model is optimized for the mobile terminal, so it is faster and lighter.
It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the manner in which the invention is practiced, and it is to be understood that the scope of the invention is not limited to such specifically recited statements and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.
Claims (1)
1. A mobile terminal lightweight image super-resolution reconstruction method based on a convolutional neural network is characterized by comprising the following steps:
s1: using the data set DIV2K, which comprises hundreds of pictures, low-resolution pictures are generated from the high-resolution pictures by bicubic downsampling to obtain a training data set;
s2: constructing an image super-resolution network for a mobile terminal
S21: constructing an image super-resolution network in a training stage, wherein the network comprises the following components:
a feature extraction part, which extracts the features of the picture using convolution layers and the re-parameterization module;
an image reconstruction part, which reconstructs the extracted features by using pixel reconstruction and adds a global residual; finally, the features are converted into the [0,255] range using the operator clip so as to suit int8 quantization scenarios;
s22: equivalently converting the super-resolution network in the training stage in the step S21 to obtain an image super-resolution network in the inference stage, that is, an image super-resolution network for the mobile terminal, specifically including:
for the operator repeat, since repeat(x, n) = repeat(I, n) ⊛ x, the operator repeat is replaced with a convolution whose kernel is repeat(I, n), where x is the input tensor, I is the identity matrix, repeat(·, n) denotes repeating n times along the channel dimension, and ⊛ is the convolution operation;
for the operator add, there are two convolutions in the training-phase network, i.e. W1 ⊛ x + b1 and W2 ⊛ y + b2, where W1 and W2 are two different convolution kernels, x and y are two different input tensors, and b1 and b2 are the bias tensors of the corresponding convolutions;
the operator add is converted so that the convolution kernel becomes [W1, W2] (concatenated along the input-channel dimension), applied to the concatenation of x and y, and the bias becomes b1 + b2;
For the operator concat, conv2d _ ReLU layer in the trained network, then the transformation is such that the convolution kernel becomesIs biased to be->
for the operator clip, according to the equivalent transformation relationship between the operator clip and the ReLU:
clip(x) = ReLU(-ReLU(-x + 255) + 255)
the operator clip is equivalently converted into two convolution layers, each with convolution kernel -I and bias 255 and each followed by a ReLU;
S3: image super-resolution network for training mobile terminal
Inputting the training data set obtained in step S1 into the constructed image super-resolution network of the mobile terminal and outputting high-resolution pictures; the pictures in the data set are randomly rotated and flipped; the loss between the original pictures in the data set and the generated high-resolution pictures is computed, and back propagation is performed based on this loss until training is finished; the loss function is the L1 loss, i.e., MAE.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211647267.0A CN115965527A (en) | 2022-12-21 | 2022-12-21 | Mobile terminal lightweight image super-resolution reconstruction method based on convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115965527A true CN115965527A (en) | 2023-04-14 |
Family
ID=87362621
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211647267.0A Pending CN115965527A (en) | 2022-12-21 | 2022-12-21 | Mobile terminal lightweight image super-resolution reconstruction method based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115965527A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116205284A (en) * | 2023-05-05 | 2023-06-02 | 北京蔚领时代科技有限公司 | Super-division network, method, device and equipment based on novel re-parameterized structure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||