CN108629737B - Method for improving JPEG format image space resolution - Google Patents

Method for improving JPEG format image space resolution

Info

Publication number
CN108629737B
Authority
CN
China
Prior art keywords
image
layer
resolution
jpeg
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810435569.9A
Other languages
Chinese (zh)
Other versions
CN108629737A (en)
Inventor
颜波
李可
林楚铭
马晨曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN201810435569.9A
Publication of CN108629737A
Application granted
Publication of CN108629737B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 — Geometric image transformations in the plane of the image
    • G06T3/40 — Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4007 — Scaling of whole images or parts thereof based on interpolation, e.g. bilinear interpolation

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention belongs to the technical field of image processing, and particularly relates to a method for improving the spatial resolution of JPEG-format images. Super-resolution reconstruction is an effective, signal-processing-based way to increase the resolution of JPEG-format images. However, traditional super-resolution methods designed for lossless images, when applied to JPEG images, increase the resolution but at the same time make blocking artifacts more visible. The proposed method combines image deblocking with image super-resolution and, through an end-to-end network model, super-resolves JPEG-format images while removing their blocking artifacts. Experimental results show that the method effectively improves the resolution of JPEG-format images, giving them clearer visual quality and richer content, and thus greater research and application value.

Description

Method for improving JPEG format image space resolution
Technical Field
The invention belongs to the technical field of image editing, and particularly relates to a JPEG-format image super-resolution method, more specifically to a method for improving the spatial resolution of JPEG-format images.
Background
Image resolution is an important index for evaluating image quality, and high-resolution images therefore have important application value and broad prospects in many fields. However, due to hardware limitations and external interference during acquisition, storage and transmission, the images obtained in practice typically suffer varying degrees of quality degradation.
Image super-resolution reconstruction is an effective way to increase image resolution by means of signal processing. Conventional image super-resolution methods operate on images in BMP format (Bitmap, a lossless bitmap image format), which generally occupy a large amount of storage space. With the rapid development of social networks and image acquisition devices such as smartphones, huge numbers of images fill our lives; to save storage space and transmission bandwidth, original images are almost inevitably compressed with block-based transform coding. Among these compression standards, JPEG (Joint Photographic Experts Group, the first international image compression standard) is the most widely used because of its clear specification and simple hardware implementation. Super-resolution of JPEG-format images therefore has great application value.
Although JPEG is very efficient at saving storage space and transmission bandwidth, it introduces blocking artifacts that degrade the visual quality of compressed images; there is thus a trade-off between reducing the number of coded bits and maintaining visual quality.
Many deblocking algorithms have been proposed to address this problem. Deep-learning-based image deblocking, now a classic approach in computer vision, learns the mapping between a compressed image and its lossless counterpart so as to remove the blocking artifacts in the compressed image, effectively easing the conflict between saving coded bits and preserving image quality during compression. ARCNN [1] is a deblocking method based on a deep convolutional network: it exploits the strengths of deep learning and uses a convolutional model to learn the deblocking process. The model takes a compressed image as network input and, after several convolutional layers, outputs an image close to the lossless original, removing the blocking artifacts and improving visual quality. However, an ARCNN model can only handle one quality factor (QF), so different models must be trained for different quality factors. In addition, the network is shallow and there is still room to improve its deblocking performance. Moreover, deblocking alone cannot increase image resolution.
Meanwhile, with the recent success of convolutional neural networks (CNNs) in computer vision, many CNN-based super-resolution methods for lossless images have emerged. Among them, SRCNN [2] and VDSR [4] are the most representative.
Dong et al. proposed a convolutional-neural-network-based image super-resolution method (SRCNN) in 2015, which reconstructs a high-resolution image by learning the mapping between low-resolution and high-resolution images. The mapping is represented as a CNN that takes the low-resolution image as input and outputs the high-resolution image. The method exploits the strengths of neural networks, formulates the super-resolution problem as a neural network, and trains it by optimizing an objective function, yielding a simple and effective model for enhancing image resolution.
Such networks are easy to train on large training sets, and once a super-resolution model has been trained, reconstructing a high-resolution image is a simple feed-forward pass, so the computational complexity is greatly reduced. Dong et al. later improved SRCNN and proposed FSRCNN [3], which refines the network structure to achieve faster super-resolution.
In 2016, Kim et al. [4] achieved better super-resolution results by deepening the network structure and used residual learning to improve network efficiency and speed up training. However, when these conventional super-resolution methods are applied to JPEG-format images, the blocking artifacts become more pronounced as the resolution increases, so a super-resolution model designed specifically for JPEG-format images is both more challenging and of greater practical significance.
Disclosure of Invention
To achieve better super-resolution of JPEG-format images, the present invention aims to provide a method for improving the spatial resolution of JPEG-format images that simultaneously suppresses blocking artifacts and improves the quality of low-resolution images.
The method for improving the spatial resolution of JPEG-format images comprises two stages, an image deblocking stage and an image super-resolution stage, with the following specific steps:
(1) Image deblocking stage
Firstly, bicubic interpolation is applied to the input low-resolution JPEG-format image to obtain a JPEG image q in high-resolution space that still exhibits blocking artifacts;
Then, the image q is passed sequentially through 20 convolutional layers (see Fig. 2), which together form the training network; each convolutional layer is followed by a rectified linear unit (ReLU, a commonly used deep-learning activation function, φ(x) = max(0, x)) as the activation function. In addition, 6 summation layers are used via skip connections after the 7th, 10th, 13th, 16th, 19th and 20th convolutional layers, adding the features output by the current layer to the output of an earlier convolutional layer. For every summation layer except the last, three convolutional layers lie between its two inputs; the last summation layer adds the outputs of the current layer and the previous layer, and every convolutional layer is followed by an activation function. Structured this way, the network converges more easily, and when gradients are back-propagated during training the 6 summation layers effectively carry the gradient flow from the deep layers to the shallower layers, making the network easier to train. The summation layers simply add their two inputs, so they increase the computational complexity of the network only slightly;
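To make the stage-(1) structure concrete, the following is a minimal PyTorch sketch of such a 20-layer deblocking network. It is not the patented implementation: the 3×3 kernels, 64 feature channels, single-channel (luminance) input and the reading of the final summation layer as a global residual back to q are all assumptions.

```python
import torch
import torch.nn as nn

class DeblockingNet(nn.Module):
    """20 convolutional layers with ReLU and summation (skip) layers
    after convolutions 7, 10, 13, 16 and 19, plus a final summation."""
    def __init__(self, channels=64):
        super().__init__()
        convs = [nn.Conv2d(1, channels, 3, padding=1)]                 # conv 1
        convs += [nn.Conv2d(channels, channels, 3, padding=1)
                  for _ in range(18)]                                  # convs 2-19
        self.convs = nn.ModuleList(convs)
        self.last = nn.Conv2d(channels, 1, 3, padding=1)               # conv 20
        self.relu = nn.ReLU(inplace=True)
        self.sum_after = {7, 10, 13, 16, 19}                           # summation layers

    def forward(self, q):
        x, skip = q, None
        for idx, conv in enumerate(self.convs, start=1):               # convs 1-19
            x = self.relu(conv(x))
            if skip is None and idx + 3 in self.sum_after:
                skip = x                                               # start of first 3-conv span
            if idx in self.sum_after:
                x = x + skip                                           # skip over three conv layers
                skip = x
        return self.last(x) + q                                        # final summation (global residual)
```

A call such as `DeblockingNet()(torch.rand(1, 1, 180, 180))` would return a tensor of the same shape, i.e. the deblocked image in high-resolution space that stage (2) operates on.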
(2) Image hyper-segmentation stage
In this stage, the output of the previous stage is first preprocessed to obtain three feature maps, which are then combined to form the input signal of this stage. The combined input is fed into a neural network composed of d gated highway units (d ≥ 1; the result figures provided were all obtained with d = 7), yielding a residual map r. Finally, the residual map r is added to the input of this stage to obtain the final high-resolution image.
In step (1), the loss function used during network training is the standard mean squared error (MSE). The loss function of the proposed method is defined as follows:
L(θ) = || f(I_LR; θ) − I_HR ||²   (mean squared error, averaged over the training set)
where I_LR and I_HR denote a low-resolution JPEG-format image and the corresponding true high-resolution JPEG-format image respectively, and f(I_LR; θ) denotes the super-resolved JPEG-format image output by the model.
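As an illustration only, the MSE objective above can be written as the short PyTorch snippet below; the name `model`, standing for the end-to-end mapping f(·; θ), and the mini-batch averaging are assumptions rather than details from the patent.

```python
import torch.nn.functional as F

def training_loss(model, i_lr_up, i_hr):
    # i_lr_up: low-resolution JPEG images already upsampled to high-resolution space
    # i_hr:    corresponding ground-truth high-resolution images
    return F.mse_loss(model(i_lr_up), i_hr)   # mean squared error over the batch
```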
In step (2) of the invention, the input q of this stage is preprocessed as follows: q is processed to obtain a horizontal gradient map, a vertical gradient map and a luminance map, and the three maps are then combined to form the input of the reconstruction network.
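A minimal sketch of this preprocessing is given below, assuming simple finite-difference filters for the two gradient maps and channel-wise concatenation as the combination; neither choice is specified in the patent text.

```python
import torch
import torch.nn.functional as F

def preprocess(q):
    # q: deblocked luminance image of shape (N, 1, H, W)
    gx = q - F.pad(q, (1, 0, 0, 0))[..., :, :-1]   # horizontal gradient map
    gy = q - F.pad(q, (0, 0, 1, 0))[..., :-1, :]   # vertical gradient map
    return torch.cat([gx, gy, q], dim=1)           # combined three-channel input
```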
In step (2), the invention adopts a new neural network based on gated highway units, which achieves a better super-resolution effect. The network structure is shown in Fig. 3 and comprises d gated highway units connected in sequence, followed by a deconvolution layer, with d ≥ 1.
In step (2), the gated highway unit used in the image super-resolution network has the structure shown in Fig. 4: between its input and output it comprises an upper channel and a lower channel. The upper channel consists of a convolutional layer followed by a sigmoid activation layer (sigmoid being a commonly used S-shaped deep-learning activation function, φ(x) = 1/(1 + e^(−x))). The lower channel contains three convolutional layers, one dropout layer and two ReLU layers, in the order: dropout layer, convolutional layer 1, ReLU layer 1, convolutional layer 2, ReLU layer 2, convolutional layer 3. The two channels finally meet at an aggregation layer, which combines the output g of the upper channel, the output y of the lower channel and the input signal x to produce the output of the highway unit.
Specifically, the signal output by the layer preceding the unit is taken as the input x and fed to both the upper and lower channels; after the two channels have processed it, the upper-channel output g, the lower-channel output y and the input signal x converge at the aggregation layer, which combines the three to produce the output of the highway unit, namely:
output = g ⊙ y + (1 − g) ⊙ x   (⊙ denoting element-wise multiplication)
the essence of this unit is that a weight g is learned from the upper channel to combine the input of the unit with the output y of the lower channel. Each gated high way unit output has its input signal taken into account before it, which is why the proposed method performs better than other methods with the same parameters. The network design has an innovation point that a random drop (dropout) layer is arranged at the forefront of a channel under each road unit, and other methods put the random drop (dropout) at the last in the network structure. According to the invention, the trained network can be converged no matter how many road units are set, and the performance is better than that of other network structures with the same parameters.
The method combines the image deblocking process with the image super-resolution process and, through an end-to-end network model, super-resolves JPEG-format images while removing their blocking artifacts. Experimental results show that the method effectively improves the resolution of JPEG-format images, giving them clearer visual quality and richer content, and thus greater research and application value.
The method not only effectively removes the blocking artifacts of JPEG-format images but also reconstructs a high-resolution image. The low-resolution JPEG image is fed directly into the trained network and a single forward pass produces the final image, with no additional operations required, which simplifies the super-resolution process.
Drawings
Fig. 1 is a flow chart (overall network) of the present invention.
Fig. 2 is a diagram of the deblocking stage.
Fig. 3 is a diagram of the image super-resolution stage.
Fig. 4 is a structural diagram of the gated highway unit used in the image super-resolution network.
Fig. 5 is a comparison of super-resolution results on a JPEG-format image, where (a) is the original image, (b) the low-resolution image, (c) direct interpolation, (d) the deblocking result, (e) the directly enlarged deblocked image, and (f) the super-resolution result of the proposed method.
Detailed Description
Fig. 1 shows the process of super-resolving a low-resolution JPEG-format image, which comprises the following steps:
(1) Firstly, bicubic interpolation is applied to the input low-resolution JPEG-format image to obtain a JPEG image q in high-resolution space that still exhibits blocking artifacts;
(2) The input q is then passed sequentially through the image deblocking network and the image super-resolution network to reconstruct a high-resolution JPEG-format image, as illustrated by the sketch below.
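Put together, and reusing the hypothetical DeblockingNet, preprocess and SuperResolutionNet sketches from the previous section (none of these names appear in the patent), the whole pipeline could be exercised roughly as follows; the 3× upscaling factor is likewise an assumption.

```python
import torch
import torch.nn.functional as F

def super_resolve_jpeg(lr_jpeg, deblock_net, sr_net, scale=3):
    # step (1): bicubic interpolation to high-resolution space, then deblocking
    q = F.interpolate(lr_jpeg, scale_factor=scale, mode='bicubic', align_corners=False)
    q = deblock_net(q)
    # step (2): build the three-map input, then residual reconstruction
    return sr_net(preprocess(q), q)

# e.g. hr = super_resolve_jpeg(torch.rand(1, 1, 60, 60), DeblockingNet(), SuperResolutionNet())
```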
Fig. 2 shows a process of image deblocking, which comprises the following steps:
the input q is passed through 20 convolutional layers in sequence, using the ReLU as the activation function after each layer. In addition, a total of 6 summation layers are used after 7, 10, 13, 16, 19 and 20 convolutional layers by skip connection to add the characteristics of the current layer output to the output of the previous convolutional layer. Three convolutional layers are included between the two inputs of each summation layer except the last summation layer, and the last summation layer adds the output of the current layer and the previous layer, and each convolutional layer is followed by an activation function.
Fig. 3 shows the image super-resolution process, with the following specific steps:
The output of the previous stage is processed to obtain three feature maps, which are then combined to form the input signal of this stage. The combined input is fed into d gated highway units (d ≥ 1; the result figures provided were all obtained with d = 7) to obtain a residual map r. Finally, the residual map r is added to the input of this stage to obtain the final high-resolution image.
Fig. 4 shows the structure of the gated highway unit used in the image super-resolution network, as follows:
Between the input and the output of the highway unit there are an upper channel and a lower channel, and the signal output by the previous step is fed into both channels. The upper channel consists of a convolutional layer followed by a sigmoid layer; the lower channel contains three convolutional layers, one dropout layer and two ReLU layers, in the order: dropout layer, convolutional layer 1, ReLU layer 1, convolutional layer 2, ReLU layer 2, convolutional layer 3. Finally, the two channels meet at the aggregation layer, which combines the output g of the upper channel, the output y of the lower channel and the input signal x to produce the output of the highway unit.
Fig. 5 compares the result of the proposed method with the original image, the low-resolution input, the directly enlarged input, the deblocking result and the directly enlarged deblocked image. (a) is the original high-resolution image; (b) is the network input, a low-resolution image obtained by downsampling the original high-resolution image; (c) is (b) enlarged by interpolation; (d) is the result after the image deblocking stage; (e) is the deblocked image enlarged directly by interpolation; and (f) is the result after super-resolution with the proposed method. The method reconstructs the detail of the JPEG-format image well, recovering textures such as digits and letters in the original image relatively completely, while suppressing the blocking artifacts of the JPEG image and clearly improving image quality.
References:
[1] C. Dong, Y. Deng, C. C. Loy, and X. Tang, "Compression artifacts reduction by a deep convolutional network," in IEEE International Conference on Computer Vision (ICCV), Dec. 2015, pp. 576–584. (ARCNN)
[2] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 38(2):295–307, 2015. (SRCNN)
[3] C. Dong, C. C. Loy, and X. Tang, "Accelerating the super-resolution convolutional neural network," in European Conference on Computer Vision (ECCV), pp. 391–407, Springer, 2016. (FSRCNN)
[4] J. Kim, J. K. Lee, and K. M. Lee, "Accurate image super-resolution using very deep convolutional networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1646–1654. (VDSR)

Claims (4)

1. A method for improving the spatial resolution of JPEG-format images, characterized by comprising the following specific steps:
(1) Image deblocking stage
Firstly, bicubic interpolation is applied to the input low-resolution JPEG-format image to obtain a JPEG image q in high-resolution space that still exhibits blocking artifacts;
then, sequentially passing the image q through 20 convolutional layers, wherein the 20 convolutional layers form a training network, and processing each convolutional layer by using a ReLU; in addition, the characteristics of the current layer output are added to the outputs of the previous convolutional layers using 6 summation layers by skip connection after 7 th, 10 th, 13 th, 16 th, 19 th and 20 th convolutional layers; the two inputs of each summation layer except the last summation layer comprise three convolution layers, the last summation layer adds the output of the current layer and the output of the previous layer, and an activation unit is connected behind each convolution layer; in this way, the network is more easily converged and the 6 summation layers can effectively deliver the deep gradient stream to the shallower network layer when training the network backhaul gradient;
(2) Image super-resolution stage;
firstly, preprocessing the output of the previous stage to obtain three characteristic diagrams, and then combining the three diagrams to obtain an input signal of the stage; inputting the combined input signals into a neural network formed by d gate control road units to obtain a residual error graph r, d > =1; finally, adding the residual image r and the input of the stage to obtain a final high-resolution image;
the gated road unit in the step (2) has the structure that: the device comprises an upper channel and a lower channel between an input and an output; the upper channel is sequentially a convolution layer and a Sigmoid activation layer, and the lower channel comprises three convolution layers, a random discard layer and two ReLU layers; sequentially comprises the following steps: a random discard layer, a first convolution layer, a first ReLU layer, a second convolution layer, a second ReLU layer, a third convolution layer; the upper and lower channels are finally merged in the polymerization layer; combining the output g of the upper layer channel, the output y of the lower layer channel and the input signal x through a polymerization layer to finally obtain the output of the highway unit; the output expression of the aggregation layer is:
output = g ⊙ y + (1 − g) ⊙ x   (⊙ denoting element-wise multiplication)
2. A method for improving the spatial resolution of JPEG-format images according to claim 1, characterized in that the preprocessing of the input q in step (2) is as follows: q is processed to obtain a horizontal gradient map, a vertical gradient map and a luminance map, and the three maps are then combined to form the input of the reconstruction network.
3. A method for improving the spatial resolution of JPEG-format images according to claim 2, characterized in that the neural network in step (2) comprises: d gated highway units connected in sequence, followed by a deconvolution operation, with d ≥ 1.
4. A method for improving the spatial resolution of JPEG-format images according to claim 1, characterized in that the loss function used in the training optimization of the whole network structure is the standard mean squared error (MSE), defined as follows:
L(θ) = || f(I_LR; θ) − I_HR ||²   (mean squared error, averaged over the training set)
where I_LR and I_HR denote a low-resolution JPEG-format image and the true high-resolution JPEG-format image respectively, and f(I_LR; θ) denotes the super-resolved JPEG-format image output by the model.
CN201810435569.9A 2018-05-09 2018-05-09 Method for improving JPEG format image space resolution Active CN108629737B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810435569.9A CN108629737B (en) 2018-05-09 2018-05-09 Method for improving JPEG format image space resolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810435569.9A CN108629737B (en) 2018-05-09 2018-05-09 Method for improving JPEG format image space resolution

Publications (2)

Publication Number Publication Date
CN108629737A CN108629737A (en) 2018-10-09
CN108629737B true CN108629737B (en) 2022-11-18

Family

ID=63692166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810435569.9A Active CN108629737B (en) 2018-05-09 2018-05-09 Method for improving JPEG format image space resolution

Country Status (1)

Country Link
CN (1) CN108629737B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359417B (en) * 2020-09-29 2024-02-02 四川大学 Detection method for JPEG image compression quality factor

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106204449A (en) * 2016-07-06 2016-12-07 安徽工业大学 A kind of single image super resolution ratio reconstruction method based on symmetrical degree of depth network
CN106709875A (en) * 2016-12-30 2017-05-24 北京工业大学 Compressed low-resolution image restoration method based on combined deep network
CN106991646A (en) * 2017-03-28 2017-07-28 福建帝视信息科技有限公司 A kind of image super-resolution method based on intensive connection network
CN107563965A (en) * 2017-09-04 2018-01-09 四川大学 Jpeg compressed image super resolution ratio reconstruction method based on convolutional neural networks

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10002313B2 (en) * 2015-12-15 2018-06-19 Sighthound, Inc. Deeply learned convolutional neural networks (CNNS) for object localization and classification
CN106683067B (en) * 2017-01-20 2020-06-23 福建帝视信息科技有限公司 Deep learning super-resolution reconstruction method based on residual sub-images

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106204449A (en) * 2016-07-06 2016-12-07 安徽工业大学 A kind of single image super resolution ratio reconstruction method based on symmetrical degree of depth network
CN106709875A (en) * 2016-12-30 2017-05-24 北京工业大学 Compressed low-resolution image restoration method based on combined deep network
CN106991646A (en) * 2017-03-28 2017-07-28 福建帝视信息科技有限公司 A kind of image super-resolution method based on intensive connection network
CN107563965A (en) * 2017-09-04 2018-01-09 四川大学 Jpeg compressed image super resolution ratio reconstruction method based on convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Improving an Image Super-Resolution Algorithm Based on Residual Networks; Ma Xuan et al.; Ruanjian Daokan (Software Guide); 2018-04-15 (No. 4); full text *

Also Published As

Publication number Publication date
CN108629737A (en) 2018-10-09

Similar Documents

Publication Publication Date Title
CN108765296B (en) Image super-resolution reconstruction method based on recursive residual attention network
CN113362223B (en) Image super-resolution reconstruction method based on attention mechanism and two-channel network
CN106991646B (en) Image super-resolution method based on dense connection network
CN109978762B (en) Super-resolution reconstruction method based on condition generation countermeasure network
CN111028150B (en) Rapid space-time residual attention video super-resolution reconstruction method
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
CN109102462A (en) A kind of video super-resolution method for reconstructing based on deep learning
CN110634105B (en) Video high-space-time resolution signal processing method combining optical flow method and depth network
CN110781773B (en) Road extraction method based on residual error neural network
CN108989731B (en) Method for improving video spatial resolution
CN112560865B (en) Semantic segmentation method for point cloud under outdoor large scene
CN112699844A (en) Image super-resolution method based on multi-scale residual error level dense connection network
CN112017116A (en) Image super-resolution reconstruction network based on asymmetric convolution and construction method thereof
Wang et al. TF-SOD: a novel transformer framework for salient object detection
WO2023185284A1 (en) Video processing method and apparatuses
CN110288529B (en) Single image super-resolution reconstruction method based on recursive local synthesis network
CN112184587A (en) Edge data enhancement model, and efficient edge data enhancement method and system based on model
Xue et al. Research on gan-based image super-resolution method
Shen et al. Deeper super-resolution generative adversarial network with gradient penalty for sonar image enhancement
CN109272450B (en) Image super-resolution method based on convolutional neural network
CN111461976A (en) Image super-resolution method based on efficient lightweight coordinate neural network
CN108629737B (en) Method for improving JPEG format image space resolution
CN113362239A (en) Deep learning image restoration method based on feature interaction
CN112215140A (en) 3-dimensional signal processing method based on space-time countermeasure
CN117078539A (en) CNN-transducer-based local global interactive image restoration method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant