CN112509094A

CN112509094A - JPEG image compression artifact removal algorithm based on a cascaded residual encoder-decoder network

Info

Publication number: CN112509094A
Application number: CN202011530001.9A
Authority: CN
Inventors: 张译; 禹冬晔; 牟轩沁
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2021-03-16
Anticipated expiration: 2040-12-22
Also published as: CN112509094B

Abstract

The invention discloses a JPEG image compression artifact removal algorithm based on a cascaded residual encoder-decoder network (CRED-Net). The method first uses a quality factor (QF) prediction network to estimate the QF value of a compressed image, and then selects the appropriate cascaded residual encoder-decoder network according to the estimated QF value to remove the compression artifacts. The method does not depend on the compression coding information of the JPEG image. It performs multi-scale feature learning in the pixel domain, the discrete cosine transform (DCT) domain and the discrete wavelet transform (DWT) domain, and introduces a correction network to further correct the restored image when the QF value of the image falls outside the range covered by the trained CRED-Net models. Experimental results show that, compared with other methods, the proposed method achieves better performance over a wider range of compression levels.

Description

JPEG image compression artifact removal algorithm based on a cascaded residual encoder-decoder network

Technical Field

The invention belongs to the field of image processing, and in particular relates to a JPEG image compression artifact removal algorithm based on a cascaded residual encoder-decoder network (CRED-Net).

Background

In recent years, the rapid development of digital imaging technology has provided an important foundation for the capture, storage, and sharing of images. To save bandwidth and device resources, lossy compression methods such as JPEG are widely used in image transmission and storage. However, lossy compression introduces image artifacts such as blocking, ringing, and blurring, which have a large negative impact on image processing and computer vision tasks that take the compressed image as input. Designing an algorithm that effectively removes image compression artifacts has therefore become an important research topic in computer vision. At present, most algorithms need to know the coding information of the compressed image in advance, which limits their range of application. An image restoration algorithm that is independent of the compression coding information therefore has broad application prospects and practical value.

Disclosure of Invention

The invention aims to overcome the above defects by providing a JPEG image compression artifact removal algorithm based on a cascaded residual encoder-decoder network that performs well over a wider range of compression levels.

In order to achieve the above object, the present invention comprises the steps of:

S1, dividing the image into a series of overlapping image blocks;

S2, estimating the QF value of each image block with a quality factor (QF) prediction network, and taking the rounded average of the QF values of the image blocks whose local standard deviation lies within a threshold as the overall QF value of the test image;

S3, when the estimated QF value is within the preset range, selecting the CRED-Net network corresponding to the QF value to process the image, the network output being the final result; when the estimated QF value is not within the preset range, first selecting the corresponding CRED-Net network to remove the compression artifacts of the input image, and then feeding the network output together with the original compressed image into a correction network for further correction to generate the final restored image.

In the training stage of the QF prediction network, the luminance channel of each image is compressed with a random integer QF value, image blocks of a specific pixel size are then extracted from the compressed luminance image as training data, and the network is optimized with an L1 loss between the predicted and true QF values.

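The local standard deviation is calculated as follows:

$$\sigma(i,j)=\sqrt{\sum_{k=-K}^{K}\sum_{l=-L}^{L}\omega_{k,l}\,\left[I(i+k,j+l)-\mu(i,j)\right]^{2}}$$

where

$$\mu(i,j)=\sum_{k=-K}^{K}\sum_{l=-L}^{L}\omega_{k,l}\,I(i+k,j+l)$$

I(i, j) is the pixel value at position (i, j); ω = {ω_{k,l} | k = −K, …, K; l = −L, …, L} is a two-dimensional normalized circularly symmetric Gaussian weighting function with a standard deviation of 1.5; and K = L = 5 defines the normalization window size.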
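As an illustration of the Gaussian-weighted local standard deviation described above, the following is a minimal pure-Python sketch. The function names, the nested-list image representation, and the restriction to interior pixels (no border handling) are assumptions made for this example and are not taken from the source.

```python
import math


def gaussian_window(K=5, L=5, sigma=1.5):
    """Normalized (2K+1) x (2L+1) circularly symmetric Gaussian weights."""
    raw = [[math.exp(-(k * k + l * l) / (2.0 * sigma * sigma))
            for l in range(-L, L + 1)]
           for k in range(-K, K + 1)]
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]


def local_std(image, i, j, window, K=5, L=5):
    """Gaussian-weighted local standard deviation at interior pixel (i, j).

    `image` is a list of rows of numbers; (i, j) must be at least K rows and
    L columns away from the border (border handling is left out on purpose).
    """
    # Weighted local mean.
    mu = 0.0
    for k in range(-K, K + 1):
        for l in range(-L, L + 1):
            mu += window[k + K][l + L] * image[i + k][j + l]
    # Weighted local variance around that mean.
    var = 0.0
    for k in range(-K, K + 1):
        for l in range(-L, L + 1):
            diff = image[i + k][j + l] - mu
            var += window[k + K][l + L] * diff * diff
    return math.sqrt(max(var, 0.0))
```

With K = L = 5 the window covers 11 × 11 pixels; a flat region gives a value close to zero, while textured regions give larger values.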

The overall QF value of the test image is the rounded average of the QF values of the image blocks whose mean local standard deviation lies within the threshold:

$$QF_{est}=\mathrm{round}\left(\frac{1}{N}\sum_{i=1}^{N}QF_{i}\right)$$

where QF_i denotes the predicted QF value of the i-th image block, i = 1, 2, …, N; N is the number of image blocks satisfying the condition; and round(·) denotes rounding.

In S3, in the end-to-end multi-domain CRED-Net network, the compressed image passes through a DCT auto-encoder in the discrete cosine transform (DCT) branch to restore the DCT coefficients;

in the discrete wavelet transform (DWT) branch, a DWT auto-encoder restores the wavelet coefficients;

the outputs of the DCT branch and the DWT branch are combined with the original input image and fed together into the CRED-Net network, which, through an end-to-end residual connection, outputs the artifact-removed image.

The specific calculation process of the DCT branch is as follows:

setting a sliding window on the compressed image and, starting from the first pixel in the upper-left corner of the image, moving the window one pixel at a time in the horizontal or vertical direction; at each move, the image area outside the window is discarded, which yields a cropped image; the DCT is applied to each cropped image, and the resulting DCT coefficient maps are concatenated into a multi-channel tensor;

feeding the multi-channel tensor into the DCT auto-encoder, and introducing a DCT correction unit (DRU) that constrains the DCT coefficient values using JPEG-specific prior information;

transforming the output of the DCT auto-encoder to the pixel domain by the inverse discrete cosine transform (IDCT).

The DCT correction unit constrains the DCT coefficient values according to the following expression:

where Q is the quantization table; x(u, v) denotes a DCT coefficient of the compressed image; y(u, v) denotes the corresponding output of the DCT auto-encoder; and u and v are the position indices in the DCT domain.
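The expression used by the DCT correction unit is not reproduced in this text. Purely as an illustration of the kind of JPEG prior it refers to, the sketch below applies one well-known constraint: because JPEG quantization rounds each coefficient to the nearest multiple of Q(u, v), the original coefficient lies within Q(u, v)/2 of the dequantized value x(u, v). The hard clamp and the function names are assumptions for this example and may differ from the DRU actually used.

```python
def clamp_dct_coefficient(y, x, q):
    """Clamp the auto-encoder output y to the interval [x - q/2, x + q/2].

    x is the dequantized DCT coefficient of the compressed image and q is the
    matching quantization-table entry. This hard clamp is an assumed form of
    the constraint, not necessarily the expression used by the DRU.
    """
    lower = x - q / 2.0
    upper = x + q / 2.0
    return min(max(y, lower), upper)


def clamp_block(Y, X, Q):
    """Apply the clamp element-wise to equally sized 2-D coefficient blocks."""
    return [[clamp_dct_coefficient(Y[u][v], X[u][v], Q[u][v])
             for v in range(len(Y[u]))]
            for u in range(len(Y))]
```

For example, with x = 16 and q = 16 an output of 30 is pulled back to 24, while an output of 20 is left unchanged.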

The DWT branch is specifically calculated as follows:

convolving the four filters corresponding to the Daubechies 3 wavelet with the compressed image; downsampling the convolution results to obtain four sub-band images; concatenating the four wavelet sub-band images into a four-channel tensor that is fed into the DWT auto-encoder; and applying the inverse discrete wavelet transform (IDWT) to the output of the DWT auto-encoder.

Compared with the prior art, the method first uses the QF prediction network to estimate the QF value of the compressed image, and then selects the appropriate cascaded residual encoder-decoder network according to the estimated QF value to remove the image compression artifacts. The method does not depend on the compression coding information of the JPEG image. It performs multi-scale feature learning in the pixel domain, the discrete cosine transform (DCT) domain and the discrete wavelet transform (DWT) domain, and introduces a correction network to further correct the restored image when the QF value of the image falls outside the range covered by the trained CRED-Net models. Experimental results show that, compared with other methods, the proposed method achieves better performance over a wider range of compression levels.

Drawings

FIG. 1 is a process framework diagram of the present invention;

FIG. 2 is a diagram of the end-to-end multi-domain cascaded residual encoder-decoder network proposed by the present invention;

FIG. 3 is a network structure diagram of the DCT/DWT auto-encoder proposed by the present invention;

FIG. 4 illustrates the DCT method for overlapping image blocks proposed by the present invention;

FIG. 5 is a diagram of the CRED-Net network architecture according to the present invention;

FIG. 6 shows the results of the present invention and other methods on restoring a JPEG image compressed with QF = 10, wherein (a) is the JPEG compressed image; (b) is the result of the ARCNN algorithm; (c) is the result of the TNRD algorithm; (d) is the result of the DnCNN algorithm; (e) is the result of the method of the invention; (f) is the original uncompressed image;

FIG. 7 shows the results of the present invention and other methods on restoring a JPEG image compressed with QF = 20, wherein (a) is the JPEG compressed image; (b) is the result of the D2SD algorithm; (c) is the result of the ARCNN algorithm; (d) is the result of the FastARCNN algorithm; (e) is the result of the DnCNN algorithm; (f) is the result of the TNRD algorithm; (g) is the result of the method of the invention; (h) is the original uncompressed image.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

As shown in FIG. 1, the JPEG image compression artifact removal algorithm based on the cascaded residual encoder-decoder network of the present invention includes the following steps:

Step one: the test image is divided into a series of overlapping image blocks, the QF value of each image block is estimated by the QF prediction network, and the rounded average of the QF values of the image blocks whose local standard deviation (LSD) ranks in the top 20%-50% is taken as the overall QF value of the test image.

The structure of the QF prediction network is shown in Table 1. In the training stage, 45000 images from the training set of the MS-COCO database are used to train the QF prediction network, as follows: the luminance channel (i.e., the Y channel of the YCbCr space) of each image is compressed with a random integer value (i.e., the QF) in the range [1, 95]; image blocks of 144 × 144 pixels are then extracted from the compressed luminance image at a stride of 48 pixels as training data, and the network is optimized with an L1 loss between the predicted and true QF values.
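The block extraction described above (144 × 144 blocks at a 48-pixel stride, so neighbouring blocks overlap) can be sketched as follows. The function name and the choice to drop partial blocks at the right and bottom edges are assumptions for this example.

```python
def block_origins(width, height, size=144, stride=48):
    """Top-left (x, y) corners of all size x size blocks on a regular grid.

    Blocks that would extend past the right or bottom edge are skipped,
    which is an assumption; the source does not say how edges are handled.
    """
    xs = range(0, width - size + 1, stride)
    ys = range(0, height - size + 1, stride)
    return [(x, y) for y in ys for x in xs]
```

For a 240 × 192 luminance image this gives six block positions, three across and two down; an image smaller than one block gives none.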

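The LSD is calculated as:

$$\sigma(i,j)=\sqrt{\sum_{k=-K}^{K}\sum_{l=-L}^{L}\omega_{k,l}\,\left[I(i+k,j+l)-\mu(i,j)\right]^{2}}$$

where

$$\mu(i,j)=\sum_{k=-K}^{K}\sum_{l=-L}^{L}\omega_{k,l}\,I(i+k,j+l)$$

with I(i, j) the pixel value at position (i, j), and ω_{k,l}, K and L as defined above.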

The overall QF value of the test image is the rounded average of the QF values of the image blocks whose mean LSD ranks in the top 20%-50%:

$$QF_{est}=\mathrm{round}\left(\frac{1}{N}\sum_{i=1}^{N}QF_{i}\right)$$

where QF_i (i = 1, 2, …, N) denotes the predicted QF value of each image block, N is the number of image blocks satisfying the condition, and round(·) denotes rounding.
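A minimal sketch of this aggregation step is given below. It assumes that "top 20%-50%" means the blocks ranked between the 20th and 50th percentile when sorted by mean LSD in descending order, and that round(·) rounds halves upward; both readings, and the function name, are assumptions for this example.

```python
import math


def estimate_image_qf(block_qfs, block_lsd_means, top_lo=0.20, top_hi=0.50):
    """Rounded mean of the QF predictions of blocks in the selected LSD band."""
    # Rank block indices by mean LSD, largest first.
    order = sorted(range(len(block_qfs)),
                   key=lambda i: block_lsd_means[i], reverse=True)
    n = len(order)
    start = int(math.floor(top_lo * n))
    stop = int(math.ceil(top_hi * n))
    selected = order[start:stop]
    if not selected:
        raise ValueError("no blocks fall inside the selected LSD band")
    mean_qf = sum(block_qfs[i] for i in selected) / len(selected)
    # Round half up (Python's built-in round() rounds halves to even).
    return int(math.floor(mean_qf + 0.5))
```

With ten blocks, the two blocks with the largest mean LSD and the five with the smallest are ignored, and the prediction is averaged over the remaining three.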

Step two: when the estimated QF value equals a preset value or lies within a preset range, the CRED-Net network corresponding to the QF value is selected to process the image, and the network output is the final result; when the estimated QF value does not satisfy these conditions, an appropriate CRED-Net network is first selected to remove the compression artifacts from the input image, and the network output is then fed together with the original compressed image into a correction network for further correction to generate the final restored image.

As shown in FIG. 2, in the end-to-end multi-domain CRED-Net network, the compressed image passes through a DCT auto-encoder in the DCT branch to restore the DCT coefficients, and through a DWT auto-encoder in the DWT branch to restore the wavelet coefficients; finally, the outputs of the DCT branch and the DWT branch are combined with the original input image and fed into the CRED-Net network, which, through an end-to-end residual connection with the original input image, outputs the artifact-removed image.

As shown in FIG. 3, the network structure of the DCT/DWT auto-encoder is as follows:

two convolutional layers serve as the encoder, two convolutional layers serve as the decoder, and three dilated convolution layers in between perform feature extraction; the encoder and decoder use 3 × 3 convolution kernels with a stride and padding of 1 pixel each; the three dilated convolution layers all use 3 × 3 convolution kernels with dilation factors of 2, 4 and 8, corresponding padding of 2, 4 and 8 pixels, and a stride of 1 pixel; all convolution kernels have 64 channels; every convolutional layer except the last layer of the decoder is followed by a rectified linear unit (ReLU) layer.
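The padding values listed above keep the spatial size unchanged through every layer of the auto-encoder. The short sketch below checks this with the standard convolution output-size relation; the helper names and the example input size are assumptions for illustration.

```python
def conv_output_size(n, kernel, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one spatial dimension."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (n + 2 * padding - effective_kernel) // stride + 1


def autoencoder_sizes(n):
    """Spatial size after each of the seven 3x3 layers described above."""
    # (padding, dilation) per layer: 2 encoder, 3 dilated, 2 decoder layers.
    layers = [(1, 1), (1, 1), (2, 2), (4, 4), (8, 8), (1, 1), (1, 1)]
    sizes = []
    for padding, dilation in layers:
        n = conv_output_size(n, kernel=3, stride=1,
                             padding=padding, dilation=dilation)
        sizes.append(n)
    return sizes
```

Because a 3 × 3 kernel with dilation d spans 2d + 1 pixels, padding by d pixels on each side exactly compensates for it at stride 1.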

The specific calculation process of the DCT branch is as follows:

1) As shown in FIG. 4, the DCT is applied to overlapping image blocks as follows: given a compressed image of W × H pixels, a sliding window is first set whose size is determined from W and H using the floor operation ⌊·⌋ (rounding down). Starting from the first pixel in the upper-left corner of the image, the window is moved one pixel at a time in the horizontal or vertical direction, by up to 8 pixels in each direction, for a total of 64 shifts. At each shift, the image area outside the window is discarded, which yields 64 cropped images. The DCT is applied to each cropped image, and the resulting 64 DCT coefficient maps are concatenated into a 64-channel tensor.

2) The 64-channel tensor is fed into the DCT auto-encoder; a DCT correction unit (DRU) is introduced that constrains the DCT coefficient values using JPEG-specific prior information, with the following expression:

3) The output of the DCT auto-encoder is transformed to the pixel domain by the inverse discrete cosine transform (IDCT).
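The cropping part of step 1) can be sketched as below. The exact window-size expression is not reproduced in this text, so the window size is passed in by the caller; a 136 × 136 window on a 144 × 144 block would match the 136 × 136 network input mentioned elsewhere in this description, but that pairing is an inference. The row-major shift order and the function name are also assumptions, and the DCT of each crop is not shown.

```python
def shifted_crops(image, win_h, win_w, max_shift=8):
    """Return max_shift * max_shift crops of a 2-D image (list of rows).

    The window starts at the top-left pixel and is shifted by 0..max_shift-1
    pixels vertically (dy) and horizontally (dx); everything outside the
    window is discarded for each shift.
    """
    height, width = len(image), len(image[0])
    if win_h + max_shift - 1 > height or win_w + max_shift - 1 > width:
        raise ValueError("window plus largest shift does not fit in the image")
    crops = []
    for dy in range(max_shift):
        for dx in range(max_shift):
            crops.append([row[dx:dx + win_w]
                          for row in image[dy:dy + win_h]])
    return crops
```

Each of the 64 crops would then be transformed by the DCT, and the 64 coefficient maps stacked along the channel axis.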

The DWT branch is specifically calculated as follows:

The four filters (f_LL, f_LH, f_HL, f_HH) corresponding to the Daubechies 3 (db3) wavelet are convolved with the compressed image; the convolution results are downsampled to obtain 4 sub-band images; the 4 wavelet sub-band images are concatenated into a 4-channel tensor that is fed into the DWT auto-encoder; and the inverse discrete wavelet transform (IDWT) is applied to the output of the DWT auto-encoder.
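The filter-then-downsample structure of the DWT branch can be illustrated with the sketch below. The db3 filter coefficients are not listed in this text, so the example substitutes the two-tap Haar filters as a stand-in; the separable construction of the four 2-D filters, the sub-band naming, the sliding-product (correlation) form and the absence of border extension are all assumptions of this example.

```python
import math


def analysis_filters(lo, hi):
    """Build four 2-D filters as outer products of 1-D low/high-pass filters."""
    def outer(a, b):
        return [[x * y for y in b] for x in a]
    # First letter: vertical filter, second letter: horizontal filter.
    # Naming conventions for LH/HL differ between references.
    return {"LL": outer(lo, lo), "LH": outer(lo, hi),
            "HL": outer(hi, lo), "HH": outer(hi, hi)}


def subbands(image, filters):
    """Filter with each 2-D kernel and keep every second sample (stride 2)."""
    size = len(filters["LL"])
    rows = (len(image) - size) // 2 + 1
    cols = (len(image[0]) - size) // 2 + 1
    out = {}
    for name, f in filters.items():
        out[name] = [[sum(f[r][c] * image[2 * y + r][2 * x + c]
                          for r in range(size) for c in range(size))
                      for x in range(cols)]
                     for y in range(rows)]
    return out


# Haar filters, used here only as a stand-in for the db3 filters.
S = 1.0 / math.sqrt(2.0)
HAAR_LO = [S, S]
HAAR_HI = [S, -S]
```

The four sub-band images returned by `subbands` correspond to the 4-channel tensor that the DWT auto-encoder receives; with six-tap db3 filters the same code structure applies, although border handling then matters more.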

As shown in FIG. 5, the CRED-Net network structure can be summarized as follows:

CRED-Net is formed by cascading two sub-networks, each of which adopts a nested multi-level U-Net structure and comprises an encoder and a decoder. The encoder contains several "convolutional layer + PReLU layer" structures, and an average pooling layer (2 × 2 pixels) is placed after the PReLU layer for downsampling. After convolution and deconvolution operations are applied to the features of each downsampling layer, the output is superimposed on the features of the same scale in the downsampling path and processed by a "convolutional layer + PReLU layer" structure, which realizes the decoding of the data. In each nested U-Net, the number of deconvolutions equals the number of downsamplings, and different U-Nets use different numbers of downsamplings, which realizes multi-scale extraction and representation of image information. Finally, the output features of all nested U-Nets are concatenated and fed into the subsequent network, which realizes the cascade of the two sub-networks. The convolution kernels of the first and last layers of CRED-Net are 9 × 9 and 5 × 5 pixels, with padding of 0 and 2 pixels, respectively; the convolution kernels of all other layers are 3 × 3 pixels with padding of 1 pixel.

The structure of the correction network is essentially the same as that of the CRED-Net network, with the following main differences:

1) the dimension of the network input changes from 136 × 136 × 66 to 128 × 128 × 2, so the padding of the 9 × 9 convolutional layer is set to 4 pixels to keep the input and output images the same size;

2) the maximum number of downsamplings is increased to 4 to cope with more complex image compression artifacts.
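The sizes quoted above can be checked with a few lines of arithmetic. The sketch assumes stride-1 convolutions for the 9 × 9 and 5 × 5 layers (the stride of these layers is not stated in this text) and that each 2 × 2 pooling halves the spatial size; the function names are illustrative.

```python
def stride1_output(n, kernel, padding):
    """Output size of a stride-1 convolution along one spatial dimension."""
    return n + 2 * padding - kernel + 1


def pooled_sizes(n, times):
    """Spatial size after each of `times` successive 2x2 poolings."""
    sizes = []
    for _ in range(times):
        n = n // 2
        sizes.append(n)
    return sizes
```

Under these assumptions a 136-pixel input shrinks to 128 pixels after the unpadded 9 × 9 layer of CRED-Net, a 128-pixel input stays at 128 pixels after the 9 × 9 layer of the correction network with 4 pixels of padding, and four downsamplings take 128 pixels to 64, 32, 16 and 8.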

As a further improvement of the present invention, the networks to be trained in step two can be summarized as follows:

Seven multi-domain CRED-Net networks are trained using compressed images with QF values of 5, 10, 20, 30, 40, 60 and 80, respectively; eight correction networks are trained with QF values in the ranges [1, 4], [6, 7], [8, 9], [11, 14], [15, 19], [21, 24], [25, 29] and [86, 95], respectively. The network models selected for compressed images with different QF values are shown in Table 2, where QF_est denotes the predicted QF value and the brackets indicate the QF range of the compressed images required to train the network model.

TABLE 2 network model to be trained for compressed images of different QF values
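The sketch below shows how the estimated QF could be mapped to a processing path using only the values listed above. Which CRED-Net model is paired with each correction range is defined by Table 2, which is not reproduced in this text, so the sketch only decides whether a correction network is needed and which training range it belongs to. Treating every QF outside the eight ranges as needing no correction is an inference, and the names are illustrative.

```python
# QF values for which a multi-domain CRED-Net model is trained.
TRAINED_CRED_QFS = (5, 10, 20, 30, 40, 60, 80)

# QF ranges for which a correction network is trained.
CORRECTION_RANGES = ((1, 4), (6, 7), (8, 9), (11, 14),
                     (15, 19), (21, 24), (25, 29), (86, 95))


def correction_range(qf_est):
    """Return the correction-network range containing qf_est, or None."""
    for low, high in CORRECTION_RANGES:
        if low <= qf_est <= high:
            return (low, high)
    return None


def needs_correction(qf_est):
    """True when the estimated QF falls in one of the correction ranges."""
    return correction_range(qf_est) is not None
```

For instance, an estimate of 12 falls in the [11, 14] range and would be passed on to that correction network after CRED-Net, whereas an estimate of 10 matches a trained CRED-Net model directly.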

As a further improvement of the present invention, the training method of the network may be summarized as follows:

1) 45000 original images are selected from the training set of the MS-COCO database, and the luminance channel of each image is compressed with the standard MATLAB JPEG encoder, with the quality factor (QF) in the range [1, 95]. Each compressed image is divided into non-overlapping 144 × 144 image blocks as training data for the QF prediction network.

2) 400 original images from the Berkeley Segmentation Database (BSD) (200 each from its training and test sets) are combined with the 45000 images from step 1) to build a large training image set. The luminance channel of each image is compressed in the same way as in step 1), so that 45400 compressed images are created for each QF value. Non-overlapping 144 × 144 image blocks are extracted from each compressed image as training data, and the multi-domain CRED-Net network corresponding to each QF value is trained separately.

3) 10000 original images from the test set of the MS-COCO database and 400 original images from the BSD database are selected to build the training data of the correction networks, as follows: the luminance channel of each original image is compressed as in step 1), and the compressed images with different QF values are fed into the corresponding multi-domain CRED-Net networks; the network output is divided into non-overlapping 128 × 128 image blocks, which serve as training data for the correction network together with the co-located image blocks of the original compressed image.

4) The loss function for training the QF prediction network is the L1 loss; the loss function for training the multi-domain CRED-Net networks and the correction networks is a linear combination of the pixel mean squared error loss (l_MSE) and the structural similarity loss (l_SSIM). The pixel mean squared error loss (l_MSE) is calculated as:

$$l_{MSE}=\frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H}\left[I(i,j)-I_{R}(i,j)\right]^{2}$$

where I(i, j) and I_R(i, j) denote the pixel values at spatial position (i, j) of the reference image I and the restored image I_R, respectively, and W and H denote the width and height of the image. The structural similarity loss (l_SSIM) is calculated as:

where SSIM(I, I_R) denotes the structural similarity between I and I_R, calculated as:

$$SSIM(I,I_R)=\frac{(2\mu_{I}\mu_{I_R}+C_1)(2\sigma_{II_R}+C_2)}{(\mu_{I}^{2}+\mu_{I_R}^{2}+C_1)(\sigma_{I}^{2}+\sigma_{I_R}^{2}+C_2)}$$

where μ_I (μ_{I_R}) and σ_I (σ_{I_R}) denote the local mean and local standard deviation of I (I_R), respectively; σ_{II_R} denotes the local covariance of I and I_R; and C₁ and C₂ are constants whose values are the same as in the SSIM method. The final loss function is:

L = l_MSE + λ·l_SSIM (7)

where λ = 0.005.

5) The experiments are carried out with the PyTorch deep learning framework on a workstation with an Intel Xeon 2.67 GHz CPU and an NVIDIA GeForce GTX 1080 Ti GPU. The network parameters are initialized with samples drawn from a uniform distribution on the interval [0, 1]; the PReLU slope is initialized to 0.1; optimization uses the Adam algorithm with an initial learning rate of 2 × 10^-4 and exponential decay rates of 0.9 and 0.999 for the first- and second-order moment estimates, respectively. When training the multi-domain CRED-Net networks and the correction networks, the batch size is set to 16 and the learning rate is reduced to 3/4 of its previous value every 16000 iterations; when training the QF prediction network, the batch size is set to 64, the learning rate is kept constant for the first 200 epochs and decreases linearly to zero over the last 200 epochs. The multi-domain CRED-Net networks and the correction networks are both trained in a hard-to-easy manner: a network model is first trained on image blocks with a smaller QF value, and its trained parameters are then used to initialize the network model corresponding to a larger QF value.
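The combined loss of item 4) and the two learning-rate schedules of item 5) can be sketched in plain Python as follows. The structural similarity loss is passed in as a number because its expression is not reproduced in this text; the function names, the step-wise reading of "reduced to 3/4 every 16000 iterations", and the use of the same initial rate for both schedules are assumptions of this example.

```python
def mse_loss(reference, restored):
    """Pixel mean squared error between two equally sized 2-D images."""
    height, width = len(reference), len(reference[0])
    total = sum((reference[j][i] - restored[j][i]) ** 2
                for j in range(height) for i in range(width))
    return total / (width * height)


def total_loss(l_mse, l_ssim, lam=0.005):
    """Linear combination L = l_MSE + lambda * l_SSIM."""
    return l_mse + lam * l_ssim


def cred_net_lr(iteration, base_lr=2e-4, step=16000, factor=0.75):
    """Step schedule: multiply the rate by 3/4 every `step` iterations."""
    return base_lr * factor ** (iteration // step)


def qf_net_lr(epoch, base_lr=2e-4, constant_epochs=200, decay_epochs=200):
    """Constant for the first 200 epochs, then linear decay to zero."""
    if epoch < constant_epochs:
        return base_lr
    remaining = max(constant_epochs + decay_epochs - epoch, 0)
    return base_lr * remaining / decay_epochs
```

Under this reading the CRED-Net learning rate drops from 2 × 10^-4 to 1.5 × 10^-4 at iteration 16000, and the QF prediction network's rate is halved at epoch 300 and reaches zero at epoch 400.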

As a further improvement of the present invention, the network testing method can be summarized as follows: the original images of five data sets are selected, namely LIVE, CSIQ, BSD100 (100 images from the BSD validation set), Classic5 (the five images baboon, barbara, boats, lena and peppers) and Urban100; the luminance channel of each image is JPEG compressed; the compressed luminance image is fed into the network model; and the artifact-removed images are evaluated with three metrics: peak signal-to-noise ratio (PSNR), PSNR-B and structural similarity (SSIM). Table 3 compares the test performance of the present invention with that of other methods on the different image databases (bold data indicate better performance), and FIG. 6 and FIG. 7 show the results of the present invention and other methods on restoring compressed images with different QF values. The experimental results show that, compared with other methods, the proposed method achieves better image restoration performance over a wider range of compression levels.

TABLE 3 comparison of the test Performance of the method of the present invention with other methods on LIVE, BSD100, Classic5, CSIQ, Urban100 databases
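Of the three metrics named above, PSNR has the simplest definition and can be sketched as below for 8-bit luminance images. The function name and the nested-list image representation are assumptions for this example; PSNR-B and SSIM are not covered.

```python
import math


def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    height, width = len(reference), len(reference[0])
    mse = sum((reference[j][i] - restored[j][i]) ** 2
              for j in range(height) for i in range(width)) / (width * height)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Two images that differ by one grey level at every pixel have an MSE of 1 and therefore a PSNR of about 48.13 dB.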

In summary, the JPEG image compression artifact removal algorithm based on the cascaded residual encoder-decoder network first estimates the QF value of a compressed image with the QF prediction network, and then selects the appropriate cascaded residual encoder-decoder network according to the estimated QF value to remove the image compression artifacts. The method does not depend on the coding information of the compressed image, performs multi-scale feature learning in the pixel domain, the DCT domain and the DWT domain, and further corrects the image with a correction network when the QF value of the image falls outside the preset range. The experimental results show that, compared with other methods, the method achieves better image restoration performance over a wider range of compression levels.

Although the embodiments of the present invention have been described above with reference to the accompanying drawings, the present invention is not limited to the above-described embodiments. The above embodiments are intended to be illustrative, not limiting. Those skilled in the art, having the benefit of this disclosure, may implement a variety of compression artifact removal algorithms without departing from the scope of the present invention as defined by the appended claims.

Claims

1. A JPEG image compression artifact removal algorithm based on a cascaded residual encoder-decoder network, characterized by comprising the following steps:

S1, dividing the image into a series of overlapping image blocks;

S2, estimating the QF value of each image block with a quality factor (QF) prediction network, and taking the rounded average of the QF values of the image blocks whose local standard deviation lies within a threshold as the overall QF value of the test image;

S3, when the estimated QF value is within a preset range, selecting the cascaded residual encoder-decoder network (CRED-Net) corresponding to the QF value to process the image, the network output being the final result; when the estimated QF value is not within the preset range, first selecting the corresponding CRED-Net network to remove the compression artifacts of the input image, and then feeding the network output together with the original compressed image into a correction network for further correction to generate the final restored image.

2. The JPEG image compression artifact removal algorithm based on a cascaded residual encoder-decoder network as claimed in claim 1, wherein, in the training stage of the QF prediction network, the luminance channel of each image is compressed with a random integer value, image blocks of a specific pixel size are extracted from the compressed luminance image as training data, and the network is optimized with an L1 loss between the predicted and true QF values.

3. The JPEG image compression artifact removal algorithm based on a cascaded residual encoder-decoder network as claimed in claim 1 or 2, wherein the local standard deviation is calculated as follows:

$$\sigma(i,j)=\sqrt{\sum_{k=-K}^{K}\sum_{l=-L}^{L}\omega_{k,l}\,\left[I(i+k,j+l)-\mu(i,j)\right]^{2}}$$

where

$$\mu(i,j)=\sum_{k=-K}^{K}\sum_{l=-L}^{L}\omega_{k,l}\,I(i+k,j+l)$$

I(i, j) is the pixel value at position (i, j); ω = {ω_{k,l} | k = −K, …, K; l = −L, …, L} is a two-dimensional normalized circularly symmetric Gaussian weighting function with a standard deviation of 1.5; and K = L = 5 defines the normalization window size.

4. The JPEG image compression artifact removal algorithm based on a cascaded residual encoder-decoder network as claimed in claim 1, wherein the overall QF value of the test image is the rounded average of the QF values of the image blocks whose mean local standard deviation lies within the threshold:

$$QF_{est}=\mathrm{round}\left(\frac{1}{N}\sum_{i=1}^{N}QF_{i}\right)$$

where QF_i denotes the predicted QF value of the i-th image block, i = 1, 2, …, N; N is the number of image blocks satisfying the condition; and round(·) denotes rounding.

5. The JPEG image compression artifact removal algorithm based on a cascaded residual encoder-decoder network as claimed in claim 1, wherein, in S3, in the end-to-end multi-domain CRED-Net network, the compressed image passes through a DCT auto-encoder in the DCT branch to restore the DCT coefficients;

wavelet coefficients are restored by a DWT auto-encoder in the discrete wavelet transform (DWT) branch;

6. The JPEG image compression artifact removal algorithm based on a cascaded residual encoder-decoder network as claimed in claim 5, wherein the DCT branch is computed as follows:

setting a sliding window on the compressed image and, starting from the first pixel in the upper-left corner of the image, moving the window one pixel at a time in the horizontal or vertical direction, the image area outside the window being discarded at each move to obtain a cropped image; applying the DCT to each cropped image, and concatenating the resulting DCT coefficient maps into a multi-channel tensor;

feeding the multi-channel tensor into the DCT auto-encoder, and introducing a DCT correction unit that constrains the DCT coefficient values using JPEG-specific prior information;

transforming the output of the DCT auto-encoder to the pixel domain by the inverse discrete cosine transform (IDCT).

7. The JPEG image compression artifact removal algorithm based on a cascaded residual encoder-decoder network as claimed in claim 6, wherein the DCT correction unit constrains the DCT coefficient values according to the following expression:

8. The JPEG image compression artifact removal algorithm based on a cascaded residual encoder-decoder network as claimed in claim 5, wherein the DWT branch is computed as follows:

convolving the four filters corresponding to the Daubechies 3 wavelet with the compressed image; downsampling the convolution results to obtain four sub-band images; concatenating the four wavelet sub-band images into a four-channel tensor that is fed into the discrete wavelet transform (DWT) auto-encoder; and applying the inverse discrete wavelet transform to the output of the DWT auto-encoder.