WO2016127271A1 - An apparatus and a method for reducing compression artifacts of a lossy-compressed image - Google Patents


Info

Publication number
WO2016127271A1
Authority
WO
WIPO (PCT)
Prior art keywords
high dimensional
lossy
image
dimensional feature
sub
Prior art date
Application number
PCT/CN2015/000093
Other languages
French (fr)
Inventor
Xiaoou Tang
Chao Dong
Chen Change Loy
Original Assignee
Xiaoou Tang
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaoou Tang filed Critical Xiaoou Tang
Priority to CN201580075726.4A priority Critical patent/CN107251053B/en
Priority to PCT/CN2015/000093 priority patent/WO2016127271A1/en
Publication of WO2016127271A1 publication Critical patent/WO2016127271A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06T5/60
    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/247Aligning, centring, orientation detection or correction of the image by affine transforms, e.g. correction due to perspective effects; Quadrilaterals, e.g. trapezoids

Definitions

  • the present application generally relates to a field of image processing, more particularly, to an apparatus and a method for reducing compression artifacts of a lossy-compressed image.
  • Lossy compression is the class of data encoding methods that uses inexact approximations or partial data discarding for representing the content that has been encoded. Such compression techniques are used to reduce the amount of data that would otherwise be needed to store, handle, and/or transmit the represented content.
  • there are several lossy image compression formats, e.g. JPEG, WebP, JPEG XR, and HEVC-MSP; JPEG remains the most widely adopted format among the various alternatives.
  • Lossy compression introduces compression artifacts, especially when used at low bit rates/quantization levels.
  • JPEG compression artifacts are a complex combination of different specific artifacts comprising blocking artifacts, ringing effects and blurring.
  • Blocking artifacts arise when each block is encoded without considering the correlation with the adjacent blocks, resulting in discontinuities at the borders. Ringing effects along the edges occur due to the coarse quantization of the high-frequency components. Blurring happens due to the loss of high-frequency components.
  • existing algorithms for eliminating the artifacts can be classified into deblocking oriented and restoration oriented methods.
  • the deblocking oriented methods focus on removing blocking and ringing artifacts.
  • most deblocking oriented methods cannot reproduce sharp edges, and tend to over-smooth texture regions.
  • the restoration oriented methods regard the compression operation as distortion and propose restoration algorithms.
  • the restoration oriented methods tend to reconstruct the original image directly, thus the sharpened output is often accompanied by ringing effects around edges and abrupt transitions in smooth regions.
  • an apparatus for reducing compression artifacts of a lossy-compressed image may comprise: a feature extraction device comprising a first set of filters configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors, and a feature enhancement device electronically communicated with the feature extraction device and comprising a second set of filters configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors.
  • the apparatus further comprises a mapping device coupled to the feature enhancement device and comprising a third set of filters configured to map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation, and an aggregating device electronically communicated with the mapping device and configured to aggregate patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image.
  • the first set of filters may be configured to extract patches from the lossy-compressed image and map nonlinearly each of the extracted patches as a high dimensional feature vector, and the mapped vectors for all the patches form said first set of high dimensional feature vectors.
  • the second set of filters may be configured to denoise each high dimensional feature vector in the first set and map nonlinearly the denoised high dimensional feature vectors to a second set of high dimensional feature vectors.
  • the first, second and third set of filters and the aggregating device may map the vectors based on predetermined first, second and third parameters, respectively, or may aggregate the patch-wise representations based on fourth parameter.
  • the apparatus may further comprise a comparing device, which may be coupled to the aggregating device and configured to sample a ground truth uncompressed image corresponding to the lossy-compressed image from a predetermined training set and compare the dissimilarity between the aggregated restored clear image received from the aggregating device and the corresponding ground truth uncompressed image to generate a reconstruction error, wherein the reconstruction error is back-propagated in order to optimize the first, second, third and fourth parameters.
  • the apparatus may further comprise a training set preparation device coupled to the comparing device, in which the training set preparation device further comprises: a cropper configured to crop randomly a plurality of sub-images from a randomly selected training image to generate a set of ground truth uncompressed sub-images, and a lossy-compressed sub-image generator electronically communicated with the cropper and configured to generate a set of lossy-compressed sub-images based on the set of ground truth uncompressed sub-images received from the cropper.
  • the training set preparation device comprises a pairing device electronically communicated with the cropper and generator and configured to pair each of the ground truth uncompressed sub-images with a corresponding lossy-compressed sub-image and a collector electronically communicated with the pairing device and configured to collect the paired ground truth uncompressed sub-images and the lossy-compressed sub-image to form the predetermined training set.
  • the lossy-compressed sub-image generator further comprises a compressing device electronically communicated with the cropper and generator and configured to encode and decode the ground truth sub-image with a compression encoder and decoder to generate the set of lossy-compressed sub-images.
  • the reconstruction error comprises a mean squared error.
  • a method for reducing compression artifacts of a lossy-compressed image may comprise: extracting patches from the lossy-compressed image and mapping the extracted patches to a first set of high dimensional feature vectors by a feature extraction device comprising a first set of filters; denoising each high dimensional feature vector in the first set and mapping the denoised high dimensional feature vectors to a second set of high dimensional feature vectors by a feature enhancement device electronically communicated with the feature extraction device and comprising a second set of filters; mapping nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation by a mapping device coupled to the feature enhancement device and comprising a third set of filters; and aggregating patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image by an aggregating device electronically communicated with the mapping device.
  • the apparatus may comprise a reconstructing unit configured to reconstruct the lossy-compressed image to a restored clear image based on predetermined parameters and a training unit configured to train the convolutional neural network system with a predetermined training set so as to determine the parameters used by the reconstructing unit.
  • the reconstructing unit may comprise: a feature extraction device comprising a first set of filters configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors; a feature enhancement device electronically communicated with the feature extraction device and comprising a second set of filters configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors; a mapping device coupled to the feature enhancement device and comprising a third set of filters configured to map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation; and an aggregating device electronically communicated with the mapping device and configured to aggregate patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image.
  • the feature extraction device, the feature enhancement device, the mapping device and the aggregating device comprise at least one convolutional layer, respectively.
  • the convolutional layers are sequentially connected to each other to form a convolutional neural network system
  • the system may comprise a memory that stores executable components and a processor executes the executable components to perform operations of the system.
  • the executable components comprise: a feature extraction component configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors; a feature enhancement component configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors; a mapping component configured to map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation; and an aggregating component configured to aggregate patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image.
  • Fig. 1 is a schematic diagram illustrating an apparatus for reducing compression artifacts of a lossy-compressed image consistent with an embodiment of the present application.
  • Fig. 2 is a schematic diagram illustrating an apparatus for reducing compression artifacts of a lossy-compressed image consistent with another embodiment of the present application.
  • Fig. 3 is a schematic diagram illustrating a convolutional neural network system, consistent with some disclosed embodiments.
  • Fig. 4. is a schematic diagram illustrating a training unit of the apparatus, consistent with some disclosed embodiments.
  • Fig. 5. is a schematic diagram illustrating a training set preparation device of the training unit, consistent with some disclosed embodiments.
  • Fig. 6 is a schematic flowchart illustrating a method for reducing compression artifacts of a lossy-compressed image, consistent with some disclosed embodiments.
  • Fig. 7 is a schematic flowchart illustrating a method for training a convolutional neural network system for reducing compression artifacts of a lossy-compressed image, consistent with some disclosed embodiments.
  • Fig. 8 is a schematic diagram illustrating a system for reducing compression artifacts of a lossy-compressed image consistent with an embodiment of the present application.
  • the apparatus 1000 may comprise a feature extraction device 100, a feature enhancement device 200, a mapping device 300 and an aggregating device 400.
  • the feature extraction device 100, the feature enhancement device 200, the mapping device 300 and the aggregating device 400 will be further discussed in detail.
  • the lossy-compressed image is denoted by Y
  • the restored clear image is denoted by F(Y), which should be as similar as possible to the ground truth uncompressed image X.
  • the feature extraction device 100 comprises a first set of filters.
  • the first set of filters is configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors.
  • the first set of filters maps the extracted patches to a first set of high dimensional feature vectors by rule of a function F'(first parameters), where F'(x) is a nonlinear function (e.g., max(0, x), 1/(1+exp(-x)) or tanh(x)) and the first parameters are determined from predetermined parameters associated with the lossy-compressed image.
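  The candidate nonlinearities F'(x) listed above can be written directly; a minimal numpy sketch (the choice among them is left open by the description):

```python
import numpy as np

# The three candidate nonlinearities named in the text, as elementwise functions.
def relu(x):      # max(0, x)
    return np.maximum(0.0, x)

def sigmoid(x):   # 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):      # tanh(x)
    return np.tanh(x)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))       # [0. 0. 2.]
print(sigmoid(0.0))  # 0.5
```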
  • this first set of high dimensional feature vectors may comprise a set of feature maps, whose number equals the dimensionality of the vectors.
  • a popular strategy in image restoration is to densely extract patches and then represent them by a set of pre-trained bases such as PCA (Principal Component Analysis), DCT (Discrete Cosine Transformation), Haar, etc.
  • the operations for the feature extraction device 100 may be formulated as: F1(Y) = F'(W1 * Y + B1)  (1)
  • W1 and B1 represent the filters and biases respectively, and "*" denotes the convolution operation.
  • F'(x) is a nonlinear function (e.g., max(0, x), 1/(1+exp(-x)) or tanh(x)).
  • W1 is of size c × f1 × f1 × n1, where c is the number of channels in the input image, f1 is the spatial size of a filter, and n1 is the number of filters.
  • W1 applies n1 convolutions on the image, and each convolution has a kernel size of c × f1 × f1.
  • the output is composed of n1 feature maps.
  • B1 is an n1-dimensional vector, each element of which is associated with a filter.
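  As an illustration of the feature extraction operation just described, the sketch below implements a single "valid" convolution layer with F'(x) = max(0, x) in plain numpy. The toy sizes (c = 1, f1 = 3, n1 = 4) are placeholders, not values fixed by the application:

```python
import numpy as np

def conv_layer(Y, W, B):
    """Naive 'valid' convolution: Y is (c, H, W_img); W is (n, c, f, f); B is (n,).
    Returns n feature maps of size (H-f+1, W_img-f+1) with ReLU applied."""
    c, H, Wi = Y.shape
    n, _, f, _ = W.shape
    out = np.zeros((n, H - f + 1, Wi - f + 1))
    for k in range(n):
        for i in range(H - f + 1):
            for j in range(Wi - f + 1):
                out[k, i, j] = np.sum(W[k] * Y[:, i:i+f, j:j+f]) + B[k]
    return np.maximum(0.0, out)   # F'(x) = max(0, x)

# Toy sizes: c=1 channel, f1=3, n1=4 filters (a practical system would use larger values).
rng = np.random.default_rng(0)
Y  = rng.standard_normal((1, 8, 8))      # stand-in for the lossy-compressed image
W1 = rng.standard_normal((4, 1, 3, 3)) * 0.1
B1 = np.zeros(4)
F1 = conv_layer(Y, W1, B1)
print(F1.shape)   # (4, 6, 6): n1 feature maps
```

  Each spatial position of the output holds an n1-dimensional feature vector for the patch centered there, matching the "patch extraction as convolution" view in the text.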
  • the feature enhancement device 200 may be electronically communicated with the feature extraction device 100, and may comprise a second set of filters configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors, for example, a set of relatively cleaner feature vectors.
  • the feature enhancement device 200 is configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to the second set of high dimensional feature vectors by rule of a function F'(second parameters), where F'(x) is a nonlinear function (e.g., max(0, x), 1/(1+exp(-x)) or tanh(x)) and the second parameters are determined from predetermined parameters associated with the first set of high dimensional feature vectors.
  • the feature extraction device 100 extracts an n1-dimensional feature for each patch.
  • the second set of filters maps these n1-dimensional vectors into a set of n2-dimensional vectors.
  • Each mapped vector is conceptually a relatively cleaner feature vector.
  • These vectors comprise another set of feature maps.
  • the feature enhancement may be formulated as: F2(Y) = F'(W2 * F1(Y) + B2)  (2)
  • W2 is of size n1 × f2 × f2 × n2 and B2 is an n2-dimensional vector.
  • the apparatus 1000 may further comprise a mapping device 300.
  • the mapping device 300 may be coupled to the feature enhancement device 200 and comprise a third set of filters configured to map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation.
  • the mapping device 300 is configured to map nonlinearly each of the high dimensional vectors onto a patch-wise representation by rule of a function F'(third parameters), where F'(x) is a nonlinear function (e.g., max(0, x), 1/(1+exp(-x)) or tanh(x)) and the third parameters are determined from predetermined parameters associated with the second set of high dimensional feature vectors, i.e. the cleaner high dimensional feature vectors.
  • the feature enhancement device 200 generates a set of n2-dimensional feature vectors.
  • the mapping device 300 maps each of these n2-dimensional vectors into an n3-dimensional vector. Each mapped vector is conceptually the representation of a restored patch. These vectors comprise another set of feature maps.
  • the mapping may be formulated as: F3(Y) = F'(W3 * F2(Y) + B3)  (3)
  • W3 is of size n2 × f3 × f3 × n3
  • B3 is an n3-dimensional vector.
  • each of the output n3-dimensional vectors is conceptually a representation of a restored patch that will be used for reconstruction.
  • the apparatus 1000 may further comprise an aggregating device 400.
  • the aggregating device 400 may be electronically communicated with the mapping device 300 and configured to aggregate the patch-wise representations to generate a restored clear image
  • the aggregating device 400 aggregates the restored patch-wise representations to generate a restored clear image.
  • the aggregating may be formulated as: F(Y) = W4 * F3(Y) + B4  (4)
  • W4 is of size n3 × f4 × f4 × c
  • B4 is a c-dimensional vector.
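  The four stages above (feature extraction, feature enhancement, nonlinear mapping, aggregation) can be chained into one forward pass. A hedged numpy sketch follows; the zero padding, the toy widths n1..n3, and the filter sizes f1..f4 are illustrative assumptions, since the application leaves them unspecified:

```python
import numpy as np

def conv(Y, W, B, relu=True):
    """'Same' convolution via zero padding so the restored image keeps the input size.
    Y: (c_in, H, W_img); W: (c_out, c_in, f, f) with odd f; B: (c_out,)."""
    c_in, H, Wi = Y.shape
    c_out, _, f, _ = W.shape
    p = f // 2
    Yp = np.pad(Y, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, H, Wi))
    for k in range(c_out):
        for i in range(H):
            for j in range(Wi):
                out[k, i, j] = np.sum(W[k] * Yp[:, i:i+f, j:j+f]) + B[k]
    return np.maximum(0.0, out) if relu else out

rng = np.random.default_rng(0)
c, n1, n2, n3 = 1, 8, 6, 4          # toy channel/feature widths (illustrative)
f1, f2, f3, f4 = 5, 3, 1, 3         # toy filter sizes (illustrative)
Y = rng.standard_normal((c, 16, 16))            # stand-in lossy-compressed image

W1, B1 = rng.standard_normal((n1, c,  f1, f1)) * 0.1, np.zeros(n1)
W2, B2 = rng.standard_normal((n2, n1, f2, f2)) * 0.1, np.zeros(n2)
W3, B3 = rng.standard_normal((n3, n2, f3, f3)) * 0.1, np.zeros(n3)
W4, B4 = rng.standard_normal((c,  n3, f4, f4)) * 0.1, np.zeros(c)

F1 = conv(Y,  W1, B1)               # feature extraction
F2 = conv(F1, W2, B2)               # feature enhancement
F3 = conv(F2, W3, B3)               # nonlinear mapping
F  = conv(F3, W4, B4, relu=False)   # aggregation (no nonlinearity assumed here)
print(F.shape)                      # (1, 16, 16): restored image, same size as Y
```

  The aggregation step is written without a nonlinearity, matching formula-style descriptions that omit F' for the final layer; whether the last layer is linear is an assumption of this sketch.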
  • the apparatus 1000 may further comprise a comparing device (not shown) which is coupled to the aggregating device 400 and configured to sample a ground truth uncompressed sub-image corresponding to the lossy-compressed sub-image from a predetermined training set and compare the dissimilarity between the aggregated restored clear sub-image received from the aggregating device 400 and the sampled ground truth uncompressed sub-image to generate a reconstruction error.
  • the reconstruction error comprises a mean squared error.
  • the reconstruction error is back-propagated in order to determine the parameters, i.e., W1, W2, W3, W4, B1, B2, B3 and B4.
  • Fig. 2 is a schematic diagram illustrating an apparatus 1000’ for reducing compression artifacts of a lossy-compressed image consistent with another embodiment of the present application.
  • the apparatus 1000’ may comprise a reconstructing unit 100’ and a training unit 200’.
  • the reconstructing unit 100’ is configured to reconstruct the lossy-compressed image to a restored clear image based on predetermined parameters.
  • the reconstructing unit 100’ may further comprise a feature extraction device 110’, a feature enhancement device 120’, a mapping device 130’ and an aggregating device 140’.
  • the feature extraction device 110’, the feature enhancement device 120’, the mapping device 130’ and the aggregating device 140’ may comprise at least one convolutional layer, respectively, and the convolutional layers are sequentially connected to each other to form a convolutional neural network system.
  • Fig. 3 illustrates the layer configuration of the convolutional neural network system as a mathematical simulation model.
  • each of the feature extraction device 110’, the feature enhancement device 120’, the mapping device 130’ and the aggregating device 140’ may be simulated as at least one convolutional layer. Different operations are performed at the different convolutional layers.
  • the feature extraction device 110’ is configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors. This is equivalent to convolving the image by a set of filters as mentioned above.
  • the feature enhancement device 120’ is configured to be electronically communicated with the feature extraction device 110’ and denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors, for example, a set of relatively cleaner feature vectors. This is equivalent to applying a second set of filters as mentioned above.
  • the mapping device 130’ is configured to be coupled to the feature enhancement device 120’ and map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation. This is equivalent to applying a third set of filters as mentioned above.
  • the aggregating device 140’ is configured to be electronically communicated with the mapping device 130’ and aggregate patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image.
  • the convolutional neural network system dates back decades and has recently shown explosive popularity, partly due to its success in image classification.
  • the convolutional neural network system is usually applied for natural image denoising and removing noisy patterns (dirt/rain) .
  • the training unit 200’ is configured to train the convolutional neural network system with a predetermined training set so as to optimize the parameters, for example W1, W2, W3, W4, B1, B2, B3, B4, used by the reconstructing unit.
  • the training unit 200’ may comprise a sampling device 210’ , a comparing device 220’ , and a back-propagating device 230’ .
  • the sampling device 210’ may be configured to sample a lossy-compressed sub-image and its corresponding ground truth uncompressed sub-image from a predetermined training set and input the lossy-compressed sub-image to the convolutional neural network system.
  • “sub-images” means these samples are treated as small “images” rather than “patches”, in the sense that “patches” are overlapping and require some averaging as post-processing, whereas “sub-images” need not.
  • the comparing device 220’ may be configured to compare dissimilarity between the reconstructed clear sub-image based on the input lossy-compressed sub-image from the convolutional neural network system and the corresponding ground truth uncompressed sub-image to generate a reconstruction error.
  • the reconstruction error may comprise a mean squared error, and the error is minimized by using stochastic gradient descent with standard back-propagation.
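  The mean squared error and the gradient-descent update can be sketched as follows. The scalar model F(Y) = w·Y below is a hypothetical stand-in for the full network (whose gradients are obtained by back-propagation through all of W1..W4 and B1..B4); the loop only illustrates the update rule, not the actual training procedure:

```python
import numpy as np

def mse(restored, ground_truth):
    """Mean squared error over a batch of sub-image pairs (the reconstruction
    error used during training)."""
    return np.mean((restored - ground_truth) ** 2)

# Toy illustration of stochastic-gradient-descent steps on a single scalar
# parameter w of the linear model F(Y) = w * Y.
rng = np.random.default_rng(1)
Y = rng.standard_normal((4, 8, 8))         # batch of lossy sub-images
X = 2.0 * Y                                # toy "ground truth"
w, lr = 0.0, 0.05
for _ in range(200):
    grad = np.mean(2.0 * (w * Y - X) * Y)  # d(MSE)/dw
    w -= lr * grad                         # gradient-descent update
print(round(w, 3))                         # converges toward 2.0
```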
  • the back-propagating device 230’ is configured to back-propagate the reconstruction error through the convolutional neural network system so as to adjust weights on connections between neurons of the convolutional neural network system.
  • the convolutional neural network system does not preclude the use of other kinds of reconstruction error, as long as the reconstruction error is differentiable. If a better perceptually motivated metric is given during training, the convolutional neural network system can flexibly adapt to that metric.
  • the apparatus 1000 and 1000’ may further comprise a training set preparation device coupled to the comparing device and configured to prepare the predetermined training set for training the convolutional neural network system.
  • Fig. 5 is a schematic diagram illustrating the training set preparation device. As shown, the training set preparation device may comprise a cropper 241’, a lossy-compressed sub-image generator 242’, a pairing device 243’ and a collector 244’.
  • the cropper 241’ may be configured to crop randomly a plurality of sub-images from a randomly selected training image to generate a set of ground truth uncompressed sub-images. For example, the cropper 241’ may crop n sub-images of m × m pixels each.
  • the lossy-compressed sub-image generator 242’ may be electronically communicated with the cropper 241’ and configured to generate a set of lossy-compressed sub-images based on the set of ground truth uncompressed sub-images received from the cropper 241’.
  • the pairing device 243’ may be electronically communicated with the cropper 241’ and generator 242’ and configured to pair each of the ground truth uncompressed sub-images with a corresponding lossy-compressed sub-image.
  • the collector 244’ may be electronically communicated with the pairing device 243’ and configured to collect all the pairs to form the predetermined training set.
  • the lossy-compressed sub-image generator 242’ may comprise a compressing device electronically communicated with the cropper 241’ and configured to encode and decode the ground truth sub-images with a compression encoder and decoder to generate the set of lossy-compressed sub-images.
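  A possible sketch of this preparation pipeline, assuming the Pillow library as the JPEG encoder/decoder (the application does not name a specific codec implementation, and the sizes n, m and the quality setting below are illustrative):

```python
import io
import numpy as np
from PIL import Image  # assumed available; any lossy codec would do

def prepare_pairs(image, n=4, m=32, quality=10, seed=0):
    """Crop n random m×m sub-images from a training image, JPEG-compress each
    at the given quality, and return (ground_truth, lossy) pairs as arrays."""
    rng = np.random.default_rng(seed)
    W, H = image.size
    pairs = []
    for _ in range(n):
        x = rng.integers(0, W - m + 1)
        y = rng.integers(0, H - m + 1)
        gt = image.crop((x, y, x + m, y + m))              # cropper
        buf = io.BytesIO()
        gt.save(buf, format="JPEG", quality=quality)       # lossy encoder
        lossy = Image.open(io.BytesIO(buf.getvalue()))     # decoder
        pairs.append((np.asarray(gt), np.asarray(lossy)))  # pairing device
    return pairs                                           # collector

# Usage with a synthetic grayscale training image:
img = Image.fromarray(
    np.uint8(np.random.default_rng(0).integers(0, 256, (64, 64))), "L")
pairs = prepare_pairs(img)
print(len(pairs), pairs[0][0].shape)   # 4 (32, 32)
```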
  • Fig. 6 is a schematic flowchart illustrating a method 2000 for reducing compression artifacts of a lossy-compressed image, consistent with some disclosed embodiments.
  • the method 2000 may be described in detail with respect to Fig. 6.
  • patches are extracted from the lossy-compressed image and each of the extracted patches is mapped into a high dimensional feature vector, by the feature extraction device comprising the first set of filters, such that a first set of high dimensional feature vectors is formed.
  • these vectors comprise a set of feature maps, whose number equals the dimensionality of the vectors.
  • a popular strategy in image restoration is to densely extract patches and then represent them by a set of pre-trained bases such as PCA, DCT, Haar, etc.
  • each high dimensional feature vector in the first set is denoised and the denoised high dimensional feature vectors are mapped into a second set of high dimensional feature vectors by a feature enhancement device electronically communicated with the feature extraction device and comprising a second set of filters.
  • the feature extraction device extracts an n1-dimensional feature for each patch.
  • the second set of filters maps these n1-dimensional vectors into a set of n2-dimensional vectors.
  • Each mapped vector is conceptually a relatively cleaner feature vector.
  • each high dimensional vector in the second set is mapped nonlinearly onto a restored patch-wise representation by a mapping device coupled to the feature enhancement device and comprising a third set of filters.
  • the feature enhancement device generates a set of n2-dimensional feature vectors.
  • the mapping device maps each of these n2-dimensional vectors into an n3-dimensional vector.
  • Each mapped vector is conceptually the representation of a restored patch. These vectors comprise another set of feature maps.
  • at step S240, patch-wise representations mapped from all high dimensional vectors in the second set are aggregated to generate a restored clear image by an aggregating device electronically communicated with the mapping device.
  • These steps S210-S230 may be expressed by the above-mentioned formulae (1)-(3).
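The three mapping steps S210-S230 can be sketched as successive nonlinear maps of the form F'(W·v + b). Below is a minimal NumPy illustration; the dimensions n1=64, n2=32, n3=16, the 5×5 patch size, the random weights and the choice of max(0, x) for F' are hypothetical placeholders, not values taken from this application.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)  # F'(x) = max(0, x)

rng = np.random.default_rng(0)

# Hypothetical dimensions: n1 = 64, n2 = 32, n3 = 16.
n1, n2, n3 = 64, 32, 16
patch = rng.standard_normal(25)               # a flattened 5x5 patch from Y

# S210: feature extraction -- patch -> n1-dimensional feature vector.
W1, b1 = rng.standard_normal((n1, 25)), np.zeros(n1)
f1 = relu(W1 @ patch + b1)

# S220: feature enhancement -- n1-dim -> "cleaner" n2-dim vector.
W2, b2 = rng.standard_normal((n2, n1)), np.zeros(n2)
f2 = relu(W2 @ f1 + b2)

# S230: nonlinear mapping -- n2-dim -> restored patch representation.
W3, b3 = rng.standard_normal((n3, n2)), np.zeros(n3)
f3 = relu(W3 @ f2 + b3)

print(f1.shape, f2.shape, f3.shape)   # (64,) (32,) (16,)
```

Step S240 (aggregation) would then combine such per-patch representations into the restored image.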
  • The patches may be extracted from the lossy-compressed image and each of the extracted patches may be mapped as a high dimensional feature vector by rule of a function F'(first parameters), where F'(x) is a nonlinear function (e.g., max(0, x), 1/(1+exp(-x)) or tanh(x)) and the first parameters are determined from predetermined parameters associated with the lossy-compressed image.
  • The first set of high dimensional feature vectors may be denoised and the denoised high dimensional feature vectors may be mapped nonlinearly to a second set of high dimensional feature vectors, i.e. a set of relatively cleaner feature vectors, by rule of a function F'(second parameters), where F'(x) is a nonlinear function (e.g., max(0, x), 1/(1+exp(-x)) or tanh(x)) and the second parameters are determined from predetermined parameters associated with the first set of high dimensional feature vectors.
  • Each high dimensional vector in the second set may be mapped nonlinearly onto a restored patch-wise representation by rule of a function F'(third parameters), where F'(x) is a nonlinear function (e.g., max(0, x), 1/(1+exp(-x)) or tanh(x)) and the third parameters are determined from predetermined parameters associated with the second set of high dimensional vectors.
  • The method 2000 may further comprise a step of sampling a ground truth uncompressed sub-image corresponding to the lossy-compressed sub-image from a predetermined training set and a step of comparing the dissimilarity between the aggregated restored clear sub-image and the corresponding ground truth uncompressed sub-image to generate a reconstruction error.
  • The reconstruction error is back-propagated in order to optimize the parameters, i.e., W1, W2, W3, W4, B1, B2, B3 and B4.
  • The method 2000 further comprises a step of preparing the predetermined training set.
  • A plurality of sub-images is first cropped from a randomly selected training image to generate a set of ground truth uncompressed sub-images. For example, n sub-images of m×m pixels each may be cropped.
  • A set of lossy-compressed sub-images is then generated based on the set of ground truth uncompressed sub-images.
  • Each of the ground truth uncompressed sub-images is paired with a corresponding lossy-compressed sub-image. Then, all the pairs are collected to form the predetermined training set.
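The training set preparation above might be sketched as follows. The crop count n=8, sub-image size m=33, and the coarse quantizer used as a stand-in for a real lossy encoder/decoder (e.g. a JPEG codec) are illustrative assumptions only.

```python
import numpy as np

def lossy_compress(sub_img, step=32):
    # Stand-in for a lossy encoder/decoder pair: coarse quantization of
    # pixel values discards information, mimicking lossy compression.
    return (sub_img // step) * step

def prepare_training_set(image, n=8, m=33, seed=0):
    rng = np.random.default_rng(seed)
    h, w = image.shape
    pairs = []
    for _ in range(n):
        y = rng.integers(0, h - m + 1)          # random crop position
        x = rng.integers(0, w - m + 1)
        gt = image[y:y + m, x:x + m]            # ground truth sub-image
        pairs.append((lossy_compress(gt), gt))  # (compressed, ground truth)
    return pairs

img = np.arange(256 * 256, dtype=np.int64).reshape(256, 256) % 256
training_set = prepare_training_set(img)
print(len(training_set), training_set[0][0].shape)
```

In practice the compressed member of each pair would be produced by the actual target codec at the quality level of interest.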
  • A method 3000 for training a convolutional neural network system for reducing compression artifacts of a lossy-compressed image is also illustrated.
  • The method 3000 is described in detail with respect to Fig. 7.
  • At step S310, a lossy-compressed sub-image and its corresponding ground truth uncompressed sub-image are sampled from a predetermined training set.
  • At step S320, a restored clear sub-image is reconstructed from the lossy-compressed sub-image by the convolutional neural network system.
  • At step S330, a reconstruction error is generated by comparing the dissimilarity between the reconstructed clear sub-image and the ground truth uncompressed sub-image.
  • At step S340, the reconstruction error is back-propagated through the convolutional neural network system so as to adjust weights on connections between neurons of the convolutional neural network system. Steps S310-S340 are repeated until an average value of the reconstruction error falls below a preset threshold, for example, half of the mean square error between the lossy-compressed sub-images and the ground truth uncompressed sub-images in the predetermined training set.
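Steps S310-S340 amount to a standard error back-propagation loop. The sketch below substitutes a one-parameter "network" (a scalar gain w) and a synthetic 0.5× attenuation for the real convolutional system and compressor, purely to show the reconstruct / compare / back-propagate / repeat-until-threshold cycle; all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
gt = [rng.standard_normal((8, 8)) for _ in range(16)]  # ground truth
compressed = [0.5 * x for x in gt]                     # "lossy" inputs

w, lr = 1.0, 0.05
for epoch in range(200):
    errs = []
    for y, x in zip(compressed, gt):
        restored = w * y                          # S320: reconstruct
        err = np.mean((restored - x) ** 2)        # S330: reconstruction error (MSE)
        grad = np.mean(2 * (restored - x) * y)    # S340: gradient of the error
        w -= lr * grad                            # S340: weight update
        errs.append(err)
    if np.mean(errs) < 1e-6:                      # repeat until below threshold
        break
print(f"w = {w:.3f}")  # converges toward 2.0, the inverse of the 0.5 attenuation
```

For the full network, the same loop runs over all of W1-W4 and B1-B4 via back-propagation through the convolutional layers.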
  • The system 4000 comprises a memory 402 that stores executable components and a processor 404, coupled to the memory 402, that executes the executable components to perform operations of the system 4000.
  • The executable components may comprise: a feature extraction component 410 configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors; and a feature enhancement component 420 configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors.
  • The executable components may further comprise: a mapping component 430 configured to map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation; and an aggregating component 440 configured to aggregate patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image.
  • The feature extraction component 410 is configured to extract patches from the lossy-compressed image and map nonlinearly each of the extracted patches as a high dimensional feature vector, and the mapped vectors for all the patches form said first set of high dimensional feature vectors.
  • The feature enhancement component 420 is configured to denoise each high dimensional feature vector in the first set and map nonlinearly the denoised high dimensional feature vectors to a second set of high dimensional feature vectors.
  • The feature extraction component 410, feature enhancement component 420 and mapping component 430 map the vectors based on predetermined first, second and third parameters, respectively.
  • The executable components further comprise a comparing component coupled to the aggregating component and configured to sample a ground truth uncompressed image corresponding to the lossy-compressed image from a predetermined training set and compare a dissimilarity between the aggregated restored clear image received from the aggregating component and the corresponding ground truth uncompressed image to generate a reconstruction error, wherein the reconstruction error is back-propagated in order to optimize the first, second and third parameters.
  • The executable components further comprise a training set preparation component coupled to the comparing component.
  • The training set preparation component further comprises: a cropper configured to crop randomly a plurality of sub-images from a randomly selected training image to generate a set of ground truth uncompressed sub-images; a lossy-compressed sub-image generator electronically communicated with the cropper and configured to generate a set of lossy-compressed sub-images based on the set of ground truth uncompressed sub-images received from the cropper; a pairing module electronically communicated with the cropper and the generator and configured to pair each of the ground truth uncompressed sub-images with a corresponding lossy-compressed sub-image; and a collector electronically communicated with the pairing module and configured to collect the paired ground truth uncompressed sub-images and lossy-compressed sub-images to form the predetermined training set.
  • The lossy-compressed sub-image generator further comprises a compressing module electronically communicated with the cropper and the generator and configured to encode and decode the ground truth sub-images with a compression encoder and decoder to generate the set of lossy-compressed sub-images.
  • The present application does not explicitly learn the dictionaries or manifolds for modeling the patch space; these are achieved implicitly via the convolutional layers. Furthermore, the feature extraction, feature enhancement and aggregation are also formulated as convolutional layers, and so are involved in the optimization.
  • The method and apparatus of the present application handle different kinds of compression artifacts and provide an efficient reduction of various compression artifacts in different image regions. In the method and apparatus of the present application, the entire convolutional neural network is obtained fully through training, with no pre-/post-processing. With a lightweight structure, the apparatus and method of the present application achieve superior performance to the state-of-the-art methods.
  • Embodiments within the scope of the present invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof. Apparatus within the scope of the present invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method actions within the scope of the present invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output.
  • Embodiments within the scope of the present invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device.
  • Each computer program can be implemented in a high-level procedural or object oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language.
  • Suitable processors include, by way of example, both general and special purpose microprocessors.
  • a processor will receive instructions and data from a read-only memory and/or a random access memory.
  • a computer will include one or more mass storage devices for storing data files.
  • Embodiments within the scope of the present invention include computer-readable media for carrying or having computer-executable instructions, computer-readable instructions, or data structures stored thereon.
  • Such computer-readable media may be any available media, which is accessible by a general-purpose or special-purpose computer system.
  • Examples of computer-readable media may include physical storage media such as RAM, ROM, EPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other media which can be used to carry or store desired program code means in the form of computer-executable instructions, computer-readable instructions, or data structures and which may be accessed by a general-purpose or special-purpose computer system. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits) . While particular embodiments of the present invention have been shown and described, changes and modifications may be made to such embodiments without departing from the true scope of the invention.

Abstract

Disclosed is an apparatus for reducing compression artifacts of a lossy-compressed image. The apparatus may comprise: a feature extraction device comprising a first set of filters configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors, a feature enhancement device electronically communicated with the feature extraction device and comprising a second set of filters configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors, a mapping device coupled to the feature enhancement device and comprising a third set of filters configured to map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation, and an aggregating device electronically communicated with the mapping device and configured to aggregate the patch-wise representations to generate a restored clear image.

Description

AN APPARATUS AND A METHOD FOR REDUCING COMPRESSION ARTIFACTS OF A LOSSY-COMPRESSED IMAGE Technical Field
The present application generally relates to a field of image processing, more particularly, to an apparatus and a method for reducing compression artifacts of a lossy-compressed image.
Background
Lossy compression is the class of data encoding methods that uses inexact approximations or partial data discarding for representing the content that has been encoded. Such compression techniques are used to reduce the amount of data that would otherwise be needed to store, handle, and/or transmit the represented content. There are different kinds of lossy image compression formats, e.g. JPEG, WebP, JPEG XR, and HEVC-MSP. JPEG remains the most widely adopted format among the various alternatives.
Lossy compression introduces compression artifacts, especially when used in low bit rates/quantization levels. For instance, JPEG compression artifacts are a complex combination of different specific artifacts comprising blocking artifacts, ringing effects and blurring. Blocking artifacts arise when each block is encoded without considering the correlation with the adjacent blocks, resulting in discontinuities at the borders. Ringing effects along the edges occur due to the coarse quantization of the high-frequency components. Blurring happens due to the loss of high-frequency components.
Existing algorithms for eliminating the artifacts can be classified into deblocking oriented and restoration oriented methods. The deblocking oriented methods focus on removing blocking and ringing artifacts. However, most deblocking oriented methods cannot reproduce sharp edges, and tend to over-smooth texture regions. The restoration oriented methods regard the compression operation as distortion and propose restoration algorithms. The restoration oriented methods tend to reconstruct the original image directly, thus the sharpened output is often accompanied by ringing effects around edges and abrupt transitions in smooth regions.
Summary
The following presents a simplified summary of the disclosure in order to provide a basic understanding of some aspects of the disclosure. This summary is not an extensive overview of the disclosure. It is intended to neither identify key or critical elements of the disclosure nor delineate any scope of particular embodiments of the disclosure, or any scope of the claims. Its sole purpose is to present some concepts of the disclosure in a simplified form as a prelude to the more detailed description that is presented later.
According to an embodiment of the present application, disclosed is an apparatus for reducing compression artifacts of a lossy-compressed image. The apparatus may comprise: a feature extraction device comprising a first set of filters configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors, and a feature enhancement device electronically communicated with the feature extraction device and comprising a second set of filters configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors. The apparatus further comprises a mapping device coupled to the feature enhancement device and comprising a third set of filters configured to map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation, and an aggregating device electronically communicated with the mapping device and configured to aggregate patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image.
In an aspect, the first set of filters may be configured to extract patches from the lossy-compressed image and map nonlinearly each of the extracted patches as a high dimensional feature vector, and the mapped vectors for all the patches form said first set of high dimensional feature vectors.
In yet another aspect, the second set of filters may be configured to denoise each high dimensional feature vector in the first set and map nonlinearly the denoised high  dimensional feature vectors to a second set of high dimensional feature vectors.
In an aspect, the first, second and third sets of filters may map the vectors based on predetermined first, second and third parameters, respectively, and the aggregating device may aggregate the patch-wise representations based on a fourth parameter.
In yet another aspect, the apparatus may further comprise a comparing device coupled to the aggregating device and configured to sample a ground truth uncompressed image corresponding to the lossy-compressed image from a predetermined training set and compare a dissimilarity between the aggregated restored clear image received from the aggregating device and the corresponding ground truth uncompressed image to generate a reconstruction error, wherein the reconstruction error is back-propagated in order to optimize the first, second, third and fourth parameters.
According to an embodiment of the present application, the apparatus may further comprise a training set preparation device coupled to the comparing device, in which the training set preparation device further comprises: a cropper configured to crop randomly a plurality of sub-images from a randomly selected training image to generate a set of ground truth uncompressed sub-images and a lossy-compressed sub-image generator electronically communicated with the cropper and configured to generate a set of lossy-compressed sub-images based on the set of ground truth uncompressed sub-images received from the cropper. Furthermore, the training set preparation device comprises a pairing device electronically communicated with the cropper and generator and configured to pair each of the ground truth uncompressed sub-images with a corresponding lossy-compressed sub-image and a collector electronically communicated with the pairing device and configured to collect the paired ground truth uncompressed sub-images and the lossy-compressed sub-image to form the predetermined training set.
In an aspect, the lossy-compressed sub-image generator further comprises a compressing device electronically communicated with the cropper and the generator and configured to encode and decode the ground truth sub-image with a compression encoder and decoder to generate the set of lossy-compressed sub-images.
In an aspect, the reconstruction error comprises a mean squared error.
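A mean squared reconstruction error can be sketched as follows; the function name and array shapes are illustrative only.

```python
import numpy as np

def reconstruction_error(restored, ground_truth):
    # Mean squared error between the restored image and its ground truth;
    # this is the quantity that is back-propagated during training.
    return np.mean((restored - ground_truth) ** 2)

a = np.zeros((4, 4))
b = np.ones((4, 4))
print(reconstruction_error(a, b))   # 1.0
```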
According to an embodiment of the present application, disclosed is a method for reducing compression artifacts of a lossy-compressed image. The method may comprise: extracting patches from the lossy-compressed image and mapping the extracted patches to a first set of high dimensional feature vectors by a feature extraction device comprising a first set of filters; denoising each high dimensional feature vector in the first set and mapping the denoised high dimensional feature vectors to a second set of high dimensional feature vectors by a feature enhancement device electronically communicated with the feature extraction device and comprising a second set of filters; mapping nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation by a mapping device coupled to the feature enhancement device and comprising a third set of filters; and aggregating patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image by an aggregating device electronically communicated with the mapping device.
According to an embodiment of the present application, disclosed is an apparatus for reducing compression artifacts of a lossy-compressed image. The apparatus may comprise a reconstructing unit configured to reconstruct the lossy-compressed image to a restored clear image based on predetermined parameters and a training unit configured to train the convolutional neural network system with a predetermined training set so as to determine the parameters used by the reconstructing unit. The reconstructing unit may comprise: a feature extraction device comprising a first set of filters configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors; a feature enhancement device electronically communicated with the feature extraction device and comprising a second set of filters configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors; a mapping device coupled to the feature enhancement device and comprising a third set of filters configured to map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation; and an aggregating device electronically communicated with the mapping device and configured to aggregate patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image. The feature extraction device, the feature enhancement device, the mapping device and the aggregating device comprise at least one convolutional layer, respectively. The convolutional layers are sequentially connected to each other to form a convolutional neural network system.
According to an embodiment of the present application, disclosed is a system for reducing compression artifacts of a lossy-compressed image. The system may comprise a memory that stores executable components and a processor, coupled to the memory, that executes the executable components to perform operations of the system. The executable components comprise: a feature extraction component configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors; a feature enhancement component configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors; a mapping component configured to map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation; and an aggregating component configured to aggregate patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image.
The following description and the annexed drawings set forth certain illustrative aspects of the disclosure. These aspects are indicative, however, of but a few of the various ways in which the principles of the disclosure may be employed. Other aspects of the disclosure will become apparent from the following detailed description of the disclosure when considered in conjunction with the drawings.
Brief Description of the Drawing
Exemplary non-limiting embodiments of the present invention are described below with reference to the attached drawings. The drawings are illustrative and generally not to an exact scale. The same or similar elements on different figures are referenced with the same reference numbers.
Fig. 1 is a schematic diagram illustrating an apparatus for reducing compression artifacts of a lossy-compressed image consistent with an embodiment of the present  application.
Fig. 2 is a schematic diagram illustrating an apparatus for reducing compression artifacts of a lossy-compressed image consistent with another embodiment of the present application.
Fig. 3 is a schematic diagram illustrating a convolutional neural network system, consistent with some disclosed embodiments.
Fig. 4. is a schematic diagram illustrating a training unit of the apparatus, consistent with some disclosed embodiments.
Fig. 5. is a schematic diagram illustrating a training set preparation device of the training unit, consistent with some disclosed embodiments.
Fig. 6 is a schematic flowchart illustrating a method for reducing compression artifacts of a lossy-compressed image, consistent with some disclosed embodiments.
Fig. 7 is a schematic flowchart illustrating a method for training a convolutional neural network system for reducing compression artifacts of a lossy-compressed image, consistent with some disclosed embodiments.
Fig. 8 is a schematic diagram illustrating a system for reducing compression artifacts of a lossy-compressed image consistent with an embodiment of the present application.
Detailed Description
Reference will now be made in detail to some specific embodiments of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some  or all of these specific details. In other instances, well-known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising, " when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Referring to Fig. 1, the apparatus 1000 may comprise a feature extraction device 100, a feature enhancement device 200, a mapping device 300 and an aggregating device 400. Hereinafter, the feature extraction device 100, the feature enhancement device 200, the mapping device 300 and the aggregating device 400 will be further discussed in detail. For convenience of description, the lossy-compressed image is denoted by Y, and the restored clear image is denoted by F (Y) which is as similar as possible to a ground truth uncompressed image X.
According to an embodiment, the feature extraction device 100 comprises a first set of filters. The first set of filters is configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors. For example, the first set of filters maps the extracted patches to a first set of high dimensional feature vectors by rule of a function F'(first parameters), where F'(x) is a nonlinear function (e.g., max(0, x), 1/(1+exp(-x)) or tanh(x)) and the first parameters are determined from predetermined parameters associated with the lossy-compressed image.
In an embodiment, the first set of high dimensional feature vectors may comprise a set of feature maps, whose number equals the dimensionality of the vectors. A popular strategy in image restoration is to densely extract patches and then represent them by a set of pre-trained bases such as PCA (Principal Component Analysis), DCT (Discrete Cosine Transform), Haar, etc.
According to an embodiment, the operations for the feature extraction device 100 may be formulated as:
F1(Y) = F'(W1*Y + B1),                     (1)
where W1 and B1 represent the filters and biases, respectively, and F'(x) is a nonlinear function (e.g., max(0, x), 1/(1+exp(-x)) or tanh(x)). Here W1 is of a size c×f1×f1×n1, where c is the number of channels in the input image, f1 is the spatial size of a filter, and n1 is the number of filters. Intuitively, W1 applies n1 convolutions on the image, and each convolution has a kernel size c×f1×f1. The output is composed of n1 feature maps. B1 is an n1-dimensional vector, each element of which is associated with a filter.
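Formula (1) can be sketched as a single convolutional layer with F'(x) = max(0, x). The naive NumPy implementation below uses hypothetical sizes (c=1, f1=9, n1=64, a 32×32 input) and random filters purely to show the shapes involved.

```python
import numpy as np

def conv_layer(Y, W, B):
    # Naive 'valid' convolution implementing F1(Y) = max(0, W1*Y + B1).
    # Y: (c, H, W) input image; W: (n1, c, f1, f1) filters; B: (n1,) biases.
    n1, c, f1, _ = W.shape
    _, H, Wd = Y.shape
    out = np.zeros((n1, H - f1 + 1, Wd - f1 + 1))
    for k in range(n1):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(W[k] * Y[:, i:i + f1, j:j + f1]) + B[k]
    return np.maximum(0.0, out)  # F'(x) = max(0, x)

rng = np.random.default_rng(0)
c, f1, n1 = 1, 9, 64            # e.g. grayscale input, 9x9 filters
Y = rng.standard_normal((c, 32, 32))
F1 = conv_layer(Y, rng.standard_normal((n1, c, f1, f1)), np.zeros(n1))
print(F1.shape)   # (64, 24, 24): n1 feature maps
```

A production implementation would of course use an optimized convolution routine rather than these explicit loops.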
The feature enhancement device 200 may be electronically communicated with the feature extraction device 100, and may comprise a second set of filters configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors, for example, a set of relatively cleaner feature vectors.
According to an embodiment, the feature enhancement device 200 is configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to the second set of high dimensional feature vectors by rule of a function F'(second parameters), where F'(x) is a nonlinear function (e.g., max(0, x), 1/(1+exp(-x)) or tanh(x)) and the second parameters are determined from predetermined parameters associated with the first set of high dimensional feature vectors.
In the embodiment, the feature extraction device 100 extracts an n1-dimensional feature for each patch. The second set of filters maps these n1-dimensional vectors into a set of n2-dimensional vectors. Each mapped vector is conceptually a relatively cleaner feature vector. These vectors comprise another set of feature maps.
According to an embodiment, the feature enhancement may be formulated as:
F2(Y) = F'(W2*F1(Y) + B2),                          (2)
where W2 is of a size n1×f2×f2×n2 and B2 is an n2-dimensional vector.
As shown, the apparatus 1000 may further comprise a mapping device 300. The mapping device 300 may be coupled to the feature enhancement device 200 and comprise a third set of filters configured to map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation.
According to an embodiment, the mapping device 300 is configured to map nonlinearly each of the high dimensional vectors onto a patch-wise representation by rule of a function F'(third parameters), where F'(x) is a nonlinear function (e.g., max(0, x), 1/(1+exp(-x)) or tanh(x)) and the third parameters are determined from predetermined parameters associated with the second set of high dimensional feature vectors, i.e. the cleaner high dimensional feature vectors.
In an embodiment, the feature enhancement device 200 generates a set of n2-dimensional feature vectors. The mapping device 300 maps each of these n2-dimensional vectors into an n3-dimensional vector. Each mapped vector is conceptually the representation of a restored patch. These vectors comprise another set of feature maps.
According to an embodiment, the mapping may be formulated as:
F3(Y) = F'(W3*F2(Y) + B3),                   (3)
where W3 is of a size n2×f3×f3×n3, and B3 is an n3-dimensional vector. Each of the output n3-dimensional vectors is conceptually a representation of a restored patch that will be used for reconstruction.
As shown, the apparatus 1000 may further comprise an aggregating device 400. The aggregating device 400 may be electronically communicated with the mapping device 300 and configured to aggregate the restored patch-wise representations to generate a restored clear image. The aggregating may be formulated as:
F(Y) = W4*F3(Y) + B4,                           (4)
where W4 is of a size n3×f4×f4×c, and B4 is a c-dimensional vector.
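Since formulae (1)-(4) are all of the same convolutional form, the end-to-end pipeline can be illustrated in plain Python under the simplifying assumption of 1×1 filters (f1 = f2 = f3 = f4 = 1), in which case each convolution reduces to a per-pixel linear map. The weights and layer sizes below are arbitrary toy values for illustration, not trained parameters:

```python
def relu(x):
    return max(0.0, x)

def pixel_layer(maps, W, B, act):
    """Apply act(W*Y + B) per pixel: with 1x1 filters, output map k at
    pixel (i, j) is act(sum_c W[k][c] * maps[c][i][j] + B[k])."""
    H, Wd = len(maps[0]), len(maps[0][0])
    return [[[act(sum(W[k][c] * maps[c][i][j] for c in range(len(maps))) + B[k])
              for j in range(Wd)]
             for i in range(H)]
            for k in range(len(W))]

# toy sizes: c = 1 input channel, n1 = n2 = n3 = 2
Y = [[[0.5, 0.25], [0.75, 1.0]]]                 # one-channel 2x2 "image"
W1, B1 = [[1.0], [-1.0]], [0.0, 1.0]             # feature extraction, eq (1)
W2, B2 = [[0.5, 0.5], [1.0, 0.0]], [0.0, 0.0]    # feature enhancement, eq (2)
W3, B3 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]    # nonlinear mapping, eq (3)
W4, B4 = [[0.5, 0.5]], [0.0]                     # aggregation, eq (4): linear

F1 = pixel_layer(Y,  W1, B1, relu)
F2 = pixel_layer(F1, W2, B2, relu)
F3 = pixel_layer(F2, W3, B3, relu)
F  = pixel_layer(F3, W4, B4, lambda x: x)        # restored one-channel image
```

The last layer uses the identity instead of a nonlinearity, reflecting that formula (4) aggregates linearly.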
According to the embodiment, the apparatus 1000 may further comprise a comparing device (not shown) which is coupled to the aggregating device 400 and configured to sample a ground truth uncompressed sub-image corresponding to the lossy-compressed sub-image from a predetermined training set and compare dissimilarity between the aggregated restored clear sub-image received from the aggregating device 400 and the sampled ground truth uncompressed sub-image to generate a reconstruction error. For example, the reconstruction error comprises a mean squared error. The reconstruction error is back-propagated in order to determine the parameters, i.e., W1, W2, W3, W4, B1, B2, B3 and B4.
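As a minimal sketch of the comparing device's operation, the mean squared error and its gradient (the quantity that is back-propagated to adjust W1-W4 and B1-B4) can be written as follows, treating the restored and ground truth sub-images as flat lists of pixel values:

```python
def mse(restored, truth):
    """Mean squared error between a restored sub-image and its ground
    truth, both given as flat lists of pixel values."""
    n = len(restored)
    return sum((r - t) ** 2 for r, t in zip(restored, truth)) / n

def mse_grad(restored, truth):
    """Gradient of the MSE with respect to each restored pixel: the
    signal fed backwards through the network during back-propagation."""
    n = len(restored)
    return [2.0 * (r - t) / n for r, t in zip(restored, truth)]
```

A perfectly restored pixel contributes zero gradient, so only pixels that still deviate from the ground truth drive the parameter updates.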
Fig. 2 is a schematic diagram illustrating an apparatus 1000’ for reducing compression artifacts of a lossy-compressed image consistent with another embodiment of the present application. As shown in Fig. 2, the apparatus 1000’ may comprise a reconstructing unit 100’ and a training unit 200’. The reconstructing unit 100’ is configured to reconstruct the lossy-compressed image to a restored clear image based on predetermined parameters.
According to an embodiment shown in Fig. 2, the reconstructing unit 100’ may further comprise a feature extraction device 110’, a feature enhancement device 120’, a mapping device 130’ and an aggregating device 140’. In an embodiment, the feature extraction device 110’, the feature enhancement device 120’, the mapping device 130’ and the aggregating device 140’ may each comprise at least one convolutional layer, and the convolutional layers are sequentially connected to each other to form a convolutional neural network system.
Fig. 3 illustrates the layer configuration of the convolutional neural network system in a mathematical simulation model. In one embodiment, each of the feature extraction device 110’, the feature enhancement device 120’, the mapping device 130’ and the aggregating device 140’ may be simulated as at least one convolutional layer, and different operations are performed at the different convolutional layers.
In the embodiment, the feature extraction device 110’ is configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors. This is equivalent to convolving the image by a set of filters as mentioned above.
The feature enhancement device 120’ is configured to be electronically communicated with the feature extraction device 110’ and denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors, for example, a set of relatively cleaner feature vectors. This is equivalent to applying a second set of filters as mentioned above.
The mapping device 130’ is configured to be coupled to the feature enhancement device 120’ and map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation. This is equivalent to applying a third set of filters as mentioned above.
The aggregating device 140’ is configured to be electronically communicated with the mapping device 130’ and aggregate patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image.
In an embodiment, the feature extracting device 110’, the feature enhancement device 120’, the mapping device 130’ and the aggregating device 140’ each comprise at least one convolutional layer, and the convolutional layers are sequentially connected to each other to form a convolutional neural network system. The convolutional neural network dates back decades and has recently seen explosive popularity, partially due to its success in image classification. Convolutional neural networks have also been applied to natural image denoising and to removing noisy patterns (dirt/rain).
Alternatively, it is possible to add more convolutional layers to increase the non-linearity. But this can significantly increase the complexity of the convolutional neural network system, and thus demands more training data and time.
The training unit 200’ is configured to train the convolutional neural network system with a predetermined training set so as to optimize the parameters, for example W1, W2, W3, W4, B1, B2, B3 and B4, used by the reconstructing unit. According to an embodiment as shown in Fig. 4, the training unit 200’ may comprise a sampling device 210’, a comparing device 220’, and a back-propagating device 230’.
The sampling device 210’ may be configured to sample a lossy-compressed sub-image and its corresponding ground truth uncompressed sub-image from a predetermined training set and input the lossy-compressed sub-image to the convolutional neural network system. Here, “sub-images” means that these samples are treated as small “images” rather than “patches”, in the sense that “patches” overlap and require some averaging as post-processing, whereas “sub-images” do not.
The comparing device 220’ may be configured to compare dissimilarity between the reconstructed clear sub-image based on the input lossy-compressed sub-image from the convolutional neural network system and the corresponding ground truth uncompressed sub-image to generate a reconstruction error. For example, the reconstruction error may comprise a mean squared error, and the error is minimized by using stochastic gradient descent with the standard back propagation.
The back-propagating device 230’ is configured to back-propagate the reconstruction error through the convolutional neural network system so as to adjust weights on connections between neurons of the convolutional neural network system.
It should be noted that the convolutional neural network system does not preclude the use of other kinds of reconstruction error, as long as the reconstruction error is differentiable. If a better perceptually motivated metric is given during the training, the convolutional neural network system can flexibly adapt to that metric.
In one embodiment, the apparatuses 1000 and 1000’ may further comprise a training set preparation device coupled to the comparing device and configured to prepare the predetermined training set for training the convolutional neural network system. Fig. 5 is a schematic diagram illustrating the training set preparation device. As shown, the training set preparation device may comprise a cropper 241’, a lossy-compressed sub-image generator 242’, a pairing device 243’ and a collector 244’.
The cropper 241’ may be configured to crop randomly a plurality of sub-images from a randomly selected training image to generate a set of ground truth uncompressed sub-images. For example, the cropper 241’ may crop n sub-images of m×m pixels each. The lossy-compressed sub-image generator 242’ may be electronically communicated with the cropper 241’ and configured to generate a set of lossy-compressed sub-images based on the set of ground truth uncompressed sub-images received from the cropper 241’. The pairing device 243’ may be electronically communicated with the cropper 241’ and the generator 242’ and configured to pair each of the ground truth uncompressed sub-images with a corresponding lossy-compressed sub-image. The collector 244’ may be electronically communicated with the pairing device 243’ and configured to collect all the pairs to form the predetermined training set.
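The cropping and pairing pipeline above can be sketched in plain Python. The `lossy_compress` function here is a hypothetical stand-in: it coarsely quantizes pixel values to mimic the information loss of a real encoder/decoder round trip (e.g., JPEG), which the compressing device would use in practice:

```python
import random

def lossy_compress(sub_image, step=32):
    # stand-in for a real compression encoder/decoder round trip:
    # coarse quantization of pixel values mimics the information loss
    return [[(p // step) * step for p in row] for row in sub_image]

def prepare_training_set(image, n, m, seed=0):
    """Crop n random m x m sub-images from one training image and pair
    each ground truth crop with its lossy-compressed counterpart."""
    rng = random.Random(seed)
    H, W = len(image), len(image[0])
    pairs = []
    for _ in range(n):
        i = rng.randrange(H - m + 1)
        j = rng.randrange(W - m + 1)
        truth = [row[j:j + m] for row in image[i:i + m]]
        pairs.append((lossy_compress(truth), truth))
    return pairs

image = [[(r * 8 + c) for c in range(8)] for r in range(8)]   # toy 8x8 image
training_set = prepare_training_set(image, n=3, m=4)
```

Each entry of `training_set` is a (lossy-compressed, ground truth) pair, the format the sampling device 210’ draws from during training.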
According to an embodiment, the lossy-compressed sub-image generator 242’ may comprise a compressing device electronically communicated with the cropper 241’ and configured to encode and decode the ground truth sub-images with a compression encoder and decoder to generate the set of lossy-compressed sub-images.
Fig. 6 is a schematic flowchart illustrating a method 2000 for reducing compression artifacts of a lossy-compressed image, consistent with some disclosed embodiments. Hereafter, the method 2000 may be described in detail with respect to Fig. 6.
At step S210, patches are extracted from the lossy-compressed image and each of the extracted patches is mapped into a high dimensional feature vector by the feature extraction device comprising the first set of filters, such that a first set of high dimensional feature vectors is formed. In an embodiment, these vectors comprise a set of feature maps, the number of which equals the dimensionality of the vectors. A popular strategy in image restoration is to densely extract patches and then represent them by a set of pre-trained bases such as PCA, DCT, Haar, etc.
At step S220, each high dimensional feature vector in the first set is denoised and the denoised high dimensional feature vectors are mapped into a second set of high  dimensional feature vectors by a feature enhancement device electronically communicated with the feature extraction device and comprising a second set of filters. In the embodiment, the feature extraction device extracts an n1-dimensional feature for each patch. The second set of filters maps these n1-dimensional vectors into a set of n2-dimensional vectors. Each mapped vector is conceptually a relatively cleaner feature vector. These vectors comprise another set of feature maps.
At step S230, each high dimensional vector in the second set is mapped nonlinearly onto a restored patch-wise representation by a mapping device coupled to the feature enhancement device and comprising a third set of filters. In the embodiment, the feature enhancement device generates a set of n2-dimensional feature vectors. The mapping device maps each of these n2-dimensional vectors into an n3-dimensional vector. Each mapped vector is conceptually the representation of a restored patch. These vectors comprise another set of feature maps.
At step S240, patch-wise representations mapped from all high dimensional vectors in the second set are aggregated to generate a restored clear image by an aggregating device electronically communicated with the mapping device. In an embodiment, steps S210-S240 may be simulated by the above-mentioned formulae (1)-(4).
According to an embodiment, the patches may be extracted from the lossy-compressed image and each of the extracted patches may be mapped as a high dimensional feature vector by rule of a function F'(first parameters), where F'(x) is a nonlinear function (e.g., max(0, x), 1/(1+exp(-x)) or tanh(x)) and the first parameters are determined from predetermined parameters associated with the lossy-compressed image.
According to an embodiment, the first set of high dimensional feature vectors may be denoised and the denoised high dimensional feature vectors may be mapped nonlinearly to a second set of high dimensional feature vectors, i.e., a set of relatively cleaner feature vectors, by rule of a function F'(second parameters), where F'(x) is a nonlinear function (e.g., max(0, x), 1/(1+exp(-x)) or tanh(x)) and the second parameters are determined from predetermined parameters associated with the first set of high dimensional feature vectors.
According to an embodiment, each high dimensional vector in the second set may be mapped nonlinearly onto a restored patch-wise representation by rule of a function F'(third parameters), where F'(x) is a nonlinear function (e.g., max(0, x), 1/(1+exp(-x)) or tanh(x)) and the third parameters are determined from predetermined parameters associated with the second set of high dimensional vectors.
According to an embodiment, after the patch-wise representations are aggregated to generate a restored clear image, the method 2000 may further comprise a step of sampling a ground truth uncompressed sub-image corresponding to the lossy-compressed sub-image from a predetermined training set and a step of comparing dissimilarity between the aggregated restored clear sub-image and the corresponding ground truth uncompressed sub-image to generate a reconstruction error. The reconstruction error is back-propagated in order to optimize the parameters, i.e., W1, W2, W3, W4, B1, B2, B3 and B4.
According to an embodiment, before a ground truth uncompressed sub-image corresponding to the lossy-compressed sub-image is sampled from a predetermined training set, the method 2000 further comprises a step of preparing the predetermined training set. In particular, a plurality of sub-images is first cropped from a randomly selected training image to generate a set of ground truth uncompressed sub-images. For example, n sub-images of m×m pixels each may be cropped. Next, a set of lossy-compressed sub-images are generated based on the set of ground truth uncompressed sub-images. Then, each of the ground truth uncompressed sub-images is paired with a corresponding lossy-compressed sub-image. Then, all the pairs are collected to form the predetermined training set.
According to an embodiment, a method 3000 for training a convolutional neural network system for reducing compression artifacts of a lossy-compressed image is illustrated. Hereafter, the method 3000 may be described in detail with respect to Fig. 7.
As shown in Fig. 7, a lossy-compressed sub-image and its corresponding ground truth uncompressed sub-image are sampled from a predetermined training set at step S310. At step S320, a restored clear sub-image is reconstructed from the lossy-compressed sub-image by the convolutional neural network system. At step S330, a reconstruction error is generated by comparing dissimilarity between the reconstructed clear sub-image and the ground truth uncompressed sub-image. At step S340, the reconstruction error is back-propagated through the convolutional neural network system so as to adjust weights on connections between neurons of the convolutional neural network system. Steps S310-S340 are repeated until an average value of the reconstruction error is lower than a preset threshold, for example, half of the mean squared error between the lossy-compressed sub-images and the ground truth uncompressed sub-images in the predetermined training set.
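The loop of steps S310-S340 can be illustrated with a deliberately tiny model: a single weight w standing in for the whole convolutional neural network, trained by gradient descent on the mean squared error until it falls below a preset threshold. The samples and learning rate are arbitrary illustrative values:

```python
def train(pairs, lr=0.01, threshold=1e-4, max_iters=10000):
    """Toy version of the S310-S340 loop: the 'network' is a single
    weight w mapping a compressed value y to a restored value w * y;
    the MSE is reduced by gradient descent until below the threshold."""
    w = 0.0
    err = float("inf")
    for _ in range(max_iters):
        err = sum((w * y - x) ** 2 for y, x in pairs) / len(pairs)   # S320/S330
        if err < threshold:                                          # stop test
            break
        grad = sum(2 * (w * y - x) * y for y, x in pairs) / len(pairs)
        w -= lr * grad                                               # S340
    return w, err

pairs = [(1.0, 2.0), (2.0, 4.0)]   # (compressed, ground truth) samples
w, err = train(pairs)
```

On these samples the ideal mapping is w = 2, and the loop converges to it well within the iteration budget; a real network repeats the same cycle over convolutional weights instead of a scalar.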
Referring to Fig. 8, a system 4000 is illustrated. The system 4000 comprises a memory 402 that stores executable components and a processor 404, coupled to the memory 402, that executes the executable components to perform operations of the system 4000. The executable components may comprise: a feature extraction component 410 configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors; and a feature enhancement component 420 configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors. In addition, the executable components may further comprise: a mapping component 430 configured to map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation; and an aggregating component 440 configured to aggregate patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image.
In an aspect, the feature extraction component 410 is configured to extract patches from the lossy-compressed image and map nonlinearly each of the extracted patches as a high dimensional feature vector, with the mapped vectors for all the patches forming said first set of high dimensional feature vectors.
In an embodiment, the feature enhancement component 420 is configured to denoise each high dimensional feature vector in the first set and map nonlinearly the denoised high dimensional feature vectors to a second set of high dimensional feature vectors.
In an embodiment, the feature extraction component 410, feature enhancement component 420 and mapping component 430 map the vectors based on predetermined first, second and third parameters, respectively.
According to another embodiment, the executable components further comprise a comparing component coupled to the aggregating component and configured to sample a ground truth uncompressed image corresponding to the lossy-compressed image from a predetermined training set and compare a dissimilarity between the aggregated restored clear image received from the aggregating component and the corresponding ground truth uncompressed image to generate a reconstruction error, wherein the reconstruction error is back-propagated in order to optimize the first, second and third parameters.
In an embodiment, the executable components further comprise a training set preparation component coupled to the comparing component. The training set preparation component further comprises: a cropper configured to crop randomly a plurality of sub-images from a randomly selected training image to generate a set of ground truth uncompressed sub-images; a lossy-compressed sub-image generator electronically communicated with the cropper and configured to generate a set of lossy-compressed sub-images based on the set of ground truth uncompressed sub-images received from the cropper; a pairing module electronically communicated with the cropper and generator and configured to pair each of the ground truth uncompressed sub-images with a corresponding lossy-compressed sub-image; and a collector electronically communicated with the pairing module and configured to collect the paired ground truth uncompressed sub-images and the lossy-compressed sub-image to form the predetermined training set.
In an embodiment, the lossy-compressed sub-image generator further comprises a compressing module electronically communicated with the cropper and generator and configured to encode and decode the ground truth sub-image with compression encoder and decoder to generate the set of lossy-compressed sub-images.
In contrast to existing methods, the present application does not explicitly learn the dictionaries or manifolds for modeling the patch space. These are implicitly achieved via the convolutional layers. Furthermore, the feature extraction, feature enhancement and aggregation are also formulated as convolutional layers, and are thus involved in the optimization. The method and apparatus of the present application reveal different kinds of compression artifacts and provide an efficient reduction of various compression artifacts in different image regions. In the method and apparatus of the present application, the entire convolutional neural network is fully obtained through training, with no pre/post-processing. With a lightweight structure, the apparatus and method of the present application have achieved superior performance to the state-of-the-art methods.
Embodiments within the scope of the present invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof. Apparatus within the scope of the present invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method actions within the scope of the present invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output.
Embodiments within the scope of the present invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Generally, a computer will include one or more mass storage devices for storing data files.
Embodiments within the scope of the present invention include computer-readable media for carrying or having computer-executable instructions, computer-readable instructions, or data structures stored thereon. Such computer-readable media may be any available media, which is accessible by a general-purpose or special-purpose computer system. Examples of computer-readable media may include physical storage media such as RAM, ROM, EPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other media which can be  used to carry or store desired program code means in the form of computer-executable instructions, computer-readable instructions, or data structures and which may be accessed by a general-purpose or special-purpose computer system. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits) . While particular embodiments of the present invention have been shown and described, changes and modifications may be made to such embodiments without departing from the true scope of the invention.
Although the preferred examples of the present invention have been described, those skilled in the art can make variations or modifications to these examples upon knowing the basic inventive concept. The appended claims are intended to be construed as comprising the preferred examples and all the variations or modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make variations or modifications to the present invention without departing from the spirit and scope of the present invention. As such, if these variations or modifications belong to the scope of the claims and equivalent technique, they may also fall within the scope of the present invention.

Claims (20)

  1. An apparatus for reducing compression artifacts of a lossy-compressed image, comprising:
    a feature extraction device comprising a first set of filters configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors;
    a feature enhancement device electronically communicated with the feature extraction device and comprising a second set of filters configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors;
    a mapping device electronically coupled to the feature enhancement device and comprising a third set of filters configured to map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation; and
    an aggregating device electronically communicated with the mapping device and configured to aggregate patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image.
  2. The apparatus according to claim 1, wherein the first set of filters is configured to extract patches from the lossy-compressed image and map nonlinearly each of the extracted patches as a high dimensional feature vector, and the mapped vectors for all the patches form said first set of high dimensional feature vectors.
  3. The apparatus according to claim 1, wherein the second set of filters is configured to denoise each high dimensional feature vector in the first set and map nonlinearly the denoised high dimensional feature vectors to a second set of high dimensional feature vectors.
  4. The apparatus according to any of claims 1-3, wherein the first, second and third set of filters map the vectors based on predetermined first, second and third parameters, respectively.
  5. The apparatus according to claim 4, further comprising:
    a comparing device electronically coupled to the aggregating device and configured to sample a ground truth uncompressed image corresponding to the lossy-compressed image from a predetermined training set and compare a dissimilarity between the aggregated restored clear image received from the aggregating device and the corresponding ground truth uncompressed image to generate a reconstruction error, wherein the reconstruction error is back-propagated in order to optimize the first, second and third parameters.
  6. The apparatus according to claim 5, further comprising a training set preparation device electronically coupled to the comparing device, wherein the training set preparation device further comprises:
    a cropper configured to crop randomly a plurality of sub-images from a randomly selected training image to generate a set of ground truth uncompressed sub-images;
    a lossy-compressed sub-image generator electronically communicated with the cropper and configured to generate a set of lossy-compressed sub-images based on the set of ground truth uncompressed sub-images received from the cropper;
    a pairing device electronically communicated with the cropper and generator and configured to pair each of the ground truth uncompressed sub-images with a corresponding lossy-compressed sub-image; and
    a collector electronically communicated with the pairing device and configured to collect the paired ground truth uncompressed sub-images and the lossy-compressed sub-image to form the predetermined training set.
  7. The apparatus according to claim 6, wherein the lossy-compressed sub-image generator further comprises a compressing device electronically communicated with the cropper and configured to encode and decode the ground truth sub-image with a compression encoder and decoder to generate the set of lossy-compressed sub-images.
  8. The apparatus according to claim 5, wherein the reconstruction error comprises a mean squared error.
  9. A method for reducing compression artifacts of a lossy-compressed image, comprising:
    extracting patches from the lossy-compressed image and mapping the extracted patches to a first set of high dimensional feature vectors by a feature extraction device comprising a first set of filters;
    denoising each high dimensional feature vector in the first set and mapping the denoised high dimensional feature vectors to a second set of high dimensional feature vectors by a feature enhancement device electronically communicated with the feature extraction device and comprising a second set of filters;
    mapping nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation by a mapping device electronically coupled to the feature enhancement device and comprising a third set of filters; and
    aggregating patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image by an aggregating device electronically communicated with the mapping device.
  10. The method according to claim 9, wherein the extracting patches from the lossy-compressed image and mapping the extracted patches to a first set of high dimensional feature vectors further comprises:
    extracting patches from the lossy-compressed image and mapping nonlinearly each of the extracted patches as a high dimensional feature vector, and the mapped vectors for all the patches forming said first set of high dimensional feature vectors.
  11. The method according to claim 9, wherein the denoising each high dimensional feature vector in the first set and mapping the denoised high dimensional feature vectors to a second set of high dimensional feature vectors further comprises:
    denoising each high dimensional feature vector in the first set and mapping nonlinearly the denoised high dimensional feature vectors to a second set of high dimensional feature vectors.
  12. The method according to any of claims 9-11, wherein the first, second and third set of filters map the vectors based on predetermined first, second and third parameters, respectively.
  13. The method according to claim 12, after the aggregating, further comprising:
    sampling a ground truth uncompressed image corresponding to the lossy-compressed image from a predetermined training set; and
    comparing a dissimilarity between the aggregated restored clear image and the corresponding ground truth uncompressed image to generate a reconstruction error, wherein the reconstruction error is back-propagated in order to optimize the first, second and third parameters.
  14. The method according to claim 13, wherein before sampling a ground truth uncompressed image corresponding to the lossy-compressed image from a predetermined training set, further comprising:
    cropping randomly a plurality of sub-images from a randomly selected training image to generate a set of ground truth uncompressed sub-images;
    generating a set of lossy-compressed sub-images based on the set of ground truth uncompressed sub-images;
    pairing each of the ground truth uncompressed sub-images with a corresponding lossy-compressed sub-image; and
    collecting the paired ground truth uncompressed sub-images and the lossy-compressed sub-image to form the predetermined training set.
  15. The method according to claim 14, wherein the generating a set of lossy-compressed sub-images based on the set of ground truth uncompressed sub-images further comprises:
    encoding and decoding the ground truth sub-image with a compression encoder and decoder to generate the set of lossy-compressed sub-images.
  16. The method according to claim 13, wherein the reconstruction error comprises a mean squared error.
  17. An apparatus for reducing compression artifacts of a lossy-compressed image, comprising:
    a reconstructing unit configured to reconstruct the lossy-compressed image to a restored clear image based on predetermined parameters, wherein the reconstructing unit comprises:
    a feature extraction device comprising a first set of filters configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors;
    a feature enhancement device electronically communicated with the feature extraction device and comprising a second set of filters configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors;
    a mapping device electronically coupled to the feature enhancement device and comprising a third set of filters configured to map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation; and
    an aggregating device electronically communicated with the mapping device and configured to aggregate patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image;
    wherein the feature extracting device, the feature enhancement device, the mapping device and the aggregating device comprise at least one convolutional layer, respectively, and the convolutional layers are sequentially connected to each other to form a convolutional neural network system;
    a training unit electronically communicated with the reconstructing unit and configured to train the convolutional neural network system with a predetermined training set so as to modify the predetermined parameters used by the reconstructing unit.
  18. A system for reducing compression artifacts of a lossy-compressed image, comprising:
    a memory that stores executable components; and
    a processor, electronically coupled to the memory, that executes the executable components to perform operations of the system, the executable components comprising:
    a feature extraction component configured to extract patches from the lossy-compressed image and map the extracted patches to a first set of high dimensional feature vectors;
    a feature enhancement component configured to denoise each high dimensional feature vector in the first set and map the denoised high dimensional feature vectors to a second set of high dimensional feature vectors;
    a mapping component configured to map nonlinearly each high dimensional vector in the second set onto a restored patch-wise representation; and
    an aggregating component configured to aggregate patch-wise representations mapped from all high dimensional vectors in the second set to generate a restored clear image.
  19. The system according to claim 18, wherein the feature extraction component is configured to extract patches from the lossy-compressed image and map nonlinearly each of the extracted patches to a high dimensional feature vector, the mapped vectors for all the patches forming said first set of high dimensional feature vectors.
  20. The system according to claim 18, wherein the feature enhancement component is configured to denoise each high dimensional feature vector in the first set and map nonlinearly the denoised high dimensional feature vectors to the second set of high dimensional feature vectors.
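Claims 17-20 describe a four-stage convolutional pipeline (feature extraction, feature enhancement, nonlinear mapping, aggregation) whose parameters are trained by minimizing a reconstruction error such as the mean squared error of claim 16. The sketch below is an illustrative toy rendering of that data flow in plain Python, not the claimed apparatus: the channel counts, kernel sizes, and random weights are assumptions chosen only to demonstrate how the stages connect.

```python
import random

random.seed(0)

def conv_same(stack, filters, relu=True):
    """'Same'-padded 2-D convolution over a stack of feature maps.

    stack:   list of C_in feature maps, each an H x W list of lists
    filters: list of C_out banks; each bank holds C_in k x k kernels
    """
    c_in, H, W = len(stack), len(stack[0]), len(stack[0][0])
    k = len(filters[0][0])
    pad = k // 2
    out = []
    for bank in filters:
        fmap = [[0.0] * W for _ in range(H)]
        for i in range(H):
            for j in range(W):
                s = 0.0
                for c in range(c_in):
                    for u in range(k):
                        for v in range(k):
                            y, x = i + u - pad, j + v - pad
                            if 0 <= y < H and 0 <= x < W:
                                s += stack[c][y][x] * bank[c][u][v]
                # ReLU nonlinearity on hidden stages; linear output stage
                fmap[i][j] = max(s, 0.0) if relu else s
        out.append(fmap)
    return out

def rand_filters(c_out, c_in, k):
    # Stand-in for learned weights; in the claims these come from training.
    return [[[[random.gauss(0.0, 0.1) for _ in range(k)] for _ in range(k)]
             for _ in range(c_in)] for _ in range(c_out)]

def mse(a, b):
    """Mean squared reconstruction error between two single-channel images."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2
               for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

# Toy 8x8 "lossy-compressed" input and its "ground truth" counterpart.
ground_truth = [[random.random() for _ in range(8)] for _ in range(8)]
compressed = [[[v + random.gauss(0.0, 0.05) for v in row]
               for row in ground_truth]]

# Four sequential convolutional stages mirroring claims 17/18
# (channel counts and kernel sizes are illustrative assumptions):
h1 = conv_same(compressed, rand_filters(4, 1, 5))            # feature extraction
h2 = conv_same(h1, rand_filters(4, 4, 3))                    # feature enhancement
h3 = conv_same(h2, rand_filters(4, 4, 1))                    # nonlinear mapping
restored = conv_same(h3, rand_filters(1, 4, 5), relu=False)  # aggregation

# The reconstruction error that training (claims 13-16) would minimize:
print(mse(restored[0], ground_truth))
```

Because the weights here are random rather than trained, the output is not actually a restored image; in the claimed system the training unit adjusts all four filter banks end-to-end on a training set of paired ground truth and lossy-compressed sub-images.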
PCT/CN2015/000093 2015-02-13 2015-02-13 An apparatus and a method for reducing compression artifacts of a lossy-compressed image WO2016127271A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201580075726.4A CN107251053B (en) 2015-02-13 2015-02-13 A method and device for reducing compression artifacts of a lossy-compressed image
PCT/CN2015/000093 WO2016127271A1 (en) 2015-02-13 2015-02-13 An apparatus and a method for reducing compression artifacts of a lossy-compressed image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/000093 WO2016127271A1 (en) 2015-02-13 2015-02-13 An apparatus and a method for reducing compression artifacts of a lossy-compressed image

Publications (1)

Publication Number Publication Date
WO2016127271A1 true WO2016127271A1 (en) 2016-08-18

Family

ID=56614081

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/000093 WO2016127271A1 (en) 2015-02-13 2015-02-13 An apparatus and a method for reducing compression artifacts of a lossy-compressed image

Country Status (2)

Country Link
CN (1) CN107251053B (en)
WO (1) WO2016127271A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108765338A (en) * 2018-05-28 2018-11-06 西华大学 Space-target image restoration method based on a convolutional auto-encoder convolutional neural network
CN109801218B (en) * 2019-01-08 2022-09-20 南京理工大学 Multispectral remote sensing image Pan-sharpening method based on multilayer coupling convolutional neural network
CN111986278B (en) * 2019-05-22 2024-02-06 富士通株式会社 Image encoding device, probability model generating device, and image compression system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040120564A1 (en) * 2002-12-19 2004-06-24 Gines David Lee Systems and methods for tomographic reconstruction of images in compressed format
US20070076959A1 (en) * 2005-10-03 2007-04-05 Xerox Corporation JPEG detectors and JPEG image history estimators
US20090067491A1 (en) * 2007-09-07 2009-03-12 Microsoft Corporation Learning-Based Image Compression
US20090238476A1 (en) * 2008-03-24 2009-09-24 Microsoft Corporation Spectral information recovery for compressed image restoration

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201119206D0 (en) * 2011-11-07 2011-12-21 Canon Kk Method and device for providing compensation offsets for a set of reconstructed samples of an image
CN103517022B (en) * 2012-06-29 2017-06-20 华为技术有限公司 An image data compression and decompression method and device
CN103475876B (en) * 2013-08-27 2016-06-22 北京工业大学 A learning-based super-resolution reconstruction method for low-bit-rate compressed images


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871306A (en) * 2016-09-26 2018-04-03 北京眼神科技有限公司 Method and device for denoising picture
CN107871306B (en) * 2016-09-26 2021-07-06 北京眼神科技有限公司 Method and device for denoising picture
CN109120937A (en) * 2017-06-26 2019-01-01 杭州海康威视数字技术股份有限公司 A video encoding method, a video decoding method, a device and electronic equipment
CN109151475A (en) * 2017-06-27 2019-01-04 杭州海康威视数字技术股份有限公司 A video encoding method, a video decoding method, a device and electronic equipment
WO2023111856A1 (en) * 2021-12-14 2023-06-22 Spectrum Optix Inc. Neural network assisted removal of video compression artifacts

Also Published As

Publication number Publication date
CN107251053A (en) 2017-10-13
CN107251053B (en) 2018-08-28

Similar Documents

Publication Publication Date Title
WO2016127271A1 (en) An apparatus and a method for reducing compression artifacts of a lossy-compressed image
Liu et al. Multi-level wavelet-CNN for image restoration
Cui et al. Deep network cascade for image super-resolution
Li et al. An efficient deep convolutional neural networks model for compressed image deblocking
Zhang et al. One-two-one networks for compression artifacts reduction in remote sensing
CN110490832B (en) Magnetic resonance image reconstruction method based on regularized depth image prior method
CN111047516A (en) Image processing method, image processing device, computer equipment and storage medium
CN107463989A (en) A deep-learning-based method for removing compression artifacts from images
CN112801901A (en) Image deblurring algorithm based on block multi-scale convolution neural network
WO2016019484A1 (en) An apparatus and a method for providing super-resolution of a low-resolution image
Marinč et al. Multi-kernel prediction networks for denoising of burst images
Yue et al. CID: Combined image denoising in spatial and frequency domains using Web images
CN107301662B (en) Compression recovery method, device and equipment for depth image and storage medium
CN112053308B (en) Image deblurring method and device, computer equipment and storage medium
CN112150400B (en) Image enhancement method and device and electronic equipment
CN104700440B (en) Partial k-space magnetic resonance image reconstruction method
CN113673675A (en) Model training method and device, computer equipment and storage medium
Korus et al. Content authentication for neural imaging pipelines: End-to-end optimization of photo provenance in complex distribution channels
CN102148986A (en) Method for encoding progressive image based on adaptive block compressed sensing
KR20100016272A (en) Image compression and decompression using the pixon method
US8634671B2 (en) Methods and apparatus to perform multi-focal plane image acquisition and compression
Amaranageswarao et al. Residual learning based densely connected deep dilated network for joint deblocking and super resolution
Najgebauer et al. Fully convolutional network for removing DCT artefacts from images
CN113033616B (en) High-quality video reconstruction method, device, equipment and storage medium
WO2022037146A1 (en) Image processing method, apparatus, device, computer storage medium, and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15881433

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15881433

Country of ref document: EP

Kind code of ref document: A1