EP2281396A1

EP2281396A1 - Image coding method with texture synthesis

Info

Publication number: EP2281396A1
Application number: EP09757605A
Authority: EP
Inventors: Fabien Racape; Dominique Thoreau; Jérôme Vieron; Aurélie Martin; Gabrielle Ombrouck
Original assignee: Thomson Licensing SAS
Current assignee: Thomson Licensing SAS
Priority date: 2008-06-05
Filing date: 2009-06-04
Publication date: 2011-02-09
Also published as: JP2011522496A; KR20110020242A; CN102047663A; US20110081093A1; WO2009147224A1

Abstract

The invention relates to a coding method using a technique for synthesizing images and image regions by means of a synthesis algorithm that operates on a set of patches, the synthesis being carried out via a low-resolution image. The method comprises the following steps: deciding whether or not to code the regions of the synthesized image by synthesis, by comparing their rendering with the source image according to a quality metric; for the regions where the decision is to code by synthesis, coding the patches and the low-resolution image in a conventional manner; and, for the regions where the decision is not to code by synthesis, coding those regions according to a conventional coding scheme.

Description

IMAGE ENCODING METHOD WITH TEXTURE SYNTHESIS

The invention lies in the context of image synthesis and more particularly in the field of video compression. The synthesis method applies to the encoder and the decoder.

The method consists of synthesizing the content of an image from texture patches, the patches in question being:

• blocks of images of reduced dimensions,

• blocks that are representative, from the point of view of texture, of the different regions composing the image.

Moreover, on the basis of a quality metric, the rendering of the synthesis thus obtained is compared with the source image to be coded; the parts of the reconstructed image that do not reach a quality level judged acceptable by the criterion are then encoded by a more conventional technique. As examples:

• the metric can be the SSIM,

• the conventional coding can be H.264/AVC.

Synthesis algorithm

Among the known synthesis methods, mention may be made of pixel-based techniques, in which the pixels are constructed one by one. One such algorithm was developed by L.-Y. Wei and M. Levoy, "Fast texture synthesis using tree-structured vector quantization", Proceedings of SIGGRAPH 2000 (July 2000), 479-488 [1]. The goal here is to synthesize a large texture area from a smaller patch that contains all the required information about the patterns. The quality of the algorithm lies in the fact that the synthesized image must not show visible boundaries or periodicities.

Figure 1 represents the principle of the algorithm. It has two inputs: a texture patch and an image of the desired dimensions, initialized with noise to avoid periodicities. It outputs an image synthesized from the texture.

Characteristics of the search for the best pixel

The neighborhood comparison is done pixel by pixel using the L2 norm. The error to be minimized is therefore of the form:

$$E = \sum_{\text{pixels}} \sum_{R,G,B} \left( x_{synth} - x_{patch} \right)^2$$

where $x_{synth}$ and $x_{patch}$ are the values of each RGB color component of the considered pixel in the current image and in the patch, respectively. Each pixel in the neighborhood of the current pixel is therefore compared with its counterpart in the neighborhood of the pixel tested in the patch.

The neighborhood consists of the pixels surrounding the current pixel; it is contained in a given square of dimension d x d. It is called "causal" when it contains only the pixels already synthesized in the current image. Causal neighborhoods are used here, since the non-causal part of the neighborhood in the current image comprises only noise pixels and is of no interest for the comparison.

Figure 2 shows such causal neighborhoods. For the first pixels (first lines, first and last columns), the output image is treated as periodic, so the pixels taken into account lie on the other side of the image, as shown for the first corner pixel (x), whose neighborhood is located at the four corners of the image.
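The causal-neighborhood search described above can be sketched as follows. This is a minimal illustration, not the implementation of [1]: the function names are hypothetical, images are assumed to be lists of rows of color tuples, and only the output image is wrapped periodically (the patch is not).

```python
def causal_offsets(d):
    """Offsets (dy, dx) inside a d x d window that precede its center in raster order."""
    h = d // 2
    return [(dy, dx) for dy in range(-h, 1) for dx in range(-h, h + 1)
            if dy < 0 or dx < 0]


def neighborhood_error(out, y, x, patch, py, px, offsets):
    """Sum of squared differences over the neighborhood pixels and color channels."""
    H, W = len(out), len(out[0])
    err = 0
    for dy, dx in offsets:
        a = out[(y + dy) % H][(x + dx) % W]  # output image is treated as periodic
        b = patch[py + dy][px + dx]          # assumption: the patch is not wrapped
        err += sum((ca - cb) ** 2 for ca, cb in zip(a, b))
    return err


def best_match(out, y, x, patch, d):
    """Exhaustive search: return (error, py, px) of the best patch pixel for (y, x)."""
    offsets = causal_offsets(d)
    h = d // 2
    best = None
    for py in range(h, len(patch)):              # rows with a full causal neighborhood
        for px in range(h, len(patch[0]) - h):   # columns with a full neighborhood
            err = neighborhood_error(out, y, x, patch, py, px, offsets)
            if best is None or err < best[0]:
                best = (err, py, px)
    return best
```

With d = 5, `causal_offsets` yields 12 offsets: two full rows of five pixels above the current pixel and two pixels to its left.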

Multi-resolution approach

The main problem raised by the exhaustive approach remains the computation time necessary to synthesize images of reasonable size. Since this computing time is correlated with the size of the neighborhood, a multi-resolution approach improves performance. The main idea introduced in [1] is to use images of lower resolution so that 5x5 or 3x3 neighborhoods cover as much of the texture as 15x15 neighborhoods would at a single resolution. To that end, pyramids are first created, one for the patch and one for the synthesized image, using a sub-sampling filter, as shown in Figure 3.
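As an illustration of the pyramid construction, here is a minimal sketch. The sub-sampling filter is not specified at this point in the text, so a simple 2x2 averaging (box) filter is assumed; the function names are hypothetical and the image is a single-channel list of rows with even dimensions.

```python
def downsample(img):
    """Halve each dimension by averaging 2x2 blocks (box filter, an assumption)."""
    H, W = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(W)]
            for y in range(H)]


def build_pyramid(img, levels):
    """Return [level 0 (full resolution), level 1, ..., level levels-1 (lowest)]."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid
```

With three levels, the image is sub-sampled twice, so an 8x8 input gives levels of size 8x8, 4x4 and 2x2. The same function would be applied to the patch and to the image being synthesized.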

The algorithm then synthesizes the pyramid of the current image, from the lowest resolution to the highest resolution, as follows:

• The lowest resolution image is synthesized in the same way as in the single-resolution technique.

• The other images are synthesized in the same way, except that the neighborhoods contain not only the pixels of the current resolution, but also pixels of the neighborhood of the corresponding pixel at the lower resolution.

• The last image is the output image, synthesized from the patch and the lower resolution images.

Figure 4 shows a multi-resolution neighborhood. This neighborhood contains the pixels of the causal neighborhood at the current resolution, level n, shown in dark in the diagram on the left, plus the pixels of the non-causal neighborhood at the lower resolution, level n + 1 (the next level up in the pyramid), shown in dark in the diagram on the right together with the parent pixel at the center, shown in a lighter shade. In this example, the neighborhood contains 12 + 9 = 21 pixels. Figure 5 shows the order of the multi-resolution synthesis. The upper image, level 2, corresponds to the synthesis of the first level, with a causal neighborhood. The lower images, level 1 and level 0, correspond to the synthesis of the second level, with a causal neighborhood.
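The 12 + 9 = 21 count can be checked with a short sketch that gathers such a neighborhood. The function name is hypothetical, images are single-channel lists of rows, the parent of pixel (y, x) is assumed to be (y // 2, x // 2) at level n + 1, and both levels are assumed to wrap periodically.

```python
def multires_neighborhood(fine, coarse, y, x, d_fine=5, d_coarse=3):
    """Causal d_fine x d_fine window at level n plus full d_coarse x d_coarse window at level n + 1."""
    Hf, Wf = len(fine), len(fine[0])
    Hc, Wc = len(coarse), len(coarse[0])
    hf, hc = d_fine // 2, d_coarse // 2
    values = []
    # Causal part at the current resolution: pixels already synthesized.
    for dy in range(-hf, 1):
        for dx in range(-hf, hf + 1):
            if dy < 0 or dx < 0:
                values.append(fine[(y + dy) % Hf][(x + dx) % Wf])
    # Full (non-causal) window around the parent pixel at the coarser level.
    py, px = y // 2, x // 2
    for dy in range(-hc, hc + 1):
        for dx in range(-hc, hc + 1):
            values.append(coarse[(py + dy) % Hc][(px + dx) % Wc])
    return values
```

The coarser level can be non-causal because it has already been synthesized in full before the current level is processed.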

Quality metric: SSIM

Since the object of the invention is to synthesize an image from texture patches for the purpose of image compression, it is of course necessary, on the encoder side, to estimate the quality of reproduction of the synthesized parts of the image in comparison with the source image. These synthesis-based reconstruction techniques tend to generate a reconstructed signal that moves away from the original signal in terms of the classical SSE (sum of squared errors) distortion, but on the other hand they offer a visual rendering that can be quite acceptable; this is where quality metrics come into play. A great deal of work is currently being done on the subject; the focus here is on a somewhat more psycho-visual measurement called Structural Similarity (SSIM), described for example in Z. Wang, L. Lu, A. C. Bovik, "Video quality assessment based on structural distortion measurement", Signal Processing: Image Communication, vol. 19, no. 2, pp. 121-132, Feb. 2004.

This measure is composed of three terms and makes it possible to estimate disparities. The formulation of the SSIM is as follows:

$$\mathrm{SSIM}(s, c) = \frac{(2\mu_s\mu_c + C_1)(2\sigma_{sc} + C_2)}{(\mu_s^2 + \mu_c^2 + C_1)(\sigma_s^2 + \sigma_c^2 + C_2)} \qquad (5)$$

with:

• $\mu_s$: mean of the luminance of the source pixels,

• $\sigma_s^2$: variance of the source pixels,

• $\mu_c$: mean of the luminance of the synthesized pixels,

• $\sigma_c^2$: variance of the reconstructed (synthesized) pixels,

• $\sigma_{sc}$: covariance of the source and synthesized pixels,

• $C_1 = (K_1 L)^2$, $C_2 = (K_2 L)^2$: two constants intended to stabilize the division when the denominator is very small,

• $L$: the dynamic range of the pixel values, here 256 for colors coded on 8 bits,

• $K_1 = 0.01$ and $K_2 = 0.03$ by default.

The SSIM is computed over 8x8 blocks of the image, one block being associated with each pixel of the image.
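The SSIM computation on 8x8 blocks can be sketched as follows, using the standard two-factor SSIM expression with the constants given above (L = 256, K1 = 0.01, K2 = 0.03). The function names and the `stride` parameter are assumptions made for illustration: a stride of 1 gives one block per pixel position, a stride of 8 gives non-overlapping blocks. Images are single-channel (luminance) lists of rows.

```python
def ssim_block(src, syn, L=256, k1=0.01, k2=0.03):
    """SSIM between two equally sized flat lists of luminance values."""
    n = len(src)
    mu_s = sum(src) / n
    mu_c = sum(syn) / n
    var_s = sum((a - mu_s) ** 2 for a in src) / n
    var_c = sum((b - mu_c) ** 2 for b in syn) / n
    cov = sum((a - mu_s) * (b - mu_c) for a, b in zip(src, syn)) / n
    c1 = (k1 * L) ** 2  # stabilizes the luminance term
    c2 = (k2 * L) ** 2  # stabilizes the contrast/structure term
    num = (2 * mu_s * mu_c + c1) * (2 * cov + c2)
    den = (mu_s ** 2 + mu_c ** 2 + c1) * (var_s + var_c + c2)
    return num / den


def ssim_map(src_img, syn_img, block=8, stride=8):
    """SSIM of every block x block window, visited with the given stride."""
    rows = []
    for y in range(0, len(src_img) - block + 1, stride):
        row = []
        for x in range(0, len(src_img[0]) - block + 1, stride):
            s = [src_img[y + j][x + i] for j in range(block) for i in range(block)]
            c = [syn_img[y + j][x + i] for j in range(block) for i in range(block)]
            row.append(ssim_block(s, c))
        rows.append(row)
    return rows
```

Identical blocks give a score of 1; the score decreases as the means, variances or structure of the two blocks diverge.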

One of the aims of the invention is to overcome the aforementioned drawbacks. It relates to an image decoding method using a technique for synthesizing images and image regions by means of a synthesis algorithm that operates on a set of patches, this operation being performed via a low-resolution image, characterized in that it comprises the following steps:

- decoding the patches as well as the low-resolution image, the patches possibly coming from previously decoded images or being decodable independently of the images themselves,

- reconstructing the regions according to a synthesis algorithm using these patches and this low-resolution image as supports,

- conventionally decoding the regions not coded by synthesis, the decoded regions replacing those possibly already reconstructed in the synthesized image.
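The last step, in which conventionally decoded regions replace what the synthesis produced, can be sketched as a simple compositing operation. How the regions are delimited is not specified here; this sketch assumes a hypothetical boolean mask with one entry per square block, and the function name and `block` parameter are illustrative only.

```python
def composite(synth, decoded, coded_mask, block=8):
    """Overwrite the synthesized image with conventionally decoded blocks.

    coded_mask[by][bx] is True when block (by, bx) was coded conventionally
    (hypothetical representation of the regions not coded by synthesis).
    """
    out = [row[:] for row in synth]
    H, W = len(out), len(out[0])
    for by, mask_row in enumerate(coded_mask):
        for bx, coded in enumerate(mask_row):
            if not coded:
                continue
            for y in range(by * block, min((by + 1) * block, H)):
                for x in range(bx * block, min((bx + 1) * block, W)):
                    out[y][x] = decoded[y][x]
    return out
```

Blocks whose mask entry is False keep the synthesized content unchanged.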

According to one particular implementation, the synthesis technique is of the pyramidal type.

According to a particular implementation, the low-resolution image takes a spatially scalable form, so that the synthesis algorithm is guided, at specific points, at pyramid levels other than the lowest-resolution level.

According to one particular implementation, the synthesis algorithm operates on an RGB image signal, a YUV image signal or a luminance signal Y alone, the U and V signals being subjected to the same processing as that applied to the luminance.

The subject of the invention is also an image compression method using a technique for synthesizing images and image regions by means of a synthesis algorithm that operates on a set of patches, this operation being carried out via a low-resolution image, characterized in that it comprises the following steps:

- deciding whether or not to code the regions of the synthesized image by synthesis, by comparing their rendering with the source image according to a quality metric,

- for the regions where the decision is to code by synthesis, coding the patches as well as the low-resolution image in a conventional way,

- for the regions where the decision is not to code by synthesis, coding these regions according to a conventional coding scheme.

According to one particular implementation, the synthesis technique is of the pyramidal type.

According to one particular embodiment, the synthesis algorithm operates on an RGB image signal, a YUV image signal or a luminance signal Y alone, the U and V signals being subjected to the same processing as that applied to the luminance.

According to a particular implementation, the quality metric is the SSIM (Structural SIMilarity).

The invention makes it possible to improve the synthesis of images and image regions by using a synthesis algorithm that operates on a set of patches, this operation being done via a low-resolution image. The target application being video compression, a quality metric is used to decide whether the poorly reconstructed areas of the image are coded conventionally or whether the areas in question are left as synthesized. A first advantage of the invention is thus to allow an acceptable visual rendering (based on quality metrics) of image regions reconstructed via a synthesis algorithm, this synthesis being guided at the encoder and at the decoder by a transmitted low-resolution image, in order ultimately to reduce the bit rate for a given visual quality, or conversely to improve the visual quality for a given bit rate. It should be noted that this technique does not require a segmentation map as such to be transmitted to the decoder, the synthesis algorithm naturally distributing the information contained in the different patches by means of the guidance image. In addition, the rendering imperfections of the synthesis technique are corrected by conventional coding, the areas of imperfection being detected by a quality metric, which may be the SSIM. A second advantage of the invention is the scalability of the representation, which makes it possible to decode the signal at a chosen resolution. Another advantage is the possibility of coding the low-resolution image according to an existing coding technique, for example H.264, thus ensuring backward compatibility with these coding techniques.

Guided synthesis

The idea is to transmit to the hierarchical synthesis algorithm the subsampled version of the reference image that will serve as a guide for the synthesis of the lowest resolution of the pyramid. The synthesis of this low resolution image will be done with a non-causal neighborhood. We choose, for example, the exhaustive approach of L. Y. Wei and M. Levoy, which consists in comparing this neighborhood with all those in the patch to determine the best candidate. The various steps of the method, illustrated in FIG. 6, which represents a block diagram of the guided synthesis, are then as follows:

1) The algorithm sub-samples the reference image as many times as there are stages in the Gaussian pyramid used in the multi-resolution algorithm.

2) This low-resolution image is then copied as the initialization of the synthesized image, replacing the white-noise initialization proposed in the approach of L.-Y. Wei and M. Levoy.

3) Several patches corresponding to the different textured parts of the image are provided to the algorithm.

4) The low-resolution image is then synthesized with a square (non-causal) neighborhood: the non-causal part of the neighborhood computed on the image being constructed then rests on the subsampled reference image. The exhaustive algorithm then tests all the neighborhoods of all the patches provided. The non-causal part of the current neighborhood thus guides the synthesis toward the patch whose characteristics are closest to the current part of the subsampled image.

5) The algorithm keeps in memory which patch each synthesized pixel comes from.

6) For the higher-resolution levels, the synthesis technique remains unchanged, except that the search is carried out only in the patch memorized at the previous resolution, in order to accelerate the synthesis. Nevertheless, in one variant of the method, the synthesis algorithm can be guided/constrained at specific points at pyramid levels other than the lowest-resolution level.
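Steps 2, 4 and 5 above can be sketched for the lowest-resolution level as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name is hypothetical, images are single-channel lists of rows, the patches are assumed to be already sub-sampled to the same pyramid level as the guide, the square neighborhood includes the center pixel, and the image is wrapped periodically.

```python
def guided_lowres_synthesis(guide, patches, d=3):
    """Synthesize the lowest pyramid level, guided by the sub-sampled reference.

    Returns (synthesized image, origin map), where origin[y][x] is the index
    of the patch that supplied pixel (y, x).
    """
    H, W = len(guide), len(guide[0])
    h = d // 2
    offsets = [(dy, dx) for dy in range(-h, h + 1) for dx in range(-h, h + 1)]
    out = [row[:] for row in guide]          # step 2: initialize with the guide, not noise
    origin = [[-1] * W for _ in range(H)]    # step 5: remember the source patch
    for y in range(H):
        for x in range(W):
            best = None
            # Step 4: exhaustive test of all neighborhoods of all patches.
            for k, patch in enumerate(patches):
                for py in range(h, len(patch) - h):
                    for px in range(h, len(patch[0]) - h):
                        err = sum((out[(y + dy) % H][(x + dx) % W]
                                   - patch[py + dy][px + dx]) ** 2
                                  for dy, dx in offsets)
                        if best is None or err < best[0]:
                            best = (err, k, py, px)
            _, k, py, px = best
            out[y][x] = patches[k][py][px]
            origin[y][x] = k
    return out, origin
```

Because `out` is updated in place, the causal part of each neighborhood holds already synthesized pixels while the non-causal part still holds guide pixels. The `origin` map is what step 6 would use to restrict the search to a single patch at the higher-resolution levels.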

As an example illustrating this type of synthesis, consider an image from a football match. This reference image is shown in Figure 7. The image has two areas where synthesis could be a good way to keep the high frequencies generally sacrificed by conventional coding algorithms: the grass and the crowd. Three input images, represented in FIG. 8, are therefore transmitted to the algorithm: the twice-subsampled version of the reference image, a crowd sample and a grass sample.

The synthesized image of dimensions 768x512, represented in FIG. 9, is obtained by this algorithm with the following characteristics:

• Neighborhoods at the current resolution: 5x5 pixels

• Neighborhoods at resolution n + 1: 3x3 pixels

• Number of levels in the pyramid: 3

Associated metric

In order to measure whether the texture synthesis is relevant for the regions of the image produced, a quality metric is used that can reveal how well the structure is rendered.

Using the previous example and one possible metric, the SSIM, we obtain an SSIM map as represented in Figure 10. Several decision modes are applicable:

- use of a threshold, applied to the metric, to distinguish the elements of the image to be encoded from those not to be encoded,

- competition between the measurement obtained by synthesis and that obtained with the "classical" coding modes.
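The two decision modes can be sketched as follows. The function names and the threshold value 0.9 are hypothetical, since no threshold is given in the text. The competition mode shown here compares quality scores only and ignores bit cost, which is a simplifying assumption.

```python
def decide_by_threshold(ssim_scores, threshold=0.9):
    """True where a region must be coded conventionally (score below threshold)."""
    return [[score < threshold for score in row] for row in ssim_scores]


def decide_by_competition(ssim_synth, ssim_classic):
    """True where the conventional coding mode gives the better score."""
    return [[classic > synth for synth, classic in zip(row_s, row_c)]
            for row_s, row_c in zip(ssim_synth, ssim_classic)]
```

Either function returns a boolean map with one entry per measured region; regions marked True are handed to the conventional coder and the others are left to the synthesis.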

Figure 11 shows the general block diagram of the coding method.

The applications concerned are those related to video compression, more specifically very low and low bit-rate applications (e.g. HD for mobile) as well as super resolution (HD and beyond).

Claims

1. An image decoding method using a technique for synthesizing images and image regions using a synthesis algorithm that operates on a set of patches, this operation being performed via a low resolution image, characterized in that it comprises the following steps:

2. Method according to claim 1, characterized in that the synthesis technique is of the pyramidal type.

3. Method according to claim 2, characterized in that the low resolution image is in a form of spatial scalability type so that the synthesis algorithm is guided, at specific points, at levels of the pyramid other than the lowest-resolution level.

4. Method according to claim 1, characterized in that the synthesis algorithm operates on an RGB image signal, a YUV image signal or a luminance signal Y alone, the U and V signals undergoing the same processing as that applied to the luminance.

5. Image compression method using a technique of image synthesis and image regions using a synthesis algorithm that operates on a set of patches, this operation being done via a low resolution image, characterized in that it comprises the following steps:

deciding whether or not to code the regions of the synthesized image by comparing the rendering with the source image, according to a quality metric,

for regions synthesized with a non-coding decision, coding these regions according to a conventional coding scheme.

6. Method according to claim 5, characterized in that the synthesis technique is of the pyramidal type.

7. Method according to claim 6, characterized in that the low resolution image is in a form of spatial scalability type so that the synthesis algorithm is guided, at specific points, at levels of the pyramid other than the lowest-resolution level.

8. Method according to claim 5, characterized in that the synthesis algorithm operates on an RGB image signal, a YUV image signal or a luminance signal Y alone, the U and V signals undergoing the same processing as that applied to the luminance.

9. Method according to claim 5, characterized in that the quality metric is the SSIM (Structural SIMilarity).